Weights & Biases

We show how to integrate LaminDB with Weights & Biases (W&B) to track the training process and associate datasets & parameters with models.

# !pip install 'lamindb[jupyter]' torchvision lightning wandb
!lamin init --storage ./lamin-mlops
!wandb login
wandb: WARNING Using legacy-service, which is deprecated. If this is unintentional, you can fix it by ensuring you do not call `wandb.require('legacy-service')` and do not set the WANDB_X_REQUIRE_LEGACY_SERVICE environment variable.
wandb: Currently logged in as: felix_lamin (lamin-mlops-demo) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
import lamindb as ln
import wandb
import lightning

from torch import utils
from torchvision.datasets import MNIST
from torchvision.transforms import ToTensor
from autoencoder import LitAutoEncoder

ln.track()
 connected lamindb: anonymous/lamin-mlops
 created Transform('6IQwWpzNQMSW0000'), started new Run('UGSAcUVv...') at 2025-05-08 07:31:33 UTC
 notebook imports: autoencoder lamindb==1.5.0 lightning==2.5.1.post0 torch==2.7.0 torchvision==0.22.0 wandb==0.19.11
 recommendation: to identify the notebook across renames, pass the uid: ln.track("6IQwWpzNQMSW")

Define a model

We use a basic PyTorch Lightning autoencoder as an example model.

Code of LitAutoEncoder
Simple autoencoder model
import torch
import lightning

from torch import optim, nn


class LitAutoEncoder(lightning.LightningModule):
    def __init__(self, hidden_size: int, bottleneck_size: int) -> None:
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(28 * 28, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, bottleneck_size),
        )
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, 28 * 28),
        )
        self.save_hyperparameters()

    def training_step(
        self, batch: tuple[torch.Tensor, torch.Tensor], batch_idx: int
    ) -> torch.Tensor:
        x, y = batch
        x = x.view(x.size(0), -1)
        z = self.encoder(x)
        x_hat = self.decoder(z)
        loss = nn.functional.mse_loss(x_hat, x)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self) -> optim.Optimizer:
        optimizer = optim.Adam(self.parameters(), lr=1e-3)
        return optimizer
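
As an optional sanity check, you can run a random batch through the untrained encoder and decoder to confirm that shapes round-trip as expected:

import torch

model = LitAutoEncoder(hidden_size=32, bottleneck_size=16)
x = torch.randn(8, 28 * 28)                 # a batch of 8 flattened 28x28 images
x_hat = model.decoder(model.encoder(x))     # round-trip through the autoencoder
assert x_hat.shape == (8, 28 * 28)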

Query & download the MNIST dataset

We saved the MNIST dataset in the curation notebook; it now shows up in the Artifact registry:

ln.Artifact.filter(kind="dataset").df()
uid key description suffix kind otype size hash n_files n_observations _hash_type _key_is_virtual _overwrite_versions space_id storage_id schema_id version is_latest run_id created_at created_by_id _aux _branch_code
id
1 el0Ue7hTdH5aEOoo0000 testdata/mnist None dataset None 54950048 amFx_vXqnUtJr0kmxxWK2Q 4 None md5-d True True 1 1 None None True 1 2025-05-08 07:31:13.510000+00:00 1 None 1

You can also find it on lamin.ai if you have connected your instance.

instance view

Let’s get the dataset:

artifact = ln.Artifact.get(key="testdata/mnist")
artifact
Artifact(uid='el0Ue7hTdH5aEOoo0000', is_latest=True, key='testdata/mnist', suffix='', kind='dataset', size=54950048, hash='amFx_vXqnUtJr0kmxxWK2Q', n_files=4, space_id=1, storage_id=1, run_id=1, created_by_id=1, created_at=2025-05-08 07:31:13 UTC)

And download it to a local cache:

path = artifact.cache()
path
PosixUPath('/home/runner/work/lamin-mlops/lamin-mlops/docs/lamin-mlops/.lamindb/el0Ue7hTdH5aEOoo')

Create a PyTorch-compatible dataset:

dataset = MNIST(path.as_posix(), transform=ToTensor())
dataset
Dataset MNIST
    Number of datapoints: 60000
    Root location: /home/runner/work/lamin-mlops/lamin-mlops/docs/lamin-mlops/.lamindb/el0Ue7hTdH5aEOoo
    Split: Train
    StandardTransform
Transform: ToTensor()
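
To verify the data before training, you can inspect a single sample; with ToTensor, each image is a 1×28×28 float tensor and the label an integer class index:

image, label = dataset[0]
print(image.shape, image.dtype, label)  # e.g. torch.Size([1, 28, 28]) torch.float32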

Monitor training with wandb

We train our example model and track the training progress with wandb.

from lightning.pytorch.loggers import WandbLogger

MODEL_CONFIG = {"hidden_size": 32, "bottleneck_size": 16, "batch_size": 32}

# create the data loader
train_loader = utils.data.DataLoader(
    dataset, batch_size=MODEL_CONFIG["batch_size"], shuffle=True
)

# init model
autoencoder = LitAutoEncoder(
    MODEL_CONFIG["hidden_size"], MODEL_CONFIG["bottleneck_size"]
)

# initialize the logger
wandb_logger = WandbLogger(project="lamin")

# add batch size to the wandb config
wandb_logger.experiment.config["batch_size"] = MODEL_CONFIG["batch_size"]
wandb: Currently logged in as: felix_lamin (lamin-mlops-demo) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.19.11
wandb: Run data is saved locally in ./wandb/run-20250508_073134-o5edzo43
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run stellar-violet-219
wandb: ⭐️ View project at https://wandb.ai/lamin-mlops-demo/lamin
wandb: 🚀 View run at https://wandb.ai/lamin-mlops-demo/lamin/runs/o5edzo43
from lightning.pytorch.callbacks import ModelCheckpoint

# store checkpoints to disk and upload to LaminDB after training
checkpoint_callback = ModelCheckpoint(
    dirpath=f"model_checkpoints/{wandb_logger.version}",
    filename="last_epoch",
    save_top_k=1,
    monitor="train_loss",
)

# train model
trainer = lightning.Trainer(
    accelerator="cpu",
    limit_train_batches=3,
    max_epochs=2,
    logger=wandb_logger,
    callbacks=[checkpoint_callback],
)
trainer.fit(model=autoencoder, train_dataloaders=train_loader)
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
  | Name    | Type       | Params | Mode 
-----------------------------------------------
0 | encoder | Sequential | 25.6 K | train
1 | decoder | Sequential | 26.4 K | train
-----------------------------------------------
52.1 K    Trainable params
0         Non-trainable params
52.1 K    Total params
0.208     Total estimated model params size (MB)
8         Modules in train mode
0         Modules in eval mode
/opt/hostedtoolcache/Python/3.13.3/x64/lib/python3.13/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:425: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=3` in the `DataLoader` to improve performance.
/opt/hostedtoolcache/Python/3.13.3/x64/lib/python3.13/site-packages/lightning/pytorch/loops/fit_loop.py:310: The number of training batches (3) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
Epoch 0: 100%|██████████| 3/3 [00:00<00:00, 79.43it/s, v_num=zo43]
Epoch 1: 100%|██████████| 3/3 [00:00<00:00, 121.33it/s, v_num=zo43]
`Trainer.fit` stopped: `max_epochs=2` reached.

wandb_logger.experiment.name
'stellar-violet-219'
wandb_logger.version
'o5edzo43'
wandb.finish()
wandb:                                                                                
wandb: 🚀 View run stellar-violet-219 at: https://wandb.ai/lamin-mlops-demo/lamin/runs/o5edzo43
wandb: ⭐️ View project at: https://wandb.ai/lamin-mlops-demo/lamin
wandb: Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20250508_073134-o5edzo43/logs

See the training progress in the wandb UI:

Wandb training ui

Save model in LaminDB

# save checkpoint as a model in LaminDB
artifact = ln.Artifact(
    f"model_checkpoints/{wandb_logger.version}",
    key="testmodels/wandb/litautoencoder",  # is automatically versioned
    kind="model",
).save()

# create a label with the wandb experiment name
experiment_label = ln.ULabel(
    name=wandb_logger.experiment.name, description="wandb experiment name"
).save()

# annotate the model artifact
artifact.ulabels.add(experiment_label)

# define the associated model hyperparameters in ln.Param
for k, v in MODEL_CONFIG.items():
    ln.Param(name=k, dtype=type(v).__name__).save()
artifact.params.add_values(MODEL_CONFIG)

# look at Artifact annotations
artifact.describe()
artifact.params
Artifact 
├── General
│   ├── .uid = 'pPNLhgEFzb4ZWlCr0000'
│   ├── .key = 'testmodels/wandb/litautoencoder'
│   ├── .size = 636736
│   ├── .hash = 'JL6n2EDlxpROdVLNreCgKg'
│   ├── .n_files = 1
│   ├── .path = /home/runner/work/lamin-mlops/lamin-mlops/docs/lamin-mlops/.lamindb/pPNLhgEFzb4ZWlCr
│   ├── .created_by = anonymous
│   ├── .created_at = 2025-05-08 07:31:35
│   └── .transform = 'Weights & Biases'
└── Labels
    └── .ulabels                    ULabel                     stellar-violet-219                       
Artifact 
└── Params
    └── batch_size                  int                        32                                       
        bottleneck_size             int                        16                                       
        hidden_size                 int                        32                                       
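
Since the checkpoint is annotated, you can later query it via its labels, for example by the wandb experiment name. A sketch using LaminDB's Django-style field lookups, with the experiment_label created above:

# retrieve model artifacts annotated with this wandb experiment name
ln.Artifact.filter(kind="model", ulabels__name=experiment_label.name).df()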

See the checkpoints:

Wandb check points

If you later want to reuse the checkpoint, you can download it like so:

ln.Artifact.get(key="testmodels/wandb/litautoencoder").cache()
PosixUPath('/home/runner/work/lamin-mlops/lamin-mlops/docs/lamin-mlops/.lamindb/pPNLhgEFzb4ZWlCr')

Or on the CLI:

lamin get artifact --key 'testmodels/wandb/litautoencoder'
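
To continue training or run inference, you can load the cached checkpoint back into the Lightning module; a minimal sketch, assuming the single checkpoint file is named last_epoch.ckpt (following filename="last_epoch" in the ModelCheckpoint callback above):

from autoencoder import LitAutoEncoder

ckpt_dir = ln.Artifact.get(key="testmodels/wandb/litautoencoder").cache()
# hyperparameters were stored via save_hyperparameters(), so no init args are needed
model = LitAutoEncoder.load_from_checkpoint((ckpt_dir / "last_epoch.ckpt").as_posix())
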
ln.finish()
! cells [(10, 12)] were not run consecutively
 finished Run('UGSAcUVv') after 2s at 2025-05-08 07:31:36 UTC
! calling anonymously, will miss private instances