Weights & Biases

We show how to integrate LaminDB with W&B to track the training process and associate datasets & parameters with models.

# !pip install 'lamindb[jupyter]' torchvision lightning wandb
!lamin init --storage ./lamin-mlops
!wandb login
 resetting django module variables
 connected lamindb: anonymous/lamin-mlops
wandb: Currently logged in as: felix_lamin (lamin-mlops-demo) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
import lamindb as ln
import wandb
import lightning

from torch import utils
from torchvision.datasets import MNIST
from torchvision.transforms import ToTensor
from autoencoder import LitAutoEncoder

ln.track()
 connected lamindb: anonymous/lamin-mlops
 created Transform('X2USQ8mJCDrj0000'), started new Run('uhudDd4g...') at 2025-09-18 06:39:33 UTC
 notebook imports: autoencoder lamindb==1.11.0 lightning==2.5.5 torch==2.8.0 torchvision==0.23.0 wandb==0.21.4
 recommendation: to identify the notebook across renames, pass the uid: ln.track("X2USQ8mJCDrj")
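Following the recommendation in the log above, you can pin the notebook's identity across renames by passing its uid:

ln.track("X2USQ8mJCDrj")  # identifies this notebook even if it is renamed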

Define a model

We use a basic PyTorch Lightning autoencoder as an example model.

Code of LitAutoEncoder (a simple autoencoder model)
import torch
import lightning

from torch import optim, nn


class LitAutoEncoder(lightning.LightningModule):
    def __init__(self, hidden_size: int, bottleneck_size: int) -> None:
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(28 * 28, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, bottleneck_size),
        )
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, 28 * 28),
        )
        self.save_hyperparameters()

    def training_step(
        self, batch: tuple[torch.Tensor, torch.Tensor], batch_idx: int
    ) -> torch.Tensor:
        x, _ = batch  # labels are not needed for reconstruction
        x = x.view(x.size(0), -1)  # flatten 28x28 images into vectors
        z = self.encoder(x)  # compress to the bottleneck representation
        x_hat = self.decoder(z)  # reconstruct the input
        loss = nn.functional.mse_loss(x_hat, x)  # reconstruction error
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self) -> optim.Optimizer:
        optimizer = optim.Adam(self.parameters(), lr=1e-3)
        return optimizer
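To see how tensors flow through the model, here is a minimal shape check (an illustrative sketch; the fake batch is not part of the training cells below):

import torch

model = LitAutoEncoder(hidden_size=32, bottleneck_size=16)
x = torch.randn(4, 28 * 28)  # a fake batch of 4 flattened MNIST images
z = model.encoder(x)         # bottleneck: torch.Size([4, 16])
x_hat = model.decoder(z)     # reconstruction: torch.Size([4, 784])
print(z.shape, x_hat.shape)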

Query & download the MNIST dataset

We saved the MNIST dataset in the curation notebook; it now shows up in the Artifact registry:

ln.Artifact.filter(kind="dataset").to_dataframe()
id                    1
uid                   POw8RFlB3HLuJX5e0000
key                   testdata/mnist
description           None
suffix
kind                  dataset
otype                 None
size                  54950048
hash                  amFx_vXqnUtJr0kmxxWK2Q
n_files               4
n_observations        None
_hash_type            md5-d
_key_is_virtual       True
_overwrite_versions   True
space_id              1
storage_id            1
schema_id             None
version               None
is_latest             True
run_id                1
created_at            2025-09-18 06:39:16.191000+00:00
created_by_id         1
_aux                  {'af': {'0': True}}
branch_id             1

You can also find it on lamin.ai if you connected your instance.

instance view

Let’s get the dataset:

artifact = ln.Artifact.get(key="testdata/mnist")
artifact
Artifact(uid='POw8RFlB3HLuJX5e0000', is_latest=True, key='testdata/mnist', suffix='', kind='dataset', size=54950048, hash='amFx_vXqnUtJr0kmxxWK2Q', n_files=4, branch_id=1, space_id=1, storage_id=1, run_id=1, created_by_id=1, created_at=2025-09-18 06:39:16 UTC)

And download it to a local cache:

path = artifact.cache()
path
PosixUPath('/home/runner/work/lamin-mlops/lamin-mlops/docs/lamin-mlops/.lamindb/POw8RFlB3HLuJX5e')
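The cached path points to a local folder containing the raw MNIST files. To inspect what was downloaded, a quick sketch (rglob follows the pathlib API that UPath mirrors):

for f in sorted(path.rglob("*")):  # walk the cached folder
    print(f)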

Create a PyTorch-compatible dataset:

dataset = MNIST(path.as_posix(), transform=ToTensor())
dataset
Dataset MNIST
    Number of datapoints: 60000
    Root location: /home/runner/work/lamin-mlops/lamin-mlops/docs/lamin-mlops/.lamindb/POw8RFlB3HLuJX5e
    Split: Train
    StandardTransform
Transform: ToTensor()
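Each element of the dataset is an (image, label) pair; a quick sanity check (illustrative):

img, label = dataset[0]
print(img.shape, label)  # torch.Size([1, 28, 28]) and an integer class label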

Monitor training with wandb

We train our example model and track the training progress with wandb.

from lightning.pytorch.loggers import WandbLogger

MODEL_CONFIG = {"hidden_size": 32, "bottleneck_size": 16, "batch_size": 32}

# create the data loader
train_loader = utils.data.DataLoader(
    dataset, batch_size=MODEL_CONFIG["batch_size"], shuffle=True
)

# init model
autoencoder = LitAutoEncoder(
    MODEL_CONFIG["hidden_size"], MODEL_CONFIG["bottleneck_size"]
)

# initialize the logger
wandb_logger = WandbLogger(project="lamin")

# add batch size to the wandb config
wandb_logger.experiment.config["batch_size"] = MODEL_CONFIG["batch_size"]
wandb: Currently logged in as: felix_lamin (lamin-mlops-demo) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
wandb: creating run
wandb: Tracking run with wandb version 0.21.4
wandb: Run data is saved locally in ./wandb/run-20250918_063934-1fm41eok
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run electric-water-275
wandb: ⭐️ View project at https://wandb.ai/lamin-mlops-demo/lamin
wandb: 🚀 View run at https://wandb.ai/lamin-mlops-demo/lamin/runs/1fm41eok
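Rather than adding keys one by one, you can also log the whole config dict in a single call (a sketch using wandb's config.update; allow_val_change lets it overwrite the batch_size key set above):

wandb_logger.experiment.config.update(MODEL_CONFIG, allow_val_change=True)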
from lightning.pytorch.callbacks import ModelCheckpoint

# store checkpoints to disk and upload to LaminDB after training
checkpoint_callback = ModelCheckpoint(
    dirpath=f"model_checkpoints/{wandb_logger.version}.ckpt",
    filename="last_epoch",
    save_top_k=1,
    monitor="train_loss",
)

# train model
trainer = lightning.Trainer(
    accelerator="cpu",
    limit_train_batches=3,
    max_epochs=2,
    logger=wandb_logger,
    callbacks=[checkpoint_callback],
)
trainer.fit(model=autoencoder, train_dataloaders=train_loader)
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
  | Name    | Type       | Params | Mode 
-----------------------------------------------
0 | encoder | Sequential | 25.6 K | train
1 | decoder | Sequential | 26.4 K | train
-----------------------------------------------
52.1 K    Trainable params
0         Non-trainable params
52.1 K    Total params
0.208     Total estimated model params size (MB)
8         Modules in train mode
0         Modules in eval mode
/opt/hostedtoolcache/Python/3.13.7/x64/lib/python3.13/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:433: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=3` in the `DataLoader` to improve performance.
/opt/hostedtoolcache/Python/3.13.7/x64/lib/python3.13/site-packages/lightning/pytorch/loops/fit_loop.py:310: The number of training batches (3) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
Epoch 0: 100%|██████████| 3/3 [00:00<00:00, 76.63it/s, v_num=1eok]
Epoch 1: 100%|██████████| 3/3 [00:00<00:00, 104.26it/s, v_num=1eok]
`Trainer.fit` stopped: `max_epochs=2` reached.
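After training, the callback records which checkpoint file it kept, via a standard Lightning attribute:

print(checkpoint_callback.best_model_path)  # e.g. model_checkpoints/1fm41eok.ckpt/last_epoch.ckpt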

wandb_logger.experiment.name
'electric-water-275'
wandb_logger.version
'1fm41eok'
wandb.finish()
wandb: updating run metadata; uploading requirements.txt; uploading wandb-metadata.json; uploading data
wandb: 🚀 View run electric-water-275 at: https://wandb.ai/lamin-mlops-demo/lamin/runs/1fm41eok
wandb: ⭐️ View project at: https://wandb.ai/lamin-mlops-demo/lamin
wandb: Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20250918_063934-1fm41eok/logs

See the training progress in the wandb UI:

Wandb training ui

Save model in LaminDB

# save checkpoint as a model
artifact = ln.Artifact(
    f"model_checkpoints/{wandb_logger.version}.ckpt",
    key="testmodels/wandb/litautoencoder.ckpt",
    kind="model",
).save()

# create a label with the wandb experiment name
experiment_label = ln.ULabel(
    name=wandb_logger.experiment.name, description="wandb experiment name"
).save()

# annotate the model artifact
artifact.ulabels.add(experiment_label)
! calling anonymously, will miss private instances
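With the experiment label attached, you can later retrieve all model artifacts from this run by label (a sketch using LaminDB's Django-style filter lookups):

ln.Artifact.filter(kind="model", ulabels__name=experiment_label.name).to_dataframe()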

See the checkpoints:

Wandb checkpoints

If you later want to re-use the checkpoint, you can download it like so:

ln.Artifact.get(key="testmodels/wandb/litautoencoder.ckpt").cache()
PosixUPath('/home/runner/work/lamin-mlops/lamin-mlops/docs/lamin-mlops/.lamindb/N0Ep8HI1tudZ88Y7.ckpt')

Or on the CLI:

lamin get artifact --key 'testmodels/wandb/litautoencoder.ckpt'
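To restore the trained model from the downloaded folder, a sketch (the last_epoch.ckpt filename follows from the ModelCheckpoint callback above):

ckpt_dir = ln.Artifact.get(key="testmodels/wandb/litautoencoder.ckpt").cache()
model = LitAutoEncoder.load_from_checkpoint(
    (ckpt_dir / "last_epoch.ckpt").as_posix()
)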
ln.finish()
! cells [(10, 12)] were not run consecutively
 finished Run('uhudDd4g') after 3s at 2025-09-18 06:39:37 UTC