Weights & Biases
We show how LaminDB can be integrated with W&B to track the training process and associate datasets & parameters with models.
# !pip install 'lamindb[jupyter]' torchvision lightning wandb
!lamin init --storage ./lamin-mlops
!wandb login
Show code cell output
wandb: WARNING Using legacy-service, which is deprecated. If this is unintentional, you can fix it by ensuring you do not call `wandb.require('legacy-service')` and do not set the WANDB_X_REQUIRE_LEGACY_SERVICE environment variable.
wandb: Currently logged in as: felix_lamin (lamin-mlops-demo) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
import lamindb as ln
import wandb
import lightning
from torch import utils
from torchvision.datasets import MNIST
from torchvision.transforms import ToTensor
from autoencoder import LitAutoEncoder
ln.track()
Show code cell output
→ connected lamindb: anonymous/lamin-mlops
→ created Transform('6IQwWpzNQMSW0000'), started new Run('UGSAcUVv...') at 2025-05-08 07:31:33 UTC
→ notebook imports: autoencoder lamindb==1.5.0 lightning==2.5.1.post0 torch==2.7.0 torchvision==0.22.0 wandb==0.19.11
• recommendation: to identify the notebook across renames, pass the uid: ln.track("6IQwWpzNQMSW")
Define a model
We use a basic PyTorch Lightning autoencoder as an example model.
Code of LitAutoEncoder
Simple autoencoder model
import torch
import lightning
from torch import optim, nn


class LitAutoEncoder(lightning.LightningModule):
    def __init__(self, hidden_size: int, bottleneck_size: int) -> None:
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(28 * 28, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, bottleneck_size),
        )
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, 28 * 28),
        )
        self.save_hyperparameters()

    def training_step(
        self, batch: tuple[torch.Tensor, torch.Tensor], batch_idx: int
    ) -> torch.Tensor:
        x, y = batch
        x = x.view(x.size(0), -1)
        z = self.encoder(x)
        x_hat = self.decoder(z)
        loss = nn.functional.mse_loss(x_hat, x)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self) -> optim.Optimizer:
        optimizer = optim.Adam(self.parameters(), lr=1e-3)
        return optimizer
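To sanity-check the architecture before training, you can push one random flattened batch through the encoder and decoder; this is a minimal illustrative sketch (not part of the original notebook), reusing the class defined above:
# quick smoke test: encoder maps 784 -> hidden -> bottleneck, decoder maps back
model = LitAutoEncoder(hidden_size=32, bottleneck_size=16)
x = torch.rand(4, 28 * 28)               # 4 fake images, already flattened
x_hat = model.decoder(model.encoder(x))  # reconstruction
assert x_hat.shape == x.shape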
Query & download the MNIST dataset
We saved the MNIST dataset in the curation notebook; it now shows up in the Artifact registry:
ln.Artifact.filter(kind="dataset").df()
| field | value |
|---|---|
| id | 1 |
| uid | el0Ue7hTdH5aEOoo0000 |
| key | testdata/mnist |
| description | None |
| suffix | |
| kind | dataset |
| otype | None |
| size | 54950048 |
| hash | amFx_vXqnUtJr0kmxxWK2Q |
| n_files | 4 |
| n_observations | None |
| _hash_type | md5-d |
| _key_is_virtual | True |
| _overwrite_versions | True |
| space_id | 1 |
| storage_id | 1 |
| schema_id | None |
| version | None |
| is_latest | True |
| run_id | 1 |
| created_at | 2025-05-08 07:31:13.510000+00:00 |
| created_by_id | 1 |
| _aux | None |
| _branch_code | 1 |
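Besides filtering by kind, you can query the registry on other fields; an illustrative sketch, assuming Django-style field lookups such as key__startswith, which lamindb filters accept:
# query by key prefix instead of kind (illustrative; Django-style lookup)
ln.Artifact.filter(key__startswith="testdata/").df()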
You can also find it on lamin.ai if you have connected your instance.

Let’s get the dataset:
artifact = ln.Artifact.get(key="testdata/mnist")
artifact
Show code cell output
Artifact(uid='el0Ue7hTdH5aEOoo0000', is_latest=True, key='testdata/mnist', suffix='', kind='dataset', size=54950048, hash='amFx_vXqnUtJr0kmxxWK2Q', n_files=4, space_id=1, storage_id=1, run_id=1, created_by_id=1, created_at=2025-05-08 07:31:13 UTC)
And download it to a local cache:
path = artifact.cache()
path
Show code cell output
PosixUPath('/home/runner/work/lamin-mlops/lamin-mlops/docs/lamin-mlops/.lamindb/el0Ue7hTdH5aEOoo')
Create a PyTorch-compatible dataset:
dataset = MNIST(path.as_posix(), transform=ToTensor())
dataset
Show code cell output
Dataset MNIST
Number of datapoints: 60000
Root location: /home/runner/work/lamin-mlops/lamin-mlops/docs/lamin-mlops/.lamindb/el0Ue7hTdH5aEOoo
Split: Train
StandardTransform
Transform: ToTensor()
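The autoencoder flattens each 28×28 image into a 784-dimensional vector, so it is worth confirming what one sample looks like; a quick, purely illustrative check:
# each sample is a (1, 28, 28) float tensor in [0, 1] plus an integer label
image, label = dataset[0]
print(image.shape, image.min().item(), image.max().item(), label)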
Monitor training with wandb
Train our example model and track the training progress with wandb.
from lightning.pytorch.loggers import WandbLogger
MODEL_CONFIG = {"hidden_size": 32, "bottleneck_size": 16, "batch_size": 32}
# create the data loader
train_loader = utils.data.DataLoader(
    dataset, batch_size=MODEL_CONFIG["batch_size"], shuffle=True
)
# init model
autoencoder = LitAutoEncoder(
    MODEL_CONFIG["hidden_size"], MODEL_CONFIG["bottleneck_size"]
)
# initialize the logger
wandb_logger = WandbLogger(project="lamin")
# add batch size to the wandb config
wandb_logger.experiment.config["batch_size"] = MODEL_CONFIG["batch_size"]
Show code cell output
wandb: Currently logged in as: felix_lamin (lamin-mlops-demo) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.19.11
wandb: Run data is saved locally in ./wandb/run-20250508_073134-o5edzo43
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run stellar-violet-219
wandb: ⭐️ View project at https://wandb.ai/lamin-mlops-demo/lamin
wandb: 🚀 View run at https://wandb.ai/lamin-mlops-demo/lamin/runs/o5edzo43
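Instead of setting single config entries, you could also log the whole config dict up front; a small optional sketch using WandbLogger.log_hyperparams (note that save_hyperparameters() in the module already logs the constructor arguments):
# optionally log the full config dict to the wandb run in one call
wandb_logger.log_hyperparams(MODEL_CONFIG)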
from lightning.pytorch.callbacks import ModelCheckpoint
# store checkpoints to disk and upload to LaminDB after training
checkpoint_callback = ModelCheckpoint(
    dirpath=f"model_checkpoints/{wandb_logger.version}",
    filename="last_epoch",
    save_top_k=1,
    monitor="train_loss",
)
# train model
trainer = lightning.Trainer(
    accelerator="cpu",
    limit_train_batches=3,
    max_epochs=2,
    logger=wandb_logger,
    callbacks=[checkpoint_callback],
)
trainer.fit(model=autoencoder, train_dataloaders=train_loader)
Show code cell output
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
| Name | Type | Params | Mode
-----------------------------------------------
0 | encoder | Sequential | 25.6 K | train
1 | decoder | Sequential | 26.4 K | train
-----------------------------------------------
52.1 K Trainable params
0 Non-trainable params
52.1 K Total params
0.208 Total estimated model params size (MB)
8 Modules in train mode
0 Modules in eval mode
/opt/hostedtoolcache/Python/3.13.3/x64/lib/python3.13/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:425: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=3` in the `DataLoader` to improve performance.
/opt/hostedtoolcache/Python/3.13.3/x64/lib/python3.13/site-packages/lightning/pytorch/loops/fit_loop.py:310: The number of training batches (3) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
Training: 0%| | 0/3 [00:00<?, ?it/s]
Epoch 0: 100%|██████████| 3/3 [00:00<00:00, 79.43it/s, v_num=zo43]
Epoch 1: 100%|██████████| 3/3 [00:00<00:00, 121.33it/s, v_num=zo43]
`Trainer.fit` stopped: `max_epochs=2` reached.
wandb_logger.experiment.name
Show code cell output
'stellar-violet-219'
wandb_logger.version
Show code cell output
'o5edzo43'
wandb.finish()
Show code cell output
wandb:
wandb: 🚀 View run stellar-violet-219 at: https://wandb.ai/lamin-mlops-demo/lamin/runs/o5edzo43
wandb: ⭐️ View project at: https://wandb.ai/lamin-mlops-demo/lamin
wandb: Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20250508_073134-o5edzo43/logs
See the training progress in the wandb UI:

Save model in LaminDB
# save checkpoint as a model in LaminDB
artifact = ln.Artifact(
    f"model_checkpoints/{wandb_logger.version}",
    key="testmodels/wandb/litautoencoder",  # is automatically versioned
    kind="model",
).save()

# create a label with the wandb experiment name
experiment_label = ln.ULabel(
    name=wandb_logger.experiment.name, description="wandb experiment name"
).save()

# annotate the model artifact
artifact.ulabels.add(experiment_label)

# define the associated model hyperparameters in ln.Param
for k, v in MODEL_CONFIG.items():
    ln.Param(name=k, dtype=type(v).__name__).save()
artifact.params.add_values(MODEL_CONFIG)
# look at Artifact annotations
artifact.describe()
artifact.params
Show code cell output
Artifact
├── General
│   ├── .uid = 'pPNLhgEFzb4ZWlCr0000'
│   ├── .key = 'testmodels/wandb/litautoencoder'
│   ├── .size = 636736
│   ├── .hash = 'JL6n2EDlxpROdVLNreCgKg'
│   ├── .n_files = 1
│   ├── .path = /home/runner/work/lamin-mlops/lamin-mlops/docs/lamin-mlops/.lamindb/pPNLhgEFzb4ZWlCr
│   ├── .created_by = anonymous
│   ├── .created_at = 2025-05-08 07:31:35
│   └── .transform = 'Weights & Biases'
└── Labels
    └── .ulabels    ULabel    stellar-violet-219
Artifact
└── Params
    └── batch_size         int    32
        bottleneck_size    int    16
        hidden_size        int    32
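Because the checkpoint artifact is annotated with the experiment name, you can later look it up by that label; an illustrative query, assuming Django-style related-field lookups on ulabels:
# find model artifacts annotated with this wandb experiment name
ln.Artifact.filter(kind="model", ulabels__name=wandb_logger.experiment.name).df()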
See the checkpoints:

If later on, you want to re-use the checkpoint, you can download it like so:
ln.Artifact.get(key="testmodels/wandb/litautoencoder").cache()
Show code cell output
PosixUPath('/home/runner/work/lamin-mlops/lamin-mlops/docs/lamin-mlops/.lamindb/pPNLhgEFzb4ZWlCr')
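From the cached directory you can restore the Lightning module directly; a minimal sketch, assuming the checkpoint file is named last_epoch.ckpt as configured in the ModelCheckpoint callback above:
# rebuild the trained module from the cached checkpoint directory;
# hyperparameters were stored via save_hyperparameters(), so no extra args are needed
ckpt_dir = ln.Artifact.get(key="testmodels/wandb/litautoencoder").cache()
autoencoder = LitAutoEncoder.load_from_checkpoint((ckpt_dir / "last_epoch.ckpt").as_posix())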
Or on the CLI:
lamin get artifact --key 'testmodels/wandb/litautoencoder'
ln.finish()
Show code cell output
! cells [(10, 12)] were not run consecutively
→ finished Run('UGSAcUVv') after 2s at 2025-05-08 07:31:36 UTC
! calling anonymously, will miss private instances