Weights & Biases¶
We show how LaminDB can be integrated with W&B to track the training process and associate datasets & parameters with models.
# !pip install -q 'lamindb[jupyter,aws]' torch torchvision lightning wandb
!lamin init --storage ./lamin-mlops
!wandb login
→ connected lamindb: anonymous/lamin-mlops
wandb: Currently logged in as: felix_lamin (lamin-mlops-demo). Use `wandb login --relogin` to force relogin
import lamindb as ln
import wandb
ln.context.uid = "tULn4Va2yERp0000"
ln.context.track()
→ connected lamindb: anonymous/lamin-mlops
→ created Transform('tULn4Va2yERp0000'), started new Run('0dDDrlRy...') at 2025-01-20 07:34:37 UTC
→ notebook imports: lamindb==1.0.2 lightning==2.5.0.post0 torch==2.5.1 torchvision==0.20.1 wandb==0.19.4
Define a model¶
Define a simple autoencoder as an example model using PyTorch Lightning.
from torch import optim, nn, utils
from torchvision.datasets import MNIST
from torchvision.transforms import ToTensor
import lightning
class LitAutoEncoder(lightning.LightningModule):
    def __init__(self, hidden_size, bottleneck_size):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(28 * 28, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, bottleneck_size)
        )
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, 28 * 28)
        )
        # save hyper-parameters to self.hparams auto-logged by wandb
        self.save_hyperparameters()

    def training_step(self, batch, batch_idx):
        x, y = batch
        x = x.view(x.size(0), -1)
        z = self.encoder(x)
        x_hat = self.decoder(z)
        loss = nn.functional.mse_loss(x_hat, x)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        optimizer = optim.Adam(self.parameters(), lr=1e-3)
        return optimizer
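As a quick sanity check on the architecture, the number of trainable parameters can be worked out by hand: each `nn.Linear(n_in, n_out)` holds `n_in * n_out` weights plus `n_out` biases. The sketch below is plain Python and does not need PyTorch. The helper names `linear_params` and `autoencoder_params` are hypothetical, and the sizes 32 and 16 are the values used for `hidden_size` and `bottleneck_size` later in this notebook.

```python
def linear_params(n_in: int, n_out: int) -> int:
    """Parameters of a fully connected layer: weight matrix plus bias vector."""
    return n_in * n_out + n_out


def autoencoder_params(hidden_size: int, bottleneck_size: int, n_pixels: int = 28 * 28) -> dict:
    """Hand-computed parameter counts for the encoder and decoder stacks above."""
    encoder = linear_params(n_pixels, hidden_size) + linear_params(hidden_size, bottleneck_size)
    decoder = linear_params(bottleneck_size, hidden_size) + linear_params(hidden_size, n_pixels)
    return {"encoder": encoder, "decoder": decoder, "total": encoder + decoder}


counts = autoencoder_params(hidden_size=32, bottleneck_size=16)
print(counts)  # {'encoder': 25648, 'decoder': 26416, 'total': 52064}
```

These values round to the 25.6 K, 26.4 K and 52.1 K that Lightning reports in its model summary during training.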
Query & download the MNIST dataset¶
We saved the MNIST dataset in the curation notebook, and it now shows up in the artifact registry:
ln.Artifact.filter(type="dataset").df()
id | uid | key | description | suffix | kind | otype | size | hash | n_files | n_observations | _hash_type | _key_is_virtual | _overwrite_versions | space_id | storage_id | schema_id | version | is_latest | run_id | created_at | created_by_id | _aux | _branch_code |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | wkIiXOIhbZI0RaLK0000 | testdata/mnist | None | | dataset | None | 54950048 | amFx_vXqnUtJr0kmxxWK2Q | 4 | None | md5-d | True | True | 1 | 1 | None | None | True | 1 | 2025-01-20 07:34:26.888000+00:00 | 1 | None | 1 |
You can also see it on lamin.ai if you connected your instance.
Let’s get the dataset:
artifact = ln.Artifact.get(key="testdata/mnist")
artifact
Artifact(uid='wkIiXOIhbZI0RaLK0000', is_latest=True, key='testdata/mnist', suffix='', kind='dataset', size=54950048, hash='amFx_vXqnUtJr0kmxxWK2Q', n_files=4, space_id=1, storage_id=1, run_id=1, created_by_id=1, created_at=2025-01-20 07:34:26 UTC)
And download it to a local cache:
path = artifact.cache()
path
PosixUPath('/home/runner/work/lamin-mlops/lamin-mlops/docs/lamin-mlops/.lamindb/wkIiXOIhbZI0RaLK')
Create a PyTorch-compatible dataset:
dataset = MNIST(path.as_posix(), transform=ToTensor())
dataset
Dataset MNIST
Number of datapoints: 60000
Root location: /home/runner/work/lamin-mlops/lamin-mlops/docs/lamin-mlops/.lamindb/wkIiXOIhbZI0RaLK
Split: Train
StandardTransform
Transform: ToTensor()
Monitor training with wandb¶
Train our example model and track the training progress with wandb.
from lightning.pytorch.loggers import WandbLogger
MODEL_CONFIG = {
    "hidden_size": 32,
    "bottleneck_size": 16,
    "batch_size": 32
}
# create the data loader
train_loader = utils.data.DataLoader(dataset, batch_size=MODEL_CONFIG["batch_size"], shuffle=True)
# init model
autoencoder = LitAutoEncoder(MODEL_CONFIG["hidden_size"], MODEL_CONFIG["bottleneck_size"])
# initialize the logger
wandb_logger = WandbLogger(project="lamin")
# add batch size to the wandb config
wandb_logger.experiment.config["batch_size"] = MODEL_CONFIG["batch_size"]
wandb: Currently logged in as: felix_lamin (lamin-mlops-demo). Use `wandb login --relogin` to force relogin
wandb: Using wandb-core as the SDK backend. Please refer to https://wandb.me/wandb-core for more information.
wandb: Tracking run with wandb version 0.19.4
wandb: Run data is saved locally in ./wandb/run-20250120_073440-a56lp4zu
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run unique-pine-162
wandb: ⭐️ View project at https://wandb.ai/lamin-mlops-demo/lamin
wandb: 🚀 View run at https://wandb.ai/lamin-mlops-demo/lamin/runs/a56lp4zu
from lightning.pytorch.callbacks import ModelCheckpoint

# store checkpoints to disk and upload to LaminDB after training
checkpoint_callback = ModelCheckpoint(
    dirpath=f"model_checkpoints/{wandb_logger.version}",
    filename="last_epoch",
    save_top_k=1,
    monitor="train_loss"
)

# train model
trainer = lightning.Trainer(
    accelerator="cpu",
    limit_train_batches=3,
    max_epochs=2,
    logger=wandb_logger,
    callbacks=[checkpoint_callback]
)
trainer.fit(model=autoencoder, train_dataloaders=train_loader)
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
| Name | Type | Params | Mode
-----------------------------------------------
0 | encoder | Sequential | 25.6 K | train
1 | decoder | Sequential | 26.4 K | train
-----------------------------------------------
52.1 K Trainable params
0 Non-trainable params
52.1 K Total params
0.208 Total estimated model params size (MB)
8 Modules in train mode
0 Modules in eval mode
/opt/hostedtoolcache/Python/3.10.16/x64/lib/python3.10/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:425: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=3` in the `DataLoader` to improve performance.
/opt/hostedtoolcache/Python/3.10.16/x64/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py:310: The number of training batches (3) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
`Trainer.fit` stopped: `max_epochs=2` reached.
Epoch 1: 100%|██████████| 3/3 [00:00<00:00, 75.98it/s, v_num=p4zu]
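The run above is deliberately tiny. The following arithmetic sketch, in plain Python with all numbers copied from the configuration and logs above, shows how little of MNIST it touches and why the total step count stays below the logging interval that Lightning's warning mentions. The variable names are illustrative only.

```python
import math

n_datapoints = 60_000      # "Number of datapoints" reported for the MNIST train split
batch_size = 32            # MODEL_CONFIG["batch_size"]
limit_train_batches = 3    # Trainer argument used above
max_epochs = 2             # Trainer argument used above
log_every_n_steps = 50     # default named in the Lightning warning above

# Batches a full pass over the data would need (the last partial batch is kept)
batches_per_full_epoch = math.ceil(n_datapoints / batch_size)
# Optimizer steps actually taken in this run
total_steps = limit_train_batches * max_epochs
# Images processed over the whole run
images_processed = total_steps * batch_size

print(batches_per_full_epoch)           # 1875
print(total_steps)                      # 6
print(images_processed)                 # 192
print(total_steps < log_every_n_steps)  # True
```

For a real training run one would drop `limit_train_batches` or, as the warning suggests, lower `log_every_n_steps` so that `train_loss` is logged more often.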
wandb_logger.experiment.name
'unique-pine-162'
wandb_logger.version
'a56lp4zu'
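The run ID is what ties the wandb run to the files on disk: it is used as the checkpoint folder name and also appears at the end of the local wandb run directory. A minimal sketch with `pathlib`, using the run ID and directory name copied from the outputs above. The `.ckpt` extension is an assumption based on Lightning's default checkpoint file extension.

```python
from pathlib import PurePosixPath

run_id = "a56lp4zu"  # value of wandb_logger.version shown above

# Folder passed as dirpath to ModelCheckpoint
checkpoint_dir = PurePosixPath("model_checkpoints") / run_id
# Assumption: Lightning appends its default ".ckpt" extension to filename="last_epoch"
checkpoint_file = checkpoint_dir / "last_epoch.ckpt"
# Local wandb run directory as printed in the wandb logs above
wandb_run_dir = PurePosixPath("wandb/run-20250120_073440-a56lp4zu")

print(checkpoint_file)                      # model_checkpoints/a56lp4zu/last_epoch.ckpt
print(wandb_run_dir.name.endswith(run_id))  # True
```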
wandb.finish()
wandb:
wandb: 🚀 View run unique-pine-162 at: https://wandb.ai/lamin-mlops-demo/lamin/runs/a56lp4zu
wandb: ⭐️ View project at: https://wandb.ai/lamin-mlops-demo/lamin
wandb: Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20250120_073440-a56lp4zu/logs
See the training progress in the wandb UI via the run link printed above.
Save model in LaminDB¶
# save checkpoint as a model in LaminDB
artifact = ln.Artifact(
    f"model_checkpoints/{wandb_logger.version}",
    key="testmodels/litautoencoder",  # is automatically versioned
    type="model",
).save()

# create a label with the wandb experiment name
experiment_label = ln.ULabel(
    name=wandb_logger.experiment.name,
    description="wandb experiment name"
).save()

# annotate the model artifact
artifact.ulabels.add(experiment_label)

# define the associated model hyperparameters in ln.Param
for k, v in MODEL_CONFIG.items():
    ln.Param(name=k, dtype=type(v).__name__).save()
artifact.params.add_values(MODEL_CONFIG)

# describe the artifact
artifact.describe()
! `type` will be removed soon, please use `kind`
Artifact
├── General
│   ├── .uid = 'gKDSEkwGbnv98S6a0000'
│   ├── .key = 'testmodels/litautoencoder'
│   ├── .size = 636275
│   ├── .hash = 'AsiKqD3_Z1cPYG4fZYfgCw'
│   ├── .n_files = 1
│   ├── .path = /home/runner/work/lamin-mlops/lamin-mlops/docs/lamin-mlops/.lamindb/gKDSEkwGbnv98S6a
│   ├── .created_by = anonymous
│   ├── .created_at = 2025-01-20 07:34:43
│   └── .transform = 'Weights & Biases'
└── Labels
    └── .ulabels ULabel unique-pine-162
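The loop that registers the hyperparameters derives each `dtype` string from the Python type of the configured value. The sketch below runs that same expression on its own, without LaminDB, to show which strings result. The `extra` dictionary is hypothetical and only illustrates other value types; whether every such type name is accepted as a `dtype` by `ln.Param` is not shown in this notebook and should be checked against the LaminDB documentation.

```python
MODEL_CONFIG = {"hidden_size": 32, "bottleneck_size": 16, "batch_size": 32}

# Same expression as in the loop above: the Python type name becomes the dtype string
param_dtypes = {name: type(value).__name__ for name, value in MODEL_CONFIG.items()}
print(param_dtypes)
# {'hidden_size': 'int', 'bottleneck_size': 'int', 'batch_size': 'int'}

# Hypothetical extra entries, only to show what other value types would produce
extra = {"learning_rate": 1e-3, "optimizer": "adam", "use_bias": True}
extra_dtypes = {name: type(value).__name__ for name, value in extra.items()}
print(extra_dtypes)
# {'learning_rate': 'float', 'optimizer': 'str', 'use_bias': 'bool'}
```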
See the checkpoints:
If you later want to reuse the checkpoint, you can download it like so:
ln.Artifact.get(key='testmodels/litautoencoder').cache()
PosixUPath('/home/runner/work/lamin-mlops/lamin-mlops/docs/lamin-mlops/.lamindb/gKDSEkwGbnv98S6a')
Or on the CLI:
lamin get artifact --key 'testmodels/litautoencoder'
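Because the artifact is a folder, the download yields a directory rather than a single file. A minimal sketch for locating the checkpoint files inside such a directory follows. The helper `find_checkpoints` is hypothetical, the `.ckpt` suffix is assumed to be Lightning's default, and a temporary directory stands in for the real cache path so that the example runs on its own.

```python
from pathlib import Path
import tempfile


def find_checkpoints(cache_dir: Path) -> list[Path]:
    """Return all *.ckpt files below cache_dir, sorted by path."""
    return sorted(cache_dir.rglob("*.ckpt"))


# Stand-in for the directory returned by .cache(), so the sketch is self-contained
with tempfile.TemporaryDirectory() as tmp:
    cache_dir = Path(tmp)
    (cache_dir / "last_epoch.ckpt").touch()
    found = [p.name for p in find_checkpoints(cache_dir)]

print(found)  # ['last_epoch.ckpt']
```

With the real cached folder, a path found this way could then be passed to `LitAutoEncoder.load_from_checkpoint(...)`, the Lightning classmethod that restores a module together with the hyperparameters stored by `save_hyperparameters()`.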
# save notebook
# ln.context.finish()