Wandb

We show how to integrate LaminDB with Wandb to track the entire training process, associate data with models, and query models by hyperparameters, among other criteria.

# uncomment below to install necessary dependencies for this notebook:
# !pip install 'lamindb[jupyter,aws]' -q
# !pip install wandb -qU
# !pip install torch torchvision torchaudio lightning -q
# you can also pass s3://my-bucket
!lamin init --storage ./lamin-mlops
💡 connected lamindb: testuser1/lamin-mlops
import lamindb as ln
import wandb

ln.settings.transform.stem_uid = "tULn4Va2yERp"
ln.settings.transform.version = "1"

ln.track()
💡 connected lamindb: testuser1/lamin-mlops
💡 notebook imports: lamindb==0.74.1 lightning==2.3.2 torch==2.3.1 torchvision==0.18.1 wandb==0.17.4
💡 saved: Transform(uid='tULn4Va2yERp5zKv', version='1', name='Wandb', key='wandb', type='notebook', created_by_id=1, updated_at='2024-07-06 09:08:45 UTC')
💡 saved: Run(uid='RVfRbRb6SbixhBtJmTm7', transform_id=2, created_by_id=1)
Run(uid='RVfRbRb6SbixhBtJmTm7', started_at='2024-07-06 09:08:45 UTC', is_consecutive=True, transform_id=2, created_by_id=1)
!wandb login
wandb: Currently logged in as: felix_lamin (lamin-mlops-demo). Use `wandb login --relogin` to force relogin
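
If you run this notebook non-interactively (e.g. in CI), you can authenticate programmatically instead of via the prompt. A minimal sketch, assuming the API key is exposed through the WANDB_API_KEY environment variable:

import os

# wandb also reads WANDB_API_KEY automatically, so the explicit call is optional
wandb.login(key=os.environ["WANDB_API_KEY"])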

Define a model

Define a simple autoencoder as an example model using PyTorch Lightning.

from torch import optim, nn, utils
from torchvision.datasets import MNIST
from torchvision.transforms import ToTensor
import lightning as L


class LitAutoEncoder(L.LightningModule):
    def __init__(self, hidden_size, bottleneck_size):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(28 * 28, hidden_size), 
            nn.ReLU(), 
            nn.Linear(hidden_size, bottleneck_size)
        )
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck_size, hidden_size), 
            nn.ReLU(), 
            nn.Linear(hidden_size, 28 * 28)
        )
        # save hyperparameters to self.hparams (auto-logged by wandb)
        self.save_hyperparameters()

    def training_step(self, batch, batch_idx):
        x, y = batch
        x = x.view(x.size(0), -1)
        z = self.encoder(x)
        x_hat = self.decoder(z)
        loss = nn.functional.mse_loss(x_hat, x)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        optimizer = optim.Adam(self.parameters(), lr=1e-3)
        return optimizer
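
Before training, a quick forward pass helps sanity-check the tensor shapes. A small sketch using a random batch of flattened 28 × 28 inputs and the layer sizes used below:

import torch

# smoke test: encode and decode a random batch of 4 flattened 28x28 images
model = LitAutoEncoder(hidden_size=32, bottleneck_size=16)
x = torch.randn(4, 28 * 28)
z = model.encoder(x)       # shape: (4, 16)
x_hat = model.decoder(z)   # shape: (4, 784)
print(z.shape, x_hat.shape)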

Query & cache MNIST dataset from LaminDB

We curated the MNIST dataset in another notebook, and it now shows up on LaminHub.

We can query it from there by its uid, or by any other combination of metadata.

Here, by description:

training_data_artifact = ln.Artifact.filter(description="MNIST-dataset").one()
training_data_artifact
Artifact(uid='cn5W706oVyxNhqOSJVJ0', description='MNIST-dataset', suffix='', type='dataset', size=54950048, hash='amFx_vXqnUtJr0kmxxWK2Q', hash_type='md5-d', n_objects=4, visibility=1, key_is_virtual=True, created_by_id=1, storage_id=1, transform_id=1, run_id=1, updated_at='2024-07-06 09:08:35 UTC')
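
Equivalently, you could fetch the same artifact directly via the uid shown above; a sketch:

# query the same artifact by uid instead of description
ln.Artifact.filter(uid="cn5W706oVyxNhqOSJVJ0").one()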

Let’s cache the dataset:

cache_path = training_data_artifact.cache()
cache_path
PosixUPath('/home/runner/work/lamin-mlops/lamin-mlops/docs/lamin-mlops/.lamindb/cn5W706oVyxNhqOS')

Create a PyTorch-compatible dataset:

!ls -r {cache_path.as_posix()}/MNIST/raw
train-labels-idx1-ubyte  t10k-labels-idx1-ubyte
train-images-idx3-ubyte  t10k-images-idx3-ubyte
cache_path.as_posix()
'/home/runner/work/lamin-mlops/lamin-mlops/docs/lamin-mlops/.lamindb/cn5W706oVyxNhqOS'
dataset = MNIST(cache_path.as_posix(), transform=ToTensor())
dataset
Dataset MNIST
    Number of datapoints: 60000
    Root location: /home/runner/work/lamin-mlops/lamin-mlops/docs/lamin-mlops/.lamindb/cn5W706oVyxNhqOS
    Split: Train
    StandardTransform
Transform: ToTensor()
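
To check what the dataloader will consume, you can peek at a single sample; a short sketch (after ToTensor, MNIST images are 1 × 28 × 28 float tensors with integer class labels):

# inspect one (image, label) pair from the dataset
image, label = dataset[0]
print(image.shape, label)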

Monitor training with wandb

Train our example model and track training progress with Wandb.

MODEL_CONFIG = {
    "hidden_size": 32,
    "bottleneck_size": 16,
    "batch_size": 32
}
# create PyTorch dataloader
train_loader = utils.data.DataLoader(dataset, batch_size=MODEL_CONFIG["batch_size"], shuffle=True)
# init model
autoencoder = LitAutoEncoder(MODEL_CONFIG["hidden_size"], MODEL_CONFIG["bottleneck_size"])
from lightning.pytorch.loggers import WandbLogger

# initialise the wandb logger
wandb_logger = WandbLogger(project="lamin")
# add batch size to the wandb config
wandb_logger.experiment.config["batch_size"] = MODEL_CONFIG["batch_size"]
wandb: Currently logged in as: felix_lamin (lamin-mlops-demo). Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.17.4
wandb: Run data is saved locally in ./wandb/run-20240706_090853-x7use59q
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run wobbly-leaf-67
wandb: ⭐️ View project at https://wandb.ai/lamin-mlops-demo/lamin
wandb: 🚀 View run at https://wandb.ai/lamin-mlops-demo/lamin/runs/x7use59q
from lightning.pytorch.callbacks import ModelCheckpoint

# store checkpoints to disk and upload to LaminDB after training
checkpoint_callback = ModelCheckpoint(
    dirpath=f"model_checkpoints/{wandb_logger.version}", 
    filename="last_epoch",
    save_top_k=1,
    monitor="train_loss"
)
# train model
trainer = L.Trainer(
    accelerator="cpu",
    limit_train_batches=3, 
    max_epochs=2,
    logger=wandb_logger,
    callbacks=[checkpoint_callback]
)
trainer.fit(model=autoencoder, train_dataloaders=train_loader)
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
  | Name    | Type       | Params | Mode 
-----------------------------------------------
0 | encoder | Sequential | 25.6 K | train
1 | decoder | Sequential | 26.4 K | train
-----------------------------------------------
52.1 K    Trainable params
0         Non-trainable params
52.1 K    Total params
0.208     Total estimated model params size (MB)
/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:424: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=3` in the `DataLoader` to improve performance.
/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py:298: The number of training batches (3) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
Training:   0%|          | 0/3 [00:00<?, ?it/s]
Epoch 0:  33%|███▎      | 1/3 [00:00<00:00, 37.43it/s, v_num=e59q]
Epoch 0:  67%|██████▋   | 2/3 [00:00<00:00, 53.59it/s, v_num=e59q]
Epoch 0: 100%|██████████| 3/3 [00:00<00:00, 62.99it/s, v_num=e59q]
Epoch 1:  33%|███▎      | 1/3 [00:00<00:00, 83.35it/s, v_num=e59q]
Epoch 1:  67%|██████▋   | 2/3 [00:00<00:00, 96.63it/s, v_num=e59q]
Epoch 1: 100%|██████████| 3/3 [00:00<00:00, 98.01it/s, v_num=e59q]
`Trainer.fit` stopped: `max_epochs=2` reached.
Epoch 1: 100%|██████████| 3/3 [00:00<00:00, 77.47it/s, v_num=e59q]

wandb_logger.experiment.name
'wobbly-leaf-67'
wandb_logger.version
'x7use59q'
wandb.finish()
wandb: 0.006 MB of 0.006 MB uploaded
wandb: 🚀 View run wobbly-leaf-67 at: https://wandb.ai/lamin-mlops-demo/lamin/runs/x7use59q
wandb: ⭐️ View project at: https://wandb.ai/lamin-mlops-demo/lamin
wandb: Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20240706_090853-x7use59q/logs
wandb: WARNING The new W&B backend becomes opt-out in version 0.18.0; try it out with `wandb.require("core")`! See https://wandb.me/wandb-core for more information.

Check out the training progress in the Wandb UI.

Save model in LaminDB

Upload the checkpoint of the trained model to LaminDB.

We annotate the LaminDB Artifact with the wandb experiment name and the model hyperparameters.

# save checkpoint in LaminDB
ckpt_artifact = ln.Artifact(
    f"model_checkpoints/{wandb_logger.version}",
    description="model-checkpoint",
    type="model",
).save()
# create a label with the wandb experiment name
experiment_label = ln.ULabel(
    name=wandb_logger.experiment.name, 
    description="wandb experiment name"
).save()
# annotate the artifact
ckpt_artifact.ulabels.add(experiment_label)
# define the associated model hyperparameters in ln.Param
for k, v in MODEL_CONFIG.items():
    ln.Param(name=k, dtype=type(v).__name__).save()
# annotate the artifact with them
ckpt_artifact.params.add_values(MODEL_CONFIG)
# show info about the checkpoint artifact
ckpt_artifact.describe()
Artifact(uid='2EjwUdSq6oEvu3MRAh2n', description='model-checkpoint', suffix='', type='model', size=636275, hash='HW_RPRIuU6fWRmX2z1VCRg', hash_type='md5-d', n_objects=1, visibility=1, key_is_virtual=True, updated_at='2024-07-06 09:08:58 UTC')
  Provenance
    .created_by = 'testuser1'
    .storage = '/home/runner/work/lamin-mlops/lamin-mlops/docs/lamin-mlops'
    .transform = 'Wandb'
    .run = '2024-07-06 09:08:45 UTC'
  Labels
    .ulabels = 'wobbly-leaf-67'
  Params
    'batch_size' = 32
    'bottleneck_size' = 16
    'hidden_size' = 32
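
Because the checkpoint is annotated, it can be queried back later, for example by the wandb experiment name label created above; a sketch:

# query model checkpoints annotated with this wandb experiment name
ln.Artifact.filter(type="model", ulabels__name=experiment_label.name).df()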

Look at saved checkpoints in LaminHub.
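
To reuse the trained model later, you could cache the checkpoint from LaminDB and restore the LightningModule from it. A sketch, assuming the .ckpt suffix that ModelCheckpoint appends to the last_epoch filename:

# cache the checkpoint folder and load the weights back into the model class
ckpt_dir = ckpt_artifact.cache()
autoencoder = LitAutoEncoder.load_from_checkpoint(
    (ckpt_dir / "last_epoch.ckpt").as_posix()
)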

# save notebook
# ln.finish()