MLflow

We show how LaminDB can be integrated with MLflow to track the training process and associate datasets & parameters with models.

# !pip install 'lamindb[jupyter]' torchvision lightning mlflow
!lamin init --storage ./lamin-mlops
 resetting django module variables
 connected lamindb: anonymous/lamin-mlops
import lamindb as ln
import mlflow
import lightning

from torch import utils
from torchvision.datasets import MNIST
from torchvision.transforms import ToTensor
from autoencoder import LitAutoEncoder
 connected lamindb: anonymous/lamin-mlops
Tracking models in both LaminDB and MLflow

Note

It is not always necessary to track all model parameters and metrics in both LaminDB and MLflow. However, if specific artifacts or runs should be queryable by particular model attributes, such as the learning rate, then those attributes should be tracked in LaminDB. Below, we show how to do this for the batch size and learning rate; the approach generalizes to additional features.


# define model run parameters & features
MODEL_CONFIG = {"batch_size": 32, "lr": 0.001}

hyperparameter = ln.Feature(name="Autoencoder hyperparameter", is_type=True).save()
hyperparams = ln.Feature.from_dict(MODEL_CONFIG, str_as_cat=True)
for param in hyperparams:
    param.type = hyperparameter
    param.save()

ln.track(params=MODEL_CONFIG)
 created Transform('eLaoqQwwt2m60000'), started new Run('vCefiQAg...') at 2025-09-14 14:08:37 UTC
→ params: batch_size=32, lr=0.001
 notebook imports: autoencoder lamindb==1.11.0 lightning==2.5.5 mlflow-skinny==3.3.2 mlflow-tracing==3.3.2 mlflow==3.3.2 torch==2.8.0 torchvision==0.23.0
 recommendation: to identify the notebook across renames, pass the uid: ln.track("eLaoqQwwt2m6", params={...})
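
To double-check what was registered, you can query the Feature registry for the hyperparameter features created above. This is a minimal sketch using lamindb's standard registry filters; the exact syntax for querying runs by their tracked params is an assumption and may differ between lamindb versions.

# list the hyperparameter features registered above
ln.Feature.filter(type=hyperparameter).to_dataframe()

# runs can later be narrowed down by their tracked params, e.g.
# ln.Run.filter(params__lr=0.001)  # assumed syntax, check your lamindb version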

Define a model

We use a basic PyTorch Lightning autoencoder as an example model.

Code of LitAutoEncoder (a simple autoencoder model):
import torch
import lightning

from torch import optim, nn


class LitAutoEncoder(lightning.LightningModule):
    """A minimal MLP autoencoder for flattened 28x28 MNIST images."""

    def __init__(self, hidden_size: int, bottleneck_size: int) -> None:
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(28 * 28, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, bottleneck_size),
        )
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, 28 * 28),
        )
        # store hidden_size & bottleneck_size so the checkpoint can be reloaded without arguments
        self.save_hyperparameters()

    def training_step(
        self, batch: tuple[torch.Tensor, torch.Tensor], batch_idx: int
    ) -> torch.Tensor:
        x, _ = batch  # MNIST labels are not needed for reconstruction
        x = x.view(x.size(0), -1)  # flatten 28x28 images to 784-dim vectors
        z = self.encoder(x)
        x_hat = self.decoder(z)
        loss = nn.functional.mse_loss(x_hat, x)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self) -> optim.Optimizer:
        optimizer = optim.Adam(self.parameters(), lr=1e-3)
        return optimizer
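
As a quick sanity check, not part of the tutorial pipeline, you can push a dummy batch through the encoder and decoder to confirm the expected shapes:

import torch

# smoke test with random data: 8 flattened 28x28 "images"
model = LitAutoEncoder(hidden_size=32, bottleneck_size=16)
dummy = torch.randn(8, 28 * 28)
reconstruction = model.decoder(model.encoder(dummy))
assert reconstruction.shape == (8, 28 * 28)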

Query & download the MNIST dataset

We saved the MNIST dataset in a curation notebook; it now shows up in the Artifact registry:

ln.Artifact.filter(kind="dataset").to_dataframe()
id                    1
uid                   RFRgEHdpXja7yPrH0000
key                   testdata/mnist
description           None
suffix                ''
kind                  dataset
otype                 None
size                  54950048
hash                  amFx_vXqnUtJr0kmxxWK2Q
n_files               4
n_observations        None
_hash_type            md5-d
_key_is_virtual       True
_overwrite_versions   True
space_id              1
storage_id            1
schema_id             None
version               None
is_latest             True
run_id                1
created_at            2025-09-14 14:08:06.578000+00:00
created_by_id         1
_aux                  {'af': {'0': True}}
branch_id             1

Let’s get the dataset:

artifact = ln.Artifact.get(key="testdata/mnist")
artifact
Artifact(uid='RFRgEHdpXja7yPrH0000', is_latest=True, key='testdata/mnist', suffix='', kind='dataset', size=54950048, hash='amFx_vXqnUtJr0kmxxWK2Q', n_files=4, branch_id=1, space_id=1, storage_id=1, run_id=1, created_by_id=1, created_at=2025-09-14 14:08:06 UTC)

And download it to a local cache:

path = artifact.cache()
path
PosixUPath('/home/runner/work/lamin-mlops/lamin-mlops/docs/lamin-mlops/.lamindb/RFRgEHdpXja7yPrH')

Create a PyTorch-compatible dataset:

dataset = MNIST(path.as_posix(), transform=ToTensor())
dataset
Dataset MNIST
    Number of datapoints: 60000
    Root location: /home/runner/work/lamin-mlops/lamin-mlops/docs/lamin-mlops/.lamindb/RFRgEHdpXja7yPrH
    Split: Train
    StandardTransform
Transform: ToTensor()
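
Each element of the dataset is an (image, label) pair; the image is the 1×28×28 tensor that the autoencoder flattens to a 784-dimensional vector during training:

image, label = dataset[0]
image.shape  # torch.Size([1, 28, 28])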

Monitor training with MLflow

Train our example model and track the training progress with MLflow.

# enable MLflow PyTorch autologging
mlflow.pytorch.autolog()
with mlflow.start_run() as mlflow_run:
    # reuse the MNIST dataset cached from the LaminDB artifact above
    train_loader = utils.data.DataLoader(
        dataset, batch_size=MODEL_CONFIG["batch_size"]
    )

    # Initialize model
    autoencoder = LitAutoEncoder(32, 16)

    # Create checkpoint callback
    from lightning.pytorch.callbacks import ModelCheckpoint

    checkpoint_callback = ModelCheckpoint(
        dirpath="model_checkpoints",
        filename=f"{mlflow_run.info.run_id}_last_epoch",
        save_top_k=1,
        monitor="train_loss",
    )

    # Train model
    trainer = lightning.Trainer(
        accelerator="cpu",
        limit_train_batches=3,
        max_epochs=2,
        callbacks=[checkpoint_callback],
    )
    trainer.fit(model=autoencoder, train_dataloaders=train_loader)

    # Get run information
    run_id = mlflow_run.info.run_id
    ln.context.run.reference = run_id

    # save model summary artifact
    local_model_summary_path = (
        f"{mlflow_run.info.artifact_uri.removeprefix('file://')}/model_summary.txt"
    )
    mlflow_model_summary_af = ln.Artifact(
        local_model_summary_path,
        key=f"testmodels/mlflow/{local_model_summary_path}",
        kind="model",
    ).save()

    # save checkpoint as a model
    mlflow_model_ckpt_af = ln.Artifact(
        f"model_checkpoints/{run_id}_last_epoch.ckpt",
        key="testmodels/mlflow/litautoencoder.ckpt",
        kind="model",
    ).save()
INFO:pytorch_lightning.utilities.rank_zero:GPU available: False, used: False
INFO:pytorch_lightning.utilities.rank_zero:TPU available: False, using: 0 TPU cores
INFO:pytorch_lightning.utilities.rank_zero:HPU available: False, using: 0 HPUs
/opt/hostedtoolcache/Python/3.13.7/x64/lib/python3.13/site-packages/lightning/pytorch/trainer/connectors/logger_connector/logger_connector.py:76: Starting from v1.9.0, `tensorboardX` has been removed as a dependency of the `lightning.pytorch` package, due to potential conflicts with other packages in the ML ecosystem. For this reason, `logger=True` will use `CSVLogger` as the default logger, unless the `tensorboard` or `tensorboardX` packages are found. Please `pip install lightning[extra]` or one of them to enable TensorBoard support by default
2025/09/14 14:08:40 WARNING mlflow.utils.autologging_utils: MLflow autologging encountered a warning: "/opt/hostedtoolcache/Python/3.13.7/x64/lib/python3.13/site-packages/mlflow/pytorch/_lightning_autolog.py:467: UserWarning: Autologging is known to be compatible with pytorch-lightning versions between 2.0.7 and 2.5.2 and may not succeed with packages outside this range."
/opt/hostedtoolcache/Python/3.13.7/x64/lib/python3.13/site-packages/lightning/pytorch/callbacks/model_checkpoint.py:751: Checkpoint directory /home/runner/work/lamin-mlops/lamin-mlops/docs/model_checkpoints exists and is not empty.

  | Name    | Type       | Params | Mode 
-----------------------------------------------
0 | encoder | Sequential | 25.6 K | train
1 | decoder | Sequential | 26.4 K | train
-----------------------------------------------
52.1 K    Trainable params
0         Non-trainable params
52.1 K    Total params
0.208     Total estimated model params size (MB)
8         Modules in train mode
0         Modules in eval mode
/opt/hostedtoolcache/Python/3.13.7/x64/lib/python3.13/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:433: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=3` in the `DataLoader` to improve performance.
/opt/hostedtoolcache/Python/3.13.7/x64/lib/python3.13/site-packages/lightning/pytorch/loops/fit_loop.py:310: The number of training batches (3) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
Epoch 0: 100%|██████████| 3/3 [00:00<00:00, 89.58it/s, v_num=0]
Epoch 1: 100%|██████████| 3/3 [00:00<00:00, 120.01it/s, v_num=0]
2025/09/14 14:08:40 WARNING mlflow.utils.checkpoint_utils: Checkpoint logging is skipped, because checkpoint 'save_best_only' config is True, it requires to compare the monitored metric value, but the provided monitored metric value is not available.
INFO:pytorch_lightning.utilities.rank_zero:`Trainer.fit` stopped: `max_epochs=2` reached.

2025/09/14 14:08:46 WARNING mlflow.models.model: Model logged without a signature and input example. Please set `input_example` parameter when logging the model to auto infer the model signature.
! calling anonymously, will miss private instances
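
To make the checkpoint queryable by the hyperparameters from the note above, you can annotate the saved artifact with the tracked values. A minimal sketch, assuming the features registered earlier and lamindb's `features.add_values` accessor:

# attach the run's hyperparameters to the checkpoint artifact
mlflow_model_ckpt_af.features.add_values(MODEL_CONFIG)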

See the training progress in the MLflow UI:

MLflow training UI

See the checkpoints:

MLflow checkpoints UI
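
If you are running this notebook locally, you can launch the MLflow UI yourself to browse the run and its logged artifacts. This assumes MLflow logged to its default local ./mlruns tracking directory:

mlflow ui --port 5000   # then open http://localhost:5000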

If you later want to reuse the checkpoint, you can get it via:

ln.Artifact.get(key="testmodels/mlflow/litautoencoder.ckpt").cache()
PosixUPath('/home/runner/work/lamin-mlops/lamin-mlops/docs/lamin-mlops/.lamindb/2l3qoKmi3TdoKMNn0000.ckpt')

Or on the CLI:

lamin get artifact --key 'testmodels/mlflow/litautoencoder.ckpt'
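
Either way, the cached checkpoint file can be loaded back into the Lightning module for inference or further training; a sketch using Lightning's standard `load_from_checkpoint` (the hyperparameters were stored in the checkpoint via `self.save_hyperparameters()`):

ckpt_path = ln.Artifact.get(key="testmodels/mlflow/litautoencoder.ckpt").cache()
model = LitAutoEncoder.load_from_checkpoint(ckpt_path.as_posix())
model.eval()

Finally, mark the run as finished: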
ln.finish()
! cells [(10, 12)] were not run consecutively
 finished Run('vCefiQAg') after 10s at 2025-09-14 14:08:47 UTC
!rm -rf ./lamin-mlops
!lamin delete --force lamin-mlops
! calling anonymously, will miss private instances
 deleting instance anonymous/lamin-mlops