
MLflow

LaminDB integrates with MLflow to track model checkpoints as artifacts linked to their training runs.

# pip install lamindb torchvision lightning mlflow
!lamin init --storage ./lamin-mlops
 connected lamindb: anonymous/lamin-mlops
import lightning as pl
import lamindb as ln
from lamindb.integrations import lightning as ll
import mlflow

from torch import utils
from torchvision.datasets import MNIST
from torchvision.transforms import ToTensor
from autoencoder import LitAutoEncoder

import logging

# Suppress unrelated logger messages
logging.getLogger("alembic").setLevel(logging.WARNING)
 connected lamindb: anonymous/lamin-mlops
Tracking models in both LaminDB and MLflow

It is not always necessary to track all model parameters and metrics in both LaminDB and MLflow. However, if artifacts or runs should be queryable by specific model attributes, such as the learning rate, then those attributes need to be tracked as features. Below, we show how to do that for the batch size and learning rate; the approach generalizes to any number of features.

# define model run parameters, features, and labels so that validation passes later on
MODEL_CONFIG = {"batch_size": 32, "lr": 0.001}
hyperparameter = ln.Feature(name="Autoencoder hyperparameter", is_type=True).save()
hyperparams = ln.Feature.from_dict(MODEL_CONFIG, type=hyperparameter).save()

metrics_to_annotate = ["train_loss", "val_loss", "current_epoch"]
for metric in metrics_to_annotate:
    dtype = int if metric == "current_epoch" else float
    ln.Feature(name=metric, dtype=dtype).save()

# create all MLflow related features like 'mlflow_run_id'
ln.examples.mlflow.save_mlflow_features()

# create all lightning integration features like 'score'
ll.save_lightning_features()
 returning feature with same name: 'Autoencoder hyperparameter'
! rather than passing a string 'int' to dtype, consider passing a Python object
 returning feature with same name: 'batch_size'
! rather than passing a string 'float' to dtype, consider passing a Python object
 returning feature with same name: 'train_loss'
 returning feature with same name: 'val_loss'
 returning feature with same name: 'current_epoch'
! you are trying to create a record with name='mlflow_experiment_name' but a record with similar name exists: 'mlflow_experiment_id'. Did you mean to load it?
 returning feature with same name: 'is_best_model'
 returning feature with same name: 'score'
 returning feature with same name: 'model_rank'
 returning feature with same name: 'logger_name'
 returning feature with same name: 'logger_version'
 returning feature with same name: 'max_epochs'
 returning feature with same name: 'max_steps'
 returning feature with same name: 'precision'
 returning feature with same name: 'accumulate_grad_batches'
 returning feature with same name: 'gradient_clip_val'
 returning feature with same name: 'monitor'
 returning feature with same name: 'save_weights_only'
 returning feature with same name: 'mode'
# track this notebook/script run so that all checkpoint artifacts are associated with the source code
ln.track(params=MODEL_CONFIG, project=ln.Project(name="MLflow tutorial").save())
 created Transform('PKBTvfzJKHcx0000', key='mlflow.ipynb'), started new Run('SboJlDLZGMAyJkHS') at 2026-01-29 22:08:45 UTC
→ params: batch_size=32, lr=0.001
 notebook imports: autoencoder lamindb==2.0.1 lightning==2.6.0 mlflow-skinny==3.9.0 mlflow-tracing==3.9.0 mlflow==3.9.0 torch==2.10.0 torchvision==0.25.0
 recommendation: to identify the notebook across renames, pass the uid: ln.track("PKBTvfzJKHcx", project="MLflow tutorial", params={...})

Define a model

We use a basic PyTorch Lightning autoencoder as an example model.

Code of LitAutoEncoder (a simple autoencoder model):
import torch
import lightning

from torch import optim, nn


class LitAutoEncoder(lightning.LightningModule):
    def __init__(self, hidden_size: int, bottleneck_size: int) -> None:
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(28 * 28, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, bottleneck_size),
        )
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, 28 * 28),
        )
        self.save_hyperparameters()

    def training_step(
        self, batch: tuple[torch.Tensor, torch.Tensor], batch_idx: int
    ) -> torch.Tensor:
        x, y = batch
        x = x.view(x.size(0), -1)
        z = self.encoder(x)
        x_hat = self.decoder(z)
        loss = nn.functional.mse_loss(x_hat, x)
        self.log("train_loss", loss, on_epoch=True)
        return loss

    def validation_step(
        self, batch: tuple[torch.Tensor, torch.Tensor], batch_idx: int
    ) -> torch.Tensor:
        x, y = batch
        x = x.view(x.size(0), -1)
        z = self.encoder(x)
        x_hat = self.decoder(z)
        loss = nn.functional.mse_loss(x_hat, x)
        self.log("val_loss", loss, on_epoch=True)
        return loss

    def configure_optimizers(self) -> optim.Optimizer:
        optimizer = optim.Adam(self.parameters(), lr=1e-3)
        return optimizer
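
As a quick sanity check, the model can be run on a random batch to confirm that encoder and decoder shapes line up. This sketch is not part of the tutorial's pipeline and only assumes torch and the LitAutoEncoder class above:

import torch

from autoencoder import LitAutoEncoder

# instantiate with the same sizes used in the training cell below
model = LitAutoEncoder(hidden_size=32, bottleneck_size=16)

# a fake batch of four flattened 28x28 MNIST images
x = torch.randn(4, 28 * 28)
z = model.encoder(x)  # bottleneck representation, shape (4, 16)
x_hat = model.decoder(z)  # reconstruction, shape (4, 784)

assert z.shape == (4, 16)
assert x_hat.shape == x.shape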

Query & download the MNIST dataset

We saved the MNIST dataset in a curation notebook; it now shows up in the Artifact registry:

ln.Artifact.filter(kind="dataset").to_dataframe()

Let’s get the dataset:

mnist_af = ln.Artifact.get(key="testdata/mnist")
mnist_af
Artifact(uid='AxuFhgq2XFThNW3j0000', version_tag=None, is_latest=True, key='testdata/mnist', description='Complete MNIST dataset directory containing training and test data', suffix='', kind='dataset', otype=None, size=54950048, hash='amFx_vXqnUtJr0kmxxWK2Q', n_files=4, n_observations=None, branch_id=1, space_id=1, storage_id=3, run_id=1, schema_id=None, created_by_id=3, created_at=2026-01-29 22:07:54 UTC, is_locked=False)

And download it to a local cache:

path = mnist_af.cache()
path
PosixUPath('/home/runner/work/lamin-mlops/lamin-mlops/docs/lamin-mlops/.lamindb/AxuFhgq2XFThNW3j')

Create a PyTorch-compatible dataset:

mnist_dataset = MNIST(path.as_posix(), transform=ToTensor())
mnist_dataset
Dataset MNIST
    Number of datapoints: 60000
    Root location: /home/runner/work/lamin-mlops/lamin-mlops/docs/lamin-mlops/.lamindb/AxuFhgq2XFThNW3j
    Split: Train
    StandardTransform
Transform: ToTensor()

Monitor training with MLflow

Train our example model and track the training progress with MLflow.

# enable MLflow PyTorch autologging
mlflow.pytorch.autolog()
with mlflow.start_run() as mlflow_run:
    train_dataset = MNIST(
        root="./data", train=True, download=True, transform=ToTensor()
    )
    val_dataset = MNIST(root="./data", train=False, download=True, transform=ToTensor())
    train_loader = utils.data.DataLoader(train_dataset, batch_size=32)
    val_loader = utils.data.DataLoader(val_dataset, batch_size=32)

    # create model
    autoencoder = LitAutoEncoder(hidden_size=32, bottleneck_size=16)

    # Create a LaminDB Lightning-integration Checkpoint callback, which also (optionally) annotates checkpoints with the desired metrics
    lamindb_callback = ll.Checkpoint(
        dirpath=f"testmodels/mlflow/{mlflow_run.info.run_id}",
        features={
            "run": {
                "mlflow_run_id": mlflow_run.info.run_id,
                "mlflow_run_name": mlflow_run.info.run_name,
            },
            "artifact": {
                **{metric: None for metric in metrics_to_annotate}
            },  # auto-populated through callback
        },
    )

    # Train model
    trainer = pl.Trainer(
        limit_train_batches=3,
        max_epochs=5,
        callbacks=[lamindb_callback],
    )
    trainer.fit(
        model=autoencoder, train_dataloaders=train_loader, val_dataloaders=val_loader
    )

    # Register MLflow's model_summary.txt file as a LaminDB model artifact
    local_model_summary_path = (
        f"{mlflow_run.info.artifact_uri.removeprefix('file://')}/model_summary.txt"
    )
    mlflow_model_summary_af = ln.Artifact(
        local_model_summary_path,
        key=local_model_summary_path,
        kind="model",
    ).save()
2026/01/29 22:08:48 INFO mlflow.store.db.utils: Creating initial MLflow database tables...
2026/01/29 22:08:48 INFO mlflow.store.db.utils: Updating database tables
INFO:pytorch_lightning.utilities.rank_zero:GPU available: False, used: False
INFO:pytorch_lightning.utilities.rank_zero:TPU available: False, using: 0 TPU cores
/opt/hostedtoolcache/Python/3.13.11/x64/lib/python3.13/site-packages/lightning/pytorch/trainer/connectors/logger_connector/logger_connector.py:76: Starting from v1.9.0, `tensorboardX` has been removed as a dependency of the `lightning.pytorch` package, due to potential conflicts with other packages in the ML ecosystem. For this reason, `logger=True` will use `CSVLogger` as the default logger, unless the `tensorboard` or `tensorboardX` packages are found. Please `pip install lightning[extra]` or one of them to enable TensorBoard support by default
┏━━━┳━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━┓
┃   ┃ Name    ┃ Type       ┃ Params ┃ Mode  ┃ FLOPs ┃
┡━━━╇━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━┩
│ 0 │ encoder │ Sequential │ 25.6 K │ train │     0 │
│ 1 │ decoder │ Sequential │ 26.4 K │ train │     0 │
└───┴─────────┴────────────┴────────┴───────┴───────┘
Trainable params: 52.1 K                                                                                           
Non-trainable params: 0                                                                                            
Total params: 52.1 K                                                                                               
Total estimated model params size (MB): 0                                                                          
Modules in train mode: 8                                                                                           
Modules in eval mode: 0                                                                                            
Total FLOPs: 0                                                                                                     
/opt/hostedtoolcache/Python/3.13.11/x64/lib/python3.13/site-packages/lightning/pytorch/utilities/_pytree.py:21: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
/opt/hostedtoolcache/Python/3.13.11/x64/lib/python3.13/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:434: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=3` in the `DataLoader` to improve performance.
/opt/hostedtoolcache/Python/3.13.11/x64/lib/python3.13/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:434: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=3` in the `DataLoader` to improve performance.
/opt/hostedtoolcache/Python/3.13.11/x64/lib/python3.13/site-packages/lightning/pytorch/loops/fit_loop.py:317: The 
number of training batches (3) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower 
value for log_every_n_steps if you want to see logs for the training epoch.
! calling anonymously, will miss private instances
INFO:pytorch_lightning.utilities.rank_zero:`Trainer.fit` stopped: `max_epochs=5` reached.

2026/01/29 22:09:04 WARNING mlflow.pytorch: Saving pytorch model by Pickle or CloudPickle format requires exercising caution because these formats rely on Python's object serialization mechanism, which can execute arbitrary code during deserialization.The recommended safe alternative is to set 'export_model' to True to save the pytorch model using the safe graph model format.
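
Because the callback writes checkpoints under a directory keyed by the MLflow run ID, the LaminDB checkpoints for a given MLflow run can be looked up by key. A minimal sketch, assuming the mlflow_run object from the training cell is still in scope:

# find checkpoint artifacts whose key contains this MLflow run's ID
ln.Artifact.filter(
    key__contains=mlflow_run.info.run_id, suffix=".ckpt"
).to_dataframe()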

MLflow and LaminDB user interfaces together

MLflow and LaminDB runs:

Both MLflow and LaminDB capture runs together with their run parameters.

[Screenshot: MLflow experiment overview]

[Screenshot: LaminHub run overview]
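
To compare what the two systems recorded programmatically, a minimal sketch (again assuming the mlflow_run object from the training cell is in scope):

# parameters captured by MLflow autologging
print(mlflow.get_run(mlflow_run.info.run_id).data.params)

# features annotated on the current LaminDB run
print(ln.context.run.features)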

MLflow run details and LaminDB artifact details:

MLflow and LaminDB complement each other. Whereas MLflow excels at capturing metrics over time, LaminDB excels at capturing the lineage of input & output data and training checkpoints.

[Screenshot: MLflow run view]

[Screenshot: LaminHub lineage view]

Both frameworks display the output artifacts that were generated during the run. LaminDB additionally captures input artifacts, their origin, and the associated source code.

[Screenshot: MLflow artifact view]

[Screenshot: LaminHub artifact view]
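
The same lineage information is accessible programmatically. A sketch under the assumption that the Run record exposes its linked artifacts via input_artifacts and output_artifacts accessors (check your lamindb version if these names differ):

run = ln.context.run

# artifacts consumed and produced by this run (assumed accessor names)
print(run.input_artifacts.all().to_dataframe())
print(run.output_artifacts.all().to_dataframe())

# the source code (transform) that produced them
print(run.transform)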

All checkpoints are automatically annotated with the specified training metrics, and the run is annotated with the MLflow run ID & name, keeping both frameworks in sync:

last_checkpoint_af = (
    ln.Artifact.filter(is_best_model=True)
    .filter(suffix__endswith="ckpt", is_latest=True)
    .last()
)
last_checkpoint_af.describe()
Artifact: /home/runner/work/lamin-mlops/lamin-mlops/docs/testmodels/mlflow/ef0820a9b29f4a82ad9715e9bde22c4b/epoch=4-step=15.ckpt (0000)
│   description: Lightning model checkpoint
├── uid: tS0uInCTV7O50Bxz0000            run: SboJlDL (mlflow.ipynb)
│   kind: model                          otype: None
│   hash: FYHuPyn8-WUhHtpyyzLESg         size: 621.8 KB
│   branch: main                         space: all
│   created_at: 2026-01-29 22:09:04 UTC  created_by: anonymous
├── storage/path: /home/runner/work/lamin-mlops/lamin-mlops/docs/lamin-mlops/.lamindb/tS0uInCTV7O50Bxz0000.ckpt
├── Features
│   └── current_epoch                  int                                  4
│       is_best_model                  bool                                 True
│       train_loss                     float                                0.10700435191392899
│       val_loss                       float                                0.11011336743831635
└── Labels
    └── .projects                      Project                              MLflow tutorial

To reuse the checkpoint later:

last_checkpoint_af.cache()
PosixUPath('/home/runner/work/lamin-mlops/lamin-mlops/docs/lamin-mlops/.lamindb/tS0uInCTV7O50Bxz0000.ckpt')
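
To restore the model weights from the cached checkpoint, Lightning's standard loading API applies; a minimal sketch (not part of the original tutorial):

from autoencoder import LitAutoEncoder

ckpt_path = last_checkpoint_af.cache()  # reuses the local copy if present

# hyperparameters were stored via save_hyperparameters(), so no extra arguments are needed
model = LitAutoEncoder.load_from_checkpoint(ckpt_path.as_posix())
model.eval()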
last_checkpoint_af.view_lineage()
! The following artifacts are both inputs and outputs of Run(uid=SboJlDLZGMAyJkHS): {Artifact(uid='RQKDR5gyX9FaiPyH0000', version_tag=None, is_latest=True, key='/home/runner/work/lamin-mlops/lamin-mlops/docs/testmodels/mlflow/ef0820a9b29f4a82ad9715e9bde22c4b/hparams.yaml', description='Lightning run hyperparameters', suffix='.yaml', kind=None, otype=None, size=36, hash='ywrpBniuL7K7y0Aq0FT1Wg', n_files=None, n_observations=None, branch_id=1, space_id=1, storage_id=3, run_id=3, schema_id=None, created_by_id=3, created_at=2026-01-29 22:08:52 UTC, is_locked=False)}
   → Only showing as outputs.
[Lineage graph of the checkpoint artifact]

Features associated with the training run as a whole are annotated at the run level:

ln.context.run.features
Run: SboJlDL (mlflow.ipynb)
└── Features
    └── accumulate_grad_batches        int                                  1                                      
        bottleneck_size                int                                  16                                     
        hidden_size                    int                                  32                                     
        logger_name                    str                                  lightning_logs                         
        logger_version                 str                                  version_0                              
        max_epochs                     int                                  5                                      
        max_steps                      int                                  -1                                     
        mlflow_run_id                  str                                  ef0820a9b29f4a82ad9715e9bde22c4b       
        mlflow_run_name                str                                  welcoming-duck-941                     
        mode                           str                                  min                                    
        precision                      str                                  32-true                                
        save_weights_only              bool                                 False                                  

ln.finish()
 finished Run('SboJlDLZGMAyJkHS') after 29s at 2026-01-29 22:09:15 UTC