MLflow¶
LaminDB can be integrated with MLflow to track model checkpoints as artifacts linked to training runs.
# pip install lamindb torchvision lightning mlflow
!lamin init --storage ./lamin-mlops
Show code cell output
→ connected lamindb: anonymous/lamin-mlops
import lightning as pl
import lamindb as ln
from lamindb.integrations import lightning as ll
import mlflow
from torch import utils
from torchvision.datasets import MNIST
from torchvision.transforms import ToTensor
from autoencoder import LitAutoEncoder
import logging
# Suppress unrelated logger messages
logging.getLogger("alembic").setLevel(logging.WARNING)
Show code cell output
→ connected lamindb: anonymous/lamin-mlops
Tracking models in both LaminDB and MLflow
It is not always necessary to track every model parameter and metric in both LaminDB and MLflow. However, if specific artifacts or runs should be queryable by model attributes such as the learning rate, these attributes should also be tracked in LaminDB. Below, we show how to do this for the batch size and learning rate (a short query sketch follows the next cell); the approach generalizes to any number of features.
# define model run parameters, features, and labels so that validation passes later on
MODEL_CONFIG = {"batch_size": 32, "lr": 0.001}

hyperparameter = ln.Feature(name="Autoencoder hyperparameter", is_type=True).save()
hyperparams = ln.Feature.from_dict(MODEL_CONFIG, type=hyperparameter).save()

metrics_to_annotate = ["train_loss", "val_loss", "current_epoch"]
for metric in metrics_to_annotate:
    dtype = int if metric == "current_epoch" else float
    ln.Feature(name=metric, dtype=dtype).save()

# create all MLflow related features like 'mlflow_run_id'
ln.examples.mlflow.save_mlflow_features()
# create all lightning integration features like 'score'
ll.save_lightning_features()
Show code cell output
→ returning feature with same name: 'Autoencoder hyperparameter'
! rather than passing a string 'int' to dtype, consider passing a Python object
→ returning feature with same name: 'batch_size'
! rather than passing a string 'float' to dtype, consider passing a Python object
→ returning feature with same name: 'train_loss'
→ returning feature with same name: 'val_loss'
→ returning feature with same name: 'current_epoch'
! you are trying to create a record with name='mlflow_experiment_name' but a record with similar name exists: 'mlflow_experiment_id'. Did you mean to load it?
→ returning feature with same name: 'is_best_model'
→ returning feature with same name: 'score'
→ returning feature with same name: 'model_rank'
→ returning feature with same name: 'logger_name'
→ returning feature with same name: 'logger_version'
→ returning feature with same name: 'max_epochs'
→ returning feature with same name: 'max_steps'
→ returning feature with same name: 'precision'
→ returning feature with same name: 'accumulate_grad_batches'
→ returning feature with same name: 'gradient_clip_val'
→ returning feature with same name: 'monitor'
→ returning feature with same name: 'save_weights_only'
→ returning feature with same name: 'mode'
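Registering these features pays off once runs and checkpoints are annotated further below: both can then be retrieved by feature values. A minimal query sketch, assuming this lamindb version's feature-based filter syntax (the queries only return results after training has run):

# hedged sketch: query runs by a registered hyperparameter value
runs_with_small_lr = ln.Run.filter(lr=0.001).to_dataframe()
# checkpoint artifacts can be filtered by their annotated features in the same way,
# e.g. via the `is_best_model` flag used at the end of this notebook
best_checkpoints = ln.Artifact.filter(is_best_model=True).to_dataframe()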
# track this notebook/script run so that all checkpoint artifacts are associated with the source code
ln.track(params=MODEL_CONFIG, project=ln.Project(name="MLflow tutorial").save())
Show code cell output
→ created Transform('PKBTvfzJKHcx0000', key='mlflow.ipynb'), started new Run('SboJlDLZGMAyJkHS') at 2026-01-29 22:08:45 UTC
→ params: batch_size=32, lr=0.001
→ notebook imports: autoencoder lamindb==2.0.1 lightning==2.6.0 mlflow-skinny==3.9.0 mlflow-tracing==3.9.0 mlflow==3.9.0 torch==2.10.0 torchvision==0.25.0
• recommendation: to identify the notebook across renames, pass the uid: ln.track("PKBTvfzJKHcx", project="MLflow tutorial", params={...})
Define a model¶
We use a basic PyTorch Lightning autoencoder as an example model.
Code of LitAutoEncoder
import torch
import lightning
from torch import optim, nn


class LitAutoEncoder(lightning.LightningModule):
    def __init__(self, hidden_size: int, bottleneck_size: int) -> None:
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(28 * 28, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, bottleneck_size),
        )
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, 28 * 28),
        )
        self.save_hyperparameters()

    def training_step(
        self, batch: tuple[torch.Tensor, torch.Tensor], batch_idx: int
    ) -> torch.Tensor:
        x, y = batch
        x = x.view(x.size(0), -1)
        z = self.encoder(x)
        x_hat = self.decoder(z)
        loss = nn.functional.mse_loss(x_hat, x)
        self.log("train_loss", loss, on_epoch=True)
        return loss

    def validation_step(
        self, batch: tuple[torch.Tensor, torch.Tensor], batch_idx: int
    ) -> torch.Tensor:
        x, y = batch
        x = x.view(x.size(0), -1)
        z = self.encoder(x)
        x_hat = self.decoder(z)
        loss = nn.functional.mse_loss(x_hat, x)
        self.log("val_loss", loss, on_epoch=True)
        return loss

    def configure_optimizers(self) -> optim.Optimizer:
        optimizer = optim.Adam(self.parameters(), lr=1e-3)
        return optimizer
Query & download the MNIST dataset¶
We saved the MNIST dataset in a curation notebook; it now shows up in the Artifact registry:
ln.Artifact.filter(kind="dataset").to_dataframe()
Let’s get the dataset:
mnist_af = ln.Artifact.get(key="testdata/mnist")
mnist_af
Show code cell output
Artifact(uid='AxuFhgq2XFThNW3j0000', version_tag=None, is_latest=True, key='testdata/mnist', description='Complete MNIST dataset directory containing training and test data', suffix='', kind='dataset', otype=None, size=54950048, hash='amFx_vXqnUtJr0kmxxWK2Q', n_files=4, n_observations=None, branch_id=1, space_id=1, storage_id=3, run_id=1, schema_id=None, created_by_id=3, created_at=2026-01-29 22:07:54 UTC, is_locked=False)
And download it to a local cache:
path = mnist_af.cache()
path
Show code cell output
PosixUPath('/home/runner/work/lamin-mlops/lamin-mlops/docs/lamin-mlops/.lamindb/AxuFhgq2XFThNW3j')
Create a PyTorch-compatible dataset:
mnist_dataset = MNIST(path.as_posix(), transform=ToTensor())
mnist_dataset
Show code cell output
Dataset MNIST
Number of datapoints: 60000
Root location: /home/runner/work/lamin-mlops/lamin-mlops/docs/lamin-mlops/.lamindb/AxuFhgq2XFThNW3j
Split: Train
StandardTransform
Transform: ToTensor()
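As a quick, purely illustrative sanity check, each item of the dataset is an (image, label) pair, with the image already converted to a 1×28×28 float tensor by ToTensor():

# each item is a (tensor, label) pair; pixel values are scaled to [0, 1]
img, label = mnist_dataset[0]
img.shape, label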
Monitor training with MLflow¶
Train our example model and track the training progress with MLflow.
# enable MLflow PyTorch autologging
mlflow.pytorch.autolog()

with mlflow.start_run() as mlflow_run:
    train_dataset = MNIST(
        root="./data", train=True, download=True, transform=ToTensor()
    )
    val_dataset = MNIST(root="./data", train=False, download=True, transform=ToTensor())
    train_loader = utils.data.DataLoader(train_dataset, batch_size=32)
    val_loader = utils.data.DataLoader(val_dataset, batch_size=32)

    # create model
    autoencoder = LitAutoEncoder(hidden_size=32, bottleneck_size=16)

    # create the LaminDB Lightning Checkpoint callback, which (optionally) annotates
    # every saved checkpoint with the desired metrics
    lamindb_callback = ll.Checkpoint(
        dirpath=f"testmodels/mlflow/{mlflow_run.info.run_id}",
        features={
            "run": {
                "mlflow_run_id": mlflow_run.info.run_id,
                "mlflow_run_name": mlflow_run.info.run_name,
            },
            "artifact": {
                **{metric: None for metric in metrics_to_annotate}
            },  # auto-populated through the callback
        },
    )

    # train the model
    trainer = pl.Trainer(
        limit_train_batches=3,
        max_epochs=5,
        callbacks=[lamindb_callback],
    )
    trainer.fit(
        model=autoencoder, train_dataloaders=train_loader, val_dataloaders=val_loader
    )

    # register model_summary.txt
    local_model_summary_path = (
        f"{mlflow_run.info.artifact_uri.removeprefix('file://')}/model_summary.txt"
    )
    mlflow_model_summary_af = ln.Artifact(
        local_model_summary_path,
        key=local_model_summary_path,
        kind="model",
    ).save()
Show code cell output
2026/01/29 22:08:48 INFO mlflow.store.db.utils: Creating initial MLflow database tables...
2026/01/29 22:08:48 INFO mlflow.store.db.utils: Updating database tables
INFO:pytorch_lightning.utilities.rank_zero:GPU available: False, used: False
INFO:pytorch_lightning.utilities.rank_zero:TPU available: False, using: 0 TPU cores
/opt/hostedtoolcache/Python/3.13.11/x64/lib/python3.13/site-packages/lightning/pytorch/trainer/connectors/logger_connector/logger_connector.py:76: Starting from v1.9.0, `tensorboardX` has been removed as a dependency of the `lightning.pytorch` package, due to potential conflicts with other packages in the ML ecosystem. For this reason, `logger=True` will use `CSVLogger` as the default logger, unless the `tensorboard` or `tensorboardX` packages are found. Please `pip install lightning[extra]` or one of them to enable TensorBoard support by default
┏━━━┳━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━┓
┃   ┃ Name    ┃ Type       ┃ Params ┃ Mode  ┃ FLOPs ┃
┡━━━╇━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━┩
│ 0 │ encoder │ Sequential │ 25.6 K │ train │ 0     │
│ 1 │ decoder │ Sequential │ 26.4 K │ train │ 0     │
└───┴─────────┴────────────┴────────┴───────┴───────┘
Trainable params: 52.1 K
Non-trainable params: 0
Total params: 52.1 K
Total estimated model params size (MB): 0
Modules in train mode: 8
Modules in eval mode: 0
Total FLOPs: 0
/opt/hostedtoolcache/Python/3.13.11/x64/lib/python3.13/site-packages/lightning/pytorch/utilities/_pytree.py:21: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
/opt/hostedtoolcache/Python/3.13.11/x64/lib/python3.13/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:434: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=3` in the `DataLoader` to improve performance.
/opt/hostedtoolcache/Python/3.13.11/x64/lib/python3.13/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:434: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=3` in the `DataLoader` to improve performance.
/opt/hostedtoolcache/Python/3.13.11/x64/lib/python3.13/site-packages/lightning/pytorch/loops/fit_loop.py:317: The number of training batches (3) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
! calling anonymously, will miss private instances
INFO:pytorch_lightning.utilities.rank_zero:`Trainer.fit` stopped: `max_epochs=5` reached.
2026/01/29 22:09:04 WARNING mlflow.pytorch: Saving pytorch model by Pickle or CloudPickle format requires exercising caution because these formats rely on Python's object serialization mechanism, which can execute arbitrary code during deserialization.The recommended safe alternative is to set 'export_model' to True to save the pytorch model using the safe graph model format.
MLflow and LaminDB user interfaces together¶
MLflow and LaminDB runs:
Both MLflow and LaminDB capture runs together with their parameters.
| MLflow experiment overview | LaminHub run overview |
|---|---|
MLflow run details and LaminDB artifact details:
MLflow and LaminDB complement each other: whereas MLflow excels at capturing metrics over time, LaminDB excels at capturing the lineage of input & output data and training checkpoints.
| MLflow run view | LaminHub lineage view |
|---|---|
Both frameworks display output artifacts that were generated during the run. LaminDB further captures input artifacts, their origin and the associated source code.
| MLflow artifact view | LaminHub artifact view |
|---|---|
All checkpoints are automatically annotated with the specified training metrics and the MLflow run ID & name, keeping both frameworks in sync:
last_checkpoint_af = (
    ln.Artifact.filter(is_best_model=True)
    .filter(suffix__endswith="ckpt", is_latest=True)
    .last()
)
last_checkpoint_af.describe()
Show code cell output
Artifact: /home/runner/work/lamin-mlops/lamin-mlops/docs/testmodels/mlflow/ef0820a9b29f4a82ad9715e9bde22c4b/epoch=4-step=15.ckpt (0000) | description: Lightning model checkpoint
├── uid: tS0uInCTV7O50Bxz0000                 run: SboJlDL (mlflow.ipynb)
│   kind: model                               otype: None
│   hash: FYHuPyn8-WUhHtpyyzLESg              size: 621.8 KB
│   branch: main                              space: all
│   created_at: 2026-01-29 22:09:04 UTC       created_by: anonymous
├── storage/path: /home/runner/work/lamin-mlops/lamin-mlops/docs/lamin-mlops/.lamindb/tS0uInCTV7O50Bxz0000.ckpt
├── Features
│   └── current_epoch    int      4
│       is_best_model    bool     True
│       train_loss       float    0.10700435191392899
│       val_loss         float    0.11011336743831635
└── Labels
    └── .projects    Project    MLflow tutorial
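Because the metrics are stored as queryable features, checkpoints can also be filtered by their values; a minimal sketch, assuming numeric feature lookups such as __lt are supported by your lamindb version:

# hedged sketch: all latest checkpoints whose validation loss dropped below a threshold
good_checkpoints = ln.Artifact.filter(
    suffix__endswith="ckpt", is_latest=True, val_loss__lt=0.2
).to_dataframe()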
To reuse the checkpoint later:
last_checkpoint_af.cache()
Show code cell output
PosixUPath('/home/runner/work/lamin-mlops/lamin-mlops/docs/lamin-mlops/.lamindb/tS0uInCTV7O50Bxz0000.ckpt')
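The cached path can be passed directly to Lightning to restore the model; a minimal sketch, assuming the LitAutoEncoder class from above is importable:

# restore the trained weights; the constructor arguments (hidden_size, bottleneck_size)
# were stored in the checkpoint via self.save_hyperparameters()
ckpt_path = last_checkpoint_af.cache()
restored_autoencoder = LitAutoEncoder.load_from_checkpoint(ckpt_path.as_posix())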
last_checkpoint_af.view_lineage()
Show code cell output
! The following artifacts are both inputs and outputs of Run(uid=SboJlDLZGMAyJkHS): {Artifact(uid='RQKDR5gyX9FaiPyH0000', version_tag=None, is_latest=True, key='/home/runner/work/lamin-mlops/lamin-mlops/docs/testmodels/mlflow/ef0820a9b29f4a82ad9715e9bde22c4b/hparams.yaml', description='Lightning run hyperparameters', suffix='.yaml', kind=None, otype=None, size=36, hash='ywrpBniuL7K7y0Aq0FT1Wg', n_files=None, n_observations=None, branch_id=1, space_id=1, storage_id=3, run_id=3, schema_id=None, created_by_id=3, created_at=2026-01-29 22:08:52 UTC, is_locked=False)}
→ Only showing as outputs.
Features associated with a whole training run are annotated on a run level:
ln.context.run.features
Show code cell output
Run: SboJlDL (mlflow.ipynb)
└── Features
    └── accumulate_grad_batches    int     1
        bottleneck_size            int     16
        hidden_size                int     32
        logger_name                str     lightning_logs
        logger_version             str     version_0
        max_epochs                 int     5
        max_steps                  int     -1
        mlflow_run_id              str     ef0820a9b29f4a82ad9715e9bde22c4b
        mlflow_run_name            str     welcoming-duck-941
        mode                       str     min
        precision                  str     32-true
        save_weights_only          bool    False
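These run-level annotations can also be read back programmatically, for example to compare runs; a small sketch assuming the feature manager exposes a get_values() accessor:

# hedged sketch: run-level feature values as a plain dict,
# including the MLflow run ID for cross-referencing with the MLflow UI
run_values = ln.context.run.features.get_values()
run_values["mlflow_run_id"]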
ln.finish()
Show code cell output
→ finished Run('SboJlDLZGMAyJkHS') after 29s at 2026-01-29 22:09:15 UTC