MLflow¶
LaminDB can be integrated with MLflow to track model checkpoints as artifacts linked to their training runs.
# pip install lamindb torchvision lightning mlflow
!lamin init --storage ./lamin-mlops
Show code cell output
→ connected lamindb: anonymous/lamin-mlops
import lightning as pl
import lamindb as ln
from lamindb.integrations import lightning as ll
import mlflow
from pathlib import Path
from torch import utils
from torchvision.datasets import MNIST
from torchvision.transforms import ToTensor
from autoencoder import LitAutoEncoder
Show code cell output
→ connected lamindb: anonymous/lamin-mlops
Tracking models in both LaminDB and MLflow
It is not always necessary to track all model parameters and metrics in both LaminDB and MLflow. However, if specific artifacts or runs should be queryable by model attributes such as the learning rate, those attributes should be tracked as features. Below, we show how to do that for the batch size and learning rate; the approach generalizes to any number of features.
# define model run parameters, features, and labels so that validation passes later on
MODEL_CONFIG = {"batch_size": 32, "lr": 0.001}
hyperparameter = ln.Feature(name="Autoencoder hyperparameter", is_type=True).save()
hyperparams = ln.Feature.from_dict(MODEL_CONFIG, type=hyperparameter)
ln.save(hyperparams)
metrics_to_annotate = ["train_loss", "val_loss", "current_epoch"]
for metric in metrics_to_annotate:
    dtype = int if metric == "current_epoch" else float
    ln.Feature(name=metric, dtype=dtype).save()
# create all MLflow related features like 'mlflow_run_id'
ln.examples.mlflow.save_mlflow_features()
Show code cell output
→ returning feature with same name: 'Autoencoder hyperparameter'
! rather than passing a string 'int' to dtype, pass a Python object
→ returning feature with same name: 'batch_size'
! rather than passing a string 'float' to dtype, pass a Python object
→ returning feature with same name: 'train_loss'
→ returning feature with same name: 'val_loss'
→ returning feature with same name: 'current_epoch'
! you are trying to create a record with name='mlflow_experiment_name' but a record with similar name exists: 'mlflow_experiment_id'. Did you mean to load it?
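If you want to double-check what got registered before training, you can query the Feature registry. A minimal sketch (the `name__in` filter is just one convenient way to do this):

# optional sanity check: list the features registered above
ln.Feature.filter(name__in=["batch_size", "lr"] + metrics_to_annotate).to_dataframe()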
# track this notebook/script run so that all checkpoint artifacts are associated with the source code
ln.track(params=MODEL_CONFIG, project=ln.Project(name="MLflow tutorial").save())
Show code cell output
→ created Transform('E09Gn7qRJ05Z0000', key='mlflow.ipynb'), started new Run('iHwyyF0gud8HJwoM') at 2025-12-14 22:40:52 UTC
→ params: batch_size=32, lr=0.001
→ notebook imports: autoencoder lamindb==1.17.0 lightning==2.6.0 mlflow-skinny==3.7.0 mlflow-tracing==3.7.0 mlflow==3.7.0 torch==2.9.1 torchvision==0.24.1
• recommendation: to identify the notebook across renames, pass the uid: ln.track("E09Gn7qRJ05Z", project="MLflow tutorial", params={...})
Define a model¶
We use a basic PyTorch Lightning autoencoder as an example model.
Code of LitAutoEncoder
import torch
import lightning
from torch import optim, nn
class LitAutoEncoder(lightning.LightningModule):
    def __init__(self, hidden_size: int, bottleneck_size: int) -> None:
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(28 * 28, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, bottleneck_size),
        )
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, 28 * 28),
        )
        self.save_hyperparameters()

    def training_step(
        self, batch: tuple[torch.Tensor, torch.Tensor], batch_idx: int
    ) -> torch.Tensor:
        x, y = batch
        x = x.view(x.size(0), -1)
        z = self.encoder(x)
        x_hat = self.decoder(z)
        loss = nn.functional.mse_loss(x_hat, x)
        self.log("train_loss", loss, on_epoch=True)
        return loss

    def validation_step(
        self, batch: tuple[torch.Tensor, torch.Tensor], batch_idx: int
    ) -> torch.Tensor:
        x, y = batch
        x = x.view(x.size(0), -1)
        z = self.encoder(x)
        x_hat = self.decoder(z)
        loss = nn.functional.mse_loss(x_hat, x)
        self.log("val_loss", loss, on_epoch=True)
        return loss

    def configure_optimizers(self) -> optim.Optimizer:
        optimizer = optim.Adam(self.parameters(), lr=1e-3)
        return optimizer
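As an optional sanity check, you can push a random batch through an untrained instance to confirm that the encoder and decoder shapes line up (this cell is purely illustrative and not part of the tutorial flow):

import torch

model = LitAutoEncoder(hidden_size=32, bottleneck_size=16)
x = torch.randn(4, 28 * 28)  # a fake batch of 4 flattened 28x28 images
reconstruction = model.decoder(model.encoder(x))
assert reconstruction.shape == (4, 28 * 28)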
Query & download the MNIST dataset¶
We saved the MNIST dataset in a curation notebook; it now shows up in the Artifact registry:
ln.Artifact.filter(kind="dataset").to_dataframe()
Show code cell output
| id | uid | key | description | suffix | kind | otype | size | hash | n_files | n_observations | version | is_latest | is_locked | created_at | branch_id | space_id | storage_id | run_id | schema_id | created_by_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | L4dnb0BsuSqbMHxE0000 | testdata/mnist | Complete MNIST dataset directory containing tr... | | dataset | None | 54950048 | amFx_vXqnUtJr0kmxxWK2Q | 4 | None | None | True | False | 2025-12-14 22:40:11.399000+00:00 | 1 | 1 | 2 | 1 | None | 2 |
Let’s get the dataset:
artifact = ln.Artifact.get(key="testdata/mnist")
artifact
Show code cell output
Artifact(uid='L4dnb0BsuSqbMHxE0000', version=None, is_latest=True, key='testdata/mnist', description='Complete MNIST dataset directory containing training and test data', suffix='', kind='dataset', otype=None, size=54950048, hash='amFx_vXqnUtJr0kmxxWK2Q', n_files=4, n_observations=None, branch_id=1, space_id=1, storage_id=2, run_id=1, schema_id=None, created_by_id=2, created_at=2025-12-14 22:40:11 UTC, is_locked=False)
And download it to a local cache:
path = artifact.cache()
path
Show code cell output
PosixUPath('/home/runner/work/lamin-mlops/lamin-mlops/docs/lamin-mlops/.lamindb/L4dnb0BsuSqbMHxE')
Create a PyTorch-compatible dataset:
dataset = MNIST(path.as_posix(), transform=ToTensor())
dataset
Show code cell output
Dataset MNIST
Number of datapoints: 60000
Root location: /home/runner/work/lamin-mlops/lamin-mlops/docs/lamin-mlops/.lamindb/L4dnb0BsuSqbMHxE
Split: Train
StandardTransform
Transform: ToTensor()
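Each element of the dataset is an (image, label) pair; the ToTensor transform yields a 1×28×28 float tensor scaled to [0, 1]. An optional quick look at the first sample:

img, label = dataset[0]
img.shape, img.min(), img.max(), label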
Monitor training with MLflow¶
Train our example model and track the training progress with MLflow.
# enable MLflow PyTorch autologging
mlflow.pytorch.autolog()

with mlflow.start_run() as mlflow_run:
    train_dataset = MNIST(
        root="./data", train=True, download=True, transform=ToTensor()
    )
    val_dataset = MNIST(root="./data", train=False, download=True, transform=ToTensor())
    train_loader = utils.data.DataLoader(train_dataset, batch_size=32)
    val_loader = utils.data.DataLoader(val_dataset, batch_size=32)

    # create model
    autoencoder = LitAutoEncoder(hidden_size=32, bottleneck_size=16)

    # create a LaminDB Lightning callback that also (optionally) annotates checkpoints with the desired metrics
    lamindb_callback = ll.Callback(
        path=Path("model_checkpoints") / f"{mlflow_run.info.run_id}_last_epoch.ckpt",
        key=f"testmodels/mlflow/{mlflow_run.info.run_id}.ckpt",
        features={
            "mlflow_run_id": mlflow_run.info.run_id,
            "mlflow_run_name": mlflow_run.info.run_name,
            **{
                metric: None for metric in metrics_to_annotate
            },  # auto-populated through the callback
        },
    )

    # train the model
    trainer = pl.Trainer(
        limit_train_batches=3,
        max_epochs=5,
        callbacks=[lamindb_callback],
    )
    trainer.fit(
        model=autoencoder, train_dataloaders=train_loader, val_dataloaders=val_loader
    )

    # register model_summary.txt
    local_model_summary_path = (
        f"{mlflow_run.info.artifact_uri.removeprefix('file://')}/model_summary.txt"
    )
    mlflow_model_summary_af = ln.Artifact(
        local_model_summary_path,
        key=local_model_summary_path,
        kind="model",
    ).save()
Show code cell output
2025/12/14 22:40:54 INFO mlflow.store.db.utils: Creating initial MLflow database tables...
2025/12/14 22:40:54 INFO mlflow.store.db.utils: Updating database tables
2025/12/14 22:40:54 INFO alembic.runtime.migration: Context impl SQLiteImpl.
2025/12/14 22:40:54 INFO alembic.runtime.migration: Will assume non-transactional DDL.
2025/12/14 22:40:54 INFO alembic.runtime.migration: Running upgrade -> 451aebb31d03, add metric step
2025/12/14 22:40:54 INFO alembic.runtime.migration: Running upgrade 451aebb31d03 -> 90e64c465722, migrate user column to tags
2025/12/14 22:40:54 INFO alembic.runtime.migration: Running upgrade 90e64c465722 -> 181f10493468, allow nulls for metric values
2025/12/14 22:40:54 INFO alembic.runtime.migration: Running upgrade 181f10493468 -> df50e92ffc5e, Add Experiment Tags Table
2025/12/14 22:40:54 INFO alembic.runtime.migration: Running upgrade df50e92ffc5e -> 7ac759974ad8, Update run tags with larger limit
2025/12/14 22:40:54 INFO alembic.runtime.migration: Running upgrade 7ac759974ad8 -> 89d4b8295536, create latest metrics table
2025/12/14 22:40:54 INFO alembic.runtime.migration: Running upgrade 89d4b8295536 -> 2b4d017a5e9b, add model registry tables to db
2025/12/14 22:40:54 INFO alembic.runtime.migration: Running upgrade 2b4d017a5e9b -> cfd24bdc0731, Update run status constraint with killed
2025/12/14 22:40:54 INFO alembic.runtime.migration: Running upgrade cfd24bdc0731 -> 0a8213491aaa, drop_duplicate_killed_constraint
2025/12/14 22:40:54 INFO alembic.runtime.migration: Running upgrade 0a8213491aaa -> 728d730b5ebd, add registered model tags table
2025/12/14 22:40:54 INFO alembic.runtime.migration: Running upgrade 728d730b5ebd -> 27a6a02d2cf1, add model version tags table
2025/12/14 22:40:54 INFO alembic.runtime.migration: Running upgrade 27a6a02d2cf1 -> 84291f40a231, add run_link to model_version
2025/12/14 22:40:54 INFO alembic.runtime.migration: Running upgrade 84291f40a231 -> a8c4a736bde6, allow nulls for run_id
2025/12/14 22:40:54 INFO alembic.runtime.migration: Running upgrade a8c4a736bde6 -> 39d1c3be5f05, add_is_nan_constraint_for_metrics_tables_if_necessary
2025/12/14 22:40:54 INFO alembic.runtime.migration: Running upgrade 39d1c3be5f05 -> c48cb773bb87, reset_default_value_for_is_nan_in_metrics_table_for_mysql
2025/12/14 22:40:54 INFO alembic.runtime.migration: Running upgrade c48cb773bb87 -> bd07f7e963c5, create index on run_uuid
2025/12/14 22:40:54 INFO alembic.runtime.migration: Running upgrade bd07f7e963c5 -> 0c779009ac13, add deleted_time field to runs table
2025/12/14 22:40:54 INFO alembic.runtime.migration: Running upgrade 0c779009ac13 -> cc1f77228345, change param value length to 500
2025/12/14 22:40:54 INFO alembic.runtime.migration: Running upgrade cc1f77228345 -> 97727af70f4d, Add creation_time and last_update_time to experiments table
2025/12/14 22:40:54 INFO alembic.runtime.migration: Running upgrade 97727af70f4d -> 3500859a5d39, Add Model Aliases table
2025/12/14 22:40:54 INFO alembic.runtime.migration: Running upgrade 3500859a5d39 -> 7f2a7d5fae7d, add datasets inputs input_tags tables
2025/12/14 22:40:54 INFO alembic.runtime.migration: Running upgrade 7f2a7d5fae7d -> 2d6e25af4d3e, increase max param val length from 500 to 8000
2025/12/14 22:40:54 INFO alembic.runtime.migration: Running upgrade 2d6e25af4d3e -> acf3f17fdcc7, add storage location field to model versions
2025/12/14 22:40:54 INFO alembic.runtime.migration: Running upgrade acf3f17fdcc7 -> 867495a8f9d4, add trace tables
2025/12/14 22:40:54 INFO alembic.runtime.migration: Running upgrade 867495a8f9d4 -> 5b0e9adcef9c, add cascade deletion to trace tables foreign keys
2025/12/14 22:40:54 INFO alembic.runtime.migration: Running upgrade 5b0e9adcef9c -> 4465047574b1, increase max dataset schema size
2025/12/14 22:40:54 INFO alembic.runtime.migration: Running upgrade 4465047574b1 -> f5a4f2784254, increase run tag value limit to 8000
2025/12/14 22:40:54 INFO alembic.runtime.migration: Running upgrade f5a4f2784254 -> 0584bdc529eb, add cascading deletion to datasets from experiments
2025/12/14 22:40:54 INFO alembic.runtime.migration: Running upgrade 0584bdc529eb -> 400f98739977, add logged model tables
2025/12/14 22:40:54 INFO alembic.runtime.migration: Running upgrade 400f98739977 -> 6953534de441, add step to inputs table
2025/12/14 22:40:54 INFO alembic.runtime.migration: Running upgrade 6953534de441 -> bda7b8c39065, increase_model_version_tag_value_limit
2025/12/14 22:40:54 INFO alembic.runtime.migration: Running upgrade bda7b8c39065 -> cbc13b556ace, add V3 trace schema columns
2025/12/14 22:40:54 INFO alembic.runtime.migration: Running upgrade cbc13b556ace -> 770bee3ae1dd, add assessments table
2025/12/14 22:40:54 INFO alembic.runtime.migration: Running upgrade 770bee3ae1dd -> a1b2c3d4e5f6, add spans table
2025/12/14 22:40:54 INFO alembic.runtime.migration: Running upgrade a1b2c3d4e5f6 -> de4033877273, create entity_associations table
2025/12/14 22:40:54 INFO alembic.runtime.migration: Running upgrade de4033877273 -> 1a0cddfcaa16, Add webhooks and webhook_events tables
2025/12/14 22:40:54 INFO alembic.runtime.migration: Running upgrade 1a0cddfcaa16 -> 534353b11cbc, add scorer tables
2025/12/14 22:40:54 INFO alembic.runtime.migration: Running upgrade 534353b11cbc -> 71994744cf8e, add evaluation datasets
2025/12/14 22:40:54 INFO alembic.runtime.migration: Running upgrade 71994744cf8e -> 3da73c924c2f, add outputs to dataset record
2025/12/14 22:40:54 INFO alembic.runtime.migration: Running upgrade 3da73c924c2f -> bf29a5ff90ea, add jobs table
2025/12/14 22:40:55 INFO alembic.runtime.migration: Context impl SQLiteImpl.
2025/12/14 22:40:55 INFO alembic.runtime.migration: Will assume non-transactional DDL.
INFO:pytorch_lightning.utilities.rank_zero:💡 Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model registry.
INFO:pytorch_lightning.utilities.rank_zero:GPU available: False, used: False
INFO:pytorch_lightning.utilities.rank_zero:TPU available: False, using: 0 TPU cores
/opt/hostedtoolcache/Python/3.13.11/x64/lib/python3.13/site-packages/lightning/pytorch/trainer/connectors/logger_connector/logger_connector.py:76: Starting from v1.9.0, `tensorboardX` has been removed as a dependency of the `lightning.pytorch` package, due to potential conflicts with other packages in the ML ecosystem. For this reason, `logger=True` will use `CSVLogger` as the default logger, unless the `tensorboard` or `tensorboardX` packages are found. Please `pip install lightning[extra]` or one of them to enable TensorBoard support by default
2025/12/14 22:40:55 WARNING mlflow.utils.autologging_utils: MLflow autologging encountered a warning: "/opt/hostedtoolcache/Python/3.13.11/x64/lib/python3.13/site-packages/mlflow/pytorch/_lightning_autolog.py:542: UserWarning: Autologging is known to be compatible with pytorch-lightning versions between 2.1.3 and 2.5.6 and may not succeed with packages outside this range."
┏━━━┳━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━┓
┃   ┃ Name    ┃ Type       ┃ Params ┃ Mode  ┃ FLOPs ┃
┡━━━╇━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━┩
│ 0 │ encoder │ Sequential │ 25.6 K │ train │ 0     │
│ 1 │ decoder │ Sequential │ 26.4 K │ train │ 0     │
└───┴─────────┴────────────┴────────┴───────┴───────┘
Trainable params: 52.1 K
Non-trainable params: 0
Total params: 52.1 K
Total estimated model params size (MB): 0
Modules in train mode: 8
Modules in eval mode: 0
Total FLOPs: 0
/opt/hostedtoolcache/Python/3.13.11/x64/lib/python3.13/site-packages/rich/live.py:256: UserWarning: install
"ipywidgets" for Jupyter support
warnings.warn('install "ipywidgets" for Jupyter support')
/opt/hostedtoolcache/Python/3.13.11/x64/lib/python3.13/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:434: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument to `num_workers=3` in the `DataLoader` to improve performance.
/opt/hostedtoolcache/Python/3.13.11/x64/lib/python3.13/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:434: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument to `num_workers=3` in the `DataLoader` to improve performance.
/opt/hostedtoolcache/Python/3.13.11/x64/lib/python3.13/site-packages/lightning/pytorch/loops/fit_loop.py:317: The number of training batches (3) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
`weights_only` was not set, defaulting to `False`.
! calling anonymously, will miss private instances
/opt/hostedtoolcache/Python/3.13.11/x64/lib/python3.13/site-packages/rich/live.py:256: UserWarning: install
"ipywidgets" for Jupyter support
warnings.warn('install "ipywidgets" for Jupyter support')
`weights_only` was not set, defaulting to `False`.
→ creating new artifact version for key 'testmodels/mlflow/9aea8e4bdd5c4cdebd840c0ebe698776.ckpt' in storage '/home/runner/work/lamin-mlops/lamin-mlops/docs/lamin-mlops'
/opt/hostedtoolcache/Python/3.13.11/x64/lib/python3.13/site-packages/rich/live.py:256: UserWarning: install
"ipywidgets" for Jupyter support
warnings.warn('install "ipywidgets" for Jupyter support')
`weights_only` was not set, defaulting to `False`.
→ creating new artifact version for key 'testmodels/mlflow/9aea8e4bdd5c4cdebd840c0ebe698776.ckpt' in storage '/home/runner/work/lamin-mlops/lamin-mlops/docs/lamin-mlops'
/opt/hostedtoolcache/Python/3.13.11/x64/lib/python3.13/site-packages/rich/live.py:256: UserWarning: install
"ipywidgets" for Jupyter support
warnings.warn('install "ipywidgets" for Jupyter support')
`weights_only` was not set, defaulting to `False`.
→ creating new artifact version for key 'testmodels/mlflow/9aea8e4bdd5c4cdebd840c0ebe698776.ckpt' in storage '/home/runner/work/lamin-mlops/lamin-mlops/docs/lamin-mlops'
/opt/hostedtoolcache/Python/3.13.11/x64/lib/python3.13/site-packages/rich/live.py:256: UserWarning: install
"ipywidgets" for Jupyter support
warnings.warn('install "ipywidgets" for Jupyter support')
`weights_only` was not set, defaulting to `False`.
→ creating new artifact version for key 'testmodels/mlflow/9aea8e4bdd5c4cdebd840c0ebe698776.ckpt' in storage '/home/runner/work/lamin-mlops/lamin-mlops/docs/lamin-mlops'
INFO:pytorch_lightning.utilities.rank_zero:`Trainer.fit` stopped: `max_epochs=5` reached.
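To cross-check what MLflow autologging recorded, you can read the run back from the tracking store once training has finished; a minimal sketch using `mlflow.get_run`:

# retrieve the finished run from the MLflow tracking store
finished_run = mlflow.get_run(mlflow_run.info.run_id)
finished_run.data.params, finished_run.data.metrics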
MLflow and LaminDB user interfaces together¶
MLflow and LaminDB runs:
Both MLflow and LaminDB capture runs together with their parameters.
Screenshots: the MLflow experiment overview next to the LaminHub run overview.
MLflow run details and LaminDB artifact details:
MLflow and LaminDB complement each other. Whereas MLflow is excellent at capturing metrics over time, LaminDB excels at capturing the lineage of input & output data and training checkpoints.
Screenshots: the MLflow run view next to the LaminHub lineage view.
Both frameworks display output artifacts that were generated during the run. LaminDB further captures input artifacts, their origin and the associated source code.
Screenshots: the MLflow artifact view next to the LaminHub artifact view.
All checkpoints are automatically annotated with the specified training metrics and the MLflow run ID & name, keeping both frameworks in sync:
last_checkpoint_af = ln.Artifact.filter(
    key__startswith="testmodels/mlflow/", suffix__endswith="ckpt", is_latest=True
).last()
last_checkpoint_af.describe()
Show code cell output
Artifact: testmodels/mlflow/9aea8e4bdd5c4cdebd840c0ebe698776.ckpt (0004)
├── uid: UvZq6EokrF0nmxBA0004    run: iHwyyF0 (mlflow.ipynb)
│   kind: model    otype: None
│   hash: cZfl8isS3e0pZN6VtbNXVQ    size: 621.8 KB
│   branch: main    space: all
│   created_at: 2025-12-14 22:41:02 UTC    created_by: anonymous
├── storage/path: /home/runner/work/lamin-mlops/lamin-mlops/docs/lamin-mlops/.lamindb/UvZq6EokrF0nmxBA0004.ckpt
├── Features
│   └── current_epoch      int    4
│       mlflow_run_id      str    9aea8e4bdd5c4cdebd840c0ebe698776
│       mlflow_run_name    str    worried-gull-198
│       train_loss         float  0.10542944818735123
│       val_loss           float  0.10727514326572418
└── Labels
    └── .projects    Project    MLflow tutorial
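Because the MLflow run ID is attached as a feature, a checkpoint can also be looked up starting from the MLflow side. A sketch, assuming feature-based filtering on the artifact registry with the features defined above:

# query the checkpoint via the annotated MLflow run ID
ln.Artifact.filter(mlflow_run_id=mlflow_run.info.run_id).last()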
To reuse the checkpoint later:
last_checkpoint_af.cache()
Show code cell output
PosixUPath('/home/runner/work/lamin-mlops/lamin-mlops/docs/lamin-mlops/.lamindb/UvZq6EokrF0nmxBA0004.ckpt')
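From the cached path, the model can be restored with Lightning's standard checkpoint loading; because save_hyperparameters() was called in __init__, the constructor arguments are recovered from the checkpoint. A minimal sketch:

# restore the trained autoencoder from the cached checkpoint
restored_model = LitAutoEncoder.load_from_checkpoint(last_checkpoint_af.cache())
restored_model.eval()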
last_checkpoint_af.view_lineage()
Show code cell output
ln.finish()
Show code cell output
→ finished Run('iHwyyF0gud8HJwoM') after 17s at 2025-12-14 22:41:10 UTC