lamindb.integrations.lightning

PyTorch Lightning integration for LaminDB.

The public API has two layers:

  • Checkpoint is the concrete LaminDB implementation that persists checkpoint, config, and hparams.yaml files as Artifact objects and annotates them with Feature objects.

  • ArtifactPublishingModelCheckpoint is the generic extension layer that adds checkpoint artifact lifecycle hooks without itself implementing Lamin persistence details.

External integrations can either subclass Checkpoint directly or attach an ArtifactObserver to react to saved and removed artifacts.

Here is a guide: lightning.

Main API

class lamindb.integrations.lightning.Checkpoint(dirpath=None, *, features=None, monitor=None, verbose=False, save_last=None, save_top_k=1, save_weights_only=False, mode='min', auto_insert_metric_name=True, every_n_train_steps=None, train_time_interval=None, every_n_epochs=None, save_on_train_epoch_end=None, enable_version_counter=True, run_uid_is_version=True, artifact_observers=None)

A ModelCheckpoint that annotates PyTorch Lightning checkpoints.

Extends Lightning’s ModelCheckpoint with artifact creation and feature annotation. Each checkpoint is a separate artifact whose key is derived from either the explicit dirpath or the trainer’s logger configuration.

When dirpath is omitted (recommended), Lightning decides where to store checkpoints locally (typically lightning_logs/version_N/checkpoints/) and the artifact key is derived from the logger’s save_dir, name, and version. When dirpath is provided, it is used directly as the key prefix.

All artifacts are scoped under a single base prefix. Checkpoints (and hparams.yaml) live under {base}/checkpoints/; other artifacts (e.g. config.yaml) live directly under {base}/.

Base prefix derivation (highest priority first):

  1. dirpath provided → {dirpath} (logger is ignored for key purposes)

  2. dirpath omitted, logger present → {save_dir_basename}/{name}/{version}

  3. dirpath omitted, no logger → empty

When run_uid_is_version is True (the default) and a Lamin run context is active, the run UID is incorporated into the base prefix:

  • Case 1/3: the run UID is appended as an extra path segment (e.g. my/dir/{run_uid}, or just {run_uid}).

  • Case 2: the logger’s auto-incremented version is replaced by the run UID ({save_dir_basename}/{name}/{run_uid}).

Resulting key layout (with run UID active):

{base}/checkpoints/epoch=0-step=100.ckpt
{base}/checkpoints/hparams.yaml
{base}/config.yaml
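
The derivation rules above can be sketched as a small pure function. This is illustrative only; the real logic is internal to Checkpoint, and the function and argument names here are assumptions:

```python
from pathlib import PurePosixPath

def derive_base_prefix(
    dirpath=None,
    logger_save_dir=None,
    logger_name=None,
    logger_version=None,
    run_uid=None,
) -> str:
    """Mirror the documented derivation rules for the base key prefix."""
    if dirpath is not None:
        # Case 1: dirpath wins; a run UID is appended as an extra segment.
        base = dirpath.rstrip("/")
        return f"{base}/{run_uid}" if run_uid else base
    if logger_save_dir is not None:
        # Case 2: logger-derived; a run UID replaces the logger version.
        version = run_uid if run_uid else logger_version
        return f"{PurePosixPath(logger_save_dir).name}/{logger_name}/{version}"
    # Case 3: no dirpath, no logger; the run UID (if any) is the whole prefix.
    return run_uid or ""
```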

Provided they have been registered in the database via save_lightning_features(), the following lamindb.lightning features are tracked automatically:

  • Artifact-level: is_best_model, is_last_model, score, model_rank, save_weights_only, monitor, mode

  • Run-level: logger_name, logger_version, max_epochs, max_steps, precision, accumulate_grad_batches, gradient_clip_val, monitor, mode

Additionally, model hyperparameters (from pl_module.hparams) and datamodule hyperparameters (from trainer.datamodule.hparams) are captured if corresponding features exist.

This is the concrete LaminDB implementation built on top of ArtifactPublishingModelCheckpoint. Use it when you want LaminDB to be the persistence layer. For secondary systems such as ClearML, prefer attaching an ArtifactObserver or subclassing Checkpoint and reacting in on_artifact_saved().

Parameters:
  • dirpath (_PATH | None, default: None) – Directory for checkpoints. When provided, also used as the artifact key prefix. When omitted (recommended), Lightning picks the local directory and the key prefix is derived from the logger.

  • features (dict[Literal['run', 'artifact'], dict[str, Any]] | None, default: None) – Features to annotate runs and artifacts. Use “run” key for run-level features (static metadata). Use “artifact” key for artifact-level features (values can be static or None for auto-population from trainer metrics/attributes).

  • monitor (str | None, default: None) – Quantity to monitor for saving best checkpoint.

  • verbose (bool, default: False) – Verbosity mode.

  • save_last (bool | None, default: None) – Save a copy of the last checkpoint.

  • save_top_k (int, default: 1) – Number of best checkpoints to keep.

  • save_weights_only (bool, default: False) – Save only model weights (not optimizer state).

  • mode (Literal['min', 'max'], default: 'min') – One of “min” or “max” for monitor comparison.

  • auto_insert_metric_name (bool, default: True) – Include metric name in checkpoint filename.

  • every_n_train_steps (int | None, default: None) – Checkpoint every N training steps.

  • train_time_interval (timedelta | None, default: None) – Checkpoint at time intervals.

  • every_n_epochs (int | None, default: None) – Checkpoint every N epochs.

  • save_on_train_epoch_end (bool | None, default: None) – Run checkpointing at end of training epoch.

  • enable_version_counter (bool, default: True) – Append version to filename to avoid collisions.

  • run_uid_is_version (bool, default: True) – When True (default) and a Lamin run context is active, incorporate the run UID into the base prefix. For the logger case the logger’s auto-incremented version is replaced; for the dirpath and no-logger cases the run UID is appended as an extra path segment. Prevents cross-run key collisions.

  • artifact_observers (list[ArtifactObserver] | None, default: None) – Optional observer objects notified when checkpoint, config, or hparams artifacts are saved or when checkpoint files are removed locally. Observers follow ArtifactObserver and receive ArtifactSavedEvent and ArtifactRemovedEvent.

Examples

Let Lightning decide where to store checkpoints (recommended):

import lamindb as ln
import lightning as pl
from lightning.pytorch.loggers import CSVLogger
from lamindb.integrations import lightning as ll

ll.save_lightning_features()

callback = ll.Checkpoint(monitor="val_loss", save_top_k=3)
logger = CSVLogger(save_dir="logs")

trainer = pl.Trainer(callbacks=[callback], logger=logger)
trainer.fit(model, dataloader)

# Query checkpoints — key prefix is derived from the logger
# e.g. "logs/lightning_logs/version_0/checkpoints/"
ln.Artifact.filter(key__startswith=callback.checkpoint_key_prefix)

Explicit dirpath for full control over the artifact key prefix:

callback = ll.Checkpoint(
    dirpath="deployments/my_model/",
    monitor="val_loss",
    save_top_k=3,
)

trainer = pl.Trainer(callbacks=[callback])
trainer.fit(model, dataloader)

# Query checkpoints
ln.Artifact.filter(key__startswith=callback.checkpoint_key_prefix)
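
Annotations can be passed via the features parameter. A sketch of its shape: the "experiment" feature name below is hypothetical and must already exist in the registry as a Feature, while "score" is one of the built-in lamindb.lightning features created by save_lightning_features():

```python
features = {
    # "experiment" is a hypothetical custom feature; it must already exist
    # in the registry before it can be used for annotation.
    "run": {"experiment": "ablation-3"},   # static run-level metadata
    # None → auto-populated from trainer metrics/attributes at save time.
    "artifact": {"score": None},
}
# Passed as: ll.Checkpoint(monitor="val_loss", features=features)
```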

Using the CLI:

# config.yaml
trainer:
  callbacks:
    - class_path: lamindb.integrations.lightning.Checkpoint
      init_args:
        monitor: val_loss
        save_top_k: 3

# Run with:
# python main.py fit --config config.yaml
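
Reacting to artifact lifecycle events with an observer. This is a sketch; it assumes the ArtifactObserver protocol exposes on_artifact_saved/on_artifact_removed hooks mirroring the callback's own hook names (check the protocol definition for the exact interface):

```python
class UriCollector:
    """Collects storage URIs of saved artifacts, e.g. for a secondary registry."""

    def __init__(self):
        self.saved_uris = []

    def on_artifact_saved(self, event):
        # event.storage_uri is the stable hand-off value for external systems.
        self.saved_uris.append(event.storage_uri)

    def on_artifact_removed(self, event):
        # Removals apply to checkpoints only; config/hparams are save-only.
        pass

# Attached as: ll.Checkpoint(monitor="val_loss", artifact_observers=[UriCollector()])
```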

For more, see the guide: lightning.

property base_prefix: str

The base artifact key prefix for all artifacts from this callback.

Checkpoints live under {base_prefix}/checkpoints/ and configs directly under {base_prefix}/.

Available after setup() has been called.

property checkpoint_key_prefix: str

The artifact key prefix used for checkpoint artifacts.

Available after setup() has been called, for example once trainer.fit() has started.

setup(trainer, pl_module, stage)

Validate user features and detect available auto-features.

Return type:

None

resolve_artifact_storage_uri(artifact)

Resolve the physical artifact location for downstream registries.

This is the stable abstraction external packages should use instead of reconstructing storage locations from Lamin internals.

Return type:

str

resolve_artifact_key(trainer, filepath, kind)

Return the Lamin artifact key for a checkpoint-related file.

Return type:

str

save_checkpoint_artifact(trainer, filepath, *, feature_values=None)

Save a checkpoint artifact to Lamin and emit the corresponding event.

This is the main persistence hook used by _save_checkpoint(). It is a useful override point for subclasses that want to augment Lamin persistence while keeping the generic lifecycle behavior from the base class.

Return type:

Artifact

save_config_artifact(trainer, config_path)

Save a Lightning CLI config artifact and emit the corresponding event.

Config artifacts are routed through the same lifecycle surface as checkpoints so observers and subclasses see a unified event stream.

Return type:

Artifact

save_hparams_artifact(trainer, hparams_path)

Save Lightning’s auto-generated hparams file and emit the event.

Returns None if Lightning did not generate hparams.yaml for the current run.

Return type:

Artifact | None

lamindb.integrations.lightning.save_lightning_features()

Save features to auto-track lightning parameters & metrics.

Creates the following features under the lamindb.lightning feature type if they do not already exist:

Artifact-level features:

  • is_best_model (bool): Whether this checkpoint is the best model.

  • is_last_model (bool): Whether this checkpoint is the most recently saved model.

  • score (float): The monitored metric score.

  • model_rank (int): Rank among all checkpoints (0 = best).

  • save_weights_only (bool): Whether this checkpoint only stores model weights.

  • monitor (str): Metric name this checkpoint uses for comparison.

  • mode (str): Optimization mode (min or max) used for checkpoint ranking.

Run-level features:

  • logger_name (str): Name from the first Lightning logger.

  • logger_version (str): Version from the first Lightning logger.

  • max_epochs (int): Maximum number of epochs.

  • max_steps (int): Maximum number of training steps.

  • precision (str): Training precision (e.g., “32”, “16-mixed”, “bf16”).

  • accumulate_grad_batches (int): Number of batches to accumulate gradients over.

  • gradient_clip_val (float): Gradient clipping value.

  • monitor (str): Metric name being monitored.

  • mode (str): Optimization mode (“min” or “max”).

Parameters:

None.

Return type:

None

Example

Save the features to the database:

from lamindb.integrations import lightning as ll

ll.save_lightning_features()

Auxiliary classes

class lamindb.integrations.lightning.ArtifactPublishingModelCheckpoint(*args, artifact_observers=None, **kwargs)

ModelCheckpoint with observable artifact lifecycle hooks.

This layer captures artifact kinds, observer registration, saved/removed events, latest artifact tracking, and key compatibility hooks. Concrete subclasses remain responsible for how artifacts are persisted.

Subclasses are expected to implement the persistence hooks, i.e. save_checkpoint_artifact(), save_config_artifact(), save_hparams_artifact(), and resolve_artifact_storage_uri().

SaveConfigCallback only depends on this base class, which means a custom checkpoint callback can participate in config saving without inheriting from Lamin’s concrete Checkpoint.

property last_checkpoint_artifact: Any | None

The most recently saved checkpoint artifact handle.

property last_config_artifact: Any | None

The most recently saved config artifact handle.

property last_hparams_artifact: Any | None

The most recently saved hparams artifact handle.

property last_artifact_event: ArtifactSavedEvent | ArtifactRemovedEvent | None

The last artifact lifecycle event emitted by this callback.

get_last_artifact(kind)

Return the most recently saved artifact for a given artifact kind.

Return type:

Any | None

add_artifact_observer(observer)

Register an observer notified about artifact lifecycle events.

Return type:

None

remove_artifact_observer(observer)

Unregister a previously added artifact observer.

Return type:

None

resolve_artifact_storage_uri(artifact)

Resolve the physical location for a persisted artifact.

Return type:

str

resolve_artifact_key(trainer, filepath, kind)

Return the logical artifact key for a checkpoint-related file.

Return type:

str

on_artifact_saved(event)

Hook for subclasses after an artifact has been saved.

Return type:

None

on_artifact_removed(event)

Hook for subclasses after a checkpoint file has been removed.

Return type:

None

save_checkpoint_artifact(trainer, filepath, *, feature_values=None)

Persist a checkpoint artifact and emit the corresponding event.

Return type:

Any

save_config_artifact(trainer, config_path)

Persist a config artifact and emit the corresponding event.

Return type:

Any

save_hparams_artifact(trainer, hparams_path)

Persist an hparams artifact and emit the corresponding event.

Return type:

Any | None

class lamindb.integrations.lightning.SaveConfigCallback(*args, **kwargs)

SaveConfigCallback that also saves config to the instance.

Use with LightningCLI to save the resolved configuration file alongside checkpoints.

The local config file is saved under {save_dir}/{name}/{version}/ derived from the first logger, avoiding Lightning’s trainer.log_dir which hardcodes an isinstance check for TensorBoardLogger / CSVLogger and silently changes the directory for other loggers.

This callback looks for any ArtifactPublishingModelCheckpoint, not just Lamin’s concrete Checkpoint. That keeps the config-save path aligned with custom subclasses built on the generic artifact-publishing base.

Config artifacts are stored directly under the base prefix of the active Checkpoint callback. The base prefix follows the same derivation rules as for checkpoints (dirpath > logger > empty), so configs are always co-located with their checkpoints:

  • Checkpoint.dirpath set → {dirpath}/config.yaml ({dirpath}/{run_uid}/config.yaml with run-UID scoping)

  • Logger present, no dirpath → {save_dir_basename}/{name}/{version}/config.yaml

  • Neither → config.yaml (or {run_uid}/config.yaml with run-UID scoping)

Example:

from lightning.pytorch.cli import LightningCLI
from lamindb.integrations import lightning as ll

cli = LightningCLI(
    MyModel,
    MyDataModule,
    save_config_callback=ll.SaveConfigCallback,
)

setup(trainer, pl_module, stage)

Save resolved configuration file alongside checkpoints.

Return type:

None

class lamindb.integrations.lightning.ArtifactSavedEvent(kind, key, local_path, trainer, artifact, storage_uri)

Metadata emitted after a checkpoint-related artifact has been persisted.

artifact is intentionally typed generically so downstream integrations can expose their own persisted object while still using the common lifecycle API. storage_uri is the stable hand-off value for registries such as ClearML.
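
A downstream hand-off therefore only needs the documented event fields. A sketch (register_externally is a hypothetical helper, not part of the API):

```python
def register_externally(event) -> dict:
    """Extract registry-facing metadata from an ArtifactSavedEvent-shaped object."""
    return {
        "kind": event.kind,        # e.g. "checkpoint", "config", or "hparams"
        "key": event.key,          # logical artifact key
        "uri": event.storage_uri,  # physical location, the stable hand-off value
    }
```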

class lamindb.integrations.lightning.ArtifactRemovedEvent(kind, key, local_path, trainer, artifact=None, storage_uri=None)

Metadata emitted after a local checkpoint file has been removed.

Removal currently applies to checkpoint files. Config and hparams artifacts are save-only in the current Lightning integration.

artifact: Any | None = None
storage_uri: str | None = None