Manage notebooks, scripts & workflows

If you don’t have a lamindb instance, here’s how to create one:

!lamin init --storage ./test-track
 initialized lamindb: testuser1/test-track

Manage notebooks and scripts

Call track() to save your notebook or script as a transform and start tracking inputs & outputs of a run.

import lamindb as ln

ln.track()  # initiate a tracked notebook/script run

# your code automatically tracks inputs & outputs

ln.finish()  # mark run as finished, save execution report, source code & environment

You'll find your notebooks and scripts in the Transform registry, along with pipelines & functions:

transform = ln.Transform.get(key="my_analyses/my_notebook.ipynb")
transform.source_code             # source code
transform.runs.to_dataframe()     # all runs in a dataframe
transform.latest_run.report       # report of latest run
transform.latest_run.environment  # environment of latest run

You can use the CLI to load a transform into your current (development) directory:

lamin load --key my_analyses/my_notebook.ipynb

If your instance is connected to LaminHub, you can search or filter the transform page and explore data lineage.

Here is how you’d load the notebook from the video into your local directory:

lamin load https://lamin.ai/laminlabs/lamindata/transform/F4L3oC6QsZvQ

Organize local development

If no development directory is set, script & notebook keys equal their filenames. Otherwise, script & notebook keys equal the relative path within the development directory.

To set the development directory to your shell's current working directory, run:

lamin settings set dev-dir .

You can see the current status by running:

lamin info

Use projects

You can link the entities created during a run to a project.

import lamindb as ln

my_project = ln.Project(name="My project").save()  # create & save a project
ln.track(project="My project")  # pass project
open("sample.fasta", "w").write(">seq1\nACGT\n")  # create a dataset
ln.Artifact("sample.fasta", key="sample.fasta").save()  # auto-labeled by project
 connected lamindb: testuser1/test-track
 created Transform('q6LXZ4wdmQUK0000', key='track.ipynb'), started new Run('KZoURayURTqhBFxn') at 2026-01-11 16:51:28 UTC
 notebook imports: lamindb==2.0.0
 recommendation: to identify the notebook across renames, pass the uid: ln.track("q6LXZ4wdmQUK", project="My project")
Artifact(uid='ILtcisMgKrQLnL2f0000', version_tag=None, is_latest=True, key='sample.fasta', description=None, suffix='.fasta', kind=None, otype=None, size=11, hash='83rEPcAoBHmYiIuyBYrFKg', n_files=None, n_observations=None, branch_id=1, space_id=1, storage_id=3, run_id=1, schema_id=None, created_by_id=3, created_at=2026-01-11 16:51:29 UTC, is_locked=False)

Filter entities by project, e.g., artifacts:

ln.Artifact.filter(projects=my_project).to_dataframe()
uid key description suffix kind otype size hash n_files n_observations version_tag is_latest is_locked created_at branch_id space_id storage_id run_id schema_id created_by_id
id
1 ILtcisMgKrQLnL2f0000 sample.fasta None .fasta None None 11 83rEPcAoBHmYiIuyBYrFKg None None None True False 2026-01-11 16:51:29.815000+00:00 1 1 3 1 None 3

Access entities linked to a project:

my_project.artifacts.to_dataframe()
uid key description suffix kind otype size hash n_files n_observations version_tag is_latest is_locked created_at branch_id space_id storage_id run_id schema_id created_by_id
id
1 ILtcisMgKrQLnL2f0000 sample.fasta None .fasta None None 11 83rEPcAoBHmYiIuyBYrFKg None None None True False 2026-01-11 16:51:29.815000+00:00 1 1 3 1 None 3

The same works for my_project.transforms or my_project.runs.

Use spaces

You can write the entities created during a run into a space that you configure on LaminHub. This is particularly useful if you want to restrict access to a space. Note that this doesn't affect bionty entities, which should typically remain commonly accessible.

ln.track(space="Our team space")

Sync code with git

To sync scripts or workflows with their corresponding files in a git repo, either export an environment variable:

export LAMINDB_SYNC_GIT_REPO=<YOUR-GIT-REPO-URL>

Or set the following setting:

ln.settings.sync_git_repo = <YOUR-GIT-REPO-URL>

If you work on a single project in your lamindb instance, it makes sense to set LaminDB’s dev-dir to the root of the local git repo clone. If you work on multiple projects in your lamindb instance, you can use the dev-dir as the local root and nest git repositories in it.

Manage workflows

Here we'll manage workflows with lamindb's flow() and step() decorators, which work out of the box with the majority of Python workflow managers:

| tool    | workflow decorator | step/task decorator | notes                                          |
|---------|--------------------|---------------------|------------------------------------------------|
| lamindb | @flow              | @step               | inspired by prefect                            |
| prefect | @flow              | @task               | two decorators                                 |
| redun   | @task (on main)    | @task               | single decorator for everything                |
| dagster | @job or @asset     | @op or @asset       | asset-centric; @asset is primary               |
| flyte   | @workflow          | @task               | also @dynamic for runtime DAGs                 |
| airflow | @dag               | @task               | TaskFlow API (modern); also supports operators |
| zenml   | @pipeline          | @step               | inspired by prefect                            |
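Conceptually, all of these decorators wrap a function and record its bound arguments as run parameters. A minimal toy sketch of that mechanism (not lamindb's actual implementation; `tracked` and `last_params` are made-up names):

```python
import functools
import inspect


def tracked(fn):
    """Record the fully bound arguments of each call, like a run's params."""

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        bound = inspect.signature(fn).bind(*args, **kwargs)
        bound.apply_defaults()  # defaults also end up in the recorded params
        wrapper.last_params = dict(bound.arguments)
        return fn(*args, **kwargs)

    wrapper.last_params = None
    return wrapper


@tracked
def ingest_dataset(key: str, subset: bool = False) -> str:
    return key


ingest_dataset("my_analysis/dataset.parquet")
print(ingest_dataset.last_params)
# {'key': 'my_analysis/dataset.parquet', 'subset': False}
```

A real tracker additionally persists these parameters to a registry and links inputs & outputs, but the parameter capture itself is just argument binding.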

If you’re looking for more in-depth examples or for integrating with non-decorator-based workflow managers such as Nextflow or Snakemake, see Manage computational workflows.

| tool      | workflow         | step/task       | notes            |
|-----------|------------------|-----------------|------------------|
| nextflow  | workflow keyword | process keyword | groovy-based DSL |
| snakemake | rule keyword     | rule keyword    | file-based DSL   |
| metaflow  | FlowSpec         | @step           | class-based      |
| kedro     | Pipeline()       | node()          | function-based   |

A one-step workflow

Decorate a function with flow() to track it as a workflow:

my_workflow.py
import lamindb as ln


@ln.flow()
def ingest_dataset(key: str) -> ln.Artifact:
    df = ln.examples.datasets.mini_immuno.get_dataset1()
    artifact = ln.Artifact.from_dataframe(df, key=key).save()
    return artifact


if __name__ == "__main__":
    ingest_dataset(key="my_analysis/dataset.parquet")

Let’s run the workflow:

!python scripts/my_workflow.py
 connected lamindb: testuser1/test-track
 writing the in-memory object into cache

Query the workflow via its filename:

transform = ln.Transform.get(key="my_workflow.py")
transform.describe()
Transform: my_workflow.py (0000)
├── uid: KEWEf1mBXvzJ0000                                     
hash: uJ3fsnfaNN6EZ7Q0d8SQtw         type: function       
branch: main                         space: all           
created_at: 2026-01-11 16:51:32 UTC  created_by: testuser1
└── source_code: 
    import lamindb as ln
    
    
    @ln.flow()
    def ingest_dataset(key: str) -> ln.Artifact:
        df = ln.examples.datasets.mini_immuno.get_dataset1()
        artifact = ln.Artifact.from_dataframe(df, key=key).save()
        return artifact
    
    
    if __name__ == "__main__":
        ingest_dataset(key="my_analysis/dataset.parquet")

The run stored the parameter value for key:

transform.latest_run.describe()
Run: mbm9WG0 (my_workflow.py)
├── uid: mbm9WG0yq3SqvPxO                transform: my_workflow.py (0000)    
started_at: 2026-01-11 16:51:32 UTC  finished_at: 2026-01-11 16:51:32 UTC
status: completed                                                        
branch: main                         space: all                          
created_at: 2026-01-11 16:51:32 UTC  created_by: testuser1               
└── Params
    └── key: my_analysis/dataset.parquet

It links output artifacts:

transform.latest_run.output_artifacts.to_dataframe()
uid key description suffix kind otype size hash n_files n_observations version_tag is_latest is_locked created_at branch_id space_id storage_id run_id schema_id created_by_id
id
2 5FRVoeDL5C2VKVkF0000 my_analysis/dataset.parquet None .parquet dataset DataFrame 10354 ug6ICnjB8oyqescoUDbYKg None 3 None True False 2026-01-11 16:51:32.631000+00:00 1 1 3 2 None 3

You can query for all runs that ran with that parameter:

ln.Run.filter(
    params__key="my_analysis/dataset.parquet",
).to_dataframe()
uid name entrypoint started_at finished_at params reference reference_type cli_args is_locked created_at branch_id space_id transform_id report_id environment_id created_by_id initiated_by_run_id
id
2 mbm9WG0yq3SqvPxO None ingest_dataset 2026-01-11 16:51:32.597440+00:00 2026-01-11 16:51:32.636916+00:00 {'key': 'my_analysis/dataset.parquet'} None None None False 2026-01-11 16:51:32.598000+00:00 1 1 2 None None 3 None

You can also pass complex parameters and features, see: Track parameters & features.

A multi-step workflow

Here, the workflow calls an additional processing step:

my_workflow_with_step.py
import lamindb as ln


@ln.step()
def subset_dataframe(
    artifact: ln.Artifact,
    subset_rows: int = 2,
    subset_cols: int = 2,
) -> ln.Artifact:
    df = artifact.load()
    new_data = df.iloc[:subset_rows, :subset_cols]
    new_key = artifact.key.replace(".parquet", "_subsetted.parquet")
    return ln.Artifact.from_dataframe(new_data, key=new_key).save()


@ln.flow()
def ingest_dataset(key: str, subset: bool = False) -> ln.Artifact:
    df = ln.examples.datasets.mini_immuno.get_dataset1()
    artifact = ln.Artifact.from_dataframe(df, key=key).save()
    if subset:
        artifact = subset_dataframe(artifact)
    return artifact


if __name__ == "__main__":
    ingest_dataset(key="my_analysis/dataset.parquet", subset=True)

Let’s run the workflow:

!python scripts/my_workflow_with_step.py
 connected lamindb: testuser1/test-track
 writing the in-memory object into cache
 returning artifact with same hash: Artifact(uid='5FRVoeDL5C2VKVkF0000', version_tag=None, is_latest=True, key='my_analysis/dataset.parquet', description=None, suffix='.parquet', kind='dataset', otype='DataFrame', size=10354, hash='ug6ICnjB8oyqescoUDbYKg', n_files=None, n_observations=3, branch_id=1, space_id=1, storage_id=3, run_id=2, schema_id=None, created_by_id=3, created_at=2026-01-11 16:51:32 UTC, is_locked=False); to track this artifact as an input, use: ln.Artifact.get()
 writing the in-memory object into cache

The lineage of the subsetted artifact captures the subsetting step:

subsetted_artifact = ln.Artifact.get(key="my_analysis/dataset_subsetted.parquet")
subsetted_artifact.view_lineage()

This is the run that created the subsetted_artifact:

subsetted_artifact.run
Run(uid='uAkQ4fHv89BC8ubV', name=None, entrypoint='subset_dataframe', started_at=2026-01-11 16:51:35 UTC, finished_at=2026-01-11 16:51:35 UTC, params={'artifact': 'Artifact[5FRVoeDL5C2VKVkF0000]', 'subset_rows': 2, 'subset_cols': 2}, reference=None, reference_type=None, cli_args=None, branch_id=1, space_id=1, transform_id=3, report_id=None, environment_id=None, created_by_id=3, initiated_by_run_id=3, created_at=2026-01-11 16:51:35 UTC, is_locked=False)

This is the initiating run that triggered the function call:

subsetted_artifact.run.initiated_by_run
Run(uid='UMXUPL4pycKqDsO4', name=None, entrypoint='ingest_dataset', started_at=2026-01-11 16:51:35 UTC, finished_at=2026-01-11 16:51:35 UTC, params={'key': 'my_analysis/dataset.parquet', 'subset': True}, reference=None, reference_type=None, cli_args=None, branch_id=1, space_id=1, transform_id=3, report_id=None, environment_id=None, created_by_id=3, initiated_by_run_id=None, created_at=2026-01-11 16:51:35 UTC, is_locked=False)

These are the parameters of the run:

subsetted_artifact.run.params
{'artifact': 'Artifact[5FRVoeDL5C2VKVkF0000]',
 'subset_rows': 2,
 'subset_cols': 2}

These are the input artifacts:

subsetted_artifact.run.input_artifacts.to_dataframe()
uid key description suffix kind otype size hash n_files n_observations version_tag is_latest is_locked created_at branch_id space_id storage_id run_id schema_id created_by_id
id
2 5FRVoeDL5C2VKVkF0000 my_analysis/dataset.parquet None .parquet dataset DataFrame 10354 ug6ICnjB8oyqescoUDbYKg None 3 None True False 2026-01-11 16:51:32.631000+00:00 1 1 3 2 None 3

These are the output artifacts:

subsetted_artifact.run.output_artifacts.to_dataframe()
uid key description suffix kind otype size hash n_files n_observations version_tag is_latest is_locked created_at branch_id space_id storage_id run_id schema_id created_by_id
id
3 Rcepg67lvLkg6mvw0000 my_analysis/dataset_subsetted.parquet None .parquet dataset DataFrame 3696 siPfGX_YztG7sm3oNnHRUw None 2 None True False 2026-01-11 16:51:35.952000+00:00 1 1 3 4 None 3

A workflow with CLI arguments

Let’s use click to parse CLI arguments:

my_workflow_with_click.py
import click
import lamindb as ln


@click.command()
@click.option("--key", required=True)
@ln.flow()
def main(key: str):
    df = ln.examples.datasets.mini_immuno.get_dataset2()
    ln.Artifact.from_dataframe(df, key=key).save()


if __name__ == "__main__":
    main()

Let’s run the workflow:

!python scripts/my_workflow_with_click.py --key my_analysis/dataset2.parquet
 connected lamindb: testuser1/test-track
 function invoked with: --key my_analysis/dataset2.parquet
 writing the in-memory object into cache

CLI arguments are tracked and accessible via run.cli_args:

run = ln.Run.filter(transform__key="my_workflow_with_click.py").first()
run.describe()
Run: tcHmUzh (my_workflow_with_click.py)
├── uid: tcHmUzhvymiKz809                transform: my_workflow_with_click.py (0000)
started_at: 2026-01-11 16:51:39 UTC  finished_at: 2026-01-11 16:51:39 UTC       
status: completed                                                               
branch: main                         space: all                                 
created_at: 2026-01-11 16:51:39 UTC  created_by: testuser1                      
├── cli_args: 
--key my_analysis/dataset2.parquet
└── Params
    └── key: my_analysis/dataset2.parquet

Note that it doesn’t matter whether you use click, argparse, or any other CLI argument parser.
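For instance, the same entrypoint written with argparse instead of click would be tracked identically; only the option parsing differs (illustrative sketch, `parse_key` is a made-up helper):

```python
import argparse


def parse_key(argv: list[str]) -> str:
    # same CLI surface as the click example: one required --key option
    p = argparse.ArgumentParser()
    p.add_argument("--key", required=True)
    return p.parse_args(argv).key
```

Whatever the parser, the invocation string is captured on the run as cli_args.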

Track parameters & features

We just saw that the function decorators @ln.flow() and @ln.step() track parameter values automatically. Here is how to pass parameters to ln.track():

run_track_with_params.py
import argparse
import lamindb as ln

if __name__ == "__main__":
    p = argparse.ArgumentParser()
    p.add_argument("--input-dir", type=str)
    p.add_argument("--downsample", action="store_true")
    p.add_argument("--learning-rate", type=float)
    args = p.parse_args()
    params = {
        "input_dir": args.input_dir,
        "learning_rate": args.learning_rate,
        "preprocess_params": {
            "downsample": args.downsample,
            "normalization": "the_good_one",
        },
    }
    ln.track(params=params)

    # your code

    ln.finish()

Run the script:

!python scripts/run_track_with_params.py  --input-dir ./mydataset --learning-rate 0.01 --downsample
 connected lamindb: testuser1/test-track
 script invoked with: --input-dir ./mydataset --learning-rate 0.01 --downsample
 created Transform('tSK1x4dkrgeK0000', key='run_track_with_params.py'), started new Run('BYdnsLN8fbxnmqzY') at 2026-01-11 16:51:43 UTC
→ params: input_dir='./mydataset', learning_rate=0.01, preprocess_params={'downsample': True, 'normalization': 'the_good_one'}
 recommendation: to identify the script across renames, pass the uid: ln.track("tSK1x4dkrgeK", params={...})

Query for all runs that match certain parameters:

ln.Run.filter(
    params__learning_rate=0.01,
    params__preprocess_params__downsample=True,
).to_dataframe()
uid name entrypoint started_at finished_at params reference reference_type cli_args is_locked created_at branch_id space_id transform_id report_id environment_id created_by_id initiated_by_run_id
id
6 BYdnsLN8fbxnmqzY None None 2026-01-11 16:51:43.003616+00:00 2026-01-11 16:51:44.277816+00:00 {'input_dir': './mydataset', 'learning_rate': ... None None --input-dir ./mydataset --learning-rate 0.01 -... False 2026-01-11 16:51:43.005000+00:00 1 1 5 6 5 3 None
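The double-underscore syntax drills into the params JSON: params__preprocess_params__downsample=True matches runs whose params["preprocess_params"]["downsample"] equals True. Conceptually (a plain-Python sketch of the matching logic, not the actual database query):

```python
def params_match(params: dict, **lookups) -> bool:
    # each lookup key like "preprocess_params__downsample" is a path into the dict
    for lookup, expected in lookups.items():
        value = params
        for part in lookup.split("__"):
            if not isinstance(value, dict) or part not in value:
                return False
            value = value[part]
        if value != expected:
            return False
    return True


run_params = {
    "input_dir": "./mydataset",
    "learning_rate": 0.01,
    "preprocess_params": {"downsample": True, "normalization": "the_good_one"},
}
params_match(run_params, learning_rate=0.01, preprocess_params__downsample=True)
# → True
```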

Describe & get parameters:

run = ln.Run.filter(params__learning_rate=0.01).order_by("-started_at").first()
run.describe()
run.params
Run: BYdnsLN (run_track_with_params.py)
├── uid: BYdnsLN8fbxnmqzY                transform: run_track_with_params.py (0000)
started_at: 2026-01-11 16:51:43 UTC  finished_at: 2026-01-11 16:51:44 UTC      
status: completed                                                              
branch: main                         space: all                                
created_at: 2026-01-11 16:51:43 UTC  created_by: testuser1                     
├── cli_args: 
--input-dir ./mydataset --learning-rate 0.01 --downsample
├── report: TxWGJZQ
→ connected lamindb: testuser1/test-track
→ created Transform('tSK1x4dkrgeK0000', key='run_track_with_params.py'), started …
→ params: input_dir='./mydataset', learning_rate=0.01, preprocess_params={'downs …
• recommendation: to identify the script across renames, pass the uid: ln.track( …
├── environment: RXBPxdv
aiobotocore==2.26.0
aiohappyeyeballs==2.6.1
aiohttp==3.13.3
aioitertools==0.13.0
│ …
└── Params
    ├── input_dir: ./mydataset
    ├── learning_rate: 0.01
    └── preprocess_params: {'downsample': True, 'normalization': 'the_good_one'}
{'input_dir': './mydataset',
 'learning_rate': 0.01,
 'preprocess_params': {'downsample': True, 'normalization': 'the_good_one'}}

You can also access the CLI arguments used to start the run directly:

run.cli_args
'--input-dir ./mydataset --learning-rate 0.01 --downsample'

You can also track run features, analogously to artifact features.

In contrast to params, features are validated against the Feature registry and allow you to express relationships with entities in your registries.

Let’s first define labels & features.

experiment_type = ln.Record(name="Experiment", is_type=True).save()
experiment_label = ln.Record(name="Experiment1", type=experiment_type).save()
ln.Feature(name="s3_folder", dtype=str).save()
ln.Feature(name="experiment", dtype=experiment_type).save()
Feature(uid='qROiuCxVAC4l', is_type=False, name='experiment', _dtype_str='cat[Record[gCBfZS8DCWm1fNCi]]', unit=None, description=None, array_rank=0, array_size=0, array_shape=None, synonyms=None, default_value=None, nullable=True, coerce=None, branch_id=1, space_id=1, created_by_id=3, run_id=1, type_id=None, created_at=2026-01-11 16:51:44 UTC, is_locked=False)
Now run a script that passes features alongside params to ln.track():

!python scripts/run_track_with_features_and_params.py  --s3-folder s3://my-bucket/my-folder --experiment Experiment1
 connected lamindb: testuser1/test-track
 script invoked with: --s3-folder s3://my-bucket/my-folder --experiment Experiment1
 created Transform('6iwyj2StZOCd0000', key='run_track_with_features_and_params.py'), started new Run('adBQ8OI3bySmpFYL') at 2026-01-11 16:51:47 UTC
→ params: example_param=42
→ features: s3_folder='s3://my-bucket/my-folder', experiment='Experiment1'
 recommendation: to identify the script across renames, pass the uid: ln.track("6iwyj2StZOCd", params={...})
Query runs by a feature value:

ln.Run.filter(s3_folder="s3://my-bucket/my-folder").to_dataframe()
uid name entrypoint started_at finished_at params reference reference_type cli_args is_locked created_at branch_id space_id transform_id report_id environment_id created_by_id initiated_by_run_id
id
7 adBQ8OI3bySmpFYL None None 2026-01-11 16:51:47.789396+00:00 2026-01-11 16:51:49.051728+00:00 {'example_param': 42} None None --s3-folder s3://my-bucket/my-folder --experim... False 2026-01-11 16:51:47.790000+00:00 1 1 6 7 5 3 None

Describe & get feature values:

run2 = ln.Run.filter(
    s3_folder="s3://my-bucket/my-folder", experiment="Experiment1"
).last()
run2.describe()
run2.features.get_values()
Run: adBQ8OI (run_track_with_features_and_params.py)
├── uid: adBQ8OI3bySmpFYL                transform: run_track_with_features_and_params.py (0000)
started_at: 2026-01-11 16:51:47 UTC  finished_at: 2026-01-11 16:51:49 UTC                   
status: completed                                                                           
branch: main                         space: all                                             
created_at: 2026-01-11 16:51:47 UTC  created_by: testuser1                                  
├── cli_args: 
--s3-folder s3://my-bucket/my-folder --experiment Experiment1
├── report: W55lyuB
→ connected lamindb: testuser1/test-track
→ created Transform('6iwyj2StZOCd0000', key='run_track_with_features_and_params. …
→ params: example_param=42
→ features: s3_folder='s3://my-bucket/my-folder', experiment='Experiment1'
│ …
├── environment: RXBPxdv
aiobotocore==2.26.0
aiohappyeyeballs==2.6.1
aiohttp==3.13.3
aioitertools==0.13.0
│ …
├── Params
│   └── example_param: 42
└── Features
    └── experiment                      Record[Experiment]                 Experiment1                             
        s3_folder                       str                                s3://my-bucket/my-folder                
{'experiment': 'Experiment1', 's3_folder': 's3://my-bucket/my-folder'}

Manage functions in scripts and notebooks

If you want more fine-grained data lineage tracking in a script or notebook where you call ln.track(), you can also use the step() decorator.

In a notebook

@ln.step()
def subset_dataframe(
    input_artifact_key: str,
    output_artifact_key: str,
    subset_rows: int = 2,
    subset_cols: int = 2,
) -> None:
    artifact = ln.Artifact.get(key=input_artifact_key)
    dataset = artifact.load()
    new_data = dataset.iloc[:subset_rows, :subset_cols]
    ln.Artifact.from_dataframe(new_data, key=output_artifact_key).save()

Prepare a test dataset:

df = ln.examples.datasets.mini_immuno.get_dataset1(otype="DataFrame")
input_artifact_key = "my_analysis/dataset.parquet"
artifact = ln.Artifact.from_dataframe(df, key=input_artifact_key).save()
 writing the in-memory object into cache
 returning artifact with same hash: Artifact(uid='5FRVoeDL5C2VKVkF0000', version_tag=None, is_latest=True, key='my_analysis/dataset.parquet', description=None, suffix='.parquet', kind='dataset', otype='DataFrame', size=10354, hash='ug6ICnjB8oyqescoUDbYKg', n_files=None, n_observations=3, branch_id=1, space_id=1, storage_id=3, run_id=2, schema_id=None, created_by_id=3, created_at=2026-01-11 16:51:32 UTC, is_locked=False); to track this artifact as an input, use: ln.Artifact.get()

Run the function:

output_artifact_key = input_artifact_key.replace(".parquet", "_subsetted.parquet")
subset_dataframe(input_artifact_key, output_artifact_key, subset_rows=1)
 no source code was yet saved, returning existing transform with same key
 writing the in-memory object into cache
 creating new artifact version for key 'my_analysis/dataset_subsetted.parquet' in storage '/home/runner/work/lamindb/lamindb/docs/test-track'

Query for the output:

subsetted_artifact = ln.Artifact.get(key=output_artifact_key)
subsetted_artifact.view_lineage()

Re-run the function with a different parameter:

subsetted_artifact = subset_dataframe(
    input_artifact_key, output_artifact_key, subset_cols=3
)
subsetted_artifact = ln.Artifact.get(key=output_artifact_key)
subsetted_artifact.view_lineage()
 no source code was yet saved, returning existing transform with same key
 writing the in-memory object into cache
 creating new artifact version for key 'my_analysis/dataset_subsetted.parquet' in storage '/home/runner/work/lamindb/lamindb/docs/test-track'

We created a new run:

subsetted_artifact.run
Run(uid='yipYco04A0jqCnE1', name=None, entrypoint='subset_dataframe', started_at=2026-01-11 16:51:49 UTC, finished_at=2026-01-11 16:51:49 UTC, params={'input_artifact_key': 'my_analysis/dataset.parquet', 'output_artifact_key': 'my_analysis/dataset_subsetted.parquet', 'subset_rows': 2, 'subset_cols': 3}, reference=None, reference_type=None, cli_args=None, branch_id=1, space_id=1, transform_id=1, report_id=None, environment_id=None, created_by_id=3, initiated_by_run_id=1, created_at=2026-01-11 16:51:49 UTC, is_locked=False)

With new parameters:

subsetted_artifact.run.params
{'input_artifact_key': 'my_analysis/dataset.parquet',
 'output_artifact_key': 'my_analysis/dataset_subsetted.parquet',
 'subset_rows': 2,
 'subset_cols': 3}

And a new version of the output artifact:

subsetted_artifact.run.output_artifacts.to_dataframe()
uid key description suffix kind otype size hash n_files n_observations version_tag is_latest is_locked created_at branch_id space_id storage_id run_id schema_id created_by_id
id
9 Rcepg67lvLkg6mvw0002 my_analysis/dataset_subsetted.parquet None .parquet dataset DataFrame 4314 L3pK_0XXK30OIkqCY2_H9w None 2 None True False 2026-01-11 16:51:49.816000+00:00 1 1 3 9 None 3

In a script

run_script_with_step.py
import argparse
import lamindb as ln


@ln.step()
def subset_dataframe(
    artifact: ln.Artifact,
    subset_rows: int = 2,
    subset_cols: int = 2,
    run: ln.Run | None = None,
) -> ln.Artifact:
    dataset = artifact.load(is_run_input=run)
    new_data = dataset.iloc[:subset_rows, :subset_cols]
    new_key = artifact.key.replace(".parquet", "_subsetted.parquet")
    return ln.Artifact.from_dataframe(new_data, key=new_key, run=run).save()


if __name__ == "__main__":
    p = argparse.ArgumentParser()
    p.add_argument("--subset", action="store_true")
    args = p.parse_args()

    params = {"is_subset": args.subset}

    ln.track(params=params)

    if args.subset:
        df = ln.examples.datasets.mini_immuno.get_dataset1(otype="DataFrame")
        artifact = ln.Artifact.from_dataframe(
            df, key="my_analysis/dataset.parquet"
        ).save()
        subsetted_artifact = subset_dataframe(artifact)

    ln.finish()

Run the script:

!python scripts/run_script_with_step.py --subset
 connected lamindb: testuser1/test-track
 script invoked with: --subset
 created Transform('RzirVK1pH8r20000', key='run_script_with_step.py'), started new Run('zEHyxFhHtRZHy0lv') at 2026-01-11 16:51:52 UTC
→ params: is_subset=True
 recommendation: to identify the script across renames, pass the uid: ln.track("RzirVK1pH8r2", params={...})
 writing the in-memory object into cache
 returning artifact with same hash: Artifact(uid='5FRVoeDL5C2VKVkF0000', version_tag=None, is_latest=True, key='my_analysis/dataset.parquet', description=None, suffix='.parquet', kind='dataset', otype='DataFrame', size=10354, hash='ug6ICnjB8oyqescoUDbYKg', n_files=None, n_observations=3, branch_id=1, space_id=1, storage_id=3, run_id=2, schema_id=None, created_by_id=3, created_at=2026-01-11 16:51:32 UTC, is_locked=False); to track this artifact as an input, use: ln.Artifact.get()
! cannot infer feature type of: None, returning '?
! skipping param run because dtype not JSON serializable
 writing the in-memory object into cache
 returning artifact with same hash: Artifact(uid='Rcepg67lvLkg6mvw0000', version_tag=None, is_latest=False, key='my_analysis/dataset_subsetted.parquet', description=None, suffix='.parquet', kind='dataset', otype='DataFrame', size=3696, hash='siPfGX_YztG7sm3oNnHRUw', n_files=None, n_observations=2, branch_id=1, space_id=1, storage_id=3, run_id=4, schema_id=None, created_by_id=3, created_at=2026-01-11 16:51:35 UTC, is_locked=False); to track this artifact as an input, use: ln.Artifact.get()

The database

See the state of the database after running the examples above:

ln.view()
Hide code cell output
Artifact
uid key description suffix kind otype size hash n_files n_observations version_tag is_latest is_locked created_at branch_id space_id storage_id run_id schema_id created_by_id
id
9 Rcepg67lvLkg6mvw0002 my_analysis/dataset_subsetted.parquet None .parquet dataset DataFrame 4314 L3pK_0XXK30OIkqCY2_H9w None 2.0 None True False 2026-01-11 16:51:49.816000+00:00 1 1 3 9 None 3
8 Rcepg67lvLkg6mvw0001 my_analysis/dataset_subsetted.parquet None .parquet dataset DataFrame 3669 jd7m1cdbwUoMxQ1uNl6Yqg None 1.0 None False False 2026-01-11 16:51:49.714000+00:00 1 1 3 8 None 3
4 Q8JCbaQk8XzaXlcq0000 my_analysis/dataset2.parquet None .parquet dataset DataFrame 7054 TKr2x4wAyx8b0h5UhlD3Ww None 3.0 None True False 2026-01-11 16:51:39.612000+00:00 1 1 3 5 None 3
3 Rcepg67lvLkg6mvw0000 my_analysis/dataset_subsetted.parquet None .parquet dataset DataFrame 3696 siPfGX_YztG7sm3oNnHRUw None 2.0 None False False 2026-01-11 16:51:35.952000+00:00 1 1 3 4 None 3
2 5FRVoeDL5C2VKVkF0000 my_analysis/dataset.parquet None .parquet dataset DataFrame 10354 ug6ICnjB8oyqescoUDbYKg None 3.0 None True False 2026-01-11 16:51:32.631000+00:00 1 1 3 2 None 3
1 ILtcisMgKrQLnL2f0000 sample.fasta None .fasta None None 11 83rEPcAoBHmYiIuyBYrFKg None NaN None True False 2026-01-11 16:51:29.815000+00:00 1 1 3 1 None 3
Feature
uid name _dtype_str unit description array_rank array_size array_shape synonyms default_value nullable coerce is_locked is_type created_at branch_id space_id created_by_id run_id type_id
id
2 qROiuCxVAC4l experiment cat[Record[gCBfZS8DCWm1fNCi]] None None 0 0 None None None True None False False 2026-01-11 16:51:44.905000+00:00 1 1 3 1 None
1 B7xtMkWGGfnx s3_folder str None None 0 0 None None None True None False False 2026-01-11 16:51:44.894000+00:00 1 1 3 1 None
JsonValue
value hash is_locked created_at branch_id space_id created_by_id run_id feature_id
id
1 s3://my-bucket/my-folder E-3iWq1AziFBjh_cbyr5ZA False 2026-01-11 16:51:47.809000+00:00 1 1 3 None 1
Project
uid name description abbr url start_date end_date is_locked is_type created_at branch_id space_id created_by_id run_id type_id
id
1 CE54TthQIfx6 My project None None None None None False False 2026-01-11 16:51:27.107000+00:00 1 1 3 None None
Record
uid name description reference reference_type extra_data is_locked is_type created_at branch_id space_id created_by_id type_id schema_id run_id
id
2 kGfcSpd98LBG8t9z Experiment1 None None None None False False 2026-01-11 16:51:44.886000+00:00 1 1 3 1.0 None 1
1 gCBfZS8DCWm1fNCi Experiment None None None None False True 2026-01-11 16:51:44.879000+00:00 1 1 3 NaN None 1
Run
uid name entrypoint started_at finished_at params reference reference_type cli_args is_locked created_at branch_id space_id transform_id report_id environment_id created_by_id initiated_by_run_id
id
11 YULKKTaQqsUcCIM4 None subset_dataframe 2026-01-11 16:51:54.043153+00:00 2026-01-11 16:51:54.063901+00:00 {'artifact': 'Artifact[5FRVoeDL5C2VKVkF0000]',... None None None False 2026-01-11 16:51:54.044000+00:00 1 1 7 NaN NaN 3 10.0
10 zEHyxFhHtRZHy0lv None None 2026-01-11 16:51:52.797743+00:00 2026-01-11 16:51:54.065817+00:00 {'is_subset': True} None None --subset False 2026-01-11 16:51:52.799000+00:00 1 1 7 10.0 5.0 3 NaN
9 yipYco04A0jqCnE1 None subset_dataframe 2026-01-11 16:51:49.791539+00:00 2026-01-11 16:51:49.825602+00:00 {'input_artifact_key': 'my_analysis/dataset.pa... None None None False 2026-01-11 16:51:49.792000+00:00 1 1 1 NaN NaN 3 1.0
8 nMy6H0v0fSYDCQ2I None subset_dataframe 2026-01-11 16:51:49.687690+00:00 2026-01-11 16:51:49.723452+00:00 {'input_artifact_key': 'my_analysis/dataset.pa... None None None False 2026-01-11 16:51:49.688000+00:00 1 1 1 NaN NaN 3 1.0
7 adBQ8OI3bySmpFYL None None 2026-01-11 16:51:47.789396+00:00 2026-01-11 16:51:49.051728+00:00 {'example_param': 42} None None --s3-folder s3://my-bucket/my-folder --experim... False 2026-01-11 16:51:47.790000+00:00 1 1 6 7.0 5.0 3 NaN
6 BYdnsLN8fbxnmqzY None None 2026-01-11 16:51:43.003616+00:00 2026-01-11 16:51:44.277816+00:00 {'input_dir': './mydataset', 'learning_rate': ... None None --input-dir ./mydataset --learning-rate 0.01 -... False 2026-01-11 16:51:43.005000+00:00 1 1 5 6.0 5.0 3 NaN
5 tcHmUzhvymiKz809 None main 2026-01-11 16:51:39.589319+00:00 2026-01-11 16:51:39.617383+00:00 {'key': 'my_analysis/dataset2.parquet'} None None --key my_analysis/dataset2.parquet False 2026-01-11 16:51:39.590000+00:00 1 1 4 NaN NaN 3 NaN
Storage
uid root description type region instance_uid is_locked created_at branch_id space_id created_by_id run_id
id
3 xN7ZLE2ZRDkY /home/runner/work/lamindb/lamindb/docs/test-track None local None 73KPGC58ahU9 False 2026-01-11 16:51:23.416000+00:00 1 1 3 None
Transform
uid key description kind source_code hash reference reference_type version_tag is_latest is_locked created_at branch_id space_id environment_id created_by_id
id
7 RzirVK1pH8r20000 run_script_with_step.py None script import argparse\nimport lamindb as ln\n\n\n@ln... HJbjZyWWczP-VmzKQsSORg None None None True False 2026-01-11 16:51:52.795000+00:00 1 1 None 3
6 6iwyj2StZOCd0000 run_track_with_features_and_params.py None script import argparse\nimport lamindb as ln\n\n\nif ... 9MjLyvM1QzE2nPIPDRzBwg None None None True False 2026-01-11 16:51:47.786000+00:00 1 1 None 3
5 tSK1x4dkrgeK0000 run_track_with_params.py None script import argparse\nimport lamindb as ln\n\nif __... 5RBz7zJICeKE1OSmg7gEdQ None None None True False 2026-01-11 16:51:43.001000+00:00 1 1 None 3
4 OJMwOrSY23iQ0000 my_workflow_with_click.py None function import click\nimport lamindb as ln\n\n\n@click... 0eX8wmaAWkuuAvACWwL1Xg None None None True False 2026-01-11 16:51:39.578000+00:00 1 1 None 3
3 VWauimLMV0g50000 my_workflow_with_step.py None function import lamindb as ln\n\n\[email protected]()\ndef subs... Ncx6UswxtCN3FZD86kgcVQ None None None True False 2026-01-11 16:51:35.899000+00:00 1 1 None 3
2 KEWEf1mBXvzJ0000 my_workflow.py None function import lamindb as ln\n\n\[email protected]()\ndef inge... uJ3fsnfaNN6EZ7Q0d8SQtw None None None True False 2026-01-11 16:51:32.593000+00:00 1 1 None 3
1 q6LXZ4wdmQUK0000 track.ipynb Manage notebooks, scripts & workflows notebook None None None None None True False 2026-01-11 16:51:28.169000+00:00 1 1 None 3

Manage notebook templates

A notebook acts as a template when you load it with lamin load. Suppose you run:

lamin load https://lamin.ai/account/instance/transform/Akd7gx7Y9oVO0000

When you run the loaded notebook, a new version is created automatically, and you can browse versions via the version dropdown in the UI.

Additionally, you can:

  • label it using ULabel or Record, e.g., transform.records.add(template_label)

  • tag it with an indicative version string, e.g., transform.version = "T1"; transform.save()

Save a notebook as an artifact

Sometimes you might want to save a notebook as an artifact rather than as a transform. Here is how:

lamin save template1.ipynb --key templates/template1.ipynb --description "Template for analysis type 1" --registry artifact

A few checks at the end of this notebook:

# "run" refers to the run object retrieved in an earlier cell
assert run.params == {
    "input_dir": "./mydataset",
    "learning_rate": 0.01,
    "preprocess_params": {"downsample": True, "normalization": "the_good_one"},
}, run.params
assert my_project.artifacts.exists()
assert my_project.transforms.exists()
assert my_project.runs.exists()