Track notebooks, scripts & functions

For tracking pipelines, see: docs:pipelines.

# pip install lamindb
!lamin init --storage ./test-track
 initialized lamindb: testuser1/test-track

Track a notebook or script

Call track() to register your notebook or script as a transform and start capturing inputs & outputs of a run.

import lamindb as ln

ln.track()  # initiate a tracked notebook/script run

# your code automatically tracks inputs & outputs

ln.finish()  # mark run as finished, save execution report, source code & environment

Here is how a notebook with a run report looks on the hub.

Explore it here.

You find your notebooks and scripts in the Transform registry (along with pipelines & functions). The Run registry stores their executions. You can use all the usual ways of querying to obtain one or several transform records, e.g.:

transform = ln.Transform.get(key="my_analyses/my_notebook.ipynb")
transform.source_code  # source code
transform.runs  # all runs
transform.latest_run.report  # report of latest run
transform.latest_run.environment  # environment of latest run

To load a notebook or script from the hub, search or filter the transform page and use the CLI.

lamin load https://lamin.ai/laminlabs/lamindata/transform/13VINnFk89PE

Organize local development

If no development directory is set, script & notebook keys equal their filenames. Otherwise, they equal the path relative to the development directory.

To set the development directory to your shell's current working directory, run:

lamin settings set dev-dir .
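The key derivation described above can be sketched as follows — a minimal illustration only; the paths are hypothetical and lamindb's actual behavior may differ in edge cases:

```python
from pathlib import Path
from typing import Optional

def transform_key(path: str, dev_dir: Optional[str]) -> str:
    # sketch of the rule above: no dev-dir means the key is just the filename,
    # otherwise it is the path relative to the development directory
    p = Path(path)
    if dev_dir is None:
        return p.name
    return p.relative_to(dev_dir).as_posix()

print(transform_key("/repo/my_analyses/my_notebook.ipynb", "/repo"))
# → my_analyses/my_notebook.ipynb
print(transform_key("/repo/my_analyses/my_notebook.ipynb", None))
# → my_notebook.ipynb
```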

You can see the current status by running:

lamin info

Sync scripts with git

To sync scripts with a git repo, either export an environment variable:

export LAMINDB_SYNC_GIT_REPO=<YOUR-GIT-REPO-URL>

Or set the following setting:

ln.settings.sync_git_repo = "<YOUR-GIT-REPO-URL>"
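You can also set the environment variable from within Python before lamindb connects — assuming, as with any environment variable, that it is set before the library reads it; the repository URL below is a placeholder:

```python
import os

# set the sync variable for the current process; the URL is a placeholder
os.environ["LAMINDB_SYNC_GIT_REPO"] = "https://github.com/your-org/your-repo"
print(os.environ["LAMINDB_SYNC_GIT_REPO"])
# → https://github.com/your-org/your-repo
```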

If you work on a single project in your lamindb instance, it makes sense to set LaminDB’s dev-dir to the root of the local git repo clone. If you work on multiple projects in your lamindb instance, you can use the dev-dir as the local root and nest git repositories in it.

Use projects

You can link the entities created during a run to a project.

import lamindb as ln

my_project = ln.Project(name="My project").save()  # create a project

ln.track(project="My project")  # auto-link entities to "My project"

ln.Artifact(
    ln.examples.datasets.file_fcs(), key="my_file.fcs"
).save()  # save an artifact
 connected lamindb: testuser1/test-track
 created Transform('NsUIB9UUK0LO0000', key='track.ipynb'), started new Run('GH1KOCjpxPWDC7G2') at 2025-11-05 21:33:13 UTC
 notebook imports: lamindb==1.15.0
 recommendation: to identify the notebook across renames, pass the uid: ln.track("NsUIB9UUK0LO", project="My project")
Artifact(uid='ZNpsjozpAihYHBUv0000', version=None, is_latest=True, key='my_file.fcs', description=None, suffix='.fcs', kind=None, otype=None, size=19330507, hash='rCPvmZB19xs4zHZ7p_-Wrg', n_files=None, n_observations=None, branch_id=1, space_id=1, storage_id=1, run_id=1, schema_id=None, created_by_id=1, created_at=2025-11-05 21:33:15 UTC, is_locked=False)

Filter entities by project, e.g., artifacts:

ln.Artifact.filter(projects=my_project).to_dataframe()
uid key description suffix kind otype size hash n_files n_observations version is_latest is_locked created_at branch_id space_id storage_id run_id schema_id created_by_id
id
1 ZNpsjozpAihYHBUv0000 my_file.fcs None .fcs None None 19330507 rCPvmZB19xs4zHZ7p_-Wrg None None None True False 2025-11-05 21:33:15.431000+00:00 1 1 1 1 None 1

Access entities linked to a project.

display(my_project.artifacts.to_dataframe())
display(my_project.transforms.to_dataframe())
display(my_project.runs.to_dataframe())
uid key description suffix kind otype size hash n_files n_observations version is_latest is_locked created_at branch_id space_id storage_id run_id schema_id created_by_id
id
1 ZNpsjozpAihYHBUv0000 my_file.fcs None .fcs None None 19330507 rCPvmZB19xs4zHZ7p_-Wrg None None None True False 2025-11-05 21:33:15.431000+00:00 1 1 1 1 None 1
uid key description type source_code hash reference reference_type version is_latest is_locked created_at branch_id space_id created_by_id _template_id
id
1 NsUIB9UUK0LO0000 track.ipynb Track notebooks, scripts & functions notebook None None None None None True False 2025-11-05 21:33:13.177000+00:00 1 1 1 None
uid name started_at finished_at params reference reference_type is_locked created_at branch_id space_id transform_id report_id _logfile_id environment_id created_by_id initiated_by_run_id
id
1 GH1KOCjpxPWDC7G2 None 2025-11-05 21:33:13.185094+00:00 None None None None False 2025-11-05 21:33:13.185000+00:00 1 1 1 None None None 1 None

Use spaces

You can write the entities created during a run into a space that you configure on LaminHub. This is particularly useful if you want to restrict access to a space. Note that this doesn't affect bionty entities, which should typically remain commonly accessible.

ln.track(space="Our team space")

Track parameters & features

In addition to tracking source code, run reports & environments, you can track run parameters & features.

Let’s look at the following script, which has a few parameters.

run_track_with_params.py
import argparse
import lamindb as ln

if __name__ == "__main__":
    p = argparse.ArgumentParser()
    p.add_argument("--input-dir", type=str)
    p.add_argument("--downsample", action="store_true")
    p.add_argument("--learning-rate", type=float)
    args = p.parse_args()
    params = {
        "input_dir": args.input_dir,
        "learning_rate": args.learning_rate,
        "preprocess_params": {
            "downsample": args.downsample,
            "normalization": "the_good_one",
        },
    }
    ln.track(params=params)

    # your code

    ln.finish()

Run the script.

!python scripts/run_track_with_params.py  --input-dir ./mydataset --learning-rate 0.01 --downsample
 connected lamindb: testuser1/test-track
 created Transform('jTSyIY1c5q580000', key='run_track_with_params.py'), started new Run('HbyLpugWPxxF8c1n') at 2025-11-05 21:33:17 UTC
→ params: input_dir='./mydataset', learning_rate=0.01, preprocess_params={'downsample': True, 'normalization': 'the_good_one'}
 recommendation: to identify the script across renames, pass the uid: ln.track("jTSyIY1c5q58", params={...})

Query for all runs that match certain parameters:

ln.Run.filter(
    params__learning_rate=0.01,
    params__preprocess_params__downsample=True,
).to_dataframe()
uid name started_at finished_at params reference reference_type is_locked created_at branch_id space_id transform_id report_id _logfile_id environment_id created_by_id initiated_by_run_id
id
2 HbyLpugWPxxF8c1n None 2025-11-05 21:33:17.903804+00:00 2025-11-05 21:33:19.027712+00:00 {'input_dir': './mydataset', 'learning_rate': ... None None False 2025-11-05 21:33:17.904000+00:00 1 1 2 3 None 2 1 None
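The double-underscore syntax in `params__preprocess_params__downsample` descends into the nested params dict, Django-style. A minimal sketch of that lookup semantics — illustration only, not lamindb's implementation, which translates such lookups into database queries:

```python
# the params dict from the script above
params = {
    "input_dir": "./mydataset",
    "learning_rate": 0.01,
    "preprocess_params": {"downsample": True, "normalization": "the_good_one"},
}

def dunder_lookup(d: dict, path: str):
    # split on "__" and descend one level per segment
    for part in path.split("__"):
        d = d[part]
    return d

print(dunder_lookup(params, "preprocess_params__downsample"))  # → True
print(dunder_lookup(params, "learning_rate"))  # → 0.01
```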

Describe & get parameters:

run = ln.Run.filter(params__learning_rate=0.01).order_by("-started_at").first()
run.describe()
run.params
Run: HbyLpug (run_track_with_params.py)
├── uid: HbyLpugWPxxF8c1n                transform: run_track_with_params.py (0000)
started_at: 2025-11-05 21:33:17 UTC  finished_at: 2025-11-05 21:33:19 UTC      
status: completed                                                              
branch: main                         space: all                                
created_at: 2025-11-05 21:33:17 UTC  created_by: testuser1                     
├── Params
│   ├── input_dir: ./mydataset
│   ├── learning_rate: 0.01
│   └── preprocess_params: {'downsample': True, 'normalization': 'the_good_one'}
├── report: xNkImBr
→ connected lamindb: testuser1/test-track
→ created Transform('jTSyIY1c5q580000', key='run_track_with_params.py'), started …
→ params: input_dir='./mydataset', learning_rate=0.01, preprocess_params={'downs …
• recommendation: to identify the script across renames, pass the uid: ln.track( …
└── environment: AjdcpwV
    aiobotocore==2.25.1
    aiohappyeyeballs==2.6.1
    aiohttp==3.13.2
    aioitertools==0.12.0
    │ …
{'input_dir': './mydataset',
 'learning_rate': 0.01,
 'preprocess_params': {'downsample': True, 'normalization': 'the_good_one'}}

You can also track run features in analogy to artifact features.

In contrast to params, features are validated against the Feature registry and allow you to express relationships with entities in your registries.

Let’s first define labels & features.

experiment_type = ln.Record(name="Experiment", is_type=True).save()
experiment_label = ln.Record(name="Experiment1", type=experiment_type).save()
ln.Feature(name="s3_folder", dtype=str).save()
ln.Feature(name="experiment", dtype=experiment_type).save()
Feature(uid='rfawEgXITG7Y', name='experiment', dtype='cat[Record[Experiment]]', is_type=None, unit=None, description=None, array_rank=0, array_size=0, array_shape=None, proxy_dtype=None, synonyms=None, branch_id=1, space_id=1, created_by_id=1, run_id=1, type_id=None, created_at=2025-11-05 21:33:19 UTC, is_locked=False)
!python scripts/run_track_with_features_and_params.py  --s3-folder s3://my-bucket/my-folder --experiment Experiment1
 connected lamindb: testuser1/test-track
 created Transform('qfXlgJu7IUvm0000', key='run_track_with_features_and_params.py'), started new Run('cMLJQuzQGVF7ujZ0') at 2025-11-05 21:33:21 UTC
→ params: example_param=42
→ features: s3_folder='s3://my-bucket/my-folder', experiment='Experiment1'
 recommendation: to identify the script across renames, pass the uid: ln.track("qfXlgJu7IUvm", params={...})
ln.Run.filter(s3_folder="s3://my-bucket/my-folder").to_dataframe()
uid name started_at finished_at params reference reference_type is_locked created_at branch_id space_id transform_id report_id _logfile_id environment_id created_by_id initiated_by_run_id
id
3 cMLJQuzQGVF7ujZ0 None 2025-11-05 21:33:21.964465+00:00 2025-11-05 21:33:23.052524+00:00 {'example_param': 42} None None False 2025-11-05 21:33:21.965000+00:00 1 1 3 4 None 2 1 None

Describe & get feature values:

run2 = ln.Run.filter(
    s3_folder="s3://my-bucket/my-folder", experiment="Experiment1"
).last()
run2.describe()
run2.features.get_values()
Run: cMLJQuz (run_track_with_features_and_params.py)
├── uid: cMLJQuzQGVF7ujZ0                transform: run_track_with_features_and_params.py (0000)
started_at: 2025-11-05 21:33:21 UTC  finished_at: 2025-11-05 21:33:23 UTC                   
status: completed                                                                           
branch: main                         space: all                                             
created_at: 2025-11-05 21:33:21 UTC  created_by: testuser1                                  
├── Params
│   └── example_param: 42
├── Features
└── experiment                      Record[Experiment]                 Experiment1                             
    s3_folder                       str                                s3://my-bucket/my-folder                
├── report: WN2A3bn
→ connected lamindb: testuser1/test-track
→ created Transform('qfXlgJu7IUvm0000', key='run_track_with_features_and_params. …
→ params: example_param=42
→ features: s3_folder='s3://my-bucket/my-folder', experiment='Experiment1'
│ …
└── environment: AjdcpwV
    aiobotocore==2.25.1
    aiohappyeyeballs==2.6.1
    aiohttp==3.13.2
    aioitertools==0.12.0
    │ …
{'experiment': 'Experiment1', 's3_folder': 's3://my-bucket/my-folder'}
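As noted above, the key difference is that features are validated against a registry while params are free-form. A toy sketch of that distinction — `feature_registry` below is a stand-in dict, not lamindb's Feature registry:

```python
# toy stand-in for a feature registry mapping feature names to expected types
feature_registry = {"s3_folder": str, "experiment": str}

def validate_features(values: dict) -> None:
    # reject unregistered names and type mismatches; params would accept anything
    for name, value in values.items():
        if name not in feature_registry:
            raise ValueError(f"feature {name!r} is not registered")
        if not isinstance(value, feature_registry[name]):
            raise TypeError(f"feature {name!r} expects {feature_registry[name].__name__}")

validate_features({"s3_folder": "s3://my-bucket/my-folder", "experiment": "Experiment1"})  # passes
try:
    validate_features({"unknown": 1})
except ValueError as e:
    print(e)  # → feature 'unknown' is not registered
```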

Track functions

If you want more fine-grained data lineage tracking, use the tracked() decorator.

@ln.tracked()
def subset_dataframe(
    input_artifact_key: str,
    output_artifact_key: str,
    subset_rows: int = 2,
    subset_cols: int = 2,
) -> None:
    artifact = ln.Artifact.get(key=input_artifact_key)
    dataset = artifact.load()
    new_data = dataset.iloc[:subset_rows, :subset_cols]
    ln.Artifact.from_dataframe(new_data, key=output_artifact_key).save()
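Conceptually, a tracking decorator wraps the function, records its bound arguments as run params, and links the run. A toy sketch of that pattern — this is not lamindb's actual implementation, and `call_log` stands in for the Run registry:

```python
import functools
import inspect

call_log = []  # toy stand-in for the Run registry

def toy_tracked(func):
    """Toy stand-in for ln.tracked(): records bound arguments per call."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = inspect.signature(func).bind(*args, **kwargs)
        bound.apply_defaults()  # fill in defaults so params are complete
        call_log.append({"transform": func.__name__, "params": dict(bound.arguments)})
        return func(*args, **kwargs)
    return wrapper

@toy_tracked
def subset(rows: int = 2, cols: int = 2) -> tuple:
    return (rows, cols)

subset(cols=3)
print(call_log[-1])
# → {'transform': 'subset', 'params': {'rows': 2, 'cols': 3}}
```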

Prepare a test dataset:

df = ln.examples.datasets.mini_immuno.get_dataset1(otype="DataFrame")
input_artifact_key = "my_analysis/dataset.parquet"
artifact = ln.Artifact.from_dataframe(df, key=input_artifact_key).save()
 writing the in-memory object into cache

Run the function with default params:

output_artifact_key = input_artifact_key.replace(".parquet", "_subsetted.parquet")
subset_dataframe(input_artifact_key, output_artifact_key)
 writing the in-memory object into cache

Query for the output:

subsetted_artifact = ln.Artifact.get(key=output_artifact_key)
subsetted_artifact.view_lineage()

This is the run that created the subsetted_artifact:

subsetted_artifact.run
Run(uid='TZ78Fuha16Et2dVF', name=None, started_at=2025-11-05 21:33:23 UTC, finished_at=2025-11-05 21:33:23 UTC, params={'input_artifact_key': 'my_analysis/dataset.parquet', 'output_artifact_key': 'my_analysis/dataset_subsetted.parquet', 'subset_rows': 2, 'subset_cols': 2}, reference=None, reference_type=None, branch_id=1, space_id=1, transform_id=4, report_id=None, environment_id=None, created_by_id=1, initiated_by_run_id=1, created_at=2025-11-05 21:33:23 UTC, is_locked=False)

This is the function that created it:

subsetted_artifact.run.transform
Transform(uid='fxXqxn3AbO8W0000', version=None, is_latest=True, key='track.ipynb/subset_dataframe.py', description=None, type='function', hash='CUqkJpolJY1Q1tqyCoWIWg', reference=None, reference_type=None, branch_id=1, space_id=1, created_by_id=1, created_at=2025-11-05 21:33:23 UTC, is_locked=False)

This is the source code of this function:

subsetted_artifact.run.transform.source_code
'@ln.tracked()\ndef subset_dataframe(\n    input_artifact_key: str,\n    output_artifact_key: str,\n    subset_rows: int = 2,\n    subset_cols: int = 2,\n) -> None:\n    artifact = ln.Artifact.get(key=input_artifact_key)\n    dataset = artifact.load()\n    new_data = dataset.iloc[:subset_rows, :subset_cols]\n    ln.Artifact.from_dataframe(new_data, key=output_artifact_key).save()\n'

These are all versions of this function:

subsetted_artifact.run.transform.versions.to_dataframe()
uid key description type source_code hash reference reference_type version is_latest is_locked created_at branch_id space_id created_by_id _template_id
id
4 fxXqxn3AbO8W0000 track.ipynb/subset_dataframe.py None function @ln.tracked()\ndef subset_dataframe(\n inpu... CUqkJpolJY1Q1tqyCoWIWg None None None True False 2025-11-05 21:33:23.658000+00:00 1 1 1 None
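As the outputs above suggest, a transform uid appears to consist of a 12-character stem plus a 4-character version counter; the stem is what the "pass the uid" recommendation refers to. A small sketch of splitting it — an observation from the outputs above, not a documented API:

```python
# split a transform uid into stem and version suffix
# (pattern observed in the outputs above, e.g. 'fxXqxn3AbO8W0000')
uid = "fxXqxn3AbO8W0000"
stem, version_suffix = uid[:12], uid[12:]
print(stem)            # → fxXqxn3AbO8W
print(version_suffix)  # → 0000
```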

This is the initiating run that triggered the function call:

subsetted_artifact.run.initiated_by_run
Run(uid='GH1KOCjpxPWDC7G2', name=None, started_at=2025-11-05 21:33:13 UTC, finished_at=None, params=None, reference=None, reference_type=None, branch_id=1, space_id=1, transform_id=1, report_id=None, environment_id=None, created_by_id=1, initiated_by_run_id=None, created_at=2025-11-05 21:33:13 UTC, is_locked=False)

This is the transform of the initiating run:

subsetted_artifact.run.initiated_by_run.transform
Transform(uid='NsUIB9UUK0LO0000', version=None, is_latest=True, key='track.ipynb', description='Track notebooks, scripts & functions', type='notebook', hash=None, reference=None, reference_type=None, branch_id=1, space_id=1, created_by_id=1, created_at=2025-11-05 21:33:13 UTC, is_locked=False)

These are the parameters of the run:

subsetted_artifact.run.params
{'input_artifact_key': 'my_analysis/dataset.parquet',
 'output_artifact_key': 'my_analysis/dataset_subsetted.parquet',
 'subset_rows': 2,
 'subset_cols': 2}

These are the input artifacts:

subsetted_artifact.run.input_artifacts.to_dataframe()
uid key description suffix kind otype size hash n_files n_observations version is_latest is_locked created_at branch_id space_id storage_id run_id schema_id created_by_id
id
5 8Evkq51yeMs59ehI0000 my_analysis/dataset.parquet None .parquet dataset DataFrame 9868 wvfEBPwHL3XHiAb-o8fU6Q None 3 None True False 2025-11-05 21:33:23.636000+00:00 1 1 1 1 None 1

These are the output artifacts:

subsetted_artifact.run.output_artifacts.to_dataframe()
uid key description suffix kind otype size hash n_files n_observations version is_latest is_locked created_at branch_id space_id storage_id run_id schema_id created_by_id
id
6 XPpcBJ7ChYBYcurQ0000 my_analysis/dataset_subsetted.parquet None .parquet dataset DataFrame 3238 UM8d9C-x_2fbc_46BScp8A None 2 None True False 2025-11-05 21:33:23.681000+00:00 1 1 1 4 None 1

Re-run the function with a different parameter:

subsetted_artifact = subset_dataframe(
    input_artifact_key, output_artifact_key, subset_cols=3
)
subsetted_artifact = ln.Artifact.get(key=output_artifact_key)
subsetted_artifact.view_lineage()
 writing the in-memory object into cache
 creating new artifact version for key 'my_analysis/dataset_subsetted.parquet' in storage '/home/runner/work/lamindb/lamindb/docs/test-track'

We created a new run:

subsetted_artifact.run
Run(uid='dPdmxb0XgHfkBR0F', name=None, started_at=2025-11-05 21:33:24 UTC, finished_at=2025-11-05 21:33:24 UTC, params={'input_artifact_key': 'my_analysis/dataset.parquet', 'output_artifact_key': 'my_analysis/dataset_subsetted.parquet', 'subset_rows': 2, 'subset_cols': 3}, reference=None, reference_type=None, branch_id=1, space_id=1, transform_id=4, report_id=None, environment_id=None, created_by_id=1, initiated_by_run_id=1, created_at=2025-11-05 21:33:24 UTC, is_locked=False)

With new parameters:

subsetted_artifact.run.params
{'input_artifact_key': 'my_analysis/dataset.parquet',
 'output_artifact_key': 'my_analysis/dataset_subsetted.parquet',
 'subset_rows': 2,
 'subset_cols': 3}

And a new version of the output artifact:

subsetted_artifact.run.output_artifacts.to_dataframe()
uid key description suffix kind otype size hash n_files n_observations version is_latest is_locked created_at branch_id space_id storage_id run_id schema_id created_by_id
id
7 XPpcBJ7ChYBYcurQ0001 my_analysis/dataset_subsetted.parquet None .parquet dataset DataFrame 3852 7WGuLVamVyBMhPb2qRE_tA None 2 None True False 2025-11-05 21:33:24.145000+00:00 1 1 1 5 None 1

See the state of the database:

ln.view()
Artifact
uid key description suffix kind otype size hash n_files n_observations version is_latest is_locked created_at branch_id space_id storage_id run_id schema_id created_by_id
id
7 XPpcBJ7ChYBYcurQ0001 my_analysis/dataset_subsetted.parquet None .parquet dataset DataFrame 3852 7WGuLVamVyBMhPb2qRE_tA None 2.0 None True False 2025-11-05 21:33:24.145000+00:00 1 1 1 5 None 1
6 XPpcBJ7ChYBYcurQ0000 my_analysis/dataset_subsetted.parquet None .parquet dataset DataFrame 3238 UM8d9C-x_2fbc_46BScp8A None 2.0 None False False 2025-11-05 21:33:23.681000+00:00 1 1 1 4 None 1
5 8Evkq51yeMs59ehI0000 my_analysis/dataset.parquet None .parquet dataset DataFrame 9868 wvfEBPwHL3XHiAb-o8fU6Q None 3.0 None True False 2025-11-05 21:33:23.636000+00:00 1 1 1 1 None 1
1 ZNpsjozpAihYHBUv0000 my_file.fcs None .fcs None None 19330507 rCPvmZB19xs4zHZ7p_-Wrg None NaN None True False 2025-11-05 21:33:15.431000+00:00 1 1 1 1 None 1
Feature
uid name dtype is_type unit description array_rank array_size array_shape proxy_dtype synonyms is_locked created_at branch_id space_id created_by_id run_id type_id
id
2 rfawEgXITG7Y experiment cat[Record[Experiment]] None None None 0 0 None None None False 2025-11-05 21:33:19.571000+00:00 1 1 1 1 None
1 oZanpf4iDBvn s3_folder str None None None 0 0 None None None False 2025-11-05 21:33:19.563000+00:00 1 1 1 1 None
FeatureValue
value hash is_locked created_at branch_id space_id created_by_id run_id feature_id
id
1 s3://my-bucket/my-folder E-3iWq1AziFBjh_cbyr5ZA False 2025-11-05 21:33:21.981000+00:00 1 1 1 None 1
Project
uid name description is_type abbr url start_date end_date is_locked created_at branch_id space_id created_by_id run_id type_id
id
1 v8wpORq3rUbx My project None False None None None None False 2025-11-05 21:33:12.288000+00:00 1 1 1 None None
Record
uid name is_type description reference reference_type is_locked created_at branch_id space_id created_by_id type_id schema_id run_id
id
2 vRnr9CiGgvYvRulP Experiment1 False None None None False 2025-11-05 21:33:19.556000+00:00 1 1 1 1.0 None 1
1 AlxqRCB6Hjq58tV4 Experiment True None None None False 2025-11-05 21:33:19.551000+00:00 1 1 1 NaN None 1
Run
uid name started_at finished_at params reference reference_type is_locked created_at branch_id space_id transform_id report_id _logfile_id environment_id created_by_id initiated_by_run_id
id
5 dPdmxb0XgHfkBR0F None 2025-11-05 21:33:24.123504+00:00 2025-11-05 21:33:24.152096+00:00 {'input_artifact_key': 'my_analysis/dataset.pa... None None False 2025-11-05 21:33:24.124000+00:00 1 1 4 NaN None NaN 1 1.0
4 TZ78Fuha16Et2dVF None 2025-11-05 21:33:23.662631+00:00 2025-11-05 21:33:23.688634+00:00 {'input_artifact_key': 'my_analysis/dataset.pa... None None False 2025-11-05 21:33:23.663000+00:00 1 1 4 NaN None NaN 1 1.0
3 cMLJQuzQGVF7ujZ0 None 2025-11-05 21:33:21.964465+00:00 2025-11-05 21:33:23.052524+00:00 {'example_param': 42} None None False 2025-11-05 21:33:21.965000+00:00 1 1 3 4.0 None 2.0 1 NaN
2 HbyLpugWPxxF8c1n None 2025-11-05 21:33:17.903804+00:00 2025-11-05 21:33:19.027712+00:00 {'input_dir': './mydataset', 'learning_rate': ... None None False 2025-11-05 21:33:17.904000+00:00 1 1 2 3.0 None 2.0 1 NaN
1 GH1KOCjpxPWDC7G2 None 2025-11-05 21:33:13.185094+00:00 NaT None None None False 2025-11-05 21:33:13.185000+00:00 1 1 1 NaN None NaN 1 NaN
Storage
uid root description type region instance_uid is_locked created_at branch_id space_id created_by_id run_id
id
1 wOBrh4mOTrHA /home/runner/work/lamindb/lamindb/docs/test-track None local None 73KPGC58ahU9 False 2025-11-05 21:33:09.257000+00:00 1 1 1 None
Transform
uid key description type source_code hash reference reference_type version is_latest is_locked created_at branch_id space_id created_by_id _template_id
id
4 fxXqxn3AbO8W0000 track.ipynb/subset_dataframe.py None function @ln.tracked()\ndef subset_dataframe(\n inpu... CUqkJpolJY1Q1tqyCoWIWg None None None True False 2025-11-05 21:33:23.658000+00:00 1 1 1 None
3 qfXlgJu7IUvm0000 run_track_with_features_and_params.py None script import argparse\nimport lamindb as ln\n\n\nif ... 9MjLyvM1QzE2nPIPDRzBwg None None None True False 2025-11-05 21:33:21.962000+00:00 1 1 1 None
2 jTSyIY1c5q580000 run_track_with_params.py None script import argparse\nimport lamindb as ln\n\nif __... 5RBz7zJICeKE1OSmg7gEdQ None None None True False 2025-11-05 21:33:17.901000+00:00 1 1 1 None
1 NsUIB9UUK0LO0000 track.ipynb Track notebooks, scripts & functions notebook None None None None None True False 2025-11-05 21:33:13.177000+00:00 1 1 1 None

In a script

run_workflow.py
import argparse
import lamindb as ln


@ln.tracked()
def subset_dataframe(
    artifact: ln.Artifact,
    subset_rows: int = 2,
    subset_cols: int = 2,
    run: ln.Run | None = None,
) -> ln.Artifact:
    dataset = artifact.load(is_run_input=run)
    new_data = dataset.iloc[:subset_rows, :subset_cols]
    new_key = artifact.key.replace(".parquet", "_subsetted.parquet")
    return ln.Artifact.from_dataframe(new_data, key=new_key, run=run).save()


if __name__ == "__main__":
    p = argparse.ArgumentParser()
    p.add_argument("--subset", action="store_true")
    args = p.parse_args()

    params = {"is_subset": args.subset}

    ln.track(params=params)

    if args.subset:
        df = ln.examples.datasets.mini_immuno.get_dataset1(otype="DataFrame")
        artifact = ln.Artifact.from_dataframe(
            df, key="my_analysis/dataset.parquet"
        ).save()
        subsetted_artifact = subset_dataframe(artifact)

    ln.finish()
!python scripts/run_workflow.py --subset
 connected lamindb: testuser1/test-track
 created Transform('Z0Yx1gPOS3xm0000', key='run_workflow.py'), started new Run('aq3c1QYU63k2q1NL') at 2025-11-05 21:33:26 UTC
→ params: is_subset=True
 recommendation: to identify the script across renames, pass the uid: ln.track("Z0Yx1gPOS3xm", params={...})
 writing the in-memory object into cache
 returning artifact with same hash: Artifact(uid='8Evkq51yeMs59ehI0000', version=None, is_latest=True, key='my_analysis/dataset.parquet', description=None, suffix='.parquet', kind='dataset', otype='DataFrame', size=9868, hash='wvfEBPwHL3XHiAb-o8fU6Q', n_files=None, n_observations=3, branch_id=1, space_id=1, storage_id=1, run_id=1, schema_id=None, created_by_id=1, created_at=2025-11-05 21:33:23 UTC, is_locked=False); to track this artifact as an input, use: ln.Artifact.get()
! cannot infer feature type of: None, returning '?
! skipping param run because dtype not JSON serializable
 writing the in-memory object into cache
 returning artifact with same hash: Artifact(uid='XPpcBJ7ChYBYcurQ0001', version=None, is_latest=True, key='my_analysis/dataset_subsetted.parquet', description=None, suffix='.parquet', kind='dataset', otype='DataFrame', size=3852, hash='7WGuLVamVyBMhPb2qRE_tA', n_files=None, n_observations=2, branch_id=1, space_id=1, storage_id=1, run_id=5, schema_id=None, created_by_id=1, created_at=2025-11-05 21:33:24 UTC, is_locked=False); to track this artifact as an input, use: ln.Artifact.get()
ln.view()
Artifact
uid key description suffix kind otype size hash n_files n_observations version is_latest is_locked created_at branch_id space_id storage_id run_id schema_id created_by_id
id
7 XPpcBJ7ChYBYcurQ0001 my_analysis/dataset_subsetted.parquet None .parquet dataset DataFrame 3852 7WGuLVamVyBMhPb2qRE_tA None 2.0 None True False 2025-11-05 21:33:24.145000+00:00 1 1 1 5 None 1
6 XPpcBJ7ChYBYcurQ0000 my_analysis/dataset_subsetted.parquet None .parquet dataset DataFrame 3238 UM8d9C-x_2fbc_46BScp8A None 2.0 None False False 2025-11-05 21:33:23.681000+00:00 1 1 1 4 None 1
5 8Evkq51yeMs59ehI0000 my_analysis/dataset.parquet None .parquet dataset DataFrame 9868 wvfEBPwHL3XHiAb-o8fU6Q None 3.0 None True False 2025-11-05 21:33:23.636000+00:00 1 1 1 1 None 1
1 ZNpsjozpAihYHBUv0000 my_file.fcs None .fcs None None 19330507 rCPvmZB19xs4zHZ7p_-Wrg None NaN None True False 2025-11-05 21:33:15.431000+00:00 1 1 1 1 None 1
Feature
uid name dtype is_type unit description array_rank array_size array_shape proxy_dtype synonyms is_locked created_at branch_id space_id created_by_id run_id type_id
id
2 rfawEgXITG7Y experiment cat[Record[Experiment]] None None None 0 0 None None None False 2025-11-05 21:33:19.571000+00:00 1 1 1 1 None
1 oZanpf4iDBvn s3_folder str None None None 0 0 None None None False 2025-11-05 21:33:19.563000+00:00 1 1 1 1 None
FeatureValue
value hash is_locked created_at branch_id space_id created_by_id run_id feature_id
id
1 s3://my-bucket/my-folder E-3iWq1AziFBjh_cbyr5ZA False 2025-11-05 21:33:21.981000+00:00 1 1 1 None 1
Project
uid name description is_type abbr url start_date end_date is_locked created_at branch_id space_id created_by_id run_id type_id
id
1 v8wpORq3rUbx My project None False None None None None False 2025-11-05 21:33:12.288000+00:00 1 1 1 None None
Record
uid name is_type description reference reference_type is_locked created_at branch_id space_id created_by_id type_id schema_id run_id
id
2 vRnr9CiGgvYvRulP Experiment1 False None None None False 2025-11-05 21:33:19.556000+00:00 1 1 1 1.0 None 1
1 AlxqRCB6Hjq58tV4 Experiment True None None None False 2025-11-05 21:33:19.551000+00:00 1 1 1 NaN None 1
Run
uid name started_at finished_at params reference reference_type is_locked created_at branch_id space_id transform_id report_id _logfile_id environment_id created_by_id initiated_by_run_id
id
7 Ex21tzOjjMkYeMhl None 2025-11-05 21:33:27.764323+00:00 2025-11-05 21:33:27.784058+00:00 {'artifact': 'Artifact[8Evkq51yeMs59ehI0000]',... None None False 2025-11-05 21:33:27.765000+00:00 1 1 6 NaN None NaN 1 6.0
6 aq3c1QYU63k2q1NL None 2025-11-05 21:33:26.742823+00:00 2025-11-05 21:33:27.785663+00:00 {'is_subset': True} None None False 2025-11-05 21:33:26.743000+00:00 1 1 5 8.0 None 2.0 1 NaN
5 dPdmxb0XgHfkBR0F None 2025-11-05 21:33:24.123504+00:00 2025-11-05 21:33:24.152096+00:00 {'input_artifact_key': 'my_analysis/dataset.pa... None None False 2025-11-05 21:33:24.124000+00:00 1 1 4 NaN None NaN 1 1.0
4 TZ78Fuha16Et2dVF None 2025-11-05 21:33:23.662631+00:00 2025-11-05 21:33:23.688634+00:00 {'input_artifact_key': 'my_analysis/dataset.pa... None None False 2025-11-05 21:33:23.663000+00:00 1 1 4 NaN None NaN 1 1.0
3 cMLJQuzQGVF7ujZ0 None 2025-11-05 21:33:21.964465+00:00 2025-11-05 21:33:23.052524+00:00 {'example_param': 42} None None False 2025-11-05 21:33:21.965000+00:00 1 1 3 4.0 None 2.0 1 NaN
2 HbyLpugWPxxF8c1n None 2025-11-05 21:33:17.903804+00:00 2025-11-05 21:33:19.027712+00:00 {'input_dir': './mydataset', 'learning_rate': ... None None False 2025-11-05 21:33:17.904000+00:00 1 1 2 3.0 None 2.0 1 NaN
1 GH1KOCjpxPWDC7G2 None 2025-11-05 21:33:13.185094+00:00 NaT None None None False 2025-11-05 21:33:13.185000+00:00 1 1 1 NaN None NaN 1 NaN
Storage
uid root description type region instance_uid is_locked created_at branch_id space_id created_by_id run_id
id
1 wOBrh4mOTrHA /home/runner/work/lamindb/lamindb/docs/test-track None local None 73KPGC58ahU9 False 2025-11-05 21:33:09.257000+00:00 1 1 1 None
Transform
uid key description type source_code hash reference reference_type version is_latest is_locked created_at branch_id space_id created_by_id _template_id
id
6 k2S5AI75Bdr40000 run_workflow.py/subset_dataframe.py None function @ln.tracked()\ndef subset_dataframe(\n arti... 9NYMDP5l5Iuu9F8VrO3vWQ None None None True False 2025-11-05 21:33:27.762000+00:00 1 1 1 None
5 Z0Yx1gPOS3xm0000 run_workflow.py None script import argparse\nimport lamindb as ln\n\n\n@ln... fwij4oyLV27mmm9f2GVY_A None None None True False 2025-11-05 21:33:26.740000+00:00 1 1 1 None
4 fxXqxn3AbO8W0000 track.ipynb/subset_dataframe.py None function @ln.tracked()\ndef subset_dataframe(\n inpu... CUqkJpolJY1Q1tqyCoWIWg None None None True False 2025-11-05 21:33:23.658000+00:00 1 1 1 None
3 qfXlgJu7IUvm0000 run_track_with_features_and_params.py None script import argparse\nimport lamindb as ln\n\n\nif ... 9MjLyvM1QzE2nPIPDRzBwg None None None True False 2025-11-05 21:33:21.962000+00:00 1 1 1 None
2 jTSyIY1c5q580000 run_track_with_params.py None script import argparse\nimport lamindb as ln\n\nif __... 5RBz7zJICeKE1OSmg7gEdQ None None None True False 2025-11-05 21:33:17.901000+00:00 1 1 1 None
1 NsUIB9UUK0LO0000 track.ipynb Track notebooks, scripts & functions notebook None None None None None True False 2025-11-05 21:33:13.177000+00:00 1 1 1 None

Manage notebook templates

A notebook acts like a template when you load it via lamin load. Suppose you run:

lamin load https://lamin.ai/account/instance/transform/Akd7gx7Y9oVO0000

Upon running the returned notebook, you'll automatically create a new version and be able to browse it via the version dropdown in the UI.

Additionally, you can:

  • label it with a Record, e.g., transform.records.add(template_label)

  • tag with an indicative version string, e.g., transform.version = "T1"; transform.save()

Save a notebook as an artifact

Sometimes you might want to save a notebook as an artifact. This is how you can do it:

lamin save template1.ipynb --key templates/template1.ipynb --description "Template for analysis type 1" --registry artifact

A few checks at the end of this notebook:

assert run.params == {
    "input_dir": "./mydataset",
    "learning_rate": 0.01,
    "preprocess_params": {"downsample": True, "normalization": "the_good_one"},
}, run.params
assert my_project.artifacts.exists()
assert my_project.transforms.exists()
assert my_project.runs.exists()