Track notebooks & scripts

In addition to tracking Python scripts, LaminDB tracks interactive analyses performed in notebooks.

By calling track() in a notebook or script, input and output data are automatically registered and associated with the run.

Note

Provenance tracking of notebooks & scripts is analogous to tracking pipelines, scripts & UI data manipulation; see Project flow.

Setup

Install the lamindb Python package:

pip install 'lamindb[jupyter]'
!lamin init --storage ./test-track
💡 connected lamindb: testuser1/test-track
import lamindb as ln

ln.settings.verbosity = "hint"
💡 connected lamindb: testuser1/test-track

Initiate tracking

Call track() to auto-generate the IDs used to track data lineage. Copy them into your notebook cell, above the call to track():

ln.settings.transform.stem_uid = "9priar0hoE5u"
ln.settings.transform.version = "1"
ln.track()
💡 notebook imports: lamindb==0.74.1
💡 saved: Transform(uid='9priar0hoE5u5zKv', version='1', name='Track notebooks & scripts', key='track', type='notebook', created_by_id=1, updated_at='2024-07-06 13:06:23 UTC')
💡 saved: Run(uid='aCfUcGDU4pGjwBIJCrCG', transform_id=1, created_by_id=1)
💡 tracked pip freeze > /home/runner/.cache/lamindb/run_env_pip_aCfUcGDU4pGjwBIJCrCG.txt
Run(uid='aCfUcGDU4pGjwBIJCrCG', started_at='2024-07-06 13:06:23 UTC', is_consecutive=True, transform_id=1, created_by_id=1)
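
The logged output above shows how the IDs relate: the transform uid `9priar0hoE5u5zKv` starts with the 12-character `stem_uid` `9priar0hoE5u`, followed by a 4-character suffix. The following is a minimal sketch in plain Python based only on the values in this log. The helper `split_transform_uid` is hypothetical and not part of lamindb, and what the suffix encodes is not stated here.

```python
def split_transform_uid(uid: str, stem_uid: str) -> tuple[str, str]:
    """Split a transform uid into its stem and the remaining suffix.

    Hypothetical helper for illustration; not part of lamindb.
    """
    if not uid.startswith(stem_uid):
        raise ValueError(f"uid {uid!r} does not start with stem_uid {stem_uid!r}")
    return stem_uid, uid[len(stem_uid):]


# Values copied from the log output above
stem, suffix = split_transform_uid("9priar0hoE5u5zKv", "9priar0hoE5u")
print(stem, suffix)
```

A check like this can help confirm that the `stem_uid` pasted into a cell matches the transform that was actually saved.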

LaminDB now automatically tracks all input and output data.

Save run reports and source artifact

If you want to save a notebook including its run report & source artifact, run:

ln.finish()

See how a transform with execution reports looks in LaminHub:

Query for a notebook or script

In the API, filter the Transform registry to obtain a transform record:

import lamindb as ln

transform = ln.Transform.filter(name="Track notebooks & scripts").one()
# The notebook is linked to its source code (stripped of its output cells)
# and to its execution report (including the notebook's output cells)
transform.source_code
transform.latest_report

On LaminHub, use the search or filter in the Transform view.

Sync script transforms with GitHub

To sync with your git commit, add the following line to your script:

ln.settings.sync_git_repo = <YOUR-GIT-REPO-URL>
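
Rather than hard-coding the URL, one option is to read it from the local git configuration. The sketch below is an illustration under assumptions, not part of lamindb: the helper `get_origin_url` is hypothetical, it assumes the remote is named `origin`, and it relies only on the standard `git config --get remote.origin.url` command.

```python
import subprocess
from typing import Optional


def get_origin_url(repo_dir: str = ".") -> Optional[str]:
    """Return the URL of the 'origin' remote, or None if it cannot be read.

    Hypothetical helper for illustration; not part of lamindb.
    """
    try:
        result = subprocess.run(
            ["git", "-C", repo_dir, "config", "--get", "remote.origin.url"],
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:
        # git is not installed on this machine
        return None
    url = result.stdout.strip()
    # A non-zero exit code means no repository or no 'origin' remote
    return url if result.returncode == 0 and url else None


repo_url = get_origin_url()
print(repo_url)
```

If a URL is returned, it could be assigned to `ln.settings.sync_git_repo` in place of a hard-coded string. Whether a given URL form (for example, an SSH remote) is accepted is not covered here, so treat that as something to verify.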

A tracked Python script typically looks like this:

# my_script.py
import lamindb as ln

# initiate tracking
ln.settings.transform.stem_uid = "9priar0hoE5u"
ln.settings.transform.version = "1"
ln.settings.sync_git_repo = "https://github.com/..."
run = ln.track()
# you may tag your transform so that it's easier to find
ulabel = ln.ULabel.filter(name="guide").one()
run.transform.ulabels.add(ulabel)

# load input artifacts
artifact = ln.Artifact.filter(...).one()
input_data = artifact.load()
# <YOUR ANALYSIS CODE HERE>
output_data = ...

# save output artifacts
output_artifact = ln.Artifact(output_data, ...).save()

# save the script as transform.source_code
ln.finish()

See how a tracked and git-synced script looks in LaminHub:

# clean up test instance
!lamin delete --force test-track
!rm -r test-track
💡 deleting instance testuser1/test-track
rm: cannot remove 'test-track': No such file or directory