Track notebooks & scripts

Call ln.track() to track code along with its inputs and outputs.

Note

Tracking notebooks & scripts is analogous to tracking data transformations through pipelines, functions & the UI; see Project flow.

# !pip install 'lamindb[jupyter]'
!lamin init --storage ./test-track
💡 connected lamindb: testuser1/test-track

Initiate tracking

The first time you call ln.track(), it raises an exception and generates a stem_uid & version that identify the notebook or script in your database.

When you call it a second time, ln.track() saves transform and run records in their registries. The Transform registry allows you to find your notebooks, scripts and pipelines. The Run registry allows you to find their runs.

import lamindb as ln

ln.settings.transform.stem_uid = "9priar0hoE5u"  # <-- auto-generated by ln.track()
ln.settings.transform.version = "1"  # <-- auto-generated by ln.track()
ln.track()
💡 connected lamindb: testuser1/test-track
💡 notebook imports: lamindb==0.74.3
💡 saved: Transform(uid='9priar0hoE5u5zKv', version='1', name='Track notebooks & scripts', key='track', type='notebook', created_by_id=1, updated_at='2024-07-26 14:36:39 UTC')
💡 saved: Run(uid='ASw9w1EIkVDxKupQHTuo', transform_id=1, created_by_id=1)
Run(uid='ASw9w1EIkVDxKupQHTuo', started_at='2024-07-26 14:36:39 UTC', is_consecutive=True, transform_id=1, created_by_id=1)
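
In the log output above, the saved transform's uid (9priar0hoE5u5zKv) begins with the stem_uid (9priar0hoE5u) and continues with four more characters. The snippet below is a minimal sketch of that relationship in plain Python. It assumes the 12-character stem length seen in this one example holds in general, and `split_transform_uid` is a hypothetical helper for illustration, not part of lamindb.

```python
# Values copied from the ln.track() log output above.
stem_uid = "9priar0hoE5u"
transform_uid = "9priar0hoE5u5zKv"


def split_transform_uid(uid: str, stem_length: int = 12) -> tuple[str, str]:
    """Split a transform uid into (stem, suffix).

    Hypothetical helper: the default stem length of 12 is read off the
    example log, not taken from the lamindb documentation.
    """
    return uid[:stem_length], uid[stem_length:]


stem, suffix = split_transform_uid(transform_uid)

# The stem recovered from the full uid matches the configured stem_uid.
assert stem == stem_uid
print(f"stem={stem} suffix={suffix}")
```

Under this assumption, the stem is what stays constant for a given notebook or script, while the trailing characters can differ between saved transform records.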

LaminDB now automatically tracks all input and output data.

Save run reports and source code

If you want to save the run report & source code for a notebook or script, call:

ln.finish()

This is how a notebook with run report looks on the hub:

Query for a notebook or script

In the API, filter Transform to obtain a transform record:


transform = ln.Transform.filter(name="Track notebooks & scripts").one()
transform.source_code  # source code
transform.latest_run.report  # report of latest run
transform.runs  # all runs

On the hub, use the search or filter in the Transform view.

Sync scripts with GitHub

To sync a script with its git commit, add the following line to the script:

ln.settings.sync_git_repo = "<YOUR-GIT-REPO-URL>"

A tracked Python script could look like this:

# my_script.py

import lamindb as ln

# initiate tracking
ln.settings.sync_git_repo = "https://github.com/..."
ln.settings.transform.stem_uid = "9priar0hoE5u"
ln.settings.transform.version = "1"
run = ln.track()

# load input artifacts
artifact = ln.Artifact.filter(...).one()
data = artifact.load()

output_data = ...

# save output artifacts
output_artifact = ln.Artifact(output_data, ...).save()

# save the script as transform.source_code
ln.finish()

You’ll now see a clickable GitHub emoji on the hub:


# clean up test instance
!lamin delete --force test-track
💡 deleting instance testuser1/test-track