Python: lamindb

A data framework for biology.

Installation:

pip install lamindb

If you just want to read data from a LaminDB instance, use DB:

import lamindb as ln

db = ln.DB("laminlabs/cellxgene")

To write data, connect to a writable instance:

lamin login
lamin connect account/name

You can create an instance at lamin.ai and invite collaborators. If you prefer to work with a local database (no login required), run:

lamin init --storage ./quickstart-data --modules bionty

LaminDB then auto-connects upon import, and you can create and save objects like this:

import lamindb as ln
# → connected lamindb: account/instance

ln.Artifact("./my_dataset.parquet", key="datasets/my_dataset.parquet").save()

Lineage

Track inputs, outputs, parameters, and environments of notebooks, scripts, and functions.

track([transform, project, space, branch, ...])

Track a run of a notebook or script.

finish([ignore_non_consecutive])

Finish the run of a notebook or script.
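
A minimal sketch of how track() and finish() might bracket a script, assuming a connected, writable instance. The helper write_summary, the file name summary.csv, and the key datasets/summary.csv are hypothetical. lamindb is imported inside main() only so that defining the function does not trigger the auto-connect on import.

```python
import csv
from pathlib import Path


def write_summary(rows: list[dict], path: Path) -> Path:
    """Write rows to a CSV file (hypothetical stand-in for real analysis output)."""
    with path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return path


def main() -> None:
    # Imported here because importing lamindb auto-connects to the configured instance.
    import lamindb as ln

    ln.track()  # begin tracking this script or notebook as a run
    out = write_summary([{"sample": "s1", "count": 3}], Path("summary.csv"))
    # Assumption: an artifact saved between track() and finish() is linked to the run.
    ln.Artifact(out, key="datasets/summary.csv").save()
    ln.finish()  # mark the run as finished
```

Calling main() from a script or notebook cell would then execute one tracked run; nothing touches the database until it is called.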

flow([uid, global_run, track_arg_aliases])

Use @flow() to track a function as a workflow.

step([uid])

Use @step() to track a function as a step.
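
The two decorators might be combined as in the following sketch, which assumes a connected, writable instance and that a step can be called from inside a flow. The function names normalize, normalize_step, and pipeline are hypothetical, and whether list-valued arguments are recorded as parameters is not confirmed here.

```python
def normalize(values: list[float]) -> list[float]:
    """Pure logic, kept separate so it can be tested without a database."""
    total = sum(values)
    return [v / total for v in values]


def build_pipeline():
    # Imported here because importing lamindb auto-connects to the configured instance.
    import lamindb as ln

    @ln.step()  # track this function as a step
    def normalize_step(values: list[float]) -> list[float]:
        return normalize(values)

    @ln.flow()  # track this function as a workflow
    def pipeline(values: list[float]) -> list[float]:
        return normalize_step(values)

    return pipeline
```

With an instance connected, build_pipeline()([1.0, 3.0]) would run the workflow once; the decorators take optional arguments (such as uid) that this sketch leaves at their defaults.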

Artifacts

The central Artifact registry holds files, folders & arrays across any number of storage locations.

Artifact()

Datasets & models stored as files, folders, or arrays.
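
Once an artifact has been saved, it can be looked up again by its key. The sketch below assumes a connected instance in which exactly one artifact matches the key from the quickstart example; the function name fetch_dataset is hypothetical.

```python
def fetch_dataset(key: str = "datasets/my_dataset.parquet"):
    # Imported here because importing lamindb auto-connects to the configured instance.
    import lamindb as ln

    artifact = ln.Artifact.get(key=key)  # look up a single artifact by its key
    # cache() downloads the file if it is not available locally and returns a local path.
    return artifact.cache()
```

For formats LaminDB knows how to read, artifact.load() can be used instead of cache() to load the content into memory.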

All other registries link to Artifact to provide context for finding, querying, validating, and managing artifacts. Here is an overview of the core data model:

https://lamin-site-assets.s3.amazonaws.com/.lamindb/HMfWLa1rFkxcxQEN0000.svg

Transforms & runs

Data transformations and their executions.

Transform()

Data transformations such as scripts, notebooks, functions, or pipelines.

Run()

Runs of transforms such as the executions of a script.

Records, labels, features & schemas

Create labels and manage flexible records, e.g., for samples or donors.

Record()

Flexible records with sheets & markdown pages.

ULabel()

Universal labels.

Define features & schemas to validate artifacts & records.

Feature()

Measurable properties such as dataframe columns or record fields.

Schema()

Schemas of datasets such as column sets of dataframes.
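
A rough sketch of defining two features and grouping them into a schema, assuming a connected, writable instance. The feature names cell_type and n_genes, the schema name example_schema, and the helper define_schema are all hypothetical, and the exact constructor arguments may differ between LaminDB versions.

```python
# Hypothetical feature names mapped to their Python types.
FEATURE_SPECS = {"cell_type": str, "n_genes": int}


def define_schema(name: str = "example_schema"):
    # Imported here because importing lamindb auto-connects to the configured instance.
    import lamindb as ln

    # Save one Feature per expected dataframe column.
    features = [
        ln.Feature(name=feature_name, dtype=dtype).save()
        for feature_name, dtype in FEATURE_SPECS.items()
    ]
    # Assumption: Schema accepts a name and a list of saved features.
    return ln.Schema(name=name, features=features).save()
```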

Managing operations

Project()

Projects to label artifacts, transforms, records, and runs.

Storage()

Storage locations of artifacts such as local directories or S3 buckets.

User()

Users.

Branch()

Branches for change management with archive and trash states.

Space()

Spaces with managed access for specific users or teams.

Collection()

Versioned collections of artifacts.

Reference()

References such as internal studies, papers, documents, or URLs.

Basic utilities

Connecting, viewing database content, accessing settings & run context.

DB(instance)

Query any registry of any instance.
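
As an illustration of read-only querying, the sketch below opens the public instance named in the quickstart and filters its Artifact registry. It assumes that registries are exposed as attributes of the DB object and support filter(); the helper find_artifacts and the choice of the suffix field are hypothetical.

```python
def find_artifacts(instance: str = "laminlabs/cellxgene", suffix: str = ".h5ad"):
    # Imported here because importing lamindb may auto-connect to a configured instance.
    import lamindb as ln

    db = ln.DB(instance)  # database object for querying this instance
    # Assumption: db.Artifact supports the same filter() interface as ln.Artifact.
    return db.Artifact.filter(suffix=suffix)
```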

connect([instance])

Connect the default database.

view(*[, limit, modules, registries, df])

View metadata.

save(records[, ignore_conflicts, batch_size])

Bulk save records.

UPath(*args[, protocol, chain_parser])

Path-like access to files.

settings

Global live settings (Settings).

context

Global run context (Context).

Curators and integrations

curators

Curators.

integrations

Integrations.

Examples, errors & setup

examples

Examples.

errors

Errors.

setup

Setup & configure LaminDB.

Developer API

base

Base library.

core

Core library.

models

Auxiliary models & database library.