lamindb.Transform

class lamindb.Transform(name: str, key: str | None = None, type: TransformType | None = None, revises: Transform | None = None)

Bases: SQLRecord, IsVersioned

Data transformations such as scripts, notebooks, functions, or pipelines.

A “transform” can refer to a Python function, a script, a notebook, or a pipeline. If you execute a transform, you generate a run (Run). A run has inputs and outputs.

A pipeline is typically created with a workflow tool (Nextflow, Snakemake, Prefect, Flyte, MetaFlow, redun, Airflow, …) and stored in a versioned repository.

Transforms are versioned so that a given transform version maps to a given source code version.

Can I sync transforms to git?

If you switch on sync_git_repo, a script-like transform is synced to its hashed state in a git repository upon calling ln.track().

>>> ln.settings.sync_git_repo = "https://github.com/laminlabs/lamindb"
>>> ln.track()

The definition of transforms and runs is consistent with the OpenLineage specification, where a Transform record would be called a “job” and a Run record a “run”.
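
To make this correspondence concrete, the following sketch builds a dictionary shaped like an OpenLineage run event from plain values. It uses neither lamindb nor an OpenLineage client. The helper `to_openlineage_event`, the namespace, the key, and the dataset names are hypothetical, and only a subset of the event fields is shown.

```python
from datetime import datetime, timezone
import uuid


def to_openlineage_event(transform_key, run_id, inputs, outputs, namespace="my-instance"):
    """Build a dict shaped like an OpenLineage run event (hypothetical helper)."""
    return {
        "eventType": "COMPLETE",
        "eventTime": datetime.now(timezone.utc).isoformat(),
        # A Transform record plays the role of the OpenLineage "job".
        "job": {"namespace": namespace, "name": transform_key},
        # A Run record plays the role of the OpenLineage "run".
        "run": {"runId": run_id},
        "inputs": [{"namespace": namespace, "name": name} for name in inputs],
        "outputs": [{"namespace": namespace, "name": name} for name in outputs],
    }


# Made-up values for illustration only.
event = to_openlineage_event(
    transform_key="pipelines/cell-ranger",
    run_id=str(uuid.uuid4()),
    inputs=["raw/sample1.fastq.gz"],
    outputs=["counts/sample1.h5"],
)
```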

Parameters:
  • name (str) – A name or title.

  • key (str | None, default: None) – A short name or path-like semantic key.

  • type (TransformType | None, default: "pipeline") – See TransformType.

  • revises (Transform | None, default: None) – An old version of the transform.

See also

track()

Globally track a script or notebook run.

Run

Executions of transforms.

Examples

Create a transform for a pipeline:

>>> transform = ln.Transform(key="Cell Ranger", version="7.2.0", type="pipeline").save()

Create a transform from a notebook:

>>> ln.track()

View predecessors of a transform:

>>> transform.view_lineage()

Attributes

property latest_run: Run

The latest run of this transform.

property name: str

Name of the transform.

Splits key on / and returns the last element.
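
The splitting rule can be sketched in plain Python. The helper `name_from_key` and the example keys are hypothetical and only mirror the description above; this is not lamindb's implementation.

```python
def name_from_key(key: str) -> str:
    """Return the last "/"-separated element of a key (hypothetical helper)."""
    return key.split("/")[-1]


nested = name_from_key("pipelines/rnaseq/align.py")  # "align.py"
flat = name_from_key("Cell Ranger")  # "Cell Ranger", since the key contains no "/"
```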

property stem_uid: str

Universal id characterizing the version family.

The full uid of a record is obtained via concatenating the stem uid and version information:

stem_uid = random_base62(n_char)  # a random base62 sequence of length 12 (transform) or 16 (artifact, collection)
version_uid = "0000"  # an auto-incrementing 4-digit base62 number
uid = f"{stem_uid}{version_uid}"  # concatenate the stem_uid & version_uid
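
A runnable sketch of the same scheme, assuming only the lengths stated above (a 12-character stem for a transform and a 4-character version part). `random_base62` is named in the snippet above but re-implemented here for illustration, and `split_uid` is a hypothetical helper.

```python
import secrets
import string

BASE62 = string.ascii_letters + string.digits  # 62 characters in total


def random_base62(n_char: int) -> str:
    """Return a random base62 string of length n_char (illustrative re-implementation)."""
    return "".join(secrets.choice(BASE62) for _ in range(n_char))


def split_uid(uid: str) -> tuple:
    """Split a full uid into (stem_uid, version_uid); the version part is the last 4 characters."""
    return uid[:-4], uid[-4:]


stem_uid = random_base62(12)  # 12 characters for a transform
uid = f"{stem_uid}0000"  # first version of the family
stem, version = split_uid(uid)
```

Under this scheme, two records belong to the same version family exactly when their uids share the same stem.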

property versions: QuerySet

Lists all records of the same version family.

>>> new_artifact = ln.Artifact(df2, revises=artifact).save()
>>> new_artifact.versions.df()

Simple fields

uid: str

Universal id.

key: str | None

A name or “/”-separated path-like string.

All transforms with the same key are part of the same version family.

description: str | None

A description.

type: TransformType

TransformType (default "pipeline").

source_code: str | None

Source code of the transform.

hash: str | None

Hash of the source code.

reference: str | None

Reference for the transform, e.g., a URL.

reference_type: str | None

Reference type of the transform, e.g., ‘url’.

created_at: datetime

Time of creation of record.

updated_at: datetime

Time of last update to record.

version: str | None

Version (default None).

Defines version of a family of records characterized by the same stem_uid.

Consider using semantic versioning with Python versioning.

is_latest: bool

Boolean flag that indicates whether a record is the latest in its version family.
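
The flag can be pictured with a toy model in plain Python. The uids and creation order below are made up, and the rule "the most recently created record of a family is the latest" is an assumption of this sketch, not a statement about how lamindb maintains the field.

```python
from collections import defaultdict

# Toy records: (uid, created_order). The last 4 characters of a uid are the version part.
records = [
    ("Vk3mPq8sZx1L0000", 1),
    ("Vk3mPq8sZx1L0001", 2),
    ("a9Bc2dE4fG6h0000", 3),
]

# Group records into version families by their stem uid.
families = defaultdict(list)
for uid, created_order in records:
    families[uid[:-4]].append((created_order, uid))

# In this sketch, the most recently created record of each family is the latest one.
latest = {max(members)[1] for members in families.values()}
is_latest = {uid: uid in latest for uid, _ in records}
```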

Relational fields

branch: Branch

Whether record is on a branch or in another “special state”.

space: Space

The space in which the record lives.

created_by: User

Creator of record.

ulabels: ULabel

ULabel annotations of this transform.

predecessors: Transform

Preceding transforms.

Allows you to manually define predecessors. This is typically not necessary because data lineage is automatically tracked via runs whenever an artifact or collection serves as an input for a run.

runs: Run

Runs of this transform.

successors: Transform

Subsequent transforms.

See predecessors.

references: Reference

Linked references.

projects: Project

Linked projects.

Class methods

classmethod df(include=None, features=False, limit=100)

Convert to pd.DataFrame.

By default, shows all direct fields, except updated_at.

Use the arguments include or features to include other data.

Parameters:
  • include (str | list[str] | None, default: None) – Related fields to include as columns. Takes strings of form "ulabels__name", "cell_types__name", etc. or a list of such strings.

  • features (bool | list[str], default: False) – If a list of feature names, filters Feature down to these features. If True, prints all features with dtypes in the core schema module. If "queryset", infers the features used within the set of artifacts or records. Only available for Artifact and Record.

  • limit (int, default: 100) – Maximum number of rows to display from a Pandas DataFrame. Defaults to 100 to reduce database load.

Return type:

DataFrame

Examples

Include the name of the creator in the DataFrame:

>>> ln.ULabel.df(include="created_by__name")

Include display of features for Artifact:

>>> df = ln.Artifact.df(features=True)
>>> ln.view(df)  # visualize with type annotations

Only include select features:

>>> df = ln.Artifact.df(features=["cell_type_by_expert", "cell_type_by_model"])

classmethod filter(*queries, **expressions)

Query records.

Parameters:
  • queries – One or multiple Q objects.

  • expressions – Fields and values passed as Django query expressions.

Return type:

QuerySet

Returns:

A QuerySet.

Examples

>>> ln.ULabel(name="my label").save()
>>> ln.ULabel.filter(name__startswith="my").df()
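
For readers unfamiliar with Django query expressions, the sketch below shows how a keyword such as `name__startswith` splits into a field name and a lookup. It is a toy matcher over plain dictionaries that supports only three lookups; the function `matches` is hypothetical and no database is queried.

```python
def matches(record: dict, **expressions) -> bool:
    """Check a dict against Django-style `field__lookup=value` expressions (toy version)."""
    for expression, expected in expressions.items():
        # "name__startswith" -> field "name", lookup "startswith"; "name" -> lookup "".
        field, _, lookup = expression.partition("__")
        value = record[field]
        if lookup in ("", "exact"):
            ok = value == expected
        elif lookup == "startswith":
            ok = value.startswith(expected)
        elif lookup == "contains":
            ok = expected in value
        else:
            raise ValueError(f"unsupported lookup: {lookup}")
        if not ok:
            return False  # all expressions must hold (logical AND)
    return True


labels = [{"name": "my label"}, {"name": "other label"}]
selected = [r for r in labels if matches(r, name__startswith="my")]
```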

classmethod get(idlike=None, **expressions)

Get a single record.

Parameters:
  • idlike (int | str | None, default: None) – Either a uid stub, uid or an integer id.

  • expressions – Fields and values passed as Django query expressions.

Raises:

lamindb.errors.DoesNotExist – In case no matching record is found.

Return type:

SQLRecord

Examples

>>> ulabel = ln.ULabel.get("FvtpPJLJ")
>>> ulabel = ln.ULabel.get(name="my-label")
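
The three kinds of `idlike` values can be illustrated with a toy resolver over a list of dictionaries. This sketch assumes that a uid stub is a prefix of the full uid. `get_record`, the local `DoesNotExist` class, and the second record are hypothetical stand-ins, and raising `ValueError` on multiple matches is a choice of the sketch.

```python
class DoesNotExist(Exception):
    """Stand-in for lamindb.errors.DoesNotExist in this sketch."""


def get_record(records, idlike=None, **expressions):
    """Return the single matching record from a list of dicts (toy version)."""
    if isinstance(idlike, int):
        found = [r for r in records if r["id"] == idlike]
    elif isinstance(idlike, str):
        # Assumption: a uid stub is a prefix of the full uid.
        found = [r for r in records if r["uid"].startswith(idlike)]
    else:
        found = [r for r in records if all(r[k] == v for k, v in expressions.items())]
    if not found:
        raise DoesNotExist(f"no record matches {idlike!r} {expressions}")
    if len(found) > 1:
        raise ValueError("more than one record matches")
    return found[0]


# Made-up records for illustration.
records = [
    {"id": 1, "uid": "FvtpPJLJ", "name": "my-label"},
    {"id": 2, "uid": "Qx7RtuvW", "name": "other"},
]
by_stub = get_record(records, "Fvtp")
by_id = get_record(records, 2)
by_name = get_record(records, name="my-label")
```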

classmethod lookup(field=None, return_field=None)

Return an auto-complete object for a field.

Parameters:
  • field (str | DeferredAttribute | None, default: None) – The field to look up the values for. Defaults to first string field.

  • return_field (str | DeferredAttribute | None, default: None) – The field to return. If None, returns the whole record.

  • keep – When multiple records are found for a lookup, how to return the records. - "first": return the first record. - "last": return the last record. - False: return all records.

Return type:

NamedTuple

Returns:

A NamedTuple of lookup information of the field values with a dictionary converter.

See also

search()

Examples

>>> import bionty as bt
>>> bt.settings.organism = "human"
>>> bt.Gene.from_source(symbol="ADGB-DT").save()
>>> lookup = bt.Gene.lookup()
>>> lookup.adgb_dt
>>> lookup_dict = lookup.dict()
>>> lookup_dict['ADGB-DT']
>>> lookup_by_ensembl_id = bt.Gene.lookup(field="ensembl_gene_id")
>>> lookup_by_ensembl_id.ensg00000002745
>>> lookup_return_symbols = bt.Gene.lookup(field="ensembl_gene_id", return_field="symbol")
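
The idea of a NamedTuple with a dictionary converter can be sketched with the standard library alone. The helper `build_lookup` is hypothetical, and the name-sanitizing rule (lowercase, non-word characters replaced by underscores) is an assumption chosen to reproduce `lookup.adgb_dt` from the examples above. In this toy version the attributes return plain strings rather than records.

```python
import re
from collections import namedtuple


def build_lookup(values):
    """Return (lookup, lookup_dict) for a list of strings (hypothetical helper)."""
    # Turn each value into a lowercase identifier: "ADGB-DT" -> "adgb_dt".
    attr_names = [re.sub(r"\W", "_", value).lower() for value in values]
    Lookup = namedtuple("Lookup", attr_names)
    lookup = Lookup(*values)
    # The dictionary keeps the original, unmodified values as keys.
    lookup_dict = dict(zip(values, lookup))
    return lookup, lookup_dict


lookup, lookup_dict = build_lookup(["ADGB-DT", "TP53"])
by_attribute = lookup.adgb_dt  # "ADGB-DT"
by_key = lookup_dict["ADGB-DT"]  # "ADGB-DT"
```

Attribute access works well with auto-completion in an interactive session, while the dictionary form accepts values that are not valid Python identifiers.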

classmethod search(string, *, field=None, limit=20, case_sensitive=False)

Search.

Parameters:
  • string (str) – The input string to match against the field ontology values.

  • field (str | DeferredAttribute | None, default: None) – The field or fields to search. Search all string fields by default.

  • limit (int | None, default: 20) – Maximum number of top results to return.

  • case_sensitive (bool, default: False) – Whether the match is case sensitive.

Return type:

QuerySet

Returns:

A QuerySet of search results, sorted by match score.

See also

filter()

lookup()

Examples

>>> ulabels = ln.ULabel.from_values(["ULabel1", "ULabel2", "ULabel3"], field="name")
>>> ln.save(ulabels)
>>> ln.ULabel.search("ULabel2")

classmethod using(instance)

Use a non-default LaminDB instance.

Parameters:

instance (str | None) – An instance identifier of form “account_handle/instance_name”.

Return type:

QuerySet

Examples

>>> ln.ULabel.using("account_handle/instance_name").search("ULabel7", field="name")
            uid    score
name
ULabel7  g7Hk9b2v  100.0
ULabel5  t4Jm6s0q   75.0
ULabel6  r2Xw8p1z   75.0

Methods

delete()

Delete.

Return type:

None

save(*args, **kwargs)

Save.

Always saves to the default database.

Return type:

TypeVar(T, bound=SQLRecord)

view_lineage(with_successors=False, distance=5)

View lineage of transforms.

Note that this only accounts for manually defined predecessors and successors.

Auto-generated lineage through inputs and outputs of runs is not included.
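
A walk over manually defined predecessor links can be sketched as a breadth-first traversal. This sketch assumes that `distance` means the maximum number of hops from the starting transform, which the text above does not state. The function `collect_predecessors` and the transform keys are hypothetical, and nothing is rendered.

```python
from collections import deque


def collect_predecessors(predecessors: dict, start: str, distance: int = 5) -> list:
    """Breadth-first walk over predecessor links, following at most `distance` hops."""
    seen, order = {start}, []
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == distance:
            continue  # do not expand nodes that are already `distance` hops away
        for parent in predecessors.get(node, []):
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                queue.append((parent, hops + 1))
    return order


# Hypothetical transform keys; each key maps to its manually defined predecessors.
links = {"plot.py": ["analyze.py"], "analyze.py": ["ingest.py"], "ingest.py": []}
full = collect_predecessors(links, "plot.py")  # ["analyze.py", "ingest.py"]
nearest = collect_predecessors(links, "plot.py", distance=1)  # ["analyze.py"]
```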