lamindb.Transform

class lamindb.Transform(name: str, key: str | None = None, type: Literal['pipeline', 'notebook', 'upload', 'script', 'function', 'glue'] | None = None, revises: Transform | None = None)

Bases: Record, IsVersioned

Data transformations.

A “transform” can refer to a Python function, a script, a notebook, or a pipeline. If you execute a transform, you generate a run (Run). A run has inputs and outputs.

A pipeline is typically created with a workflow tool (Nextflow, Snakemake, Prefect, Flyte, MetaFlow, redun, Airflow, …) and stored in a versioned repository.

Transforms are versioned so that a given transform version maps to a given source code version.

Can I sync transforms to git?

If you switch on sync_git_repo, a script-like transform is synced to its hashed state in a git repository upon calling ln.track().

>>> ln.settings.sync_git_repo = "https://github.com/laminlabs/lamindb"
>>> ln.track()

The definition of transforms and runs is consistent with the OpenLineage specification, where a Transform record would be called a “job” and a Run record a “run”.
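
The correspondence with OpenLineage can be illustrated with a small sketch that builds a dictionary shaped like an OpenLineage run event. This is not a lamindb feature: the helper to_openlineage_event, the namespace value, and the file names are hypothetical, and only a subset of the event fields is shown.

```python
import json
import uuid
from datetime import datetime, timezone


def to_openlineage_event(transform_name, run_id, inputs, outputs,
                         namespace="my-lamindb-instance"):
    """Build a dict shaped like an OpenLineage run event (illustrative subset)."""
    return {
        "eventType": "COMPLETE",
        "eventTime": datetime.now(timezone.utc).isoformat(),
        # A Transform record plays the role of an OpenLineage "job".
        "job": {"namespace": namespace, "name": transform_name},
        # A Run record plays the role of an OpenLineage "run".
        "run": {"runId": run_id},
        "inputs": [{"namespace": namespace, "name": name} for name in inputs],
        "outputs": [{"namespace": namespace, "name": name} for name in outputs],
    }


event = to_openlineage_event(
    transform_name="Cell Ranger",
    run_id=str(uuid.uuid4()),
    inputs=["raw.fastq.gz"],       # hypothetical input artifact
    outputs=["counts.h5ad"],       # hypothetical output artifact
)
print(json.dumps(event, indent=2))
```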

Parameters:
  • name (str) – A name or title.

  • key (str | None, default: None) – A short name or path-like semantic key.

  • type (TransformType | None, default: "pipeline") – See TransformType.

  • revises (Transform | None, default: None) – An old version of the transform.

See also

track()

Globally track a script, notebook or pipeline run.

Run

Executions of transforms.

Examples

Create a transform for a pipeline:

>>> transform = ln.Transform(name="Cell Ranger", version="7.2.0", type="pipeline").save()

Create a transform from a notebook:

>>> ln.track()

View predecessors of a transform:

>>> transform.view_lineage()

Attributes

property latest_run: Run

The latest run of this transform.

property stem_uid: str

Universal id characterizing the version family.

The full uid of a record is obtained via concatenating the stem uid and version information:

stem_uid = random_base62(n_char)  # a random base62 sequence of length 12 (transform) or 16 (artifact, collection)
version_uid = "0000"  # an auto-incrementing 4-digit base62 number
uid = f"{stem_uid}{version_uid}"  # concatenate the stem_uid & version_uid
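
The scheme above can be made runnable with the standard library. This is a minimal sketch, not lamindb's implementation: the alphabet order (digits, lowercase, uppercase) and the helper increment_base62 are assumptions made for illustration.

```python
import secrets
import string

# Assumed alphabet order: digits, then lowercase, then uppercase (62 characters).
BASE62 = string.digits + string.ascii_letters


def random_base62(n_char):
    """Return a random base62 string of length n_char."""
    return "".join(secrets.choice(BASE62) for _ in range(n_char))


def increment_base62(version_uid):
    """Return the next fixed-width base62 number after version_uid."""
    value = 0
    for char in version_uid:
        value = value * 62 + BASE62.index(char)
    value += 1
    digits = []
    for _ in range(len(version_uid)):
        value, remainder = divmod(value, 62)
        digits.append(BASE62[remainder])
    if value:
        raise OverflowError("version_uid exceeds its fixed width")
    return "".join(reversed(digits))


stem_uid = random_base62(12)      # 12 characters for a transform
version_uid = "0000"              # first version in the family
uid = f"{stem_uid}{version_uid}"  # 16 characters in total

next_uid = f"{stem_uid}{increment_base62(version_uid)}"  # same family, next version
```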

property versions: QuerySet

Lists all records of the same version family.

>>> new_artifact = ln.Artifact(df2, revises=artifact).save()
>>> new_artifact.versions

Simple fields

uid: str

Universal id.

name: str | None

A name or title. For instance, a pipeline name, notebook title, etc.

key: str | None

A key for concise reference & versioning (optional).

description: str | None

A description (optional).

type: Literal['pipeline', 'notebook', 'upload', 'script', 'function', 'glue']

TransformType (default "pipeline").

source_code: str | None

Source code of the transform.

Changed in version 0.75: The source_code field is no longer an artifact, but a text field.

hash: str | None

A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.

reference: str | None

Reference for the transform, e.g., a URL.

reference_type: str | None

A wrapper for a deferred-loading field. When the value is read from this object the first time, the query is executed.

created_at: datetime

Time of creation of record.

updated_at: datetime

Time of last update to record.

version: str | None

Version (default None).

Defines version of a family of records characterized by the same stem_uid.

Consider using semantic versioning with Python versioning.

is_latest: bool

Boolean flag that indicates whether a record is the latest in its version family.
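
How stem_uid, version, and is_latest relate within a version family can be sketched with plain Python objects. The class ToyRecord and the function revise are hypothetical stand-ins, and the stem_uid value is made up; the sketch only mirrors the behavior described above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyRecord:
    name: str
    stem_uid: str
    version: Optional[str] = None
    is_latest: bool = True


def revise(family: List[ToyRecord], name: str, version: Optional[str] = None) -> ToyRecord:
    """Append a new version to the family and mark it as the only latest record."""
    previous = family[-1]
    previous.is_latest = False
    # The new record shares the stem_uid of the record it revises.
    new = ToyRecord(name=name, stem_uid=previous.stem_uid, version=version)
    family.append(new)
    return new


v1 = ToyRecord(name="Cell Ranger", stem_uid="a1B2c3D4e5F6", version="7.2.0")
family = [v1]
v2 = revise(family, name="Cell Ranger", version="7.2.1")

print([(record.version, record.is_latest) for record in family])
```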

Relational fields

created_by: User

Creator of record.

ulabels: ULabel

ULabel annotations of this transform.

predecessors: Transform

Preceding transforms.

These are auto-populated whenever an artifact or collection serves as a run input, e.g., artifact.run and artifact.transform get populated & saved.

The table provides a convenient way to query for predecessors that bypasses querying Run.
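
The idea behind a precomputed predecessor table can be sketched as follows: once links between transforms are derived from the inputs and outputs of runs, lineage questions no longer need to go through the runs. The transform and artifact names are hypothetical, and this toy model is not how lamindb stores the links.

```python
from collections import defaultdict

# Each run: (transform name, input artifact names, output artifact names).
runs = [
    ("ingest", [], ["raw.parquet"]),
    ("normalize", ["raw.parquet"], ["normalized.parquet"]),
    ("cluster", ["normalized.parquet"], ["clusters.csv"]),
]

# Map every artifact to the transform that created it.
created_by = {}
for transform, _inputs, outputs in runs:
    for artifact in outputs:
        created_by[artifact] = transform

# A transform's predecessors are the creators of its input artifacts.
predecessors = defaultdict(set)
for transform, inputs, _outputs in runs:
    for artifact in inputs:
        if artifact in created_by:
            predecessors[transform].add(created_by[artifact])

# Successors are the inverse relation.
successors = defaultdict(set)
for transform, preds in predecessors.items():
    for pred in preds:
        successors[pred].add(transform)

print(dict(predecessors))
print(dict(successors))
```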

successors: Transform

Subsequent transforms.

See predecessors.

runs: Run

Runs of this transform.

output_artifacts: Artifact

The artifacts generated by all runs of this transform.

If you’re looking for the outputs of a single run, see lamindb.Run.output_artifacts.

output_collections: Collection

The collections generated by all runs of this transform.

If you’re looking for the outputs of a single run, see lamindb.Run.output_collections.

Class methods

classmethod df(include=None, features=False, limit=100)

Convert to pd.DataFrame.

By default, shows all direct fields, except updated_at.

Use the arguments include or features to include other data.

Parameters:
  • include (str | list[str] | None, default: None) – Related fields to include as columns. Takes strings of form "ulabels__name", "cell_types__name", etc. or a list of such strings.

  • features (bool | list[str], default: False) – If True, map all features of the Feature registry onto the resulting DataFrame. Only available for Artifact.

  • limit (int, default: 100) – Maximum number of rows to display from a Pandas DataFrame. Defaults to 100 to reduce database load.

Return type:

DataFrame

Examples

Include the name of the creator in the DataFrame:

>>> ln.ULabel.df(include="created_by__name")

Include display of features for Artifact:

>>> df = ln.Artifact.df(features=True)
>>> ln.view(df)  # visualize with type annotations

Only include select features:

>>> df = ln.Artifact.df(features=["cell_type_by_expert", "cell_type_by_model"])
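
Strings passed to include, such as "created_by__name", follow the double-underscore convention for traversing related records. A minimal sketch of that convention, using stand-in objects rather than lamindb records (the helper resolve and the user values are hypothetical):

```python
from types import SimpleNamespace


def resolve(record, path):
    """Follow a 'relation__field' path across related objects."""
    value = record
    for part in path.split("__"):
        value = getattr(value, part)
    return value


# Stand-ins for a ULabel record and its creator.
user = SimpleNamespace(handle="testuser1", name="Test User1")
label = SimpleNamespace(name="my label", created_by=user)

# One row of the resulting table, with the related field as an extra column.
rows = [{"name": label.name, "created_by__name": resolve(label, "created_by__name")}]
print(rows)
```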

classmethod filter(*queries, **expressions)

Query records.

Parameters:
  • queries – One or multiple Q objects.

  • expressions – Fields and values passed as Django query expressions.

Return type:

QuerySet

Returns:

A QuerySet.

Examples

>>> ln.ULabel(name="my label").save()
>>> ln.ULabel.filter(name__startswith="my").df()
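
Keyword expressions such as name__startswith="my" combine a field name with an optional lookup suffix. The toy matcher below illustrates how such keywords can be interpreted against in-memory dictionaries. It is a sketch only: it handles a single field plus one of three lookups, does not traverse relations, and does not reflect how queries are compiled to SQL.

```python
def matches(record, **expressions):
    """Return True if the record satisfies every field__lookup=value expression."""
    for key, expected in expressions.items():
        field, _, lookup = key.partition("__")
        value = record[field]
        if lookup in ("", "exact"):
            ok = value == expected
        elif lookup == "startswith":
            ok = value.startswith(expected)
        elif lookup == "icontains":
            ok = expected.lower() in value.lower()
        else:
            raise ValueError(f"unsupported lookup: {lookup}")
        if not ok:
            return False
    return True


records = [{"name": "my label"}, {"name": "my other label"}, {"name": "unrelated"}]
hits = [record for record in records if matches(record, name__startswith="my")]
print(hits)
```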

classmethod get(idlike=None, **expressions)

Get a single record.

Parameters:
  • idlike (int | str | None, default: None) – Either a uid stub, uid or an integer id.

  • expressions – Fields and values passed as Django query expressions.

Return type:

Record

Returns:

A record.

Raises:

lamindb.core.exceptions.DoesNotExist – In case no matching record is found.

Examples

>>> ulabel = ln.ULabel.get("FvtpPJLJ")
>>> ulabel = ln.ULabel.get(name="my-label")
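
The three accepted forms of idlike (integer id, full uid, uid stub) and the error raised when nothing matches can be modeled with a small in-memory sketch. The records, uids, and the local DoesNotExist class are hypothetical stand-ins, and raising ValueError on multiple matches is an assumption of this sketch.

```python
class DoesNotExist(Exception):
    """Stand-in for lamindb.core.exceptions.DoesNotExist."""


# Hypothetical records with made-up uids.
RECORDS = [
    {"id": 1, "uid": "a1B2c3D4e5F60000", "name": "my-label"},
    {"id": 2, "uid": "Zz9Yy8Xx7Ww60000", "name": "other-label"},
]


def get(idlike=None, **expressions):
    """Return exactly one record matching an id, a uid (stub), or field values."""
    if isinstance(idlike, int):
        candidates = [r for r in RECORDS if r["id"] == idlike]
    elif isinstance(idlike, str):
        # A uid stub matches every uid that starts with it.
        candidates = [r for r in RECORDS if r["uid"].startswith(idlike)]
    else:
        candidates = [r for r in RECORDS
                      if all(r[key] == value for key, value in expressions.items())]
    if not candidates:
        raise DoesNotExist("no matching record found")
    if len(candidates) > 1:
        raise ValueError("more than one record matches")
    return candidates[0]


print(get(1)["name"])
print(get("a1B2")["name"])
print(get(name="other-label")["uid"])
```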

classmethod lookup(field=None, return_field=None)

Return an auto-complete object for a field.

Parameters:
  • field (str | DeferredAttribute | None, default: None) – The field to look up the values for. Defaults to first string field.

  • return_field (str | DeferredAttribute | None, default: None) – The field to return. If None, returns the whole record.

Return type:

NamedTuple

Returns:

A NamedTuple of lookup information of the field values with a dictionary converter.

See also

search()

Examples

>>> import bionty as bt
>>> bt.settings.organism = "human"
>>> bt.Gene.from_source(symbol="ADGB-DT").save()
>>> lookup = bt.Gene.lookup()
>>> lookup.adgb_dt
>>> lookup_dict = lookup.dict()
>>> lookup_dict['ADGB-DT']
>>> lookup_by_ensembl_id = bt.Gene.lookup(field="ensembl_gene_id")
>>> lookup_by_ensembl_id.ensg00000002745
>>> lookup_return_symbols = bt.Gene.lookup(field="ensembl_gene_id", return_field="symbol")
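
The shape of the returned object, a named tuple whose attribute names are sanitized field values plus a dict() converter keyed by the original values, can be sketched with the standard library. The helper make_lookup and the records are hypothetical; the sketch assumes that the values are unique and start with a letter.

```python
import re
from collections import namedtuple


def make_lookup(records, field):
    """Build an auto-complete object over the values of `field`."""
    keys = [record[field] for record in records]
    # Attribute names must be valid identifiers, e.g. "ADGB-DT" -> "adgb_dt".
    attrs = [re.sub(r"\W", "_", key).lower() for key in keys]
    base = namedtuple("Lookup", attrs)

    class Lookup(base):
        def dict(self):
            """Return a dictionary keyed by the original, unsanitized values."""
            return dict(zip(keys, self))

    return Lookup(*records)


genes = [{"symbol": "ADGB-DT"}, {"symbol": "TP53"}]
lookup = make_lookup(genes, field="symbol")

print(lookup.adgb_dt)
print(lookup.dict()["ADGB-DT"])
```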

classmethod search(string, *, field=None, limit=20, case_sensitive=False)

Search.

Parameters:
  • string (str) – The input string to match against the field ontology values.

  • field (str | DeferredAttribute | None, default: None) – The field or fields to search. Search all string fields by default.

  • limit (int | None, default: 20) – Maximum number of top results to return.

  • case_sensitive (bool, default: False) – Whether the match is case sensitive.

Return type:

QuerySet

Returns:

A QuerySet of search results, sorted by relevance to the input string.

See also

filter()

lookup()

Examples

>>> ulabels = ln.ULabel.from_values(["ULabel1", "ULabel2", "ULabel3"], field="name")
>>> ln.save(ulabels)
>>> ln.ULabel.search("ULabel2")

classmethod using(instance)

Use a non-default LaminDB instance.

Parameters:
  • instance (str | None) – An instance identifier of the form “account_handle/instance_name”.

Return type:

QuerySet

Examples

>>> ln.ULabel.using("account_handle/instance_name").search("ULabel7", field="name")
            uid    score
name
ULabel7  g7Hk9b2v  100.0
ULabel5  t4Jm6s0q   75.0
ULabel6  r2Xw8p1z   75.0

Methods

delete()

Return type:

None

save(*args, **kwargs)

Save.

Always saves to the default database.

Return type:

Record

view_lineage(with_successors=False, distance=5)