lamindb.Transform¶
- class lamindb.Transform(key: str | None = None, type: TransformType | None = None, version: str | None = None, description: str | None = None, reference: str | None = None, reference_type: str | None = None, source_code: str | None = None, revises: Transform | None = None, skip_hash_lookup: bool = False)¶
Bases: SQLRecord, IsVersioned

Data transformations such as scripts, notebooks, functions, or pipelines.

A transform can be a function, a script, a notebook, or a pipeline. If you execute a transform, you generate a run (Run). A run has inputs and outputs.

Pipelines are typically created with a workflow tool (Nextflow, Snakemake, Prefect, Flyte, Dagster, redun, Airflow, …).

Transforms are versioned so that a given transform version maps to a given source code version.
Can I sync transforms to git?

If you switch on sync_git_repo, a script-like transform is synced to its hashed state in a git repository upon calling ln.track():

ln.settings.sync_git_repo = "https://github.com/laminlabs/lamindb"
ln.track()

Alternatively, you can create transforms that map pipelines via Transform.from_git().

The definition of transforms and runs is consistent with the OpenLineage specification, where a transform would be called a "job" and a run a "run".

- Parameters:
key (str | None, default: None) – A short name or path-like semantic key.
type (TransformType | None, default: "pipeline") – See TransformType.
version (str | None, default: None) – A version string.
description (str | None, default: None) – A description.
reference (str | None, default: None) – A reference, e.g., a URL.
reference_type (str | None, default: None) – A reference type, e.g., 'url'.
source_code (str | None, default: None) – Source code of the transform.
revises (Transform | None, default: None) – An old version of the transform.
skip_hash_lookup (bool, default: False) – Skip the hash lookup so that a new transform is created even if a transform with the same hash already exists.
Examples
Create a transform for a pipeline:
transform = ln.Transform(key="Cell Ranger", version="7.2.0", type="pipeline").save()
Create a transform from a notebook:
ln.track()
Attributes¶
- property stem_uid: str¶
Universal id characterizing the version family.
The full uid of a record is obtained via concatenating the stem uid and version information:
stem_uid = random_base62(n_char)  # a random base62 sequence of length 12 (transform) or 16 (artifact, collection)
version_uid = "0000"  # an auto-incrementing 4-digit base62 number
uid = f"{stem_uid}{version_uid}"  # concatenate the stem_uid & version_uid
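The composition above can be sketched in plain Python. This is a toy illustration, not lamindb's actual implementation; `random_base62` is reimplemented here only for the example:

```python
import secrets
import string

BASE62 = string.digits + string.ascii_letters  # the 62 characters of base62

def random_base62(n_char: int) -> str:
    """Return a random base62 string of length n_char."""
    return "".join(secrets.choice(BASE62) for _ in range(n_char))

# 12 chars for a transform (16 for artifact, collection)
stem_uid = random_base62(12)
version_uid = "0000"  # auto-incrementing 4-digit base62 counter
uid = f"{stem_uid}{version_uid}"  # the full uid shares the stem across versions

assert len(uid) == 16 and uid.startswith(stem_uid)
```

All versions in a family share the stem, so the family can be recovered from any full uid by slicing off the trailing version characters.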
Simple fields¶
- uid: str¶
Universal id.
- key: str | None¶
A name or “/”-separated path-like string.
All transforms with the same key are part of the same version family.
- description: str | None¶
A description.
- type: TransformType¶
TransformType (default "pipeline").
- source_code: str | None¶
Source code of the transform.
- hash: str | None¶
Hash of the source code.
- reference: str | None¶
Reference for the transform, e.g., a URL.
- reference_type: str | None¶
Reference type of the transform, e.g., ‘url’.
- config: str | None¶
Optional configuration for the transform.
- is_flow: bool¶
Whether this transform is a flow orchestrating other transforms.
- created_at: datetime¶
Time of creation of record.
- updated_at: datetime¶
Time of last update to record.
- version: str | None¶
Version (default None).

Defines the version of a family of records characterized by the same stem_uid.

Consider using semantic versioning with Python versioning.
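One practical reason to parse semantic version strings rather than compare them as raw text: lexicographic string comparison mis-orders multi-digit components. A minimal pure-Python sketch (not lamindb API):

```python
def parse_version(v: str) -> tuple[int, ...]:
    """Parse a dotted semantic version string (optionally 'v'-prefixed) into an integer tuple."""
    return tuple(int(part) for part in v.lstrip("v").split("."))

# Lexicographic comparison gets multi-digit components wrong:
assert "2.10.0" < "2.9.0"
# Numeric tuple comparison orders them correctly:
assert parse_version("2.10.0") > parse_version("2.9.0")
assert parse_version("v7.2.0") == (7, 2, 0)
```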
- is_latest: bool¶
Boolean flag that indicates whether a record is the latest in its version family.
- is_locked: bool¶
Whether the record is locked for edits.
Relational fields¶
- branch: Branch¶
Life cycle state of record.

branch.name can be "main" (the default branch), "trash" (trash), "archive" (archived), or any other user-created branch, typically planned for merging onto main after review.
- predecessors: Transform¶
Preceding transforms.

Allows manually defining preceding transforms. This is typically not necessary, as data lineage is automatically tracked via runs whenever an artifact or collection serves as an input for a run.
- successors: Transform¶
Subsequent transforms.

See predecessors.
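The manual predecessor/successor links form a directed graph over transforms. A pure-Python sketch (not the lamindb API) of how such links compose into transitive lineage:

```python
from collections import defaultdict

# Map each transform key to the keys of its direct predecessors.
predecessors: dict[str, set[str]] = defaultdict(set)

def link(predecessor: str, successor: str) -> None:
    """Record that `predecessor` precedes `successor`."""
    predecessors[successor].add(predecessor)

def ancestors(key: str) -> set[str]:
    """All transitively preceding transforms, via depth-first traversal."""
    seen: set[str] = set()
    stack = list(predecessors[key])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(predecessors[node])
    return seen

# Hypothetical transform keys for illustration:
link("qc.py", "normalize.py")
link("normalize.py", "integrate.py")
assert ancestors("integrate.py") == {"qc.py", "normalize.py"}
```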
- records¶
Accessor to the related objects manager on the forward and reverse sides of a many-to-many relation.
In the example:
class Pizza(Model):
    toppings = ManyToManyField(Topping, related_name='pizzas')

Pizza.toppings and Topping.pizzas are ManyToManyDescriptor instances.

Most of the implementation is delegated to a dynamically defined manager class built by create_forward_many_to_many_manager() defined below.
- blocks: TransformBlock¶
Blocks that annotate this transform.
Class methods¶
- classmethod from_git(url, path, key=None, version=None, entrypoint=None, branch=None, skip_hash_lookup=False)¶
Create a transform from a path in a git repository.
- Parameters:
url (str) – URL of the git repository.
path (str) – Path to the file within the repository.
key (str | None, default: None) – Optional key for the transform.
version (str | None, default: None) – Optional version tag to checkout in the repository.
entrypoint (str | None, default: None) – Optional entrypoint for the transform.
branch (str | None, default: None) – Optional branch to checkout.
skip_hash_lookup (bool, default: False) – Skip the hash lookup so that a new transform is created even if a transform with the same hash already exists.
- Return type:
Transform
Examples
Create from a Nextflow repo and auto-infer the commit hash from its latest version:
transform = ln.Transform.from_git(
    url="https://github.com/openproblems-bio/task_batch_integration",
    path="main.nf",
).save()
Create from a Nextflow repo and checkout a specific version:
transform = ln.Transform.from_git(
    url="https://github.com/openproblems-bio/task_batch_integration",
    path="main.nf",
    version="v2.0.0",
).save()
assert transform.version == "v2.0.0"
Create a sliding transform from a Nextflow repo's dev branch. Unlike a regular transform, a sliding transform doesn't pin a specific source code state, but adapts to whatever the referenced state on the branch is:

transform = ln.Transform.from_git(
    url="https://github.com/openproblems-bio/task_batch_integration",
    path="main.nf",
    branch="dev",
    version="dev",
).save()
Notes
A regular transform pins a specific source code state through its commit hash:
transform.source_code
#> repo: https://github.com/openproblems-bio/task_batch_integration
#> path: main.nf
#> commit: 68eb2ecc52990617dbb6d1bb5c7158d9893796bb
A sliding transform infers the source code state from a branch:
transform.source_code
#> repo: https://github.com/openproblems-bio/task_batch_integration
#> path: main.nf
#> branch: dev
If an entrypoint is provided, it is added to the source code below the path, e.g.:
transform.source_code
#> repo: https://github.com/openproblems-bio/task_batch_integration
#> path: main.nf
#> entrypoint: myentrypoint
#> commit: 68eb2ecc52990617dbb6d1bb5c7158d9893796bb
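Since the reference block stored in source_code is plain "key: value" text, it is straightforward to inspect programmatically. A sketch, assuming only the line format shown in the examples above (the exact storage format is inferred, not guaranteed):

```python
def parse_git_reference(source_code: str) -> dict[str, str]:
    """Split each 'key: value' line on its first colon into a dict."""
    ref = {}
    for line in source_code.strip().splitlines():
        key, _, value = line.partition(":")  # first colon only, so URLs survive
        ref[key.strip()] = value.strip()
    return ref

pinned = """\
repo: https://github.com/openproblems-bio/task_batch_integration
path: main.nf
commit: 68eb2ecc52990617dbb6d1bb5c7158d9893796bb
"""
ref = parse_git_reference(pinned)
assert ref["path"] == "main.nf"
assert "commit" in ref  # a pinned (regular) transform carries a commit

sliding = "repo: https://example.com/repo\npath: main.nf\nbranch: dev"
assert "branch" in parse_git_reference(sliding)  # a sliding transform carries a branch
```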
- classmethod filter(*queries, **expressions)¶
Query records.
- Parameters:
queries – One or multiple Q objects.
expressions – Fields and values passed as Django query expressions.
- Return type:
QuerySet
See also
Guide: Query & search registries
Django documentation: Queries
Examples
>>> ln.Project(name="my label").save()
>>> ln.Project.filter(name__startswith="my").to_dataframe()
- classmethod get(idlike=None, **expressions)¶
Get a single record.
- Parameters:
idlike (int | str | None, default: None) – Either a uid stub, uid or an integer id.
expressions – Fields and values passed as Django query expressions.
- Raises:
lamindb.errors.DoesNotExist – In case no matching record is found.
- Return type:
Transform
See also
Guide: Query & search registries
Django documentation: Queries
Examples
record = ln.Record.get("FvtpPJLJ")
record = ln.Record.get(name="my-label")
- classmethod to_dataframe(include=None, features=False, limit=100, order_by=None)¶
Evaluate and convert to pd.DataFrame.

By default, maps simple fields and foreign keys onto DataFrame columns.

Guide: Query & search registries
- Parameters:
include (str | list[str] | None, default: None) – Related data to include as columns. Takes strings of form "records__name", "cell_types__name", etc. or a list of such strings. For Artifact, Record, and Run, can also pass "features" to include features with data types pointing to entities in the core schema. If "privates", includes private fields (fields starting with _).
features (bool | list[str], default: False) – Configure the features to include. Can be a feature name or a list of such names. If "queryset", infers the features used within the current queryset. Only available for Artifact, Record, and Run.
limit (int, default: 100) – Maximum number of rows to display. If None, includes all results.
order_by – Field name to order the records by. Prefix with '-' for descending order. Defaults to '-id' to get the most recent records. This argument is ignored if the queryset is already ordered or if the specified field does not exist.
- Return type:
DataFrame
Examples
Include the name of the creator:
ln.Record.to_dataframe(include="created_by__name")
Include features:
ln.Artifact.to_dataframe(include="features")
Include selected features:
ln.Artifact.to_dataframe(features=["cell_type_by_expert", "cell_type_by_model"])
- classmethod search(string, *, field=None, limit=20, case_sensitive=False)¶
Search.
- Parameters:
string (str) – The input string to match against the field ontology values.
field (str | DeferredAttribute | None, default: None) – The field or fields to search. Search all string fields by default.
limit (int | None, default: 20) – Maximum amount of top results to return.
case_sensitive (bool, default: False) – Whether the match is case sensitive.
- Return type:
- Returns:
A sorted DataFrame of search results with a score in column score. If return_queryset is True, a QuerySet.
Examples
records = ln.Record.from_values(["Label1", "Label2", "Label3"], field="name").save()
ln.Record.search("Label2")
- classmethod lookup(field=None, return_field=None)¶
Return an auto-complete object for a field.
- Parameters:
field (str | DeferredAttribute | None, default: None) – The field to look up the values for. Defaults to first string field.
return_field (str | DeferredAttribute | None, default: None) – The field to return. If None, returns the whole record.
keep – When multiple records are found for a lookup, how to return the records. "first": return the first record. "last": return the last record. False: return all records.
- Return type:
NamedTuple
- Returns:
A NamedTuple of lookup information of the field values with a dictionary converter.
See also
Examples
Look up via auto-complete:

import bionty as bt
bt.Gene.from_source(symbol="ADGB-DT").save()
lookup = bt.Gene.lookup()
lookup.adgb_dt
Look up via auto-complete in dictionary:
lookup_dict = lookup.dict() lookup_dict['ADGB-DT']
Look up via a specific field:

lookup_by_ensembl_id = bt.Gene.lookup(field="ensembl_gene_id")
lookup_by_ensembl_id.ensg00000002745
Return a specific field value instead of the full record:
lookup_return_symbols = bt.Gene.lookup(field="ensembl_gene_id", return_field="symbol")
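The mechanics behind such an auto-complete object can be sketched in pure Python (this is an illustration, not lamindb's internals): field values are normalized into valid Python identifiers and exposed as namedtuple attributes, with a dict keyed by the original values alongside.

```python
from collections import namedtuple
import re

def build_lookup(values: list[str]):
    """Build a namedtuple-based lookup plus a dict keyed by the raw values."""
    def normalize(v: str) -> str:
        # Replace non-identifier characters and guard a leading digit.
        return re.sub(r"\W|^(?=\d)", "_", v).lower()

    fields = [normalize(v) for v in values]
    Lookup = namedtuple("Lookup", fields)
    return Lookup(*values), {v: v for v in values}

lookup, lookup_dict = build_lookup(["ADGB-DT", "TCF7"])
assert lookup.adgb_dt == "ADGB-DT"   # attribute access, auto-complete friendly
assert lookup_dict["ADGB-DT"] == "ADGB-DT"  # exact-string access
```

In the real API, the tuple entries are full records (or a chosen return_field) rather than the raw strings used here.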
Methods¶
- describe(return_str=False)¶
Describe record including relations.
- Parameters:
return_str (bool, default: False) – Return a string instead of printing.
- Return type:
None | str
- view_lineage(with_successors=False, distance=5)¶
View lineage of transforms.
Note that this only accounts for manually defined predecessors and successors.
Automatically generated lineage through inputs and outputs of runs is not included.
- restore()¶
Restore from trash onto the main branch.
Does not restore descendant records if the record is HasType with is_type = True.
- Return type:
None
- delete(permanent=None, **kwargs)¶
Delete record.
If record is HasType with is_type = True, deletes all descendant records, too.
- Parameters:
permanent (bool | None, default: None) – Whether to permanently delete the record (skips trash). If None, performs soft delete if the record is not already in the trash.
- Return type:
None
Examples
For any SQLRecord object record, call:

>>> record.delete()
- save(*args, **kwargs)¶
Save.
Always saves to the default database.
- Return type:
TypeVar(T, bound=SQLRecord)
- refresh_from_db(using=None, fields=None, from_queryset=None)¶
Reload field values from the database.
By default, the reloading happens from the database this instance was loaded from, or by the read router if this instance wasn’t loaded from any database. The using parameter will override the default.
Fields can be used to specify which fields to reload. The fields should be an iterable of field attnames. If fields is None, then all non-deferred fields are reloaded.
When accessing deferred fields of an instance, the deferred loading of the field will call this method.
- async arefresh_from_db(using=None, fields=None, from_queryset=None)¶