lamindb.Transform

class lamindb.Transform(name: str, key: str | None = None, version: str | None = None, type: TransformType | None = None, is_new_version_of: Transform | None = None)

Bases: Registry, HasParents, IsVersioned

Data transformations.

A transform can refer to a simple Python function, a script, a notebook, or a pipeline. If you execute a transform, you generate a run of a transform (Run). A run has input and output data.

A pipeline is typically created with a workflow tool (Nextflow, Snakemake, Prefect, Flyte, MetaFlow, redun, Airflow, …) and stored in a versioned repository.

Transforms are versioned so that a given transform maps 1:1 to a specific version of code. If you switch on sync_git_repo, any script-like transform is synced with its hashed state in a git repository.

If you execute a transform, you generate a Run record. The definition of transforms and runs is consistent with the OpenLineage specification, where a Transform record would be called a “job” and a Run record a “run”.

Parameters:
  • name (str): A name or title.

  • key (str | None = None): A short name or path-like semantic key.

  • version (str | None = None): A version.

  • type (TransformType | None = "pipeline"): Either 'notebook', 'pipeline' or 'script'.

  • is_new_version_of (Transform | None = None): An old version of the transform.

See also

track()

Globally track a script, notebook or pipeline run.

Run

Executions of transforms.

Examples

Create a transform for a pipeline:

>>> transform = ln.Transform(name="Cell Ranger", version="7.2.0", type="pipeline")
>>> transform.save()

Create a transform from a notebook:

>>> ln.track()

View parents of a transform:

>>> transform.view_parents()

Attributes

latest_run
stem_uid

Universal id characterizing the version family (str).

The full uid of a record is obtained via concatenating the stem uid and version information:

stem_uid = random_base62(n_char)  # a random base62 sequence of length n_char
version_uid = encode_base62(md5_hash(version))[:4]  # version is, e.g., "1" or "2.1.0" or "2022-03-01"
uid = f"{stem_uid}{version_uid}"  # concatenate the stem_uid & version_uid
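The pseudocode above can be made concrete with the standard library alone. The following is a minimal sketch, not LaminDB's actual implementation: the alphabet ordering, the choice of n_char = 12, the big-endian integer encoding of the MD5 digest, and the helper names version_uid and make_uid are all assumptions made for illustration. Only the overall scheme (a random stem followed by a 4-character version suffix) comes from the description above.

```python
import hashlib
import secrets
import string

# Assumed alphabet ordering; the ordering actually used by LaminDB may differ.
BASE62 = string.digits + string.ascii_lowercase + string.ascii_uppercase


def random_base62(n_char: int) -> str:
    """Return a random base62 sequence of length n_char."""
    return "".join(secrets.choice(BASE62) for _ in range(n_char))


def encode_base62(data: bytes) -> str:
    """Encode bytes as base62, reading them as a big-endian integer (an assumption)."""
    number = int.from_bytes(data, "big")
    if number == 0:
        return BASE62[0]
    chars = []
    while number:
        number, remainder = divmod(number, 62)
        chars.append(BASE62[remainder])
    return "".join(reversed(chars))


def version_uid(version: str) -> str:
    """Derive a 4-character suffix from the MD5 hash of the version string."""
    digest = hashlib.md5(version.encode("utf-8")).digest()
    return encode_base62(digest)[:4]


def make_uid(stem_uid: str, version: str) -> str:
    """Concatenate the stem uid and the version uid."""
    return f"{stem_uid}{version_uid(version)}"


stem_uid = random_base62(12)  # n_char = 12 is a hypothetical choice
uid_v1 = make_uid(stem_uid, "1")
uid_v2 = make_uid(stem_uid, "2.1.0")
```

In this sketch, two versions of the same family share the first 12 characters (the stem) and differ only in the suffix derived from the version string.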

versions

Lists all records of the same version family (QuerySet).

>>> new_artifact = ln.Artifact(df2, is_new_version_of=artifact)
>>> new_artifact.save()
>>> new_artifact.versions

Fields

version CharField

Version (default None).

Defines version of a family of records characterized by the same stem_uid.

Consider using semantic versioning with Python versioning.
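One practical reason to prefer a dotted numeric scheme is ordering: plain string comparison sorts "2.10.0" before "2.9.0". The following is a minimal sketch, assuming purely numeric, dot-separated version strings. parse_version is a hypothetical helper, not part of lamindb, and it does not handle other schemes such as date-based versions.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Split a dotted numeric version such as "2.10.0" into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


versions = ["2.9.0", "2.10.0", "1", "2.1.0"]

# String comparison goes character by character, so "2.10.0" sorts before "2.9.0".
lexicographic = sorted(versions)

# Comparing integer tuples gives the intended ordering.
numeric = sorted(versions, key=parse_version)
```

Here the string sort yields "1", "2.1.0", "2.10.0", "2.9.0", while the numeric sort yields "1", "2.1.0", "2.9.0", "2.10.0".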

id AutoField

Internal id, valid only in one DB instance.

uid CharField

Universal id.

name CharField

A name or title, for instance a pipeline name or a notebook title.

key CharField

A key for concise reference & versioning (optional).

description CharField

A description (optional).

type CharField

Transform type (default "pipeline").

latest_report ForeignKey

Latest run report.

source_code ForeignKey

Source of the transform if stored as artifact within LaminDB.

reference CharField

Reference for the transform, e.g., a URL.

reference_type CharField

Type of reference, e.g., ‘url’ or ‘doi’.

ulabels ManyToManyField

Accessor to the related objects manager on the forward and reverse sides of a many-to-many relation.

parents ManyToManyField

Parent transforms (predecessors) in data flow.

These are auto-populated whenever a transform loads an artifact or collection as run input.
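To illustrate the rule above, here is a toy in-memory model. It is not lamindb's implementation: the classes ToyTransform, ToyRun, and ToyArtifact and all of their methods are hypothetical and only mimic the stated behavior, namely that when a run of one transform loads an artifact produced by a run of another transform, the producing transform is recorded as a parent.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class ToyTransform:
    name: str
    parents: list["ToyTransform"] = field(default_factory=list)

    def run(self) -> "ToyRun":
        """Executing a transform creates a new run of it."""
        return ToyRun(transform=self)


@dataclass(eq=False)
class ToyArtifact:
    description: str
    produced_by: Optional["ToyRun"] = None


@dataclass(eq=False)
class ToyRun:
    transform: ToyTransform
    inputs: list[ToyArtifact] = field(default_factory=list)

    def save_output(self, description: str) -> ToyArtifact:
        """Register an artifact as an output of this run."""
        return ToyArtifact(description, produced_by=self)

    def load(self, artifact: ToyArtifact) -> ToyArtifact:
        """Register an artifact as a run input and record its producer as a parent."""
        self.inputs.append(artifact)
        producer = artifact.produced_by
        if producer is not None:
            parent = producer.transform
            already_known = any(p is parent for p in self.transform.parents)
            if parent is not self.transform and not already_known:
                self.transform.parents.append(parent)
        return artifact


cell_ranger = ToyTransform("Cell Ranger")
counts = cell_ranger.run().save_output("count matrix")

analysis = ToyTransform("Analysis notebook")
analysis_run = analysis.run()
analysis_run.load(counts)
analysis_run.load(counts)  # loading the same input again adds no duplicate parent

parent_names = [t.name for t in analysis.parents]
```

After the run of the hypothetical "Analysis notebook" loads the count matrix, parent_names contains only "Cell Ranger".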

created_at DateTimeField

Time of creation of record.

updated_at DateTimeField

Time of last update to record.

created_by ForeignKey

Creator of record (User).

Methods

delete()
Return type:

None

get_type_display(*, field=<django.db.models.fields.CharField: type>)