lamindb.Transform

class lamindb.Transform(name: str, key: str | None = None, version: str | None = None, type: TransformType | None = None, is_new_version_of: Transform | None = None)

Bases: Record, HasParents, IsVersioned

Data transformations.

A transform can refer to a Python function, a script, a notebook, or a pipeline. Executing a transform generates a run (Run). A run has input and output data.

A pipeline is typically created with a workflow tool (Nextflow, Snakemake, Prefect, Flyte, MetaFlow, redun, Airflow, …) and stored in a versioned repository.

Transforms are versioned so that a given transform maps 1:1 to a specific version of code.

Can I sync transforms to git?

If you switch on sync_git_repo, a script-like transform is synced to its hashed state in a git repository upon calling ln.track().

The definition of transforms and runs is consistent with the OpenLineage specification, where a Transform record would be called a “job” and a Run record a “run”.
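
The job/run correspondence can be illustrated with a minimal sketch that builds an OpenLineage-style run event as a plain dictionary. This is not part of the lamindb API: the function name to_openlineage_event, the namespace "my-lamin-instance", and the producer URL are hypothetical, and the event omits fields such as schemaURL, inputs, and outputs that a complete event would carry.

```python
import json
import uuid
from datetime import datetime, timezone


def to_openlineage_event(transform_name: str, run_id: str, event_type: str = "COMPLETE") -> dict:
    """Build a minimal OpenLineage-style run event as a plain dict (illustrative only)."""
    return {
        "eventType": event_type,
        "eventTime": datetime.now(timezone.utc).isoformat(),
        # A Transform record plays the role of the OpenLineage "job".
        # The namespace value is a hypothetical placeholder.
        "job": {"namespace": "my-lamin-instance", "name": transform_name},
        # A Run record plays the role of the OpenLineage "run".
        "run": {"runId": run_id},
        # Hypothetical producer URI, not a real endpoint.
        "producer": "https://example.com/hypothetical-producer",
    }


event = to_openlineage_event("Cell Ranger", str(uuid.uuid4()))
print(json.dumps(event, indent=2))
```

The sketch only shows which record maps to which OpenLineage concept; it does not read from or write to a LaminDB instance.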

Parameters:
  • name: str - A name or title.

  • key: str | None = None - A short name or path-like semantic key.

  • version: str | None = None - A version.

  • type: TransformType | None = "pipeline" - Either 'notebook', 'pipeline' or 'script'.

  • is_new_version_of: Transform | None = None - An old version of the transform.

See also

track()

Globally track a script, notebook or pipeline run.

Run

Executions of transforms.

Examples

Create a transform for a pipeline:

>>> transform = ln.Transform(name="Cell Ranger", version="7.2.0", type="pipeline")
>>> transform.save()

Create a transform from a notebook:

>>> ln.track()

View parents of a transform:

>>> transform.view_parents()

Attributes

property latest_run: Run

Fields

version: str

Version (default None).

Defines version of a family of records characterized by the same stem_uid.

Consider using semantic versioning with Python versioning.
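
One reason to keep versions in a dotted numeric scheme is that they can then be ordered numerically within a family. The following is a minimal sketch in plain Python, independent of lamindb: the helper names version_key and latest_per_family and the stem values "stemA" and "stemB" are hypothetical, and it assumes versions consist only of dot-separated integers.

```python
from collections import defaultdict


def version_key(version: str) -> tuple[int, ...]:
    """Turn '7.10.0' into (7, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def latest_per_family(records: list[tuple[str, str]]) -> dict[str, str]:
    """Map each stem_uid to the highest version among its records."""
    families: dict[str, list[str]] = defaultdict(list)
    for stem_uid, version in records:
        families[stem_uid].append(version)
    return {stem: max(versions, key=version_key) for stem, versions in families.items()}


# Hypothetical (stem_uid, version) pairs; real stem_uid values look different.
records = [
    ("stemA", "7.2.0"),
    ("stemA", "7.10.0"),
    ("stemB", "1.0"),
]
print(latest_per_family(records))  # {'stemA': '7.10.0', 'stemB': '1.0'}
```

Comparing the raw strings instead would rank "7.2.0" above "7.10.0", which is why the sketch converts each version to a tuple of integers first.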

id: int

Internal id, valid only in one DB instance.

uid: str

Universal id.

name: str

A name or title, for instance a pipeline name or a notebook title.

key: str

A key for concise reference & versioning (optional).

description: str

A description (optional).

type: str

Transform type (default "pipeline").

latest_report: Artifact

Latest run report.

source_code: Artifact

Source of the transform if stored as artifact within LaminDB.

reference: str

Reference for the transform, e.g., a URL.

reference_type: str

Type of reference, e.g., ‘url’ or ‘doi’.

ulabels: ULabel

Labels annotating the transform (a many-to-many relation to ULabel).

parents: Transform

Parent transforms (predecessors) in data flow.

These are auto-populated whenever a transform loads an artifact or collection as run input.

created_at: datetime

Time of creation of record.

updated_at: datetime

Time of last update to record.

created_by: User

Creator of record, a User.

Methods

delete()

Return type: None

get_type_display(*, field=<django.db.models.fields.CharField: type>)