lamindb.Transform
- class lamindb.Transform(name: str, key: str | None = None, type: Literal['pipeline', 'notebook', 'upload', 'script', 'function', 'glue'] | None = None, revises: Transform | None = None)
Bases: Record, IsVersioned
Data transformations.
A “transform” can refer to a Python function, a script, a notebook, or a pipeline. Executing a transform generates a run (Run), and a run has inputs and outputs.
A pipeline is typically created with a workflow tool (Nextflow, Snakemake, Prefect, Flyte, MetaFlow, redun, Airflow, …) and stored in a versioned repository.
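To make the relationship between transforms and runs concrete, here is a minimal pure-Python model. It does not use lamindb's classes: the `Transform` and `Run` dataclasses and the `execute()` helper are hypothetical and only illustrate that executing a transform yields a run recording its inputs and outputs.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Transform:
    """Hypothetical stand-in for a transform: a function, script, notebook, or pipeline."""

    name: str
    type: str = "pipeline"


@dataclass
class Run:
    """Hypothetical stand-in for one execution of a transform."""

    transform: Transform
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)


def execute(transform: Transform, inputs: List[str]) -> Run:
    """Executing a transform creates a run that records its inputs and outputs."""
    run = Run(transform=transform, inputs=list(inputs))
    # Placeholder work: pretend the transform writes one output per input.
    run.outputs = [f"{name}.processed" for name in inputs]
    return run


cellranger = Transform(name="Cell Ranger", type="pipeline")
run = execute(cellranger, ["sample1.fastq"])
print(run.transform.name, run.outputs)  # Cell Ranger ['sample1.fastq.processed']
```

Executing the same transform again would produce a second, independent run pointing at the same transform, which is the one-to-many shape the real registries share.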
Transforms are versioned so that a given transform version maps to a given source code version.
Can I sync transforms to git?
If you switch on sync_git_repo, a script-like transform is synced to its hashed state in a git repository upon calling ln.track().
>>> ln.settings.sync_git_repo = "https://github.com/laminlabs/lamindb"
>>> ln.track()
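The exact hashing lamindb uses to match a script to its "hashed state" in git is not described here. As a sketch of the underlying idea, assuming matching is done by content hash, the function below computes the object id git itself assigns to a file's bytes, the value `git hash-object` prints in a SHA-1 repository. `git_blob_hash` is a hypothetical helper, not part of lamindb.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Compute git's SHA-1 object id for a file's content.

    Git hashes the header "blob <size>" plus a NUL byte, followed by the raw bytes.
    """
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


print(git_blob_hash(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
```

A script whose bytes hash to a blob that appears in a commit's tree can be pinned to that commit, which is one plausible way to tie a transform version to a git state.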
The definition of transforms and runs is consistent with the OpenLineage specification, where a Transform record would be called a “job” and a Run record a “run”.
- Parameters:
name (str) – A name or title.
key (str | None, default: None) – A short name or path-like semantic key.
type (TransformType | None, default: "pipeline") – See TransformType.
revises (Transform | None, default: None) – An old version of the transform.
Examples
Create a transform for a pipeline:
>>> transform = ln.Transform(name="Cell Ranger", version="7.2.0", type="pipeline").save()
Create a transform from a notebook:
>>> ln.track()
View predecessors of a transform:
>>> transform.view_lineage()
Attributes
- property stem_uid: str
Universal id characterizing the version family.
The full uid of a record is obtained by concatenating the stem uid and version information:
stem_uid = random_base62(n_char)  # a random base62 sequence of length 12 (transform) or 16 (artifact, collection)
version_uid = "0000"  # an auto-incrementing 4-digit base62 number
uid = f"{stem_uid}{version_uid}"  # concatenate the stem_uid & version_uid
- property versions: QuerySet
Lists all records of the same version family.
>>> new_artifact = ln.Artifact(df2, revises=artifact)
>>> new_artifact.save()
>>> new_artifact.versions.df()
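The stem_uid pseudocode and the version-family idea can be made concrete with a minimal pure-Python sketch. It assumes a base62 alphabet ordered as digits, lowercase, uppercase, and models records as bare uid strings; `random_base62`, `encode_base62`, and `version_family` are hypothetical helpers, not lamindb functions. It shows that versions share the stem_uid and differ only in the 4-character suffix, which is presumably what grouping a version family relies on.

```python
import secrets
import string
from typing import List

# Base62 alphabet; this particular ordering (digits, lowercase, uppercase) is an assumption.
BASE62 = string.digits + string.ascii_lowercase + string.ascii_uppercase


def random_base62(n_char: int) -> str:
    """Return a random base62 string of length n_char."""
    return "".join(secrets.choice(BASE62) for _ in range(n_char))


def encode_base62(number: int, width: int = 4) -> str:
    """Encode a non-negative integer as a zero-padded base62 string of fixed width."""
    if number < 0:
        raise ValueError("number must be non-negative")
    remaining, digits = number, []
    while remaining:
        remaining, remainder = divmod(remaining, 62)
        digits.append(BASE62[remainder])
    encoded = "".join(reversed(digits)).rjust(width, "0")
    if len(encoded) > width:
        raise OverflowError(f"{number} needs more than {width} base62 digits")
    return encoded


def version_family(uids: List[str], uid: str, stem_length: int) -> List[str]:
    """Return all uids that share the stem_uid of `uid`, i.e. its version family."""
    return [other for other in uids if other[:stem_length] == uid[:stem_length]]


stem_uid = random_base62(12)  # 12 characters for a transform
uid_v1 = f"{stem_uid}{encode_base62(0)}"  # first version ends in "0000"
uid_v2 = f"{stem_uid}{encode_base62(1)}"  # next version ends in "0001"
unrelated = "0" * 12 + encode_base62(0)  # a different stem, hence a different family
family = version_family([uid_v1, uid_v2, unrelated], uid_v2, stem_length=12)
print(family == [uid_v1, uid_v2])
```

Because the suffix has a fixed width of four base62 digits, a family can hold up to 62**4 versions before `encode_base62` raises.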
Simple fields
- uid: str
Universal id.
- name: str
A name or title. For instance, a pipeline name, notebook title, etc.
- key: str
A key for concise reference & versioning (optional).
- description: str
A description (optional).
- type: Literal['pipeline', 'notebook', 'upload', 'script', 'function', 'glue']
TransformType (default "pipeline").
- source_code: str | None
Source code of the transform.
Changed in version 0.75: The source_code field is no longer an artifact, but a text field.
- hash: str | None
Hash of the source code (optional).
- reference: str
Reference for the transform, e.g., a URL.
- reference_type: str
Type of the reference.
- created_at: datetime
Time of creation of the record.
- updated_at: datetime
Time of the last update to the record.
- version: str
Version (default None).
Defines the version of a family of records characterized by the same stem_uid.
Consider using semantic versioning that follows Python's version-string conventions.
- is_latest: bool
Boolean flag that indicates whether a record is the latest in its version family.
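The version field recommends semantic versioning. One reason ordering needs care: plain string comparison puts "7.10.0" before "7.2.0". A minimal sketch, assuming purely numeric MAJOR.MINOR.PATCH strings; `semver_key` is a hypothetical helper, not a lamindb function.

```python
from typing import Tuple


def semver_key(version: str) -> Tuple[int, ...]:
    """Turn 'MAJOR.MINOR.PATCH' into a tuple of ints so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


version_strings = ["7.10.0", "7.2.0", "7.9.1"]
print(sorted(version_strings))  # ['7.10.0', '7.2.0', '7.9.1'] (string order)
print(sorted(version_strings, key=semver_key))  # ['7.2.0', '7.9.1', '7.10.0'] (numeric order)
```

This key breaks on pre-release or local tags such as "1.0rc1"; for those, `packaging.version.Version` from the `packaging` library implements Python's full version scheme (PEP 440).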
Relational fields
- predecessors: Transform
Preceding transforms.
These are auto-populated whenever an artifact or collection serves as a run input, e.g., artifact.run and artifact.transform get populated & saved.
The field provides a convenient way to query for predecessors without having to query Run.
- successors: Transform
Subsequent transforms.
See predecessors.
- output_artifacts: Artifact
The artifacts generated by all runs of this transform.
If you’re looking for the outputs of a single run, see lamindb.Run.output_artifacts.
- output_collections: Collection
The collections generated by all runs of this transform.
If you’re looking for the outputs of a single run, see lamindb.Run.output_collections.
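The predecessors field is described as auto-populated from run inputs. A minimal sketch of that derivation, assuming a hypothetical run log of (transform, inputs, outputs) tuples in place of lamindb's Run and Artifact records: each artifact remembers the transform that produced it, and using it as input in another transform's run makes the producer a predecessor. This illustrates the relationship only; it is not lamindb's implementation.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical run log: (transform name, input artifact keys, output artifact keys).
RUNS: List[Tuple[str, List[str], List[str]]] = [
    ("ingest", [], ["raw.csv"]),
    ("qc", ["raw.csv"], ["clean.csv"]),
    ("train", ["clean.csv"], ["model.pkl"]),
    ("report", ["clean.csv", "model.pkl"], ["report.html"]),
]


def predecessors_from_runs(runs: List[Tuple[str, List[str], List[str]]]) -> Dict[str, Set[str]]:
    # Each output artifact remembers the transform whose run produced it
    # (the role artifact.transform plays in the description above).
    produced_by: Dict[str, str] = {}
    for transform, _inputs, outputs in runs:
        for artifact in outputs:
            produced_by[artifact] = transform
    # Using an artifact as a run input links its producer as a predecessor.
    predecessors: Dict[str, Set[str]] = defaultdict(set)
    for transform, inputs, _outputs in runs:
        for artifact in inputs:
            if artifact in produced_by:
                predecessors[transform].add(produced_by[artifact])
    return dict(predecessors)


preds = predecessors_from_runs(RUNS)
print(sorted(preds["report"]))  # ['qc', 'train']
```

Successors are the same edges read in the opposite direction, and aggregating `outputs` over all runs of one transform corresponds to output_artifacts.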
Class methods
- classmethod df(include=None, join='inner', limit=100)
Convert to pd.DataFrame.
By default, shows all direct fields, except updated_at.
Use parameter include to include other fields.
- Parameters:
include (str | list[str] | None, default: None) – Related fields to include as columns. Takes strings of form "labels__name", "cell_types__name", etc. or a list of such strings.
join (str, default: 'inner') – The join parameter of pandas.
limit (int, default: 100) – Maximum number of rows to display from a Pandas DataFrame. Defaults to 100 to reduce database load.
- Return type:
DataFrame
Examples
>>> labels = [ln.ULabel(name=f"Label {i}") for i in range(3)]
>>> ln.save(labels)
>>> ln.ULabel.filter().df(include=["created_by__name"])
- classmethod filter(*queries, **expressions)
Query records.
- Parameters:
queries – One or multiple Q objects.
expressions – Fields and values passed as Django query expressions.
- Return type:
QuerySet
- Returns:
A QuerySet.
See also
Guide: Query & search registries
Django documentation: Queries
Examples
>>> ln.ULabel(name="my ulabel").save()
>>> ulabels = ln.ULabel.filter(name="my ulabel")
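filter() accepts Django query expressions such as name="my ulabel" or name__icontains="label", where the part after a double underscore names a lookup. The toy below sketches how such keyword expressions become predicates combined with AND. It operates on plain dicts and supports only a handful of lookups on direct fields (no relation traversal like created_by__name); `filter_records` and `LOOKUPS` are hypothetical and this is not Django's or lamindb's implementation.

```python
import operator
from typing import Any, Callable, Dict, List

# A small, hypothetical subset of Django-style lookups mapped to predicates.
LOOKUPS: Dict[str, Callable[[Any, Any], bool]] = {
    "exact": operator.eq,
    "contains": lambda value, arg: arg in value,
    "icontains": lambda value, arg: arg.lower() in value.lower(),
    "startswith": lambda value, arg: value.startswith(arg),
    "in": lambda value, arg: value in arg,
}


def filter_records(records: List[Dict[str, Any]], **expressions: Any) -> List[Dict[str, Any]]:
    """Keep records matching every `field__lookup=value` expression (AND semantics)."""

    def matches(record: Dict[str, Any]) -> bool:
        for key, arg in expressions.items():
            field, _, lookup = key.partition("__")
            lookup = lookup or "exact"  # a bare field name means an exact match
            if not LOOKUPS[lookup](record[field], arg):
                return False
        return True

    return [record for record in records if matches(record)]


records = [{"name": "my ulabel"}, {"name": "My Other Label"}, {"name": "unrelated"}]
print(filter_records(records, name__icontains="label"))
# [{'name': 'my ulabel'}, {'name': 'My Other Label'}]
```

Passing several expressions narrows the result, mirroring how multiple keyword arguments to filter() combine.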
- classmethod get(idlike=None, **expressions)
Get a single record.
- Parameters:
idlike (int | str | None, default: None) – Either a uid stub, uid or an integer id.
expressions – Fields and values passed as Django query expressions.
- Return type:
Record
- Returns:
A record.
- Raises:
lamindb.core.exceptions.DoesNotExist – In case no matching record is found.
See also
Guide: Query & search registries
Django documentation: Queries
Examples
>>> ulabel = ln.ULabel.get("2riu039")
>>> ulabel = ln.ULabel.get(name="my-label")
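get() returns a single record from an integer id, a uid, or a uid stub, and raises DoesNotExist when nothing matches. The sketch below mimics that contract on plain dicts, treating a uid stub as a prefix of the full uid. `DoesNotExist` here is a local stand-in for lamindb.core.exceptions.DoesNotExist, and raising the hypothetical `MultipleRecordsFound` for ambiguous matches is an assumption, since the behavior for several matches is not documented above.

```python
from typing import Any, Dict, List, Optional, Union


class DoesNotExist(Exception):
    """Local stand-in for lamindb.core.exceptions.DoesNotExist."""


class MultipleRecordsFound(Exception):
    """Hypothetical error for a uid stub or expression that matches several records."""


def get(
    records: List[Dict[str, Any]],
    idlike: Optional[Union[int, str]] = None,
    **expressions: Any,
) -> Dict[str, Any]:
    """Return exactly one record matching an integer id, a uid (stub), and/or field values."""
    candidates = records
    if isinstance(idlike, int):
        candidates = [r for r in candidates if r["id"] == idlike]  # integer primary key
    elif isinstance(idlike, str):
        candidates = [r for r in candidates if r["uid"].startswith(idlike)]  # uid or uid stub
    candidates = [r for r in candidates if all(r.get(k) == v for k, v in expressions.items())]
    if not candidates:
        raise DoesNotExist(f"No record matches {idlike!r} {expressions}")
    if len(candidates) > 1:
        raise MultipleRecordsFound(f"{len(candidates)} records match {idlike!r} {expressions}")
    return candidates[0]


ulabels = [
    {"id": 1, "uid": "2riu039aBcDeFgH", "name": "my-label"},
    {"id": 2, "uid": "7Qw3ZxYvUtSrPoN", "name": "other-label"},
]
print(get(ulabels, "2riu039")["name"])  # my-label
print(get(ulabels, name="other-label")["id"])  # 2
```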
- classmethod lookup(field=None, return_field=None)
Return an auto-complete object for a field.
- Parameters:
field (str | DeferredAttribute | None, default: None) – The field to look up the values for. Defaults to the first string field.
return_field (str | DeferredAttribute | None, default: None) – The field to return. If None, returns the whole record.
- Return type:
NamedTuple
- Returns:
A NamedTuple of lookup information of the field values with a dictionary converter.
Examples
>>> import bionty as bt
>>> bt.settings.organism = "human"
>>> bt.Gene.from_source(symbol="ADGB-DT").save()
>>> lookup = bt.Gene.lookup()
>>> lookup.adgb_dt
>>> lookup_dict = lookup.dict()
>>> lookup_dict['ADGB-DT']
>>> lookup_by_ensembl_id = bt.Gene.lookup(field="ensembl_gene_id")
>>> lookup_by_ensembl_id.ensg00000002745
>>> lookup_return_symbols = bt.Gene.lookup(field="ensembl_gene_id", return_field="symbol")
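The example turns the value "ADGB-DT" into the attribute lookup.adgb_dt and offers lookup.dict() keyed by the original value. A minimal sketch of such an auto-complete object, assuming records are plain dicts and that values are sanitized by lowercasing and replacing non-word characters with underscores; `to_identifier` and `make_lookup` are hypothetical, and lamindb's own sanitizing rules may differ.

```python
import keyword
import re
from collections import namedtuple
from typing import Any, Dict, List


def to_identifier(value: str) -> str:
    """Sanitize a field value into a valid attribute name, e.g. 'ADGB-DT' -> 'adgb_dt'."""
    name = re.sub(r"\W+", "_", value).strip("_").lower()
    if not name or name[0].isdigit() or keyword.iskeyword(name):
        name = f"n_{name}"  # attribute names cannot be empty, start with a digit, or be keywords
    return name


def make_lookup(records: List[Dict[str, Any]], field: str):
    """Build an auto-complete object with one attribute per distinct field value."""
    by_identifier: Dict[str, Dict[str, Any]] = {}
    for record in records:
        # On a name collision after sanitizing, the first record wins.
        by_identifier.setdefault(to_identifier(record[field]), record)

    Base = namedtuple("Lookup", list(by_identifier))

    class Lookup(Base):
        __slots__ = ()

        def dict(self) -> Dict[str, Dict[str, Any]]:
            """Map the original, unsanitized field values to their records."""
            return {record[field]: record for record in self}

    return Lookup(*by_identifier.values())


labels = [{"name": "ADGB-DT"}, {"name": "T cell"}, {"name": "T-cell"}]
lookup = make_lookup(labels, "name")
print(lookup.adgb_dt)  # {'name': 'ADGB-DT'}
print(sorted(lookup.dict()))  # ['ADGB-DT', 'T cell']
```

"T cell" and "T-cell" both sanitize to t_cell, so one of them is dropped; a real implementation has to pick some collision policy, and keeping the first is only one choice.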
- classmethod search(string, *, field=None, limit=20, case_sensitive=False)
Search for records whose field values match a string.
- Parameters:
string (str) – The input string to match against the field ontology values.
field (str | DeferredAttribute | None, default: None) – The field or fields to search. Search all string fields by default.
limit (int | None, default: 20) – Maximum number of top results to return.
case_sensitive (bool, default: False) – Whether the match is case sensitive.
- Return type:
QuerySet
- Returns:
A sorted QuerySet of search results.
Examples
>>> ulabels = ln.ULabel.from_values(["ULabel1", "ULabel2", "ULabel3"], field="name")
>>> ln.save(ulabels)
>>> ln.ULabel.search("ULabel2")
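The scoring behind search() is not documented here. As a stand-in, this sketch ranks values with `difflib.SequenceMatcher` from the Python standard library, scaled to a 0 to 100 score to resemble the score column in the using() example. This is a swapped-in technique, not lamindb's algorithm; `search` below is a hypothetical function that only mirrors the documented `limit` and `case_sensitive` parameters.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


def search(
    values: List[str],
    string: str,
    limit: Optional[int] = 20,
    case_sensitive: bool = False,
) -> List[Tuple[float, str]]:
    """Rank values by similarity to `string`; return (score, value) pairs, best first."""

    def normalize(text: str) -> str:
        return text if case_sensitive else text.lower()

    query = normalize(string)
    scored = [
        (round(100 * SequenceMatcher(None, query, normalize(value)).ratio(), 1), value)
        for value in values
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep input order
    return scored if limit is None else scored[:limit]


print(search(["ULabel1", "ULabel2", "ULabel3"], "ULabel2"))
# [(100.0, 'ULabel2'), (85.7, 'ULabel1'), (85.7, 'ULabel3')]
```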
- classmethod using(instance)
Use a non-default LaminDB instance.
- Parameters:
instance (str | None) – An instance identifier of form “account_handle/instance_name”.
- Return type:
QuerySet
Examples
>>> ln.ULabel.using("account_handle/instance_name").search("ULabel7", field="name")
              uid  score
name
ULabel7  g7Hk9b2v  100.0
ULabel5  t4Jm6s0q   75.0
ULabel6  r2Xw8p1z   75.0
Methods
- delete()¶
- Return type:
None
- view_lineage(with_successors=False, distance=5)
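view_lineage() is listed without a description. Assuming, based only on its parameter names, that it shows transforms reachable through predecessors (and, with with_successors=True, through successors) up to `distance` hops, the traversal it would need can be sketched as a breadth-first search over a hypothetical in-memory graph. `lineage`, `hops_within`, `invert`, and `PREDECESSORS` are illustrative names; no rendering is attempted.

```python
from collections import deque
from typing import Dict, Set

# Hypothetical transform graph: each transform maps to its direct predecessors.
PREDECESSORS: Dict[str, Set[str]] = {
    "ingest": set(),
    "qc": {"ingest"},
    "train": {"qc"},
    "report": {"qc", "train"},
}


def invert(graph: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Turn a predecessor map into a successor map."""
    successors: Dict[str, Set[str]] = {node: set() for node in graph}
    for node, preds in graph.items():
        for pred in preds:
            successors.setdefault(pred, set()).add(node)
    return successors


def hops_within(start: str, graph: Dict[str, Set[str]], distance: int) -> Dict[str, int]:
    """Breadth-first search returning each reachable node's hop count, up to `distance` hops."""
    hops: Dict[str, int] = {}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == distance:
            continue  # do not expand beyond the distance limit
        for neighbor in graph.get(node, set()):
            if neighbor != start and neighbor not in hops:
                hops[neighbor] = depth + 1
                frontier.append((neighbor, depth + 1))
    return hops


def lineage(
    transform: str,
    predecessors: Dict[str, Set[str]],
    with_successors: bool = False,
    distance: int = 5,
) -> Dict[str, Dict[str, int]]:
    """Collect upstream (and optionally downstream) transforms within `distance` hops."""
    result = {"predecessors": hops_within(transform, predecessors, distance)}
    if with_successors:
        result["successors"] = hops_within(transform, invert(predecessors), distance)
    return result


print(sorted(lineage("report", PREDECESSORS)["predecessors"].items()))
# [('ingest', 2), ('qc', 1), ('train', 1)]
```

Predecessors and successors are traversed separately so that walking upstream and then downstream does not pull in sibling transforms that are neither ancestors nor descendants.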