lamindb.Transform¶
- class lamindb.Transform(key: str | None = None, type: TransformType | None = None, version: str | None = None, description: str | None = None, reference: str | None = None, reference_type: str | None = None, source_code: str | None = None, revises: Transform | None = None, skip_hash_lookup: bool = False)¶
Bases: SQLRecord, IsVersioned

Data transformations such as scripts, notebooks, functions, or pipelines.

A transform can be a function, a script, a notebook, or a pipeline. If you execute a transform, you generate a run (Run). A run has inputs and outputs.

Pipelines are typically created with a workflow tool (Nextflow, Snakemake, Prefect, Flyte, Dagster, redun, Airflow, …).

Transforms are versioned so that a given transform version maps to a given source code version.
Can I sync transforms to git?

If you switch on sync_git_repo, a script-like transform is synced to its hashed state in a git repository upon calling ln.track():

ln.settings.sync_git_repo = "https://github.com/laminlabs/lamindb"
ln.track()

Alternatively, you can create transforms that map pipelines via Transform.from_git().

The definition of transforms and runs is consistent with the OpenLineage specification, where a transform would be called a "job" and a run a "run".

- Parameters:
key (str | None, default: None) – A short name or path-like semantic key.
type (TransformType | None, default: "pipeline") – See TransformType.
version (str | None, default: None) – A version string.
description (str | None, default: None) – A description.
reference (str | None, default: None) – A reference, e.g., a URL.
reference_type (str | None, default: None) – A reference type, e.g., 'url'.
source_code (str | None, default: None) – Source code of the transform.
revises (Transform | None, default: None) – An old version of the transform.
skip_hash_lookup (bool, default: False) – Skip the hash lookup so that a new transform is created even if a transform with the same hash already exists.
Examples
Create a transform for a pipeline:
transform = ln.Transform(key="Cell Ranger", version="7.2.0", type="pipeline").save()
Create a transform from a notebook:
ln.track()
Attributes¶
- property stem_uid: str¶
Universal id characterizing the version family.
The full uid of a record is obtained via concatenating the stem uid and version information:
stem_uid = random_base62(n_char)  # a random base62 sequence of length 12 (transform) or 16 (artifact, collection)
version_uid = "0000"  # an auto-incrementing 4-digit base62 number
uid = f"{stem_uid}{version_uid}"  # concatenate the stem_uid & version_uid
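The composition above can be sketched in plain Python. This is a toy illustration, not lamindb's actual implementation; `random_base62` is reimplemented here only for the example:

```python
import secrets
import string

BASE62 = string.digits + string.ascii_letters  # the 62 characters of base62

def random_base62(n_char: int) -> str:
    """Return a random base62 string of length n_char."""
    return "".join(secrets.choice(BASE62) for _ in range(n_char))

# 12 chars for a transform (16 for artifact, collection)
stem_uid = random_base62(12)
version_uid = "0000"  # auto-incrementing 4-digit base62 counter
uid = f"{stem_uid}{version_uid}"  # the full uid shares the stem across versions

assert len(uid) == 16 and uid.startswith(stem_uid)
```

All versions in a family share the stem, so the family can be recovered from any full uid by slicing off the trailing version characters.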
Simple fields¶
- uid: str¶
Universal id.
- key: str | None¶
A name or “/”-separated path-like string.
All transforms with the same key are part of the same version family.
- description: str | None¶
A description.
- type: TransformType¶
TransformType (default "pipeline").
- source_code: str | None¶
Source code of the transform.
- hash: str | None¶
Hash of the source code.
- reference: str | None¶
Reference for the transform, e.g., a URL.
- reference_type: str | None¶
Reference type of the transform, e.g., ‘url’.
- config: str | None¶
Optional configuration for the transform.
- is_flow: bool¶
Whether this transform is a flow orchestrating other transforms.
- created_at: datetime¶
Time of creation of record.
- updated_at: datetime¶
Time of last update to record.
- version: str | None¶
Version (default None).

Defines the version of a family of records characterized by the same stem_uid.

Consider using semantic versioning with Python versioning.
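One practical reason to parse semantic version strings rather than compare them as raw text: lexicographic string comparison mis-orders multi-digit components. A minimal pure-Python sketch (not lamindb API):

```python
def parse_version(v: str) -> tuple[int, ...]:
    """Parse a dotted semantic version string (optionally 'v'-prefixed) into an integer tuple."""
    return tuple(int(part) for part in v.lstrip("v").split("."))

# Lexicographic comparison gets multi-digit components wrong:
assert "2.10.0" < "2.9.0"
# Numeric tuple comparison orders them correctly:
assert parse_version("2.10.0") > parse_version("2.9.0")
assert parse_version("v7.2.0") == (7, 2, 0)
```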
- is_latest: bool¶
Boolean flag that indicates whether a record is the latest in its version family.
- is_locked: bool¶
Whether the record is locked for edits.
Relational fields¶
- branch: Branch¶
Life cycle state of record.

branch.name can be "main" (the default branch), "trash" (trash), "archive" (archived), or any other user-created branch, typically planned for merging onto main after review.
- predecessors: Transform¶
Preceding transforms.

Allows manually defining preceding transforms. This is typically not necessary, as data lineage is automatically tracked via runs whenever an artifact or collection serves as an input for a run.
- successors: Transform¶
Subsequent transforms.

See predecessors.
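The manual predecessor/successor links form a directed graph over transforms. A pure-Python sketch (not the lamindb API) of how such links compose into transitive lineage:

```python
from collections import defaultdict

# Map each transform key to the keys of its direct predecessors.
predecessors: dict[str, set[str]] = defaultdict(set)

def link(predecessor: str, successor: str) -> None:
    """Record that `predecessor` precedes `successor`."""
    predecessors[successor].add(predecessor)

def ancestors(key: str) -> set[str]:
    """All transitively preceding transforms, via depth-first traversal."""
    seen: set[str] = set()
    stack = list(predecessors[key])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(predecessors[node])
    return seen

# Hypothetical transform keys for illustration:
link("qc.py", "normalize.py")
link("normalize.py", "integrate.py")
assert ancestors("integrate.py") == {"qc.py", "normalize.py"}
```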
- records¶
Accessor to the related objects manager on the forward and reverse sides of a many-to-many relation.
In the example:
class Pizza(Model):
    toppings = ManyToManyField(Topping, related_name='pizzas')

Pizza.toppings and Topping.pizzas are ManyToManyDescriptor instances.

Most of the implementation is delegated to a dynamically defined manager class built by create_forward_many_to_many_manager() defined below.
- blocks: TransformBlock¶
Blocks that annotate this transform.
Class methods¶
- classmethod from_git(url, path, key=None, version=None, entrypoint=None, branch=None, skip_hash_lookup=False)¶
Create a transform from a path in a git repository.
- Parameters:
url (str) – URL of the git repository.
path (str) – Path to the file within the repository.
key (str | None, default: None) – Optional key for the transform.
version (str | None, default: None) – Optional version tag to checkout in the repository.
entrypoint (str | None, default: None) – Optional entrypoint for the transform.
branch (str | None, default: None) – Optional branch to checkout.
skip_hash_lookup (bool, default: False) – Skip the hash lookup so that a new transform is created even if a transform with the same hash already exists.
- Return type:
Transform
Examples
Create from a Nextflow repo and auto-infer the commit hash from its latest version:
transform = ln.Transform.from_git(
    url="https://github.com/openproblems-bio/task_batch_integration",
    path="main.nf",
).save()
Create from a Nextflow repo and checkout a specific version:
transform = ln.Transform.from_git(
    url="https://github.com/openproblems-bio/task_batch_integration",
    path="main.nf",
    version="v2.0.0",
).save()
assert transform.version == "v2.0.0"
Create a sliding transform from a Nextflow repo's dev branch. Unlike a regular transform, a sliding transform doesn't pin a specific source code state, but adapts to whatever the referenced state on the branch is:

transform = ln.Transform.from_git(
    url="https://github.com/openproblems-bio/task_batch_integration",
    path="main.nf",
    branch="dev",
    version="dev",
).save()
Notes
A regular transform pins a specific source code state through its commit hash:
transform.source_code
#> repo: https://github.com/openproblems-bio/task_batch_integration
#> path: main.nf
#> commit: 68eb2ecc52990617dbb6d1bb5c7158d9893796bb
A sliding transform infers the source code state from a branch:
transform.source_code
#> repo: https://github.com/openproblems-bio/task_batch_integration
#> path: main.nf
#> branch: dev
If an entrypoint is provided, it is added to the source code below the path, e.g.:
transform.source_code
#> repo: https://github.com/openproblems-bio/task_batch_integration
#> path: main.nf
#> entrypoint: myentrypoint
#> commit: 68eb2ecc52990617dbb6d1bb5c7158d9893796bb
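Since the reference block stored in source_code is plain "key: value" text, it is straightforward to inspect programmatically. A sketch, assuming only the line format shown in the examples above (the exact storage format is inferred, not guaranteed):

```python
def parse_git_reference(source_code: str) -> dict[str, str]:
    """Split each 'key: value' line on its first colon into a dict."""
    ref = {}
    for line in source_code.strip().splitlines():
        key, _, value = line.partition(":")  # first colon only, so URLs survive
        ref[key.strip()] = value.strip()
    return ref

pinned = """\
repo: https://github.com/openproblems-bio/task_batch_integration
path: main.nf
commit: 68eb2ecc52990617dbb6d1bb5c7158d9893796bb
"""
ref = parse_git_reference(pinned)
assert ref["path"] == "main.nf"
assert "commit" in ref  # a pinned (regular) transform carries a commit

sliding = "repo: https://example.com/repo\npath: main.nf\nbranch: dev"
assert "branch" in parse_git_reference(sliding)  # a sliding transform carries a branch
```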
- classmethod filter(*queries, **expressions)¶
Query records.
- Parameters:
queries – One or multiple Q objects.
expressions – Fields and values passed as Django query expressions.
- Return type:
QuerySet
See also
Guide: Query & search registries
Django documentation: Queries
Examples
>>> ln.Project(name="my label").save()
>>> ln.Project.filter(name__startswith="my").to_dataframe()
- classmethod get(idlike=None, **expressions)¶
Get a single record.
- Parameters:
idlike (int | str | None, default: None) – Either a uid stub, uid or an integer id.
expressions – Fields and values passed as Django query expressions.
- Raises:
lamindb.errors.DoesNotExist – In case no matching record is found.
- Return type:
Transform
See also
Guide: Query & search registries
Django documentation: Queries
Examples
record = ln.Record.get("FvtpPJLJ")
record = ln.Record.get(name="my-label")
- classmethod to_dataframe(include=None, features=False, limit=100, order_by=None)¶
Evaluate and convert to pd.DataFrame.

By default, maps simple fields and foreign keys onto DataFrame columns.

Guide: Query & search registries
- Parameters:
include (str | list[str] | None, default: None) – Related data to include as columns. Takes strings of form "records__name", "cell_types__name", etc. or a list of such strings. For Artifact, Record, and Run, can also pass "features" to include features with data types pointing to entities in the core schema. If "privates", includes private fields (fields starting with _).
features (bool | list[str], default: False) – Configure the features to include. Can be a feature name or a list of such names. If "queryset", infers the features used within the current queryset. Only available for Artifact, Record, and Run.
limit (int, default: 100) – Maximum number of rows to display. If None, includes all results.
order_by – Field name to order the records by. Prefix with '-' for descending order. Defaults to '-id' to get the most recent records. This argument is ignored if the queryset is already ordered or if the specified field does not exist.
- Return type:
DataFrame
Examples
Include the name of the creator:
ln.Record.to_dataframe(include="created_by__name")
Include features:
ln.Artifact.to_dataframe(include="features")
Include selected features:
ln.Artifact.to_dataframe(features=["cell_type_by_expert", "cell_type_by_model"])
- classmethod search(string, *, field=None, limit=20, case_sensitive=False)¶
Search.
- Parameters:
string (str) – The input string to match against the field ontology values.
field (str | DeferredAttribute | None, default: None) – The field or fields to search. Search all string fields by default.
limit (int | None, default: 20) – Maximum amount of top results to return.
case_sensitive (bool, default: False) – Whether the match is case sensitive.
- Return type:
- Returns:
A sorted DataFrame of search results with a score in column score. If return_queryset is True, a QuerySet.
Examples
records = ln.Record.from_values(["Label1", "Label2", "Label3"], field="name").save()
ln.Record.search("Label2")
- classmethod lookup(field=None, return_field=None)¶
Return an auto-complete object for a field.
- Parameters:
field (str | DeferredAttribute | None, default: None) – The field to look up the values for. Defaults to first string field.
return_field (str | DeferredAttribute | None, default: None) – The field to return. If None, returns the whole record.
keep – When multiple records are found for a lookup, how to return the records. "first": return the first record. "last": return the last record. False: return all records.
- Return type:
NamedTuple
- Returns:
A NamedTuple of lookup information of the field values with a dictionary converter.
See also
Examples
Look up via auto-complete:

import bionty as bt
bt.Gene.from_source(symbol="ADGB-DT").save()
lookup = bt.Gene.lookup()
lookup.adgb_dt
Look up via auto-complete in dictionary:
lookup_dict = lookup.dict() lookup_dict['ADGB-DT']
Look up via a specific field:

lookup_by_ensembl_id = bt.Gene.lookup(field="ensembl_gene_id")
lookup_by_ensembl_id.ensg00000002745
Return a specific field value instead of the full record:
lookup_return_symbols = bt.Gene.lookup(field="ensembl_gene_id", return_field="symbol")
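The mechanics behind such an auto-complete object can be sketched in pure Python (this is an illustration, not lamindb's internals): field values are normalized into valid Python identifiers and exposed as namedtuple attributes, with a dict keyed by the original values alongside.

```python
from collections import namedtuple
import re

def build_lookup(values: list[str]):
    """Build a namedtuple-based lookup plus a dict keyed by the raw values."""
    def normalize(v: str) -> str:
        # Replace non-identifier characters and guard a leading digit.
        return re.sub(r"\W|^(?=\d)", "_", v).lower()

    fields = [normalize(v) for v in values]
    Lookup = namedtuple("Lookup", fields)
    return Lookup(*values), {v: v for v in values}

lookup, lookup_dict = build_lookup(["ADGB-DT", "TCF7"])
assert lookup.adgb_dt == "ADGB-DT"   # attribute access, auto-complete friendly
assert lookup_dict["ADGB-DT"] == "ADGB-DT"  # exact-string access
```

In the real API, the tuple entries are full records (or a chosen return_field) rather than the raw strings used here.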
Methods¶
- describe(return_str=False)¶
Describe record including relations.
- Parameters:
return_str (bool, default: False) – Return a string instead of printing.
- Return type:
None | str
- view_lineage(with_successors=False, distance=5)¶
View lineage of transforms.
Note that this only accounts for manually defined predecessors and successors.
Automatically generated lineage through inputs and outputs of runs is not included.
- restore()¶
Restore from trash onto the main branch.
Does not restore descendant records if the record is HasType with is_type = True.
- Return type:
None
- delete(permanent=None, **kwargs)¶
Delete record.
If record is HasType with is_type = True, deletes all descendant records, too.
- Parameters:
permanent (bool | None, default: None) – Whether to permanently delete the record (skips trash). If None, performs soft delete if the record is not already in the trash.
- Return type:
None
Examples
For any SQLRecord object record, call:

>>> record.delete()
- save(*args, **kwargs)¶
Save.
Always saves to the default database.
- Return type:
TypeVar(T, bound=SQLRecord)
- refresh_from_db(using=None, fields=None, from_queryset=None)¶
Reload field values from the database.
By default, the reloading happens from the database this instance was loaded from, or by the read router if this instance wasn’t loaded from any database. The using parameter will override the default.
Fields can be used to specify which fields to reload. The fields should be an iterable of field attnames. If fields is None, then all non-deferred fields are reloaded.
When accessing deferred fields of an instance, the deferred loading of the field will call this method.
- async arefresh_from_db(using=None, fields=None, from_queryset=None)¶