lamindb.Transform¶
- class lamindb.Transform(key: str | None = None, type: TransformType | None = None, version: str | None = None, description: str | None = None, reference: str | None = None, reference_type: str | None = None, source_code: str | None = None, revises: Transform | None = None)¶
Bases:
SQLRecord, IsVersioned
Data transformations such as scripts, notebooks, functions, or pipelines.
A transform can be a function, a script, a notebook, or a pipeline. If you execute a transform, you generate a run (Run). A run has inputs and outputs.
Pipelines are typically created with a workflow tool (Nextflow, Snakemake, Prefect, Flyte, Dagster, redun, Airflow, …).
Transforms are versioned so that a given transform version maps to a given source code version.
Can I sync transforms to git?
If you switch on sync_git_repo, a script-like transform is synced to its hashed state in a git repository upon calling ln.track():
ln.settings.sync_git_repo = "https://github.com/laminlabs/lamindb"
ln.track()
Alternatively, you can create transforms that map pipelines via Transform.from_git().
The definition of transforms and runs is consistent with the OpenLineage specification, where a transform would be called a “job” and a run a “run”.
- Parameters:
key (str | None = None) – A short name or path-like semantic key.
type (TransformType | None = "pipeline") – See TransformType.
version (str | None = None) – A version string.
description (str | None = None) – A description.
reference (str | None = None) – A reference, e.g., a URL.
reference_type (str | None = None) – A reference type, e.g., ‘url’.
source_code (str | None = None) – Source code of the transform.
revises (Transform | None = None) – An old version of the transform.
Notes
Examples
Create a transform for a pipeline:
transform = ln.Transform(key="Cell Ranger", version="7.2.0", type="pipeline").save()
Create a transform from a notebook:
ln.track()
Attributes¶
- property stem_uid: str¶
Universal id characterizing the version family.
The full uid of a record is obtained via concatenating the stem uid and version information:
stem_uid = random_base62(n_char)  # a random base62 sequence of length 12 (transform) or 16 (artifact, collection)
version_uid = "0000"  # an auto-incrementing 4-digit base62 number
uid = f"{stem_uid}{version_uid}"  # concatenate the stem_uid & version_uid
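The concatenation above can be sketched in plain Python. This is a minimal illustration and not LaminDB's implementation: random_base62 is reimplemented here under the assumption that the alphabet is the digits followed by the ASCII letters, and to_base62 is a hypothetical helper that produces the 4-character version suffix.

```python
import secrets
import string

# Assumed base62 alphabet; the ordering LaminDB actually uses may differ.
BASE62 = string.digits + string.ascii_letters


def random_base62(n_char: int) -> str:
    """Return a random base62 string of length n_char."""
    return "".join(secrets.choice(BASE62) for _ in range(n_char))


def to_base62(number: int, width: int = 4) -> str:
    """Encode a non-negative integer as a zero-padded base62 string (hypothetical helper)."""
    digits = ""
    while number:
        number, remainder = divmod(number, 62)
        digits = BASE62[remainder] + digits
    return digits.rjust(width, BASE62[0])


stem_uid = random_base62(12)          # 12 characters for a transform
uid_v1 = f"{stem_uid}{to_base62(0)}"  # first version of the family
uid_v2 = f"{stem_uid}{to_base62(1)}"  # next version, same stem_uid
```

Both uids share the first 12 characters, which is what makes the stem uid characterize the version family.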
Simple fields¶
- uid: str¶
Universal id.
- key: str | None¶
A name or “/”-separated path-like string.
All transforms with the same key are part of the same version family.
- description: str | None¶
A description.
- type: TransformType¶
TransformType (default "pipeline").
- source_code: str | None¶
Source code of the transform.
- hash: str | None¶
Hash of the source code.
- reference: str | None¶
Reference for the transform, e.g., a URL.
- reference_type: str | None¶
Reference type of the transform, e.g., ‘url’.
- created_at: datetime¶
Time of creation of record.
- updated_at: datetime¶
Time of last update to record.
- version: str | None¶
Version (default None).
Defines version of a family of records characterized by the same stem_uid.
Consider using semantic versioning with Python versioning.
- is_latest: bool¶
Boolean flag that indicates whether a record is the latest in its version family.
- is_locked: bool¶
Whether the record is locked for edits.
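How key, version, and is_latest relate within a version family can be illustrated with a toy in-memory model. ToyTransform and add_version are hypothetical names used only for this sketch; the real registry keeps these fields in a database and its bookkeeping may differ.

```python
from dataclasses import dataclass


@dataclass
class ToyTransform:
    """Hypothetical stand-in for a transform record; not the lamindb class."""

    key: str
    version: str
    is_latest: bool = True


def add_version(family: list[ToyTransform], version: str) -> ToyTransform:
    """Append a new version to a family and flag it as the only latest record."""
    if not family:
        raise ValueError("family needs at least one existing record")
    for record in family:
        record.is_latest = False  # older versions lose the flag
    new_record = ToyTransform(key=family[0].key, version=version)
    family.append(new_record)
    return new_record


family = [ToyTransform(key="pipelines/cellranger", version="1")]
newest = add_version(family, version="2")
```

All records in the list share one key, and only the most recently added one has is_latest set.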
Relational fields¶
- predecessors: Transform¶
Preceding transforms.
Allows you to manually define predecessors. This is typically not necessary, as data lineage is automatically tracked via runs whenever an artifact or collection serves as an input for a run.
- successors: Transform¶
Subsequent transforms.
See predecessors.
- blocks: TransformBlock¶
Blocks that annotate this transform.
Class methods¶
- classmethod from_git(url, path, key=None, version=None, entrypoint=None, branch=None)¶
Create a transform from a path in a git repository.
- Parameters:
url (str) – URL of the git repository.
path (str) – Path to the file within the repository.
key (str | None, default: None) – Optional key for the transform.
version (str | None, default: None) – Optional version tag to checkout in the repository.
entrypoint (str | None, default: None) – Optional entrypoint for the transform.
branch (str | None, default: None) – Optional branch to checkout.
- Return type:
Examples
Create from a Nextflow repo and auto-infer the commit hash from its latest version:
transform = ln.Transform.from_git(
    url="https://github.com/openproblems-bio/task_batch_integration",
    path="main.nf",
).save()
Create from a Nextflow repo and checkout a specific version:
transform = ln.Transform.from_git(
    url="https://github.com/openproblems-bio/task_batch_integration",
    path="main.nf",
    version="v2.0.0",
).save()
assert transform.version == "v2.0.0"
Create a sliding transform from a Nextflow repo’s dev branch. Unlike a regular transform, a sliding transform doesn’t pin a specific source code state, but adapts to whatever the referenced state on the branch is:
transform = ln.Transform.from_git(
    url="https://github.com/openproblems-bio/task_batch_integration",
    path="main.nf",
    branch="dev",
    version="dev",
).save()
Notes
A regular transform pins a specific source code state through its commit hash:
transform.source_code
#> repo: https://github.com/openproblems-bio/task_batch_integration
#> path: main.nf
#> commit: 68eb2ecc52990617dbb6d1bb5c7158d9893796bb
A sliding transform infers the source code state from a branch:
transform.source_code
#> repo: https://github.com/openproblems-bio/task_batch_integration
#> path: main.nf
#> branch: dev
If an entrypoint is provided, it is added to the source code below the path, e.g.:
transform.source_code
#> repo: https://github.com/openproblems-bio/task_batch_integration
#> path: main.nf
#> entrypoint: myentrypoint
#> commit: 68eb2ecc52990617dbb6d1bb5c7158d9893796bb
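The source_code layouts shown in these notes are simple "name: value" lines, so they can be read back with a few lines of plain Python. The sketch below is an assumption-laden illustration: parse_source_code and is_sliding are hypothetical helpers that are not part of lamindb, and the parser only handles the flat layout printed above.

```python
def parse_source_code(source_code: str) -> dict[str, str]:
    """Parse 'name: value' lines into a dict, splitting on the first colon only."""
    fields: dict[str, str] = {}
    for line in source_code.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, _, value = line.partition(":")
        fields[name.strip()] = value.strip()
    return fields


def is_sliding(fields: dict[str, str]) -> bool:
    """A sliding transform references a branch instead of pinning a commit."""
    return "branch" in fields and "commit" not in fields


pinned = parse_source_code(
    "repo: https://github.com/openproblems-bio/task_batch_integration\n"
    "path: main.nf\n"
    "commit: 68eb2ecc52990617dbb6d1bb5c7158d9893796bb\n"
)
sliding = parse_source_code(
    "repo: https://github.com/openproblems-bio/task_batch_integration\n"
    "path: main.nf\n"
    "branch: dev\n"
)
```

Splitting on the first colon only keeps the URL in the repo field intact.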
- classmethod filter(*queries, **expressions)¶
Query records.
- Parameters:
queries – One or multiple Q objects.
expressions – Fields and values passed as Django query expressions.
- Return type:
See also
Guide: Query & search registries
Django documentation: Queries
Examples
>>> ln.Project(name="my label").save()
>>> ln.Project.filter(name__startswith="my").to_dataframe()
- classmethod get(idlike=None, **expressions)¶
Get a single record.
- Parameters:
idlike (int | str | None, default: None) – Either a uid stub, uid or an integer id.
expressions – Fields and values passed as Django query expressions.
- Raises:
lamindb.errors.DoesNotExist – In case no matching record is found.
- Return type:
See also
Guide: Query & search registries
Django documentation: Queries
Examples
ulabel = ln.ULabel.get("FvtpPJLJ")
ulabel = ln.ULabel.get(name="my-label")
- classmethod to_dataframe(include=None, features=False, limit=100)¶
Evaluate and convert to pd.DataFrame.
By default, maps simple fields and foreign keys onto DataFrame columns.
Guide: Query & search registries
- Parameters:
include (str | list[str] | None, default: None) – Related data to include as columns. Takes strings of form "records__name", "cell_types__name", etc. or a list of such strings. For Artifact, Record, and Run, can also pass "features" to include features with data types pointing to entities in the core schema. If "privates", includes private fields (fields starting with _).
features (bool | list[str], default: False) – Configure the features to include. Can be a feature name or a list of such names. If "queryset", infers the features used within the current queryset. Only available for Artifact, Record, and Run.
limit (int, default: 100) – Maximum number of rows to display. If None, includes all results.
order_by – Field name to order the records by. Prefix with ‘-’ for descending order. Defaults to ‘-id’ to get the most recent records. This argument is ignored if the queryset is already ordered or if the specified field does not exist.
- Return type:
DataFrame
Examples
Include the name of the creator:
ln.Record.to_dataframe(include="created_by__name")
Include features:
ln.Artifact.to_dataframe(include="features")
Include selected features:
ln.Artifact.to_dataframe(features=["cell_type_by_expert", "cell_type_by_model"])
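The documented behavior of order_by and limit (a ‘-’ prefix for descending order, a silent fallback when the field does not exist, and None meaning no row limit) can be mimicked on a plain list of dicts. This is only a sketch of those rules under stated assumptions: order_and_limit is a hypothetical helper, it does not touch a database, and it does not model the case of an already ordered queryset.

```python
def order_and_limit(rows: list[dict], order_by: str = "-id", limit: "int | None" = 100) -> list[dict]:
    """Order rows by a field name ('-' prefix means descending), then truncate."""
    descending = order_by.startswith("-")
    field = order_by[1:] if descending else order_by
    # Ignore the ordering request if any row lacks the field.
    if rows and all(field in row for row in rows):
        rows = sorted(rows, key=lambda row: row[field], reverse=descending)
    return rows if limit is None else rows[:limit]


rows = [{"id": 1, "key": "a"}, {"id": 3, "key": "c"}, {"id": 2, "key": "b"}]
most_recent_two = order_and_limit(rows, limit=2)
unordered_all = order_and_limit(rows, order_by="missing_field", limit=None)
ascending_by_key = order_and_limit(rows, order_by="key", limit=None)
```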
- classmethod search(string, *, field=None, limit=20, case_sensitive=False)¶
Search.
- Parameters:
string (str) – The input string to match against the field ontology values.
field (str | DeferredAttribute | None, default: None) – The field or fields to search. Search all string fields by default.
limit (int | None, default: 20) – Maximum number of top results to return.
case_sensitive (bool, default: False) – Whether the match is case sensitive.
- Return type:
- Returns:
A sorted DataFrame of search results with a score in column score. If return_queryset is True, a QuerySet.
Examples
>>> ulabels = ln.ULabel.from_values(["ULabel1", "ULabel2", "ULabel3"], field="name")
>>> ln.save(ulabels)
>>> ln.ULabel.search("ULabel2")
- classmethod lookup(field=None, return_field=None)¶
Return an auto-complete object for a field.
- Parameters:
field (str | DeferredAttribute | None, default: None) – The field to look up the values for. Defaults to first string field.
return_field (str | DeferredAttribute | None, default: None) – The field to return. If None, returns the whole record.
keep – When multiple records are found for a lookup, how to return the records:
  - "first": return the first record.
  - "last": return the last record.
  - False: return all records.
- Return type:
NamedTuple
- Returns:
A NamedTuple of lookup information of the field values with a dictionary converter.
See also
Examples
>>> import bionty as bt
>>> bt.settings.organism = "human"
>>> bt.Gene.from_source(symbol="ADGB-DT").save()
>>> lookup = bt.Gene.lookup()
>>> lookup.adgb_dt
>>> lookup_dict = lookup.dict()
>>> lookup_dict['ADGB-DT']
>>> lookup_by_ensembl_id = bt.Gene.lookup(field="ensembl_gene_id")
>>> lookup_by_ensembl_id.ensg00000002745
>>> lookup_return_symbols = bt.Gene.lookup(field="ensembl_gene_id", return_field="symbol")
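The idea of a NamedTuple with a dictionary converter can be sketched with the standard library alone. This is an illustrative assumption about the shape of the returned object and not how lamindb builds it: make_lookup is a hypothetical helper, it stores plain strings rather than records, and the rule for turning values into attribute names (lowercase, non-word characters replaced by underscores) is a guess based on the adgb_dt example.

```python
import re
from collections import namedtuple


def make_lookup(values: list[str]):
    """Build a namedtuple with one attribute per value plus a dict() converter."""
    # Turn each value into an identifier-like attribute name, e.g. "ADGB-DT" -> "adgb_dt".
    # Values that collide after normalization or start with a digit would raise ValueError.
    names = [re.sub(r"\W+", "_", value).lower() for value in values]
    mapping = dict(zip(values, values))
    Lookup = namedtuple("Lookup", names + ["dict"])
    return Lookup(*values, dict=lambda: mapping)


lookup = make_lookup(["ADGB-DT", "TCF7"])
by_attribute = lookup.adgb_dt          # attribute access, convenient for auto-complete
by_original = lookup.dict()["ADGB-DT"]  # dictionary access with the original spelling
```

Attribute access suits interactive auto-complete, while the dictionary keeps the original spelling for values that are not valid Python identifiers.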
- classmethod using(instance)¶
Use a non-default LaminDB instance.
- Parameters:
instance (str | None) – An instance identifier of form “account_handle/instance_name”.
- Return type:
Examples
>>> ln.ULabel.using("account_handle/instance_name").search("ULabel7", field="name")
              uid  score
name
ULabel7  g7Hk9b2v  100.0
ULabel5  t4Jm6s0q   75.0
ULabel6  r2Xw8p1z   75.0
Methods¶
- describe(return_str=False)¶
Describe record including relations.
- Parameters:
return_str (bool, default: False) – Return a string instead of printing.
- Return type:
None | str
- view_lineage(with_successors=False, distance=5)¶
View lineage of transforms.
Note that this only accounts for manually defined predecessors and successors.
Auto-generated lineage through inputs and outputs of runs is not included.
- restore()¶
Restore from trash onto the main branch.
- Return type:
None
- delete(permanent=None, **kwargs)¶
Delete record.
- Parameters:
permanent (bool | None, default: None) – Whether to permanently delete the record (skips trash). If None, performs soft delete if the record is not already in the trash.
- Return type:
None
Examples
For any SQLRecord object record, call:
>>> record.delete()
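The interplay of soft delete, permanent delete, and restore() can be pictured with a toy model. ToyRecord is a hypothetical class written for this sketch, not the lamindb implementation. It assumes that calling delete() with permanent=None on a record already in the trash removes it permanently, which the parameter description suggests but does not state outright.

```python
from dataclasses import dataclass


@dataclass
class ToyRecord:
    """Hypothetical in-memory record with a trash flag; not the lamindb class."""

    name: str
    in_trash: bool = False
    exists: bool = True

    def delete(self, permanent: "bool | None" = None) -> None:
        if permanent is None:
            # Assumption: a record already in the trash is removed for good.
            permanent = self.in_trash
        if permanent:
            self.exists = False
        else:
            self.in_trash = True  # soft delete: move to trash

    def restore(self) -> None:
        self.in_trash = False  # bring the record back from the trash


record = ToyRecord("my-transform")
record.delete()   # first call: soft delete
record.restore()  # back out of the trash
record.delete()   # soft delete again
record.delete()   # already in trash: permanent under the stated assumption
```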
- save(*args, **kwargs)¶
Save.
Always saves to the default database.
- Return type:
TypeVar(T, bound= SQLRecord)