lamindb.core.SOMACurator

class lamindb.core.SOMACurator(experiment_uri, var_index, categoricals=None, obs_columns=FieldAttr(Feature.name), organism=None, sources=None, exclude=None, using_key=None)

Bases: BaseCurator

Curation flow for tiledbsoma.

See also Curator.

Parameters:
  • experiment_uri (UPathStr | Artifact) – A local or cloud path to a tiledbsoma.Experiment.

  • var_index (dict[str, tuple[str, FieldAttr]]) – The registry fields for mapping the .var indices of each measurement, in the form {"measurement name": ("var column", field)}. In .standardize and .add_new_from, refer to these entries by their flattened keys ('{measurement name}__{column name in .var}'); see the output of .var_index.

  • categoricals (dict[str, FieldAttr] | None, default: None) – A dictionary mapping categorical .obs columns to a registry field.

  • obs_columns (FieldAttr, default: FieldAttr(Feature.name)) – The registry field for mapping the names of the .obs columns.

  • organism (str | None, default: None) – The organism name.

  • sources (dict[str, Record] | None, default: None) – A dictionary mapping .obs columns to Source records.

  • exclude (dict[str, str | list[str]] | None, default: None) – A dictionary mapping column names to values to exclude from validation. Use this when specific Source instances are pinned and may lack default values such as “unknown” or “na”, so that those values are not validated.
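
As a rough illustration of what the exclude mapping expresses, the sketch below drops excluded values from a column before they would be checked. It is a conceptual, pure-Python stand-in and not lamindb's implementation; the helper values_to_validate and the sample values are hypothetical.

```python
def values_to_validate(column, values, exclude=None):
    """Return the distinct values of `column` that are not excluded.

    Hypothetical helper: it only mimics the idea behind `exclude`.
    """
    excluded = (exclude or {}).get(column, [])
    # `exclude` accepts a single string or a list of strings per column.
    if isinstance(excluded, str):
        excluded = [excluded]
    kept = []
    for value in values:
        if value not in excluded and value not in kept:
            kept.append(value)
    return kept


# Assumed sample data for one categorical .obs column.
exclude = {"cell_type_ontology_id": ["unknown", "na"]}
obs_values = ["CL:0000084", "unknown", "CL:0000236", "na", "CL:0000084"]

remaining = values_to_validate("cell_type_ontology_id", obs_values, exclude)
print(remaining)
```

Only the two ontology IDs remain as candidates for validation; the placeholder values are skipped.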

Examples

>>> import lamindb as ln
>>> import bionty as bt
>>> curator = ln.Curator.from_tiledbsoma(
...     "./my_array_store.tiledbsoma",
...     var_index={"RNA": ("var_id", bt.Gene.symbol)},
...     categoricals={
...         "cell_type_ontology_id": bt.CellType.ontology_id,
...         "donor_id": ln.ULabel.name
...     },
...     organism="human",
... )

Attributes

property categoricals: dict[str, DeferredAttribute]

Return the obs fields to validate against.

property non_validated: dict[str, list]

Return the non-validated features and labels.

property var_index: dict[str, DeferredAttribute]

Return the registry fields with flattened keys to validate variable indices against.
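
The flattened key form '{measurement name}__{column name in .var}' can be sketched in plain Python as follows. The helper flatten_var_index_keys is hypothetical and a string stands in for the registry field, so this shows only the key convention, not how lamindb builds the property.

```python
def flatten_var_index_keys(var_index):
    """Map {"measurement": ("var column", field)} to {"measurement__var column": field}."""
    return {
        f"{measurement}__{column}": field
        for measurement, (column, field) in var_index.items()
    }


# A plain string stands in for the registry field (e.g. bt.Gene.symbol),
# so the sketch runs without lamindb or bionty installed.
var_index = {"RNA": ("var_id", "bt.Gene.symbol")}
flat = flatten_var_index_keys(var_index)
print(list(flat))  # ['RNA__var_id']
```

A key such as "RNA__var_id" is what .standardize and .add_new_from expect for columns in .var.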

Methods

add_new_from(key)

Add validated & new categories.

Parameters:

key (str) – The key referencing the slot in the tiledbsoma store. It should be '{measurement name}__{column name in .var}' for columns in .var or a column name in .obs.

Return type:

None

lookup(using_key=None, public=False)

Lookup categories.

Parameters:

using_key (str | None, default: None) – The instance where the lookup is performed. If “public”, the lookup is performed on the public reference.

Return type:

CurateLookup

save_artifact(description=None, key=None, revises=None, run=None)

Save the validated tiledbsoma store and metadata.

Parameters:
  • description (str | None, default: None) – A description of the tiledbsoma store.

  • key (str | None, default: None) – A path-like key to reference the artifact in default storage, e.g., "myfolder/mystore.tiledbsoma". Artifacts with the same key form a revision family.

  • revises (Artifact | None, default: None) – Previous version of the artifact. Triggers a revision.

  • run (Run | None, default: None) – The run that creates the artifact.

Return type:

Artifact

Returns:

A saved artifact record.

standardize(key)

Replace synonyms with standardized values.

Modifies the dataset in place.

Parameters:

key (str) – The key referencing the slot in the tiledbsoma store. It should be '{measurement name}__{column name in .var}' for columns in .var or a column name in .obs.

validate()

Validate categories.
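
One way the methods above might be combined is sketched below. The helper curate_and_save is hypothetical, and it assumes that the keys of non_validated can be passed directly to standardize and add_new_from and that non_validated is refreshed by each validate() call; neither assumption is stated on this page. Registering every remaining value automatically is a simplification; in practice you would likely review the non-validated values first.

```python
def curate_and_save(curator, description=None, key=None):
    """Validate, fix what can be fixed, then save (hypothetical helper).

    `curator` is any object exposing validate(), standardize(key),
    add_new_from(key), save_artifact(...) and a `non_validated` dict.
    """
    curator.validate()

    # First pass: replace synonyms with standardized values, in place.
    for slot in list(curator.non_validated):
        curator.standardize(slot)
    curator.validate()

    # Second pass: register whatever is still unknown as new categories.
    for slot in list(curator.non_validated):
        curator.add_new_from(slot)
    curator.validate()

    # Refuse to save while anything remains non-validated.
    if curator.non_validated:
        raise ValueError(f"Still non-validated: {sorted(curator.non_validated)}")

    return curator.save_artifact(description=description, key=key)
```

With a curator created as in the Examples section, a call could look like curate_and_save(curator, description="my store", key="myfolder/mystore.tiledbsoma").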