lamindb.core.MuDataCurator

class lamindb.core.MuDataCurator(mdata, var_index, categoricals=None, using_key='default', verbosity='hint', organism=None, sources=None, exclude=None)

Bases: object

Curation flow for a MuData object.

See also Curator.

Note that if genes or other measurements are removed from the MuData object, the object should be recreated using from_mudata().

Parameters:
  • mdata (MuData) – The MuData object to curate.

  • var_index (dict[str, dict[str, DeferredAttribute]]) – The registry field for mapping the .var index for each modality. For example: {"modality_1": bt.Gene.ensembl_gene_id, "modality_2": ln.CellMarker.name}

  • categoricals (dict[str, DeferredAttribute] | None, default: None) – A dictionary mapping .obs.columns to a registry field. Use modality-prefixed keys to specify categoricals for MuData slots, for example {"rna:cell_type": bt.CellType.name}.

  • using_key (str, default: 'default') – A reference LaminDB instance.

  • verbosity (str, default: 'hint') – The verbosity level.

  • organism (str | None, default: None) – The organism name.

  • sources (dict[str, Record] | None, default: None) – A dictionary mapping .obs.columns to Source records.

  • exclude (dict | None, default: None) – A dictionary mapping column names to values to exclude.
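The modality-prefixed keys accepted by categoricals can be illustrated with a small, library-independent sketch. The helpers split_modality_key and group_by_modality below are hypothetical and are not part of lamindb. The sketch assumes that ":" is the separator and that a key without a prefix refers to a column of the top-level .obs; how MuDataCurator parses keys internally may differ.

```python
from typing import Any, Dict, Optional, Tuple


def split_modality_key(key: str, sep: str = ":") -> Tuple[Optional[str], str]:
    """Split "rna:cell_type" into ("rna", "cell_type").

    Hypothetical helper, not a lamindb function. A key without a separator
    is assumed to name a top-level .obs column and gets modality None.
    """
    modality, found, column = key.partition(sep)
    if not found:
        return None, key
    return modality, column


def group_by_modality(categoricals: Dict[str, Any]) -> Dict[Optional[str], Dict[str, Any]]:
    """Group a flat categoricals mapping by modality (None = top-level .obs)."""
    grouped: Dict[Optional[str], Dict[str, Any]] = {}
    for key, field in categoricals.items():
        modality, column = split_modality_key(key)
        grouped.setdefault(modality, {})[column] = field
    return grouped
```

Grouping the keys this way makes it easy to check, before constructing the curator, that every modality prefix used in categoricals also appears as a key of var_index.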

Examples

>>> import bionty as bt
>>> import lamindb as ln
>>> curate = ln.Curator.from_mudata(
...     mdata,
...     var_index={
...         "rna": bt.Gene.ensembl_gene_id,
...         "adt": ln.CellMarker.name
...     },
...     categoricals={
...         "cell_type_ontology_id": bt.CellType.ontology_id,
...         "donor_id": ln.ULabel.name
...     },
...     organism="human",
... )

Attributes

property categoricals: dict

Return the obs fields to validate against.

property var_index: DeferredAttribute

Return the registry field to validate variables index against.

Methods

add_new_from(key, modality=None, organism=None, **kwargs)

Add validated & new categories.

Parameters:
  • key (str) – The key referencing the slot in the DataFrame.

  • modality (str | None, default: None) – The modality name.

  • organism (str | None, default: None) – The organism name.

  • **kwargs – Additional keyword arguments to pass to the registry model.

add_new_from_columns(modality, column_names=None, organism=None, **kwargs)

Update column records.

Parameters:
  • modality (str) – The modality name.

  • column_names (list[str] | None, default: None) – The column names to save.

  • organism (str | None, default: None) – The organism name.

  • **kwargs – Additional keyword arguments to pass to the registry model.

add_new_from_var_index(modality, organism=None, **kwargs)

Update variable records.

Parameters:
  • modality (str) – The modality name.

  • organism (str | None, default: None) – The organism name.

  • **kwargs – Additional keyword arguments to pass to the registry model.

lookup(using_key=None, public=False)

Lookup categories.

Parameters:
  • using_key (str | None, default: None) – The instance where the lookup is performed. If None (default), the lookup is performed on the instance specified in the using_key parameter of the curator. If "public", the lookup is performed on the public reference.

Return type:

CurateLookup

save_artifact(description=None, key=None, revises=None, run=None)

Save the validated MuData and metadata.

Parameters:
  • description (str | None, default: None) – A description of the MuData object.

  • key (str | None, default: None) – A path-like key to reference the artifact in default storage, e.g., "myfolder/myfile.fcs". Artifacts with the same key form a revision family.

  • revises (Artifact | None, default: None) – Previous version of the artifact. Triggers a revision.

  • run (Run | None, default: None) – The run that creates the artifact.

Return type:

Artifact

Returns:

A saved artifact record.

validate(organism=None)

Validate categories.

Return type:

bool
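
The methods documented above are typically combined into a validate, register, save sequence. The following is a minimal sketch of that sequence, not a definitive implementation. The function curate_and_save and its keys_to_register argument (a list of (key, modality) pairs) are hypothetical. The curator argument is assumed to be any object exposing validate(), add_new_from() and save_artifact() with the signatures listed on this page.

```python
def curate_and_save(curator, keys_to_register, description=None):
    """Validate, register new categories if needed, then save.

    Hypothetical helper, not part of lamindb. `curator` is assumed to expose
    validate(), add_new_from(key, modality=...) and
    save_artifact(description=...) as documented on this page.
    """
    # validate() returns a bool, so a failed validation can be handled here.
    if not curator.validate():
        # Register the not-yet-validated categories for each requested key.
        for key, modality in keys_to_register:
            curator.add_new_from(key, modality=modality)
        # Validate again; refuse to save if the object still fails.
        if not curator.validate():
            raise ValueError("MuData object still fails validation")
    return curator.save_artifact(description=description)
```

Raising on a second failed validation is a design choice of this sketch: it keeps an object that still contains unvalidated categories from being saved as an artifact.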