lamindb.core.MuDataCurator

class lamindb.core.MuDataCurator(mdata, var_index, categoricals=None, using_key=None, verbosity='hint', organism=None, sources=None, exclude=None)

Bases: object

Curation flow for a MuData object.

See also Curator.

Note that if genes or other measurements are removed from the MuData object, the curator object should be recreated using from_mudata().

Parameters:
  • mdata (MuData) – The MuData object to curate.

  • var_index (dict[str, DeferredAttribute]) – The registry field for mapping the .var index for each modality. For example: {"modality_1": bt.Gene.ensembl_gene_id, "modality_2": bt.CellMarker.name}

  • categoricals (dict[str, DeferredAttribute] | None, default: None) – A dictionary mapping .obs.columns to a registry field. Use modality keys to specify categoricals for MuData slots, such as {"rna:cell_type": bt.CellType.name}.

  • using_key (str | None, default: None) – A reference LaminDB instance.

  • verbosity (str, default: 'hint') – The verbosity level.

  • organism (str | None, default: None) – The organism name.

  • sources (dict[str, Record] | None, default: None) – A dictionary mapping .obs.columns to Source records.

  • exclude (dict | None, default: None) – A dictionary mapping column names to values to exclude from validation. When specific Source instances are pinned and may lack default values (e.g., “unknown” or “na”), using the exclude parameter ensures they are not validated.
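
The modality-prefixed keys accepted by categoricals can be read as a "modality:column" pair. The following is a minimal sketch of that convention, not the library's own parsing code. The helper name split_categorical_key is hypothetical, and it assumes that a key without a prefix refers to the top-level .obs of the MuData object.

```python
from typing import Optional, Tuple


def split_categorical_key(key: str) -> Tuple[Optional[str], str]:
    """Split a categoricals key into (modality, column).

    Hypothetical helper for illustration only. Assumes that keys without
    a "modality:" prefix refer to the top-level .obs.
    """
    modality, sep, column = key.partition(":")
    if not sep:
        # No prefix found: treat the whole key as a top-level .obs column.
        return None, key
    return modality, column


print(split_categorical_key("rna:cell_type"))
print(split_categorical_key("donor_id"))
```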

Examples

>>> import bionty as bt
>>> import lamindb as ln
>>> curator = ln.Curator.from_mudata(
...     mdata,
...     var_index={
...         "rna": bt.Gene.ensembl_gene_id,
...         "adt": bt.CellMarker.name
...     },
...     categoricals={
...         "cell_type_ontology_id": bt.CellType.ontology_id,
...         "donor_id": ln.ULabel.name
...     },
...     organism="human",
... )

Attributes

property categoricals: dict

Return the obs fields to validate against.

property non_validated: dict[str, dict[str, list[str]]]

Return the non-validated features and labels.
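
Given the return type dict[str, dict[str, list[str]]], the result can be walked with two nested loops. This is a minimal sketch, assuming the outer keys name a modality (or the top-level .obs) and the inner keys name the validated field. The sample dictionary and the helper summarize_non_validated are made up for illustration and are not output from LaminDB.

```python
from typing import Dict, List


def summarize_non_validated(
    non_validated: Dict[str, Dict[str, List[str]]]
) -> List[str]:
    """Return one summary line per field that has non-validated values."""
    lines = []
    for outer_key, fields in non_validated.items():
        for field_key, values in fields.items():
            if values:  # skip fields where everything validated
                lines.append(
                    f"{outer_key}/{field_key}: {len(values)} non-validated "
                    f"({', '.join(values)})"
                )
    return lines


# Hypothetical data shaped like the documented return type.
example = {
    "rna": {"var_index": ["ENSG_FAKE_1"]},
    "obs": {"donor_id": ["donor_x", "donor_y"], "cell_type_ontology_id": []},
}

for line in summarize_non_validated(example):
    print(line)
```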

property var_index: DeferredAttribute

Return the registry field against which the variables index is validated.

Methods

add_new_from(key, modality=None, organism=None, **kwargs)

Add validated & new categories.

Parameters:
  • key (str) – The key referencing the slot in the DataFrame.

  • modality (str | None, default: None) – The modality name.

  • organism (str | None, default: None) – The organism name.

  • **kwargs – Additional keyword arguments to pass to create new records.

add_new_from_columns(modality, column_names=None, organism=None, **kwargs)

Update column records.

add_new_from_var_index(modality, organism=None, **kwargs)

Update variable records.

Parameters:
  • modality (str) – The modality name.

  • organism (str | None, default: None) – The organism name.

  • **kwargs – Additional keyword arguments to pass to create new records.

lookup(using_key=None, public=False)

Lookup categories.

Parameters:
  • using_key (str | None, default: None) – The instance where the lookup is performed. If “public”, the lookup is performed on the public reference.

Return type:

CurateLookup

save_artifact(description=None, key=None, revises=None, run=None)

Save the validated MuData and metadata.

Parameters:
  • description (str | None, default: None) – A description of the MuData object.

  • key (str | None, default: None) – A path-like key to reference the artifact in default storage, e.g., "myfolder/myfile.fcs". Artifacts with the same key form a revision family.

  • revises (Artifact | None, default: None) – Previous version of the artifact. Triggers a revision.

  • run (Run | None, default: None) – The run that creates the artifact.

Return type:

Artifact

Returns:

A saved artifact record.

standardize(key, modality=None)

Replace synonyms with standardized values.

Parameters:
  • key (str) – The key referencing the slot in the MuData.

  • modality (str | None, default: None) – The modality name.

Modifies the dataset in place.

validate(organism=None)

Validate categories.

Return type:

bool