lamindb.curators.MuDataCatManager

class lamindb.curators.MuDataCatManager(mdata, var_index=None, categoricals=None, verbosity='hint', organism=None, sources=None)

Bases: CatManager

Curation flow for a MuData object.

Parameters:
  • mdata (MuData | Artifact) – The MuData object to curate.

  • var_index (dict[str, FieldAttr] | None, default: None) – The registry field for mapping the .var index of each modality. For example: {"modality_1": bt.Gene.ensembl_gene_id, "modality_2": bt.CellMarker.name}

  • categoricals (dict[str, FieldAttr] | None, default: None) – A dictionary mapping .obs.columns to a registry field. Prefix a column with its modality key to specify categoricals for a MuData slot, for example: {"rna:cell_type": bt.CellType.name}

  • verbosity (str, default: 'hint') – The verbosity level.

  • organism (str | None, default: None) – The organism name.

  • sources (dict[str, Record] | None, default: None) – A dictionary mapping .obs.columns to Source records.
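The "modality:column" naming convention for categoricals keys can be illustrated with a small, library-independent sketch. The helper split_categorical_key and the fallback slot name "obs" are hypothetical and exist only to show the convention; this is not lamindb's implementation.

```python
def split_categorical_key(key: str, modalities: set[str]) -> tuple[str, str]:
    """Split a categoricals key into (slot, column).

    Hypothetical helper: a key prefixed with a known modality name and a
    colon is attributed to that modality. Any other key is attributed to
    the top-level .obs, labelled "obs" here by assumption.
    """
    prefix, sep, column = key.partition(":")
    if sep and prefix in modalities:
        return prefix, column
    return "obs", key


modalities = {"rna", "adt"}
print(split_categorical_key("rna:cell_type", modalities))
print(split_categorical_key("donor_id", modalities))
```

Checking the prefix against the known modality names keeps column names that merely contain a colon from being misread as modality-scoped.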

Example:

import lamindb as ln
import bionty as bt

# mdata is assumed to be a MuData object with "rna" and "adt" modalities
curator = ln.curators.MuDataCatManager(
    mdata,
    var_index={
        "rna": bt.Gene.ensembl_gene_id,
        "adt": bt.CellMarker.name
    },
    categoricals={
        "cell_type_ontology_id": bt.CellType.ontology_id,
        "donor_id": ln.ULabel.name
    },
)
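Once a curator exists, the methods documented on this page can be combined into a curation loop. The following is a minimal sketch of one plausible ordering, not a prescribed workflow: the function curate_key is hypothetical, and it only assumes an object exposing validate(), standardize(key, modality=...) and add_new_from(key, modality=...) as listed under Methods.

```python
def curate_key(curator, key, modality=None):
    """Validate, then standardize and register new categories for one key.

    `curator` is any object exposing the validate(), standardize() and
    add_new_from() methods documented on this page.
    Returns the result of the last validate() call.
    """
    if curator.validate():
        return True
    # Map synonyms to standardized names first, so that only genuinely
    # new categories are registered afterwards.
    curator.standardize(key, modality=modality)
    if curator.validate():
        return True
    # Register whatever is still unknown, then check again.
    curator.add_new_from(key, modality=modality)
    return curator.validate()
```

If the function returns True, save_artifact() can then be called to save the annotated artifact. Standardizing before adding new records is a design choice of this sketch: it avoids registering a synonym as a new category.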

Attributes

property categoricals: dict

Return the obs fields to validate against.

property non_validated: dict[str, dict[str, list[str]]]

Return the non-validated features and labels.
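The nested return type suggests a two-level mapping. The sketch below assumes the outer keys are modality names and the inner keys are the validated slots (for example an obs column), which the type annotation alone does not confirm. The helper flatten_non_validated and the sample values are made up for illustration.

```python
def flatten_non_validated(non_validated):
    """Flatten a dict[str, dict[str, list[str]]] into (outer, inner, value) rows."""
    rows = []
    for modality, per_key in non_validated.items():
        for key, values in per_key.items():
            for value in values:
                rows.append((modality, key, value))
    return rows


# Made-up example data with the documented shape.
example = {
    "rna": {"cell_type": ["T-cell", "B cel"]},
    "obs": {"donor_id": ["donor_42"]},
}
for row in flatten_non_validated(example):
    print(row)
```

A flat list like this is convenient for logging or for deciding, key by key, whether to call standardize() or add_new_from().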

property var_index: DeferredAttribute

Return the registry field to validate the variable index against.

Class methods

classmethod from_anndata(data, var_index, categoricals=None, obs_columns=FieldAttr(Feature.name), verbosity='hint', organism=None, sources=None)
Return type:

AnnDataCatManager

classmethod from_df(df, categoricals=None, columns=FieldAttr(Feature.name), verbosity='hint', organism=None)
Return type:

DataFrameCatManager

classmethod from_mudata(mdata, var_index, categoricals=None, verbosity='hint', organism=None)
Return type:

MuDataCatManager

classmethod from_spatialdata(sdata, var_index, categoricals=None, organism=None, sources=None, verbosity='hint', *, sample_metadata_key='sample')

classmethod from_tiledbsoma(experiment_uri, var_index, categoricals=None, obs_columns=FieldAttr(Feature.name), organism=None, sources=None)
Return type:

TiledbsomaCatManager

Methods

add_new_from(key, modality=None, **kwargs)

Add validated & new categories.

Parameters:
  • key (str) – The key referencing the slot in the MuData.

  • modality (str | None, default: None) – The modality name.

  • organism – The organism name.

  • **kwargs – Additional keyword arguments to pass to create new records.

add_new_from_columns(modality, column_names=None, **kwargs)

add_new_from_var_index(modality, **kwargs)

Update variable records.

Parameters:
  • modality (str) – The modality name.

  • organism – The organism name.

  • **kwargs – Additional keyword arguments to pass to create new records.

lookup(public=False)

Lookup categories.

Parameters:
  • public (bool, default: False) – Perform lookup on public source ontologies.

Return type:

CurateLookup

save_artifact(*, key=None, description=None, revises=None, run=None)

Save an annotated artifact.

Parameters:
  • key (str | None, default: None) – A path-like key to reference the artifact in default storage, e.g., "myfolder/myfile.fcs". Artifacts with the same key form a version family.

  • description (str | None, default: None) – A description.

  • revises (Artifact | None, default: None) – Previous version of the artifact. An alternative to passing key for triggering a new version.

  • run (Run | None, default: None) – The run that creates the artifact.

Return type:

Artifact

Returns:

A saved artifact record.

standardize(key, modality=None)

Replace synonyms with standardized values.

Parameters:
  • key (str) – The key referencing the slot in the MuData.

  • modality (str | None, default: None) – The modality name.

Modifies the dataset in place.

validate()

Validate categories.

Return type:

bool