lamindb.core.AnnDataCatManager

class lamindb.core.AnnDataCatManager(data, var_index, categoricals=None, obs_columns=FieldAttr(Feature.name), verbosity='hint', organism=None, sources=None, exclude=None)

Bases: CatManager

Manage categorical curation.

Parameters:
  • data (AnnData | Artifact) – The AnnData object or an AnnData-like path.

  • var_index (DeferredAttribute) – The registry field for mapping the .var index.

  • categoricals (dict[str, DeferredAttribute] | None, default: None) – A dictionary mapping .obs.columns to a registry field.

  • obs_columns (DeferredAttribute, default: FieldAttr(Feature.name)) – The registry field for mapping the .obs.columns.

  • verbosity (str, default: 'hint') – The verbosity level.

  • organism (str | None, default: None) – The organism name.

  • sources (dict[str, Record] | None, default: None) – A dictionary mapping .obs.columns to Source records.

  • exclude (dict | None, default: None) – A dictionary mapping column names to values to exclude from validation. This is useful when specific Source instances are pinned that may lack default values (e.g., “unknown” or “na”): listing those values here ensures they are not validated.
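
The effect of exclude can be pictured with a plain-Python sketch. This is not LaminDB's implementation: the helper name split_validated, the set of allowed terms, and the sample values are all hypothetical and only show that excluded values are skipped rather than reported as non-validated.

```python
def split_validated(values, allowed, exclude=()):
    """Partition values into validated and non-validated, skipping excluded ones.

    Hypothetical helper for illustration; not part of lamindb.
    """
    validated, non_validated = [], []
    for value in dict.fromkeys(values):  # unique values, original order kept
        if value in exclude:
            continue  # excluded values are never checked
        (validated if value in allowed else non_validated).append(value)
    return validated, non_validated


# Hypothetical column content and reference terms.
cell_types = ["T cell", "B cell", "unknown", "T cell", "Bcell"]
allowed_terms = {"T cell", "B cell"}

without_exclude = split_validated(cell_types, allowed_terms)
with_exclude = split_validated(cell_types, allowed_terms, exclude={"unknown"})
```

Without exclude, the placeholder value “unknown” lands among the non-validated terms; with it, only the genuinely unrecognized term remains.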

Examples

>>> import lamindb as ln
>>> import bionty as bt
>>> curator = ln.Curator.from_anndata(
...     adata,
...     var_index=bt.Gene.ensembl_gene_id,
...     categoricals={
...         "cell_type_ontology_id": bt.CellType.ontology_id,
...         "donor_id": ln.ULabel.name,
...     },
...     organism="human",
... )

Attributes

property categoricals: dict

Return the obs fields to validate against.

property non_validated: dict[str, list[str]]

Return the non-validated features and labels.

property var_index: DeferredAttribute

Return the registry field to validate the variables index against.

Class methods

classmethod from_anndata(data, var_index, categoricals=None, obs_columns=FieldAttr(Feature.name), verbosity='hint', organism=None, sources=None)

Return type:

AnnDataCatManager

classmethod from_df(df, categoricals=None, columns=FieldAttr(Feature.name), verbosity='hint', organism=None)

Return type:

DataFrameCatManager

classmethod from_mudata(mdata, var_index, categoricals=None, verbosity='hint', organism=None)

Return type:

MuDataCatManager

classmethod from_spatialdata(sdata, var_index, categoricals=None, organism=None, sources=None, exclude=None, verbosity='hint', *, sample_metadata_key='sample')

classmethod from_tiledbsoma(experiment_uri, var_index, categoricals=None, obs_columns=FieldAttr(Feature.name), organism=None, sources=None, exclude=None)

Return type:

TiledbsomaCatManager

Methods

add_new_from(key, **kwargs)

Add validated & new categories.

Parameters:
  • key (str) – The key referencing the slot in the DataFrame from which to draw terms.

  • organism – The organism name.

  • **kwargs – Additional keyword arguments to pass to create new records.

add_new_from_var_index(**kwargs)

Update variable records.

Parameters:
  • organism – The organism name.

  • **kwargs – Additional keyword arguments to pass to create new records.

lookup(public=False)

Look up categories.

Parameters:

public (bool, default: False) – If True, the lookup is performed on the public reference.

Return type:

CurateLookup

save_artifact(*, key=None, description=None, revises=None, run=None)

Save an annotated artifact.

Parameters:
  • key (str | None, default: None) – A path-like key to reference the artifact in default storage, e.g., "myfolder/myfile.fcs". Artifacts with the same key form a version family.

  • description (str | None, default: None) – A description.

  • revises (Artifact | None, default: None) – Previous version of the artifact. An alternative to passing key for triggering a new version.

  • run (Run | None, default: None) – The run that creates the artifact.

Return type:

Artifact

Returns:

A saved artifact record.

standardize(key)

Replace synonyms with standardized values.

Parameters:

key (str) – The key referencing the slot in adata.obs from which to draw terms. Same as the key in categoricals.

  • If “var_index”, standardize the var.index.

  • If “all”, standardize all obs columns and var.index.

The dataset is modified in place.
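
As a rough illustration of what replacing synonyms in place means, the sketch below applies a synonym map to one column of a plain dictionary. It does not reproduce LaminDB's logic: the helper standardize_in_place, the synonym map, and the sample values are hypothetical stand-ins for the registry lookup that standardize performs.

```python
def standardize_in_place(columns, key, synonyms):
    """Replace synonyms in columns[key] with standardized names, in place.

    Hypothetical helper for illustration; not part of lamindb.
    """
    column = columns[key]
    for i, value in enumerate(column):
        # Values without a known synonym are left unchanged.
        column[i] = synonyms.get(value, value)


# Hypothetical obs-like data and synonym map.
obs = {"cell_type": ["T-cell", "B cell", "T lymphocyte"]}
synonym_map = {"T-cell": "T cell", "T lymphocyte": "T cell"}

returned = standardize_in_place(obs, "cell_type", synonym_map)
```

The function returns nothing; the caller's data is what changes, which mirrors the in-place behavior described above.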

validate()

Validate categories.

This method also registers the validated records in the current instance.

Parameters:

organism – The organism name.

Return type:

bool

Returns:

Whether the AnnData object is validated.
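
The methods documented above can be chained into a small curation routine. The following is a minimal sketch, not a recommended workflow: curate_and_save is a hypothetical helper, FakeCurator is a stand-in that only mimics the method names so the sketch runs without a LaminDB instance, and the assumption that the variables index is reported under the key "var_index" in non_validated should be checked against your installed version.

```python
def curate_and_save(curator, key):
    """Validate, try to fix, and save; return None if validation still fails.

    `curator` is expected to behave like an AnnDataCatManager.
    """
    if not curator.validate():
        # Map known synonyms to standardized names first.
        curator.standardize("all")
        curator.validate()
        # Register whatever is still unknown. Assumption: the var index is
        # reported under the key "var_index".
        for name in list(curator.non_validated):
            if name == "var_index":
                curator.add_new_from_var_index()
            else:
                curator.add_new_from(name)
        if not curator.validate():
            return None
    return curator.save_artifact(key=key)


class FakeCurator:
    """Stand-in so the sketch runs without a LaminDB instance."""

    def __init__(self):
        self.non_validated = {"cell_type": ["Bcell"], "var_index": ["GENE_X"]}
        self.calls = []

    def validate(self):
        self.calls.append("validate")
        return not self.non_validated

    def standardize(self, key):
        self.calls.append(f"standardize:{key}")

    def add_new_from(self, key):
        self.calls.append(f"add_new_from:{key}")
        self.non_validated.pop(key)

    def add_new_from_var_index(self):
        self.calls.append("add_new_from_var_index")
        self.non_validated.pop("var_index")

    def save_artifact(self, *, key=None):
        self.calls.append(f"save_artifact:{key}")
        return f"artifact:{key}"


fake = FakeCurator()
result = curate_and_save(fake, key="datasets/example.h5ad")
```

In practice it is usually worth inspecting non_validated before registering new terms, since a typo would otherwise be saved as a new record.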