lamindb.core.AnnDataCurator

class lamindb.core.AnnDataCurator(data, var_index, categoricals=None, obs_columns=FieldAttr(Feature.name), using_key='default', verbosity='hint', organism=None, sources=None, exclude=None)

Bases: DataFrameCurator

Curation flow for AnnData.

See also Curator.

Note that if genes are removed from the AnnData object, the curator object should be recreated using from_anndata().

See "Curate AnnData based on the CELLxGENE schema" for instructions on how to curate against a specific CELLxGENE schema version.

Parameters:
  • data (ad.AnnData | UPathStr) – The AnnData object or an AnnData-like path.

  • var_index (FieldAttr) – The registry field for mapping the .var index.

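  • categoricals (dict[str, FieldAttr] | None, default: None) – A dictionary mapping .obs.columns to a registry field.

  • obs_columns (FieldAttr, default: Feature.name) – The registry field for mapping the .obs column names.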

  • using_key (str, default: 'default') – A reference LaminDB instance.

  • verbosity (str, default: 'hint') – The verbosity level.

  • organism (str | None, default: None) – The organism name.

  • sources (dict[str, Record] | None, default: None) – A dictionary mapping .obs.columns to Source records.

  • exclude (dict | None, default: None) – A dictionary mapping column names to values to exclude.

Examples

>>> import lamindb as ln
>>> import bionty as bt
>>> curate = ln.Curator.from_anndata(
...     adata,
...     var_index=bt.Gene.ensembl_gene_id,
...     categoricals={
...         "cell_type_ontology_id": bt.CellType.ontology_id,
...         "donor_id": ln.ULabel.name
...     },
...     organism="human",
... )

Attributes

property categoricals: dict

Return the obs fields to validate against.

property fields: dict

Return the column fields to validate against.

property non_validated: list

Return the non-validated features and labels.

property var_index: DeferredAttribute

Return the registry field to validate the variables index against.
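
The properties above can be read together to see what a curator will check and what it has flagged. The following is a minimal sketch, not part of lamindb: validation_report is a hypothetical helper, and it assumes curator is an instance created with from_anndata() as in the example above. It calls validate() first so that non_validated reflects the latest run, and it passes that value through without assuming its internal layout.

```python
def validation_report(curator, organism=None):
    """Hypothetical helper: gather a curator's documented properties in one dict."""
    # Run validation first so that `non_validated` reflects the current data.
    is_valid = curator.validate(organism=organism)
    return {
        "validated": is_valid,
        # The keys of `categoricals` are the .obs columns being validated.
        "obs_keys": sorted(curator.categoricals),
        "var_index": curator.var_index,
        # Returned as-is; its exact structure is not assumed here.
        "non_validated": curator.non_validated,
    }
```

Calling validation_report(curate, organism="human") on the curator from the example would return a plain dict that can be printed or logged.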

Methods

add_new_from(key, organism=None, **kwargs)

Add validated & new categories.

Parameters:
  • key (str) – The key referencing the slot in the DataFrame from which to draw terms.

  • organism (str | None, default: None) – The organism name.

  • **kwargs – Additional keyword arguments to pass to the registry model.

add_new_from_columns(organism=None, **kwargs)

Add validated & new column names to its registry.

Parameters:
  • organism (str | None, default: None) – The organism name.

  • **kwargs – Additional keyword arguments to pass to the registry model.

add_new_from_var_index(organism=None, **kwargs)

Update variable records.

Parameters:
  • organism (str | None, default: None) – The organism name.

  • **kwargs – Additional keyword arguments to pass to the registry model.

add_validated_from(key, organism=None)

Add validated categories.

Parameters:
  • key (str) – The key referencing the slot in the DataFrame.

  • organism (str | None, default: None) – The organism name.

add_validated_from_var_index(organism=None)

Add validated variable records.

Parameters:

organism (str | None, default: None) – The organism name.
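
The add_validated_* methods add only records that validate, whereas the add_new_* methods also add new ones. A conservative registration pass could therefore be sketched as follows. register_validated is a hypothetical helper, and obs_keys is assumed to be a list of the .obs column names that were passed as keys of categoricals.

```python
def register_validated(curator, obs_keys, organism=None):
    """Hypothetical helper: add only validated records; never create new ones."""
    # Variables (for example genes) are taken from the .var index.
    curator.add_validated_from_var_index(organism=organism)
    # Categories are taken from the given keys, one call per key.
    for key in obs_keys:
        curator.add_validated_from(key, organism=organism)
```

With the curator from the example above, this might be called as register_validated(curate, ["cell_type_ontology_id", "donor_id"], organism="human").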

clean_up_failed_runs()

Clean up previous failed runs that didn’t save any outputs.

lookup(using_key=None)

Lookup categories.

Parameters:

using_key (str | None, default: None) – The instance where the lookup is performed. If None (default), the lookup is performed on the instance specified in the using_key parameter of the curator. If “public”, the lookup is performed on the public reference.

Return type:

CurateLookup
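
The two lookup sources described above can be selected with a small wrapper. This is a sketch under assumptions: open_lookup and its public flag are hypothetical names, and how the returned CurateLookup object is browsed is not covered here.

```python
def open_lookup(curator, public=False):
    """Hypothetical helper: choose the lookup source for a curator."""
    # "public" targets the public reference; None falls back to the
    # instance the curator was configured with.
    using_key = "public" if public else None
    return curator.lookup(using_key=using_key)
```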

save_artifact(description=None, **kwargs)

Save the validated AnnData and metadata.

Parameters:
  • description (str | None, default: None) – Description of the AnnData object.

  • **kwargs – Object level metadata.

Return type:

Artifact

Returns:

A saved artifact record.
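
Because save_artifact() is meant for a validated AnnData object, one option is to guard the call with validate(). The sketch below is illustrative only: save_if_valid is a hypothetical helper, and raising ValueError is a choice made for this example, not documented lamindb behavior.

```python
def save_if_valid(curator, description, organism=None):
    """Hypothetical helper: save the artifact only when validation passes."""
    if not curator.validate(organism=organism):
        # Stop early so the caller can inspect `curator.non_validated`.
        raise ValueError("AnnData object is not validated; check curator.non_validated")
    # Returns whatever save_artifact() returns (a saved artifact record).
    return curator.save_artifact(description=description)
```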

validate(organism=None)

Validate categories.

Parameters:

organism (str | None, default: None) – The organism name.

Return type:

bool

Returns:

Whether the AnnData object is validated.
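
A common pattern is to validate, register the terms that did not validate, and validate again. The following is a minimal sketch of that loop, assuming the new terms are actually wanted in the registry. validate_with_registration is a hypothetical helper, and obs_keys is assumed to list the .obs column names used as keys of categoricals.

```python
def validate_with_registration(curator, obs_keys, organism=None):
    """Hypothetical helper: validate, add new categories if needed, re-validate."""
    if curator.validate(organism=organism):
        return True
    # Register validated and new categories for each requested key.
    for key in obs_keys:
        curator.add_new_from(key, organism=organism)
    # The second result reports whether registration resolved the issues.
    return curator.validate(organism=organism)
```

Since add_new_from() also adds terms that did not validate, reviewing non_validated before running this helper is advisable. New variables could be handled in the same way with add_new_from_var_index().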