lamindb.core.AnnDataCurator
- class lamindb.core.AnnDataCurator(data, var_index, categoricals=None, obs_columns=FieldAttr(Feature.name), using_key='default', verbosity='hint', organism=None, sources=None, exclude=None)
Bases: DataFrameCurator
Curation flow for AnnData.
See also Curator.
Note that if genes are removed from the AnnData object, the object should be recreated using from_anndata().
See Curate AnnData based on the CELLxGENE schema for instructions on how to curate against a specific cellxgene schema version.
- Parameters:
data (ad.AnnData | UPathStr) – The AnnData object or an AnnData-like path.
var_index (FieldAttr) – The registry field for mapping the .var index.
categoricals (dict[str, FieldAttr] | None, default: None) – A dictionary mapping .obs.columns to a registry field.
using_key (str, default: 'default') – A reference LaminDB instance.
verbosity (str, default: 'hint') – The verbosity level.
organism (str | None, default: None) – The organism name.
sources (dict[str, Record] | None, default: None) – A dictionary mapping .obs.columns to Source records.
exclude (dict | None, default: None) – A dictionary mapping column names to values to exclude.
Examples
>>> import lamindb as ln
>>> import bionty as bt
>>> curate = ln.Curator.from_anndata(
...     adata,
...     var_index=bt.Gene.ensembl_gene_id,
...     categoricals={
...         "cell_type_ontology_id": bt.CellType.ontology_id,
...         "donor_id": ln.ULabel.name
...     },
...     organism="human",
... )
Attributes
- property categoricals: dict
Return the obs fields to validate against.
- property fields: dict
Return the column fields to validate against.
- property non_validated: list
Return the non-validated features and labels.
- property var_index: DeferredAttribute
Return the registry field to validate the variable index against.
Methods
- add_new_from(key, organism=None, **kwargs)
Add validated & new categories.
- Parameters:
key (str) – The key referencing the slot in the DataFrame from which to draw terms.
organism (str | None, default: None) – The organism name.
**kwargs – Additional keyword arguments to pass to the registry model.
- add_new_from_columns(organism=None, **kwargs)
Add validated & new column names to its registry.
- Parameters:
organism (str | None, default: None) – The organism name.
**kwargs – Additional keyword arguments to pass to the registry model.
- add_new_from_var_index(organism=None, **kwargs)
Update variable records.
- Parameters:
organism (str | None, default: None) – The organism name.
**kwargs – Additional keyword arguments to pass to the registry model.
- add_validated_from(key, organism=None)
Add validated categories.
- Parameters:
key (str) – The key referencing the slot in the DataFrame.
organism (str | None, default: None) – The organism name.
- add_validated_from_var_index(organism=None)
Add validated variable records.
- Parameters:
organism (str | None, default: None) – The organism name.
- clean_up_failed_runs()
Clean up previous failed runs that did not save any outputs.
- lookup(using_key=None)
Look up categories.
- Parameters:
using_key (str | None, default: None) – The instance where the lookup is performed. If None (default), the lookup is performed on the instance specified in the using_key parameter of the curator. If "public", the lookup is performed on the public reference.
- Return type:
- save_artifact(description=None, **kwargs)
Save the validated AnnData and metadata.
- Parameters:
description (str | None, default: None) – Description of the AnnData object.
**kwargs – Object level metadata.
- Return type:
- Returns:
A saved artifact record.
- validate(organism=None)
Validate categories.
- Parameters:
organism (str | None, default: None) – The organism name.
- Return type:
bool
- Returns:
Whether the AnnData object is validated.
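Conceptually, validation checks the values of each categorical column against the terms known to the mapped registry field and reports whatever is left over. The following is a minimal illustration of that idea in plain Python, not LaminDB's implementation: find_non_validated, the known_terms sets, and the reading of exclude as "values skipped during validation" are all assumptions made for the example.

```python
def find_non_validated(columns, known_terms, exclude=None):
    """Return {column: sorted unknown values} for columns with unknown values.

    columns:     {column name: list of observed values}
    known_terms: {column name: set of accepted values} (stand-in for a registry)
    exclude:     {column name: values to skip} (assumed semantics)
    """
    exclude = exclude or {}
    report = {}
    for name, accepted in known_terms.items():
        skipped = set(exclude.get(name, ()))
        observed = set(columns.get(name, ())) - skipped
        unknown = sorted(observed - set(accepted))
        if unknown:
            report[name] = unknown
    return report


obs = {
    "cell_type": ["T cell", "B cell", "unknown", "T cel"],
    "donor_id": ["D1", "D2"],
}
known = {"cell_type": {"T cell", "B cell"}, "donor_id": {"D1", "D2"}}

report = find_non_validated(obs, known, exclude={"cell_type": ["unknown"]})
is_validated = not report  # analogous to the bool returned by validate()
print(report, is_validated)
```

In this toy model an empty report plays the role of validate() returning True, and the misspelled "T cel" is the kind of entry one would either correct or register before saving.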