lamindb.core.AnnDataCurator
- class lamindb.core.AnnDataCurator(data, var_index, categoricals=None, obs_columns=FieldAttr(Feature.name), using_key=None, verbosity='hint', organism=None, sources=None, exclude=None)
Bases: DataFrameCurator
Curation flow for AnnData.
See also Curator.
Note that if genes are removed from the AnnData object, the object should be recreated using from_anndata().
See Curate AnnData based on the CELLxGENE schema for instructions on how to curate against a specific CELLxGENE schema version.
- Parameters:
data (ad.AnnData | UPathStr) – The AnnData object or an AnnData-like path.
var_index (FieldAttr) – The registry field for mapping the .var index.
categoricals (dict[str, FieldAttr] | None, default: None) – A dictionary mapping .obs.columns to a registry field.
obs_columns (FieldAttr, default: FieldAttr(Feature.name)) – The registry field for mapping the .obs.columns.
using_key (str | None, default: None) – A reference LaminDB instance.
verbosity (str, default: 'hint') – The verbosity level.
organism (str | None, default: None) – The organism name.
sources (dict[str, Record] | None, default: None) – A dictionary mapping .obs.columns to Source records.
exclude (dict | None, default: None) – A dictionary mapping column names to values to exclude from validation. When specific Source instances are pinned and may lack default values (e.g., “unknown” or “na”), using the exclude parameter ensures they are not validated.
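To make the exclude parameter concrete, here is a minimal conceptual sketch in plain Python. It does not use lamindb and is not how the library implements exclusion: `find_non_validated`, the sample column, and the allowed set are all hypothetical names chosen for illustration. It only shows the idea that excluded values are skipped instead of being reported as non-validated.

```python
def find_non_validated(values, allowed, exclude=()):
    """Return values not in `allowed`, skipping anything listed in `exclude`."""
    excluded = set(exclude)
    seen = set()
    missing = []
    for value in values:
        # Excluded, already-allowed, and already-reported values are skipped.
        if value in excluded or value in allowed or value in seen:
            continue
        seen.add(value)
        missing.append(value)
    return missing


# Hypothetical data: an obs column that uses the placeholder "unknown".
cell_types = ["T cell", "B cell", "unknown", "my new type"]
allowed = {"T cell", "B cell"}

without_exclude = find_non_validated(cell_types, allowed)
with_exclude = find_non_validated(cell_types, allowed, exclude=["unknown"])
```

In this sketch, the placeholder is flagged only when it is not excluded, which mirrors the use case described for pinned Source instances that lack default values.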
Examples
>>> import lamindb as ln
>>> import bionty as bt
>>> curator = ln.Curator.from_anndata(
...     adata,
...     var_index=bt.Gene.ensembl_gene_id,
...     categoricals={
...         "cell_type_ontology_id": bt.CellType.ontology_id,
...         "donor_id": ln.ULabel.name,
...     },
...     organism="human",
... )
Attributes
- property categoricals: dict
Return the obs fields to validate against.
- property fields: dict
Return the column fields to validate against.
- property non_validated: dict[str, list[str]]
Return the non-validated features and labels.
- property var_index: DeferredAttribute
Return the registry field to validate the variable index against.
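Because non_validated is documented as a dict[str, list[str]], it can be turned into a short report with ordinary dictionary handling. The helper below is a hypothetical convenience function, not part of lamindb, and the example dictionary is made-up data that only mimics the documented shape.

```python
def summarize_non_validated(non_validated: dict[str, list[str]]) -> list[str]:
    """Build one report line per key that still has non-validated terms."""
    lines = []
    for key, terms in non_validated.items():
        if terms:  # skip keys with nothing left to fix
            lines.append(f"{key}: {len(terms)} non-validated ({', '.join(terms)})")
    return lines


# Hypothetical contents, shaped like the documented return type.
example = {"cell_type": ["my new type", "T cel"], "donor_id": []}
report = summarize_non_validated(example)
```

With a real curator, the same function could presumably be called as `summarize_non_validated(curator.non_validated)` after a failed validation.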
Methods
- add_new_from(key, organism=None, **kwargs)
Add validated & new categories.
- Parameters:
key (str) – The key referencing the slot in the DataFrame from which to draw terms.
organism (str | None, default: None) – The organism name.
**kwargs – Additional keyword arguments to pass to create new records.
- add_new_from_var_index(organism=None, **kwargs)
Update variable records.
- Parameters:
organism (str | None, default: None) – The organism name.
**kwargs – Additional keyword arguments to pass to create new records.
- clean_up_failed_runs()
Clean up previous failed runs that did not save any outputs.
- lookup(using_key=None, public=False)
Look up categories.
- Parameters:
using_key (str | None, default: None) – The instance where the lookup is performed. If “public”, the lookup is performed on the public reference.
- Return type:
- save_artifact(description=None, key=None, revises=None, run=None)
Save the validated AnnData and metadata.
- Parameters:
description (str | None, default: None) – A description of the AnnData object.
key (str | None, default: None) – A path-like key to reference the artifact in default storage, e.g., "myfolder/myfile.fcs". Artifacts with the same key form a revision family.
revises (Artifact | None, default: None) – Previous version of the artifact. Triggers a revision.
run (Run | None, default: None) – The run that creates the artifact.
- Return type:
- Returns:
A saved artifact record.
- standardize(key)
Replace synonyms with standardized values.
- Parameters:
key (str) – The key referencing the slot in adata.obs from which to draw terms. Same as the key in categoricals.
If “var_index”, standardize the var.index.
If “all”, standardize all obs columns and var.index.
Modifies the dataset in place.
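The phrase "replace synonyms with standardized values" with in-place modification can be pictured with a small plain-Python sketch. This is a conceptual illustration only, not lamindb's implementation: `standardize_in_place` and the synonym table are hypothetical, and a real curator would draw its synonyms from the registry behind each field.

```python
def standardize_in_place(values: list, synonyms: dict) -> None:
    """Replace each known synonym with its standardized name, mutating `values`."""
    for i, value in enumerate(values):
        # Unknown values are left untouched; only known synonyms are mapped.
        values[i] = synonyms.get(value, value)


# Hypothetical synonym table and obs column.
synonyms = {"T-cell": "T cell", "T lymphocyte": "T cell"}
cell_types = ["T-cell", "B cell", "T lymphocyte"]
result = standardize_in_place(cell_types, synonyms)
```

As with the documented method, the sketch changes the existing data and returns nothing, so keep a copy if the original values are needed later.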
- validate(organism=None)
Validate categories.
This method also registers the validated records in the current instance.
- Parameters:
organism (str | None, default: None) – The organism name.
- Return type:
bool
- Returns:
Whether the AnnData object is validated.
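The methods above suggest a validate, fix, re-validate, save sequence. The function below is one possible sketch of that sequence and is not taken from the lamindb documentation. It calls only the methods and the non_validated property documented on this page, but `curate_and_save` itself is a hypothetical helper, and two points are assumptions: that non_validated uses the key "var_index" for the variable index, and that registering every remaining term as new is acceptable (in practice you would likely review them first).

```python
def curate_and_save(curator, description=None):
    """Validate, try to fix what fails, and save only if validation passes.

    `curator` is any object exposing the methods documented on this page.
    Returns the saved artifact, or None if validation still fails.
    """
    if not curator.validate():
        # First map synonyms to standardized values, in place.
        curator.standardize("all")

    if not curator.validate():
        # Then register whatever is still unknown, key by key.
        for key, terms in list(curator.non_validated.items()):
            if not terms:
                continue
            if key == "var_index":  # assumed key name for the variable index
                curator.add_new_from_var_index()
            else:
                curator.add_new_from(key)

    if curator.validate():
        return curator.save_artifact(description=description)
    return None
```

Returning None instead of saving keeps unvalidated data out of storage, which is a design choice of this sketch and not documented library behavior.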