lamindb.curators.DataFrameCatManager
- class lamindb.curators.DataFrameCatManager(df, columns=FieldAttr(Feature.name), categoricals=None, verbosity='hint', organism=None, sources=None)
Bases: CatManager
Curation flow for a DataFrame object.
See also: Curator.
- Parameters:
df (DataFrame | Artifact) – The DataFrame object to curate.
columns (DeferredAttribute, default: FieldAttr(Feature.name)) – The field attribute for the feature column.
categoricals (dict[str, DeferredAttribute] | None, default: None) – A dictionary mapping column names to registry_field.
verbosity (str, default: 'hint') – The verbosity level.
organism (str | None, default: None) – The organism name.
sources (dict[str, Record] | None, default: None) – A dictionary mapping column names to Source records.
- Returns:
A curator object.
Example:
    import lamindb as ln
    import bionty as bt

    # df is an existing pandas DataFrame containing the columns named below
    curator = ln.curators.DataFrameCatManager(
        df,
        categoricals={
            "cell_type_ontology_id": bt.CellType.ontology_id,
            "donor_id": ln.ULabel.name,
        },
    )
Attributes
- property categoricals: dict
Return the column fields to validate against.
- property non_validated: dict[str, list[str]]
Return the non-validated features and labels.
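To make the `dict[str, list[str]]` shape of `non_validated` concrete, here is a minimal pure-Python sketch. It does not call lamindb; `collect_non_validated`, `columns`, and `allowed` are hypothetical names, and the plain sets stand in for registry lookups. Whether the real property omits fully validated keys is an assumption of this sketch.

```python
def collect_non_validated(columns, allowed):
    """Map each checked column name to the sorted values not found in its allowed set.

    Stand-in for a registry check; columns without an entry in `allowed`
    are skipped, and fully validated columns are left out of the report.
    """
    report = {}
    for name, values in columns.items():
        if name not in allowed:
            continue  # not a categorical we were asked to check
        missing = sorted(set(values) - allowed[name])
        if missing:
            report[name] = missing
    return report


# Hypothetical data: one misspelled cell type, clean donor IDs.
columns = {
    "cell_type": ["T cell", "B cell", "T-cel"],
    "donor_id": ["D1", "D2", "D1"],
}
allowed = {
    "cell_type": {"T cell", "B cell"},
    "donor_id": {"D1", "D2"},
}

report = collect_non_validated(columns, allowed)
print(report)
```

Each key names a column and each value lists the entries that still need attention, which is the kind of structure one would iterate over before standardizing or registering terms.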
Class methods
- classmethod from_anndata(data, var_index, categoricals=None, obs_columns=FieldAttr(Feature.name), verbosity='hint', organism=None, sources=None)
- Return type:
AnnDataCatManager
- classmethod from_df(df, categoricals=None, columns=FieldAttr(Feature.name), verbosity='hint', organism=None)
- Return type:
- classmethod from_mudata(mdata, var_index, categoricals=None, verbosity='hint', organism=None)
- Return type:
MuDataCatManager
- classmethod from_spatialdata(sdata, var_index, categoricals=None, organism=None, sources=None, verbosity='hint', *, sample_metadata_key='sample')
- classmethod from_tiledbsoma(experiment_uri, var_index, categoricals=None, obs_columns=FieldAttr(Feature.name), organism=None, sources=None)
- Return type:
Methods
- add_new_from(key, **kwargs)
Add validated and new categories.
- Parameters:
key (str) – The key referencing the slot in the DataFrame from which to draw terms.
organism – The organism name.
**kwargs – Additional keyword arguments to pass when creating new records.
- add_new_from_columns(organism=None, **kwargs)
- clean_up_failed_runs()
Clean up previous failed runs that did not save any outputs.
- lookup(public=False)
Look up categories.
- Parameters:
public (bool, default: False) – If True, the lookup is performed on the public reference.
- Return type:
- save_artifact(*, key=None, description=None, revises=None, run=None)
Save an annotated artifact.
- Parameters:
key (str | None, default: None) – A path-like key to reference the artifact in default storage, e.g., "myfolder/myfile.fcs". Artifacts with the same key form a version family.
description (str | None, default: None) – A description.
revises (Artifact | None, default: None) – Previous version of the artifact. An alternative to passing key to trigger a new version.
run (Run | None, default: None) – The run that creates the artifact.
- Return type:
- Returns:
A saved artifact record.
- standardize(key)
Replace synonyms with standardized values.
Modifies the input dataset in place.
- Parameters:
key (str) – The key referencing the column in the DataFrame to standardize.
- Return type:
None
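The in-place, returns-None behaviour described for `standardize` can be illustrated with a small pure-Python sketch. It is not lamindb's implementation: `standardize_in_place` and the `synonyms` mapping are hypothetical, and a plain list stands in for a DataFrame column.

```python
def standardize_in_place(values, synonyms):
    """Replace known synonyms with their standardized names, mutating `values`.

    Returns None, mirroring the documented return type of `standardize`.
    Values without a synonym entry are left unchanged.
    """
    for i, value in enumerate(values):
        values[i] = synonyms.get(value, value)


# Hypothetical column content and synonym table.
cell_types = ["T-cell", "B cell", "T lymphocyte"]
synonyms = {"T-cell": "T cell", "T lymphocyte": "T cell"}

result = standardize_in_place(cell_types, synonyms)
print(cell_types)
```

Because the data is mutated rather than copied, a caller who needs the original values should copy the column before standardizing.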
- validate()
Validate variables and categorical observations.
This method also registers the validated records in the current instance:
- from public sources
- Parameters:
organism – The organism name.
- Return type:
bool
- Returns:
Whether the DataFrame is validated.
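The methods on this page suggest a validate, fix, re-validate, save loop. The sketch below is one possible arrangement, not a documented workflow. `curate_and_save` is a hypothetical helper that accepts any object exposing `validate()`, `non_validated`, `standardize(key)`, `add_new_from(key)`, and `save_artifact(key=...)`. It assumes that the keys of `non_validated` are accepted by `standardize` and `add_new_from`, which this page does not state.

```python
def curate_and_save(curator, key):
    """Validate, attempt to fix non-validated columns, then save.

    Assumption: each key in `curator.non_validated` can be passed to
    `standardize` and `add_new_from`. Registering every remaining value
    as new is a policy choice; review the values first in real use.
    """
    if not curator.validate():
        # Copy the keys, since fixing a column may change the mapping.
        for column in list(curator.non_validated):
            curator.standardize(column)   # map synonyms to standard names
            curator.add_new_from(column)  # register what is still unknown
        if not curator.validate():
            raise ValueError(f"still not validated: {curator.non_validated}")
    return curator.save_artifact(key=key)
```

With a configured LaminDB instance, the helper would presumably be called with a `DataFrameCatManager` built as in the class example above and a storage key of your choosing. Raising on a failed second validation keeps an unvalidated dataset from being saved silently.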