lamindb.core.DataFrameCurator
- class lamindb.core.DataFrameCurator(df, columns=FieldAttr(Feature.name), categoricals=None, using_key=None, verbosity='hint', organism=None, sources=None, exclude=None, check_valid_keys=True)
Bases: BaseCurator
Curation flow for a DataFrame object.
See also
Curator
- Parameters:
  - df (DataFrame) – The DataFrame object to curate.
  - columns (DeferredAttribute, default: FieldAttr(Feature.name)) – The field attribute for the feature column.
  - categoricals (dict[str, DeferredAttribute] | None, default: None) – A dictionary mapping column names to registry fields.
  - using_key (str | None, default: None) – The reference instance containing registries to validate against.
  - verbosity (str, default: 'hint') – The verbosity level.
  - organism (str | None, default: None) – The organism name.
  - sources (dict[str, Record] | None, default: None) – A dictionary mapping column names to Source records.
  - exclude (dict | None, default: None) – A dictionary mapping column names to values to exclude from validation. When specific Source instances are pinned, they may lack default values (e.g., “unknown” or “na”); listing those values in exclude ensures they are not validated.
- Returns:
A curator object.
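To make the role of exclude concrete, here is a toy model in plain Python. It is not lamindb's implementation: the helper `values_to_validate` and the sample data are hypothetical, and the sketch assumes that exclude maps a column name to a list of values.

```python
def values_to_validate(column_values, excluded):
    """Return the unique values of a column that still need validation.

    Toy model only: `excluded` plays the role of one entry of the
    `exclude` dictionary, e.g. exclude={"donor_id": ["unknown", "na"]}.
    """
    excluded = set(excluded)
    remaining = []
    for value in column_values:
        # Skip excluded placeholders and values already collected.
        if value not in excluded and value not in remaining:
            remaining.append(value)
    return remaining


donor_ids = ["D1", "unknown", "D2", "na", "D1"]  # hypothetical sample column
exclude = {"donor_id": ["unknown", "na"]}  # hypothetical exclude argument
print(values_to_validate(donor_ids, exclude["donor_id"]))
```

In this model the placeholders "unknown" and "na" never reach the validation step, so they cannot be reported as non-validated.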
Examples
>>> import lamindb as ln
>>> import bionty as bt
>>> curator = ln.Curator.from_df(
...     df,
...     categoricals={
...         "cell_type_ontology_id": bt.CellType.ontology_id,
...         "donor_id": ln.ULabel.name,
...     },
... )
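The methods documented below can be combined into a validate, fix, re-validate, save loop. The following is a minimal sketch of one such flow, not a prescribed one: the helper name `curate_and_save` and its arguments are hypothetical, and it only calls methods listed on this page (validate, standardize, add_new_from, save_artifact).

```python
def curate_and_save(curator, keys, description=None):
    """Validate, try to fix the given categorical columns, then save.

    `curator` is assumed to expose the methods documented on this page.
    `keys` are the column names the caller wants to standardize and
    register new categories for. Returns the saved artifact, or None
    if validation still fails after the fixes.
    """
    if not curator.validate():
        for key in keys:
            # Map known synonyms to their standardized values (in place).
            curator.standardize(key)
            # Register whatever is still unknown as new categories.
            curator.add_new_from(key)
        if not curator.validate():
            return None
    return curator.save_artifact(description=description)
```

With the curator from the example above, a call could look like `curate_and_save(curator, ["donor_id"], description="my dataset")`. Registering every unknown value through add_new_from is a choice made for this sketch; inspecting `curator.non_validated` first lets you decide per column.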
Attributes
- property fields: dict
Return the column fields to validate against.
- property non_validated: dict[str, list[str]]
Return the non-validated features and labels.
Methods
- add_new_from(key, organism=None, **kwargs)
Add validated & new categories.
- Parameters:
  - key (str) – The key referencing the slot in the DataFrame from which to draw terms.
  - organism (str | None, default: None) – The organism name.
  - **kwargs – Additional keyword arguments passed when creating new records.
- clean_up_failed_runs()
Clean up previous failed runs that didn’t save any outputs.
- lookup(using_key=None, public=False)
Look up categories.
- Parameters:
  - using_key (str | None, default: None) – The instance where the lookup is performed. If “public”, the lookup is performed on the public reference.
- Return type:
- save_artifact(description=None, key=None, revises=None, run=None)
Save the validated DataFrame and metadata.
- Parameters:
  - description (str | None, default: None) – Description of the DataFrame object.
  - key (str | None, default: None) – A path-like key to reference the artifact in default storage, e.g., "myfolder/myfile.fcs". Artifacts with the same key form a revision family.
  - revises (Artifact | None, default: None) – Previous version of the artifact. Triggers a revision.
  - run (Run | None, default: None) – The run that creates the artifact.
- Return type:
- Returns:
A saved artifact record.
- standardize(key)
Replace synonyms with standardized values.
Modifies the input dataset in place.
- Parameters:
  - key (str) – The key referencing the column in the DataFrame to standardize.
- Return type:
None
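As an illustration of what "replace synonyms in place" means, here is a toy model in plain Python. It is not how lamindb implements standardize: the function `standardize_in_place`, the dict-of-lists table, and the synonym mapping are all hypothetical stand-ins for the DataFrame and the registry.

```python
def standardize_in_place(table, key, synonyms):
    """Replace synonyms in table[key] with standardized values, in place.

    Toy model: `table` maps column name -> list of values and
    `synonyms` maps a synonym to its standardized value. Returns None,
    mirroring the documented return type of standardize().
    """
    column = table[key]
    for i, value in enumerate(column):
        # Values without a known synonym entry are left unchanged.
        column[i] = synonyms.get(value, value)


table = {"cell_type": ["T cell", "T-cell", "B cell"]}  # hypothetical data
standardize_in_place(table, "cell_type", {"T-cell": "T cell"})
print(table["cell_type"])
```

Because the column is mutated rather than copied, any other reference to the same table sees the standardized values as well, which is the practical consequence of the in-place behavior described above.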
- validate(organism=None)
Validate variables and categorical observations.
This method also registers the validated records in the current instance:
  - from public sources
  - from the using_key instance
- Parameters:
  - organism (str | None, default: None) – The organism name.
- Return type:
bool
- Returns:
Whether the DataFrame is validated.
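The relationship between the boolean returned by validate and the non_validated property can be pictured with a toy model. This is a sketch in plain Python, not lamindb's implementation: `validate_table`, the dict-of-lists table, and the `allowed` sets (standing in for registries) are hypothetical.

```python
def validate_table(table, allowed):
    """Toy validation: return (is_valid, non_validated).

    `table` maps column name -> list of values; `allowed` maps column
    name -> set of accepted values (a stand-in for a registry).
    `non_validated` has the dict[str, list[str]] shape of the
    non_validated property documented above.
    """
    non_validated = {}
    for key, accepted in allowed.items():
        # Collect values in this column that the "registry" does not know.
        unknown = sorted(set(table[key]) - set(accepted))
        if unknown:
            non_validated[key] = unknown
    return not non_validated, non_validated


table = {"donor_id": ["D1", "D9"], "cell_type": ["T cell"]}  # hypothetical data
allowed = {"donor_id": {"D1", "D2"}, "cell_type": {"T cell"}}
is_valid, non_validated = validate_table(table, allowed)
print(is_valid, non_validated)
```

In this model the table counts as validated only when no column has unknown values, so an empty non_validated dictionary and a True result go together.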