lamindb.curators.core.DataFrameCatManager
- class lamindb.curators.core.DataFrameCatManager(df, columns_field=FieldAttr(Feature.name), columns_names=None, categoricals=None, sources=None, index=None, slot=None, maximal_set=False)
Bases: object
Manage categoricals by updating registries.
This class is accessible from within a DataFrameCurator via the .cat attribute.
If you find non-validated values, you have two options:
- new values found in the data can be registered via DataFrameCurator.cat.add_new_from()
- non-validated values can be accessed via DataFrameCurator.cat.non_validated and addressed manually
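The two options above can be illustrated with a toy model. This is a minimal sketch in plain Python, not lamindb's implementation: `ToyCatManager`, its constructor arguments, and the in-memory `registry` dictionary are hypothetical stand-ins that only mimic the documented method names and return types.

```python
# Toy stand-in for illustration only; this is NOT lamindb's implementation.
class ToyCatManager:
    """Hypothetical in-memory imitation of the validate / add-new workflow."""

    def __init__(self, columns, registry):
        # columns: column name -> list of values (stands in for a DataFrame)
        # registry: column name -> set of already registered values
        self._columns = columns
        self._registry = registry

    @property
    def non_validated(self):
        """Map each column to its values that are missing from the registry."""
        result = {}
        for key, values in self._columns.items():
            known = self._registry.get(key, set())
            # dict.fromkeys de-duplicates while keeping first-seen order
            missing = [v for v in dict.fromkeys(values) if v not in known]
            if missing:
                result[key] = missing
        return result

    def validate(self):
        """Return True only if every value is registered."""
        return not self.non_validated

    def add_new_from(self, key):
        """Register every not-yet-known value of one column."""
        self._registry.setdefault(key, set()).update(self._columns[key])


cat = ToyCatManager(
    columns={"cell_type": ["T cell", "B cell", "my new cell"]},
    registry={"cell_type": {"T cell", "B cell"}},
)
first_check = cat.validate()    # False: one value is not registered
pending = cat.non_validated     # {"cell_type": ["my new cell"]}
cat.add_new_from("cell_type")   # option 1: register the new value
second_check = cat.validate()   # True: nothing is left to validate
```

The second option, addressing values manually, would mean correcting the entries listed in `pending` in the data itself before validating again.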
Attributes
- property non_validated: dict[str, list[str]]
Return the non-validated features and labels.
Methods
- add_new_from(key, **kwargs)
Add validated & new categories.
- Parameters:
key (str) – The key referencing the slot in the DataFrame from which to draw terms.
**kwargs – Additional keyword arguments to pass to create new records.
- lookup(public=False)
Look up categories.
- Parameters:
public (bool, default: False) – If True, the lookup is performed on the public reference.
- Return type:
- standardize(key)
Replace synonyms with standardized values.
Modifies the input dataset in place.
- Parameters:
key (str) – The key referencing the column in the DataFrame to standardize.
- Return type:
None
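To make the in-place behavior concrete, here is a rough sketch of synonym replacement on a single column. It is not the library's code: `standardize_column` and `synonym_map` are hypothetical, and the page does not say where lamindb obtains its synonyms, so the mapping is simply passed in.

```python
def standardize_column(columns, key, synonyms):
    """Replace known synonyms in columns[key] with standard names, in place."""
    values = columns[key]
    for i, value in enumerate(values):
        # Values without a known synonym are left unchanged.
        values[i] = synonyms.get(value, value)
    # Nothing is returned: the caller's data is mutated directly.


data = {"cell_type": ["T-cell", "B cell", "T lymphocyte"]}
synonym_map = {"T-cell": "T cell", "T lymphocyte": "T cell"}  # hypothetical mapping
result = standardize_column(data, "cell_type", synonym_map)
# data is now {"cell_type": ["T cell", "B cell", "T cell"]} and result is None
```

Because the input is modified rather than copied, keep a copy of the original values beforehand if they are needed later.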
- validate()
Validate variables and categorical observations.
- Return type:
bool