lamindb.curators.core.DataFrameCatManager

class lamindb.curators.core.DataFrameCatManager(df, columns_field=FieldAttr(Feature.name), columns_names=None, categoricals=None, sources=None, index=None, slot=None, maximal_set=False)

Bases: object

Manage categoricals by updating registries.

This class is accessible from within a DataFrameCurator via the .cat attribute.

If you find non-validated values, you have two options:

  • new values found in the data can be registered via DataFrameCurator.cat.add_new_from()

  • non-validated values can be accessed via DataFrameCurator.cat.non_validated and addressed manually
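
The two options above can be combined in a small helper. The following is a minimal sketch, not part of lamindb: the function name register_new_values and its keys argument are hypothetical, and it assumes only the members documented on this page (.cat, validate(), non_validated, add_new_from()).

```python
def register_new_values(curator, keys=None):
    """Validate, then register whatever is still non-validated.

    `curator` is assumed to be a DataFrameCurator-like object that
    exposes the `.cat` manager documented on this page.
    `keys` is a hypothetical filter: if given, only these keys are registered.
    Returns the non-validated values as they were before registration.
    """
    cat = curator.cat
    if cat.validate():
        # Everything is already validated; nothing to register.
        return {}
    # Copy first: non_validated maps each key to its list of unknown values,
    # and the manager may update it while we register.
    pending = {key: list(values) for key, values in cat.non_validated.items()}
    for key in pending:
        if keys is None or key in keys:
            cat.add_new_from(key)
    return pending
```

Inspecting the returned dictionary before calling add_new_from() in earnest is advisable, since typos would otherwise be registered as new records.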

Attributes

property categoricals: list[Feature]

The categorical features.

property non_validated: dict[str, list[str]]

Return the non-validated features and labels.

Methods

add_new_from(key, **kwargs)

Add validated & new categories.

Parameters:
  • key (str) – The key referencing the slot in the DataFrame from which to draw terms.

  • **kwargs – Additional keyword arguments to pass to create new records

lookup(public=False)

Lookup categories.

Parameters:

public (bool, default: False) – If True, the lookup is performed on the public reference.

Return type:

CatLookup

standardize(key)

Replace synonyms with standardized values.

Modifies the input dataset in place.

Parameters:

key (str) – The key referencing the column in the DataFrame to standardize.

Return type:

None
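
Because standardize() returns None and modifies the data in place, a second validate() call is needed to see its effect. A minimal sketch of that pattern follows; the helper name standardize_then_validate is hypothetical, and it assumes that every key reported by non_validated is also accepted by standardize().

```python
def standardize_then_validate(curator):
    """Try synonym standardization before falling back to manual fixes.

    `curator` is assumed to be a DataFrameCurator-like object exposing `.cat`.
    Returns True if validation passes after standardization.
    """
    cat = curator.cat
    if cat.validate():
        return True
    # list(...) takes a snapshot of the keys, since standardize() may
    # change what the manager reports as non-validated.
    for key in list(cat.non_validated):
        cat.standardize(key)
    # standardize() returns None, so re-run validation to get the result.
    return cat.validate()
```

If this returns False, the remaining entries in non_validated are values that no known synonym maps to, and they need to be registered or corrected manually.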

validate()

Validate variables and categorical observations.

Return type:

bool