lamindb.core.CatManager

class lamindb.core.CatManager(*, dataset, categoricals, sources, organism, exclude, columns_field=None)

Bases: object

Manage valid categoricals by updating registries.

A CatManager object makes it easy to validate, standardize and annotate datasets.

Example:

>>> import lamindb as ln
>>> cat_manager = ln.CatManager.from_df(
...     df,
...     # define validation criteria as mappings
...     columns=ln.Feature.name,  # map column names
...     categoricals={"perturbation": ln.ULabel.name},  # map categories
... )
>>> cat_manager.validate()  # validate the dataframe
>>> artifact = cat_manager.save_artifact(description="my RNA-seq")
>>> artifact.describe()  # see annotations

cat_manager.validate() maps values within df according to the mapping criteria and logs validated and problematic values.

If you find non-validated values, you have two options:

  • register new values found in the data using add_new_from()

  • access non-validated values via the non_validated property and address them manually
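The validate-then-fix loop described above can be sketched in plain Python. This is a toy model, not lamindb's implementation: the registry is assumed to be a dict mapping each categorical column to a set of registered values, and the helpers validate_categoricals and add_new_from_toy are hypothetical stand-ins for validate() and add_new_from().

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("toy_curation")

# Hypothetical registry: column name -> set of registered (valid) values.
Registry = dict[str, set[str]]


def validate_categoricals(
    columns: dict[str, list[str]], registry: Registry
) -> tuple[bool, dict[str, list[str]]]:
    """Split each column's values into validated and non-validated ones.

    Returns (is_validated, non_validated), where non_validated has the
    same dict[str, list[str]] shape as the non_validated property.
    """
    non_validated: dict[str, list[str]] = {}
    for key, values in columns.items():
        registered = registry.get(key, set())
        # dict.fromkeys de-duplicates while keeping first-seen order
        unique_values = list(dict.fromkeys(values))
        missing = [v for v in unique_values if v not in registered]
        n_ok = len(unique_values) - len(missing)
        logger.info("%s: %d validated, %d not validated", key, n_ok, len(missing))
        if missing:
            non_validated[key] = missing
    return not non_validated, non_validated


def add_new_from_toy(
    key: str, non_validated: dict[str, list[str]], registry: Registry
) -> None:
    """Register every non-validated value of one column (toy add_new_from)."""
    registry.setdefault(key, set()).update(non_validated.get(key, []))


registry = {"perturbation": {"DMSO", "IFNG"}}
data = {"perturbation": ["DMSO", "IFNG", "DMSO", "TNF"]}
ok, missing = validate_categoricals(data, registry)  # TNF is not registered yet
add_new_from_toy("perturbation", missing, registry)
ok, missing = validate_categoricals(data, registry)  # now every value validates
```

Registering new values is the right choice when they are genuinely new categories; when they are typos or synonyms, fixing them manually (or standardizing them) keeps the registry clean.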

Attributes

property categoricals: dict

Return the mapping from categorical columns to the registry fields they are validated against.

property non_validated: dict[str, list[str]]

Return the non-validated features and labels.

Class methods

classmethod from_anndata(data, var_index, categoricals=None, obs_columns=FieldAttr(Feature.name), verbosity='hint', organism=None, sources=None)
Return type:

AnnDataCatManager

classmethod from_df(df, categoricals=None, columns=FieldAttr(Feature.name), verbosity='hint', organism=None)
Return type:

DataFrameCatManager

classmethod from_mudata(mdata, var_index, categoricals=None, verbosity='hint', organism=None)
Return type:

MuDataCatManager

classmethod from_spatialdata(sdata, var_index, categoricals=None, organism=None, sources=None, exclude=None, verbosity='hint', *, sample_metadata_key='sample')

classmethod from_tiledbsoma(experiment_uri, var_index, categoricals=None, obs_columns=FieldAttr(Feature.name), organism=None, sources=None, exclude=None)
Return type:

TiledbsomaCatManager
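The from_* classmethods act as alternative constructors: each takes data in one format and returns a manager specialized for that format. The pattern can be sketched with the hypothetical toy classes ToyCatManager and ToyDataFrameCatManager, which are not part of lamindb. In this sketch the specialized class subclasses the base, so shared state and methods are inherited.

```python
from __future__ import annotations


class ToyCatManager:
    """Hypothetical base class holding state shared by all managers."""

    def __init__(
        self, dataset: object, categoricals: dict[str, str] | None = None
    ) -> None:
        self.dataset = dataset
        self.categoricals = categoricals or {}

    @classmethod
    def from_df(
        cls, df: dict[str, list[str]], categoricals: dict[str, str] | None = None
    ) -> ToyDataFrameCatManager:
        # Alternative constructor: choose the manager specialized for tables.
        return ToyDataFrameCatManager(df, categoricals)


class ToyDataFrameCatManager(ToyCatManager):
    """Hypothetical manager for column-oriented tabular data."""

    def columns(self) -> list[str]:
        # For a dict-of-columns "dataframe", the keys are the column names.
        return list(self.dataset)


manager = ToyCatManager.from_df({"perturbation": ["DMSO"]})
```

Keeping the format-specific logic in subclasses lets callers start from one entry point (the base class) while each format handles its own layout.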

Methods

save_artifact(*, key=None, description=None, revises=None, run=None)

Save an annotated artifact.

Parameters:
  • key (str | None, default: None) – A path-like key that references the artifact in default storage, e.g., "myfolder/myfile.fcs". Artifacts with the same key form a version family.

  • description (str | None, default: None) – A description.

  • revises (Artifact | None, default: None) – Previous version of the artifact. An alternative to passing key for triggering a new version.

  • run (Run | None, default: None) – The run that creates the artifact.

Return type:

Artifact

Returns:

A saved artifact record.
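The key and revises semantics can be illustrated with a toy in-memory store. This is a hypothetical sketch: ToyArtifact and ToyStore are not lamindb classes, and the sequential integer version numbers are an assumption of the sketch, not lamindb's actual versioning scheme.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ToyArtifact:
    key: str | None
    version: int
    description: str | None = None


class ToyStore:
    """Hypothetical store in which artifacts sharing a key form one family."""

    def __init__(self) -> None:
        self._families: dict[str, list[ToyArtifact]] = {}

    def save(
        self,
        *,
        key: str | None = None,
        description: str | None = None,
        revises: ToyArtifact | None = None,
    ) -> ToyArtifact:
        if key is None and revises is not None:
            key = revises.key  # revising an artifact stands in for reusing its key
        if key is None:
            # No key and nothing revised: a standalone artifact outside any family.
            return ToyArtifact(None, 1, description)
        family = self._families.setdefault(key, [])
        artifact = ToyArtifact(key, len(family) + 1, description)
        family.append(artifact)
        return artifact

    def versions(self, key: str) -> list[int]:
        return [a.version for a in self._families.get(key, [])]


store = ToyStore()
v1 = store.save(key="myfolder/myfile.fcs", description="first")
v2 = store.save(revises=v1)  # same family as v1, without repeating the key
```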

standardize(key)

Replace synonyms with standardized values.

Modifies the dataset in place.

Parameters:

key (str) – The name of the column to standardize.

Return type:

None

Returns:

None
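In-place synonym replacement can be sketched in plain Python. The standardize_column helper and the hand-written synonym table are hypothetical; a real implementation would look synonyms up in the registry rather than in a literal dict. Like standardize(), the sketch mutates the column and returns None.

```python
def standardize_column(
    data: dict[str, list[str]], key: str, synonyms: dict[str, str]
) -> None:
    """Replace synonyms in data[key] with standardized names, in place.

    Values that have no entry in `synonyms` are left unchanged.
    """
    column = data[key]
    for i, value in enumerate(column):
        column[i] = synonyms.get(value, value)


data = {"cell_type": ["T-cell", "T cell", "B lymphocyte", "unknown"]}
synonyms = {"T-cell": "T cell", "B lymphocyte": "B cell"}  # hypothetical table
standardize_column(data, "cell_type", synonyms)
```

Because the list object is mutated rather than replaced, any other reference to the same column sees the standardized values too.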

validate()

Validate dataset.

This method also registers the validated records in the current instance.

Return type:

bool

Returns:

True if the dataset is validated, False otherwise.