lamindb.curators.core

Curator utilities.

class lamindb.curators.core.Curator(dataset, schema, *, features=None)

Curator base class.

A Curator object makes it easy to validate, standardize & annotate datasets.

See:

Methods

validate()

Validate dataset against Schema.

Raises:

lamindb.errors.ValidationError – If validation fails.

Return type:

bool | str
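The validation contract above says validation either passes or raises `lamindb.errors.ValidationError`. The toy sketch below illustrates that contract without any dependencies. `ToyCurator`, `ToyValidationError`, and the dict-based schema are hypothetical stand-ins, not lamindb's API. The real `Curator` validates against a `Schema` record, and its documented return type is `bool | str`, whereas this toy returns `None` on success.

```python
# Hypothetical stand-ins for illustration; not part of lamindb.
class ToyValidationError(Exception):
    """Plays the role of lamindb.errors.ValidationError in this sketch."""


class ToyCurator:
    def __init__(self, dataset: dict[str, list], schema: dict[str, type]):
        # dataset: column name -> list of values
        # schema: required column name -> expected Python type
        self.dataset = dataset
        self.schema = schema

    def validate(self) -> None:
        problems = []
        for column, expected_type in self.schema.items():
            if column not in self.dataset:
                problems.append(f"missing column '{column}'")
                continue
            bad = [v for v in self.dataset[column] if not isinstance(v, expected_type)]
            if bad:
                problems.append(f"column '{column}' has values of wrong type: {bad}")
        if problems:
            # Fail loudly with all problems at once
            raise ToyValidationError("; ".join(problems))


curator = ToyCurator({"age": [31, 45], "tissue": ["lung", "liver"]}, {"age": int, "tissue": str})
curator.validate()  # passes silently
```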

save_artifact(*, key=None, description=None, revises=None, run=None)

Save an annotated artifact.

Parameters:
  • key (str | None, default: None) – A path-like key to reference the artifact in default storage, e.g., "myfolder/myfile.fcs". Artifacts with the same key form a version family.

  • description (str | None, default: None) – A description.

  • revises (Artifact | None, default: None) – Previous version of the artifact. An alternative to passing key to trigger a new version.

  • run (Run | None, default: None) – The run that creates the artifact.

Return type:

Artifact

Returns:

A saved artifact record.
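The versioning rule for key and revises can be modelled with a small in-memory registry. This is a hypothetical sketch: `ToyArtifact` and `ToyRegistry` are not lamindb classes, and lamindb tracks versions with its own mechanism rather than this simple integer counter. It only shows that reusing a key, or passing revises, extends the same version family.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyArtifact:
    key: Optional[str]
    version: int
    family: str  # identifies the version family


class ToyRegistry:
    """Hypothetical registry: artifacts sharing a key form one version family."""

    def __init__(self) -> None:
        self.artifacts: list[ToyArtifact] = []

    def save(self, key: Optional[str] = None, revises: Optional[ToyArtifact] = None) -> ToyArtifact:
        if revises is not None:
            # Revising an artifact joins its family and inherits its key
            family, key = revises.family, revises.key
        elif key is not None:
            family = key
        else:
            # No key and no predecessor: start a new, unnamed family
            family = f"family-{len(self.artifacts)}"
        versions = [a.version for a in self.artifacts if a.family == family]
        artifact = ToyArtifact(key=key, version=max(versions, default=0) + 1, family=family)
        self.artifacts.append(artifact)
        return artifact


registry = ToyRegistry()
v1 = registry.save(key="myfolder/myfile.fcs")
v2 = registry.save(key="myfolder/myfile.fcs")  # same key: new version
v3 = registry.save(revises=v2)                  # alternative to passing key
print(v1.version, v2.version, v3.version)       # 1 2 3
```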

class lamindb.curators.core.SlotsCurator(dataset, schema, *, features=None)

Curator for a dataset with slots.

Uses slots to specify which component contains which schema. Slots are keys that identify where features are stored within composite data structures.

Parameters:
  • dataset (Artifact | AnnData | TypeVar(MuData) | TypeVar(SpatialData) | Experiment) – The dataset to validate & annotate.

  • schema (Schema) – A Schema object that defines the validation constraints.

Attributes

property slots: dict[str, ComponentCurator]

Access sub-curators by slot.
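The slot idea can be pictured as a dictionary that maps each slot key to a sub-curator for one component of a composite object. The sketch below is hypothetical: plain dicts stand in for the components of an `AnnData` object, a toy `ComponentCheck` class stands in for `ComponentCurator`, and the slot names "obs" and "var" are only illustrative.

```python
# Hypothetical sketch of slot-based curation; not lamindb's implementation.
class ComponentCheck:
    def __init__(self, component: dict[str, list], required: set[str]):
        self.component = component
        self.required = required

    def validate(self) -> None:
        missing = self.required - set(self.component)
        if missing:
            raise ValueError(f"missing columns: {sorted(missing)}")


composite = {
    "obs": {"cell_type": ["T cell", "B cell"]},
    "var": {"gene_symbol": ["CD3E", "MS4A1"]},
}
required_by_slot = {"obs": {"cell_type"}, "var": {"gene_symbol"}}

# One sub-curator per slot, analogous to the `slots` property
slots = {slot: ComponentCheck(composite[slot], required_by_slot[slot]) for slot in composite}


def validate_all(slots: dict[str, ComponentCheck]) -> None:
    for slot, sub_curator in slots.items():
        try:
            sub_curator.validate()
        except ValueError as error:
            # Report which slot failed
            raise ValueError(f"slot '{slot}': {error}") from error


validate_all(slots)
```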

Methods

validate()

Validate dataset against Schema.

Raises:

lamindb.errors.ValidationError – If validation fails.

Return type:

None

save_artifact(*, key=None, description=None, revises=None, run=None)

Save an annotated artifact.

Parameters:
  • key (str | None, default: None) – A path-like key to reference the artifact in default storage, e.g., "myfolder/myfile.fcs". Artifacts with the same key form a version family.

  • description (str | None, default: None) – A description.

  • revises (Artifact | None, default: None) – Previous version of the artifact. An alternative to passing key to trigger a new version.

  • run (Run | None, default: None) – The run that creates the artifact.

Return type:

Artifact

Returns:

A saved artifact record.

class lamindb.curators.core.ComponentCurator(dataset, schema, slot=None)

Curator for DataFrame.

Provides all key functionality to validate pandas DataFrames. Unlike DataFrameCurator, which extends this class with functionality to validate the attrs slot, this class is not user-facing.

Parameters:
  • dataset (DataFrame | Artifact) – The DataFrame-like object to validate & annotate.

  • schema (Schema) – A Schema object that defines the validation constraints.

  • slot (str | None, default: None) – The slot of the composite data structure that this curator handles when used within a composite curator.

Attributes

property cat: DataFrameCatManager

Manage categoricals by updating registries.

Methods

standardize()

Standardize the dataset:

  • Adds missing columns for features

  • Fills missing values for features with default values

Return type:

None
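The two standardization steps listed above can be sketched for a column-oriented table stored as a dict of lists. `standardize_columns` and the `defaults` mapping are hypothetical: the real method operates on a pandas DataFrame, and `defaults` merely stands in for whatever per-feature default values lamindb uses.

```python
from typing import Any


def standardize_columns(table: dict[str, list[Any]], defaults: dict[str, Any]) -> None:
    """Hypothetical sketch: modifies `table` in place."""
    n_rows = max((len(values) for values in table.values()), default=0)
    for feature, default in defaults.items():
        if feature not in table:
            # Step 1: add a missing column, filled with the default value
            table[feature] = [default] * n_rows
        else:
            # Step 2: replace missing values (None) with the default value
            table[feature] = [default if v is None else v for v in table[feature]]


table = {"sample": ["s1", "s2", "s3"], "batch": ["b1", None, "b2"]}
standardize_columns(table, {"batch": "unknown", "assay": "scRNA-seq"})
print(table["batch"])  # ['b1', 'unknown', 'b2']
print(table["assay"])  # ['scRNA-seq', 'scRNA-seq', 'scRNA-seq']
```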

validate()

Validate dataset against Schema.

Raises:

lamindb.errors.ValidationError – If validation fails.

Return type:

None

save_artifact(*, key=None, description=None, revises=None, run=None)

Save an annotated artifact.

Parameters:
  • key (str | None, default: None) – A path-like key to reference the artifact in default storage, e.g., "myfolder/myfile.fcs". Artifacts with the same key form a version family.

  • description (str | None, default: None) – A description.

  • revises (Artifact | None, default: None) – Previous version of the artifact. An alternative to passing key to trigger a new version.

  • run (Run | None, default: None) – The run that creates the artifact.

Return type:

Artifact

Returns:

A saved artifact record.

class lamindb.curators.core.CatVector(values_getter, field, key, values_setter=None, source=None, feature=None, cat_manager=None, filter_str='', subtypes_list=None, maximal_set=True)

Vector with categorical values.

Attributes

property is_validated: bool

Whether the vector is validated.

property values

Get the current values using the getter function.

Methods

validate()

Validate the vector.

Return type:

None

standardize()

Standardize the vector.

Return type:

None

add_new(**create_kwargs)

Add new values to the registry.

Return type:

None
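CatVector reads its values through a getter function (values_getter) and can write standardized values back through an optional setter (values_setter). The toy class below mirrors that getter/setter design, using a plain set as the registry and a synonym dict for standardization. `ToyCatVector`, `registry`, and `synonyms` are hypothetical and do not reflect lamindb's internals.

```python
from typing import Callable, Optional


class ToyCatVector:
    def __init__(
        self,
        values_getter: Callable[[], list[str]],
        registry: set[str],
        synonyms: dict[str, str],
        values_setter: Optional[Callable[[list[str]], None]] = None,
    ):
        self._get = values_getter
        self._set = values_setter
        self.registry = registry
        self.synonyms = synonyms  # synonym -> standardized name
        self.is_validated = False

    @property
    def values(self) -> list[str]:
        # Always read the current values through the getter
        return self._get()

    def validate(self) -> None:
        self.is_validated = all(v in self.registry for v in self.values)

    def standardize(self) -> None:
        if self._set is None:
            raise RuntimeError("no setter: cannot write standardized values back")
        self._set([self.synonyms.get(v, v) for v in self.values])

    def add_new(self) -> None:
        # Register all current values; set.update ignores ones already present
        self.registry.update(self.values)


column = {"cell_type": ["T-cell", "B cell"]}
vector = ToyCatVector(
    values_getter=lambda: column["cell_type"],
    registry={"T cell", "B cell"},
    synonyms={"T-cell": "T cell"},
    values_setter=lambda new: column.__setitem__("cell_type", new),
)
vector.validate()
print(vector.is_validated)  # False ("T-cell" is only a synonym)
vector.standardize()
vector.validate()
print(vector.is_validated, column["cell_type"])  # True ['T cell', 'B cell']
```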

class lamindb.curators.core.CatLookup(categoricals, slots=None, public=False, sources=None)

Look up categories from the reference instance.

Parameters:
  • categoricals (list[Feature] | dict[str, DeferredAttribute]) – The categorical features, or a dictionary of categorical fields, to look up.

  • slots (dict[str, DeferredAttribute], default: None) – A dictionary of slot fields to look up.

  • public (bool, default: False) – Whether to look up categories from the public instance.
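The lookup object exposes categories as attributes, as in `curator.cat.lookup()["cell_type"].alveolar_type_1_fibroblast_cell`. A rough sketch of that idea converts each category name into a valid Python identifier and stores it on a namespace object. The normalization rule here (lowercase, runs of non-alphanumeric characters replaced by `_`, leading digit prefixed with `_`) is an assumption for illustration; lamindb's own rule may differ, and `make_lookup` is hypothetical.

```python
import re
from types import SimpleNamespace


def to_identifier(name: str) -> str:
    # Assumed rule: lowercase, collapse non-alphanumeric runs into "_"
    identifier = re.sub(r"[^0-9a-zA-Z]+", "_", name.lower()).strip("_")
    # Python identifiers may not start with a digit
    return f"_{identifier}" if identifier[:1].isdigit() else identifier


def make_lookup(categoricals: dict[str, list[str]]) -> dict[str, SimpleNamespace]:
    """Hypothetical: map each categorical key to an attribute-style namespace."""
    return {
        key: SimpleNamespace(**{to_identifier(v): v for v in values})
        for key, values in categoricals.items()
    }


lookup = make_lookup({"cell_type": ["alveolar type 1 fibroblast cell", "T cell"]})
print(lookup["cell_type"].alveolar_type_1_fibroblast_cell)  # alveolar type 1 fibroblast cell
```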

Example:

import lamindb as ln

curator = ln.curators.DataFrameCurator(...)
curator.cat.lookup()["cell_type"].alveolar_type_1_fibroblast_cell

class lamindb.curators.core.DataFrameCatManager(df, columns_field=FieldAttr(Feature.name), categoricals=None, sources=None, index=None, slot=None, maximal_set=False)

Manage categoricals by updating registries.

This class is accessible from within a DataFrameCurator via the .cat attribute.

If you find non-validated values, you have two options:

  • New values found in the data can be registered via DataFrameCurator.cat.add_new_from().

  • Non-validated values can be accessed via DataFrameCurator.cat.non_validated and addressed manually.
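A toy model of these two options: compute the non-validated values per column (the same shape as the non_validated property, dict[str, list[str]]), then either register new terms or fix values by hand. The registries, column names, and the helpers `find_non_validated` and `register_new` are hypothetical and only mimic the described behaviour; they are not lamindb calls.

```python
def find_non_validated(
    table: dict[str, list[str]], registries: dict[str, set[str]]
) -> dict[str, list[str]]:
    """Hypothetical: per categorical column, values missing from its registry."""
    result = {}
    for key, registry in registries.items():
        missing = sorted({v for v in table.get(key, []) if v not in registry})
        if missing:
            result[key] = missing
    return result


def register_new(key: str, table: dict[str, list[str]], registries: dict[str, set[str]]) -> None:
    # Option 1: register every value found in column `key`
    registries[key].update(table[key])


table = {"cell_type": ["T cell", "B cell", "NK cell"], "tissue": ["lung", "lungs"]}
registries = {"cell_type": {"T cell", "B cell"}, "tissue": {"lung"}}

print(find_non_validated(table, registries))
# {'cell_type': ['NK cell'], 'tissue': ['lungs']}

register_new("cell_type", table, registries)  # "NK cell" is a genuinely new term
# Option 2: fix a typo manually instead of registering it
table["tissue"] = ["lung" if v == "lungs" else v for v in table["tissue"]]
print(find_non_validated(table, registries))  # {}
```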

Attributes

property categoricals: list[Feature]

The categorical features.

property non_validated: dict[str, list[str]]

Return the non-validated features and labels.

Methods

lookup(public=False)

Look up categories.

Parameters:

public (bool, default: False) – If True, the lookup is performed on the public reference.

Return type:

CatLookup

validate()

Validate variables and categorical observations.

Return type:

bool

standardize(key)

Replace synonyms with standardized values.

Modifies the input dataset in place.

Parameters:

key (str) – The key referencing the column in the DataFrame to standardize.

Return type:

None
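Replacing synonyms in place for one column can be sketched with a synonym-to-standard mapping. The `synonyms` dict and the `standardize_column` helper are hypothetical: lamindb resolves synonyms against its registries rather than a hand-written dict, and the case-insensitive matching used here is an assumption for illustration.

```python
def standardize_column(table: dict[str, list[str]], key: str, synonyms: dict[str, str]) -> None:
    """Hypothetical sketch: replace synonyms in `table[key]` in place."""
    # Assumption: match synonyms case-insensitively
    lowered = {synonym.lower(): standard for synonym, standard in synonyms.items()}
    column = table[key]
    for i, value in enumerate(column):
        # Values without a known standardized name are kept unchanged
        column[i] = lowered.get(value.lower(), value)


table = {"cell_type": ["t-cell", "B cell", "T Lymphocyte"]}
synonyms = {"T-cell": "T cell", "T lymphocyte": "T cell"}
standardize_column(table, "cell_type", synonyms)
print(table["cell_type"])  # ['T cell', 'B cell', 'T cell']
```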

add_new_from(key, **kwargs)

Add validated & new categories.

Parameters:
  • key (str) – The key referencing the slot in the DataFrame from which to draw terms.

  • **kwargs – Additional keyword arguments passed on when creating new records.