lamindb.curators.core¶
Curator utilities.
- class lamindb.curators.core.Curator(dataset, schema, *, features=None)¶
Curator base class.
A Curator object makes it easy to validate, standardize & annotate datasets.
- validate()¶
Validate dataset against Schema.
- Raises:
lamindb.errors.ValidationError – If validation fails.
- Return type:
bool|str
- save_artifact(*, key=None, description=None, revises=None, run=None)¶
Save an annotated artifact.
- Parameters:
key (str | None, default: None) – A path-like key to reference the artifact in default storage, e.g., "myfolder/myfile.fcs". Artifacts with the same key form a version family.
description (str | None, default: None) – A description.
revises (Artifact | None, default: None) – Previous version of the artifact. An alternative to passing key to trigger a new version.
run (Run | None, default: None) – The run that creates the artifact.
- Return type:
- Returns:
A saved artifact record.
- class lamindb.curators.core.SlotsCurator(dataset, schema, *, features=None)¶
Curator for a dataset with slots.
Uses slots to specify which component contains which schema. Slots are keys that identify where features are stored within composite data structures.
- Parameters:
Attributes¶
- property slots: dict[str, ComponentCurator]¶
Access sub curators by slot.
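To make the slot idea concrete, here is a minimal pure-Python sketch. It is not lamindb's implementation: the composite dataset, the slot names "obs" and "var", and the helpers required_columns and validate_slots are all hypothetical. The only thing taken from the description above is that each slot key identifies one component and maps to its own sub-validator.

```python
from typing import Callable

# Hypothetical composite dataset: each component maps column name -> values.
dataset = {
    "obs": {"cell_type": ["T cell", "B cell"], "donor": ["D1", "D2"]},
    "var": {"gene_symbol": ["CD3E", "CD19"]},
}


def required_columns(*names: str) -> Callable[[dict], list]:
    """Return a checker that lists the required columns missing from a component."""

    def check(component: dict) -> list:
        return [name for name in names if name not in component]

    return check


# Slot key -> checker for the component stored under that key (illustrative only).
slots = {
    "obs": required_columns("cell_type", "donor"),
    "var": required_columns("gene_symbol", "ensembl_id"),
}


def validate_slots(dataset: dict, slots: dict) -> dict:
    """Run each slot's checker on its component and collect problems per slot."""
    problems = {}
    for slot, check in slots.items():
        missing = check(dataset[slot])
        if missing:
            problems[slot] = missing
    return problems


problems = validate_slots(dataset, slots)
print(problems)
```

In this sketch, only the "var" component is reported, because it lacks the hypothetical "ensembl_id" column.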
Methods¶
- validate()¶
Validate dataset against Schema.
- Raises:
lamindb.errors.ValidationError – If validation fails.
- Return type:
None
- save_artifact(*, key=None, description=None, revises=None, run=None)¶
Save an annotated artifact.
- Parameters:
key (str | None, default: None) – A path-like key to reference the artifact in default storage, e.g., "myfolder/myfile.fcs". Artifacts with the same key form a version family.
description (str | None, default: None) – A description.
revises (Artifact | None, default: None) – Previous version of the artifact. An alternative to passing key to trigger a new version.
run (Run | None, default: None) – The run that creates the artifact.
- Return type:
- Returns:
A saved artifact record.
- class lamindb.curators.core.ComponentCurator(dataset, schema, slot=None)¶
Curator for DataFrame.
Provides all key functionality to validate Pandas DataFrames. This class is not user facing unlike DataFrameCurator which extends this class with functionality to validate the attrs slot.
- Parameters:
Attributes¶
- property cat: DataFrameCatManager¶
Manage categoricals by updating registries.
Methods¶
- standardize()¶
Standardize the dataset.
Adds missing columns for features
Fills missing values for features with default values
- Return type:
None
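The two standardization steps listed above can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: the table is modeled as a dict of column lists instead of a DataFrame, missing values are represented by None, and the function name and the defaults mapping are hypothetical.

```python
def standardize(columns: dict, defaults: dict) -> None:
    """Add missing feature columns and fill missing values, modifying columns in place."""
    # Number of rows, taken from any existing column (0 if the table is empty).
    n_rows = len(next(iter(columns.values()), []))
    for feature, default in defaults.items():
        if feature not in columns:
            # Step 1: add a column for a feature the dataset lacks.
            columns[feature] = [default] * n_rows
        else:
            # Step 2: fill missing values (None in this sketch) with the default.
            columns[feature] = [default if v is None else v for v in columns[feature]]


columns = {"cell_type": ["T cell", None], "donor": ["D1", "D2"]}
standardize(columns, {"cell_type": "unknown", "tissue": "blood"})
print(columns)
```

Columns without an entry in defaults, such as "donor" here, are left untouched.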
- validate()¶
Validate dataset against Schema.
- Raises:
lamindb.errors.ValidationError – If validation fails.
- Return type:
None
- save_artifact(*, key=None, description=None, revises=None, run=None)¶
Save an annotated artifact.
- Parameters:
key (str | None, default: None) – A path-like key to reference the artifact in default storage, e.g., "myfolder/myfile.fcs". Artifacts with the same key form a version family.
description (str | None, default: None) – A description.
revises (Artifact | None, default: None) – Previous version of the artifact. An alternative to passing key to trigger a new version.
run (Run | None, default: None) – The run that creates the artifact.
- Return type:
- Returns:
A saved artifact record.
- class lamindb.curators.core.CatVector(values_getter, field, key, values_setter=None, source=None, feature=None, cat_manager=None, filter_str='', subtypes_list=None, maximal_set=True)¶
Vector with categorical values.
Attributes¶
- property is_validated: bool¶
Whether the vector is validated.
- property values¶
Get the current values using the getter function.
Methods¶
- validate()¶
Validate the vector.
- Return type:
None
- standardize()¶
Standardize the vector.
- Return type:
None
- add_new(**create_kwargs)¶
Add new values to the registry.
- Return type:
None
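The constructor signature suggests that a CatVector reads and writes its values through callables (values_getter, values_setter) rather than holding the data itself. The toy class below sketches that pattern. ToyCatVector, its registry set, and its synonyms mapping are hypothetical stand-ins, and the behavior of each method is an assumption made for illustration, not a description of lamindb internals.

```python
class ToyCatVector:
    """Hypothetical stand-in that accesses values only through getter/setter callables."""

    def __init__(self, values_getter, values_setter, registry, synonyms):
        self._get = values_getter
        self._set = values_setter
        self.registry = registry  # set of known category names
        self.synonyms = synonyms  # synonym -> standardized name
        self.non_validated = []
        self.is_validated = False

    @property
    def values(self):
        # Always read the current values from the underlying dataset.
        return self._get()

    def validate(self):
        # Collect values that are absent from the registry.
        self.non_validated = sorted({v for v in self.values if v not in self.registry})
        self.is_validated = not self.non_validated

    def standardize(self):
        # Write standardized values back through the setter, then re-check.
        self._set([self.synonyms.get(v, v) for v in self.values])
        self.validate()

    def add_new(self):
        # Register whatever is still unknown, then re-check.
        self.registry.update(self.non_validated)
        self.validate()


table = {"cell_type": ["T cell", "T-cell", "B cell", "NK cell"]}
vec = ToyCatVector(
    values_getter=lambda: table["cell_type"],
    values_setter=lambda new: table.__setitem__("cell_type", new),
    registry={"T cell", "B cell"},
    synonyms={"T-cell": "T cell"},
)

vec.validate()
after_validate = list(vec.non_validated)
vec.standardize()
after_standardize = list(vec.non_validated)
vec.add_new()
print(after_validate, after_standardize, vec.is_validated)
```

Because the vector only holds callables, the same class can sit on top of a DataFrame column, an index, or any other container, as long as a getter and setter can be written for it.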
- class lamindb.curators.core.CatLookup(categoricals, slots=None, public=False, sources=None)¶
Lookup categories from the reference instance.
- Parameters:
categoricals (list[Feature] | dict[str, DeferredAttribute]) – A dictionary of categorical fields to lookup.
slots (dict[str, DeferredAttribute], default: None) – A dictionary of slot fields to lookup.
public (bool, default: False) – Whether to lookup from the public instance. Defaults to False.
Example:
curator = ln.curators.DataFrameCurator(...)
curator.cat.lookup()["cell_type"].alveolar_type_1_fibroblast_cell
- class lamindb.curators.core.DataFrameCatManager(df, columns_field=FieldAttr(Feature.name), categoricals=None, sources=None, index=None, slot=None, maximal_set=False)¶
Manage categoricals by updating registries.
This class is accessible from within a DataFrameCurator via the .cat attribute.
If you find non-validated values, you have two options:
- new values found in the data can be registered via DataFrameCurator.cat.add_new_from()
- non-validated values can be accessed via DataFrameCurator.cat.non_validated and addressed manually
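The two options can be illustrated with a small, self-contained sketch. It does not call lamindb: the functions below only mimic the names non_validated and add_new_from, registries are modeled as plain sets, the table is a dict of column lists, and the example values are made up.

```python
def non_validated(columns: dict, registries: dict) -> dict:
    """Map each categorical column to its values that are absent from its registry."""
    result = {}
    for key, registry in registries.items():
        unknown = sorted({v for v in columns[key] if v not in registry})
        if unknown:
            result[key] = unknown
    return result


def add_new_from(key: str, columns: dict, registries: dict) -> None:
    """Register every value found in one column (toy version of option 1)."""
    registries[key].update(columns[key])


columns = {"cell_type": ["T cell", "NK cell"], "tissue": ["blood", "lungg"]}
registries = {"cell_type": {"T cell"}, "tissue": {"blood", "lung"}}

before = non_validated(columns, registries)

# Option 1: "NK cell" is a genuinely new category, so register it.
add_new_from("cell_type", columns, registries)

# Option 2: "lungg" is a typo, so inspect it and fix the data by hand.
columns["tissue"][1] = "lung"

after = non_validated(columns, registries)
print(before, after)
```

The choice between the options depends on whether an unknown value is a legitimate new term or an error in the data; registering a typo would pollute the registry.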
Attributes¶
- property non_validated: dict[str, list[str]]¶
Return the non-validated features and labels.
Methods¶
- lookup(public=False)¶
Lookup categories.
- Parameters:
public (bool, default: False) – If True, the lookup is performed on the public reference.
- Return type:
- validate()¶
Validate variables and categorical observations.
- Return type:
bool
- standardize(key)¶
Replace synonyms with standardized values.
Modifies the input dataset inplace.
- Parameters:
key (str) – The key referencing the column in the DataFrame to standardize.
- Return type:
None
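As a rough illustration of what "modifies the input dataset inplace" means for a caller, the sketch below replaces synonyms in one column of a toy table. The function, the dict-of-lists table, and the synonyms mapping are hypothetical; in lamindb the synonyms come from the registries rather than from an argument.

```python
def standardize(columns: dict, key: str, synonyms: dict) -> None:
    """Replace synonyms in one column with standardized values, in place."""
    values = columns[key]
    for i, value in enumerate(values):
        # Values without a known synonym are left unchanged.
        values[i] = synonyms.get(value, value)


columns = {"cell_type": ["T-cell", "t cell", "B cell"]}
original_column = columns["cell_type"]  # keep a reference to the same list

result = standardize(columns, "cell_type", {"T-cell": "T cell", "t cell": "T cell"})
print(columns["cell_type"], result)
```

Since the function returns None and mutates its input, any other reference to the same column (original_column here) sees the standardized values as well; copy the data first if the raw values are still needed.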
- add_new_from(key, **kwargs)¶
Add validated & new categories.
- Parameters:
key (str) – The key referencing the slot in the DataFrame from which to draw terms.
**kwargs – Additional keyword arguments to pass to create new records