lamindb.curators.DataFrameCurator

class lamindb.curators.DataFrameCurator(dataset, schema)

Bases: Curator

Curator for a DataFrame object.

See also Curator and Schema.

Added in version 1.1.0.

Parameters:
  • dataset (DataFrame | Artifact) – The DataFrame-like object to validate & annotate.

  • schema (Schema) – A Schema object that defines the validation constraints.

Example:

import lamindb as ln
import bionty as bt

# define valid labels
cell_medium = ln.ULabel(name="CellMedium", is_type=True).save()
ln.ULabel(name="DMSO", type=cell_medium).save()
ln.ULabel(name="IFNG", type=cell_medium).save()
bt.CellType.from_source(name="B cell").save()
bt.CellType.from_source(name="T cell").save()

# define schema
schema = ln.Schema(
    name="small_dataset1_obs_level_metadata",
    features=[
        ln.Feature(name="cell_medium", dtype="cat[ULabel[CellMedium]]").save(),
        ln.Feature(name="sample_note", dtype=str).save(),
        ln.Feature(name="cell_type_by_expert", dtype=bt.CellType).save(),
        ln.Feature(name="cell_type_by_model", dtype=bt.CellType).save(),
    ],
).save()

# curate a DataFrame
df = ln.core.datasets.small_dataset1(otype="DataFrame")
curator = ln.curators.DataFrameCurator(df, schema)
artifact = curator.save_artifact(key="example_datasets/dataset1.parquet")
assert artifact.schema == schema

Attributes

property cat: CatManager

Manage categoricals by updating registries.

Methods

save_artifact(*, key=None, description=None, revises=None, run=None)

Save an annotated artifact.

Parameters:
  • key (str | None, default: None) – A path-like key to reference the artifact in default storage, e.g., "myfolder/myfile.fcs". Artifacts with the same key form a version family.

  • description (str | None, default: None) – A description.

  • revises (Artifact | None, default: None) – Previous version of the artifact. An alternative to passing key for triggering a new version.

  • run (Run | None, default: None) – The run that creates the artifact.

Returns:

A saved artifact record.

standardize()

Standardize the dataset.

  • Adds missing columns for features

  • Fills missing values for features with default values

Return type:

None
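The two steps listed for standardize() can be pictured with a small plain-Python model. This is only a sketch of the idea, not lamindb's implementation: a dict of lists stands in for the DataFrame, None stands in for a missing value, and the helper name standardize_columns, the defaults mapping, and the default values are all hypothetical.

```python
def standardize_columns(columns, defaults):
    """Return a copy of `columns` with every feature in `defaults` present and filled.

    `columns` maps column names to equally long lists of values (a stand-in
    for a DataFrame); `defaults` maps feature names to hypothetical defaults.
    """
    n_rows = len(next(iter(columns.values()), []))
    out = {name: list(values) for name, values in columns.items()}
    for feature, default in defaults.items():
        if feature not in out:
            # Step 1: add a missing column for the feature.
            out[feature] = [default] * n_rows
        else:
            # Step 2: fill missing values (None in this model) with the default.
            out[feature] = [default if v is None else v for v in out[feature]]
    return out


table = {"cell_medium": ["DMSO", None], "sample_note": ["ok", "ok"]}
result = standardize_columns(
    table, {"cell_medium": "DMSO", "cell_type_by_model": "unknown"}
)
```

Columns without an entry in defaults (here sample_note) are left untouched, and the input table is not modified.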

validate()

Validate the dataset.

Raises:

lamindb.errors.ValidationError – If validation fails.

Return type:

None
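Because validate() raises rather than returning a status, one possible calling pattern is to catch the error, call standardize() once, and validate again. The sketch below uses only the validate() and standardize() methods documented on this page. The helper name validate_with_standardize, its error_type parameter, and the stub classes are hypothetical and exist only so the example runs without a LaminDB instance. Whether standardize() resolves a given failure depends on its cause.

```python
def validate_with_standardize(curator, error_type=Exception):
    """Validate `curator`; on failure, standardize once and validate again.

    `error_type` is the exception class to catch; with a real curator one
    would pass lamindb.errors.ValidationError.
    """
    try:
        curator.validate()
        return "valid"
    except error_type:
        curator.standardize()
        curator.validate()  # raises again if the dataset is still invalid
        return "valid after standardize"


class StubValidationError(Exception):
    """Stand-in for lamindb.errors.ValidationError (hypothetical)."""


class StubCurator:
    """Stand-in curator that fails validation until standardize() is called."""

    def __init__(self):
        self.standardized = False

    def validate(self):
        if not self.standardized:
            raise StubValidationError("missing columns")

    def standardize(self):
        self.standardized = True


outcome = validate_with_standardize(StubCurator(), error_type=StubValidationError)
```

With a real curator the call would look like validate_with_standardize(curator, error_type=ln.errors.ValidationError), followed by curator.save_artifact(key=...) once validation passes.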