lamindb.curators.DataFrameCurator
- class lamindb.curators.DataFrameCurator(dataset, schema)
Bases: Curator
Curator for a DataFrame object.
Added in version 1.1.0.
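- Parameters:
dataset – The DataFrame to curate.
schema (Schema) – The schema that defines the features to validate against.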
Example:
import lamindb as ln
import bionty as bt

# define valid labels
cell_medium = ln.ULabel(name="CellMedium", is_type=True).save()
ln.ULabel(name="DMSO", type=cell_medium).save()
ln.ULabel(name="IFNG", type=cell_medium).save()
bt.CellType.from_source(name="B cell").save()
bt.CellType.from_source(name="T cell").save()

# define schema
schema = ln.Schema(
    name="small_dataset1_obs_level_metadata",
    features=[
        ln.Feature(name="cell_medium", dtype="cat[ULabel[CellMedium]]").save(),
        ln.Feature(name="sample_note", dtype=str).save(),
        ln.Feature(name="cell_type_by_expert", dtype=bt.CellType).save(),
        ln.Feature(name="cell_type_by_model", dtype=bt.CellType).save(),
    ],
).save()

# curate a DataFrame
df = ln.core.datasets.small_dataset1(otype="DataFrame")
curator = ln.curators.DataFrameCurator(df, schema)
artifact = curator.save_artifact(key="example_datasets/dataset1.parquet")
assert artifact.schema == schema
Attributes
- property cat: CatManager
Manage categoricals by updating registries.
Methods
- save_artifact(*, key=None, description=None, revises=None, run=None)
Save an annotated artifact.
- Parameters:
key (str | None, default: None) – A path-like key to reference the artifact in default storage, e.g., "myfolder/myfile.fcs". Artifacts with the same key form a version family.
description (str | None, default: None) – A description.
revises (Artifact | None, default: None) – Previous version of the artifact. An alternative to passing key to trigger a new version.
run (Run | None, default: None) – The run that creates the artifact.
- Returns:
A saved artifact record.
- standardize()
Standardize the dataset.
Adds missing columns for features.
Fills missing values for features with default values.
- Return type:
None
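The two standardization steps described here can be pictured with plain pandas. This is only a rough sketch of the idea, not lamindb's implementation: the function standardize_sketch and the FEATURE_DEFAULTS mapping are hypothetical names introduced for illustration, and how defaults are actually attached to features is not shown on this page.

```python
import pandas as pd

# Hypothetical defaults keyed by feature (column) name; not part of lamindb's API.
FEATURE_DEFAULTS = {"cell_medium": "DMSO", "sample_note": "unknown"}


def standardize_sketch(df: pd.DataFrame, defaults: dict) -> None:
    """Add missing columns and fill missing values in place (illustration only)."""
    for name, default in defaults.items():
        if name not in df.columns:
            # Missing column: create it, filled with the default value.
            df[name] = default
        else:
            # Existing column: replace only the missing entries.
            df[name] = df[name].fillna(default)


# "sample_note" is absent and one "cell_medium" value is missing.
df = pd.DataFrame({"cell_medium": ["IFNG", None]})
standardize_sketch(df, FEATURE_DEFAULTS)
```

Like standardize(), the sketch returns None and modifies the DataFrame it is given.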
- validate()
Validate the dataset.
- Raises:
lamindb.errors.ValidationError – If validation fails.
- Return type:
None
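Because validate() raises rather than returning a status, callers typically wrap it in try/except. The sketch below shows one possible pattern, assuming that calling standardize() after a failed validation is a sensible recovery step for your data. The helper validate_with_retry and the StubCurator class are hypothetical and exist only so the example runs without a LaminDB instance; with a real curator you would pass lamindb.errors.ValidationError as error_type.

```python
class StubCurator:
    """Hypothetical stand-in for a curator; not part of lamindb."""

    def __init__(self):
        self.standardized = False

    def validate(self):
        # Fail until standardize() has been called, mimicking a fixable dataset.
        if not self.standardized:
            raise ValueError("dataset does not match schema")

    def standardize(self):
        self.standardized = True


def validate_with_retry(curator, error_type):
    """Validate; on failure, standardize once and validate again.

    Returns True if the dataset ends up valid, False otherwise.
    """
    try:
        curator.validate()
        return True
    except error_type:
        curator.standardize()
    try:
        curator.validate()
        return True
    except error_type:
        return False


# With lamindb, error_type would be lamindb.errors.ValidationError.
result = validate_with_retry(StubCurator(), ValueError)
```

Standardizing only after a failed validation keeps already-valid datasets untouched.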