lamindb.curators.DataFrameCurator
- class lamindb.curators.DataFrameCurator(dataset, schema, slot=None)
Bases: Curator
Curator for DataFrame.
- Parameters:
Example
For a simple example using a flexible schema, see from_df().
Here is an example that enforces a minimal set of columns in the dataframe.
import lamindb as ln

schema = ln.core.datasets.mini_immuno.define_mini_immuno_schema_flexible()
df = ln.core.datasets.small_dataset1(otype="DataFrame")
df.pop("donor")  # remove donor column to trigger validation error
try:
    artifact = ln.Artifact.from_df(
        df, key="examples/dataset1.parquet", schema=schema
    ).save()
except ln.errors.ValidationError as error:
    print(error)
Under the hood, this uses the following schema.
import lamindb as ln

schema = ln.Schema(
    name="Mini immuno schema",
    features=[
        ln.Feature.get(name="perturbation"),
        ln.Feature.get(name="cell_type_by_model"),
        ln.Feature.get(name="assay_oid"),
        ln.Feature.get(name="donor"),
        ln.Feature.get(name="concentration"),
        ln.Feature.get(name="treatment_time_h"),
    ],
    flexible=True,  # _additional_ columns in a dataframe are validated & annotated
).save()
Valid features & labels were defined as:
import lamindb as ln
import bionty as bt

# define valid labels
perturbation_type = ln.ULabel(name="Perturbation", is_type=True).save()
ln.ULabel(name="DMSO", type=perturbation_type).save()
ln.ULabel(name="IFNG", type=perturbation_type).save()
bt.CellType.from_source(name="B cell").save()
bt.CellType.from_source(name="T cell").save()

# define valid features
ln.Feature(name="perturbation", dtype=perturbation_type).save()
ln.Feature(name="cell_type_by_model", dtype=bt.CellType).save()
ln.Feature(name="cell_type_by_expert", dtype=bt.CellType).save()
ln.Feature(name="assay_oid", dtype=bt.ExperimentalFactor.ontology_id).save()
ln.Feature(name="donor", dtype=str, nullable=True).save()
ln.Feature(name="concentration", dtype=str).save()
ln.Feature(name="treatment_time_h", dtype="num", coerce_dtype=True).save()
Attributes
- property cat: DataFrameCatManager
Manage categoricals by updating registries.
Methods
- save_artifact(*, key=None, description=None, revises=None, run=None)
Save an annotated artifact.
- Parameters:
key (str | None, default: None) – A path-like key to reference the artifact in default storage, e.g., "myfolder/myfile.fcs". Artifacts with the same key form a version family.
description (str | None, default: None) – A description.
revises (Artifact | None, default: None) – Previous version of the artifact. An alternative to passing key to trigger a new version.
run (Run | None, default: None) – The run that creates the artifact.
- Return type:
- Returns:
A saved artifact record.
- standardize()
Standardize the dataset.
- Adds missing columns for features.
- Fills missing values for features with default values.
- Return type:
None
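The two standardization steps (adding missing columns, filling missing values) can be pictured with a small standalone sketch. It is not lamindb's implementation: the table is a plain dict of column lists (the shape that `df.to_dict("list")` returns for a pandas DataFrame), `None` stands in for a missing value, and `standardize_columns` and the `defaults` mapping are hypothetical names introduced only for illustration.

```python
def standardize_columns(table, defaults):
    """Return a copy of `table` with missing columns added and gaps filled.

    `table` maps column names to equal-length lists; `defaults` maps feature
    names to default values. Both are hypothetical, not part of lamindb.
    """
    n_rows = len(next(iter(table.values()))) if table else 0
    result = {name: list(values) for name, values in table.items()}
    for feature, default in defaults.items():
        if feature not in result:
            # Step 1: add a column for a feature absent from the dataset.
            result[feature] = [default] * n_rows
        else:
            # Step 2: fill missing values (None here) with the default.
            result[feature] = [default if v is None else v for v in result[feature]]
    return result


# Illustrative data only: "donor" is absent, "perturbation" has one gap.
table = {"perturbation": ["DMSO", None, "IFNG"], "treatment_time_h": [24, 24, 6]}
defaults = {"perturbation": "DMSO", "donor": "unknown"}
standardized = standardize_columns(table, defaults)
print(standardized)
```

Unlike this sketch, which returns a copy, the method above returns `None`, which suggests it changes the curator's dataset in place.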
- validate()
Validate dataset against Schema.
- Raises:
lamindb.errors.ValidationError – If validation fails.
- Return type:
None
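As a rough mental model of the minimal-column check from the example at the top of this page, the sketch below raises an error when a required column is missing and tolerates additional columns. It is a simplification under stated assumptions: `validate_columns` is a hypothetical helper, the `ValidationError` class is a local stand-in for `lamindb.errors.ValidationError`, and dtype and label checks are left out entirely.

```python
class ValidationError(Exception):
    """Local stand-in for lamindb.errors.ValidationError (assumption)."""


def validate_columns(columns, required):
    """Raise ValidationError if any required column is absent.

    Additional columns are allowed, mirroring a schema that only
    enforces a minimal set. Hypothetical helper, not lamindb code.
    """
    present = set(columns)
    missing = [name for name in required if name not in present]
    if missing:
        raise ValidationError(f"missing required columns: {missing}")


# Feature names taken from the "Mini immuno schema" shown above.
required = [
    "perturbation",
    "cell_type_by_model",
    "assay_oid",
    "donor",
    "concentration",
    "treatment_time_h",
]
# Mimic df.pop("donor") from the example.
columns = [name for name in required if name != "donor"]

try:
    validate_columns(columns, required)
    outcome = "valid"
except ValidationError as error:
    outcome = str(error)
print(outcome)
```

As in the example at the top of the page, the failure surfaces as an exception, so callers are expected to wrap the call in `try`/`except` when they want to handle invalid datasets.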