lamindb.curators

Curators.

High-level curators

class lamindb.curators.DataFrameCurator(dataset, schema, *, slot=None, features=None, require_saved_schema=True)

Curator for DataFrame.

Parameters:
  • dataset (DataFrame | Artifact) – The DataFrame-like object to validate & annotate.

  • schema (Schema) – A Schema object that defines the validation constraints.

  • slot (str | None, default: None) – The slot this curator handles when it is part of a composite curator for a composite data structure.

  • require_saved_schema (bool, default: True) – Whether the schema must be saved before curation.

Examples

For a simple example using a flexible schema, see from_dataframe().

Here is an example that enforces a minimal set of columns in the dataframe.

import lamindb as ln

schema = ln.examples.datasets.mini_immuno.define_mini_immuno_schema_flexible()
df = ln.examples.datasets.mini_immuno.get_dataset1(otype="DataFrame")
df.pop("donor")  # remove donor column to trigger validation error
try:
    artifact = ln.Artifact.from_dataframe(
        df, key="examples/dataset1.parquet", schema=schema
    ).save()
except ln.errors.ValidationError as error:
    print(error)

Under the hood, this uses the following schema.

import lamindb as ln

schema = ln.Schema(
    name="Mini immuno schema",
    features=[
        ln.Feature.get(name="perturbation"),
        ln.Feature.get(name="cell_type_by_model"),
        ln.Feature.get(name="assay_oid"),
        ln.Feature.get(name="donor"),
        ln.Feature.get(name="concentration"),
        ln.Feature.get(name="treatment_time_h"),
    ],
    flexible=True,  # _additional_ columns in a dataframe are validated & annotated
).save()

Valid features & labels were defined as:

import bionty as bt

import lamindb as ln

# define valid labels
perturbation_type = ln.Record(name="Perturbation", is_type=True).save()
ln.Record(name="DMSO", type=perturbation_type).save()
ln.Record(name="IFNG", type=perturbation_type).save()
bt.CellType.from_source(name="B cell").save()
bt.CellType.from_source(name="T cell").save()

# define valid features
ln.Feature(name="perturbation", dtype=perturbation_type).save()
ln.Feature(name="cell_type_by_expert", dtype=bt.CellType).save()
ln.Feature(name="cell_type_by_model", dtype=bt.CellType).save()
ln.Feature(name="assay_oid", dtype=bt.ExperimentalFactor.ontology_id).save()
ln.Feature(name="concentration", dtype=str).save()
ln.Feature(name="treatment_time_h", dtype="num", coerce=True).save()
ln.Feature(name="donor", dtype=str, nullable=True).save()
ln.Feature(name="donor_ethnicity", dtype=list[bt.Ethnicity]).save()
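
The minimal-column check in the example above can be illustrated without a database. The following is a rough sketch in plain Python, assuming a schema reduces to a list of required column names. The helper check_required_columns is hypothetical and not part of lamindb; the real curator does more than this, for instance it also looks at feature dtypes.

```python
def check_required_columns(columns, required):
    """Return the additional columns; raise ValueError if a required one is missing.

    Hypothetical stand-in for the schema check, not lamindb's implementation.
    """
    present = list(columns)
    missing = [name for name in required if name not in present]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    # columns beyond the required set are allowed and reported back
    return [name for name in present if name not in required]


required = [
    "perturbation",
    "cell_type_by_model",
    "assay_oid",
    "donor",
    "concentration",
    "treatment_time_h",
]

# all required columns plus one additional column: accepted
additional = check_required_columns(required + ["cell_type_by_expert"], required)

# dropping "donor" mirrors df.pop("donor") in the example above
try:
    check_required_columns([c for c in required if c != "donor"], required)
    error_message = None
except ValueError as error:
    error_message = str(error)
```

In this sketch the additional column is returned rather than rejected, which mirrors the idea that a flexible schema still accepts columns beyond the required set.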

It is also possible to curate the attrs slot.

import lamindb as ln

from .define_schema_df_metadata import study_metadata_schema

df = ln.examples.datasets.mini_immuno.get_dataset1(otype="DataFrame")
schema = ln.Schema(
    features=[ln.Feature(name="perturbation", dtype="str").save()],
    slots={"attrs": study_metadata_schema},
    otype="DataFrame",
).save()
curator = ln.curators.DataFrameCurator(df, schema=schema)
curator.validate()
artifact = curator.save_artifact(key="examples/df_with_attrs.parquet")
artifact.describe()

property cat: DataFrameCatManager

Manage categoricals by updating registries.

standardize()

Standardize the dataset.

  • Adds missing columns for features

  • Fills missing values for features with default values

Return type:

None
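
The two steps listed for standardize() can be sketched on a list of row dictionaries. This is an illustrative sketch only: standardize_rows and the defaults mapping are hypothetical names, and lamindb operates on the DataFrame and the defaults stored on features rather than on plain dicts.

```python
def standardize_rows(rows, defaults):
    """Return copies of rows with absent or None values filled from defaults.

    `rows` is a list of dicts (one per record); `defaults` maps a feature name
    to its default value. Both are hypothetical stand-ins.
    """
    standardized = []
    for row in rows:
        filled = dict(row)  # copy so the input is left untouched
        for name, default in defaults.items():
            # covers both a missing column and a missing value
            if filled.get(name) is None:
                filled[name] = default
        standardized.append(filled)
    return standardized


rows = [
    {"perturbation": "DMSO", "concentration": None},
    {"perturbation": "IFNG", "concentration": "200 nM"},
]
defaults = {"concentration": "unknown", "donor": "unknown"}
result = standardize_rows(rows, defaults)
```

Here the "donor" column is added to every row, and only the missing "concentration" value is replaced.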

validate()

Validate dataset against Schema.

Raises:

lamindb.errors.ValidationError – If validation fails.


Return type:

None

save_artifact(*, key=None, description=None, revises=None, run=None)

Save an annotated artifact.

Parameters:
  • key (default: None) – A path-like key to reference the artifact in default storage, e.g., "myfolder/myfile.fcs". Artifacts with the same key form a version family.

  • description (default: None) – A description.

  • revises (default: None) – Previous version of the artifact. An alternative to passing key to trigger a new version.

  • run (default: None) – The run that creates the artifact.

Return type:

Artifact

Returns:

A saved artifact record.
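
The relationship between key and revises described in the parameters can be pictured with a toy in-memory registry. Everything here is an assumption made for illustration: ToyRegistry is hypothetical, its records are plain dicts, and it requires one of key or revises, which is a simplification that the real save_artifact() does not impose.

```python
class ToyRegistry:
    """Toy in-memory registry; hypothetical, not lamindb's storage model."""

    def __init__(self):
        self._families = {}  # key -> list of versions, oldest first

    def save(self, content, *, key=None, revises=None):
        if revises is not None:
            key = revises["key"]  # a revision joins the family of its predecessor
        if key is None:
            raise ValueError("the toy registry needs either key or revises")
        family = self._families.setdefault(key, [])
        record = {"key": key, "version": len(family) + 1, "content": content}
        family.append(record)
        return record


registry = ToyRegistry()
v1 = registry.save("first", key="myfolder/myfile.fcs")
v2 = registry.save("second", key="myfolder/myfile.fcs")  # same key: new version
v3 = registry.save("third", revises=v2)  # revises: also a new version
```

Both routes end up in the same version family, which is the point of the two parameters being alternatives.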


class lamindb.curators.AnnDataCurator(dataset, schema)

Curator for AnnData.

Uses slots to specify which component contains which schema. Slots are keys that identify where features are stored within composite data structures.
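
To make the slot idea concrete, here is a small sketch in plain Python, assuming a composite dataset can be modeled as a dict from slot key to column names and a composite schema as a dict from slot key to required names. The function validate_composite and the sample names are hypothetical, not lamindb API.

```python
def validate_composite(dataset, slots):
    """Check each slot's component against its required column names.

    `dataset` maps slot key -> list of column names present in that component;
    `slots` maps slot key -> required column names.
    Returns a dict of slot key -> missing names (empty if everything matches).
    """
    problems = {}
    for slot, required in slots.items():
        if slot not in dataset:
            problems[slot] = list(required)  # whole component is absent
            continue
        missing = [name for name in required if name not in dataset[slot]]
        if missing:
            problems[slot] = missing
    return problems


dataset = {"obs": ["perturbation", "donor"], "var": ["gene_symbol"]}
slots = {"obs": ["perturbation", "donor"], "var": ["ensembl_gene_id"]}
problems = validate_composite(dataset, slots)
```

Each component is checked only against the schema registered under its own slot key, so a problem in "var" does not affect the result for "obs".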

Parameters:
  • dataset (AnnData | Artifact) – The AnnData-like object to validate & annotate.

  • schema (Schema) – A Schema object that defines the validation constraints.

Examples

Curate Ensembl gene IDs and valid features in obs:

curate_anndata_flexible.py
import lamindb as ln

ln.examples.datasets.mini_immuno.define_features_labels()
adata = ln.examples.datasets.mini_immuno.get_dataset1(otype="AnnData")
artifact = ln.Artifact.from_anndata(
    adata,
    key="examples/mini_immuno.h5ad",
    schema="ensembl_gene_ids_and_valid_features_in_obs",
).save()
artifact.describe()

Curate uns dictionary:

curate_anndata_uns.py
import lamindb as ln

ln.examples.datasets.mini_immuno.define_features_labels()
adata = ln.examples.datasets.mini_immuno.get_dataset1(otype="AnnData")
schema = ln.Schema.get(name="Study metadata schema")
artifact = ln.Artifact.from_anndata(
    adata, schema=schema, key="examples/mini_immuno_uns.h5ad"
).save()
artifact.describe()

See also

from_anndata().

class lamindb.curators.MuDataCurator(dataset, schema)

Curator for MuData.

Uses slots to specify which component contains which schema. Slots are keys that identify where features are stored within composite data structures.

Parameters:
  • dataset (MuData | Artifact) – The MuData-like object to validate & annotate.

  • schema (Schema) – A Schema object that defines the validation constraints.

Example

curate_mudata.py
import lamindb as ln
import bionty as bt

from docs.scripts.define_schema_df_metadata import study_metadata_schema

# define labels
perturbation = ln.Record(name="Perturbation", is_type=True).save()
ln.Record(name="Perturbed", type=perturbation).save()
ln.Record(name="NT", type=perturbation).save()

replicate = ln.Record(name="Replicate", is_type=True).save()
ln.Record(name="rep1", type=replicate).save()
ln.Record(name="rep2", type=replicate).save()
ln.Record(name="rep3", type=replicate).save()

# define the global obs schema
obs_schema = ln.Schema(
    name="mudata_papalexi21_subset_obs_schema",
    features=[
        ln.Feature(name="perturbation", dtype="cat[Record[Perturbation]]").save(),
        ln.Feature(name="replicate", dtype="cat[Record[Replicate]]").save(),
    ],
).save()

# define the ['rna'].obs schema
obs_schema_rna = ln.Schema(
    name="mudata_papalexi21_subset_rna_obs_schema",
    features=[
        ln.Feature(name="nCount_RNA", dtype=int).save(),
        ln.Feature(name="nFeature_RNA", dtype=int).save(),
        ln.Feature(name="percent.mito", dtype=float).save(),
    ],
).save()

# define the ['hto'].obs schema
obs_schema_hto = ln.Schema(
    name="mudata_papalexi21_subset_hto_obs_schema",
    features=[
        ln.Feature(name="nCount_HTO", dtype=float).save(),
        ln.Feature(name="nFeature_HTO", dtype=int).save(),
        ln.Feature(name="technique", dtype=bt.ExperimentalFactor).save(),
    ],
).save()

# define ['rna'].var schema
var_schema_rna = ln.Schema(
    name="mudata_papalexi21_subset_rna_var_schema",
    itype=bt.Gene.symbol,
    dtype=float,
).save()

# define composite schema
mudata_schema = ln.Schema(
    name="mudata_papalexi21_subset_mudata_schema",
    otype="MuData",
    slots={
        "obs": obs_schema,
        "rna:obs": obs_schema_rna,
        "hto:obs": obs_schema_hto,
        "rna:var": var_schema_rna,
        "uns:study_metadata": study_metadata_schema,
    },
).save()

# curate a MuData
mdata = ln.examples.datasets.mudata_papalexi21_subset(with_uns=True)
bt.settings.organism = "human"  # set the organism to map gene symbols
curator = ln.curators.MuDataCurator(mdata, mudata_schema)
artifact = curator.save_artifact(key="examples/mudata_papalexi21_subset.h5mu")
assert artifact.schema == mudata_schema

See also

from_mudata().
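
The composite schema in the example uses slot keys such as "obs", "rna:obs", and "uns:study_metadata". One plausible way to read such keys is as colon-separated paths into the container. The sketch below assumes exactly that and models the MuData object as nested dicts; resolve_slot and mdata_like are hypothetical and do not show how lamindb resolves slots internally.

```python
def resolve_slot(container, slot_key):
    """Walk a nested dict following the colon-separated parts of `slot_key`."""
    component = container
    for part in slot_key.split(":"):
        component = component[part]  # raises KeyError for an unknown part
    return component


# nested-dict stand-in for a MuData object (assumption for illustration)
mdata_like = {
    "obs": ["perturbation", "replicate"],
    "rna": {
        "obs": ["nCount_RNA", "nFeature_RNA", "percent.mito"],
        "var": ["gene_symbol"],
    },
    "hto": {"obs": ["nCount_HTO", "nFeature_HTO", "technique"]},
    "uns": {"study_metadata": ["study_note"]},
}

global_obs = resolve_slot(mdata_like, "obs")
rna_obs = resolve_slot(mdata_like, "rna:obs")
study_metadata = resolve_slot(mdata_like, "uns:study_metadata")
```

Under this reading, "obs" addresses the global table while "rna:obs" addresses the table of a single modality.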

class lamindb.curators.SpatialDataCurator(dataset, schema)

Curator for SpatialData.

Uses slots to specify which component contains which schema. Slots are keys that identify where features are stored within composite data structures.

Parameters:
  • dataset (SpatialData | Artifact) – The SpatialData-like object to validate & annotate.

  • schema (Schema) – A Schema object that defines the validation constraints.

Example

curate_spatialdata.py
import lamindb as ln

spatialdata = ln.examples.datasets.spatialdata_blobs()
sdata_schema = ln.Schema.get(name="spatialdata_blobs_schema")
curator = ln.curators.SpatialDataCurator(spatialdata, sdata_schema)
try:
    curator.validate()
except ln.errors.ValidationError:
    pass

spatialdata.tables["table"].var.drop(index="ENSG00000999999", inplace=True)

# validate again (must pass now) and save artifact
artifact = ln.Artifact.from_spatialdata(
    spatialdata, key="examples/spatialdata1.zarr", schema=sdata_schema
).save()
artifact.describe()

See also

from_spatialdata().

class lamindb.curators.TiledbsomaExperimentCurator(dataset, schema)

Curator for tiledbsoma.Experiment.

Uses slots to specify which component contains which schema. Slots are keys that identify where features are stored within composite data structures.

Parameters:
  • dataset (SOMAExperiment | Artifact) – The tiledbsoma.Experiment object.

  • schema (Schema) – A Schema object that defines the validation constraints.

Example

curate_soma_experiment.py
import lamindb as ln
import bionty as bt
import tiledbsoma as soma
import tiledbsoma.io

adata = ln.examples.datasets.mini_immuno.get_dataset1(otype="AnnData")
tiledbsoma.io.from_anndata("small_dataset.tiledbsoma", adata, measurement_name="RNA")

obs_schema = ln.Schema(
    name="soma_obs_schema",
    features=[
        ln.Feature(name="cell_type_by_expert", dtype=bt.CellType).save(),
        ln.Feature(name="cell_type_by_model", dtype=bt.CellType).save(),
    ],
).save()

var_schema = ln.Schema(
    name="soma_var_schema",
    features=[
        ln.Feature(name="var_id", dtype=bt.Gene.ensembl_gene_id).save(),
    ],
    coerce=True,
).save()

soma_schema = ln.Schema(
    name="soma_experiment_schema",
    otype="tiledbsoma",
    slots={
        "obs": obs_schema,
        "ms:RNA.T": var_schema,
    },
).save()

with soma.Experiment.open("small_dataset.tiledbsoma") as experiment:
    curator = ln.curators.TiledbsomaExperimentCurator(experiment, soma_schema)
    curator.validate()
    artifact = curator.save_artifact(
        key="examples/soma_experiment.tiledbsoma",
        description="SOMA experiment with schema validation",
    )
assert artifact.schema == soma_schema
artifact.describe()

See also

from_tiledbsoma().

Low-level module

core

Curator utilities.