Spatial

Here, you’ll learn how to manage spatial datasets:

  1. curate and ingest spatial data (spatial1/4)

  2. query & analyze spatial datasets (spatial2/4)

  3. load the collection into memory & train an ML model (spatial3/4)

  4. create and share interactive visualizations with vitessce (spatial4/4)

Spatial omics data integrates molecular profiling (e.g., transcriptomics, proteomics) with spatial information, preserving the spatial organization of cells and tissues. It enables high-resolution mapping of molecular activity within biological contexts, crucial for understanding cellular interactions and microenvironments.

Many spatial technologies exist, including multiplexed imaging, spatial transcriptomics, spatial proteomics, whole-slide imaging, spatial metabolomics, and 3D tissue reconstruction. Data from all of them can be stored in the SpatialData data framework. For more details, see the original publication:

Marconato, L., Palla, G., Yamauchi, K.A. et al. SpatialData: an open and universal data framework for spatial omics. Nat Methods 22, 58–62 (2025). https://doi.org/10.1038/s41592-024-02212-x

Note

A collection of curated spatial datasets in SpatialData format is available on the scverse/spatialdata-db instance.

spatial data vs SpatialData terminology

When we mention spatial data, we refer to data from spatial assays, such as spatial transcriptomics or proteomics, that includes spatial coordinates to represent the organization of molecular features in tissue. When we refer to SpatialData, we mean spatial omics data stored in the scverse SpatialData framework.

# pip install 'lamindb[jupyter,bionty]' spatialdata spatialdata-plot
!lamin init --storage ./test-spatial --modules bionty
import lamindb as ln
import bionty as bt
import warnings

warnings.filterwarnings("ignore")

spatial_guide_datasets = ln.Project(name="spatial guide datasets").save()
ln.track(project=spatial_guide_datasets.name)

Creating artifacts

lamindb provides the Artifact.from_spatialdata() method to create an Artifact from a SpatialData object.

example_blobs_sdata = ln.core.datasets.spatialdata_blobs()
example_blobs_sdata
blobs_af = ln.Artifact.from_spatialdata(
    example_blobs_sdata, key="example_blobs.zarr"
).save()
blobs_af
# SpatialData Artifacts can easily be loaded back into memory
example_blobs_in_memory = blobs_af.load()
example_blobs_in_memory
# SpatialData artifacts have built-in lineage tracking like all other artifacts
blobs_af.view_lineage()

Validating annotations

For the remainder of the guide, we will work with two 10x Xenium datasets and one 10x Visium H&E image dataset.

More details can be found in the ingestion notebook.

Metadata is stored in two places in the SpatialData object:

  1. Dataset-level metadata is stored in sdata.attrs["sample"].

  2. Measurement-specific metadata is stored in the associated tables in sdata.tables.
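
To picture these two locations without loading a real dataset, here is a minimal sketch in plain Python. It does not use spatialdata: `sdata_like` is a hypothetical stand-in that mimics only the `attrs` and `tables` attributes, `metadata_slots` is a hypothetical helper, and the values are made up for illustration. The slot names mirror the keys used for the composite schema in this guide ("sample", "table:obs", "table:var").

```python
from types import SimpleNamespace

# Hypothetical stand-in for a SpatialData object. Only `attrs` and `tables`
# are mimicked; in a real object each table is an AnnData, not a namespace.
sdata_like = SimpleNamespace(
    attrs={"sample": {"organism": "human", "tissue": "breast"}},  # illustrative values
    tables={
        "table": SimpleNamespace(
            obs={"celltype_major": ["T-cells", "B-cells"]},  # per-observation annotations
            var_names=["ENSG00000141510"],  # feature identifiers
        )
    },
)


def metadata_slots(sdata):
    """Map slot names to the metadata found at each location."""
    slots = {"sample": sdata.attrs["sample"]}  # dataset-level metadata
    for name, table in sdata.tables.items():  # measurement-specific metadata
        slots[f"{name}:obs"] = table.obs
        slots[f"{name}:var"] = table.var_names
    return slots


slots = metadata_slots(sdata_like)
print(sorted(slots))  # ['sample', 'table:obs', 'table:var']
```

On a real SpatialData object, the same information would be read from sdata.attrs["sample"] and from the obs and var of each entry in sdata.tables.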

Define a schema

We define a lamindb.Schema to curate both sample and table metadata.

Curating different spatial technologies

Reading data from different spatial technologies into SpatialData objects can result in very different objects with different metadata. It can therefore be useful to define technology-specific schemas by reusing Schema components.

# define sample schema
spatial_sample_schema = ln.Schema(
    name="Spatial sample level",
    features=[
        ln.Feature(name="organism", dtype=bt.Organism).save(),
        ln.Feature(name="assay", dtype=bt.ExperimentalFactor).save(),
        ln.Feature(name="disease", dtype=bt.Disease).save(),
        ln.Feature(name="tissue", dtype=bt.Tissue).save(),
    ],
    coerce_dtype=True,
).save()

# define table obs schema
spatial_obs_schema = ln.Schema(
    name="Spatial obs level",
    features=[
        ln.Feature(name="celltype_major", dtype=bt.CellType, nullable=True)
        .save()
        .with_config(optional=True),
    ],
    coerce_dtype=True,
).save()

# define table var schema
spatial_var_schema = ln.Schema(
    name="Spatial var level", itype=bt.Gene.ensembl_gene_id, dtype=int
).save()

# define composite schema
spatial_schema = ln.Schema(
    name="Spatial schema",
    otype="SpatialData",
    components={
        "sample": spatial_sample_schema,
        "table:obs": spatial_obs_schema,
        "table:var": spatial_var_schema,
    },
).save()
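
The sample schema above lists four features, while celltype_major in the obs schema is marked optional. As a rough illustration of that distinction, the plain-Python sketch below checks only whether feature names are present. This is not how lamindb validates (it also checks values against the registries); `missing_features` is a hypothetical helper and the sample values are made up.

```python
# Feature names taken from the schemas in this guide.
REQUIRED_SAMPLE_FEATURES = ("organism", "assay", "disease", "tissue")
OPTIONAL_OBS_FEATURES = ("celltype_major",)


def missing_features(metadata, required):
    """Return the required feature names that are absent from `metadata`."""
    return [name for name in required if name not in metadata]


# Illustrative sample-level metadata with one required feature left out.
sample = {"organism": "human", "assay": "Xenium", "tissue": "breast"}
print(missing_features(sample, REQUIRED_SAMPLE_FEATURES))  # ['disease']

# An optional feature is simply not part of the required set,
# so a table without it reports nothing missing.
obs_columns = {"cell_id": [1, 2, 3]}
print(missing_features(obs_columns, ()))  # []
```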

Curate a Xenium dataset

# load first of two cropped Xenium datasets
xenium_aligned_1_sdata = (
    ln.Artifact.using("laminlabs/lamindata")
    .get(key="xenium_aligned_1_guide_min.zarr")
    .load()
)
xenium_aligned_1_sdata
xenium_curator = ln.curators.SpatialDataCurator(xenium_aligned_1_sdata, spatial_schema)
try:
    xenium_curator.validate()
except ln.errors.ValidationError as e:
    print(e)
xenium_aligned_1_sdata.tables["table"].obs["celltype_major"] = (
    xenium_aligned_1_sdata.tables["table"]
    .obs["celltype_major"]
    .replace(
        {
            "CAFs": "cancer associated fibroblast",
            "Endothelial": "endothelial cell",
            "Myeloid": "myeloid cell",
            "PVL": "perivascular cell",
            "T-cells": "T cell",
            "B-cells": "B cell",
            "Normal Epithelial": "epithelial cell",
            "Plasmablasts": "plasmablast",
            "Cancer Epithelial": "neoplastic epithelial cell",
        }
    )
)
try:
    xenium_curator.validate()
except ln.errors.ValidationError as e:
    print(e)
xenium_curator.slots["table:obs"].cat.add_new_from("celltype_major")
xenium_1_curated_af = xenium_curator.save_artifact(key="xenium1.zarr")
xenium_1_curated_af.describe()

Curate additional Xenium datasets

We can reuse the same schema to curate a second Xenium dataset with a new curator:

xenium_aligned_2_sdata = (
    ln.Artifact.using("laminlabs/lamindata")
    .get(key="xenium_aligned_2_guide_min.zarr")
    .load()
)

xenium_aligned_2_sdata.tables["table"].obs["celltype_major"] = (
    xenium_aligned_2_sdata.tables["table"]
    .obs["celltype_major"]
    .replace(
        {
            "CAFs": "cancer associated fibroblast",
            "Endothelial": "endothelial cell",
            "Myeloid": "myeloid cell",
            "PVL": "perivascular cell",
            "T-cells": "T cell",
            "B-cells": "B cell",
            "Normal Epithelial": "epithelial cell",
            "Plasmablasts": "plasmablast",
            "Cancer Epithelial": "neoplastic epithelial cell",
        }
    )
)
xenium_curator = ln.curators.SpatialDataCurator(xenium_aligned_2_sdata, spatial_schema)
try:
    xenium_curator.validate()
except ln.errors.ValidationError as e:
    print(e)
xenium_2_curated_af = xenium_curator.save_artifact(key="xenium2.zarr")
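
Both Xenium datasets use the same label mapping, so it could be defined once and shared. The sketch below shows the idea in plain Python on a list of labels; `CELLTYPE_MAP` and `harmonize_labels` are hypothetical names, not part of lamindb, and the guide itself applies the mapping with pandas `.replace` on the obs column.

```python
# Mapping copied from this guide; the constant name is an assumption.
CELLTYPE_MAP = {
    "CAFs": "cancer associated fibroblast",
    "Endothelial": "endothelial cell",
    "Myeloid": "myeloid cell",
    "PVL": "perivascular cell",
    "T-cells": "T cell",
    "B-cells": "B cell",
    "Normal Epithelial": "epithelial cell",
    "Plasmablasts": "plasmablast",
    "Cancer Epithelial": "neoplastic epithelial cell",
}


def harmonize_labels(labels, mapping=CELLTYPE_MAP):
    """Replace known source labels; leave all other labels unchanged."""
    return [mapping.get(label, label) for label in labels]


raw = ["CAFs", "T-cells", "T cell"]
print(harmonize_labels(raw))
# ['cancer associated fibroblast', 'T cell', 'T cell']
```

Because unmapped labels pass through unchanged, applying the helper twice gives the same result as applying it once, so it is safe to run on a dataset that is already partly harmonized.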

Curate Visium datasets

Analogously, we can create a Curator for the Visium dataset, here reusing the same schema:

visium_aligned_sdata = (
    ln.Artifact.using("laminlabs/lamindata")
    .get(key="visium_aligned_guide_min.zarr")
    .load()
)
visium_aligned_sdata
visium_curator = ln.curators.SpatialDataCurator(visium_aligned_sdata, spatial_schema)
try:
    visium_curator.validate()
except ln.errors.ValidationError as e:
    print(e)
visium_curator.slots["table:var"].cat.add_new_from("columns")
visium_curated_af = visium_curator.save_artifact(key="visium.zarr")
visium_curated_af.describe()

Overview of the curated datasets

visium_curated_af.view_lineage()
ln.Artifact.df(features=True, include=["hash", "size"])
ln.finish()