Spatial

Here, you’ll learn how to manage spatial datasets:

curate and ingest spatial data
query and analyze spatial datasets
load the collection into memory and train an ML model
create and share interactive visualizations with vitessce

Spatial omics data integrates molecular profiling (e.g., transcriptomics, proteomics) with spatial information, preserving the spatial organization of cells and tissues. It enables high-resolution mapping of molecular activity within biological contexts, crucial for understanding cellular interactions and microenvironments.

Many spatial technologies exist, including multiplexed imaging, spatial transcriptomics, spatial proteomics, whole-slide imaging, spatial metabolomics, and 3D tissue reconstruction. Data from all of them can be stored in the SpatialData framework. For more details, see the original publication:

Marconato, L., Palla, G., Yamauchi, K.A. et al. SpatialData: an open and universal data framework for spatial omics. Nat Methods 22, 58–62 (2025). https://doi.org/10.1038/s41592-024-02212-x

Note

A collection of curated spatial datasets in SpatialData format is available on the scverse/spatialdata-db instance.

# pip install 'lamindb[jupyter,bionty]' spatialdata spatialdata-plot
!lamin init --storage ./test-spatial --modules bionty

import lamindb as ln
import bionty as bt
import spatialdata as sd
import warnings

warnings.filterwarnings("ignore")

spatial_guide_datasets = ln.Project(name="spatial guide datasets").save()
ln.track(project=spatial_guide_datasets)

Creating artifacts

You can use the from_spatialdata() method to create an Artifact from a SpatialData object.

example_blobs_sdata = ln.core.datasets.spatialdata_blobs()
example_blobs_sdata

blobs_af = ln.Artifact.from_spatialdata(
    example_blobs_sdata, key="example_blobs.zarr"
).save()
blobs_af

To retrieve the object from the database, you can, for example, query the artifact by key and then load it.

blobs_af = ln.Artifact.get(key="example_blobs.zarr")  # returns the Artifact record, not the SpatialData object
local_zarr_path = blobs_af.cache()  # returns a local path to the cached .zarr store
example_blobs_sdata = (
    blobs_af.load()  # calls sd.read_zarr() on a locally cached .zarr store
)

To view the data lineage of the artifact:

blobs_af.view_lineage()

Curating artifacts

For the remainder of the guide, we will work with two 10x Xenium datasets and one 10x Visium H&E image dataset that were previously ingested in raw form.

Metadata is stored in two places in the SpatialData object:

Dataset-level metadata is stored in sdata.attrs["sample"].
Measurement-specific metadata is stored in the associated tables in sdata.tables.

Define a schema

We define a lamindb.Schema to curate both sample and table metadata.

# define features
ln.Feature(name="organism", dtype=bt.Organism).save()
ln.Feature(name="assay", dtype=bt.ExperimentalFactor).save()
ln.Feature(name="disease", dtype=bt.Disease).save()
ln.Feature(name="tissue", dtype=bt.Tissue).save()
ln.Feature(name="celltype_major", dtype=bt.CellType, nullable=True).save()

# define simple schemas
flexible_metadata_schema = ln.Schema(
    name="Flexible metadata", itype=ln.Feature, coerce_dtype=True
).save()
ensembl_gene_ids = ln.Schema(
    name="Spatial var level (Ensembl gene id)", itype=bt.Gene.ensembl_gene_id
).save()

# define composite schema
spatial_schema = ln.Schema(
    name="Spatialdata schema (flexible)",
    otype="SpatialData",
    slots={
        "attrs:sample": flexible_metadata_schema,
        "tables:table:obs": flexible_metadata_schema,
        "tables:table:var.T": ensembl_gene_ids,
    },
).save()
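
The slot keys in the composite schema can be read as colon-separated paths into the SpatialData object: "attrs:sample" points at sdata.attrs["sample"], "tables:table:obs" at the obs dataframe of the table named "table", and "tables:table:var.T" at its transposed var dataframe. The plain-Python sketch below only illustrates that reading. describe_slot is a hypothetical helper written for this guide, not a lamindb function, and lamindb may resolve slots differently internally.

```python
def describe_slot(slot_key: str) -> str:
    """Translate a slot key such as 'tables:table:obs' into a readable accessor.

    Hypothetical helper for illustration only; not part of lamindb.
    """
    parts = slot_key.split(":")
    if parts[0] == "attrs" and len(parts) == 2:
        # e.g. "attrs:sample" -> dataset-level metadata
        return f'sdata.attrs["{parts[1]}"]'
    if parts[0] == "tables" and len(parts) == 3:
        # e.g. "tables:table:obs" -> a dataframe of the named table
        return f'sdata.tables["{parts[1]}"].{parts[2]}'
    raise ValueError(f"unrecognized slot key: {slot_key!r}")


# The three slot keys used in spatial_schema
slot_keys = ["attrs:sample", "tables:table:obs", "tables:table:var.T"]
accessors = {key: describe_slot(key) for key in slot_keys}
for key, accessor in accessors.items():
    print(f"{key:20} -> {accessor}")
```

A table stored under a different name than "table" would need its own slot keys, with that name as the middle component.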

Curate a Xenium dataset

# load first of two cropped Xenium datasets
xenium_aligned_1_sdata = (
    ln.Artifact.using("laminlabs/lamindata")
    .get(key="xenium_aligned_1_guide_min.zarr")
    .load()
)
xenium_aligned_1_sdata

xenium_curator = ln.curators.SpatialDataCurator(xenium_aligned_1_sdata, spatial_schema)
try:
    xenium_curator.validate()
except ln.errors.ValidationError as error:
    print(error)

xenium_aligned_1_sdata.tables["table"].obs["celltype_major"] = (
    xenium_aligned_1_sdata.tables["table"]
    .obs["celltype_major"]
    .replace(
        {
            "CAFs": "cancer associated fibroblast",
            "Endothelial": "endothelial cell",
            "Myeloid": "myeloid cell",
            "PVL": "perivascular cell",
            "T-cells": "T cell",
            "B-cells": "B cell",
            "Normal Epithelial": "epithelial cell",
            "Plasmablasts": "plasmablast",
            "Cancer Epithelial": "neoplastic epithelial cell",
        }
    )
)
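
The remapping above only helps if the dictionary covers every raw label. A minimal plain-Python sketch of a coverage check follows. The raw_labels list, including the "Unlabeled" entry, is a made-up stand-in for the values of obs["celltype_major"], and unmapped_labels is a hypothetical helper, not part of lamindb or pandas.

```python
# Mapping from the dataset's raw labels to the names used in this guide
celltype_mapping = {
    "CAFs": "cancer associated fibroblast",
    "Endothelial": "endothelial cell",
    "Myeloid": "myeloid cell",
    "PVL": "perivascular cell",
    "T-cells": "T cell",
    "B-cells": "B cell",
    "Normal Epithelial": "epithelial cell",
    "Plasmablasts": "plasmablast",
    "Cancer Epithelial": "neoplastic epithelial cell",
}


def unmapped_labels(labels, mapping):
    """Return the sorted labels that are neither keys nor target values of the mapping."""
    known = set(mapping) | set(mapping.values())
    return sorted(set(labels) - known)


# Made-up example values; "Unlabeled" is deliberately absent from the mapping
raw_labels = ["CAFs", "T-cells", "B cell", "Unlabeled", "PVL"]

# Same effect as the replace() call: unknown labels pass through unchanged
harmonized = [celltype_mapping.get(label, label) for label in raw_labels]
leftover = unmapped_labels(raw_labels, celltype_mapping)

print(harmonized)
print(leftover)
```

With a pandas Series, the same check could be run on the unique values of the column. Any label it reports is passed through unchanged by the remapping and is still subject to validation afterwards.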

try:
    xenium_curator.validate()
except ln.errors.ValidationError as error:
    print(error)

xenium_curator.slots["tables:table:obs"].cat.add_new_from("celltype_major")

xenium_1_curated_af = xenium_curator.save_artifact(key="xenium1.zarr")

xenium_1_curated_af.describe()


Artifact .zarr · SpatialData · dataset
├── General
│   ├── key: xenium1.zarr
│   ├── uid: MvKvZxGKjCktXDVT0000          hash: aD4ScrMXwZlbzxoOzh5-dw
│   ├── size: 33.5 MB                      transform: spatial.ipynb
│   ├── space: all                         branch: all
│   ├── created_by: testuser1              created_at: 2025-07-29 19:22:37
│   ├── n_files: 148
│   └── storage path: /home/runner/work/lamin-usecases/lamin-usecases/docs/test-spatial/xenium1.zarr
├── Dataset features
│   ├── attrs:sample • 4                [Feature]                                                                  
│   │   assay                           cat[bionty.ExperimentalFactor]     10x Xenium                              
│   │   disease                         cat[bionty.Disease]                ductal breast carcinoma in situ         
│   │   organism                        cat[bionty.Organism]               human                                   
│   │   tissue                          cat[bionty.Tissue]                 breast                                  
│   ├── tables:table:obs • 1            [Feature]                                                                  
│   │   celltype_major                  cat[bionty.CellType]               B cell, T cell, cancer associated fibro…
│   └── tables:table:var.T • 313        [bionty.Gene.ensembl_gene_id]                                              
│       ABCC11                          num                                                                        
│       ACTA2                           num                                                                        
│       ACTG2                           num                                                                        
│       ADAM9                           num                                                                        
│       ADGRE5                          num                                                                        
│       ADH1B                           num                                                                        
│       ADIPOQ                          num                                                                        
│       AGR3                            num                                                                        
│       AHSP                            num                                                                        
│       AIF1                            num                                                                        
│       AKR1C1                          num                                                                        
│       AKR1C3                          num                                                                        
│       ALDH1A3                         num                                                                        
│       ANGPT2                          num                                                                        
│       ANKRD28                         num                                                                        
│       ANKRD29                         num                                                                        
│       ANKRD30A                        num                                                                        
│       APOBEC3A                        num                                                                        
│       APOBEC3B                        num                                                                        
│       APOC1                           num                                                                        
└── Labels
    └── .projects                       Project                            spatial guide datasets                  
        .organisms                      bionty.Organism                    human                                   
        .tissues                        bionty.Tissue                      breast                                  
        .cell_types                     bionty.CellType                    endothelial cell, myeloid cell, perivas…
        .diseases                       bionty.Disease                     ductal breast carcinoma in situ         
        .experimental_factors           bionty.ExperimentalFactor          10x Xenium

Curate additional Xenium datasets

We can reuse the same schema for a second Xenium dataset. Passing it to from_spatialdata() validates and annotates the dataset when the artifact is saved:

xenium_aligned_2_sdata = (
    ln.Artifact.using("laminlabs/lamindata")
    .get(key="xenium_aligned_2_guide_min.zarr")
    .load()
)

xenium_aligned_2_sdata.tables["table"].obs["celltype_major"] = (
    xenium_aligned_2_sdata.tables["table"]
    .obs["celltype_major"]
    .replace(
        {
            "CAFs": "cancer associated fibroblast",
            "Endothelial": "endothelial cell",
            "Myeloid": "myeloid cell",
            "PVL": "perivascular cell",
            "T-cells": "T cell",
            "B-cells": "B cell",
            "Normal Epithelial": "epithelial cell",
            "Plasmablasts": "plasmablast",
            "Cancer Epithelial": "neoplastic epithelial cell",
        }
    )
)

xenium_2_curated_af = ln.Artifact.from_spatialdata(
    xenium_aligned_2_sdata, key="xenium2.zarr", schema=spatial_schema
).save()


INFO     The SpatialData object is not self-contained (i.e. it contains some elements that are Dask-backed from    
         locations outside /home/runner/.cache/lamindb/7EO3dqoA8ZmWbpEJ0000.zarr). Please see the documentation of 
         `is_self_contained()` to understand the implications of working with SpatialData objects that are not     
         self-contained.                                                                                           

INFO     The Zarr backing store has been changed from                                                              
         /home/runner/.cache/lamindb/lamindata/xenium_aligned_2_guide_min.zarr the new file path:                  
         /home/runner/.cache/lamindb/7EO3dqoA8ZmWbpEJ0000.zarr                                                     

INFO     The SpatialData object is not self-contained (i.e. it contains some elements that are Dask-backed from    
         locations outside /home/runner/.cache/lamindb/7EO3dqoA8ZmWbpEJ0000.zarr). Please see the documentation of 
         `is_self_contained()` to understand the implications of working with SpatialData objects that are not     
         self-contained.                                                                                           

! 1 term not validated in feature 'columns' in slot 'attrs:sample': 'panel'
    → fix typos, remove non-existent values, or save terms via: curator.slots['attrs:sample'].cat.add_new_from('columns')

! 10 terms not validated in feature 'columns' in slot 'tables:table:obs': 'cell_id', 'transcript_counts', 'control_probe_counts', 'control_codeword_counts', 'total_counts', 'cell_area', 'nucleus_area', 'region', 'dataset', 'celltype_minor'
    → fix typos, remove non-existent values, or save terms via: curator.slots['tables:table:obs'].cat.add_new_from('columns')

→ returning existing schema with same hash: Schema(uid='BXqb9h441UhgKgy3', n=4, is_type=False, itype='Feature', hash='K_VgCYT4ZU-lBVR8qMFfJQ', minimal_set=True, ordered_set=False, maximal_set=False, branch_id=1, space_id=1, created_by_id=1, run_id=1, created_at=2025-07-29 19:22:37 UTC)

→ returning existing schema with same hash: Schema(uid='Afuf6en0dFWwQY2h', n=1, is_type=False, itype='Feature', hash='oCTgxOAqEDU6ZWiLlzG4rw', minimal_set=True, ordered_set=False, maximal_set=False, branch_id=1, space_id=1, created_by_id=1, run_id=1, created_at=2025-07-29 19:22:37 UTC)

→ returning existing schema with same hash: Schema(uid='LoP5qyKc4cPnC43H', n=313, is_type=False, itype='bionty.Gene.ensembl_gene_id', dtype='num', hash='FFFt-2qmlVALrsMUPNoH0g', minimal_set=True, ordered_set=False, maximal_set=False, branch_id=1, space_id=1, created_by_id=1, run_id=1, created_at=2025-07-29 19:22:37 UTC)

xenium_2_curated_af.describe()


Artifact .zarr · SpatialData · dataset
├── General
│   ├── key: xenium2.zarr
│   ├── uid: 7EO3dqoA8ZmWbpEJ0000          hash: 0D80g3vvvHi1iQcJ3fa3KQ
│   ├── size: 38.9 MB                      transform: spatial.ipynb
│   ├── space: all                         branch: all
│   ├── created_by: testuser1              created_at: 2025-07-29 19:22:40
│   ├── n_files: 177
│   └── storage path: /home/runner/work/lamin-usecases/lamin-usecases/docs/test-spatial/xenium2.zarr
├── Dataset features
│   ├── attrs:sample • 4                [Feature]                                                                  
│   │   assay                           cat[bionty.ExperimentalFactor]     10x Xenium                              
│   │   disease                         cat[bionty.Disease]                ductal breast carcinoma in situ         
│   │   organism                        cat[bionty.Organism]               human                                   
│   │   tissue                          cat[bionty.Tissue]                 breast                                  
│   ├── tables:table:obs • 1            [Feature]                                                                  
│   │   celltype_major                  cat[bionty.CellType]               B cell, T cell, cancer associated fibro…
│   └── tables:table:var.T • 313        [bionty.Gene.ensembl_gene_id]                                              
│       ABCC11                          num                                                                        
│       ACTA2                           num                                                                        
│       ACTG2                           num                                                                        
│       ADAM9                           num                                                                        
│       ADGRE5                          num                                                                        
│       ADH1B                           num                                                                        
│       ADIPOQ                          num                                                                        
│       AGR3                            num                                                                        
│       AHSP                            num                                                                        
│       AIF1                            num                                                                        
│       AKR1C1                          num                                                                        
│       AKR1C3                          num                                                                        
│       ALDH1A3                         num                                                                        
│       ANGPT2                          num                                                                        
│       ANKRD28                         num                                                                        
│       ANKRD29                         num                                                                        
│       ANKRD30A                        num                                                                        
│       APOBEC3A                        num                                                                        
│       APOBEC3B                        num                                                                        
│       APOC1                           num                                                                        
└── Labels
    └── .projects                       Project                            spatial guide datasets                  
        .organisms                      bionty.Organism                    human                                   
        .tissues                        bionty.Tissue                      breast                                  
        .cell_types                     bionty.CellType                    endothelial cell, myeloid cell, perivas…
        .diseases                       bionty.Disease                     ductal breast carcinoma in situ         
        .experimental_factors           bionty.ExperimentalFactor          10x Xenium

Curate Visium datasets

Analogously, we can curate a Visium dataset by reusing the same schema:

visium_aligned_sdata = (
    ln.Artifact.using("laminlabs/lamindata")
    .get(key="visium_aligned_guide_min.zarr")
    .load()
)
visium_aligned_sdata

visium_curated_af = ln.Artifact.from_spatialdata(
    visium_aligned_sdata, key="visium.zarr", schema=spatial_schema
).save()


INFO     The SpatialData object is not self-contained (i.e. it contains some elements that are Dask-backed from    
         locations outside /home/runner/.cache/lamindb/1LR9NwyOVKhyhEFn0000.zarr). Please see the documentation of 
         `is_self_contained()` to understand the implications of working with SpatialData objects that are not     
         self-contained.                                                                                           

INFO     The Zarr backing store has been changed from                                                              
         /home/runner/.cache/lamindb/lamindata/visium_aligned_guide_min.zarr the new file path:                    
         /home/runner/.cache/lamindb/1LR9NwyOVKhyhEFn0000.zarr                                                     

INFO     The SpatialData object is not self-contained (i.e. it contains some elements that are Dask-backed from    
         locations outside /home/runner/.cache/lamindb/1LR9NwyOVKhyhEFn0000.zarr). Please see the documentation of 
         `is_self_contained()` to understand the implications of working with SpatialData objects that are not     
         self-contained.                                                                                           

! 7 terms not validated in feature 'columns' in slot 'tables:table:obs': 'in_tissue', 'array_row', 'array_col', 'spot_id', 'region', 'dataset', 'clone'
    → fix typos, remove non-existent values, or save terms via: curator.slots['tables:table:obs'].cat.add_new_from('columns')

! no values were validated for columns!

! Starting bulk_create for 17761 Gene records in batches of 10000

! 17 terms not validated in feature 'columns' in slot 'tables:table:var.T': 'ENSG00000284824', 'ENSG00000240224', 'ENSG00000243135', 'ENSG00000112096', 'ENSG00000285162', 'ENSG00000183729', 'ENSG00000285447', 'ENSG00000130723', 'ENSG00000274897', 'ENSG00000215271', 'ENSG00000221995', 'ENSG00000183791', 'ENSG00000263264', 'ENSG00000182584', 'ENSG00000184258', 'ENSG00000277203', 'ENSG00000286265'
    → fix typos, remove non-existent values, or save terms via: curator.slots['tables:table:var.T'].cat.add_new_from('columns')

→ returning existing schema with same hash: Schema(uid='BXqb9h441UhgKgy3', n=4, is_type=False, itype='Feature', hash='K_VgCYT4ZU-lBVR8qMFfJQ', minimal_set=True, ordered_set=False, maximal_set=False, branch_id=1, space_id=1, created_by_id=1, run_id=1, created_at=2025-07-29 19:22:37 UTC)

→ returning existing schema with same hash: Schema(uid='nh5gXKqBiYwFqlAO', name='Flexible metadata', is_type=False, itype='Feature', hash='jKTX5yzmVwIdJdHH2ZfMAA', minimal_set=True, ordered_set=False, maximal_set=False, branch_id=1, space_id=1, created_by_id=1, run_id=1, created_at=2025-07-29 19:22:27 UTC)

→ not annotating with 18068 features for slot tables:table:var.T as it exceeds 1000 (ln.settings.annotation.n_max_records)
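
The log above lists 17 Ensembl gene IDs that were not validated and suggests fixing typos or removing non-existent values. One way to tell the two cases apart is a format check: human Ensembl gene IDs consist of the prefix "ENSG" followed by 11 digits. The sketch below assumes unversioned human IDs. malformed_ids is a hypothetical helper, and the last ID in the example list is made up to show a typo being caught.

```python
import re

# Human Ensembl gene IDs: "ENSG" followed by exactly 11 digits (no version suffix)
ENSEMBL_HUMAN_GENE_ID = re.compile(r"ENSG\d{11}")


def malformed_ids(gene_ids):
    """Return the IDs that do not match the human Ensembl gene ID format."""
    return [g for g in gene_ids if not ENSEMBL_HUMAN_GENE_ID.fullmatch(g)]


# First three unvalidated IDs from the log above
unvalidated = ["ENSG00000284824", "ENSG00000240224", "ENSG00000243135"]

# Made-up ID with one digit missing, to illustrate a typo
with_typo = unvalidated + ["ENSG0000028482"]

print(malformed_ids(unvalidated))
print(malformed_ids(with_typo))
```

IDs that pass the format check but still fail validation are probably not typos. They may simply be absent from the gene registry used for validation, although the log alone does not say why.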

visium_curated_af.describe()


Artifact .zarr · SpatialData · dataset
├── General
│   ├── key: visium.zarr
│   ├── uid: 1LR9NwyOVKhyhEFn0000          hash: GriII9tOLvc_1zBZnzvoew
│   ├── size: 5.5 MB                       transform: spatial.ipynb
│   ├── space: all                         branch: all
│   ├── created_by: testuser1              created_at: 2025-07-29 19:22:52
│   ├── n_files: 136
│   └── storage path: /home/runner/work/lamin-usecases/lamin-usecases/docs/test-spatial/visium.zarr
├── Dataset features
│   ├── attrs:sample • 4                [Feature]                                                                  
│   │   assay                           cat[bionty.ExperimentalFactor]     Visium Spatial Gene Expression          
│   │   disease                         cat[bionty.Disease]                ductal breast carcinoma in situ         
│   │   organism                        cat[bionty.Organism]               human                                   
│   │   tissue                          cat[bionty.Tissue]                 breast                                  
│   ├── tables:table:obs • -1           [Feature]                                                                  
│   └── tables:table:var.T • 18068      [bionty.Gene.ensembl_gene_id]                                              
└── Labels
    └── .projects                       Project                            spatial guide datasets                  
        .organisms                      bionty.Organism                    human                                   
        .tissues                        bionty.Tissue                      breast                                  
        .diseases                       bionty.Disease                     ductal breast carcinoma in situ         
        .experimental_factors           bionty.ExperimentalFactor          Visium Spatial Gene Expression

Overview of the curated datasets

visium_curated_af.view_lineage()


ln.Artifact.df(features=True, include=["hash", "size"])

→ queried for all categorical features with dtype ULabel or Record and non-categorical features: (0) []

id  uid                   key                              size      hash
7   1LR9NwyOVKhyhEFn0000  visium.zarr                      5810515   GriII9tOLvc_1zBZnzvoew
5   7EO3dqoA8ZmWbpEJ0000  xenium2.zarr                     40823410  0D80g3vvvHi1iQcJ3fa3KQ
3   MvKvZxGKjCktXDVT0000  xenium1.zarr                     35116259  aD4ScrMXwZlbzxoOzh5-dw
1   3bh0TBW1dtiOjUsG0000  example_blobs.zarr               12122461  F3Z07qDz0IT1WauiNHNwMg
4   KFhRNPqcdoxBCNZt0001  xenium_aligned_2_guide_min.zarr  40822308  oH569Lh4koYRB1I6AatnGQ
2   kVMuYil81BHTwQ9G0001  xenium_aligned_1_guide_min.zarr  35115305  8f1qC6IkpSvFw2H8TdhplQ
6   bjH534dxVi1drmLZ0001  visium_aligned_guide_min.zarr    5809684   a8rVkf_kjp9To9KI06i03g
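
The size column in the table is reported in bytes, while describe() prints a rounded value in MB. The figures in this guide appear consistent with dividing by 1024², as the short check below shows. This is an observation from the outputs above rather than documented lamindb behavior, and bytes_to_mb is a hypothetical helper.

```python
def bytes_to_mb(n_bytes: int) -> float:
    """Convert a byte count to MB (1 MB taken as 1024**2 bytes), rounded to one decimal."""
    return round(n_bytes / 1024**2, 1)


# Sizes in bytes as listed in the table, keyed by artifact key
sizes_in_bytes = {
    "xenium1.zarr": 35116259,
    "xenium2.zarr": 40823410,
    "visium.zarr": 5810515,
}

sizes_in_mb = {key: bytes_to_mb(size) for key, size in sizes_in_bytes.items()}
for key, size_mb in sizes_in_mb.items():
    print(f"{key}: {size_mb} MB")
```

The results can be compared against the size entries printed by describe() for the three curated artifacts.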

ln.finish()