Spatial¶
Here, you’ll learn how to manage spatial datasets.
Spatial omics data integrates molecular profiling (e.g., transcriptomics, proteomics) with spatial information, preserving the spatial organization of cells and tissues. It enables high-resolution mapping of molecular activity within biological contexts, crucial for understanding cellular interactions and microenvironments.
Many spatial technologies exist, including multiplexed imaging, spatial transcriptomics, spatial proteomics, whole-slide imaging, spatial metabolomics, and 3D tissue reconstruction, and data from all of them can be stored in the SpatialData framework. For more details, see the original publication:
Marconato, L., Palla, G., Yamauchi, K.A. et al. SpatialData: an open and universal data framework for spatial omics. Nat Methods 22, 58–62 (2025). https://doi.org/10.1038/s41592-024-02212-x
Note
A collection of curated spatial datasets in SpatialData format is available on the scverse/spatialdata-db instance.
spatial data vs SpatialData terminology
When we mention spatial data, we refer to data from spatial assays, such as spatial transcriptomics or proteomics, that includes spatial coordinates to represent the organization of molecular features in tissue. When we refer to SpatialData, we mean spatial omics data stored in the scverse SpatialData framework.
# pip install 'lamindb[jupyter,bionty]' spatialdata spatialdata-plot
!lamin init --storage ./test-spatial --modules bionty
→ initialized lamindb: testuser1/test-spatial
import lamindb as ln
import bionty as bt
import spatialdata as sd
import warnings
warnings.filterwarnings("ignore")
spatial_guide_datasets = ln.Project(name="spatial guide datasets").save()
ln.track(project=spatial_guide_datasets.name)
→ connected lamindb: testuser1/test-spatial
→ created Transform('cKAZGIOEX1NM0000'), started new Run('DUOkj2vH...') at 2025-04-18 11:45:41 UTC
→ notebook imports: bionty==1.3.0 lamindb==1.4.0 spatialdata==0.3.0
Creating artifacts¶
You can use the from_spatialdata() method to create an Artifact object from a SpatialData object.
example_blobs_sdata = ln.core.datasets.spatialdata_blobs()
example_blobs_sdata
SpatialData object
├── Images
│     ├── 'blobs_image': DataArray[cyx] (3, 512, 512)
│     └── 'blobs_multiscale_image': DataTree[cyx] (3, 512, 512), (3, 256, 256), (3, 128, 128)
├── Labels
│     ├── 'blobs_labels': DataArray[yx] (512, 512)
│     └── 'blobs_multiscale_labels': DataTree[yx] (512, 512), (256, 256), (128, 128)
├── Points
│     └── 'blobs_points': DataFrame with shape: (<Delayed>, 4) (2D points)
├── Shapes
│     ├── 'blobs_circles': GeoDataFrame shape: (5, 2) (2D shapes)
│     ├── 'blobs_multipolygons': GeoDataFrame shape: (2, 1) (2D shapes)
│     └── 'blobs_polygons': GeoDataFrame shape: (5, 1) (2D shapes)
└── Tables
      └── 'table': AnnData (26, 3)
with coordinate systems:
    ▸ 'global', with elements:
        blobs_image (Images), blobs_multiscale_image (Images), blobs_labels (Labels), blobs_multiscale_labels (Labels), blobs_points (Points), blobs_circles (Shapes), blobs_multipolygons (Shapes), blobs_polygons (Shapes)
blobs_af = ln.Artifact.from_spatialdata(
    example_blobs_sdata, key="example_blobs.zarr"
).save()
blobs_af
INFO The Zarr backing store has been changed from None the new file path:
/home/runner/.cache/lamindb/aWDakDvmJ3W6DEKl0000.zarr
Artifact(uid='aWDakDvmJ3W6DEKl0000', is_latest=True, key='example_blobs.zarr', suffix='.zarr', kind='dataset', otype='SpatialData', size=12121376, hash='VfyqWKmYtGl46BehCw_UQw', n_files=113, space_id=1, storage_id=1, run_id=1, created_by_id=1, created_at=2025-04-18 11:45:44 UTC)
To retrieve the object from the database you can, for example, query by key.
# query the artifact record by key (returns an Artifact, not the SpatialData object)
blobs_af = ln.Artifact.get(key="example_blobs.zarr")
local_zarr_path = blobs_af.cache()  # returns a local path to the cached .zarr store
example_blobs_sdata = (
    blobs_af.load()  # calls sd.read_zarr() on a locally cached .zarr store
)
To view the data lineage:
blobs_af.view_lineage()
Validating annotations¶
For the remainder of the guide, we will work with two 10x Xenium datasets and one 10x Visium H&E image dataset.
More details can be found in the ingestion notebook.
Metadata is stored in two places in the SpatialData object:
Dataset-level metadata is stored in sdata.attrs["sample"].
Measurement-specific metadata is stored in the associated tables in sdata.tables.
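To look at both locations at a glance, a small helper can read sdata.attrs["sample"] and list the obs columns of each table. The following is only a sketch: it runs on a stand-in object built with SimpleNamespace rather than a real SpatialData object, and the helper name summarize_metadata and the sample values are hypothetical.

```python
from types import SimpleNamespace


def summarize_metadata(sdata):
    """Return dataset-level metadata and the obs column names of each table."""
    # Dataset-level metadata lives under attrs["sample"] (empty dict if absent).
    sample = dict(sdata.attrs.get("sample", {}))
    # Measurement-specific metadata lives in the obs of each table.
    obs_columns = {
        name: list(table.obs.keys()) for name, table in sdata.tables.items()
    }
    return {"sample": sample, "obs_columns": obs_columns}


# Stand-in for a SpatialData object (hypothetical values, for illustration only).
fake_table = SimpleNamespace(
    obs={"celltype_major": ["T cell", "B cell"], "region": ["a", "a"]}
)
fake_sdata = SimpleNamespace(
    attrs={"sample": {"organism": "human", "tissue": "breast"}},
    tables={"table": fake_table},
)

summary = summarize_metadata(fake_sdata)
print(summary)
```

The helper only relies on .attrs, .tables, and .obs.keys(), so it should also accept a real SpatialData object whose tables are AnnData objects, although that is not verified here.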
Define a schema¶
We define a lamindb.Schema to curate both sample and table metadata.
Curating different spatial technologies
Reading data from different spatial technologies into SpatialData objects can result in very different objects with different metadata. It can therefore be useful to define technology-specific schemas by reusing Schema components.
# define sample schema
spatial_sample_schema = ln.Schema(
    name="Spatial sample level",
    features=[
        ln.Feature(name="organism", dtype=bt.Organism).save(),
        ln.Feature(name="assay", dtype=bt.ExperimentalFactor).save(),
        ln.Feature(name="disease", dtype=bt.Disease).save(),
        ln.Feature(name="tissue", dtype=bt.Tissue).save(),
    ],
    coerce_dtype=True,
).save()
# define table obs schema
spatial_obs_schema = ln.Schema(
    name="Spatial obs level",
    features=[
        ln.Feature(name="celltype_major", dtype=bt.CellType, nullable=True)
        .save()
        .with_config(optional=True),
    ],
    coerce_dtype=True,
).save()
# define table var schema
spatial_var_schema = ln.Schema(
    name="Spatial var level", itype=bt.Gene.ensembl_gene_id, dtype=int
).save()
# define composite schema
spatial_schema = ln.Schema(
    name="Spatial schema",
    otype="SpatialData",
    components={
        "sample": spatial_sample_schema,
        "table:obs": spatial_obs_schema,
        "table:var": spatial_var_schema,
    },
).save()
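The component keys of the composite schema name the part of the object that each sub-schema applies to: "sample" corresponds to sdata.attrs["sample"], while "table:obs" and "table:var" correspond to the obs and var of the table named "table". The sketch below illustrates this naming convention on a stand-in object. It is not lamindb's implementation; resolve_slot is a hypothetical helper, and the rule that keys without a colon are read from attrs is an assumption made for this example.

```python
from types import SimpleNamespace


def resolve_slot(sdata, slot):
    """Map a slot key such as 'sample' or 'table:obs' to the data it names."""
    if ":" not in slot:
        # Assumption: keys without a colon refer to dataset-level attrs.
        return sdata.attrs[slot]
    table_name, axis = slot.split(":", 1)
    if axis not in ("obs", "var"):
        raise ValueError(f"unsupported axis in slot key: {slot!r}")
    return getattr(sdata.tables[table_name], axis)


# Stand-in for a SpatialData object (hypothetical values, for illustration only).
demo_sdata = SimpleNamespace(
    attrs={"sample": {"assay": "10x Xenium"}},
    tables={
        "table": SimpleNamespace(
            obs={"celltype_major": ["T cell"]},
            var={"gene": ["ACTA2"]},
        )
    },
)

for key in ("sample", "table:obs", "table:var"):
    print(key, "->", resolve_slot(demo_sdata, key))
```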
Curate a Xenium dataset¶
# load first of two cropped Xenium datasets
xenium_aligned_1_sdata = (
    ln.Artifact.using("laminlabs/lamindata")
    .get(key="xenium_aligned_1_guide_min.zarr")
    .load()
)
xenium_aligned_1_sdata
→ completing transfer to track Artifact('kVMuYil8') as input
→ mapped records:
→ transferred records: Artifact(uid='kVMuYil81BHTwQ9G0001'), Storage(uid='D9BilDV2')
SpatialData object, with associated Zarr store: /home/runner/.cache/lamindb/lamindata/xenium_aligned_1_guide_min.zarr
├── Images
│     ├── 'morphology_focus': DataTree[cyx] (1, 2310, 3027), (1, 1155, 1514), (1, 578, 757), (1, 288, 379), (1, 145, 189)
│     └── 'morphology_mip': DataTree[cyx] (1, 2310, 3027), (1, 1155, 1514), (1, 578, 757), (1, 288, 379), (1, 145, 189)
├── Points
│     └── 'transcripts': DataFrame with shape: (<Delayed>, 8) (3D points)
├── Shapes
│     ├── 'cell_boundaries': GeoDataFrame shape: (1899, 1) (2D shapes)
│     └── 'cell_circles': GeoDataFrame shape: (1812, 2) (2D shapes)
└── Tables
      └── 'table': AnnData (1812, 313)
with coordinate systems:
    ▸ 'aligned', with elements:
        morphology_focus (Images), morphology_mip (Images), transcripts (Points), cell_boundaries (Shapes), cell_circles (Shapes)
    ▸ 'global', with elements:
        morphology_focus (Images), morphology_mip (Images), transcripts (Points), cell_boundaries (Shapes), cell_circles (Shapes)
xenium_curator = ln.curators.SpatialDataCurator(xenium_aligned_1_sdata, spatial_schema)
try:
    xenium_curator.validate()
except ln.errors.ValidationError as e:
    print(e)
! 9 terms are not validated: 'CAFs', 'Endothelial', 'Myeloid', 'PVL', 'T-cells', 'B-cells', 'Normal Epithelial', 'Plasmablasts', 'Cancer Epithelial'
→ fix typos, remove non-existent values, or save terms via .add_new_from("celltype_major")
xenium_aligned_1_sdata.tables["table"].obs["celltype_major"] = (
    xenium_aligned_1_sdata.tables["table"]
    .obs["celltype_major"]
    .replace(
        {
            "CAFs": "cancer associated fibroblast",
            "Endothelial": "endothelial cell",
            "Myeloid": "myeloid cell",
            "PVL": "perivascular cell",
            "T-cells": "T cell",
            "B-cells": "B cell",
            "Normal Epithelial": "epithelial cell",
            "Plasmablasts": "plasmablast",
            "Cancer Epithelial": "neoplastic epithelial cell",
        }
    )
)
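A dictionary-based replace leaves any value without a mapping entry unchanged, so it can help to check which raw labels a mapping does not cover before re-validating. The following is a minimal sketch in plain Python rather than pandas; the helper name split_by_mapping is hypothetical, the mapping is a shortened copy of the one above, and "Stroma" is a made-up label used only to show an uncovered value.

```python
def split_by_mapping(labels, mapping):
    """Rename labels via mapping; report labels that have no mapping entry."""
    # Labels without an entry are kept as-is, mirroring a dict-based replace.
    renamed = [mapping.get(label, label) for label in labels]
    unmapped = sorted({label for label in labels if label not in mapping})
    return renamed, unmapped


celltype_mapping = {
    "CAFs": "cancer associated fibroblast",
    "T-cells": "T cell",
    "B-cells": "B cell",
}
raw_labels = ["CAFs", "T-cells", "Stroma", "B-cells"]  # "Stroma" is made up

renamed, unmapped = split_by_mapping(raw_labels, celltype_mapping)
print(renamed)
print(unmapped)
```

Any label reported as unmapped would still need a mapping entry, or would have to be registered as a new term, before validation can pass.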
try:
    xenium_curator.validate()
except ln.errors.ValidationError as e:
    print(e)
! 2 terms are not validated: 'cancer associated fibroblast', 'neoplastic epithelial cell'
→ fix typos, remove non-existent values, or save terms via .add_new_from("celltype_major")
xenium_curator.slots["table:obs"].cat.add_new_from("celltype_major")
xenium_1_curated_af = xenium_curator.save_artifact(key="xenium1.zarr")
INFO The SpatialData object is not self-contained (i.e. it contains some elements that are Dask-backed from
locations outside /home/runner/.cache/lamindb/Nbgkw6rtcBS7FhZk0000.zarr). Please see the documentation of
`is_self_contained()` to understand the implications of working with SpatialData objects that are not
self-contained.
INFO The Zarr backing store has been changed from
/home/runner/.cache/lamindb/lamindata/xenium_aligned_1_guide_min.zarr the new file path:
/home/runner/.cache/lamindb/Nbgkw6rtcBS7FhZk0000.zarr
→ returning existing schema with same hash: Schema(uid='V1Bw52ddb8UY6Ao0dHtV', name='Spatial sample level', n=4, itype='Feature', is_type=False, hash='K3ADJU0uMiuKGmn_E_LAHw', minimal_set=True, ordered_set=False, maximal_set=False, space_id=1, created_by_id=1, run_id=1, created_at=2025-04-18 11:45:45 UTC)
! 10 unique terms (90.90%) are not validated for name: 'cell_id', 'transcript_counts', 'control_probe_counts', 'control_codeword_counts', 'total_counts', 'cell_area', 'nucleus_area', 'region', 'dataset', 'celltype_minor'
xenium_1_curated_af.describe()
Artifact .zarr/SpatialData
├── General
│   ├── .uid = 'Nbgkw6rtcBS7FhZk0000'
│   ├── .key = 'xenium1.zarr'
│   ├── .size = 35115343
│   ├── .hash = '4vw9wN84jSk1iUUN0hFMEg'
│   ├── .n_files = 145
│   ├── .path = /home/runner/work/lamin-usecases/lamin-usecases/docs/test-spatial/.lamindb/Nbgkw6rtcBS7FhZk.zarr
│   ├── .created_by = testuser1 (Test User1)
│   ├── .created_at = 2025-04-18 11:45:59
│   └── .transform = 'Spatial'
├── Dataset features
│   ├── sample • 4 [Feature]
│   │   assay cat[bionty.ExperimentalF… 10x Xenium
│   │   disease cat[bionty.Disease] ductal breast carcinoma in situ
│   │   organism cat[bionty.Organism] human
│   │   tissue cat[bionty.Tissue] breast
│   ├── ['table'].var • 313 [bionty.Gene]
│   │   ABCC11 float
│   │   ACTA2 float
│   │   ACTG2 float
│   │   ADAM9 float
│   │   ADGRE5 float
│   │   ADH1B float
│   │   ADIPOQ float
│   │   AGR3 float
│   │   AHSP float
│   │   AIF1 float
│   │   AKR1C1 float
│   │   AKR1C3 float
│   │   ALDH1A3 float
│   │   ANGPT2 float
│   │   ANKRD28 float
│   │   ANKRD29 float
│   │   ANKRD30A float
│   │   APOBEC3A float
│   │   APOBEC3B float
│   │   APOC1 float
│   └── ['table'].obs • 1 [Feature]
│       celltype_major cat[bionty.CellType] B cell, T cell, cancer associated fibrob…
└── Labels
    └── .projects Project spatial guide datasets
        .organisms bionty.Organism human
        .tissues bionty.Tissue breast
        .cell_types bionty.CellType endothelial cell, myeloid cell, perivasc…
        .diseases bionty.Disease ductal breast carcinoma in situ
        .experimental_factors bionty.ExperimentalFactor 10x Xenium
Curate additional Xenium datasets¶
We can reuse the same schema to curate a second Xenium dataset with a new curator:
xenium_aligned_2_sdata = (
    ln.Artifact.using("laminlabs/lamindata")
    .get(key="xenium_aligned_2_guide_min.zarr")
    .load()
)
xenium_aligned_2_sdata.tables["table"].obs["celltype_major"] = (
    xenium_aligned_2_sdata.tables["table"]
    .obs["celltype_major"]
    .replace(
        {
            "CAFs": "cancer associated fibroblast",
            "Endothelial": "endothelial cell",
            "Myeloid": "myeloid cell",
            "PVL": "perivascular cell",
            "T-cells": "T cell",
            "B-cells": "B cell",
            "Normal Epithelial": "epithelial cell",
            "Plasmablasts": "plasmablast",
            "Cancer Epithelial": "neoplastic epithelial cell",
        }
    )
)
→ completing transfer to track Artifact('KFhRNPqc') as input
→ mapped records:
→ transferred records: Artifact(uid='KFhRNPqcdoxBCNZt0001')
xenium_curator = ln.curators.SpatialDataCurator(xenium_aligned_2_sdata, spatial_schema)
try:
    xenium_curator.validate()
except ln.errors.ValidationError as e:
    print(e)
xenium_2_curated_af = xenium_curator.save_artifact(key="xenium2.zarr")
INFO The SpatialData object is not self-contained (i.e. it contains some elements that are Dask-backed from
locations outside /home/runner/.cache/lamindb/9i9RUtVUHWRoF9Je0000.zarr). Please see the documentation of
`is_self_contained()` to understand the implications of working with SpatialData objects that are not
self-contained.
INFO The Zarr backing store has been changed from
/home/runner/.cache/lamindb/lamindata/xenium_aligned_2_guide_min.zarr the new file path:
/home/runner/.cache/lamindb/9i9RUtVUHWRoF9Je0000.zarr
→ returning existing schema with same hash: Schema(uid='V1Bw52ddb8UY6Ao0dHtV', name='Spatial sample level', n=4, itype='Feature', is_type=False, hash='K3ADJU0uMiuKGmn_E_LAHw', minimal_set=True, ordered_set=False, maximal_set=False, space_id=1, created_by_id=1, run_id=1, created_at=2025-04-18 11:45:45 UTC)
→ returning existing schema with same hash: Schema(uid='9UsotM9WH27ZWm2t9NPL', n=313, itype='bionty.Gene', is_type=False, dtype='float', hash='NWHyMKFRHimy-JOh9_oFCw', minimal_set=True, ordered_set=False, maximal_set=False, space_id=1, created_by_id=1, run_id=1, created_at=2025-04-18 11:46:00 UTC)
! 10 unique terms (90.90%) are not validated for name: 'cell_id', 'transcript_counts', 'control_probe_counts', 'control_codeword_counts', 'total_counts', 'cell_area', 'nucleus_area', 'region', 'dataset', 'celltype_minor'
→ returning existing schema with same hash: Schema(uid='fEsMuYFG9HQwazxcgOk0', n=1, itype='Feature', is_type=False, otype='DataFrame', hash='K72fJ6oeytorwjkpgdkW-Q', minimal_set=True, ordered_set=False, maximal_set=False, space_id=1, created_by_id=1, run_id=1, created_at=2025-04-18 11:46:00 UTC)
Curate Visium datasets¶
Analogously, we can curate a Visium dataset by reusing the same schema with a new curator:
visium_aligned_sdata = (
    ln.Artifact.using("laminlabs/lamindata")
    .get(key="visium_aligned_guide_min.zarr")
    .load()
)
visium_aligned_sdata
→ completing transfer to track Artifact('bjH534dx') as input
→ mapped records:
→ transferred records: Artifact(uid='bjH534dxVi1drmLZ0001')
SpatialData object, with associated Zarr store: /home/runner/.cache/lamindb/lamindata/visium_aligned_guide_min.zarr
├── Images
│     ├── 'CytAssist_FFPE_Human_Breast_Cancer_full_image': DataTree[cyx] (3, 1213, 952), (3, 607, 476), (3, 303, 238), (3, 152, 119), (3, 76, 60)
│     ├── 'CytAssist_FFPE_Human_Breast_Cancer_hires_image': DataArray[cyx] (3, 113, 88)
│     └── 'CytAssist_FFPE_Human_Breast_Cancer_lowres_image': DataArray[cyx] (3, 34, 27)
├── Shapes
│     └── 'CytAssist_FFPE_Human_Breast_Cancer': GeoDataFrame shape: (37, 2) (2D shapes)
└── Tables
      └── 'table': AnnData (37, 18085)
with coordinate systems:
    ▸ 'aligned', with elements:
        CytAssist_FFPE_Human_Breast_Cancer_full_image (Images), CytAssist_FFPE_Human_Breast_Cancer_hires_image (Images), CytAssist_FFPE_Human_Breast_Cancer_lowres_image (Images), CytAssist_FFPE_Human_Breast_Cancer (Shapes)
    ▸ 'downscaled_hires', with elements:
        CytAssist_FFPE_Human_Breast_Cancer_hires_image (Images), CytAssist_FFPE_Human_Breast_Cancer (Shapes)
    ▸ 'downscaled_lowres', with elements:
        CytAssist_FFPE_Human_Breast_Cancer_lowres_image (Images), CytAssist_FFPE_Human_Breast_Cancer (Shapes)
    ▸ 'global', with elements:
        CytAssist_FFPE_Human_Breast_Cancer_full_image (Images), CytAssist_FFPE_Human_Breast_Cancer_hires_image (Images), CytAssist_FFPE_Human_Breast_Cancer_lowres_image (Images), CytAssist_FFPE_Human_Breast_Cancer (Shapes)
visium_curator = ln.curators.SpatialDataCurator(visium_aligned_sdata, spatial_schema)
try:
    visium_curator.validate()
except ln.errors.ValidationError as e:
    print(e)
! 17 terms are not validated: 'ENSG00000284824', 'ENSG00000240224', 'ENSG00000243135', 'ENSG00000112096', 'ENSG00000285162', 'ENSG00000183729', 'ENSG00000285447', 'ENSG00000130723', 'ENSG00000274897', 'ENSG00000215271', 'ENSG00000221995', 'ENSG00000183791', 'ENSG00000263264', 'ENSG00000182584', 'ENSG00000184258', 'ENSG00000277203', 'ENSG00000286265'
→ fix typos, remove non-existent values, or save terms via .add_new_from("columns")
visium_curator.slots["table:var"].cat.add_new_from("columns")
visium_curated_af = visium_curator.save_artifact(key="visium.zarr")
INFO The SpatialData object is not self-contained (i.e. it contains some elements that are Dask-backed from
locations outside /home/runner/.cache/lamindb/4c3NwhMsXMS0GdVd0000.zarr). Please see the documentation of
`is_self_contained()` to understand the implications of working with SpatialData objects that are not
self-contained.
INFO The Zarr backing store has been changed from
/home/runner/.cache/lamindb/lamindata/visium_aligned_guide_min.zarr the new file path:
/home/runner/.cache/lamindb/4c3NwhMsXMS0GdVd0000.zarr
→ returning existing schema with same hash: Schema(uid='V1Bw52ddb8UY6Ao0dHtV', name='Spatial sample level', n=4, itype='Feature', is_type=False, hash='K3ADJU0uMiuKGmn_E_LAHw', minimal_set=True, ordered_set=False, maximal_set=False, space_id=1, created_by_id=1, run_id=1, created_at=2025-04-18 11:45:45 UTC)
! 7 unique terms (100.00%) are not validated for name: 'in_tissue', 'array_row', 'array_col', 'spot_id', 'region', 'dataset', 'clone'
! no validated features, skip creating schema
visium_curated_af.describe()
Artifact .zarr/SpatialData
├── General
│   ├── .uid = '4c3NwhMsXMS0GdVd0000'
│   ├── .key = 'visium.zarr'
│   ├── .size = 5809684
│   ├── .hash = '1JoqBRVDICsVM8jXPwR7QA'
│   ├── .n_files = 133
│   ├── .path = /home/runner/work/lamin-usecases/lamin-usecases/docs/test-spatial/.lamindb/4c3NwhMsXMS0GdVd.zarr
│   ├── .created_by = testuser1 (Test User1)
│   ├── .created_at = 2025-04-18 11:46:31
│   └── .transform = 'Spatial'
├── Dataset features
│   ├── sample • 4 [Feature]
│   │   assay cat[bionty.ExperimentalF… Visium Spatial Gene Expression
│   │   disease cat[bionty.Disease] ductal breast carcinoma in situ
│   │   organism cat[bionty.Organism] human
│   │   tissue cat[bionty.Tissue] breast
│   └── ['table'].var • 18085 [bionty.Gene]
│       ABCC11 float
│       ACTA2 float
│       ACTG2 float
│       ADAM9 float
│       ADGRE5 float
│       ADH1B float
│       ADIPOQ float
│       AGR3 float
│       AHSP float
│       AIF1 float
│       AKR1C3 float
│       ALDH1A3 float
│       ANKRD28 float
│       ANKRD29 float
│       ANKRD30A float
│       APOBEC3A float
│       APOBEC3B float
│       APOC1 float
│       AQP1 float
│       AQP3 float
└── Labels
    └── .projects Project spatial guide datasets
        .organisms bionty.Organism human
        .tissues bionty.Tissue breast
        .diseases bionty.Disease ductal breast carcinoma in situ
        .experimental_factors bionty.ExperimentalFactor Visium Spatial Gene Expression
Overview of the curated datasets¶
visium_curated_af.view_lineage()
ln.Artifact.df(features=True, include=["hash", "size"])
id | uid | key | description | assay | disease | celltype_major | tissue | organism | size | hash |
---|---|---|---|---|---|---|---|---|---|---|
7 | 4c3NwhMsXMS0GdVd0000 | visium.zarr | None | {Visium Spatial Gene Expression} | {ductal breast carcinoma in situ} | NaN | {breast} | {human} | 5809684 | 1JoqBRVDICsVM8jXPwR7QA |
5 | 9i9RUtVUHWRoF9Je0000 | xenium2.zarr | None | {10x Xenium} | {ductal breast carcinoma in situ} | {myeloid cell, T cell, plasmablast, cancer ass... | {breast} | {human} | 40822346 | urst38Sf2bS5nj3RMES0Sw |
3 | Nbgkw6rtcBS7FhZk0000 | xenium1.zarr | None | {10x Xenium} | {ductal breast carcinoma in situ} | {myeloid cell, T cell, plasmablast, cancer ass... | {breast} | {human} | 35115343 | 4vw9wN84jSk1iUUN0hFMEg |
1 | aWDakDvmJ3W6DEKl0000 | example_blobs.zarr | None | NaN | NaN | NaN | NaN | NaN | 12121376 | VfyqWKmYtGl46BehCw_UQw |
4 | KFhRNPqcdoxBCNZt0001 | xenium_aligned_2_guide_min.zarr | None | NaN | NaN | NaN | NaN | NaN | 40822308 | oH569Lh4koYRB1I6AatnGQ |
2 | kVMuYil81BHTwQ9G0001 | xenium_aligned_1_guide_min.zarr | None | NaN | NaN | NaN | NaN | NaN | 35115305 | 8f1qC6IkpSvFw2H8TdhplQ |
6 | bjH534dxVi1drmLZ0001 | visium_aligned_guide_min.zarr | None | NaN | NaN | NaN | NaN | NaN | 5809684 | a8rVkf_kjp9To9KI06i03g |
ln.finish()
→ finished Run('DUOkj2vH') after 52s at 2025-04-18 11:46:34 UTC