
Query and analyze spatial data

After creating a SpatialData collection in the spatial.ipynb notebook, we briefly show how to query and analyze the spatial data.

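import lamindb as ln
import bionty as bt
import squidpy as sq
import scanpy as sc
import spatialdata_plot  # registers the .pl plotting accessor on SpatialData objects
import warnings

warnings.filterwarnings("ignore")

# track this notebook as a transform and start a new run
ln.track(project="spatial guide datasets")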
 connected lamindb: testuser1/test-spatial
 created Transform('j0gdiU9PjDew0000', key='spatial2.ipynb'), started new Run('UgSw0biGuFt8Z149') at 2025-10-27 08:28:44 UTC
 notebook imports: bionty==1.8.1 lamindb==1.14a1 scanpy==1.11.5 spatialdata-plot==0.2.12 squidpy==1.6.5
 recommendation: to identify the notebook across renames, pass the uid: ln.track("j0gdiU9PjDew", project="spatial guide datasets")

Query by data lineage

Query the transform that created the datasets, e.g., by its key:

transform = ln.Transform.get(key="spatial.ipynb")
transform
Transform(uid='vq3bMXCZZxS50000', version=None, is_latest=True, key='spatial.ipynb', description='Spatial', type='notebook', hash='JGB3PlXvSOkCeBoYn6lp2w', reference=None, reference_type=None, branch_id=1, space_id=1, created_by_id=1, created_at=2025-10-27 08:27:45 UTC, is_locked=False)

Query the artifacts that this transform created:

ln.Artifact.filter(transform=transform).to_dataframe()
id  uid                   key                 description  suffix  kind     otype        size      hash                    n_files  n_observations  version  is_latest  is_locked  created_at                        branch_id  space_id  storage_id  run_id  schema_id  created_by_id
7   bGvOJEbvJaA67Xjv0000  visium.zarr         None         .zarr   dataset  SpatialData  5810515   WNbsIuuiLw965rs_YOT2Qw  136      None            None     True       False      2025-10-27 08:28:23.309000+00:00  1          1         1           1       3.0        1
5   yly8yT4kNV5mplTA0000  xenium2.zarr        None         .zarr   dataset  SpatialData  40823410  noI1oD6jyNbhK3yysxHjAw  177      None            None     True       False      2025-10-27 08:28:09.807000+00:00  1          1         1           1       3.0        1
3   J3tzZDmJYJVL1sdh0000  xenium1.zarr        None         .zarr   dataset  SpatialData  35116259  68_dzfaDidoKacs0glIGNg  148      None            None     True       False      2025-10-27 08:28:05.740000+00:00  1          1         1           1       3.0        1
1   B9NnVk2zs9W1G2lb0000  example_blobs.zarr  None         .zarr   dataset  SpatialData  12122461  LZ9HLHkw8HhoOzS7Vt_VSg  116      None            None     True       False      2025-10-27 08:27:47.562000+00:00  1          1         1           1       NaN        1

Query by biological metadata

Query all Xenium datasets:

all_xenium_data = ln.Artifact.filter(experimental_factors__name="10x Xenium")
all_xenium_data.to_dataframe()
id  uid                   key                 description  suffix  kind     otype        size      hash                    n_files  n_observations  version  is_latest  is_locked  created_at                        branch_id  space_id  storage_id  run_id  schema_id  created_by_id
5   yly8yT4kNV5mplTA0000  xenium2.zarr        None         .zarr   dataset  SpatialData  40823410  noI1oD6jyNbhK3yysxHjAw  177      None            None     True       False      2025-10-27 08:28:09.807000+00:00  1          1         1           1       3          1
3   J3tzZDmJYJVL1sdh0000  xenium1.zarr        None         .zarr   dataset  SpatialData  35116259  68_dzfaDidoKacs0glIGNg  148      None            None     True       False      2025-10-27 08:28:05.740000+00:00  1          1         1           1       3          1
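The experimental_factors__name argument uses the Django-style lookup syntax that lamindb's filter is built on: the segment before the double underscore names a related registry and the segment after it names a field on that registry, so an artifact matches if any linked experimental factor has that name. The toy model below sketches only this matching rule with made-up in-memory records. Record and matches are hypothetical helpers, the Visium label is a placeholder, and this is not lamindb's query machinery, which translates such lookups into SQL joins.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Toy stand-in for a registry record: a name plus many-to-many links."""
    name: str
    links: dict = field(default_factory=dict)  # relation name -> list of Records


def matches(record, lookup, value):
    """Return True if following the '__'-separated lookup path reaches `value`.

    Every segment except the last names a relation; a relation matches if ANY
    of its linked records matches the remaining path.
    """
    head, _, rest = lookup.partition("__")
    if not rest:
        # last segment: compare a plain field
        return getattr(record, head) == value
    return any(matches(linked, rest, value) for linked in record.links.get(head, []))


xenium = Record("10x Xenium")
visium = Record("Visium")  # hypothetical placeholder label
artifacts = [
    Record("xenium1.zarr", {"experimental_factors": [xenium]}),
    Record("xenium2.zarr", {"experimental_factors": [xenium]}),
    Record("visium.zarr", {"experimental_factors": [visium]}),
]

hits = [a.name for a in artifacts if matches(a, "experimental_factors__name", "10x Xenium")]
print(hits)  # ['xenium1.zarr', 'xenium2.zarr']
```

The same rule extends to longer paths, e.g. feature_sets__features in the next query follows two relations before comparing.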

Query all artifacts that measured the “celltype_major” feature:

# Only returns the Xenium datasets, because the Visium dataset has no annotated cell types
feature_cell_type_major = ln.Feature.get(name="celltype_major")
# order by key so that unpacking the first two results is deterministic
query_set = ln.Artifact.filter(feature_sets__features=feature_cell_type_major).order_by("key")
xenium_1_af, xenium_2_af = query_set[0], query_set[1]
xenium_1_af.describe()
Artifact: xenium1.zarr (0000)
├── uid: J3tzZDmJYJVL1sdh0000            run: ASPVZJL (spatial.ipynb)
│   kind: dataset                        otype: SpatialData
│   hash: 68_dzfaDidoKacs0glIGNg         size: 33.5 MB
│   branch: main                         space: all
│   created_at: 2025-10-27 08:28:05 UTC  created_by: testuser1
│   n_files: 148
├── storage/path: /home/runner/work/lamin-usecases/lamin-usecases/docs/test-spatial/.lamindb/J3tzZDmJYJVL1sdh.zarr
├── Dataset features
│   ├── attrs:sample (4)
│   │   assay                           bionty.ExperimentalFactor          10x Xenium
│   │   disease                         bionty.Disease                     ductal breast carcinoma in situ
│   │   organism                        bionty.Organism                    human
│   │   tissue                          bionty.Tissue                      breast
│   ├── tables:table:obs (1)
│   │   celltype_major                  bionty.CellType                    B cell, T cell, cancer associated fibro…
│   └── tables:table:var.T (313 biont…
│       ABCC11                          num
│       ACTA2                           num
│       ACTG2                           num
│       ADAM9                           num
│       ADGRE5                          num
│       ADH1B                           num
│       ADIPOQ                          num
│       AGR3                            num
│       AHSP                            num
│       AIF1                            num
│       AKR1C1                          num
│       AKR1C3                          num
│       ALDH1A3                         num
│       ANGPT2                          num
│       ANKRD28                         num
│       ANKRD29                         num
│       ANKRD30A                        num
│       APOBEC3A                        num
│       APOBEC3B                        num
│       APOC1                           num
└── Labels
    └── .projects                       Project                            spatial guide datasets                  
        .organisms                      bionty.Organism                    human                                   
        .tissues                        bionty.Tissue                      breast                                  
        .cell_types                     bionty.CellType                    endothelial cell, myeloid cell, perivas…
        .diseases                       bionty.Disease                     ductal breast carcinoma in situ         
        .experimental_factors           bionty.ExperimentalFactor          10x Xenium                              
xenium_1_af.view_lineage()
[Figure: data lineage graph of xenium1.zarr]
xenium_2_af.describe()
Artifact: xenium2.zarr (0000)
├── uid: yly8yT4kNV5mplTA0000            run: ASPVZJL (spatial.ipynb)
│   kind: dataset                        otype: SpatialData
│   hash: noI1oD6jyNbhK3yysxHjAw         size: 38.9 MB
│   branch: main                         space: all
│   created_at: 2025-10-27 08:28:09 UTC  created_by: testuser1
│   n_files: 177
├── storage/path: /home/runner/work/lamin-usecases/lamin-usecases/docs/test-spatial/.lamindb/yly8yT4kNV5mplTA.zarr
├── Dataset features
│   ├── attrs:sample (4)
│   │   assay                           bionty.ExperimentalFactor          10x Xenium
│   │   disease                         bionty.Disease                     ductal breast carcinoma in situ
│   │   organism                        bionty.Organism                    human
│   │   tissue                          bionty.Tissue                      breast
│   ├── tables:table:obs (1)
│   │   celltype_major                  bionty.CellType                    B cell, T cell, cancer associated fibro…
│   └── tables:table:var.T (313 biont…
│       ABCC11                          num
│       ACTA2                           num
│       ACTG2                           num
│       ADAM9                           num
│       ADGRE5                          num
│       ADH1B                           num
│       ADIPOQ                          num
│       AGR3                            num
│       AHSP                            num
│       AIF1                            num
│       AKR1C1                          num
│       AKR1C3                          num
│       ALDH1A3                         num
│       ANGPT2                          num
│       ANKRD28                         num
│       ANKRD29                         num
│       ANKRD30A                        num
│       APOBEC3A                        num
│       APOBEC3B                        num
│       APOC1                           num
└── Labels
    └── .projects                       Project                            spatial guide datasets                  
        .organisms                      bionty.Organism                    human                                   
        .tissues                        bionty.Tissue                      breast                                  
        .cell_types                     bionty.CellType                    endothelial cell, myeloid cell, perivas…
        .diseases                       bionty.Disease                     ductal breast carcinoma in situ         
        .experimental_factors           bionty.ExperimentalFactor          10x Xenium                              
xenium_2_af.view_lineage()
[Figure: data lineage graph of xenium2.zarr]

Analyze spatial data

Datasets stored as SpatialData objects can be examined and analyzed with the SpatialData framework, squidpy, and scanpy. First, load the artifact as a SpatialData object:

xenium_1_sd = xenium_1_af.load()
xenium_1_sd
version mismatch: detected: RasterFormatV02, requested: FormatV04
version mismatch: detected: RasterFormatV02, requested: FormatV04
SpatialData object, with associated Zarr store: /home/runner/work/lamin-usecases/lamin-usecases/docs/test-spatial/.lamindb/J3tzZDmJYJVL1sdh.zarr
├── Images
│     ├── 'morphology_focus': DataTree[cyx] (1, 2310, 3027), (1, 1155, 1514), (1, 578, 757), (1, 288, 379), (1, 145, 189)
│     └── 'morphology_mip': DataTree[cyx] (1, 2310, 3027), (1, 1155, 1514), (1, 578, 757), (1, 288, 379), (1, 145, 189)
├── Points
│     └── 'transcripts': DataFrame with shape: (<Delayed>, 8) (3D points)
├── Shapes
│     ├── 'cell_boundaries': GeoDataFrame shape: (1899, 1) (2D shapes)
│     └── 'cell_circles': GeoDataFrame shape: (1812, 2) (2D shapes)
└── Tables
      └── 'table': AnnData (1812, 313)
with coordinate systems:
    ▸ 'aligned', with elements:
        morphology_focus (Images), morphology_mip (Images), transcripts (Points), cell_boundaries (Shapes), cell_circles (Shapes)
    ▸ 'global', with elements:
        morphology_focus (Images), morphology_mip (Images), transcripts (Points), cell_boundaries (Shapes), cell_circles (Shapes)

Use spatialdata-plot to get an overview of the dataset:

xenium_1_sd.pl.render_images(element="morphology_focus").pl.render_shapes(
    fill_alpha=0, outline_alpha=0.2
).pl.show(coordinate_systems="aligned")
[Figure: morphology_focus image with cell shape outlines, coordinate system "aligned"]

For Xenium analysis, we mainly work with the AnnData table, which contains the count matrix together with cell and gene annotations. It is stored in the tables slot of the SpatialData object:

xenium_adata = xenium_1_sd.tables["table"]
xenium_adata
AnnData object with n_obs × n_vars = 1812 × 313
    obs: 'cell_id', 'transcript_counts', 'control_probe_counts', 'control_codeword_counts', 'total_counts', 'cell_area', 'nucleus_area', 'region', 'dataset', 'celltype_major', 'celltype_minor'
    var: 'symbols', 'feature_types', 'genome'
    uns: 'spatialdata_attrs'
    obsm: 'spatial'
xenium_adata.obs
       cell_id  transcript_counts  control_probe_counts  control_codeword_counts  total_counts  cell_area   nucleus_area  region        dataset  celltype_major                celltype_minor
92782  92783    271                1                     0                        272          401.484219  27.048594     cell_circles  xe_rep1  cancer associated fibroblast  CAFs myCAF-like
92783  92784    110                0                     0                        110          163.826875  21.900781     cell_circles  xe_rep1  cancer associated fibroblast  CAFs myCAF-like
92784  92785    158                1                     0                        159          262.583594  7.225000      cell_circles  xe_rep1  cancer associated fibroblast  CAFs myCAF-like
92785  92786    236                3                     0                        239          512.207344  17.701250     cell_circles  xe_rep1  cancer associated fibroblast  CAFs myCAF-like
92786  92787    133                0                     0                        133          361.250000  20.997656     cell_circles  xe_rep1  endothelial cell              Endothelial Lymphatic LYVE1
...    ...      ...                ...                   ...                      ...          ...         ...           ...           ...      ...                           ...
95912  95913    138                0                     0                        138          317.358125  29.125781     cell_circles  xe_rep1  T cell                        T cells CD4+
95913  95914    148                0                     0                        148          174.393438  21.404063     cell_circles  xe_rep1  T cell                        T cells CD8+
95914  95915    152                0                     0                        152          275.724063  31.609375     cell_circles  xe_rep1  cancer associated fibroblast  CAFs myCAF-like
95915  95916    125                0                     0                        125          121.921875  28.222656     cell_circles  xe_rep1  T cell                        T cells CD4+
95916  95917    135                0                     0                        135          115.374219  13.862969     cell_circles  xe_rep1  myeloid cell                  Macrophage

1812 rows × 11 columns

Calculate the quality control metrics on the AnnData object using scanpy.pp.calculate_qc_metrics:

sc.pp.calculate_qc_metrics(xenium_adata, percent_top=(10, 20, 50, 150), inplace=True)
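With its default expr_type, calculate_qc_metrics writes per-cell columns to obs such as total_counts, n_genes_by_counts (the number of genes with a positive count) and, for each n in percent_top, pct_counts_in_top_n_genes (the share of a cell's counts held by its n most highly expressed genes), among others; it also writes per-gene metrics to var. The minimal pure-Python sketch below computes only those three per-cell metrics on a toy matrix. qc_metrics is a hypothetical helper, not scanpy's implementation.

```python
def qc_metrics(counts, percent_top=(2,)):
    """Per-cell QC metrics for a dense cells x genes count matrix (list of lists)."""
    rows = []
    for cell in counts:
        total = sum(cell)
        n_genes = sum(1 for c in cell if c > 0)  # genes with a positive count
        ranked = sorted(cell, reverse=True)      # most highly expressed genes first
        metrics = {"total_counts": total, "n_genes_by_counts": n_genes}
        for n in percent_top:
            top = sum(ranked[:n])
            metrics[f"pct_counts_in_top_{n}_genes"] = 100 * top / total if total else 0.0
        rows.append(metrics)
    return rows


metrics = qc_metrics([[5, 0, 3, 2], [0, 1, 0, 9]], percent_top=(1, 2))
print(metrics[0])
# {'total_counts': 10, 'n_genes_by_counts': 3, 'pct_counts_in_top_1_genes': 50.0, 'pct_counts_in_top_2_genes': 80.0}
```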

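The percentages of control probe and control codeword counts can be calculated from the obs columns. Note that calculate_qc_metrics has replaced the Xenium-provided total_counts column with the per-cell sum of the count matrix X, and that sum is the denominator used here: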

cprobes = (
    xenium_adata.obs["control_probe_counts"].sum()
    / xenium_adata.obs["total_counts"].sum()
    * 100
)
cwords = (
    xenium_adata.obs["control_codeword_counts"].sum()
    / xenium_adata.obs["total_counts"].sum()
    * 100
)
print(f"Negative DNA probe count % : {cprobes}")
print(f"Negative decoding count % : {cwords}")
Negative DNA probe count % : 0.07469165751640662
Negative decoding count % : 0.004468731646280738
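As a worked example of this percentage formula, the sketch below recomputes it for only the first five cells of the obs table shown earlier, using the Xenium-provided columns as plain Python lists (pct and genes_only are hypothetical names). Because it uses five cells rather than all 1812, the numbers differ from the dataset-wide output above. The last line shows how removing the control counts from the denominator shifts the result slightly.

```python
# First five cells of the obs table shown earlier (Xenium-provided columns).
control_probe_counts = [1, 0, 1, 3, 0]
control_codeword_counts = [0, 0, 0, 0, 0]
total_counts = [272, 110, 159, 239, 133]


def pct(part, whole):
    """Summed `part` counts as a percentage of summed `whole` counts."""
    return sum(part) / sum(whole) * 100


print(f"{pct(control_probe_counts, total_counts):.4f}")     # 0.5476
print(f"{pct(control_codeword_counts, total_counts):.4f}")  # 0.0000

# Denominator without control counts, analogous to the gene-count sum that
# calculate_qc_metrics writes to total_counts (assumption: X holds gene counts only).
genes_only = [t - p - w for t, p, w in zip(total_counts, control_probe_counts, control_codeword_counts)]
print(f"{pct(control_probe_counts, genes_only):.4f}")       # 0.5507
```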

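Normalize and log-transform the counts, compute a PCA, a nearest-neighbor graph, a UMAP embedding, and Leiden clusters, then visualize QC metrics and clusters on the UMAP and in spatial coordinates: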

# keep the raw counts in a layer before X is normalized in place
xenium_adata.layers["counts"] = xenium_adata.X.copy()
sc.pp.normalize_total(xenium_adata, inplace=True)
sc.pp.log1p(xenium_adata)
# dimensionality reduction, kNN graph, 2D embedding, and clustering
sc.pp.pca(xenium_adata)
sc.pp.neighbors(xenium_adata)
sc.tl.umap(xenium_adata)
sc.tl.leiden(xenium_adata)
sc.pl.umap(
    xenium_adata,
    color=[
        "total_counts",
        "n_genes_by_counts",
        "leiden",
    ],
    wspace=0.4,
)
[Figure: UMAP colored by total_counts, n_genes_by_counts, and leiden]
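normalize_total and log1p in the preceding cell are the two steps that change the values in X. With default arguments, scanpy documents normalize_total as rescaling each cell so that its counts sum to the median of the per-cell totals before normalization, and log1p as applying the natural log of (1 + x). The sketch below reproduces those two steps on a toy matrix in plain Python. normalize_total and log1p here are hypothetical stand-ins, not scanpy's implementation, and skipping zero-total cells when taking the median is a choice made for this sketch.

```python
import math
import statistics


def normalize_total(counts, target_sum=None):
    """Scale each cell (row) so that its counts sum to target_sum.

    If target_sum is None, use the median of the nonzero per-cell totals
    computed before normalization. Cells with zero total stay at zero.
    """
    totals = [sum(cell) for cell in counts]
    if target_sum is None:
        target_sum = statistics.median(t for t in totals if t > 0)
    return [[c * target_sum / t if t else 0.0 for c in cell] for cell, t in zip(counts, totals)]


def log1p(matrix):
    """Natural log of (1 + x), applied elementwise."""
    return [[math.log1p(x) for x in cell] for cell in matrix]


counts = [[2, 2, 0], [10, 0, 10], [4, 4, 2]]  # per-cell totals 4, 20, 10 -> median 10
normalized = normalize_total(counts)
print([sum(cell) for cell in normalized])  # [10.0, 10.0, 10.0]
logged = log1p(normalized)
```

Keeping the raw counts in layers["counts"] before these steps, as the notebook does, matters because both steps overwrite X in place.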
sq.pl.spatial_scatter(
    xenium_adata,
    library_id="spatial",
    shape=None,
    color=[
        "leiden",
    ],
    wspace=0.4,
)
[Figure: spatial scatter plot of cells colored by leiden cluster]

For a full tutorial on how to perform analysis of Xenium data, we refer to squidpy’s Xenium tutorial.

ln.finish()
 finished Run('UgSw0biGuFt8Z149') after 40s at 2025-10-27 08:29:24 UTC