
Query and analyze spatial data

Having created a SpatialData collection, we now briefly discuss how to query and analyze spatial data.

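import warnings

# Suppress warnings before importing the libraries so that import-time warnings are hidden too
warnings.filterwarnings("ignore")

import lamindb as ln
import bionty as bt
import squidpy as sq
import scanpy as sc
import spatialdata_plot  # registers the .pl plotting accessor on SpatialData objects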

ln.track(project="spatial guide datasets")
 connected lamindb: testuser1/test-spatial
 created Transform('FtYAobcBHP4c0000'), started new Run('kNRE0J8p...') at 2025-04-18 11:46:53 UTC
 notebook imports: bionty==1.3.0 lamindb==1.4.0 scanpy==1.11.1 spatialdata-plot==0.2.9 squidpy==1.6.5

Query a SpatialData Collection

By provenance metadata

Query the transform, e.g., by key:

transform = ln.Transform.get(key="spatial.ipynb")
transform
Transform(uid='cKAZGIOEX1NM0000', is_latest=True, key='spatial.ipynb', description='Spatial', type='notebook', hash='0r5jwbkkEa40exmY0q-17A', space_id=1, created_by_id=1, created_at=2025-04-18 11:45:41 UTC)

Query the artifacts created by this transform:

ln.Artifact.filter(transform=transform).df()
| id | uid | key | description | suffix | kind | otype | size | hash | n_files | n_observations | _hash_type | _key_is_virtual | _overwrite_versions | space_id | storage_id | schema_id | version | is_latest | run_id | created_at | created_by_id | _aux | _branch_code |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | aWDakDvmJ3W6DEKl0000 | example_blobs.zarr | None | .zarr | dataset | SpatialData | 12121376 | VfyqWKmYtGl46BehCw_UQw | 113 | None | md5-d | True | True | 1 | 1 | NaN | None | True | 1 | 2025-04-18 11:45:44.553000+00:00 | 1 | None | 1 |
| 3 | Nbgkw6rtcBS7FhZk0000 | xenium1.zarr | None | .zarr | dataset | SpatialData | 35115343 | 4vw9wN84jSk1iUUN0hFMEg | 145 | None | md5-d | True | True | 1 | 1 | 4.0 | None | True | 1 | 2025-04-18 11:45:59.921000+00:00 | 1 | None | 1 |
| 5 | 9i9RUtVUHWRoF9Je0000 | xenium2.zarr | None | .zarr | dataset | SpatialData | 40822346 | urst38Sf2bS5nj3RMES0Sw | 174 | None | md5-d | True | True | 1 | 1 | 4.0 | None | True | 1 | 2025-04-18 11:46:11.299000+00:00 | 1 | None | 1 |
| 7 | 4c3NwhMsXMS0GdVd0000 | visium.zarr | None | .zarr | dataset | SpatialData | 5809684 | 1JoqBRVDICsVM8jXPwR7QA | 133 | None | md5-d | True | True | 1 | 1 | 4.0 | None | True | 1 | 2025-04-18 11:46:31.207000+00:00 | 1 | None | 1 |

By biological metadata

Spatial data stored in SpatialData format and curated with the SpatialDataCurator can be queried by their annotated features and labels. Although we curated specific slots of the SpatialData artifacts, the labels are attached directly to the artifact:

experimental_factors = bt.ExperimentalFactor.lookup()

# Python identifiers cannot start with a digit, so the lookup exposes "10x Xenium" as ln_10x_xenium
# lookup() returns records, so filter on the record itself rather than on its name field
all_xenium_data = ln.Artifact.filter(
    experimental_factors=experimental_factors.ln_10x_xenium
)
all_xenium_data.df()
uid id key description suffix kind otype size hash n_files n_observations _hash_type _key_is_virtual _overwrite_versions space_id storage_id schema_id version is_latest run_id created_at created_by_id _aux _branch_code
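The `ln_` prefix mentioned in the code comment above follows from Python's identifier rules: an attribute name cannot start with a digit. The sketch below approximates how a lookup object could turn record names into attribute names. The exact normalization rules lamindb applies are an assumption here, and `lookup_attribute_name` is a hypothetical helper, not part of the lamindb API.

```python
import re


def lookup_attribute_name(name: str, prefix: str = "ln_") -> str:
    """Turn a record name into a valid Python attribute name.

    Hypothetical approximation of lookup-style attribute naming;
    not lamindb's actual implementation.
    """
    # Lowercase and replace every run of non-alphanumeric characters with "_"
    attr = re.sub(r"[^0-9a-zA-Z]+", "_", name.lower()).strip("_")
    # Identifiers may not start with a digit, so prepend a prefix in that case
    if attr and attr[0].isdigit():
        attr = prefix + attr
    return attr


print(lookup_attribute_name("10x Xenium"))  # ln_10x_xenium
print(lookup_attribute_name("Visium Spatial Gene Expression"))
print("10x_xenium".isidentifier())  # False
```

Prefixing only when the first character is a digit keeps ordinary names such as `visium_spatial_gene_expression` unchanged.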

Inspect artifact metadata

Query all artifacts that measured the “celltype_major” feature:

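# Only returns the Xenium datasets because the Visium dataset has no annotated cell types
# Order by creation time so that indexing reliably yields xenium1 first, then xenium2
query_set = ln.Artifact.filter(
    feature_sets__features__name="celltype_major"
).order_by("created_at")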
xenium_1_af, xenium_2_af = query_set[0], query_set[1]
xenium_1_af.describe()
Artifact .zarr/SpatialData
├── General
│   ├── .uid = 'Nbgkw6rtcBS7FhZk0000'
│   ├── .key = 'xenium1.zarr'
│   ├── .size = 35115343
│   ├── .hash = '4vw9wN84jSk1iUUN0hFMEg'
│   ├── .n_files = 145
│   ├── .path = /home/runner/work/lamin-usecases/lamin-usecases/docs/test-spatial/.lamindb/Nbgkw6rtcBS7FhZk.zarr
│   ├── .created_by = testuser1 (Test User1)
│   ├── .created_at = 2025-04-18 11:45:59
│   └── .transform = 'Spatial'
├── Dataset features
│   ├── sample4                  [Feature]                                                           
│   │   assay                       cat[bionty.ExperimentalF…  10x Xenium                               
│   │   disease                     cat[bionty.Disease]        ductal breast carcinoma in situ          
│   │   organism                    cat[bionty.Organism]       human                                    
│   │   tissue                      cat[bionty.Tissue]         breast                                   
│   ├── ['table'].var313         [bionty.Gene]                                                       
│   │   ABCC11                      float                                                               
│   │   ACTA2                       float                                                               
│   │   ACTG2                       float                                                               
│   │   ADAM9                       float                                                               
│   │   ADGRE5                      float                                                               
│   │   ADH1B                       float                                                               
│   │   ADIPOQ                      float                                                               
│   │   AGR3                        float                                                               
│   │   AHSP                        float                                                               
│   │   AIF1                        float                                                               
│   │   AKR1C1                      float                                                               
│   │   AKR1C3                      float                                                               
│   │   ALDH1A3                     float                                                               
│   │   ANGPT2                      float                                                               
│   │   ANKRD28                     float                                                               
│   │   ANKRD29                     float                                                               
│   │   ANKRD30A                    float                                                               
│   │   APOBEC3A                    float                                                               
│   │   APOBEC3B                    float                                                               
│   │   APOC1                       float                                                               
│   └── ['table'].obs1           [Feature]                                                           
│       celltype_major              cat[bionty.CellType]       B cell, T cell, cancer associated fibrob…
└── Labels
    └── .projects                   Project                    spatial guide datasets                   
        .organisms                  bionty.Organism            human                                    
        .tissues                    bionty.Tissue              breast                                   
        .cell_types                 bionty.CellType            endothelial cell, myeloid cell, perivasc…
        .diseases                   bionty.Disease             ductal breast carcinoma in situ          
        .experimental_factors       bionty.ExperimentalFactor  10x Xenium                               
xenium_1_af.view_lineage()
[Figure: lineage graph of the xenium1.zarr artifact]
xenium_2_af.describe()
Artifact .zarr/SpatialData
├── General
│   ├── .uid = '9i9RUtVUHWRoF9Je0000'
│   ├── .key = 'xenium2.zarr'
│   ├── .size = 40822346
│   ├── .hash = 'urst38Sf2bS5nj3RMES0Sw'
│   ├── .n_files = 174
│   ├── .path = /home/runner/work/lamin-usecases/lamin-usecases/docs/test-spatial/.lamindb/9i9RUtVUHWRoF9Je.zarr
│   ├── .created_by = testuser1 (Test User1)
│   ├── .created_at = 2025-04-18 11:46:11
│   └── .transform = 'Spatial'
├── Dataset features
│   ├── sample4                  [Feature]                                                           
│   │   assay                       cat[bionty.ExperimentalF…  10x Xenium                               
│   │   disease                     cat[bionty.Disease]        ductal breast carcinoma in situ          
│   │   organism                    cat[bionty.Organism]       human                                    
│   │   tissue                      cat[bionty.Tissue]         breast                                   
│   ├── ['table'].var313         [bionty.Gene]                                                       
│   │   ABCC11                      float                                                               
│   │   ACTA2                       float                                                               
│   │   ACTG2                       float                                                               
│   │   ADAM9                       float                                                               
│   │   ADGRE5                      float                                                               
│   │   ADH1B                       float                                                               
│   │   ADIPOQ                      float                                                               
│   │   AGR3                        float                                                               
│   │   AHSP                        float                                                               
│   │   AIF1                        float                                                               
│   │   AKR1C1                      float                                                               
│   │   AKR1C3                      float                                                               
│   │   ALDH1A3                     float                                                               
│   │   ANGPT2                      float                                                               
│   │   ANKRD28                     float                                                               
│   │   ANKRD29                     float                                                               
│   │   ANKRD30A                    float                                                               
│   │   APOBEC3A                    float                                                               
│   │   APOBEC3B                    float                                                               
│   │   APOC1                       float                                                               
│   └── ['table'].obs1           [Feature]                                                           
│       celltype_major              cat[bionty.CellType]       B cell, T cell, cancer associated fibrob…
└── Labels
    └── .projects                   Project                    spatial guide datasets                   
        .organisms                  bionty.Organism            human                                    
        .tissues                    bionty.Tissue              breast                                   
        .cell_types                 bionty.CellType            endothelial cell, myeloid cell, perivasc…
        .diseases                   bionty.Disease             ductal breast carcinoma in situ          
        .experimental_factors       bionty.ExperimentalFactor  10x Xenium                               
xenium_2_af.view_lineage()
[Figure: lineage graph of the xenium2.zarr artifact]

Analyze spatial data

Datasets stored as SpatialData objects can be loaded, examined, and analyzed with the SpatialData framework, squidpy, and scanpy:

xenium_1_sd = xenium_1_af.load()
xenium_1_sd
SpatialData object, with associated Zarr store: /home/runner/work/lamin-usecases/lamin-usecases/docs/test-spatial/.lamindb/Nbgkw6rtcBS7FhZk.zarr
├── Images
│     ├── 'morphology_focus': DataTree[cyx] (1, 2310, 3027), (1, 1155, 1514), (1, 578, 757), (1, 288, 379), (1, 145, 189)
│     └── 'morphology_mip': DataTree[cyx] (1, 2310, 3027), (1, 1155, 1514), (1, 578, 757), (1, 288, 379), (1, 145, 189)
├── Points
│     └── 'transcripts': DataFrame with shape: (<Delayed>, 8) (3D points)
├── Shapes
│     ├── 'cell_boundaries': GeoDataFrame shape: (1899, 1) (2D shapes)
│     └── 'cell_circles': GeoDataFrame shape: (1812, 2) (2D shapes)
└── Tables
      └── 'table': AnnData (1812, 313)
with coordinate systems:
    ▸ 'aligned', with elements:
        morphology_focus (Images), morphology_mip (Images), transcripts (Points), cell_boundaries (Shapes), cell_circles (Shapes)
    ▸ 'global', with elements:
        morphology_focus (Images), morphology_mip (Images), transcripts (Points), cell_boundaries (Shapes), cell_circles (Shapes)

Use spatialdata-plot to get an overview of the dataset:

xenium_1_sd.pl.render_images(element="morphology_focus").pl.render_shapes(
    fill_alpha=0, outline_alpha=0.2
).pl.show(coordinate_systems="aligned")
[Figure: morphology_focus image with cell shape outlines in the "aligned" coordinate system]

Most analyses of Xenium data operate on the AnnData table, which contains the count matrix along with cell and gene annotations. It is stored in the tables slot of the SpatialData object:

xenium_adata = xenium_1_sd.tables["table"]
xenium_adata
AnnData object with n_obs × n_vars = 1812 × 313
    obs: 'cell_id', 'transcript_counts', 'control_probe_counts', 'control_codeword_counts', 'total_counts', 'cell_area', 'nucleus_area', 'region', 'dataset', 'celltype_major', 'celltype_minor'
    var: 'symbols', 'feature_types', 'genome'
    uns: 'spatialdata_attrs'
    obsm: 'spatial'
xenium_adata.obs
| | cell_id | transcript_counts | control_probe_counts | control_codeword_counts | total_counts | cell_area | nucleus_area | region | dataset | celltype_major | celltype_minor |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 92782 | 92783 | 271 | 1 | 0 | 272 | 401.484219 | 27.048594 | cell_circles | xe_rep1 | cancer associated fibroblast | CAFs myCAF-like |
| 92783 | 92784 | 110 | 0 | 0 | 110 | 163.826875 | 21.900781 | cell_circles | xe_rep1 | cancer associated fibroblast | CAFs myCAF-like |
| 92784 | 92785 | 158 | 1 | 0 | 159 | 262.583594 | 7.225000 | cell_circles | xe_rep1 | cancer associated fibroblast | CAFs myCAF-like |
| 92785 | 92786 | 236 | 3 | 0 | 239 | 512.207344 | 17.701250 | cell_circles | xe_rep1 | cancer associated fibroblast | CAFs myCAF-like |
| 92786 | 92787 | 133 | 0 | 0 | 133 | 361.250000 | 20.997656 | cell_circles | xe_rep1 | endothelial cell | Endothelial Lymphatic LYVE1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95912 | 95913 | 138 | 0 | 0 | 138 | 317.358125 | 29.125781 | cell_circles | xe_rep1 | T cell | T cells CD4+ |
| 95913 | 95914 | 148 | 0 | 0 | 148 | 174.393438 | 21.404063 | cell_circles | xe_rep1 | T cell | T cells CD8+ |
| 95914 | 95915 | 152 | 0 | 0 | 152 | 275.724063 | 31.609375 | cell_circles | xe_rep1 | cancer associated fibroblast | CAFs myCAF-like |
| 95915 | 95916 | 125 | 0 | 0 | 125 | 121.921875 | 28.222656 | cell_circles | xe_rep1 | T cell | T cells CD4+ |
| 95916 | 95917 | 135 | 0 | 0 | 135 | 115.374219 | 13.862969 | cell_circles | xe_rep1 | myeloid cell | Macrophage |

1812 rows × 11 columns

Calculate the quality control metrics on the AnnData object using scanpy.pp.calculate_qc_metrics:

sc.pp.calculate_qc_metrics(xenium_adata, percent_top=(10, 20, 50, 150), inplace=True)
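calculate_qc_metrics adds per-cell columns to obs such as total_counts, n_genes_by_counts, and one pct_counts_in_top_{n}_genes column for each value passed to percent_top. The following is a minimal pure-Python sketch of those three metrics on a toy matrix, assuming a small dense list-of-rows count matrix. It is a hypothetical illustration, not scanpy's implementation, which computes further metrics, including per-gene ones in var.

```python
def qc_metrics(counts, percent_top=(2,)):
    """Compute a few per-cell QC metrics for a dense cells x genes count matrix.

    Hypothetical sketch of some obs columns that sc.pp.calculate_qc_metrics
    adds; not scanpy's implementation.
    """
    metrics = []
    for row in counts:
        total = sum(row)
        # Number of genes detected (count > 0) in this cell
        n_genes = sum(1 for c in row if c > 0)
        cell = {"total_counts": total, "n_genes_by_counts": n_genes}
        ranked = sorted(row, reverse=True)
        for n in percent_top:
            # Percentage of this cell's counts that fall into its n most-expressed genes
            top = sum(ranked[:n])
            cell[f"pct_counts_in_top_{n}_genes"] = (
                100 * top / total if total > 0 else float("nan")
            )
        metrics.append(cell)
    return metrics


toy_counts = [
    [5, 0, 3, 2],  # cell 0: 10 counts over 3 detected genes
    [0, 1, 0, 0],  # cell 1: a single count
]
for cell in qc_metrics(toy_counts, percent_top=(2,)):
    print(cell)
# {'total_counts': 10, 'n_genes_by_counts': 3, 'pct_counts_in_top_2_genes': 80.0}
# {'total_counts': 1, 'n_genes_by_counts': 1, 'pct_counts_in_top_2_genes': 100.0}
```

A high pct_counts_in_top_n value means a cell's counts are concentrated in a few genes; in the notebook, percent_top=(10, 20, 50, 150) yields four such columns.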

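The percentages of control probe counts and control codeword counts can be calculated from the obs slot. Note that calculate_qc_metrics has replaced the Xenium-provided total_counts column with the per-cell sum of the count matrix, which serves as the denominator here: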

cprobes = (
    xenium_adata.obs["control_probe_counts"].sum()
    / xenium_adata.obs["total_counts"].sum()
    * 100
)
cwords = (
    xenium_adata.obs["control_codeword_counts"].sum()
    / xenium_adata.obs["total_counts"].sum()
    * 100
)
print(f"Negative DNA probe count % : {cprobes}")
print(f"Negative decoding count % : {cwords}")
Negative DNA probe count % : 0.07469165751640662
Negative decoding count % : 0.004468731646280738
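The ratio computed above is pooled: the control counts of all cells are summed and divided by the summed totals, rather than averaging per-cell percentages. The small pure-Python sketch below shows the difference, using the first five rows of the obs table shown earlier as toy input; the helper control_percentage is hypothetical.

```python
def control_percentage(control_counts, total_counts):
    """Pooled percentage: all control counts over all total counts, times 100."""
    denominator = sum(total_counts)
    if denominator == 0:
        raise ValueError("total_counts sums to zero")
    return sum(control_counts) / denominator * 100


# First five cells of the obs table shown above (Xenium-provided values,
# i.e. before calculate_qc_metrics replaced the total_counts column)
control_probe_counts = [1, 0, 1, 3, 0]
total_counts = [272, 110, 159, 239, 133]

pooled = control_percentage(control_probe_counts, total_counts)
# Averaging per-cell percentages weights every cell equally and gives a different number
per_cell_mean = (
    sum(c / t for c, t in zip(control_probe_counts, total_counts))
    / len(total_counts)
    * 100
)
print(f"pooled: {pooled:.3f} %, mean of per-cell: {per_cell_mean:.3f} %")
# pooled: 0.548 %, mean of per-cell: 0.450 %
```

The pooled form weights each cell by its total counts, so cells with many transcripts contribute more. This matches the sum-over-sum computation in the notebook cell.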

Normalize the counts, compute a UMAP embedding and Leiden clusters, and visualize them on the UMAP and in spatial coordinates:

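# Keep the raw counts in a layer, because the following steps modify X in place
xenium_adata.layers["counts"] = xenium_adata.X.copy()
# Scale each cell to the same total count, then log-transform
sc.pp.normalize_total(xenium_adata, inplace=True)
sc.pp.log1p(xenium_adata)
# Dimensionality reduction, kNN graph, embedding, and clustering
sc.pp.pca(xenium_adata)
sc.pp.neighbors(xenium_adata)
sc.tl.umap(xenium_adata)
sc.tl.leiden(xenium_adata)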
sc.pl.umap(
    xenium_adata,
    color=[
        "total_counts",
        "n_genes_by_counts",
        "leiden",
    ],
    wspace=0.4,
)
[Figure: UMAP colored by total_counts, n_genes_by_counts, and leiden]
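sc.pp.normalize_total rescales each cell so that its counts sum to the same target. When target_sum is not given, scanpy documents the target as the median of the per-cell totals before normalization. sc.pp.log1p then applies log(1 + x), with the natural logarithm by default. Below is a minimal pure-Python sketch of these two steps on a toy matrix. The helper names mirror scanpy's functions but are hypothetical re-implementations, and this sketch simply leaves all-zero cells unchanged.

```python
import math
from statistics import median


def normalize_total(counts, target_sum=None):
    """Scale each cell (row) so that its counts sum to target_sum.

    If target_sum is None, use the median of the per-cell totals.
    Hypothetical sketch, not scanpy's implementation.
    """
    totals = [sum(row) for row in counts]
    if target_sum is None:
        target_sum = median(totals)
    # Cells with zero counts are left unchanged to avoid dividing by zero
    return [
        [c * target_sum / t for c in row] if t > 0 else list(row)
        for row, t in zip(counts, totals)
    ]


def log1p(matrix):
    """Elementwise natural log(1 + x)."""
    return [[math.log1p(x) for x in row] for row in matrix]


counts = [[1, 1], [3, 5]]  # toy matrix: 2 cells x 2 genes, totals 2 and 8
normalized = normalize_total(counts)  # median total = 5.0
print(normalized)  # [[2.5, 2.5], [1.875, 3.125]]
print(log1p(normalized))
```

After normalization both cells sum to 5, so differences in sequencing depth no longer dominate; log1p then compresses large values while keeping zeros at zero.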
sq.pl.spatial_scatter(
    xenium_adata,
    library_id="spatial",
    shape=None,
    color=[
        "leiden",
    ],
    wspace=0.4,
)
[Figure: spatial scatter plot of cells colored by Leiden cluster]

For a full tutorial on how to perform analysis of Xenium data, we refer to squidpy’s Xenium tutorial.

ln.finish()
 finished Run('kNRE0J8p') after 40s at 2025-04-18 11:47:34 UTC