
Query artifacts

Here, we’ll query artifacts and inspect their metadata.

You can skip this guide if you are only interested in working with the collection as a whole.

import lamindb as ln
import bionty as bt

ln.track("agayZTonayqA0000")
 connected lamindb: testuser1/test-scrna
 created Transform('agayZTonayqA0000'), started new Run('N8EprJIO...') at 2025-01-20 07:36:18 UTC
 notebook imports: bionty==1.0.0 lamindb==1.0.2

Query artifacts by provenance metadata

Query the transform, e.g., by uid:

transform = ln.Transform.get(uid="Nv48yAceNSh80003")

Query the artifacts created by this transform:

ln.Artifact.filter(transform=transform).df()
column                 value
id                     1
uid                    heySREilI158XLST0000
key                    None
description            Human immune cells from Conde22
suffix                 .h5ad
kind                   dataset
otype                  AnnData
size                   57612943
hash                   t_YJQpYrAyAGhs7Ir68zKj
n_files                None
n_observations         1648
_hash_type             sha1-fl
_key_is_virtual        True
_overwrite_versions    False
space_id               1
storage_id             1
schema_id              None
version                None
is_latest              True
run_id                 1
created_at             2025-01-20 07:35:53.126000+00:00
created_by_id          1
_aux                   None
_branch_code           1

Query artifacts by biological metadata

tissues = bt.Tissue.lookup()

query = ln.Artifact.filter(
    tissues=tissues.blood,
)
query.df()
column                 value
id                     1
uid                    heySREilI158XLST0000
key                    None
description            Human immune cells from Conde22
suffix                 .h5ad
kind                   dataset
otype                  AnnData
size                   57612943
hash                   t_YJQpYrAyAGhs7Ir68zKj
n_files                None
n_observations         1648
_hash_type             sha1-fl
_key_is_virtual        True
_overwrite_versions    False
space_id               1
storage_id             1
schema_id              None
version                None
is_latest              True
run_id                 1
created_at             2025-01-20 07:35:53.126000+00:00
created_by_id          1
_aux                   None
_branch_code           1

Inspect artifact metadata

Query all artifacts that measured the “cell_type” feature:

query_set = ln.Artifact.filter(feature_sets__features__name="cell_type").all()
artifact1, artifact2 = query_set[0], query_set[1]
artifact1.describe()
Artifact .h5ad/AnnData
├── General
│   ├── .uid = 'heySREilI158XLST0000'
│   ├── .size = 57612943
│   ├── .hash = 't_YJQpYrAyAGhs7Ir68zKj'
│   ├── .n_observations = 1648
│   ├── .path = /home/runner/work/lamin-usecases/lamin-usecases/docs/test-scrna/.lamindb/heySREilI158XLST0000.h5ad
│   ├── .created_by = testuser1 (Test User1)
│   ├── .created_at = 2025-01-20 07:35:53
│   └── .transform = 'scRNA-seq'
├── Dataset features/._schemas_m2m
│   ├── var • 36503              [bionty.Gene]
│   │   MIR1302-2HG                 float
│   │   FAM138A                     float
│   │   OR4F5                       float
│   │   OR4F29                      float
│   │   OR4F16                      float
│   │   LINC01409                   float
│   │   FAM87B                      float
│   │   LINC01128                   float
│   │   LINC00115                   float
│   │   FAM41C                      float
│   └── obs • 4                  [Feature]
│       assay                       cat[bionty.ExperimentalF…  10x 3' v3, 10x 5' v1, 10x 5' v2
│       cell_type                   cat[bionty.CellType]       CD16-negative, CD56-bright natural kille…
│       donor                       cat[ULabel]                582C, 621B, 637C, 640C, A29, A31, A35, A…
│       tissue                      cat[bionty.Tissue]         blood, bone marrow, caecum, duodenum, il…
└── Labels
    └── .tissues                    bionty.Tissue              jejunal epithelium, duodenum, caecum, bl…
        .cell_types                 bionty.CellType            megakaryocyte, T follicular helper cell,…
        .experimental_factors       bionty.ExperimentalFactor  10x 5' v1, 10x 5' v2, 10x 3' v3          
        .ulabels                    ULabel                     637C, D503, 640C, A37, A35, D496, 621B, …
artifact1.view_lineage()
[Figure: lineage graph of artifact1]
artifact2.describe()
Artifact .h5ad/AnnData
├── General
│   ├── .uid = 'Xl9wc1xjhWAhOdVJ0001'
│   ├── .size = 857336
│   ├── .hash = 'GK721a-L-fGDI8kXefKMtA'
│   ├── .n_observations = 70
│   ├── .path = /home/runner/work/lamin-usecases/lamin-usecases/docs/test-scrna/.lamindb/Xl9wc1xjhWAhOdVJ0001.h5ad
│   ├── .created_by = testuser1 (Test User1)
│   ├── .created_at = 2025-01-20 07:36:13
│   └── .transform = 'Standardize and append a dataset'
├── Dataset features/._schemas_m2m
│   ├── var • 754                [bionty.Gene]
│   │   HES4                        float
│   │   TNFRSF4                     float
│   │   SSU72                       float
│   │   PARK7                       float
│   │   RBP7                        float
│   │   SRM                         float
│   │   MAD2L2                      float
│   │   AGTRAP                      float
│   │   TNFRSF1B                    float
│   │   EFHD2                       float
│   │   NECAP2                      float
│   │   HP1BP3                      float
│   │   C1QA                        float
│   │   C1QB                        float
│   │   HNRNPR                      float
│   │   GALE                        float
│   │   STMN1                       float
│   │   CD52                        float
│   │   FGR                         float
│   │   ATP5IF1                     float
│   └── obs • 2                  [Feature]
│       cell_type                   cat[bionty.CellType]       B cell, CD19-positive, CD14-positive mon…
│       cell_type_untrusted         cat[bionty.CellType]       B cell, CD19-positive, CD14-positive mon…
└── Labels
    └── .cell_types                 bionty.CellType            CD8-positive, alpha-beta memory T cell, …
artifact2.view_lineage()
[Figure: lineage graph of artifact2]

Compare features

Here we compute shared genes:

artifact1_genes = artifact1.features["var"]
artifact2_genes = artifact2.features["var"]

shared_genes = artifact1_genes & artifact2_genes
len(shared_genes)
749
shared_genes.list("symbol")[:10]
['HES4',
 'TNFRSF4',
 'SSU72',
 'PARK7',
 'RBP7',
 'SRM',
 'MAD2L2',
 'AGTRAP',
 'TNFRSF1B',
 'EFHD2']
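
The `&` operator above is used to find the genes present in both feature sets. As a rough plain-Python analogue, assuming each feature set is reduced to a list of gene symbols, the same idea can be sketched with a set lookup. The lists below are short, hypothetical excerpts rather than the real feature sets, and `GENE_X` is a made-up placeholder symbol.

```python
# Hypothetical, truncated symbol lists; the real feature sets hold 36503 and 754 genes.
artifact1_symbols = ["MIR1302-2HG", "FAM138A", "HES4", "TNFRSF4", "SSU72"]
artifact2_symbols = ["HES4", "TNFRSF4", "SSU72", "GENE_X"]  # GENE_X is a placeholder

# Build a set once so that each membership test is a constant-time lookup.
artifact1_lookup = set(artifact1_symbols)

# Keep the order of artifact2 while selecting the symbols found in both lists.
shared_symbols = [s for s in artifact2_symbols if s in artifact1_lookup]

# Symbols of artifact2 that have no match in artifact1.
only_in_artifact2 = [s for s in artifact2_symbols if s not in artifact1_lookup]

print(shared_symbols)
print(only_in_artifact2)
```

In the notebook output above, 749 of the 754 genes in artifact2 are shared, which leaves 5 without a match in artifact1, assuming the shared set contains no duplicates.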

Compare cell types

artifact1_celltypes = artifact1.cell_types.all()
artifact2_celltypes = artifact2.cell_types.all()

shared_celltypes = artifact1_celltypes & artifact2_celltypes
shared_celltypes_names = shared_celltypes.list("name")
shared_celltypes_names
['CD8-positive, alpha-beta memory T cell, CD45RO-positive',
 'CD8-positive, alpha-beta memory T cell, CD45RO-positive']
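
The output above lists the same cell type name twice. Duplicates do not change the result of the `.isin()` check used later in this guide, but a unique list is easier to read and reuse. A minimal sketch of an order-preserving de-duplication in plain Python, with the list copied from the output above and `unique_celltype_names` as an illustrative name:

```python
shared_celltypes_names = [
    "CD8-positive, alpha-beta memory T cell, CD45RO-positive",
    "CD8-positive, alpha-beta memory T cell, CD45RO-positive",
]

# dict keys are unique and keep insertion order, so this drops repeats
# without reordering the remaining names.
unique_celltype_names = list(dict.fromkeys(shared_celltypes_names))

print(unique_celltype_names)
```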

Load the individual artifacts

We can either load the artifacts into memory with .load() or access them in backed mode with .open(), which loads their content lazily.

Let’s load them into memory:

adata1 = artifact1.load()
adata2 = artifact2.load()

We can now subset the two AnnData objects to the shared cell types. Here is adata2 before subsetting:

adata2
AnnData object with n_obs × n_vars = 70 × 754
    obs: 'cell_type_untrusted', 'n_genes', 'percent_mito', 'louvain', 'cell_type_untrusted_original', 'cell_type'
    var: 'symbol', 'n_counts', 'highly_variable'
    uns: 'louvain', 'louvain_colors', 'neighbors', 'pca'
    obsm: 'X_pca', 'X_umap'
    varm: 'PCs'
    obsp: 'connectivities', 'distances'
adata1_subset = adata1[adata1.obs["cell_type"].isin(shared_celltypes_names)]
adata2_subset = adata2[adata2.obs["cell_type"].isin(shared_celltypes_names)]
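
The `.isin()` call builds a boolean mask with one entry per cell, and indexing the AnnData object with that mask keeps only the rows where the mask is true. A plain-Python sketch of the same logic, using a hypothetical list of per-cell labels in place of `adata.obs["cell_type"]`:

```python
# Hypothetical per-cell labels standing in for adata.obs["cell_type"].
cell_type_labels = [
    "megakaryocyte",
    "CD8-positive, alpha-beta memory T cell, CD45RO-positive",
    "T follicular helper cell",
    "CD8-positive, alpha-beta memory T cell, CD45RO-positive",
]
shared_names = {"CD8-positive, alpha-beta memory T cell, CD45RO-positive"}

# One boolean per cell: True if the cell's label is among the shared names.
mask = [label in shared_names for label in cell_type_labels]

# Row positions that the subset would keep.
kept_rows = [i for i, keep in enumerate(mask) if keep]

print(mask)
print(kept_rows)
```

Indexing an AnnData object this way returns a view; call `.copy()` on the result if you need an independent object.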