Query artifacts¶
Here, we’ll query artifacts and inspect their metadata.
This guide can be skipped if you are only interested in how to leverage the overall collection.
import lamindb as ln
import bionty as bt
ln.track("agayZTonayqA0000")
→ connected lamindb: testuser1/test-scrna
→ created Transform('agayZTonayqA0000'), started new Run('N8EprJIO...') at 2025-01-20 07:36:18 UTC
→ notebook imports: bionty==1.0.0 lamindb==1.0.2
Query artifacts by provenance metadata¶
Query the transform, e.g., by uid:
transform = ln.Transform.get(uid="Nv48yAceNSh80003")
Query the artifact:
ln.Artifact.filter(transform=transform).df()
id | uid | key | description | suffix | kind | otype | size | hash | n_files | n_observations | _hash_type | _key_is_virtual | _overwrite_versions | space_id | storage_id | schema_id | version | is_latest | run_id | created_at | created_by_id | _aux | _branch_code |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | heySREilI158XLST0000 | None | Human immune cells from Conde22 | .h5ad | dataset | AnnData | 57612943 | t_YJQpYrAyAGhs7Ir68zKj | None | 1648 | sha1-fl | True | False | 1 | 1 | None | None | True | 1 | 2025-01-20 07:35:53.126000+00:00 | 1 | None | 1 |
Query artifacts by biological metadata¶
tissues = bt.Tissue.lookup()
query = ln.Artifact.filter(
tissues=tissues.blood,
)
query.df()
id | uid | key | description | suffix | kind | otype | size | hash | n_files | n_observations | _hash_type | _key_is_virtual | _overwrite_versions | space_id | storage_id | schema_id | version | is_latest | run_id | created_at | created_by_id | _aux | _branch_code |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | heySREilI158XLST0000 | None | Human immune cells from Conde22 | .h5ad | dataset | AnnData | 57612943 | t_YJQpYrAyAGhs7Ir68zKj | None | 1648 | sha1-fl | True | False | 1 | 1 | None | None | True | 1 | 2025-01-20 07:35:53.126000+00:00 | 1 | None | 1 |
Inspect artifact metadata¶
Query all artifacts that measured the “cell_type” feature:
query_set = ln.Artifact.filter(feature_sets__features__name="cell_type").all()
artifact1, artifact2 = query_set[0], query_set[1]
artifact1.describe()
Artifact .h5ad/AnnData
├── General
│   ├── .uid = 'heySREilI158XLST0000'
│   ├── .size = 57612943
│   ├── .hash = 't_YJQpYrAyAGhs7Ir68zKj'
│   ├── .n_observations = 1648
│   ├── .path = /home/runner/work/lamin-usecases/lamin-usecases/docs/test-scrna/.lamindb/heySREilI158XLST0000.h5ad
│   ├── .created_by = testuser1 (Test User1)
│   ├── .created_at = 2025-01-20 07:35:53
│   └── .transform = 'scRNA-seq'
├── Dataset features/._schemas_m2m
│   ├── var • 36503 [bionty.Gene]
│   │   MIR1302-2HG float
│   │   FAM138A float
│   │   OR4F5 float
│   │   OR4F29 float
│   │   OR4F16 float
│   │   LINC01409 float
│   │   FAM87B float
│   │   LINC01128 float
│   │   LINC00115 float
│   │   FAM41C float
│   └── obs • 4 [Feature]
│       assay cat[bionty.ExperimentalF… 10x 3' v3, 10x 5' v1, 10x 5' v2
│       cell_type cat[bionty.CellType] CD16-negative, CD56-bright natural kille…
│       donor cat[ULabel] 582C, 621B, 637C, 640C, A29, A31, A35, A…
│       tissue cat[bionty.Tissue] blood, bone marrow, caecum, duodenum, il…
└── Labels
    └── .tissues bionty.Tissue jejunal epithelium, duodenum, caecum, bl…
        .cell_types bionty.CellType megakaryocyte, T follicular helper cell,…
        .experimental_factors bionty.ExperimentalFactor 10x 5' v1, 10x 5' v2, 10x 3' v3
        .ulabels ULabel 637C, D503, 640C, A37, A35, D496, 621B, …
artifact1.view_lineage()
artifact2.describe()
Artifact .h5ad/AnnData
├── General
│   ├── .uid = 'Xl9wc1xjhWAhOdVJ0001'
│   ├── .size = 857336
│   ├── .hash = 'GK721a-L-fGDI8kXefKMtA'
│   ├── .n_observations = 70
│   ├── .path = /home/runner/work/lamin-usecases/lamin-usecases/docs/test-scrna/.lamindb/Xl9wc1xjhWAhOdVJ0001.h5ad
│   ├── .created_by = testuser1 (Test User1)
│   ├── .created_at = 2025-01-20 07:36:13
│   └── .transform = 'Standardize and append a dataset'
├── Dataset features/._schemas_m2m
│   ├── var • 754 [bionty.Gene]
│   │   HES4 float
│   │   TNFRSF4 float
│   │   SSU72 float
│   │   PARK7 float
│   │   RBP7 float
│   │   SRM float
│   │   MAD2L2 float
│   │   AGTRAP float
│   │   TNFRSF1B float
│   │   EFHD2 float
│   │   NECAP2 float
│   │   HP1BP3 float
│   │   C1QA float
│   │   C1QB float
│   │   HNRNPR float
│   │   GALE float
│   │   STMN1 float
│   │   CD52 float
│   │   FGR float
│   │   ATP5IF1 float
│   └── obs • 2 [Feature]
│       cell_type cat[bionty.CellType] B cell, CD19-positive, CD14-positive mon…
│       cell_type_untrusted cat[bionty.CellType] B cell, CD19-positive, CD14-positive mon…
└── Labels
    └── .cell_types bionty.CellType CD8-positive, alpha-beta memory T cell, …
artifact2.view_lineage()
Compare features¶
Here we compute shared genes:
artifact1_genes = artifact1.features["var"]
artifact2_genes = artifact2.features["var"]
shared_genes = artifact1_genes & artifact2_genes
len(shared_genes)
749
shared_genes.list("symbol")[:10]
['HES4',
'TNFRSF4',
'SSU72',
'PARK7',
'RBP7',
'SRM',
'MAD2L2',
'AGTRAP',
'TNFRSF1B',
'EFHD2']
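The `&` used above combines the two gene query sets so that only records present in both remain; since the second artifact has 754 genes and 749 are shared, 5 of its genes are not measured in the first. As a rough analogy only, and not how lamindb implements it, the same idea can be sketched with plain Python sets. The two `GENE_*_ONLY` symbols are hypothetical placeholders, and the three shared symbols are taken from the output above:

```python
# Toy stand-ins for the two gene sets; NOT the real feature sets.
# "GENE_A_ONLY" and "GENE_B_ONLY" are hypothetical placeholder symbols.
genes_a = {"HES4", "TNFRSF4", "SSU72", "GENE_A_ONLY"}
genes_b = {"HES4", "TNFRSF4", "SSU72", "GENE_B_ONLY"}

# Intersection: symbols present in both sets.
shared = genes_a & genes_b

# Difference: symbols present in the second set but missing from the first.
only_in_b = genes_b - genes_a

print(sorted(shared))
print(sorted(only_in_b))
```

Looking at the set difference as well as the intersection is a quick way to see which features would be lost when restricting both datasets to the shared genes.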
Compare cell types¶
artifact1_celltypes = artifact1.cell_types.all()
artifact2_celltypes = artifact2.cell_types.all()
shared_celltypes = artifact1_celltypes & artifact2_celltypes
shared_celltypes_names = shared_celltypes.list("name")
shared_celltypes_names
['CD8-positive, alpha-beta memory T cell, CD45RO-positive',
'CD8-positive, alpha-beta memory T cell, CD45RO-positive']
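The returned list contains the same name twice. Repeated values do not change the result of a membership test such as `.isin()`, but if a list of unique names is preferred, one possible approach is to deduplicate while preserving order. This is a minimal sketch on a copy of the output shown above, using only the standard library:

```python
# Copy of the list printed above, including the repeated entry.
shared_celltypes_names = [
    "CD8-positive, alpha-beta memory T cell, CD45RO-positive",
    "CD8-positive, alpha-beta memory T cell, CD45RO-positive",
]

# dict keys are unique and keep insertion order, so this removes
# duplicates without reordering the remaining names.
unique_names = list(dict.fromkeys(shared_celltypes_names))

print(unique_names)
```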
Load the individual artifacts¶
We could either load the artifacts into memory or access them in backed mode through .open() to lazily load their content.
Let’s load them into memory:
adata1 = artifact1.load()
adata2 = artifact2.load()
Inspect the second dataset, then subset both AnnData objects by the shared cell types:
adata2
AnnData object with n_obs × n_vars = 70 × 754
obs: 'cell_type_untrusted', 'n_genes', 'percent_mito', 'louvain', 'cell_type_untrusted_original', 'cell_type'
var: 'symbol', 'n_counts', 'highly_variable'
uns: 'louvain', 'louvain_colors', 'neighbors', 'pca'
obsm: 'X_pca', 'X_umap'
varm: 'PCs'
obsp: 'connectivities', 'distances'
adata1_subset = adata1[adata1.obs["cell_type"].isin(shared_celltypes_names)]
adata2_subset = adata2[adata2.obs["cell_type"].isin(shared_celltypes_names)]
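The subsetting above relies on a boolean mask built from the `cell_type` column of `.obs`. To show what that mask looks like, here is a small sketch on a plain pandas DataFrame standing in for `adata.obs`. The cell IDs and the `"other cell type"` label are hypothetical toy values; only the shared cell type name comes from the output above:

```python
import pandas as pd

# Hypothetical stand-in for adata.obs; the real table has more rows and columns.
obs = pd.DataFrame(
    {
        "cell_type": [
            "other cell type",
            "CD8-positive, alpha-beta memory T cell, CD45RO-positive",
            "other cell type",
        ]
    },
    index=["cell_0", "cell_1", "cell_2"],
)
shared_names = ["CD8-positive, alpha-beta memory T cell, CD45RO-positive"]

# One boolean per observation: True where the label is in the shared list.
mask = obs["cell_type"].isin(shared_names)

# Indexing with the mask keeps only the matching rows.
subset = obs[mask]

print(mask.tolist())
print(subset.index.tolist())
```

Indexing an AnnData object with such a mask returns a view of the original object; calling `.copy()` on the result is the usual way to get an independent object if the subset will be modified.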