Query artifacts
Here, we’ll query artifacts and inspect their metadata.
This guide can be skipped if you are only interested in how to leverage the overall collection.
import lamindb as ln
import bionty as bt
ln.track("agayZTonayqA0000")
→ connected lamindb: testuser1/test-scrna
→ created Transform('agayZTon'), started new Run('QigE5SYw') at 2024-11-21 06:54:10 UTC
→ notebook imports: bionty==0.53.1 lamindb==0.76.16
Query artifacts by provenance metadata
Query the transform, e.g., by uid:
transform = ln.Transform.get(uid="Nv48yAceNSh80003")
Query the artifacts created by that transform:
ln.Artifact.filter(transform=transform).df()
id | uid | version | is_latest | description | key | suffix | type | size | hash | n_objects | n_observations | _hash_type | _accessor | visibility | _key_is_virtual | storage_id | transform_id | run_id | created_at | created_by_id
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
1 | AE158pvBDEQbQeOj0000 | None | True | Human immune cells from Conde22 | None | .h5ad | dataset | 57612943 | t_YJQpYrAyAGhs7Ir68zKj | None | 1648 | sha1-fl | AnnData | 1 | True | 1 | 1 | 1 | 2024-11-21 06:53:36.460292+00:00 | 1
Query artifacts by biological metadata
tissues = bt.Tissue.lookup()
query = ln.Artifact.filter(
    tissues=tissues.blood,
)
query.df()
id | uid | version | is_latest | description | key | suffix | type | size | hash | n_objects | n_observations | _hash_type | _accessor | visibility | _key_is_virtual | storage_id | transform_id | run_id | created_at | created_by_id
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
1 | AE158pvBDEQbQeOj0000 | None | True | Human immune cells from Conde22 | None | .h5ad | dataset | 57612943 | t_YJQpYrAyAGhs7Ir68zKj | None | 1648 | sha1-fl | AnnData | 1 | True | 1 | 1 | 1 | 2024-11-21 06:53:36.460292+00:00 | 1
Inspect artifact metadata
Query all artifacts that measured the “cell_type” feature:
query_set = ln.Artifact.filter(feature_sets__features__name="cell_type").all()
artifact1, artifact2 = query_set[0], query_set[1]
artifact1.describe()
Artifact(uid='AE158pvBDEQbQeOj0000', is_latest=True, description='Human immune cells from Conde22', suffix='.h5ad', type='dataset', size=57612943, hash='t_YJQpYrAyAGhs7Ir68zKj', n_observations=1648, _hash_type='sha1-fl', _accessor='AnnData', visibility=1, _key_is_virtual=True, created_at=2024-11-21 06:53:36 UTC)
Provenance
.storage = '/home/runner/work/lamin-usecases/lamin-usecases/docs/test-scrna'
.transform = 'scRNA-seq'
.run = 2024-11-21 06:52:53 UTC
.created_by = 'testuser1'
Usage
.input_of_runs = 2024-11-21 06:53:45 UTC
Labels
.tissues = 'duodenum', 'lamina propria', 'sigmoid colon', 'jejunal epithelium', 'thymus', 'skeletal muscle tissue', 'caecum', 'mesenteric lymph node', 'spleen', 'omentum', ...
.cell_types = 'megakaryocyte', 'effector memory CD4-positive, alpha-beta T cell', 'plasmacytoid dendritic cell', 'alveolar macrophage', 'naive B cell', 'alpha-beta T cell', 'CD8-positive, alpha-beta memory T cell, CD45RO-positive', 'progenitor cell', 'gamma-delta T cell', 'CD4-positive helper T cell', ...
.experimental_factors = '10x 3' v3', '10x 5' v1', '10x 5' v2'
.ulabels = 'A52', 'A29', '582C', 'D496', 'A35', '640C', 'A36', '637C', '621B', 'A37', ...
Features
'assay' = '10x 3' v3', '10x 5' v1', '10x 5' v2'
'cell_type' = 'CD16-negative, CD56-bright natural killer cell, human', 'CD16-positive, CD56-dim natural killer cell, human', 'CD4-positive helper T cell', 'CD8-positive, alpha-beta memory T cell', 'CD8-positive, alpha-beta memory T cell, CD45RO-positive', 'T follicular helper cell', 'alpha-beta T cell', 'alveolar macrophage', 'animal cell', 'classical monocyte', ...
'donor' = '582C', '621B', '637C', '640C', 'A29', 'A31', 'A35', 'A36', 'A37', 'A52', ...
'tissue' = 'blood', 'bone marrow', 'caecum', 'duodenum', 'ileum', 'jejunal epithelium', 'lamina propria', 'liver', 'lung', 'mesenteric lymph node', ...
Feature sets
'var' = 'MIR1302-2HG', 'FAM138A', 'OR4F5', 'None', 'OR4F29', 'OR4F16', 'LINC01409', 'FAM87B', 'LINC01128', 'LINC00115', 'FAM41C'
'obs' = 'donor', 'tissue', 'cell_type', 'assay'
artifact1.view_lineage()
artifact2.describe()
Artifact(uid='b7dQMbVcWW7iNJzH0001', is_latest=True, description='10x reference adata, trusted cell type annotation', suffix='.h5ad', type='dataset', size=851664, hash='iETHP3Lw-tVqZxYAuEC-SA', n_observations=70, _hash_type='md5', _accessor='AnnData', visibility=1, _key_is_virtual=True, created_at=2024-11-21 06:54:05 UTC)
Provenance
.storage = '/home/runner/work/lamin-usecases/lamin-usecases/docs/test-scrna'
.transform = 'Standardize and append a dataset'
.run = 2024-11-21 06:53:45 UTC
.created_by = 'testuser1'
Labels
.cell_types = 'CD16-positive, CD56-dim natural killer cell, human', 'dendritic cell', 'B cell, CD19-positive', 'effector memory CD4-positive, alpha-beta T cell, terminally differentiated', 'cytotoxic T cell', 'CD8-positive, CD25-positive, alpha-beta regulatory T cell', 'CD14-positive, CD16-negative classical monocyte', 'CD38-positive naive B cell', 'CD38-high pre-BCR positive cell'
Features
'cell_type' = 'B cell, CD19-positive', 'CD14-positive, CD16-negative classical monocyte', 'CD16-positive, CD56-dim natural killer cell, human', 'CD38-high pre-BCR positive cell', 'CD38-positive naive B cell', 'CD8-positive, CD25-positive, alpha-beta regulatory T cell', 'cytotoxic T cell', 'dendritic cell', 'effector memory CD4-positive, alpha-beta T cell, terminally differentiated'
'cell_type_untrusted' = 'B cell, CD19-positive', 'CD14-positive, CD16-negative classical monocyte', 'CD16-positive, CD56-dim natural killer cell, human', 'CD38-high pre-BCR positive cell', 'CD38-positive naive B cell', 'CD8-positive, CD25-positive, alpha-beta regulatory T cell', 'cytotoxic T cell', 'dendritic cell', 'effector memory CD4-positive, alpha-beta T cell, terminally differentiated'
Feature sets
'var' = 'HES4', 'TNFRSF4', 'SSU72', 'PARK7', 'RBP7', 'SRM', 'MAD2L2', 'AGTRAP', 'TNFRSF1B', 'EFHD2', 'NECAP2', 'HP1BP3', 'C1QA', 'C1QB', 'HNRNPR', 'GALE', 'STMN1', 'CD52', 'FGR', 'ATP5IF1'
'obs' = 'cell_type', 'cell_type_untrusted'
artifact2.view_lineage()
Compare features
Here we compute shared genes:
artifact1_genes = artifact1.features["var"]
artifact2_genes = artifact2.features["var"]
shared_genes = artifact1_genes & artifact2_genes
len(shared_genes)
749
shared_genes.list("symbol")[:10]
['HES4',
'TNFRSF4',
'SSU72',
'PARK7',
'RBP7',
'SRM',
'MAD2L2',
'AGTRAP',
'TNFRSF1B',
'EFHD2']
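As a rough mental model of what the `&` operator does here, and not a description of lamindb's internals, the same intersection can be sketched on plain Python sets of gene symbols. The two sets below are shortened, hypothetical stand-ins for the `var` feature sets of the two artifacts.

```python
# Toy stand-ins for the gene symbols of two artifacts
# (hypothetical, shortened lists; not the real feature sets).
genes_a = {"MIR1302-2HG", "FAM138A", "HES4", "TNFRSF4", "SSU72"}
genes_b = {"HES4", "TNFRSF4", "SSU72", "C1QA", "C1QB"}

# Set intersection keeps only the symbols present in both sets.
shared = genes_a & genes_b

# Sets are unordered, so sort before displaying.
shared_sorted = sorted(shared)
print(len(shared_sorted), shared_sorted)
```

In the guide itself the operands are query results rather than in-memory sets, so the intersection is presumably resolved by the database rather than in Python.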
Compare cell types
artifact1_celltypes = artifact1.cell_types.all()
artifact2_celltypes = artifact2.cell_types.all()
shared_celltypes = artifact1_celltypes & artifact2_celltypes
shared_celltypes_names = shared_celltypes.list("name")
shared_celltypes_names
['CD16-positive, CD56-dim natural killer cell, human',
'CD16-positive, CD56-dim natural killer cell, human']
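The returned list contains the same cell type name twice. If unique names are preferred, one option is to deduplicate the list while preserving its order. A minimal sketch in plain Python, with the list copied from the output above:

```python
# Names as returned above; the shared cell type appears twice.
shared_celltypes_names = [
    "CD16-positive, CD56-dim natural killer cell, human",
    "CD16-positive, CD56-dim natural killer cell, human",
]

# dict.fromkeys keeps the first occurrence of each name and preserves order.
unique_celltypes_names = list(dict.fromkeys(shared_celltypes_names))
print(unique_celltypes_names)
```

Duplicates do not change the result of a membership test such as `isin`, so this step is cosmetic when the names are only used for filtering.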
Load the individual artifacts
We could either load the artifacts into memory or access them in backed mode through .open() to lazily load their content.
Let’s load them into memory:
adata1 = artifact1.load()
adata2 = artifact2.load()
Let's inspect the second dataset, then subset both datasets by the shared cell types:
adata2
AnnData object with n_obs × n_vars = 70 × 754
obs: 'cell_type_untrusted', 'n_genes', 'percent_mito', 'louvain', 'cell_type_untrusted_original', 'cell_type'
var: 'symbol', 'n_counts', 'highly_variable'
uns: 'louvain', 'louvain_colors', 'neighbors', 'pca'
obsm: 'X_pca', 'X_umap'
varm: 'PCs'
obsp: 'connectivities', 'distances'
adata1_subset = adata1[adata1.obs["cell_type"].isin(shared_celltypes_names)]
adata2_subset = adata2[adata2.obs["cell_type"].isin(shared_celltypes_names)]
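To make the filtering step concrete, here is a plain-Python sketch of what an `isin` mask computes. The `cell_types` list is a hypothetical stand-in for a `cell_type` column in `.obs`, not data from the artifacts above.

```python
# Hypothetical per-cell annotations, standing in for adata.obs["cell_type"].
cell_types = [
    "CD16-positive, CD56-dim natural killer cell, human",
    "dendritic cell",
    "CD16-positive, CD56-dim natural killer cell, human",
    "cytotoxic T cell",
]
shared = {"CD16-positive, CD56-dim natural killer cell, human"}

# Boolean mask: True where the cell's annotation is one of the shared types.
mask = [cell_type in shared for cell_type in cell_types]

# Row positions that the subset would keep.
kept_rows = [i for i, keep in enumerate(mask) if keep]
print(mask, kept_rows)
```

Indexing an AnnData object with such a mask returns a view of the original object; call `.copy()` on the subset if an independent object is needed.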