Query artifacts
Here, we’ll query artifacts and inspect their metadata.
You can skip this guide if you are only interested in working with the collection as a whole.
import lamindb as ln
import bionty as bt
ln.track("agayZTonayqA0000")
→ connected lamindb: testuser1/test-scrna
→ notebook imports: bionty==0.51.2 lamindb==0.76.12
→ created Transform('agayZTon'), started new Run('rVmpZpdM') at 2024-10-11 09:33:22 UTC
Query artifacts by provenance metadata
users = ln.User.lookup()
ln.Transform.filter(created_by=users.testuser1).search("scrna").df()
id | uid | version | is_latest | name | key | description | type | source_code | hash | reference | reference_type | _source_code_artifact_id | created_at | created_by_id |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Nv48yAceNSh80000 | None | True | scRNA-seq | scrna.ipynb | None | notebook | None | None | None | None | None | 2024-10-11 09:32:00.692969+00:00 | 1 |
2 | ManDYgmftZ8C0000 | None | True | Standardize and append a batch of data | scrna2.ipynb | None | notebook | None | None | None | None | None | 2024-10-11 09:32:59.007502+00:00 | 1 |
3 | agayZTonayqA0000 | None | True | Query artifacts | scrna3.ipynb | None | notebook | None | None | None | None | None | 2024-10-11 09:33:22.518471+00:00 | 1 |
transform = ln.Transform.get(uid="Nv48yAceNSh80000")
ln.Artifact.filter(transform=transform).df()
id | uid | version | is_latest | description | key | suffix | type | size | hash | n_objects | n_observations | _hash_type | _accessor | visibility | _key_is_virtual | storage_id | transform_id | run_id | created_at | created_by_id |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | FPsBpgCam6wbPLlY0000 | None | True | Human immune cells from Conde22 | None | .h5ad | dataset | 57612943 | t_YJQpYrAyAGhs7Ir68zKj | None | 1648 | sha1-fl | AnnData | 1 | True | 1 | 1 | 1 | 2024-10-11 09:32:51.448605+00:00 | 1 |
Query artifacts by biological metadata
organism = bt.Organism.lookup()
tissues = bt.Tissue.lookup()
query = ln.Artifact.filter(
organisms=organism.human,
tissues=tissues.bone_marrow,
)
query.df()
id | uid | version | is_latest | description | key | suffix | type | size | hash | n_objects | n_observations | _hash_type | _accessor | visibility | _key_is_virtual | storage_id | transform_id | run_id | created_at | created_by_id |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
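The query above returns no rows. Keyword arguments passed to a Django-style filter() are combined with AND, so an artifact only matches if it carries both the organism label and the tissue label. The describe() output later in this guide lists tissue labels but no organism labels for the first artifact, which would be consistent with an empty result. The sketch below is a toy model of that AND logic in plain Python. The records and the helper filter_records are hypothetical and are not part of lamindb.

```python
# Toy model of AND-combined filter criteria; this is NOT lamindb's implementation.
# The records and the helper `filter_records` are hypothetical.
records = [
    {
        "description": "artifact with tissue labels only",
        "organisms": set(),
        "tissues": {"bone marrow", "blood"},
    },
    {
        "description": "artifact with both label types",
        "organisms": {"human"},
        "tissues": {"bone marrow"},
    },
    {
        "description": "artifact with a different tissue",
        "organisms": {"human"},
        "tissues": {"lung"},
    },
]


def filter_records(records, **criteria):
    """Keep records whose label sets contain every requested value."""
    return [
        record
        for record in records
        if all(value in record[field] for field, value in criteria.items())
    ]


# Both criteria must hold, so the record without an organism label is dropped.
both = filter_records(records, organisms="human", tissues="bone marrow")
# With a single criterion, the tissue-only record matches as well.
tissue_only = filter_records(records, tissues="bone marrow")

print(len(both))         # 1
print(len(tissue_only))  # 2
```

If a combined query comes back empty, filtering on one criterion at a time is a quick way to see which label is missing.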
Inspect artifact metadata
query_set = ln.Artifact.filter().all()
artifact1, artifact2 = query_set[0], query_set[1]
artifact1.describe()
Artifact(uid='FPsBpgCam6wbPLlY0000', is_latest=True, description='Human immune cells from Conde22', suffix='.h5ad', type='dataset', size=57612943, hash='t_YJQpYrAyAGhs7Ir68zKj', n_observations=1648, _hash_type='sha1-fl', _accessor='AnnData', visibility=1, _key_is_virtual=True, created_at=2024-10-11 09:32:51 UTC)
Provenance
.storage = '/home/runner/work/lamin-usecases/lamin-usecases/docs/test-scrna'
.transform = 'scRNA-seq'
.run = 2024-10-11 09:32:00 UTC
.created_by = 'testuser1'
Usage
.input_of_runs = 2024-10-11 09:32:59 UTC
Labels
.tissues = 'blood', 'thoracic lymph node', 'spleen', 'lung', 'mesenteric lymph node', 'lamina propria', 'liver', 'jejunal epithelium', 'omentum', 'bone marrow', ...
.cell_types = 'classical monocyte', 'T follicular helper cell', 'memory B cell', 'alveolar macrophage', 'naive thymus-derived CD4-positive, alpha-beta T cell', 'effector memory CD8-positive, alpha-beta T cell, terminally differentiated', 'alpha-beta T cell', 'CD4-positive helper T cell', 'naive thymus-derived CD8-positive, alpha-beta T cell', 'macrophage', ...
.experimental_factors = '10x 3' v3', '10x 5' v2', '10x 5' v1'
.ulabels = 'D496', '621B', 'A29', 'A36', 'A35', '637C', 'A52', 'A37', 'D503', '640C', ...
Features
'assay' = '10x 3' v3', '10x 5' v2', '10x 5' v1'
'cell_type' = 'classical monocyte', 'T follicular helper cell', 'memory B cell', 'alveolar macrophage', 'naive thymus-derived CD4-positive, alpha-beta T cell', 'effector memory CD8-positive, alpha-beta T cell, terminally differentiated', 'alpha-beta T cell', 'CD4-positive helper T cell', 'naive thymus-derived CD8-positive, alpha-beta T cell', 'macrophage', ...
'donor' = 'D496', '621B', 'A29', 'A36', 'A35', '637C', 'A52', 'A37', 'D503', '640C', ...
'tissue' = 'blood', 'thoracic lymph node', 'spleen', 'lung', 'mesenteric lymph node', 'lamina propria', 'liver', 'jejunal epithelium', 'omentum', 'bone marrow', ...
Feature sets
'var' = 'MIR1302-2HG', 'FAM138A', 'OR4F5', 'None', 'OR4F29', 'OR4F16', 'LINC01409', 'FAM87B', 'LINC01128', 'LINC00115', 'FAM41C'
'obs' = 'donor', 'tissue', 'cell_type', 'assay'
artifact1.view_lineage()
artifact2.describe()
Artifact(uid='H37yo7578CVfxO0N0000', is_latest=True, description='10x reference adata', suffix='.h5ad', type='dataset', size=853388, hash='mIKkPaZAA3EdtZLeFuWNEg', n_observations=70, _hash_type='md5', _accessor='AnnData', visibility=1, _key_is_virtual=True, created_at=2024-10-11 09:33:17 UTC)
Provenance
.storage = '/home/runner/work/lamin-usecases/lamin-usecases/docs/test-scrna'
.transform = 'Standardize and append a batch of data'
.run = 2024-10-11 09:32:59 UTC
.created_by = 'testuser1'
Labels
.cell_types = 'B cell, CD19-positive', 'dendritic cell', 'effector memory CD4-positive, alpha-beta T cell, terminally differentiated', 'cytotoxic T cell', 'CD8-positive, CD25-positive, alpha-beta regulatory T cell', 'CD16-positive, CD56-dim natural killer cell, human', 'CD14-positive, CD16-negative classical monocyte', 'CD38-positive naive B cell', 'CD38-high pre-BCR positive cell'
Features
'cell_type' = 'B cell, CD19-positive', 'dendritic cell', 'effector memory CD4-positive, alpha-beta T cell, terminally differentiated', 'cytotoxic T cell', 'CD8-positive, CD25-positive, alpha-beta regulatory T cell', 'CD16-positive, CD56-dim natural killer cell, human', 'CD14-positive, CD16-negative classical monocyte', 'CD38-positive naive B cell', 'CD38-high pre-BCR positive cell'
Feature sets
'var' = 'TLE5', 'S1PR4', 'CD164', 'SMIM24', 'DCAF10', 'RAB13', 'TPM3', 'HES4', 'HAX1', 'GSTK1', 'SNX2', 'GTF3C6', 'ADD3', 'ACAA1', 'MATK', 'ZYX', 'JAML', 'CD3E', 'TNFRSF4', 'EXOG'
'obs' = 'cell_type'
artifact2.view_lineage()
Compare features
Here we compute shared genes:
artifact1_genes = artifact1.features["var"]
artifact2_genes = artifact2.features["var"]
shared_genes = artifact1_genes & artifact2_genes
len(shared_genes)
749
shared_genes.list("symbol")[:10]
['HES4',
'TNFRSF4',
'SSU72',
'PARK7',
'RBP7',
'SRM',
'MAD2L2',
'AGTRAP',
'TNFRSF1B',
'EFHD2']
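In the cells above, & is used to keep only the gene records that occur in both feature sets. Plain Python sets follow the same idea, which makes the operation easy to try without a database. The sketch below uses a handful of toy gene symbols; the two sets are illustrative and are not the artifacts' full feature sets.

```python
# Toy gene symbols for illustration only; these are not the full feature sets.
genes_a = {"HES4", "TNFRSF4", "OR4F5", "LINC01409"}
genes_b = {"HES4", "TNFRSF4", "CD3E", "ZYX"}

# Intersection: symbols present in both sets.
shared = genes_a & genes_b
# Differences: symbols present in only one of the two sets.
only_a = genes_a - genes_b
only_b = genes_b - genes_a

print(sorted(shared))  # ['HES4', 'TNFRSF4']
print(sorted(only_a))  # ['LINC01409', 'OR4F5']
print(sorted(only_b))  # ['CD3E', 'ZYX']
```

The set differences are a useful companion to the intersection when you want to know what each dataset measures that the other does not.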
Compare cell types
artifact1_celltypes = artifact1.cell_types.all()
artifact2_celltypes = artifact2.cell_types.all()
shared_celltypes = artifact1_celltypes & artifact2_celltypes
shared_celltypes_names = shared_celltypes.list("name")
shared_celltypes_names
['CD16-positive, CD56-dim natural killer cell, human']
Load the individual artifacts
We could either load the artifacts into memory or access them in backed mode through .open() to lazily load their content.
Let’s load them into memory:
adata1 = artifact1.load()
adata2 = artifact2.load()
We can now subset the two AnnData objects to the shared cell types:
adata1_subset = adata1[adata1.obs["cell_type"].isin(shared_celltypes_names)]
adata2_subset = adata2[adata2.obs["cell_type"].isin(shared_celltypes_names)]
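To make the subsetting step concrete, here is a minimal sketch in plain Python of what the isin() mask computes: one boolean per observation, which is then used to select rows. The list cell_types is a hypothetical stand-in for adata.obs["cell_type"]; it is not taken from the artifacts.

```python
# Hypothetical per-cell labels standing in for adata.obs["cell_type"].
cell_types = [
    "classical monocyte",
    "CD16-positive, CD56-dim natural killer cell, human",
    "memory B cell",
    "CD16-positive, CD56-dim natural killer cell, human",
]
shared_names = ["CD16-positive, CD56-dim natural killer cell, human"]

# Equivalent of .isin(shared_names): one boolean per observation.
shared_lookup = set(shared_names)
mask = [label in shared_lookup for label in cell_types]

# Equivalent of indexing with the mask: keep positions where the mask is True.
kept_positions = [i for i, keep in enumerate(mask) if keep]

print(mask)            # [False, True, False, True]
print(kept_positions)  # [1, 3]
```

Indexing an AnnData object with such a mask returns a view of the original object; call .copy() on the subset if you need an independent object to modify.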