Standardize metadata on-the-fly

This use case runs on a LaminDB instance with populated CellType and Pathway registries. Make sure you run the GO Ontology notebook before executing this use case.

Here, we demonstrate how to standardize the metadata on-the-fly during cell type annotation and pathway enrichment analysis using these two registries.

For more information, see:

!lamin load use-cases-registries
💡 connected lamindb: testuser1/use-cases-registries
import lamindb as ln
import bionty as bt
from lamin_usecases import datasets as ds
import scanpy as sc
import matplotlib.pyplot as plt
import celltypist
import gseapy as gp
💡 connected lamindb: testuser1/use-cases-registries
sc.settings.set_figure_params(dpi=50, facecolor="white")
ln.settings.transform.stem_uid = "hsPU1OENv0LS"
ln.settings.transform.version = "0"
ln.track()
💡 notebook imports: bionty==0.44.0 celltypist==1.6.3 gseapy==1.1.3 lamin_usecases==0.0.1 lamindb==0.74.1 matplotlib==3.9.0 scanpy==1.10.1
💡 saved: Transform(uid='hsPU1OENv0LS6K79', version='0', name='Standardize metadata on-the-fly', key='analysis-registries', type='notebook', created_by_id=1, updated_at='2024-07-01 13:57:02 UTC')
💡 saved: Run(uid='m5Tmphpfs8KahovQlwfI', transform_id=1, created_by_id=1)
Run(uid='m5Tmphpfs8KahovQlwfI', started_at='2024-07-01 13:57:02 UTC', is_consecutive=True, transform_id=1, created_by_id=1)

An interferon-beta treated dataset

This is a small peripheral blood mononuclear cell (PBMC) dataset split into control and stimulated groups; the stimulated group was treated with interferon beta.

Let's load the dataset and perform some preprocessing:

adata = ds.anndata_seurat_ifnb(preprocess=False, populate_registries=True)
adata


AnnData object with n_obs × n_vars = 13999 × 9942
    obs: 'stim'
    var: 'symbol'
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata, n_top_genes=2000)
sc.pp.pca(adata, n_comps=20)
sc.pp.neighbors(adata, n_pcs=10)
sc.tl.umap(adata)
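The first two preprocessing steps are plain arithmetic: `sc.pp.normalize_total(adata, target_sum=1e4)` rescales each cell so that its counts sum to 10,000, and `sc.pp.log1p` then applies the natural logarithm of 1 + x. The sketch below reproduces that arithmetic with NumPy on a hypothetical dense toy matrix. It ignores sparse input and scanpy's additional options, so treat it as an illustration of the idea rather than a replacement for the scanpy calls.

```python
import numpy as np


def normalize_log1p(counts: np.ndarray, target_sum: float = 1e4) -> np.ndarray:
    """Scale each cell (row) to `target_sum` total counts, then apply log(1 + x).

    Assumes every cell has at least one count (no all-zero rows).
    """
    totals = counts.sum(axis=1, keepdims=True)  # total counts per cell
    scaled = counts / totals * target_sum       # each row now sums to target_sum
    return np.log1p(scaled)                     # natural log of 1 + x


# Hypothetical toy matrix: 2 cells x 3 genes.
toy_counts = np.array([[1.0, 3.0, 0.0], [10.0, 0.0, 10.0]])
normalized = normalize_log1p(toy_counts)
print(normalized)
```

Undoing the log with `np.expm1` recovers rows that each sum to 10,000, which is a quick way to check the normalization.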

Analysis: cell type annotation using CellTypist

model = celltypist.models.Model.load(model="Immune_All_Low.pkl")
🔎 No available models. Downloading...
📜 Retrieving model list from server https://celltypist.cog.sanger.ac.uk/models/models.json
📚 Total models in list: 50
📂 Storing models in /home/runner/.celltypist/data/models
💾 Downloading model [1/50]: Immune_All_Low.pkl
💾 Downloading model [2/50]: Immune_All_High.pkl
💾 Downloading model [3/50]: Adult_COVID19_PBMC.pkl
💾 Downloading model [4/50]: Adult_CynomolgusMacaque_Hippocampus.pkl
💾 Downloading model [5/50]: Adult_Human_PancreaticIslet.pkl
💾 Downloading model [6/50]: Adult_Human_Skin.pkl
💾 Downloading model [7/50]: Adult_Mouse_Gut.pkl
💾 Downloading model [8/50]: Adult_Mouse_OlfactoryBulb.pkl
💾 Downloading model [9/50]: Adult_Pig_Hippocampus.pkl
💾 Downloading model [10/50]: Adult_RhesusMacaque_Hippocampus.pkl
💾 Downloading model [11/50]: Autopsy_COVID19_Lung.pkl
💾 Downloading model [12/50]: COVID19_HumanChallenge_Blood.pkl
💾 Downloading model [13/50]: COVID19_Immune_Landscape.pkl
💾 Downloading model [14/50]: Cells_Adult_Breast.pkl
💾 Downloading model [15/50]: Cells_Fetal_Lung.pkl
💾 Downloading model [16/50]: Cells_Human_Tonsil.pkl
💾 Downloading model [17/50]: Cells_Intestinal_Tract.pkl
💾 Downloading model [18/50]: Cells_Lung_Airway.pkl
💾 Downloading model [19/50]: Developing_Human_Brain.pkl
💾 Downloading model [20/50]: Developing_Human_Gonads.pkl
💾 Downloading model [21/50]: Developing_Human_Hippocampus.pkl
💾 Downloading model [22/50]: Developing_Human_Organs.pkl
💾 Downloading model [23/50]: Developing_Human_Thymus.pkl
💾 Downloading model [24/50]: Developing_Mouse_Brain.pkl
💾 Downloading model [25/50]: Developing_Mouse_Hippocampus.pkl
💾 Downloading model [26/50]: Fetal_Human_AdrenalGlands.pkl
💾 Downloading model [27/50]: Fetal_Human_Pancreas.pkl
💾 Downloading model [28/50]: Fetal_Human_Pituitary.pkl
💾 Downloading model [29/50]: Fetal_Human_Retina.pkl
💾 Downloading model [30/50]: Fetal_Human_Skin.pkl
💾 Downloading model [31/50]: Healthy_Adult_Heart.pkl
💾 Downloading model [32/50]: Healthy_COVID19_PBMC.pkl
💾 Downloading model [33/50]: Healthy_Human_Liver.pkl
💾 Downloading model [34/50]: Healthy_Mouse_Liver.pkl
💾 Downloading model [35/50]: Human_AdultAged_Hippocampus.pkl
💾 Downloading model [36/50]: Human_Colorectal_Cancer.pkl
💾 Downloading model [37/50]: Human_Developmental_Retina.pkl
💾 Downloading model [38/50]: Human_Embryonic_YolkSac.pkl
💾 Downloading model [39/50]: Human_IPF_Lung.pkl
💾 Downloading model [40/50]: Human_Longitudinal_Hippocampus.pkl
💾 Downloading model [41/50]: Human_Lung_Atlas.pkl
💾 Downloading model [42/50]: Human_PF_Lung.pkl
💾 Downloading model [43/50]: Human_Placenta_Decidua.pkl
💾 Downloading model [44/50]: Lethal_COVID19_Lung.pkl
💾 Downloading model [45/50]: Mouse_Dentate_Gyrus.pkl
💾 Downloading model [46/50]: Mouse_Isocortex_Hippocampus.pkl
💾 Downloading model [47/50]: Mouse_Postnatal_DentateGyrus.pkl
💾 Downloading model [48/50]: Mouse_Whole_Brain.pkl
💾 Downloading model [49/50]: Nuclei_Lung_Airway.pkl
💾 Downloading model [50/50]: Pan_Fetal_Human.pkl
predictions = celltypist.annotate(
    adata, model="Immune_All_Low.pkl", majority_voting=True
)
adata.obs["cell_type_celltypist"] = predictions.predicted_labels.majority_voting
🔬 Input data has 13999 cells and 9942 genes
🔗 Matching reference genes in the model
🧬 3698 features used for prediction
⚖️ Scaling input data
🖋️ Predicting labels
✅ Prediction done!
👀 Detected a neighborhood graph in the input object, will run over-clustering on the basis of it
⛓️ Over-clustering input data with resolution set to 10
🗳️ Majority voting the predictions
✅ Majority voting done!
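With `majority_voting=True`, the log above shows CellTypist over-clustering the neighborhood graph and then voting. The voting step can be pictured as follows: within each subcluster, every cell receives the most frequent per-cell prediction of that subcluster. The pandas sketch below shows only that step, on hypothetical subcluster ids and labels; the column names `over_clustering` and `predicted_labels` are assumptions chosen to resemble CellTypist's output, and CellTypist's own implementation handles more cases than this.

```python
import pandas as pd

# Hypothetical per-cell predictions and over-clustering assignments.
cells = pd.DataFrame(
    {
        "over_clustering": ["0", "0", "0", "1", "1"],
        "predicted_labels": ["B cells", "B cells", "Naive B cells", "pDC", "pDC"],
    }
)

# For each subcluster, pick the most frequent per-cell prediction.
winners = cells.groupby("over_clustering")["predicted_labels"].agg(
    lambda labels: labels.value_counts().idxmax()
)

# Broadcast the winning label back to every cell of its subcluster.
cells["majority_voting"] = cells["over_clustering"].map(winners)
print(cells)
```

Voting smooths out isolated per-cell disagreements (here the single "Naive B cells" call), at the cost of assuming each small subcluster is roughly homogeneous.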
adata.obs["cell_type_celltypist"] = bt.CellType.standardize(
    adata.obs["cell_type_celltypist"]
)
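Standardizing means replacing each label with the registry's canonical name, for example mapping a synonym onto the `name` of the matching `CellType` record. The plain-Python sketch below illustrates that idea with a hypothetical in-memory registry that stores synonyms pipe-separated, like the `synonyms` column of the `CellType` registry. Matching case-insensitively and leaving unmatched values unchanged are choices of this sketch; it is not a description of how `bt.CellType.standardize` is implemented.

```python
# Hypothetical in-memory registry: canonical name -> pipe-separated synonyms.
REGISTRY = {
    "B cell": "B-lymphocyte|B-cell|B cells|Cycling B cells",
    "plasmacytoid dendritic cell": "pDC|type 2 DC|interferon-producing cell",
}


def build_synonym_map(registry: dict[str, str]) -> dict[str, str]:
    """Map lower-cased names and synonyms to their canonical name."""
    mapping: dict[str, str] = {}
    for name, synonyms in registry.items():
        mapping[name.lower()] = name
        for synonym in synonyms.split("|"):
            mapping[synonym.lower()] = name
    return mapping


def standardize(values: list[str], registry: dict[str, str]) -> list[str]:
    """Replace synonyms with canonical names; keep unmatched values unchanged."""
    mapping = build_synonym_map(registry)
    return [mapping.get(value.lower(), value) for value in values]


result = standardize(["B cells", "pDC", "some unregistered label"], REGISTRY)
print(result)  # ['B cell', 'plasmacytoid dendritic cell', 'some unregistered label']
```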
sc.pl.umap(
    adata,
    color=["cell_type_celltypist", "stim"],
    frameon=False,
    legend_fontsize=10,
    wspace=0.4,
)
... storing 'cell_type_celltypist' as categorical
[Figure: UMAP colored by cell_type_celltypist and stim]

Analysis: Pathway enrichment analysis using Enrichr

This analysis is based on the GSEApy scRNA-seq Example.

First, we compute differentially expressed genes using a Wilcoxon test between stimulated and control cells.

# compute differentially expressed genes
sc.tl.rank_genes_groups(
    adata,
    groupby="stim",
    use_raw=False,
    method="wilcoxon",
    groups=["STIM"],
    reference="CTRL",
)

rank_genes_groups_df = sc.get.rank_genes_groups_df(adata, "STIM")
rank_genes_groups_df.head()
names scores logfoldchanges pvals pvals_adj
0 ISG15 99.454666 7.132604 0.0 0.0
1 ISG20 96.735107 5.074248 0.0 0.0
2 IFI6 94.970726 5.828559 0.0 0.0
3 IFIT3 92.481827 7.432271 0.0 0.0
4 IFIT1 90.698677 8.053523 0.0 0.0
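The `pvals_adj` column holds p-values corrected for testing thousands of genes at once; by default, `sc.tl.rank_genes_groups` uses the Benjamini-Hochberg procedure (`corr_method="benjamini-hochberg"`). The NumPy sketch below shows that procedure on hypothetical p-values: sort them, scale the i-th smallest by n/i, enforce monotonicity from the largest rank downwards, and cap the result at 1. It illustrates the method and is not scanpy's code.

```python
import numpy as np


def benjamini_hochberg(pvals) -> np.ndarray:
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)                         # indices of ascending p-values
    ranked = p[order] * n / np.arange(1, n + 1)   # p_(i) * n / i
    # Running minimum from the largest rank down keeps adjusted values monotone.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)     # scatter back to input order
    return adjusted


adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
print(adjusted)  # approx. [0.04, 0.0533, 0.0533, 0.2]
```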

Next, we select the up- and down-regulated differentially expressed genes (adjusted p-value < 0.05):

degs_up = rank_genes_groups_df[
    (rank_genes_groups_df["logfoldchanges"] > 0)
    & (rank_genes_groups_df["pvals_adj"] < 0.05)
]
degs_dw = rank_genes_groups_df[
    (rank_genes_groups_df["logfoldchanges"] < 0)
    & (rank_genes_groups_df["pvals_adj"] < 0.05)
]

degs_up.shape, degs_dw.shape
((542, 5), (935, 5))

Run pathway enrichment analysis with Enrichr on the up- and down-regulated DEGs and plot the top 10 pathways for each:

enr_up = gp.enrichr(degs_up.names, gene_sets="GO_Biological_Process_2023").res2d
gp.dotplot(enr_up, figsize=(2, 3), title="Up", cmap=plt.cm.autumn_r);
enr_dw = gp.enrichr(degs_dw.names, gene_sets="GO_Biological_Process_2023").res2d
gp.dotplot(enr_dw, figsize=(2, 3), title="Down", cmap=plt.cm.winter_r);
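Enrichr assesses each gene set by testing whether the overlap between the input gene list and the set is larger than expected by chance (Fisher's exact test, whose one-sided form equals a hypergeometric upper-tail probability). The SciPy sketch below computes that tail probability. The background size, gene set size, and overlap are hypothetical; only the 542 up-regulated DEGs come from the output above. Enrichr additionally reports adjusted p-values and other scores that this sketch omits, and the background it uses may differ.

```python
from scipy.stats import hypergeom


def overlap_pvalue(n_overlap: int, n_query: int, n_geneset: int, n_background: int) -> float:
    """P(X >= n_overlap) when n_query genes are drawn without replacement
    from n_background genes, n_geneset of which belong to the gene set."""
    # hypergeom(M=population size, n=successes in population, N=number of draws)
    return float(hypergeom.sf(n_overlap - 1, n_background, n_geneset, n_query))


# Hypothetical numbers except n_query, which matches the 542 up-regulated DEGs above.
p = overlap_pvalue(n_overlap=15, n_query=542, n_geneset=100, n_background=20000)
expected = 542 * 100 / 20000  # overlap expected by chance: 2.71 genes
print(f"expected overlap {expected:.2f}, p = {p:.2e}")
```

An observed overlap far above the chance expectation yields a small p-value, which is what places a pathway near the top of the dot plots.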

Register analyzed dataset and annotate with metadata

Register new features and labels (check out more details here):

new_features = ln.Feature.from_df(adata.obs)
ln.save(new_features)
new_labels = [ln.ULabel(name=i) for i in adata.obs["stim"].unique()]
ln.save(new_labels)
features = ln.Feature.lookup()

Register the dataset using an Artifact object:

artifact = ln.Artifact.from_anndata(
    adata,
    description="seurat_ifnb_activated_Bcells",
)
artifact.save()
Artifact(uid='IsgeEDLCyoP6wcXhw3e9', description='seurat_ifnb_activated_Bcells', suffix='.h5ad', type='dataset', accessor='AnnData', size=214910892, hash='Q84ZwSIFD43-mM4cz7JId9', hash_type='sha1-fl', visibility=1, key_is_virtual=True, created_by_id=1, storage_id=1, transform_id=1, run_id=1, updated_at='2024-07-01 14:00:52 UTC')
artifact.features._add_set_from_anndata(
    var_field=bt.Gene.symbol,
    organism="human", # optionally, globally set organism via bt.settings.organism = "human"
)

Querying metadata

artifact.describe()
Artifact(uid='IsgeEDLCyoP6wcXhw3e9', description='seurat_ifnb_activated_Bcells', suffix='.h5ad', type='dataset', accessor='AnnData', size=214910892, hash='Q84ZwSIFD43-mM4cz7JId9', hash_type='sha1-fl', visibility=1, key_is_virtual=True, updated_at='2024-07-01 14:00:53 UTC')
  Provenance
    .created_by = 'testuser1'
    .storage = '/home/runner/work/lamin-usecases/lamin-usecases/docs/use-cases-registries'
    .transform = 'Standardize metadata on-the-fly'
    .run = '2024-07-01 13:57:02 UTC'
  Labels
    .cell_types = 'effector memory CD8-positive, alpha-beta T cell', 'B cell', 'effector memory CD4-positive, alpha-beta T cell', 'dendritic cell, human', 'macrophage', 'natural killer cell', 'classical monocyte', 'non-classical monocyte', 'plasmacytoid dendritic cell', 'regulatory T cell', ...
    .ulabels = 'STIM', 'CTRL'
  Features
    'stim' = 'STIM', 'CTRL'
    'cell_type_celltypist' = 'effector memory CD8-positive, alpha-beta T cell', 'B cell', 'effector memory CD4-positive, alpha-beta T cell', 'dendritic cell, human', 'macrophage', 'natural killer cell', 'classical monocyte', 'non-classical monocyte', 'plasmacytoid dendritic cell', 'regulatory T cell', ...
  Feature sets
    'var' = 'LAMB3', 'SMARCC2', 'COQ3', 'ADNP2', 'ATF7IP', 'PLAG1', 'VPS28', 'PTPN13', 'BTBD3', 'POMT2', 'BRDT', 'ENOX2', 'GRPEL2', 'SEC31A', 'PHF3', 'DDHD1', 'CTNNBL1', 'PPP1R16B', 'BLNK'
    'obs' = 'stim', 'cell_type_celltypist'
    'STIM-up-DEGs' = 'MEF2A', 'NECAP2', 'YWHAQ', 'SCPEP1', 'TMEM50A', 'CTNNBL1', 'NFE2L2', 'MCL1', 'CSRNP1', 'NAGK', 'DNAJC15', 'PRDX4', 'HLA-F'
    'STIM-down-DEGs' = 'TPM4', 'VPS28', 'G6PD', 'SEC11A', 'EXOSC1', 'RPL19', 'SELENOF', 'AUP1', 'YPEL3', 'DAD1', 'CD9', 'BIN1', 'SON', 'TTC3', 'DDX46', 'UBE2L3', 'NDUFB1', 'TAX1BP3', 'GPX4'

Querying cell types

Query for cell types whose name contains "B cell":

bt.CellType.filter(name__contains="B cell").df().head()
uid name ontology_id abbr synonyms description public_source_id run_id created_by_id updated_at
id
1 ryEtgi1y B cell CL:0000236 None B-lymphocyte|B-cell|B cells|Cycling B cells|B ... A Lymphocyte Of B Lineage That Is Capable Of B... 29 None 1 2024-07-01 13:56:53.780213+00:00
2 2EhFTUoZ follicular B cell CL:0000843 None Fo B cell|follicular B-cell|Fo B-cell|Follicul... A Resting Mature B Cell That Has The Phenotype... 29 None 1 2024-07-01 13:56:53.573566+00:00
3 4IowPafD germinal center B cell CL:0000844 None GC B cell|germinal center B-cell|Proliferative... A Rapidly Cycling Mature B Cell That Has Disti... 29 None 1 2024-07-01 13:56:53.611478+00:00
4 2cUPBtY8 memory B cell CL:0000787 None Memory B cells|memory B-cell|Age-associated B ... A Memory B Cell Is A Mature B Cell That Is Lon... 29 None 1 2024-07-01 13:56:53.648919+00:00
5 3jdCg7zi naive B cell CL:0000788 None naive B-lymphocyte|Naive B cells|naive B-cell|... A Naive B Cell Is A Mature B Cell That Has The... 29 None 1 2024-07-01 13:56:53.667932+00:00

Querying for all artifacts annotated with a cell type:

celltypes = bt.CellType.lookup()
celltypes.plasmacytoid_dendritic_cell
CellType(uid='3JO0EdVd', name='plasmacytoid dendritic cell', ontology_id='CL:0000784', synonyms='pDC|type 2 DC|plasmacytoid T cell|T-associated plasma cell|lymphoid dendritic cell|IPC|plasmacytoid monocyte|DC2|interferon-producing cell', description='A Dendritic Cell Type Of Distinct Morphology, Localization, And Surface Marker Expression (Cd123-Positive) From Other Dendritic Cell Types And Associated With Early Stage Immune Responses, Particularly The Release Of Physiologically Abundant Amounts Of Type I Interferons In Response To Infection.', created_by_id=1, public_source_id=29, updated_at='2024-07-01 13:56:43 UTC')
ln.Artifact.filter(cell_types=celltypes.plasmacytoid_dendritic_cell).df()
uid version description key suffix type accessor size hash hash_type n_objects n_observations visibility key_is_virtual storage_id transform_id run_id created_by_id updated_at
id
1 IsgeEDLCyoP6wcXhw3e9 None seurat_ifnb_activated_Bcells None .h5ad dataset AnnData 214910892 Q84ZwSIFD43-mM4cz7JId9 sha1-fl None None 1 True 1 1 1 1 2024-07-01 14:00:53.656502+00:00

Querying pathways

Query for pathways whose name contains "interferon-beta":

bt.Pathway.filter(name__contains="interferon-beta").df()
uid name ontology_id abbr synonyms description public_source_id run_id created_by_id updated_at
id
684 1l4z0v8W cellular response to interferon-beta GO:0035458 None cellular response to fibroblast interferon|cel... Any Process That Results In A Change In State ... 64 None 1 2024-07-01 13:55:11.163197+00:00
2130 1NzHDJDi negative regulation of interferon-beta production GO:0032688 None down regulation of interferon-beta production|... Any Process That Stops, Prevents, Or Reduces T... 64 None 1 2024-07-01 13:55:11.321826+00:00
3127 3x0xmK1y positive regulation of interferon-beta production GO:0032728 None positive regulation of IFN-beta production|up-... Any Process That Activates Or Increases The Fr... 64 None 1 2024-07-01 13:55:11.432802+00:00
4334 54R2a0el regulation of interferon-beta production GO:0032648 None regulation of IFN-beta production Any Process That Modulates The Frequency, Rate... 64 None 1 2024-07-01 13:55:11.568388+00:00
4953 3VZq4dMe response to interferon-beta GO:0035456 None response to fiblaferon|response to fibroblast ... Any Process That Results In A Change In State ... 64 None 1 2024-07-01 13:55:11.637183+00:00

Query pathways from a gene:

bt.Pathway.filter(genes__symbol="KIR2DL1").df()
uid name ontology_id abbr synonyms description public_source_id run_id created_by_id updated_at
id
1346 7S7qlEkG immune response-inhibiting cell surface recept... GO:0002767 None immune response-inhibiting cell surface recept... The Series Of Molecular Signals Initiated By A... 64 None 1 2024-07-01 13:55:11.234751+00:00

Query artifacts from a pathway:

ln.Artifact.filter(feature_sets__pathways__name__icontains="interferon-beta").first()
Artifact(uid='IsgeEDLCyoP6wcXhw3e9', description='seurat_ifnb_activated_Bcells', suffix='.h5ad', type='dataset', accessor='AnnData', size=214910892, hash='Q84ZwSIFD43-mM4cz7JId9', hash_type='sha1-fl', visibility=1, key_is_virtual=True, created_by_id=1, storage_id=1, transform_id=1, run_id=1, updated_at='2024-07-01 14:00:53 UTC')

Query the feature sets linked to a pathway to learn from which gene set this pathway was computed:

pathway = bt.Pathway.filter(ontology_id="GO:0035456").one()
pathway
Pathway(uid='3VZq4dMe', name='response to interferon-beta', ontology_id='GO:0035456', synonyms='response to fiblaferon|response to fibroblast interferon|response to interferon beta', description='Any Process That Results In A Change In State Or Activity Of A Cell Or An Organism (In Terms Of Movement, Secretion, Enzyme Production, Gene Expression, Etc.) As A Result Of An Interferon-Beta Stimulus. Interferon-Beta Is A Type I Interferon.', created_by_id=1, public_source_id=64, updated_at='2024-07-01 13:55:11 UTC')
degs = ln.FeatureSet.filter(pathways__ontology_id=pathway.ontology_id).one()

Now we can get the list of genes that are differentially expressed and belong to this pathway:

contributing_genes = pathway.genes.all() & degs.genes.all()
contributing_genes.list("symbol")
['MNDA',
 'IFITM3',
 'IFITM1',
 'PLSCR1',
 'PNPT1',
 'STAT1',
 'SHFL',
 'OAS1',
 'AIM2',
 'CALM1',
 'IRF1',
 'IFI16',
 'XAF1',
 'BST2',
 'IFITM2']
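The expression `pathway.genes.all() & degs.genes.all()` keeps only genes that appear in both querysets, in effect a set intersection of the pathway's genes and the DEG feature set. The same idea with plain Python sets, using four symbols from the result above plus two hypothetical placeholders (`GENE_X`, `GENE_Y`) that stand for genes present in only one of the two sets:

```python
# Four symbols from the result above plus hypothetical placeholders.
pathway_genes = {"STAT1", "IRF1", "OAS1", "IFITM3", "GENE_X"}  # GENE_X: only in the pathway
deg_genes = {"STAT1", "IRF1", "OAS1", "IFITM3", "GENE_Y"}      # GENE_Y: only among the DEGs

# Genes that are both pathway members and differentially expressed.
contributing = sorted(pathway_genes & deg_genes)
print(contributing)  # ['IFITM3', 'IRF1', 'OAS1', 'STAT1']
```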
# clean up test instance
!lamin delete --force use-cases-registries
!rm -r ./use-cases-registries
Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.10.14/x64/bin/lamin", line 8, in <module>
    sys.exit(main())
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/rich_click/rich_command.py", line 367, in __call__
    return super().__call__(*args, **kwargs)
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/rich_click/rich_command.py", line 152, in main
    rv = self.invoke(ctx)
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/lamin_cli/__main__.py", line 103, in delete
    return delete(instance, force=force)
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/lamindb_setup/_delete.py", line 98, in delete
    n_objects = check_storage_is_empty(
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/lamindb_setup/core/upath.py", line 779, in check_storage_is_empty
    raise InstanceNotEmpty(message)
lamindb_setup.core.upath.InstanceNotEmpty: Storage /home/runner/work/lamin-usecases/lamin-usecases/docs/use-cases-registries/.lamindb contains 1 objects ('_is_initialized' ignored) - delete them prior to deleting the instance
['/home/runner/work/lamin-usecases/lamin-usecases/docs/use-cases-registries/.lamindb/IsgeEDLCyoP6wcXhw3e9.h5ad', '/home/runner/work/lamin-usecases/lamin-usecases/docs/use-cases-registries/.lamindb/_is_initialized']