Vitessce
This tutorial has been adapted from the data preparation examples in the Vitessce documentation.
It uses a dataset from the COVID-19 Cell Atlas.
# !pip install 'vitessce>=0.3.4'
# !pip install 'lamindb[jupyter,aws,bionty]'
!lamin connect laminlabs/lamin-dev # <-- replace with your instance
→ connected lamindb: laminlabs/lamin-dev
from urllib.request import urlretrieve
from pathlib import Path
from anndata import read_h5ad
import vitessce as vit
from vitessce import data_utils as vitdu
import lamindb as ln
# [optional] track the current notebook or script
ln.track("BZhZQ6uIbkWv0000")
/opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/site-packages/vitessce/__init__.py:42: UserWarning: Extra installs are necessary to use widgets: No module named 'anywidget'
warn(f'Extra installs are necessary to use widgets: {e}')
→ connected lamindb: laminlabs/lamin-dev
→ loaded Transform('BZhZQ6uI'), started Run('65LLvfnV') at 2024-11-21 05:37:49 UTC
→ notebook imports: anndata==0.10.9 lamindb==0.76.16 vitessce==3.4.3
Save your dataset
Convert the dataset to .zarr format.
# from https://github.com/vitessce/vitessce-python/blob/main/demos/habib-2017/src/convert_to_zarr.py
def convert_h5ad_to_zarr(input_path, output_path):
    adata = read_h5ad(input_path)
    adata = adata[:, adata.var["highly_variable"]].copy()
    leaf_list = vitdu.sort_var_axis(adata.X, adata.var.index.values)
    adata = adata[:, leaf_list].copy()
    adata.layers["X_uint8"] = vitdu.to_uint8(adata.X, norm_along="var")
    adata = vitdu.optimize_adata(
        adata, obs_cols=["CellType"], obsm_keys=["X_umap"], layer_keys=["X_uint8"]
    )
    adata.write_zarr(output_path)
h5ad_filepath = "./habib17.processed.h5ad"
if not Path(h5ad_filepath).exists():
    urlretrieve(
        "https://covid19.cog.sanger.ac.uk/habib17.processed.h5ad", h5ad_filepath
    )
zarr_filepath = "./habib_2017_nature_methods.anndata.zarr"
convert_h5ad_to_zarr(h5ad_filepath, zarr_filepath)
/opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/site-packages/anndata/compat/__init__.py:329: FutureWarning: Moving element from .uns['neighbors']['distances'] to .obsp['distances'].
This is where adjacency matrices should go now.
warn(
/opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/site-packages/anndata/compat/__init__.py:329: FutureWarning: Moving element from .uns['neighbors']['connectivities'] to .obsp['connectivities'].
This is where adjacency matrices should go now.
warn(
/opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/site-packages/anndata/_core/storage.py:85: ImplicitModificationWarning: Layer 'X_uint8' should not be a np.matrix, use np.ndarray instead.
warnings.warn(msg, ImplicitModificationWarning)
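The conversion stores a second copy of the expression matrix as the X_uint8 layer, normalized along the gene ("var") axis. The dependency-free sketch below illustrates one way such a per-gene rescaling to 8-bit integers can work, assuming min-max scaling of each column to the range 0..255. It is not the implementation of vitdu.to_uint8, whose rounding and handling of constant columns may differ, and the function name rescale_columns_to_uint8 is hypothetical.

```python
def rescale_columns_to_uint8(matrix):
    """Min-max rescale each column (gene) of a row-major matrix to integers in 0..255."""
    n_cols = len(matrix[0])
    col_min = [min(row[j] for row in matrix) for j in range(n_cols)]
    col_max = [max(row[j] for row in matrix) for j in range(n_cols)]
    scaled = []
    for row in matrix:
        out = []
        for j, value in enumerate(row):
            span = col_max[j] - col_min[j]
            # Assumption: a constant column carries no contrast, so map it to 0
            # instead of dividing by zero.
            out.append(0 if span == 0 else int(255 * (value - col_min[j]) / span))
        scaled.append(out)
    return scaled


# Rows are cells, columns are genes.
expression = [
    [0.0, 2.0, 5.0],
    [1.0, 2.0, 10.0],
    [4.0, 2.0, 0.0],
]
scaled = rescale_columns_to_uint8(expression)
print(scaled)
```

Because each gene is scaled independently, a lowly expressed gene and a highly expressed gene both span the full 0..255 range, which suits a heatmap but discards absolute expression levels.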
Save a .zarr version of the dataset to lamindb.
zarr_artifact = ln.Artifact(
    zarr_filepath,
    description="Habib et al., 2017 Nature Methods, optimized anndata zarr",
    type="dataset",
)
zarr_artifact.save()
Artifact(uid='MBGeelc3ecRWcFxk0000', is_latest=True, description='Habib et al., 2017 Nature Methods, optimized anndata zarr', suffix='.anndata.zarr', type='dataset', size=15291221, hash='_KUrHZHhe6iVcKczGuY49w', n_objects=176, _hash_type='md5-d', _accessor='AnnData', visibility=1, _key_is_virtual=True, storage_id=1, transform_id=107, run_id=179, created_by_id=2, created_at=2024-11-21 05:39:03 UTC)
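The saved artifact reports n_objects=176 and size=15291221 for the .zarr folder. Assuming these fields correspond to the number of files and the total number of bytes in the store (an assumption, not something the output above confirms), a local sanity check before saving could look like the sketch below. The helper summarize_store is hypothetical and uses only the standard library; the demo runs on a throwaway directory rather than the real dataset.

```python
import tempfile
from pathlib import Path


def summarize_store(store_path):
    """Return (number of files, total size in bytes) for a directory tree."""
    files = [p for p in Path(store_path).rglob("*") if p.is_file()]
    return len(files), sum(p.stat().st_size for p in files)


# Demo on a temporary directory that mimics a tiny nested store.
with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "demo.anndata.zarr"
    (store / "obsm" / "X_umap").mkdir(parents=True)
    (store / ".zgroup").write_bytes(b"{}")
    (store / "obsm" / "X_umap" / "0.0").write_bytes(b"\x00" * 16)
    n_files, n_bytes = summarize_store(store)

print(n_files, n_bytes)
```

Calling summarize_store(zarr_filepath) after the conversion step gives numbers to compare against the artifact's fields, under the assumption stated above.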
Save a VitessceConfig object
You can create a dashboard for one or several datasets by using Vitessce’s component API.
vc = vit.VitessceConfig(
    schema_version="1.0.15",
    description=zarr_artifact.description,
)
dataset = vc.add_dataset(name="Habib 2017").add_object(
    vit.AnnDataWrapper(
        adata_artifact=zarr_artifact,
        obs_feature_matrix_path="layers/X_uint8",
        obs_embedding_paths=["obsm/X_umap"],
        obs_embedding_names=["UMAP"],
        obs_set_paths=["obs/CellType"],
        obs_set_names=["Cell Type"],
    )
)
obs_sets = vc.add_view(vit.Component.OBS_SETS, dataset=dataset)
obs_sets_sizes = vc.add_view(vit.Component.OBS_SET_SIZES, dataset=dataset)
scatterplot = vc.add_view(vit.Component.SCATTERPLOT, dataset=dataset, mapping="UMAP")
heatmap = vc.add_view(vit.Component.HEATMAP, dataset=dataset)
genes = vc.add_view(vit.Component.FEATURE_LIST, dataset=dataset)
vc.layout(((scatterplot | obs_sets) / heatmap) | (obs_sets_sizes / genes))
<vitessce.config.VitessceConfig at 0x7f7d65ccfb10>
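In the layout expression passed to vc.layout, `|` places views side by side and `/` stacks them vertically. The toy model below does not use Vitessce's classes; it is a minimal sketch of how such an expression composes into a tree through operator overloading, with the hypothetical class Box standing in for a view.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """A toy layout node: a named view, or a horizontal/vertical split of two nodes."""

    name: str = ""
    direction: str = ""  # "h" = side by side, "v" = stacked, "" = leaf view
    children: tuple = ()

    def __or__(self, other):
        # a | b: place a and b side by side
        return Box(direction="h", children=(self, other))

    def __truediv__(self, other):
        # a / b: stack a on top of b
        return Box(direction="v", children=(self, other))

    def describe(self):
        if not self.children:
            return self.name
        sep = " | " if self.direction == "h" else " / "
        return "(" + sep.join(child.describe() for child in self.children) + ")"


scatterplot, obs_sets, heatmap, obs_sets_sizes, genes = (
    Box(name=n)
    for n in ["scatterplot", "obs_sets", "heatmap", "obs_set_sizes", "genes"]
)

# Same shape as the expression passed to vc.layout above.
layout = ((scatterplot | obs_sets) / heatmap) | (obs_sets_sizes / genes)
print(layout.describe())
```

Read from the outside in, the dashboard has two columns: the left one stacks the scatterplot and cell-set row above the heatmap, and the right one stacks the set sizes above the gene list.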
Save the VitessceConfig object.
vc_artifact = ln.integrations.save_vitessce_config(
    vc, description="View Habib17 in Vitessce"
)
→ VitessceConfig references these artifacts:
Artifact(uid='MBGeelc3ecRWcFxk0000', is_latest=True, description='Habib et al., 2017 Nature Methods, optimized anndata zarr', suffix='.anndata.zarr', type='dataset', size=15291221, hash='_KUrHZHhe6iVcKczGuY49w', n_objects=176, _hash_type='md5-d', _accessor='AnnData', visibility=1, _key_is_virtual=True, storage_id=1, transform_id=107, run_id=179, created_by_id=2, created_at=2024-11-21 05:39:03 UTC)
→ VitessceConfig: https://lamin.ai/laminlabs/lamin-dev/artifact/P9w68wcdLzxzFoRp0000
→ Dataset: https://lamin.ai/laminlabs/lamin-dev/artifact/MBGeelc3ecRWcFxk0000
Note
You can now see the Vitessce button show up on your dataset as in this example dataset.
If your VitessceConfig object contains multiple datasets, the Vitessce button will appear next to a Collection that groups these artifacts.
vc_artifact.view_lineage()
# [optional] finish run context and auto-save the notebook
# ln.finish()
Upload speed
Here is a note on folder upload speed and why lamindb does not use the .export(to="s3") functionality of Vitessce.
# clean up artifacts in CI run
zarr_artifact.delete(permanent=True)
vc_artifact.delete(permanent=True)