Vitessce
This tutorial has been adapted from the data preparation examples in the Vitessce documentation.
It uses a dataset from the COVID-19 Cell Atlas.
# !pip install vitessce
# !pip install 'lamindb[jupyter,aws,bionty]'
!lamin load laminlabs/lamin-dev # <-- replace with your instance
→ connected lamindb: laminlabs/lamin-dev
from urllib.request import urlretrieve
from pathlib import Path
from anndata import read_h5ad
import vitessce as vit
from vitessce import data_utils as vitdu
import lamindb as ln
# [optional] track the current notebook or script
ln.context.uid = "BZhZQ6uIbkWv0000"
ln.context.track()
/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/vitessce/__init__.py:42: UserWarning: Extra installs are necessary to use widgets: No module named 'anywidget'
warn(f'Extra installs are necessary to use widgets: {e}')
→ connected lamindb: laminlabs/lamin-dev
→ notebook imports: anndata==0.10.9 lamindb==0.76.4 vitessce==3.3.1
→ loaded Transform('BZhZQ6uIbkWv0000') & loaded Run('2024-09-05 16:58:14.935847+00:00')
Save your dataset
Convert the dataset to .zarr format.
# from https://github.com/vitessce/vitessce-python/blob/main/demos/habib-2017/src/convert_to_zarr.py
def convert_h5ad_to_zarr(input_path, output_path):
    adata = read_h5ad(input_path)
    adata = adata[:, adata.var["highly_variable"]].copy()
    leaf_list = vitdu.sort_var_axis(adata.X, adata.var.index.values)
    adata = adata[:, leaf_list].copy()
    adata.layers["X_uint8"] = vitdu.to_uint8(adata.X, norm_along="var")
    adata = vitdu.optimize_adata(
        adata, obs_cols=["CellType"], obsm_keys=["X_umap"], layer_keys=["X_uint8"]
    )
    adata.write_zarr(output_path)


h5ad_filepath = "./habib17.processed.h5ad"
if not Path(h5ad_filepath).exists():
    urlretrieve(
        "https://covid19.cog.sanger.ac.uk/habib17.processed.h5ad", h5ad_filepath
    )
zarr_filepath = "./hhabib_2017_nature_methods.anndata.zarr"
convert_h5ad_to_zarr(h5ad_filepath, zarr_filepath)
/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/anndata/compat/__init__.py:329: FutureWarning: Moving element from .uns['neighbors']['distances'] to .obsp['distances'].
This is where adjacency matrices should go now.
warn(
/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/anndata/compat/__init__.py:329: FutureWarning: Moving element from .uns['neighbors']['connectivities'] to .obsp['connectivities'].
This is where adjacency matrices should go now.
warn(
/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/anndata/_core/storage.py:85: ImplicitModificationWarning: Layer 'X_uint8' should not be a np.matrix, use np.ndarray instead.
warnings.warn(msg, ImplicitModificationWarning)
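The `to_uint8(..., norm_along="var")` call in the conversion function stores a compact copy of the expression matrix. The following is a rough sketch of the underlying idea, per-feature min-max scaling into the 0 to 255 range, written with plain Python lists so that it runs without extra dependencies. The function name `scale_columns_to_uint8` and the handling of constant columns are assumptions of this sketch; it is not Vitessce's implementation and may differ from it in rounding and edge cases.

```python
def scale_columns_to_uint8(rows):
    """Min-max scale each column (feature) of a cells-by-features table to 0..255.

    Hypothetical helper for illustration only.
    """
    if not rows:
        return []
    n_cols = len(rows[0])
    col_min = [min(row[j] for row in rows) for j in range(n_cols)]
    col_max = [max(row[j] for row in rows) for j in range(n_cols)]
    scaled = []
    for row in rows:
        out = []
        for j, value in enumerate(row):
            span = col_max[j] - col_min[j]
            # Assumption: a constant column maps to 0 rather than dividing by zero.
            out.append(0 if span == 0 else int((value - col_min[j]) * 255 / span))
        scaled.append(out)
    return scaled


# Two cells (rows) by three features (columns).
matrix = [[0.0, 2.0, 5.0], [4.0, 2.0, 10.0]]
scaled = scale_columns_to_uint8(matrix)
print(scaled)
```

Scaling each feature separately keeps lowly expressed genes visible in a heatmap, at the cost of making values incomparable across features.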
Save a .zarr version of the dataset to lamindb.
zarr_artifact = ln.Artifact(
    zarr_filepath,
    description="Habib et al., 2017 Nature Methods, optimized anndata zarr",
    type="dataset",
)
zarr_artifact.save()
Artifact(uid='SQUsFn6Gv8lotsb30000', is_latest=True, description='Habib et al., 2017 Nature Methods, optimized anndata zarr', suffix='.anndata.zarr', type='dataset', size=15291221, hash='GmqSFnJ699ZzkpDLispSHw', n_objects=176, _hash_type='md5-d', _accessor='AnnData', visibility=1, _key_is_virtual=True, created_by_id=2, storage_id=1, transform_id=89, run_id=122, updated_at='2024-09-05 16:59:29 UTC')
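Vitessce reads arrays by their path inside the store, so it can be useful to confirm that the expected groups exist before building a config. The sketch below assumes the store is a local directory in which each group or array is a subdirectory, which is assumed to hold for the local `.zarr` folder written during conversion. The helper name `missing_store_paths` is hypothetical, and the check does not validate array contents.

```python
from pathlib import Path


def missing_store_paths(store_dir, relative_paths):
    """Return the entries of relative_paths that are not directories under store_dir."""
    root = Path(store_dir)
    return [p for p in relative_paths if not (root / p).is_dir()]


# Keys kept by optimize_adata in the conversion step.
expected = ["layers/X_uint8", "obsm/X_umap", "obs/CellType"]
# Path taken from the conversion step; an empty list means all paths were found.
missing = missing_store_paths("./hhabib_2017_nature_methods.anndata.zarr", expected)
print(missing)
```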
Save a VitessceConfig object
You can create a dashboard for one or several datasets by using Vitessce’s component API.
vc = vit.VitessceConfig(
    schema_version="1.0.15",
    description=zarr_artifact.description,
)
dataset = vc.add_dataset(name="Habib 2017").add_object(
    vit.AnnDataWrapper(
        adata_url=zarr_artifact.path.to_url(),
        obs_feature_matrix_path="layers/X_uint8",
        obs_embedding_paths=["obsm/X_umap"],
        obs_embedding_names=["UMAP"],
        obs_set_paths=["obs/CellType"],
        obs_set_names=["Cell Type"],
    )
)
obs_sets = vc.add_view(vit.Component.OBS_SETS, dataset=dataset)
obs_sets_sizes = vc.add_view(vit.Component.OBS_SET_SIZES, dataset=dataset)
scatterplot = vc.add_view(vit.Component.SCATTERPLOT, dataset=dataset, mapping="UMAP")
heatmap = vc.add_view(vit.Component.HEATMAP, dataset=dataset)
genes = vc.add_view(vit.Component.FEATURE_LIST, dataset=dataset)
vc.layout(((scatterplot | obs_sets) / heatmap) | (obs_sets_sizes / genes))
<vitessce.config.VitessceConfig at 0x7f0ace273070>
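In the `vc.layout(...)` call, `|` places views side by side and `/` stacks them vertically. The toy classes below sketch how such operators can be turned into a grid. `Node`, `View`, `Split`, and `place` are hypothetical names for this illustration and are not part of Vitessce, and the equal split between the two children is an assumption of the sketch.

```python
class Node:
    """Base class providing the two composition operators."""

    def __or__(self, other):  # a | b: side by side
        return Split("h", self, other)

    def __truediv__(self, other):  # a / b: a on top of b
        return Split("v", self, other)


class View(Node):
    def __init__(self, name):
        self.name = name


class Split(Node):
    def __init__(self, direction, first, second):
        self.direction, self.first, self.second = direction, first, second


def place(node, x=0.0, y=0.0, w=1.0, h=1.0):
    """Return {view name: (x, y, width, height)} within the unit square."""
    if isinstance(node, View):
        return {node.name: (x, y, w, h)}
    if node.direction == "h":
        left = place(node.first, x, y, w / 2, h)
        right = place(node.second, x + w / 2, y, w / 2, h)
        return {**left, **right}
    top = place(node.first, x, y, w, h / 2)
    bottom = place(node.second, x, y + h / 2, w, h / 2)
    return {**top, **bottom}


scatterplot, obs_sets, heatmap = View("scatterplot"), View("obs_sets"), View("heatmap")
obs_set_sizes, genes = View("obs_set_sizes"), View("genes")
grid = place(((scatterplot | obs_sets) / heatmap) | (obs_set_sizes / genes))
for name, box in grid.items():
    print(name, box)
```

Read this way, the layout in the tutorial puts the scatterplot and cell sets above the heatmap in the left half, and the set sizes above the gene list in the right half.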
Save the VitessceConfig object.
vc_artifact = ln.integrations.save_vitessce_config(
    vc, description="View Habib17 in Vitessce"
)
→ go to: https://lamin.ai/laminlabs/lamin-dev/artifact/wI30ixEb8E2r2OJK0000
Note
The Vitessce button now appears on your dataset, as in this example dataset.
vc_artifact.view_lineage()
# [optional] finish run context and auto-save the notebook
# ln.context.finish()
Upload speed
Here is a note on folder upload speed and why we chose not to use the .export(to="s3") functionality of Vitessce.
# clean up artifacts in CI run
zarr_artifact.delete(permanent=True)
vc_artifact.delete(permanent=True)