Vitessce

This tutorial has been adapted from the data preparation examples in the Vitessce documentation.

It uses a dataset from the COVID-19 Cell Atlas.

# !pip install 'vitessce>=0.3.4'
# !pip install 'lamindb[jupyter,aws,bionty]'
!lamin connect laminlabs/lamin-dev  # <-- replace with your instance
→ connected lamindb: laminlabs/lamin-dev
from urllib.request import urlretrieve
from pathlib import Path
from anndata import read_h5ad
import vitessce as vit
from vitessce import data_utils as vitdu
import lamindb as ln

# [optional] track the current notebook or script
ln.track("BZhZQ6uIbkWv0000")
/opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/site-packages/vitessce/__init__.py:42: UserWarning: Extra installs are necessary to use widgets: No module named 'anywidget'
  warn(f'Extra installs are necessary to use widgets: {e}')
→ connected lamindb: laminlabs/lamin-dev
→ loaded Transform('BZhZQ6uI'), started Run('65LLvfnV') at 2024-11-21 05:37:49 UTC
→ notebook imports: anndata==0.10.9 lamindb==0.76.16 vitessce==3.4.3

Save your dataset

Convert the dataset to .zarr format.

# from https://github.com/vitessce/vitessce-python/blob/main/demos/habib-2017/src/convert_to_zarr.py
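def convert_h5ad_to_zarr(input_path, output_path):
    adata = read_h5ad(input_path)
    # keep only the genes flagged as highly variable
    adata = adata[:, adata.var["highly_variable"]].copy()
    # reorder the genes along the var axis using the order returned by sort_var_axis
    leaf_list = vitdu.sort_var_axis(adata.X, adata.var.index.values)
    adata = adata[:, leaf_list].copy()
    # store a uint8 copy of the expression matrix, normalized along the var axis
    adata.layers["X_uint8"] = vitdu.to_uint8(adata.X, norm_along="var")
    # restrict the object to the columns, embeddings, and layers the dashboard reads
    adata = vitdu.optimize_adata(
        adata, obs_cols=["CellType"], obsm_keys=["X_umap"], layer_keys=["X_uint8"]
    )
    adata.write_zarr(output_path)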


h5ad_filepath = "./habib17.processed.h5ad"
if not Path(h5ad_filepath).exists():
    urlretrieve(
        "https://covid19.cog.sanger.ac.uk/habib17.processed.h5ad", h5ad_filepath
    )
zarr_filepath = "./habib_2017_nature_methods.anndata.zarr"
convert_h5ad_to_zarr(h5ad_filepath, zarr_filepath)
/opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/site-packages/anndata/compat/__init__.py:329: FutureWarning: Moving element from .uns['neighbors']['distances'] to .obsp['distances'].

This is where adjacency matrices should go now.
  warn(
/opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/site-packages/anndata/compat/__init__.py:329: FutureWarning: Moving element from .uns['neighbors']['connectivities'] to .obsp['connectivities'].

This is where adjacency matrices should go now.
  warn(
/opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/site-packages/anndata/_core/storage.py:85: ImplicitModificationWarning: Layer 'X_uint8' should not be a np.matrix, use np.ndarray instead.
  warnings.warn(msg, ImplicitModificationWarning)
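
To make the `norm_along="var"` step in the conversion function more concrete, here is a minimal sketch of per-column (per-gene) min-max scaling to the 0..255 range, written in plain Python. It assumes that normalizing along the var axis means rescaling each gene independently. The helper name `scale_columns_to_uint8` is hypothetical, and this is an illustration of the general idea rather than Vitessce's implementation; rounding and edge-case handling in `vitdu.to_uint8` may differ.

```python
def scale_columns_to_uint8(rows):
    """Min-max scale each column of a row-major matrix to integers in 0..255.

    Hypothetical helper for illustration only; not part of vitessce.
    """
    n_cols = len(rows[0])
    col_min = [min(row[j] for row in rows) for j in range(n_cols)]
    col_max = [max(row[j] for row in rows) for j in range(n_cols)]
    scaled = []
    for row in rows:
        out = []
        for j, value in enumerate(row):
            span = col_max[j] - col_min[j]
            # A constant column carries no contrast, so map it to 0.
            out.append(0 if span == 0 else int(255 * (value - col_min[j]) / span))
        scaled.append(out)
    return scaled


# Rows are cells, columns are genes; the two genes live on very different scales.
expression = [
    [0.0, 100.0],
    [1.0, 300.0],
    [2.0, 500.0],
]
scaled = scale_columns_to_uint8(expression)
print(scaled)
```

After scaling, both genes span the full one-byte range, which is why a heatmap built on such a layer shows contrast for every gene regardless of its absolute expression level.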

Save a .zarr version of the dataset to lamindb.

zarr_artifact = ln.Artifact(
    zarr_filepath,
    description="Habib et al., 2017 Nature Methods, optimized anndata zarr",
    type="dataset",
)
zarr_artifact.save()
Artifact(uid='MBGeelc3ecRWcFxk0000', is_latest=True, description='Habib et al., 2017 Nature Methods, optimized anndata zarr', suffix='.anndata.zarr', type='dataset', size=15291221, hash='_KUrHZHhe6iVcKczGuY49w', n_objects=176, _hash_type='md5-d', _accessor='AnnData', visibility=1, _key_is_virtual=True, storage_id=1, transform_id=107, run_id=179, created_by_id=2, created_at=2024-11-21 05:39:03 UTC)

Save a VitessceConfig object

You can create a dashboard for one or several datasets by using Vitessce’s component API.

vc = vit.VitessceConfig(
    schema_version="1.0.15",
    description=zarr_artifact.description,
)
dataset = vc.add_dataset(name="Habib 2017").add_object(
    vit.AnnDataWrapper(
        adata_artifact=zarr_artifact,
        obs_feature_matrix_path="layers/X_uint8",
        obs_embedding_paths=["obsm/X_umap"],
        obs_embedding_names=["UMAP"],
        obs_set_paths=["obs/CellType"],
        obs_set_names=["Cell Type"],
    )
)
obs_sets = vc.add_view(vit.Component.OBS_SETS, dataset=dataset)
obs_sets_sizes = vc.add_view(vit.Component.OBS_SET_SIZES, dataset=dataset)
scatterplot = vc.add_view(vit.Component.SCATTERPLOT, dataset=dataset, mapping="UMAP")
heatmap = vc.add_view(vit.Component.HEATMAP, dataset=dataset)
genes = vc.add_view(vit.Component.FEATURE_LIST, dataset=dataset)
vc.layout(((scatterplot | obs_sets) / heatmap) | (obs_sets_sizes / genes))
<vitessce.config.VitessceConfig at 0x7f7d65ccfb10>
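
The `vc.layout(...)` call combines views with `|` (side by side) and `/` (stacked). The toy model below sketches how such an expression can be read as a tree of splits. The names `Node`, `View`, `Split`, and `place` are hypothetical and not Vitessce classes, and the equal halving at each split is an assumption made for illustration; Vitessce computes its own grid and may size views differently.

```python
from dataclasses import dataclass


class Node:
    """Base class giving every layout node the `|` and `/` operators."""

    def __or__(self, other):
        return Split("h", self, other)  # place side by side

    def __truediv__(self, other):
        return Split("v", self, other)  # stack first on top of second


@dataclass
class View(Node):
    name: str


@dataclass
class Split(Node):
    direction: str  # "h" for horizontal, "v" for vertical
    first: Node
    second: Node


def place(node, x=0.0, y=0.0, w=1.0, h=1.0):
    """Return {view name: (x, y, w, h)} by halving the box at each split."""
    if isinstance(node, View):
        return {node.name: (x, y, w, h)}
    if node.direction == "h":
        boxes = place(node.first, x, y, w / 2, h)
        boxes.update(place(node.second, x + w / 2, y, w / 2, h))
    else:
        boxes = place(node.first, x, y, w, h / 2)
        boxes.update(place(node.second, x, y + h / 2, w, h / 2))
    return boxes


scatterplot, obs_sets, heatmap, obs_set_sizes, genes = (
    View(n) for n in ["scatterplot", "obs_sets", "heatmap", "obs_set_sizes", "genes"]
)
# Same shape as the layout expression used in the tutorial.
layout = ((scatterplot | obs_sets) / heatmap) | (obs_set_sizes / genes)
boxes = place(layout)
for name, box in boxes.items():
    print(name, box)
```

In this toy reading, the left half of the dashboard holds the scatterplot and the cell sets above the heatmap, while the right half stacks the set sizes above the gene list.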

Save the VitessceConfig object.

vc_artifact = ln.integrations.save_vitessce_config(
    vc, description="View Habib17 in Vitessce"
)
→ VitessceConfig references these artifacts:
Artifact(uid='MBGeelc3ecRWcFxk0000', is_latest=True, description='Habib et al., 2017 Nature Methods, optimized anndata zarr', suffix='.anndata.zarr', type='dataset', size=15291221, hash='_KUrHZHhe6iVcKczGuY49w', n_objects=176, _hash_type='md5-d', _accessor='AnnData', visibility=1, _key_is_virtual=True, storage_id=1, transform_id=107, run_id=179, created_by_id=2, created_at=2024-11-21 05:39:03 UTC)
→ VitessceConfig: https://lamin.ai/laminlabs/lamin-dev/artifact/P9w68wcdLzxzFoRp0000
→ Dataset: https://lamin.ai/laminlabs/lamin-dev/artifact/MBGeelc3ecRWcFxk0000

Note

The Vitessce button now appears on your dataset, as in this example dataset.

If your VitessceConfig object contains multiple datasets, the Vitessce button will appear next to a Collection that groups these artifacts.

vc_artifact.view_lineage()
[Figure: lineage graph produced by vc_artifact.view_lineage()]
# [optional] finish run context and auto-save the notebook
# ln.finish()

Upload speed

A separate note covers folder upload speed and explains why lamindb does not use Vitessce's .export(to="s3") functionality.

# clean up artifacts in CI run
zarr_artifact.delete(permanent=True)
vc_artifact.delete(permanent=True)