lamindb.examples.datasets

Example datasets.

The mini immuno dataset

lamindb.examples.datasets.mini_immuno()

The two “mini immuno” datasets.

Small in-memory datasets

lamindb.examples.datasets.anndata_with_obs()

Create a mini anndata with cell_type, disease and tissue.

Return type:

AnnData

Files

lamindb.examples.datasets.file_fcs()

Example FCS artifact.

Return type:

Path

lamindb.examples.datasets.file_fcs_alpert19(populate_registries=False)

FCS file from Alpert19.

Parameters:

populate_registries (bool, default: False) – pre-populate metadata records to simulate existing registries

Return type:

Path

lamindb.examples.datasets.file_tsv_rnaseq_nfcore_salmon_merged_gene_counts(populate_registries=False)

Gene counts table from nf-core RNA-seq pipeline.

Output of: https://nf-co.re/rnaseq

Return type:

Path

lamindb.examples.datasets.file_jpg_paradisi05()

Example JPG file.

Originally from: https://upload.wikimedia.org/wikipedia/commons/2/28/Laminopathic_nuclei.jpg

Return type:

Path

lamindb.examples.datasets.file_tiff_suo22()

Image file from Suo22.

Pairs with anndata_suo22_Visium10X().

Return type:

Path

lamindb.examples.datasets.file_fastq(in_storage_root=False)

Mini mock fastq artifact.

Return type:

Path
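
For readers unfamiliar with the format, the following is a minimal sketch of what a mock FASTQ file can look like. It is not how lamindb builds its example file; the helper name write_mock_fastq, the read names, and the placeholder sequences are assumptions made for illustration. The only fact relied on is that a FASTQ record spans four lines: header, sequence, separator, quality.

```python
import tempfile
from pathlib import Path


def write_mock_fastq(path: Path, n_reads: int = 2, read_length: int = 8) -> Path:
    """Write a tiny FASTQ file with placeholder reads (hypothetical helper)."""
    bases = "ACGT"
    with open(path, "w") as handle:
        for i in range(n_reads):
            # Deterministic placeholder sequence that cycles through A, C, G, T.
            sequence = "".join(bases[(i + j) % 4] for j in range(read_length))
            # Constant placeholder quality string, one character per base.
            quality = "I" * read_length
            # One FASTQ record: header, sequence, separator, quality.
            handle.write(f"@read{i}\n{sequence}\n+\n{quality}\n")
    return path


fastq_path = write_mock_fastq(Path(tempfile.mkdtemp()) / "mock.fastq")
fastq_lines = fastq_path.read_text().splitlines()
```

A file like this is small enough to keep test runs fast while still exercising code paths that expect the .fastq suffix and record layout.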

lamindb.examples.datasets.file_bam(in_storage_root=False)

Mini mock bam artifact.

Return type:

Path

lamindb.examples.datasets.file_mini_csv(in_storage_root=False)

Mini csv artifact.

Return type:

Path

Directories

lamindb.examples.datasets.dir_scrnaseq_cellranger(sample_name, basedir='./', output_only=True)

Generate mock cell ranger outputs.

Parameters:
  • sample_name (str) – name of the sample

  • basedir (str | Path, default: './') – run directory

  • output_only (bool, default: True) – only generate output files
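
To make the parameters above concrete, here is a rough sketch of how a mock run directory could be laid out with the standard library. It is an illustration under stated assumptions, not lamindb's implementation: the helper make_mock_run_dir, the selection of file names (modelled on common Cell Ranger outs/ names), and the behavior when output_only is False are all hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Union

# Assumed file names for illustration, modelled on common Cell Ranger `outs/` names.
MOCK_OUTPUT_FILES = [
    "web_summary.html",
    "metrics_summary.csv",
    "filtered_feature_bc_matrix.h5",
    "raw_feature_bc_matrix.h5",
]


def make_mock_run_dir(
    sample_name: str,
    basedir: Union[str, Path] = "./",
    output_only: bool = True,
) -> Path:
    """Create empty placeholder files that mimic a pipeline run (hypothetical helper)."""
    sample_dir = Path(basedir) / sample_name
    outs_dir = sample_dir / "outs"
    outs_dir.mkdir(parents=True, exist_ok=True)
    for name in MOCK_OUTPUT_FILES:
        (outs_dir / name).touch()
    if not output_only:
        # Assumption: with output_only=False, placeholder input files are added too.
        fastq_dir = sample_dir / "fastq"
        fastq_dir.mkdir(exist_ok=True)
        (fastq_dir / f"{sample_name}_R1.fastq.gz").touch()
    return sample_dir


run_dir = make_mock_run_dir("sample_001", basedir=tempfile.mkdtemp())
created = sorted(p.name for p in (run_dir / "outs").iterdir())
```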

lamindb.examples.datasets.dir_iris_images()

Directory with 3 studies of the Iris flower: 405 images & metadata.

Provenance: https://lamin.ai/laminlabs/lamindata/transform/3q4MpQxRL2qZ5zKv

Note that the same artifact was also ingested by the downstream demo notebook: https://lamin.ai/laminlabs/lamindata/transform/NJvdsWWbJlZS5zKv

As a result, the UI shows the artifact as an output of the downstream demo notebook rather than of the upstream curation notebook. The lineage information should still be captured by laminlabs/lnschema-core, but the UI does not use it yet.

Return type:

UPath

Dictionary, Dataframe, AnnData, MuData, SpatialData

lamindb.examples.datasets.dict_cellxgene_uns()

An example CELLxGENE AnnData .uns dictionary.

Return type:

dict[str, Any]

lamindb.examples.datasets.df_iris()

The iris dataset as in sklearn.

Original code:

sklearn.datasets.load_iris(as_frame=True).frame

Return type:

DataFrame

lamindb.examples.datasets.df_iris_in_meter()

The iris dataset with lengths in meters.

Return type:

DataFrame
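
The unit change can be sketched in plain Python. This is only an illustration of the arithmetic, assuming the source measurements are in centimeters as in sklearn's column names such as "sepal length (cm)". The function convert_cm_to_m and the " (m)" column naming are hypothetical; the column names that lamindb actually returns are not stated here.

```python
def convert_cm_to_m(rows):
    """Return copies of the rows with centimeter columns converted to meters."""
    converted = []
    for row in rows:
        new_row = {}
        for key, value in row.items():
            if key.endswith(" (cm)"):
                # Rename the column and rescale: 1 cm = 0.01 m.
                new_row[key.replace(" (cm)", " (m)")] = value / 100
            else:
                # Non-length columns such as the target label are kept as-is.
                new_row[key] = value
        converted.append(new_row)
    return converted


rows_cm = [{"sepal length (cm)": 5.1, "petal length (cm)": 1.4, "target": 0}]
rows_m = convert_cm_to_m(rows_cm)
```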

lamindb.examples.datasets.df_iris_in_meter_study1()

The iris dataset with lengths in meters (study 1).

Return type:

DataFrame

lamindb.examples.datasets.df_iris_in_meter_study2()

The iris dataset with lengths in meters (study 2).

Return type:

DataFrame

lamindb.examples.datasets.anndata_mouse_sc_lymph_node(populate_registries=False)

Mouse lymph node scRNA-seq dataset from EBI.

Subsampled to 10k genes.

From: https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-8414/

Parameters:

populate_registries (bool, default: False) – pre-populate metadata records to simulate existing registries

Return type:

AnnData

lamindb.examples.datasets.anndata_human_immune_cells(populate_registries=False)

Cross-tissue immune cell analysis reveals tissue-specific features in humans.

From: https://cellxgene.cziscience.com/collections/62ef75e4-cbea-454e-a0ce-998ec40223d3

Collection: Global

Return type:

AnnData

To reproduce the subsample:

>>> import scanpy as sc
>>> adata = sc.read('Global.h5ad')
>>> adata.obs = adata.obs[['donor_id', 'tissue', 'cell_type', 'assay', 'tissue_ontology_term_id', 'cell_type_ontology_term_id', 'assay_ontology_term_id']].copy()
>>> sc.pp.subsample(adata, fraction=0.005)
>>> del adata.uns["development_cache_ontology_term_id_colors"]
>>> del adata.uns["sex_ontology_term_id_colors"]
>>> adata.write('human_immune.h5ad')

lamindb.examples.datasets.anndata_pbmc68k_reduced()

Modified from scanpy.datasets.pbmc68k_reduced().

This code was run:

import scanpy as sc

pbmc68k = sc.datasets.pbmc68k_reduced()
pbmc68k.obs.rename(columns={"bulk_labels": "cell_type"}, inplace=True)
pbmc68k.obs["cell_type"] = pbmc68k.obs["cell_type"].cat.rename_categories(
    {"Dendritic": "Dendritic cells", "CD14+ Monocyte": "CD14+ Monocytes"}
)
del pbmc68k.obs["G2M_score"]
del pbmc68k.obs["S_score"]
del pbmc68k.obs["phase"]
del pbmc68k.obs["n_counts"]
del pbmc68k.var["dispersions"]
del pbmc68k.var["dispersions_norm"]
del pbmc68k.var["means"]
del pbmc68k.uns["rank_genes_groups"]
del pbmc68k.uns["bulk_labels_colors"]
sc.pp.subsample(pbmc68k, fraction=0.1, random_state=123)
pbmc68k.write("scrnaseq_pbmc68k_tiny.h5ad")

Return type:

AnnData

lamindb.examples.datasets.anndata_file_pbmc68k_test()

Modified from scanpy.datasets.pbmc68k_reduced().

Additional slots were added for testing purposes. Returns the filepath.

To reproduce:

import lamindb as ln
from scipy import sparse

pbmc68k = ln.examples.datasets.anndata_pbmc68k_reduced()
pbmc68k_test = pbmc68k[:30, :200].copy()
pbmc68k_test.raw = pbmc68k_test[:, :100]
pbmc68k_test.obsp["test"] = sparse.eye(pbmc68k_test.shape[0], format="csr")
pbmc68k_test.varp["test"] = sparse.eye(pbmc68k_test.shape[1], format="csr")
pbmc68k_test.layers["test"] = sparse.csr_matrix(pbmc68k_test.shape)
pbmc68k_test.layers["test"][0] = 1.
pbmc68k_test.write("pbmc68k_test.h5ad")

Return type:

Path

lamindb.examples.datasets.anndata_pbmc3k_processed()

Modified from scanpy.datasets.pbmc3k_processed().

Return type:

AnnData

lamindb.examples.datasets.anndata_suo22_Visium10X()

AnnData from Suo22 generated by 10x Visium.

lamindb.examples.datasets.mudata_papalexi21_subset(with_uns=False)

A subsetted mudata from papalexi21.

Return type:

MuData

To reproduce the subsetting:

>>> !wget https://figshare.com/ndownloader/files/36509460
>>> import mudata as md
>>> import scanpy as sc
>>> mdata = md.read_h5mu("36509460")
>>> mdata = sc.pp.subsample(mdata, n_obs=200, copy=True)[0]
>>> mdata[:, -300:].copy().write("papalexi21_subset_200x300_lamindb_demo_2023-07-25.h5mu")

lamindb.examples.datasets.schmidt22_crispra_gws_IFNG(basedir='.')

CRISPRa screen dataset of Schmidt22.

Originally from: https://zenodo.org/record/5784651

Return type:

Path

lamindb.examples.datasets.schmidt22_perturbseq(basedir='.')

Perturb-seq dataset of Schmidt22.

Subsampled and converted to h5ad from R file: https://zenodo.org/record/5784651

To reproduce the subsample:

>>> import scanpy as sc
>>> adata = sc.read('HuTcellsCRISPRaPerturbSeq_Re-stimulated.h5ad')
>>> adata.obs = adata.obs[['cluster_name']]
>>> del adata.obsp
>>> del adata.var['features']
>>> del adata.obsm['X_pca']
>>> del adata.uns
>>> del adata.raw
>>> del adata.varm
>>> adata.obs = adata.obs.reset_index()
>>> del adata.obs['index']
>>> sc.pp.subsample(adata, 0.03)
>>> adata.write('schmidt22_perturbseq.h5ad')

Return type:

Path

lamindb.examples.datasets.spatialdata_blobs()

Example SpatialData dataset for tutorials.

Return type:

SpatialData

Other

lamindb.examples.datasets.fake_bio_notebook_titles(n=100)

A fake collection of study titles.

Return type:

list[str]
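
As a rough idea of how such a list can be produced, the sketch below combines words from small vocabularies with a seeded random generator. It does not reflect how lamindb generates its titles; the helper make_fake_titles, the seed parameter, and the word lists are assumptions made for illustration.

```python
import random

# Hypothetical vocabularies, chosen only for illustration.
TOPICS = ["T cell activation", "Gene regulation", "Protein folding", "Cell migration"]
SYSTEMS = ["mouse lymph node", "human PBMCs", "zebrafish embryos", "organoids"]
SUFFIXES = ["analysis", "screen", "benchmark", "case study"]


def make_fake_titles(n: int = 100, seed: int = 0) -> list[str]:
    """Return n made-up study titles (hypothetical helper)."""
    # A dedicated, seeded generator keeps the output reproducible across calls.
    rng = random.Random(seed)
    return [
        f"{rng.choice(TOPICS)} in {rng.choice(SYSTEMS)}: {rng.choice(SUFFIXES)}"
        for _ in range(n)
    ]


titles = make_fake_titles(n=5)
```

Seeding a local generator rather than the global one keeps the titles stable between runs without affecting other code that uses the random module.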