lamindb.examples.datasets¶
Example datasets.
The mini immuno dataset¶
- lamindb.examples.datasets.mini_immuno()¶
The two “mini immuno” datasets.
Small in-memory datasets¶
- lamindb.examples.datasets.anndata_with_obs()¶
Create a mini anndata with cell_type, disease and tissue.
- Return type:
AnnData
Files¶
- lamindb.examples.datasets.file_fcs()¶
Example FCS artifact.
- Return type:
Path
- lamindb.examples.datasets.file_fcs_alpert19(populate_registries=False)¶
FCS file from Alpert19.
- Parameters:
populate_registries (bool, default: False) – pre-populate metadata records to simulate existing registries
- Return type:
Path
- lamindb.examples.datasets.file_tsv_rnaseq_nfcore_salmon_merged_gene_counts(populate_registries=False)¶
Gene counts table from nf-core RNA-seq pipeline.
Output of: https://nf-co.re/rnaseq
- Return type:
Path
- lamindb.examples.datasets.file_jpg_paradisi05()¶
Return jpg file example.
Originally from: https://upload.wikimedia.org/wikipedia/commons/2/28/Laminopathic_nuclei.jpg
- Return type:
Path
- lamindb.examples.datasets.file_tiff_suo22()¶
Image file from Suo22.
Pair with anndata_suo22_Visium10X().
- Return type:
Path
- lamindb.examples.datasets.file_fastq(in_storage_root=False)¶
Mini mock fastq artifact.
- Return type:
Path
- lamindb.examples.datasets.file_bam(in_storage_root=False)¶
Mini mock bam artifact.
- Return type:
Path
- lamindb.examples.datasets.file_mini_csv(in_storage_root=False)¶
Mini csv artifact.
- Return type:
Path
Directories¶
- lamindb.examples.datasets.dir_scrnaseq_cellranger(sample_name, basedir='./', output_only=True)¶
Generate mock cell ranger outputs.
- Parameters:
sample_name (str) – name of the sample
basedir (str | Path, default: './') – run directory
output_only (bool, default: True) – only generate output files
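The idea of mocking a Cell Ranger run directory can be sketched with the standard library alone. This is not lamindb's implementation: the function name `mock_cellranger_dir`, the file names under `outs/`, and the behaviour of `output_only=False` are all assumptions chosen for illustration.

```python
from __future__ import annotations

from pathlib import Path


def mock_cellranger_dir(
    sample_name: str, basedir: str | Path = "./", output_only: bool = True
) -> Path:
    """Create empty placeholder files in a Cell Ranger-like layout (hypothetical)."""
    run_dir = Path(basedir) / sample_name
    outs = run_dir / "outs"
    # Illustrative file names only; the real function may generate a different set.
    output_files = [
        "filtered_feature_bc_matrix/barcodes.tsv.gz",
        "filtered_feature_bc_matrix/features.tsv.gz",
        "filtered_feature_bc_matrix/matrix.mtx.gz",
        "metrics_summary.csv",
        "web_summary.html",
    ]
    for relative_path in output_files:
        path = outs / relative_path
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()  # empty placeholder, no real data
    if not output_only:
        # Assumption: also mock one input file next to the outputs.
        fastq = run_dir / "fastq" / f"{sample_name}_R1_001.fastq.gz"
        fastq.parent.mkdir(parents=True, exist_ok=True)
        fastq.touch()
    return run_dir
```

Calling `mock_cellranger_dir("sample_001", basedir="/tmp/demo")` would create `/tmp/demo/sample_001/outs/...` and return the run directory as a `Path`.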
- lamindb.examples.datasets.dir_iris_images()¶
Directory with 3 studies of the Iris flower: 405 images & metadata.
Provenance: https://lamin.ai/laminlabs/lamindata/transform/3q4MpQxRL2qZ5zKv
Note that the same artifact was also ingested by the downstream demo notebook: https://lamin.ai/laminlabs/lamindata/transform/NJvdsWWbJlZS5zKv
As a result, the UI shows the artifact as an output of the downstream demo notebook rather than of the upstream curation notebook. The lineage information should still be captured by laminlabs/lnschema-core, but we don’t use it in the UI yet.
- Return type:
Dictionary, Dataframe, AnnData, MuData, SpatialData¶
- lamindb.examples.datasets.dict_cellxgene_uns()¶
An example CELLxGENE AnnData .uns dictionary.
- Return type:
dict[str,Any]
- lamindb.examples.datasets.df_iris()¶
The iris collection as in sklearn.
Original code:
sklearn.datasets.load_iris(as_frame=True).frame
- Return type:
DataFrame
- lamindb.examples.datasets.df_iris_in_meter()¶
The iris collection with lengths in meters.
- Return type:
DataFrame
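The unit change behind this variant amounts to dividing each length column by 100. The sketch below shows that arithmetic on plain dictionaries instead of a DataFrame. The input column names are those of sklearn's iris frame and the values are the first row of the classic iris data; the `(m)` output names and the helper `lengths_cm_to_m` are hypothetical and may not match what lamindb returns.

```python
# Column names as in sklearn's iris frame (lengths in centimeters).
LENGTH_COLUMNS_CM = [
    "sepal length (cm)",
    "sepal width (cm)",
    "petal length (cm)",
    "petal width (cm)",
]


def lengths_cm_to_m(rows):
    """Return copies of the rows with length columns converted from cm to m."""
    converted = []
    for row in rows:
        new_row = {}
        for key, value in row.items():
            if key in LENGTH_COLUMNS_CM:
                # Hypothetical renaming: "(cm)" becomes "(m)" in the column name.
                new_row[key.replace("(cm)", "(m)")] = value / 100
            else:
                new_row[key] = value  # non-length columns pass through unchanged
        converted.append(new_row)
    return converted


rows_cm = [
    {
        "sepal length (cm)": 5.1,
        "sepal width (cm)": 3.5,
        "petal length (cm)": 1.4,
        "petal width (cm)": 0.2,
        "target": 0,
    }
]
rows_m = lengths_cm_to_m(rows_cm)
```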
- lamindb.examples.datasets.df_iris_in_meter_study1()¶
The iris collection with lengths in meters.
- Return type:
DataFrame
- lamindb.examples.datasets.df_iris_in_meter_study2()¶
The iris collection with lengths in meters.
- Return type:
DataFrame
- lamindb.examples.datasets.anndata_mouse_sc_lymph_node(populate_registries=False)¶
Mouse lymph node scRNA-seq collection from EBI.
Subsampled to 10k genes.
From: https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-8414/
- Parameters:
populate_registries (bool, default: False) – pre-populate metadata records to simulate existing registries
- Return type:
AnnData
- lamindb.examples.datasets.anndata_human_immune_cells(populate_registries=False)¶
Cross-tissue immune cell analysis reveals tissue-specific features in humans.
From: https://cellxgene.cziscience.com/collections/62ef75e4-cbea-454e-a0ce-998ec40223d3 Collection: Global
- Return type:
AnnData
- To reproduce the subsample:
>>> import scanpy as sc
>>> adata = sc.read('Global.h5ad')
>>> adata.obs = adata.obs[[
...     'donor_id', 'tissue', 'cell_type', 'assay',
...     'tissue_ontology_term_id', 'cell_type_ontology_term_id',
...     'assay_ontology_term_id',
... ]].copy()
>>> sc.pp.subsample(adata, fraction=0.005)
>>> del adata.uns["development_cache_ontology_term_id_colors"]
>>> del adata.uns["sex_ontology_term_id_colors"]
>>> adata.write('human_immune.h5ad')
- lamindb.examples.datasets.anndata_pbmc68k_reduced()¶
Modified from scanpy.datasets.pbmc68k_reduced().
This code was run:
import scanpy as sc

pbmc68k = sc.datasets.pbmc68k_reduced()
pbmc68k.obs.rename(columns={"bulk_labels": "cell_type"}, inplace=True)
pbmc68k.obs["cell_type"] = pbmc68k.obs["cell_type"].cat.rename_categories(
    {"Dendritic": "Dendritic cells", "CD14+ Monocyte": "CD14+ Monocytes"}
)
del pbmc68k.obs["G2M_score"]
del pbmc68k.obs["S_score"]
del pbmc68k.obs["phase"]
del pbmc68k.obs["n_counts"]
del pbmc68k.var["dispersions"]
del pbmc68k.var["dispersions_norm"]
del pbmc68k.var["means"]
del pbmc68k.uns["rank_genes_groups"]
del pbmc68k.uns["bulk_labels_colors"]
sc.pp.subsample(pbmc68k, fraction=0.1, random_state=123)
pbmc68k.write("scrnaseq_pbmc68k_tiny.h5ad")
- Return type:
AnnData
- lamindb.examples.datasets.anndata_file_pbmc68k_test()¶
Modified from scanpy.datasets.pbmc68k_reduced().
Additional slots were added for testing purposes. Returns the filepath.
To reproduce:
import lamindb as ln
from scipy import sparse

pbmc68k = ln.examples.datasets.anndata_pbmc68k_reduced()
pbmc68k_test = pbmc68k[:30, :200].copy()
pbmc68k_test.raw = pbmc68k_test[:, :100]
pbmc68k_test.obsp["test"] = sparse.eye(pbmc68k_test.shape[0], format="csr")
pbmc68k_test.varp["test"] = sparse.eye(pbmc68k_test.shape[1], format="csr")
pbmc68k_test.layers["test"] = sparse.csr_matrix(pbmc68k_test.shape)
pbmc68k_test.layers["test"][0] = 1.
pbmc68k_test.write("pbmc68k_test.h5ad")
- Return type:
Path
- lamindb.examples.datasets.anndata_pbmc3k_processed()¶
Modified from scanpy.datasets.pbmc3k_processed().
- Return type:
AnnData
- lamindb.examples.datasets.anndata_with_obs()¶
Create a mini anndata with cell_type, disease and tissue.
- Return type:
AnnData
- lamindb.examples.datasets.anndata_suo22_Visium10X()¶
AnnData from Suo22 generated by 10x Visium.
- lamindb.examples.datasets.mudata_papalexi21_subset(with_uns=False)¶
A subsetted mudata from papalexi21.
- Return type:
MuData
- To reproduce the subsetting:
>>> !wget https://figshare.com/ndownloader/files/36509460
>>> import mudata as md
>>> import scanpy as sc
>>> mdata = md.read_h5mu("36509460")
>>> mdata = sc.pp.subsample(mdata, n_obs=200, copy=True)[0]
>>> mdata[:, -300:].copy().write("papalexi21_subset_200x300_lamindb_demo_2023-07-25.h5mu")
- lamindb.examples.datasets.schmidt22_crispra_gws_IFNG(basedir='.')¶
CRISPRi screen collection of Schmidt22.
Originally from: https://zenodo.org/record/5784651
- Return type:
Path
- lamindb.examples.datasets.schmidt22_perturbseq(basedir='.')¶
Perturb-seq collection of Schmidt22.
Subsampled and converted to h5ad from R file: https://zenodo.org/record/5784651
To reproduce the subsample:
>>> import scanpy as sc
>>> adata = sc.read('HuTcellsCRISPRaPerturbSeq_Re-stimulated.h5ad')
>>> adata.obs = adata.obs[['cluster_name']]
>>> del adata.obsp
>>> del adata.var['features']
>>> del adata.obsm['X_pca']
>>> del adata.uns
>>> del adata.raw
>>> del adata.varm
>>> adata.obs = adata.obs.reset_index()
>>> del adata.obs['index']
>>> sc.pp.subsample(adata, 0.03)
>>> adata.write('schmidt22_perturbseq.h5ad')
- Return type:
Path
- lamindb.examples.datasets.spatialdata_blobs()¶
Example SpatialData dataset for tutorials.
- Return type:
SpatialData
Other¶
- lamindb.examples.datasets.fake_bio_notebook_titles(n=100)¶
A fake collection of study titles.
- Return type:
list[str]