Analysis flow
Here, we’ll track typical data transformations, such as subsetting, that occur during analysis.
# pip install 'lamindb[jupyter,bionty]'
!lamin init --storage ./test-analysis-flow --modules bionty
import lamindb as ln
import bionty as bt
→ connected lamindb: testuser1/test-analysis-flow
Save an initial dataset
register_example_file.py
import lamindb as ln
import bionty as bt
ln.track("K4wsS5DTYdFp0000")
# an example dataset that has a few cell type, tissue and disease annotations
adata = ln.core.datasets.anndata_with_obs()
# validate and register features
curate = ln.Curator.from_anndata(
    adata,
    var_index=bt.Gene.ensembl_gene_id,
    categoricals={
        "cell_type": bt.CellType.name,
        "cell_type_id": bt.CellType.ontology_id,
        "tissue": bt.Tissue.name,
        "disease": bt.Disease.name,
    },
    organism="human",
)
curate.add_new_from("cell_type")
curate.validate()
curate.save_artifact(description="anndata with obs")
ln.finish()
!python analysis-flow-scripts/register_example_file.py
Open a dataset, subset it, and register the result
Track the current notebook:
ln.track("eNef4Arw8nNM")
artifact = ln.Artifact.get(description="anndata with obs")
artifact.describe()
Get a backed AnnData object
adata = artifact.open()
adata
Subset dataset to specific cell types and diseases
cell_types = artifact.cell_types.all().distinct().lookup(return_field="name")
diseases = artifact.diseases.all().distinct().lookup(return_field="name")
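The attribute names used on these lookup objects (`cell_types.t_cell`, `diseases.liver_lymphoma`) look like lower-cased, underscore-separated versions of the label names. The sketch below mimics that convention with the standard library only. `to_attr_name` and `make_lookup` are hypothetical helpers written for illustration, not LaminDB functions, and the real normalization rules may differ.

```python
import re
from types import SimpleNamespace


def to_attr_name(label: str) -> str:
    """Turn a human-readable label into a Python identifier (assumed convention)."""
    name = re.sub(r"[^0-9a-zA-Z]+", "_", label).strip("_").lower()
    # identifiers cannot start with a digit
    return f"_{name}" if name[:1].isdigit() else name


def make_lookup(labels):
    """Build an object whose attributes return the original label names."""
    return SimpleNamespace(**{to_attr_name(label): label for label in labels})


# toy labels taken from the dataset described above
toy_cell_types = make_lookup(["T cell", "hematopoietic stem cell"])
print(toy_cell_types.t_cell)
print(toy_cell_types.hematopoietic_stem_cell)
```

The benefit of such an object is tab completion: the label names can be discovered interactively instead of being typed as free-form strings.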
Create the subset:
subset_obs = adata.obs.cell_type.isin(
    [cell_types.t_cell, cell_types.hematopoietic_stem_cell]
) & (adata.obs.disease.isin([diseases.liver_lymphoma, diseases.chronic_kidney_disease]))
adata_subset = adata[subset_obs]
adata_subset
adata_subset.obs[["cell_type", "disease"]].value_counts()
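The mask combines two membership tests with `&`, so a row is kept only if both its cell type and its disease are in the selected lists. Here is a minimal pure-Python sketch of the same logic on a small made-up table. The rows are illustrative and are not the contents of the example dataset.

```python
from collections import Counter

# made-up observations: (cell_type, disease)
obs = [
    ("T cell", "liver lymphoma"),
    ("T cell", "cardiac ventricle disorder"),
    ("hematopoietic stem cell", "chronic kidney disease"),
    ("hepatocyte", "liver lymphoma"),
    ("T cell", "liver lymphoma"),
]

selected_cell_types = {"T cell", "hematopoietic stem cell"}
selected_diseases = {"liver lymphoma", "chronic kidney disease"}

# one boolean per row; True only when both conditions hold
mask = [
    cell_type in selected_cell_types and disease in selected_diseases
    for cell_type, disease in obs
]
subset = [row for row, keep in zip(obs, mask) if keep]

# analogous to value_counts() on the two columns
counts = Counter(subset)
for (cell_type, disease), n in counts.most_common():
    print(cell_type, disease, n)
```

Rows that satisfy only one of the two conditions, such as a T cell with an unselected disease, are dropped.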
Register the subsetted AnnData:
curate = ln.Curator.from_anndata(
    adata_subset.to_memory(),
    var_index=bt.Gene.ensembl_gene_id,
    categoricals={
        "cell_type": bt.CellType.name,
        "disease": bt.Disease.name,
        "tissue": bt.Tissue.name,
    },
    organism="human",
)
curate.validate()
artifact = curate.save_artifact(description="anndata with obs subset")
artifact.describe()
Examine data lineage
Query a subsetted .h5ad artifact containing “hematopoietic stem cell” and “T cell”:
cell_types = bt.CellType.lookup()
my_subset = ln.Artifact.filter(
    suffix=".h5ad",
    description__endswith="subset",
    cell_types__in=[
        cell_types.hematopoietic_stem_cell,
        cell_types.t_cell,
    ],
).first()
my_subset
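The keyword arguments in the filter follow a `field__operator=value` pattern. The toy matcher below shows one way to read such lookups on plain dictionaries. It is a sketch under assumptions: `matches` is a hypothetical helper, the records are made up, and treating `__in` on a multi-valued field as "any overlap" is an assumption about Django-style semantics, not a statement about how LaminDB executes the query.

```python
def matches(record: dict, **lookups) -> bool:
    """Check a record against `field__op=value` lookups (toy version)."""
    for key, expected in lookups.items():
        field, _, op = key.partition("__")
        value = record[field]
        if op == "":
            ok = value == expected
        elif op == "endswith":
            ok = value.endswith(expected)
        elif op == "in":
            # for a multi-valued field, accept any overlap with the given values
            values = value if isinstance(value, (list, set, tuple)) else [value]
            ok = any(v in expected for v in values)
        else:
            raise ValueError(f"unsupported lookup: {op}")
        if not ok:
            return False
    return True


# made-up records standing in for registered artifacts
records = [
    {"suffix": ".h5ad", "description": "anndata with obs", "cell_types": ["T cell", "hepatocyte"]},
    {"suffix": ".h5ad", "description": "anndata with obs subset", "cell_types": ["T cell"]},
]

hits = [
    r
    for r in records
    if matches(
        r,
        suffix=".h5ad",
        description__endswith="subset",
        cell_types__in=["hematopoietic stem cell", "T cell"],
    )
]
print([r["description"] for r in hits])
```

All lookups must hold at once, so the first record is excluded by `description__endswith` even though its suffix and cell types match.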
Common questions that might arise are:
What is the history of this artifact?
Which features and labels are associated with it?
Which notebook analyzed and registered this artifact?
By whom?
And which artifact is its parent?
Let’s answer these using LaminDB:
artifact.features
Artifact .h5ad · AnnData · dataset
└── Dataset features
    ├── var • 99 [bionty.Gene]
    │   TSPAN6 float
    │   TNMD float
    │   DPM1 float
    │   SCYL3 float
    │   FIRRM float
    │   FGR float
    │   CFH float
    │   FUCA2 float
    │   GCLC float
    │   NFYA float
    │   STPG1 float
    │   NIPAL3 float
    │   LAS1L float
    │   ENPP4 float
    │   SEMA3F float
    │   CFTR float
    │   ANKIB1 float
    │   CYP51A1 float
    │   KRIT1 float
    │   RAD52 float
    └── obs • 4 [Feature]
        cell_type cat[bionty.CellType] T cell, hematopoietic stem cell
        disease cat[bionty.Disease] chronic kidney disease, liver lymphoma
        tissue cat[bionty.Tissue] kidney, liver
        cell_type_id cat[bionty.CellType]
print("--> What is the lineage of this artifact?\n")
artifact.view_lineage()
print("\n\n--> Which features and labels are associated with it?\n")
print(artifact.features)
print(artifact.labels)
print("\n\n--> Which notebook analyzed and saved this artifact?\n")
print(artifact.transform)
print("\n\n--> Who saved this artifact?\n")
print(artifact.created_by)
print("\n\n--> Which artifacts were inputs?\n")
display(artifact.run.input_artifacts.df())
--> What is the lineage of this artifact?
--> Which features and labels are associated with it?
Artifact .h5ad · AnnData · dataset
└── Dataset features
    ├── var • 99 [bionty.Gene]
    │   TSPAN6 float
    │   TNMD float
    │   DPM1 float
    │   SCYL3 float
    │   FIRRM float
    │   FGR float
    │   CFH float
    │   FUCA2 float
    │   GCLC float
    │   NFYA float
    │   STPG1 float
    │   NIPAL3 float
    │   LAS1L float
    │   ENPP4 float
    │   SEMA3F float
    │   CFTR float
    │   ANKIB1 float
    │   CYP51A1 float
    │   KRIT1 float
    │   RAD52 float
    └── obs • 4 [Feature]
        cell_type cat[bionty.CellType] T cell, hematopoietic stem cell
        disease cat[bionty.Disease] chronic kidney disease, liver lymphoma
        tissue cat[bionty.Tissue] kidney, liver
        cell_type_id cat[bionty.CellType]
Artifact .h5ad · AnnData · dataset
└── Labels
    └── .tissues bionty.Tissue kidney, liver
        .cell_types bionty.CellType T cell, hematopoietic stem cell
        .diseases bionty.Disease chronic kidney disease, liver lymphoma
--> Which notebook analyzed and saved this artifact?
Transform(uid='eNef4Arw8nNM0000', is_latest=True, key='analysis-flow.ipynb', description='Analysis flow', type='notebook', branch_id=1, space_id=1, created_by_id=1, created_at=2025-07-14 06:43:44 UTC)
--> Who saved this artifact?
User object (1)
--> Which artifacts were inputs?
id | uid | key | description | suffix | kind | otype | size | hash | n_files | n_observations | _hash_type | _key_is_virtual | _overwrite_versions | space_id | storage_id | schema_id | version | is_latest | run_id | created_at | created_by_id | _aux | branch_id
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
2 | 7W6s68YULuzFzbOh0000 | None | anndata with obs | .h5ad | dataset | AnnData | 46992 | IJORtcQUSS11QBqD-nTD0A | None | 40 | md5 | True | False | 1 | 1 | None | None | True | 1 | 2025-07-14 06:43:42.949000+00:00 | 1 | {'af': {'0': True}} | 1
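The lineage questions above amount to following links: from an artifact to the run that created it, and from that run to its transform, its creator, and its input artifacts. The toy model below makes that structure explicit with plain dataclasses. The class and field names echo the attributes used in this guide (`run`, `transform`, `created_by`, `input_artifacts`), but the classes themselves and the `ancestors` helper are hypothetical stand-ins, not LaminDB's registries.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Transform:
    key: str


@dataclass
class Run:
    transform: Transform
    created_by: str
    input_artifacts: list[Artifact] = field(default_factory=list)


@dataclass
class Artifact:
    description: str
    run: Run | None = None  # the run that created this artifact


def ancestors(artifact):
    """Yield every upstream artifact by following run -> input_artifacts."""
    if artifact.run is None:
        return
    for parent in artifact.run.input_artifacts:
        yield parent
        yield from ancestors(parent)


# toy provenance mirroring this guide: a script saves the full dataset,
# then a notebook reads it and saves the subset
script_run = Run(Transform("register_example_file.py"), created_by="testuser1")
full = Artifact("anndata with obs", run=script_run)
notebook_run = Run(
    Transform("analysis-flow.ipynb"), created_by="testuser1", input_artifacts=[full]
)
subset = Artifact("anndata with obs subset", run=notebook_run)

print(subset.run.transform.key)
print(subset.run.created_by)
print([a.description for a in ancestors(subset)])
```

Because each artifact points to exactly one creating run, walking the links backwards recovers the full history without any extra bookkeeping.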