Multi-modal
Here, we’ll showcase how to curate and register ECCITE-seq data from Papalexi21 in the form of MuData objects.
ECCITE-seq is designed to enable interrogation of single-cell transcriptomes together with surface protein markers in the context of CRISPR screens.
MuData objects build on top of AnnData objects to store multimodal data.
%load_ext autoreload
%autoreload 2
# !pip install 'lamindb[jupyter,bionty]'
!lamin init --storage ./test-multimodal --schema bionty
→ connected lamindb: testuser1/test-multimodal
import lamindb as ln
import bionty as bt
→ connected lamindb: testuser1/test-multimodal
mdata = ln.core.datasets.mudata_papalexi21_subset()
mdata
MuData object with n_obs × n_vars = 200 × 300
  obs: 'perturbation', 'replicate'
  var: 'name'
  4 modalities
    rna: 200 x 173
      obs: 'nCount_RNA', 'nFeature_RNA', 'percent.mito'
      var: 'name'
    adt: 200 x 4
      obs: 'nCount_ADT', 'nFeature_ADT'
      var: 'name'
    hto: 200 x 12
      obs: 'nCount_HTO', 'nFeature_HTO', 'technique'
      var: 'name'
    gdo: 200 x 111
      obs: 'nCount_GDO'
      var: 'name'
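The shapes in this summary fit together in a simple way: every modality describes the same 200 cells, and the 300 variables of the container are the per-modality variables added up. The following is a minimal sketch of that arithmetic in plain Python. It only uses the numbers printed above and assumes that variables are concatenated across modalities, which is what those numbers suggest; it does not touch the `mudata` API.

```python
# Shapes copied from the MuData summary above: (n_obs, n_vars) per modality.
modality_shapes = {
    "rna": (200, 173),  # gene expression
    "adt": (200, 4),    # antibody-derived tags
    "hto": (200, 12),   # cell hashing
    "gdo": (200, 111),  # guide RNAs
}

# All modalities describe the same set of cells, so there is one distinct n_obs.
distinct_n_obs = {shape[0] for shape in modality_shapes.values()}

# Variables are disjoint across modalities, so they add up.
total_n_vars = sum(shape[1] for shape in modality_shapes.values())

print(distinct_n_obs, total_n_vars)
```

The total of 173 + 4 + 12 + 111 matches the `n_vars` reported for the whole object.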
Validate annotations
curate = ln.Curator.from_mudata(
    mdata,
    var_index={
        "rna": bt.Gene.symbol,  # gene expression
        "adt": bt.CellMarker.name,  # antibody-derived tags reflecting surface proteins
        "hto": ln.Feature.name,  # cell hashing
        "gdo": ln.Feature.name,  # guide RNAs
    },
    categoricals={
        "perturbation": ln.ULabel.name,  # shared categorical
        "replicate": ln.ULabel.name,  # shared categorical
        "hto:technique": bt.ExperimentalFactor.name,  # modality-specific categorical
    },
    organism="human",
)
✓ added 2 records with Feature.name for "columns": 'perturbation', 'replicate'
! indexing datasets with gene symbols can be problematic: https://docs.lamin.ai/faq/symbol-mapping
✓ added 1 record with Feature.name for "columns": 'technique'
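The `categoricals` mapping above mixes two kinds of keys: plain column names such as `perturbation`, which refer to the shared `obs`, and prefixed names such as `hto:technique`, which refer to a column inside one modality. As a rough illustration of that naming convention, here is a small sketch in plain Python. The helper `split_categorical_key` is hypothetical and is not part of lamindb; it only shows how such a key could be taken apart.

```python
from typing import Optional, Tuple


def split_categorical_key(key: str) -> Tuple[Optional[str], str]:
    """Split 'modality:column' into (modality, column).

    Hypothetical helper for illustration only. Keys without a colon are
    treated as columns of the shared obs and get modality None.
    """
    modality, separator, column = key.partition(":")
    if not separator:
        return None, key
    return modality, column


for key in ["perturbation", "replicate", "hto:technique"]:
    print(key, "->", split_categorical_key(key))
```

Under this reading, `perturbation` and `replicate` are looked up in the shared `obs`, while `technique` is looked up in the `obs` of the `hto` modality.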
# optional and deprecated: `.add_new_from_columns()` now runs by default during
# initialization, so the calls below only trigger a DeprecationWarning
curate.add_new_from_columns(modality="rna")
curate.add_new_from_columns(modality="adt")
curate.add_new_from_columns(modality="hto")
curate.add_new_from_columns(modality="gdo")
/tmp/ipykernel_3815/1003816735.py:2: DeprecationWarning: `.add_new_from_columns()` is deprecated and will be removed in a future version. It's run by default during initialization.
curate.add_new_from_columns(modality="rna")
/tmp/ipykernel_3815/1003816735.py:3: DeprecationWarning: `.add_new_from_columns()` is deprecated and will be removed in a future version. It's run by default during initialization.
curate.add_new_from_columns(modality="adt")
/tmp/ipykernel_3815/1003816735.py:4: DeprecationWarning: `.add_new_from_columns()` is deprecated and will be removed in a future version. It's run by default during initialization.
curate.add_new_from_columns(modality="hto")
/tmp/ipykernel_3815/1003816735.py:5: DeprecationWarning: `.add_new_from_columns()` is deprecated and will be removed in a future version. It's run by default during initialization.
curate.add_new_from_columns(modality="gdo")
curate.validate()
• saving validated records of 'var_index'
• saving validated records of 'var_index'
• saving validated records of 'technique'
• validating categoricals in "obs"...
• mapping "perturbation" on ULabel.name
! 2 terms are not validated: 'Perturbed', 'NT'
→ fix typos, remove non-existent values, or save terms via .add_new_from("perturbation")
• mapping "replicate" on ULabel.name
! 3 terms are not validated: 'rep3', 'rep1', 'rep2'
→ fix typos, remove non-existent values, or save terms via .add_new_from("replicate")
• validating categoricals in modality "adt"...
✓ "var_index" is validated against CellMarker.name
• validating categoricals in modality "rna"...
• mapping "var_index" on Gene.symbol
! 96 terms are not validated: 'RP5-827C21.6', 'XX-CR54.1', 'RP11-379B18.5', 'RP11-778D9.12', 'RP11-703G6.1', 'AC005150.1', 'RP11-717H13.1', 'CTC-498J12.1', 'CTC-467M3.1', 'HIST1H4K', 'RP11-524H19.2', 'AC006042.7', 'AC002066.1', 'AC073934.6', 'RP11-268G12.1', 'U52111.14', 'RP11-235C23.5', 'RP11-12J10.3', 'CASC1', 'RP11-324E6.9', ...
12 synonyms found: "CTC-467M3.1" → "MEF2C-AS2", "HIST1H4K" → "H4C12", "CASC1" → "DNAI7", "LARGE" → "LARGE1", "NBPF16" → "NBPF15", "C1orf65" → "CCDC185", "IBA57-AS1" → "IBA57-DT", "KIAA1239" → "NWD2", "TMEM75" → "LINC02912", "AP003419.16" → "RPS6KB2-AS1", "FAM65C" → "RIPOR3", "C14orf177" → "LINC02914"
→ curate synonyms via .standardize("var_index") for remaining terms:
→ fix typos, remove non-existent values, or save terms via .add_new_from_var_index()
• validating categoricals in modality "hto"...
• mapping "var_index" on Feature.name
! 12 terms are not validated: 'rep1-tx', 'rep1-ctrl', 'rep2-tx', 'rep2-ctrl', 'PDL1g1-tx', 'PDL1g1-ctrl', 'PDL1g2-tx', 'PDL1g2-ctrl', 'rep3-tx', 'rep3-ctrl', 'rep4-tx', 'rep4-ctrl'
→ fix typos, remove non-existent values, or save terms via .add_new_from_var_index()
✓ "technique" is validated against ExperimentalFactor.name
• validating categoricals in modality "gdo"...
• mapping "var_index" on Feature.name
! 111 terms are not validated: 'eGFPg1', 'CUL3g1', 'CUL3g2', 'CUL3g3', 'CMTM6g1', 'CMTM6g2', 'CMTM6g3', 'NTg1', 'NTg2', 'NTg3', 'NTg4', 'NTg5', 'NTg7', 'PDL1g1', 'PDL1g2', 'PDL1g3', 'ATF2g1', 'ATF2g2', 'ATF2g3', 'ATF2g4', ...
→ fix typos, remove non-existent values, or save terms via .add_new_from_var_index()
False
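The log for the `rna` modality distinguishes two cases among the non-validated gene symbols: some are outdated synonyms of current symbols, and the rest are simply unknown. The sketch below illustrates the first case in plain Python, using a few synonym pairs copied from the log above. The function `standardize_symbols` is a hypothetical stand-in and should not be read as how lamindb implements `.standardize()`.

```python
# A subset of the synonym pairs reported in the validation log above.
SYNONYMS = {
    "CTC-467M3.1": "MEF2C-AS2",
    "HIST1H4K": "H4C12",
    "CASC1": "DNAI7",
    "LARGE": "LARGE1",
}


def standardize_symbols(symbols, synonyms):
    """Replace known synonyms with their current symbol; keep other symbols as-is."""
    return [synonyms.get(symbol, symbol) for symbol in symbols]


symbols = ["HIST1H4K", "CASC1", "RP5-827C21.6"]
standardized = standardize_symbols(symbols, SYNONYMS)

# Symbols without a known synonym stay unchanged and would still need to be
# fixed, removed, or registered as new records.
unchanged = [old for old, new in zip(symbols, standardized) if old == new]

print(standardized)
print(unchanged)
```

Mapping synonyms first keeps the registry aligned with current gene symbols; the symbols that remain are the ones the log suggests fixing, removing, or saving as new terms.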
# add new var index
curate.add_new_from_var_index("rna")
curate.add_new_from_var_index("hto")
curate.add_new_from_var_index("gdo")
# add new categories
curate.add_new_from("perturbation")
curate.add_new_from("replicate")
✓ added 96 records with Gene.symbol for "var_index": 'RP5-827C21.6', 'XX-CR54.1', 'RP11-379B18.5', 'RP11-778D9.12', 'RP11-703G6.1', 'AC005150.1', 'RP11-717H13.1', 'CTC-498J12.1', 'CTC-467M3.1', 'HIST1H4K', 'RP11-524H19.2', 'AC006042.7', 'AC002066.1', 'AC073934.6', 'RP11-268G12.1', 'U52111.14', 'RP11-235C23.5', 'RP11-12J10.3', 'CASC1', 'RP11-324E6.9', ...
✓ added 12 records with Feature.name for "var_index": 'rep1-tx', 'rep1-ctrl', 'rep2-tx', 'rep2-ctrl', 'PDL1g1-tx', 'PDL1g1-ctrl', 'PDL1g2-tx', 'PDL1g2-ctrl', 'rep3-tx', 'rep3-ctrl', 'rep4-tx', 'rep4-ctrl'
✓ added 111 records with Feature.name for "var_index": 'eGFPg1', 'CUL3g1', 'CUL3g2', 'CUL3g3', 'CMTM6g1', 'CMTM6g2', 'CMTM6g3', 'NTg1', 'NTg2', 'NTg3', 'NTg4', 'NTg5', 'NTg7', 'PDL1g1', 'PDL1g2', 'PDL1g3', 'ATF2g1', 'ATF2g2', 'ATF2g3', 'ATF2g4', ...
✓ added 2 records with ULabel.name for "perturbation": 'Perturbed', 'NT'
✓ added 3 records with ULabel.name for "replicate": 'rep3', 'rep2', 'rep1'
curate.validate()
• validating categoricals in "obs"...
✓ "perturbation" is validated against ULabel.name
✓ "replicate" is validated against ULabel.name
• validating categoricals in modality "adt"...
✓ "var_index" is validated against CellMarker.name
• validating categoricals in modality "rna"...
✓ "var_index" is validated against Gene.symbol
• validating categoricals in modality "hto"...
✓ "var_index" is validated against Feature.name
✓ "technique" is validated against ExperimentalFactor.name
• validating categoricals in modality "gdo"...
✓ "var_index" is validated against Feature.name
True
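The two `validate()` calls follow a simple pattern: validation fails while values are missing from a registry and succeeds once those values have been registered. The toy class below sketches this pattern in plain Python with the `perturbation` values from the log. `ToyRegistry` and its methods are hypothetical and only mimic the behavior seen in the output; they are not lamindb's API.

```python
class ToyRegistry:
    """A stand-in for a registry of known terms (illustration only)."""

    def __init__(self, known=()):
        self.known = set(known)

    def non_validated(self, values):
        """Return the distinct values absent from the registry, sorted."""
        return sorted(set(values) - self.known)

    def validate(self, values):
        """Return True only if every value is already registered."""
        return not self.non_validated(values)

    def add_new_from(self, values):
        """Register the given values as new terms."""
        self.known.update(values)


perturbations = ["Perturbed", "NT", "NT", "Perturbed"]
registry = ToyRegistry()

first_pass = registry.validate(perturbations)    # nothing is registered yet
missing = registry.non_validated(perturbations)
registry.add_new_from(missing)
second_pass = registry.validate(perturbations)   # all values are now known

print(first_pass, missing, second_pass)
```

Checking the boolean result before saving is a natural way to make sure only fully validated data gets registered.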
Register curated artifact
artifact = curate.save_artifact(description="Sub-sampled MuData from Papalexi21")
! no run & transform got linked, call `ln.track()` & re-run
! run input wasn't tracked, call `ln.track()` and re-run
! did not create Feature records for 37 non-validated names: 'adt:G2M.Score', 'adt:HTO_classification', 'adt:MULTI_ID', 'adt:NT', 'adt:Phase', 'adt:S.Score', 'adt:gene_target', 'adt:guide_ID', 'adt:orig.ident', 'adt:percent.mito', 'adt:perturbation', 'adt:replicate', 'gdo:G2M.Score', 'gdo:HTO_classification', 'gdo:MULTI_ID', 'gdo:NT', 'gdo:Phase', 'gdo:S.Score', 'gdo:gene_target', 'gdo:guide_ID', ...
! 3 unique terms (100.00%) are not validated for name: 'nCount_RNA', 'nFeature_RNA', 'percent.mito'
! skip linking features to artifact in slot 'obs'
! 2 unique terms (100.00%) are not validated for name: 'nCount_ADT', 'nFeature_ADT'
! skip linking features to artifact in slot 'obs'
! 2 unique terms (66.70%) are not validated for name: 'nCount_HTO', 'nFeature_HTO'
! did not create Feature records for 2 non-validated names: 'nCount_HTO', 'nFeature_HTO'
! 1 unique term (100.00%) is not validated for name: 'nCount_GDO'
! skip linking features to artifact in slot 'obs'
artifact.describe()
Artifact .h5mu/MuData
├── General
│   ├── .uid = 'E23fsaGkZ4qrRdDF0000'
│   ├── .size = 549984
│   ├── .hash = 'aFIJ7G9AIcxoEib8kecChw'
│   ├── .n_observations = 200
│   ├── .path =
│   │   /home/runner/work/lamin-usecases/lamin-usecases/docs/test-multimodal/.lamindb/E23fsaGkZ4qrRdDF0000.h5mu
│   ├── .created_by = testuser1 (Test User1)
│   └── .created_at = 2024-12-20 15:08:00
├── Dataset features/.feature_sets
│   ├── obs • 2 [Feature]
│   │   perturbation cat[ULabel] NT, Perturbed
│   │   replicate cat[ULabel] rep1, rep2, rep3
│   ├── ['rna'].var • 184 [bionty.Gene]
│   │   SH2D6 float
│   │   ARHGAP26-AS1 float
│   │   GABRA1 float
│   │   HLA-DQB1-AS1 float
│   │   HLA-DQB1-AS1 float
│   │   HLA-DQB1-AS1 float
│   │   HLA-DQB1-AS1 float
│   │   HLA-DQB1-AS1 float
│   │   HLA-DQB1-AS1 float
│   │   HLA-DQB1-AS1 float
│   │   SPACA1 float
│   │   VNN1 float
│   │   CTAGE15 float
│   │   CTAGE15 float
│   │   PFKFB1 float
│   │   TRPC5 float
│   │   RBPMS-AS1 float
│   │   CA8 float
│   │   CSMD3 float
│   │   ZNF483 float
│   ├── ['adt'].var • 4 [bionty.CellMarker]
│   │   CD86 float
│   │   PDL1 float
│   │   PDL2 float
│   │   CD366 float
│   ├── ['hto'].var • 12 [Feature]
│   │   rep1-tx cat
│   │   rep1-ctrl cat
│   │   rep2-tx cat
│   │   rep2-ctrl cat
│   │   PDL1g1-tx cat
│   │   PDL1g1-ctrl cat
│   │   PDL1g2-tx cat
│   │   PDL1g2-ctrl cat
│   │   rep3-tx cat
│   │   rep3-ctrl cat
│   │   rep4-tx cat
│   │   rep4-ctrl cat
│   ├── ['hto'].obs • 1 [Feature]
│   │   technique cat[bionty.ExperimentalF… cell hashing
│   └── ['gdo'].var • 111 [Feature]
│       eGFPg1 cat
│       CUL3g1 cat
│       CUL3g2 cat
│       CUL3g3 cat
│       CMTM6g1 cat
│       CMTM6g2 cat
│       CMTM6g3 cat
│       NTg1 cat
│       NTg2 cat
│       NTg3 cat
│       NTg4 cat
│       NTg5 cat
│       NTg7 cat
│       PDL1g1 cat
│       PDL1g2 cat
│       PDL1g3 cat
│       ATF2g1 cat
│       ATF2g2 cat
│       ATF2g3 cat
│       ATF2g4 cat
└── Labels
    └── .experimental_factors bionty.ExperimentalFactor cell hashing
        .ulabels ULabel Perturbed, NT, rep3, rep2, rep1
# clean up test instance
!rm -r test-multimodal
!lamin delete --force test-multimodal
• deleting instance testuser1/test-multimodal