Multi-modal
Here, we’ll showcase how to curate and register ECCITE-seq data from Papalexi21 in the form of MuData objects.
ECCITE-seq is designed to enable interrogation of single-cell transcriptomes together with surface protein markers in the context of CRISPR screens.
MuData objects build on AnnData objects: each modality is stored as its own AnnData, and the MuData container ties the modalities together.
%load_ext autoreload
%autoreload 2
# !pip install 'lamindb[jupyter,bionty]'
!lamin init --storage ./test-multimodal --schema bionty
→ connected lamindb: testuser1/test-multimodal
import lamindb as ln
import bionty as bt
→ connected lamindb: testuser1/test-multimodal
mdata = ln.core.datasets.mudata_papalexi21_subset()
mdata
MuData object with n_obs × n_vars = 200 × 300
  obs: 'perturbation', 'replicate'
  var: 'name'
  4 modalities
    rna: 200 x 173
      obs: 'nCount_RNA', 'nFeature_RNA', 'percent.mito'
      var: 'name'
    adt: 200 x 4
      obs: 'nCount_ADT', 'nFeature_ADT'
      var: 'name'
    hto: 200 x 12
      obs: 'nCount_HTO', 'nFeature_HTO', 'technique'
      var: 'name'
    gdo: 200 x 111
      obs: 'nCount_GDO'
      var: 'name'
Validate annotations
curate = ln.Curator.from_mudata(
    mdata,
    var_index={
        "rna": bt.Gene.symbol,  # gene expression
        "adt": bt.CellMarker.name,  # antibody derived tags reflecting surface proteins
        "hto": ln.Feature.name,  # cell hashing
        "gdo": ln.Feature.name,  # guide RNAs
    },
    categoricals={
        "perturbation": ln.ULabel.name,  # shared categorical
        "replicate": ln.ULabel.name,  # shared categorical
        "hto:technique": bt.ExperimentalFactor.name,  # note this is a modality specific categorical
    },
    organism="human",
)
✓ added 1 record with Feature.name for columns: 'technique'
✓ added 2 records with Feature.name for columns: 'perturbation', 'replicate'
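The `hto:technique` key above uses a `modality:column` prefix to point at a column inside one modality's `obs`, whereas unprefixed keys such as `perturbation` refer to the shared `obs`. The helper below is a hypothetical, pure-Python illustration of how such keys could be split; `split_categorical_key` is not part of lamindb, and the curator's internal handling may differ.

```python
from typing import Optional, Tuple


def split_categorical_key(key: str) -> Tuple[Optional[str], str]:
    """Split 'modality:column' into (modality, column).

    Keys without a prefix are treated as columns of the shared obs
    and returned with modality=None. Hypothetical helper for illustration.
    """
    modality, sep, column = key.partition(":")
    if not sep:
        return None, key
    return modality, column


for key in ["perturbation", "replicate", "hto:technique"]:
    print(key, "->", split_categorical_key(key))
```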
# optional: register additional columns we'd like to curate
curate.add_new_from_columns(modality="rna")
curate.add_new_from_columns(modality="adt")
curate.add_new_from_columns(modality="hto")
curate.add_new_from_columns(modality="gdo")
✓ added 3 records with Feature.name for rna obs columns: 'nCount_RNA', 'nFeature_RNA', 'percent.mito'
✓ added 2 records with Feature.name for adt obs columns: 'nCount_ADT', 'nFeature_ADT'
✓ added 2 records with Feature.name for hto obs columns: 'nCount_HTO', 'nFeature_HTO'
✓ added 1 record with Feature.name for gdo obs columns: 'nCount_GDO'
curate.validate()
• saving validated records of 'var_index'
• saving validated records of 'var_index'
• saving validated records of 'technique'
• mapping rna_var_index on Gene.symbol
! 84 terms are not validated: 'RP5-827C21.6', 'XX-CR54.1', 'RP11-379B18.5', 'RP11-778D9.12', 'RP11-703G6.1', 'AC005150.1', 'RP11-717H13.1', 'CTC-498J12.1', 'RP11-524H19.2', 'AC006042.7', ...
→ fix typos, remove non-existent values, or save terms via .add_new_from_var_index("rna")
✓ 'adt_var_index' is validated against CellMarker.name
• mapping hto_var_index on Feature.name
! 12 terms are not validated: 'rep1-tx', 'rep1-ctrl', 'rep2-tx', 'rep2-ctrl', 'PDL1g1-tx', 'PDL1g1-ctrl', 'PDL1g2-tx', 'PDL1g2-ctrl', 'rep3-tx', 'rep3-ctrl', ...
→ fix typos, remove non-existent values, or save terms via .add_new_from_var_index("hto")
• mapping gdo_var_index on Feature.name
! 111 terms are not validated: 'eGFPg1', 'CUL3g1', 'CUL3g2', 'CUL3g3', 'CMTM6g1', 'CMTM6g2', 'CMTM6g3', 'NTg1', 'NTg2', 'NTg3', ...
→ fix typos, remove non-existent values, or save terms via .add_new_from_var_index("gdo")
• mapping perturbation on ULabel.name
! 2 terms are not validated: 'Perturbed', 'NT'
→ fix typos, remove non-existent values, or save terms via .add_new_from('perturbation')
• mapping replicate on ULabel.name
! 3 terms are not validated: 'rep3', 'rep1', 'rep2'
→ fix typos, remove non-existent values, or save terms via .add_new_from('replicate')
✓ 'technique' is validated against ExperimentalFactor.name
False
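Conceptually, the report above lists, for each field, the values that have no matching record in the target registry, and `validate()` returns `False` while any remain. The sketch below mimics that check with plain Python sets. `find_non_validated` and the example registry contents are hypothetical and only illustrate the idea; they are not lamindb's implementation.

```python
from typing import Iterable, List, Set


def find_non_validated(values: Iterable[str], registry: Set[str]) -> List[str]:
    """Return unique values missing from the registry, in first-seen order."""
    seen: Set[str] = set()
    missing: List[str] = []
    for value in values:
        if value not in registry and value not in seen:
            seen.add(value)
            missing.append(value)
    return missing


# Hypothetical state: no records exist yet for the 'replicate' values
registry: Set[str] = set()
replicate_column = ["rep3", "rep1", "rep2", "rep1"]

missing = find_non_validated(replicate_column, registry)
print(missing)      # ['rep3', 'rep1', 'rep2']
print(not missing)  # False: the column is not yet validated

# Registering the missing terms makes the same check pass
registry.update(missing)
print(find_non_validated(replicate_column, registry))  # []
```

In the notebook, the registering step corresponds to the `.add_new_from(...)` and `.add_new_from_var_index(...)` calls suggested in the validation report.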
# add new var index
curate.add_new_from_var_index("rna")
curate.add_new_from_var_index("hto")
curate.add_new_from_var_index("gdo")
# add new categories
curate.add_new_from("perturbation")
curate.add_new_from("replicate")
✓ added 84 records with Gene.symbol for var_index: 'RP5-827C21.6', 'XX-CR54.1', 'RP11-379B18.5', 'RP11-778D9.12', 'RP11-703G6.1', 'AC005150.1', 'RP11-717H13.1', 'CTC-498J12.1', 'RP11-524H19.2', 'AC006042.7', 'AC002066.1', 'AC073934.6', 'RP11-268G12.1', 'U52111.14', 'RP11-235C23.5', 'RP11-12J10.3', 'RP11-324E6.9', 'RP11-187A9.3', 'RP11-365N19.2', 'RP11-346D14.1', ...
✓ added 12 records with Feature.name for var_index: 'rep1-tx', 'rep1-ctrl', 'rep2-tx', 'rep2-ctrl', 'PDL1g1-tx', 'PDL1g1-ctrl', 'PDL1g2-tx', 'PDL1g2-ctrl', 'rep3-tx', 'rep3-ctrl', 'rep4-tx', 'rep4-ctrl'
✓ added 111 records with Feature.name for var_index: 'eGFPg1', 'CUL3g1', 'CUL3g2', 'CUL3g3', 'CMTM6g1', 'CMTM6g2', 'CMTM6g3', 'NTg1', 'NTg2', 'NTg3', 'NTg4', 'NTg5', 'NTg7', 'PDL1g1', 'PDL1g2', 'PDL1g3', 'ATF2g1', 'ATF2g2', 'ATF2g3', 'ATF2g4', ...
✓ added 2 records with ULabel.name for perturbation: 'Perturbed', 'NT'
✓ added 3 records with ULabel.name for replicate: 'rep1', 'rep2', 'rep3'
curate.validate()
✓ 'rna_var_index' is validated against Gene.symbol
✓ 'adt_var_index' is validated against CellMarker.name
✓ 'hto_var_index' is validated against Feature.name
✓ 'gdo_var_index' is validated against Feature.name
✓ 'perturbation' is validated against ULabel.name
✓ 'replicate' is validated against ULabel.name
✓ 'technique' is validated against ExperimentalFactor.name
True
Register curated artifact
artifact = curate.save_artifact(description="Sub-sampled MuData from Papalexi21")
! no run & transform got linked, call `ln.track()` & re-run
! run input wasn't tracked, call `ln.track()` and re-run
! did not create Feature records for 37 non-validated names: 'adt:G2M.Score', 'adt:HTO_classification', 'adt:MULTI_ID', 'adt:NT', 'adt:Phase', 'adt:S.Score', 'adt:gene_target', 'adt:guide_ID', 'adt:orig.ident', 'adt:percent.mito', 'adt:perturbation', 'adt:replicate', 'gdo:G2M.Score', 'gdo:HTO_classification', 'gdo:MULTI_ID', 'gdo:NT', 'gdo:Phase', 'gdo:S.Score', 'gdo:gene_target', 'gdo:guide_ID', ...
! 12 unique terms (6.90%) are not validated for symbol: 'CTC-467M3.1', 'HIST1H4K', 'CASC1', 'LARGE', 'NBPF16', 'C1orf65', 'IBA57-AS1', 'KIAA1239', 'TMEM75', 'AP003419.16', ...
artifact.describe()
Artifact(uid='HlP2JIW0IdatAR610000', is_latest=True, description='Sub-sampled MuData from Papalexi21', suffix='.h5mu', type='dataset', size=549984, hash='aFIJ7G9AIcxoEib8kecChw', n_observations=200, _hash_type='md5', _accessor='MuData', visibility=1, _key_is_virtual=True, created_at=2024-11-21 06:56:49 UTC)
Provenance
.storage = '/home/runner/work/lamin-usecases/lamin-usecases/docs/test-multimodal'
.created_by = 'testuser1'
Labels
.experimental_factors = 'cell hashing'
.ulabels = 'Perturbed', 'NT', 'rep1', 'rep2', 'rep3'
Features
'perturbation' = 'NT', 'Perturbed'
'replicate' = 'rep1', 'rep2', 'rep3'
'technique' = 'cell hashing'
Feature sets
'obs' = 'perturbation', 'replicate'
'['rna'].var' = 'SH2D6', 'ARHGAP26-AS1', 'GABRA1', 'HLA-DQB1-AS1', 'SPACA1', 'VNN1', 'CTAGE15', 'PFKFB1', 'TRPC5', 'RBPMS-AS1', 'CA8', 'CSMD3', 'ZNF483'
'['rna'].obs' = 'nCount_RNA', 'nFeature_RNA', 'percent.mito'
'['adt'].var' = 'CD86', 'PDL1', 'PDL2', 'CD366'
'['adt'].obs' = 'nCount_ADT', 'nFeature_ADT'
'['hto'].var' = 'rep1-tx', 'rep1-ctrl', 'rep2-tx', 'rep2-ctrl', 'PDL1g1-tx', 'PDL1g1-ctrl', 'PDL1g2-tx', 'PDL1g2-ctrl', 'rep3-tx', 'rep3-ctrl', 'rep4-tx', 'rep4-ctrl'
'['hto'].obs' = 'technique', 'nCount_HTO', 'nFeature_HTO'
'['gdo'].var' = 'eGFPg1', 'CUL3g1', 'CUL3g2', 'CUL3g3', 'CMTM6g1', 'CMTM6g2', 'CMTM6g3', 'NTg1', 'NTg2', 'NTg3', 'NTg4', 'NTg5', 'NTg7', 'PDL1g1', 'PDL1g2', 'PDL1g3', 'ATF2g1', 'ATF2g2', 'ATF2g3', 'ATF2g4'
'['gdo'].obs' = 'nCount_GDO'
# clean up test instance
!rm -r test-multimodal
!lamin delete --force test-multimodal
• deleting instance testuser1/test-multimodal