
Multi-modal

Here, we’ll showcase how to curate and register ECCITE-seq data from Papalexi21 in the form of MuData objects.

ECCITE-seq is designed to enable interrogation of single-cell transcriptomes together with surface protein markers in the context of CRISPR screens.

MuData objects build on top of AnnData objects to store multimodal data: each modality is stored as its own AnnData object, alongside annotations shared across modalities.
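To make that layout concrete, here is a minimal plain-Python sketch of how a multimodal container relates to its per-modality tables. The classes `Modality` and `MultiModalContainer` are hypothetical stand-ins, not the mudata or anndata API. They only model shapes: modalities share cells (observations) but each has its own features (variables).

```python
from dataclasses import dataclass, field


@dataclass
class Modality:
    """Hypothetical stand-in for one AnnData-like table."""
    obs_names: list[str]  # cell identifiers (rows)
    var_names: list[str]  # feature identifiers (columns)

    @property
    def shape(self) -> tuple[int, int]:
        return len(self.obs_names), len(self.var_names)


@dataclass
class MultiModalContainer:
    """Hypothetical stand-in for a MuData-like object: one table per modality."""
    mod: dict[str, Modality] = field(default_factory=dict)

    @property
    def shape(self) -> tuple[int, int]:
        # Global observations: here computed as the union of cell IDs.
        obs = set()
        for m in self.mod.values():
            obs.update(m.obs_names)
        # Global variables: every modality contributes its own features.
        n_vars = sum(len(m.var_names) for m in self.mod.values())
        return len(obs), n_vars


cells = [f"cell{i}" for i in range(3)]
container = MultiModalContainer(
    mod={
        "rna": Modality(cells, ["SH2D6", "GABRA1"]),
        "adt": Modality(cells, ["PDL1"]),
    }
)
print(container.shape)  # (3, 3)
```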

%load_ext autoreload
%autoreload 2
# !pip install 'lamindb[jupyter,bionty]'
!lamin init --storage ./test-multimodal --schema bionty
→ connected lamindb: testuser1/test-multimodal
import lamindb as ln
import bionty as bt
→ connected lamindb: testuser1/test-multimodal
mdata = ln.core.datasets.mudata_papalexi21_subset()
mdata
MuData object with n_obs × n_vars = 200 × 300
  obs:	'perturbation', 'replicate'
  var:	'name'
  4 modalities
    rna:	200 x 173
      obs:	'nCount_RNA', 'nFeature_RNA', 'percent.mito'
      var:	'name'
    adt:	200 x 4
      obs:	'nCount_ADT', 'nFeature_ADT'
      var:	'name'
    hto:	200 x 12
      obs:	'nCount_HTO', 'nFeature_HTO', 'technique'
      var:	'name'
    gdo:	200 x 111
      obs:	'nCount_GDO'
      var:	'name'
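The top-level shape in this summary is consistent with the per-modality shapes: all four modalities share the same 200 cells, and the 300 top-level variables are the sum of the per-modality variable counts. A quick check using the numbers printed above:

```python
# Per-modality shapes copied from the printed MuData summary.
modality_shapes = {
    "rna": (200, 173),
    "adt": (200, 4),
    "hto": (200, 12),
    "gdo": (200, 111),
}

# All modalities share the same observations (cells).
n_obs = {n for n, _ in modality_shapes.values()}
assert n_obs == {200}

# Top-level variables combine every modality's variables.
n_vars = sum(n for _, n in modality_shapes.values())
print(n_vars)  # 300
```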

Validate annotations

curate = ln.Curator.from_mudata(
    mdata,
    var_index={
        "rna": bt.Gene.symbol,  # gene expression
        "adt": bt.CellMarker.name,  # antibody derived tags reflecting surface proteins
        "hto": ln.Feature.name,  # cell hashing
        "gdo": ln.Feature.name,  # guide RNAs
    },
    categoricals={
        "perturbation": ln.ULabel.name,  # shared categorical
        "replicate": ln.ULabel.name,  # shared categorical
        "hto:technique": bt.ExperimentalFactor.name,  # modality-specific categorical, keyed as "modality:column"
    },
    organism="human",
)
✓ added 1 record with Feature.name for columns: 'technique'
✓ added 2 records with Feature.name for columns: 'perturbation', 'replicate'
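The `categoricals` mapping mixes two kinds of keys: plain column names such as `"perturbation"`, commented above as shared categoricals of the top-level `obs`, and `"modality:column"` keys such as `"hto:technique"`, which refer to one modality's `obs`. The helper `split_categorical_key` below is a hypothetical sketch of how such keys can be split and grouped; it is not lamindb's implementation. The grouping happens to mirror the two separate log lines above (shared columns vs. the `hto` column).

```python
from typing import Optional


def split_categorical_key(key: str) -> tuple[Optional[str], str]:
    """Split 'modality:column' into (modality, column).

    Keys without a colon are treated as shared top-level obs columns,
    signalled here by a modality of None. Hypothetical helper.
    """
    modality, sep, column = key.partition(":")
    if not sep:
        return None, key
    return modality, column


keys = ["perturbation", "replicate", "hto:technique"]

# Group column names by the modality they belong to (None = shared obs).
groups: dict[Optional[str], list[str]] = {}
for key in keys:
    modality, column = split_categorical_key(key)
    groups.setdefault(modality, []).append(column)

print(groups)  # {None: ['perturbation', 'replicate'], 'hto': ['technique']}
```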
# optional: register additional columns we'd like to curate
curate.add_new_from_columns(modality="rna")
curate.add_new_from_columns(modality="adt")
curate.add_new_from_columns(modality="hto")
curate.add_new_from_columns(modality="gdo")
✓ added 3 records with Feature.name for rna obs columns: 'nCount_RNA', 'nFeature_RNA', 'percent.mito'
✓ added 2 records with Feature.name for adt obs columns: 'nCount_ADT', 'nFeature_ADT'
✓ added 2 records with Feature.name for hto obs columns: 'nCount_HTO', 'nFeature_HTO'
✓ added 1 record with Feature.name for gdo obs columns: 'nCount_GDO'
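The logged counts follow from a set difference: only obs columns without an existing `Feature` record are added. That is why `hto` gains two records rather than three, since `technique` was already registered when the curator was created. The sketch below reproduces this bookkeeping with a plain set standing in for the Feature registry; it is an illustration under that assumption, not lamindb's code.

```python
# Names already in the simulated Feature registry after creating the curator.
registry = {"perturbation", "replicate", "technique"}

# Per-modality obs columns, copied from the MuData summary.
obs_columns = {
    "rna": ["nCount_RNA", "nFeature_RNA", "percent.mito"],
    "adt": ["nCount_ADT", "nFeature_ADT"],
    "hto": ["nCount_HTO", "nFeature_HTO", "technique"],
    "gdo": ["nCount_GDO"],
}

added = {}
for modality, columns in obs_columns.items():
    # Only columns not yet registered become new records.
    new = [c for c in columns if c not in registry]
    registry.update(new)
    added[modality] = new
    print(f"{modality}: added {len(new)} -> {new}")
```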
curate.validate()
• saving validated records of 'var_index'
• saving validated records of 'var_index'
• saving validated records of 'technique'
• mapping rna_var_index on Gene.symbol
!    84 terms are not validated: 'RP5-827C21.6', 'XX-CR54.1', 'RP11-379B18.5', 'RP11-778D9.12', 'RP11-703G6.1', 'AC005150.1', 'RP11-717H13.1', 'CTC-498J12.1', 'RP11-524H19.2', 'AC006042.7', ...
→ fix typos, remove non-existent values, or save terms via .add_new_from_var_index("rna")
✓ 'adt_var_index' is validated against CellMarker.name
• mapping hto_var_index on Feature.name
!    12 terms are not validated: 'rep1-tx', 'rep1-ctrl', 'rep2-tx', 'rep2-ctrl', 'PDL1g1-tx', 'PDL1g1-ctrl', 'PDL1g2-tx', 'PDL1g2-ctrl', 'rep3-tx', 'rep3-ctrl', ...
→ fix typos, remove non-existent values, or save terms via .add_new_from_var_index("hto")
• mapping gdo_var_index on Feature.name
!    111 terms are not validated: 'eGFPg1', 'CUL3g1', 'CUL3g2', 'CUL3g3', 'CMTM6g1', 'CMTM6g2', 'CMTM6g3', 'NTg1', 'NTg2', 'NTg3', ...
→ fix typos, remove non-existent values, or save terms via .add_new_from_var_index("gdo")
• mapping perturbation on ULabel.name
!    2 terms are not validated: 'Perturbed', 'NT'
→ fix typos, remove non-existent values, or save terms via .add_new_from('perturbation')
• mapping replicate on ULabel.name
!    3 terms are not validated: 'rep3', 'rep1', 'rep2'
→ fix typos, remove non-existent values, or save terms via .add_new_from('replicate')
✓ 'technique' is validated against ExperimentalFactor.name
False
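Conceptually, `validate()` returns `True` only if every configured key passes, meaning each of its values already exists in the registry field it is mapped to. The sketch below reproduces that all-or-nothing logic with plain sets. The registry contents are toy assumptions, and the sketch does not model the ontology lookups hinted at by the "saving validated records" log lines.

```python
def non_validated(values, registry):
    """Return the unique values missing from the registry, in first-seen order."""
    missing = []
    for v in values:
        if v not in registry and v not in missing:
            missing.append(v)
    return missing


def validate(columns, registries):
    """True only if every column's values are all present in its registry."""
    ok = True
    for key, values in columns.items():
        missing = non_validated(values, registries[key])
        if missing:
            print(f"{len(missing)} terms are not validated for {key!r}: {missing}")
            ok = False
        else:
            print(f"{key!r} is validated")
    return ok


columns = {
    "replicate": ["rep3", "rep1", "rep2", "rep1"],
    "technique": ["cell hashing", "cell hashing"],
}
registries = {
    "replicate": set(),  # no ULabel records yet
    "technique": {"cell hashing"},  # already known
}
print(validate(columns, registries))  # False
```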
# add new var index
curate.add_new_from_var_index("rna")
curate.add_new_from_var_index("hto")
curate.add_new_from_var_index("gdo")

# add new categories
curate.add_new_from("perturbation")
curate.add_new_from("replicate")
✓ added 84 records with Gene.symbol for var_index: 'RP5-827C21.6', 'XX-CR54.1', 'RP11-379B18.5', 'RP11-778D9.12', 'RP11-703G6.1', 'AC005150.1', 'RP11-717H13.1', 'CTC-498J12.1', 'RP11-524H19.2', 'AC006042.7', 'AC002066.1', 'AC073934.6', 'RP11-268G12.1', 'U52111.14', 'RP11-235C23.5', 'RP11-12J10.3', 'RP11-324E6.9', 'RP11-187A9.3', 'RP11-365N19.2', 'RP11-346D14.1', ...
✓ added 12 records with Feature.name for var_index: 'rep1-tx', 'rep1-ctrl', 'rep2-tx', 'rep2-ctrl', 'PDL1g1-tx', 'PDL1g1-ctrl', 'PDL1g2-tx', 'PDL1g2-ctrl', 'rep3-tx', 'rep3-ctrl', 'rep4-tx', 'rep4-ctrl'
✓ added 111 records with Feature.name for var_index: 'eGFPg1', 'CUL3g1', 'CUL3g2', 'CUL3g3', 'CMTM6g1', 'CMTM6g2', 'CMTM6g3', 'NTg1', 'NTg2', 'NTg3', 'NTg4', 'NTg5', 'NTg7', 'PDL1g1', 'PDL1g2', 'PDL1g3', 'ATF2g1', 'ATF2g2', 'ATF2g3', 'ATF2g4', ...
✓ added 2 records with ULabel.name for perturbation: 'Perturbed', 'NT'
✓ added 3 records with ULabel.name for replicate: 'rep1', 'rep2', 'rep3'
curate.validate()
✓ 'rna_var_index' is validated against Gene.symbol
✓ 'adt_var_index' is validated against CellMarker.name
✓ 'hto_var_index' is validated against Feature.name
✓ 'gdo_var_index' is validated against Feature.name
✓ 'perturbation' is validated against ULabel.name
✓ 'replicate' is validated against ULabel.name
✓ 'technique' is validated against ExperimentalFactor.name
True
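The fix-and-revalidate loop amounts to registering the missing terms and checking again. Continuing the plain-set analogy for the `replicate` column (an illustration, not lamindb's code; `missing_terms` is a hypothetical helper):

```python
def missing_terms(values, registry):
    """Unique values not yet in the registry, in first-seen order."""
    return list(dict.fromkeys(v for v in values if v not in registry))


replicates = ["rep3", "rep1", "rep2", "rep1"]
ulabel_registry = set()  # simulated ULabel names

# First pass: nothing is registered yet, so validation fails.
print(missing_terms(replicates, ulabel_registry))  # ['rep3', 'rep1', 'rep2']

# Analogue of curate.add_new_from("replicate"): register the missing terms.
ulabel_registry.update(missing_terms(replicates, ulabel_registry))

# Second pass: every value now resolves, so validation passes.
print(not missing_terms(replicates, ulabel_registry))  # True
```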

Register curated artifact

artifact = curate.save_artifact(description="Sub-sampled MuData from Papalexi21")
! no run & transform got linked, call `ln.track()` & re-run
! run input wasn't tracked, call `ln.track()` and re-run
! did not create Feature records for 37 non-validated names: 'adt:G2M.Score', 'adt:HTO_classification', 'adt:MULTI_ID', 'adt:NT', 'adt:Phase', 'adt:S.Score', 'adt:gene_target', 'adt:guide_ID', 'adt:orig.ident', 'adt:percent.mito', 'adt:perturbation', 'adt:replicate', 'gdo:G2M.Score', 'gdo:HTO_classification', 'gdo:MULTI_ID', 'gdo:NT', 'gdo:Phase', 'gdo:S.Score', 'gdo:gene_target', 'gdo:guide_ID', ...
!    12 unique terms (6.90%) are not validated for symbol: 'CTC-467M3.1', 'HIST1H4K', 'CASC1', 'LARGE', 'NBPF16', 'C1orf65', 'IBA57-AS1', 'KIAA1239', 'TMEM75', 'AP003419.16', ...
artifact.describe()
Artifact(uid='HlP2JIW0IdatAR610000', is_latest=True, description='Sub-sampled MuData from Papalexi21', suffix='.h5mu', type='dataset', size=549984, hash='aFIJ7G9AIcxoEib8kecChw', n_observations=200, _hash_type='md5', _accessor='MuData', visibility=1, _key_is_virtual=True, created_at=2024-11-21 06:56:49 UTC)
  Provenance
    .storage = '/home/runner/work/lamin-usecases/lamin-usecases/docs/test-multimodal'
    .created_by = 'testuser1'
  Labels
    .experimental_factors = 'cell hashing'
    .ulabels = 'Perturbed', 'NT', 'rep1', 'rep2', 'rep3'
  Features
    'perturbation' = 'NT', 'Perturbed'
    'replicate' = 'rep1', 'rep2', 'rep3'
    'technique' = 'cell hashing'
  Feature sets
    'obs' = 'perturbation', 'replicate'
    '['rna'].var' = 'SH2D6', 'ARHGAP26-AS1', 'GABRA1', 'HLA-DQB1-AS1', 'SPACA1', 'VNN1', 'CTAGE15', 'PFKFB1', 'TRPC5', 'RBPMS-AS1', 'CA8', 'CSMD3', 'ZNF483'
    '['rna'].obs' = 'nCount_RNA', 'nFeature_RNA', 'percent.mito'
    '['adt'].var' = 'CD86', 'PDL1', 'PDL2', 'CD366'
    '['adt'].obs' = 'nCount_ADT', 'nFeature_ADT'
    '['hto'].var' = 'rep1-tx', 'rep1-ctrl', 'rep2-tx', 'rep2-ctrl', 'PDL1g1-tx', 'PDL1g1-ctrl', 'PDL1g2-tx', 'PDL1g2-ctrl', 'rep3-tx', 'rep3-ctrl', 'rep4-tx', 'rep4-ctrl'
    '['hto'].obs' = 'technique', 'nCount_HTO', 'nFeature_HTO'
    '['gdo'].var' = 'eGFPg1', 'CUL3g1', 'CUL3g2', 'CUL3g3', 'CMTM6g1', 'CMTM6g2', 'CMTM6g3', 'NTg1', 'NTg2', 'NTg3', 'NTg4', 'NTg5', 'NTg7', 'PDL1g1', 'PDL1g2', 'PDL1g3', 'ATF2g1', 'ATF2g2', 'ATF2g3', 'ATF2g4'
    '['gdo'].obs' = 'nCount_GDO'
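The description lists one feature set for the shared top-level `obs`, plus a `var` and an `obs` feature set for each modality, labelled like `['rna'].var`. The sketch below only reconstructs those display labels from the modality names to show the pattern (1 + 2 × 4 = 9 feature sets); it assumes nothing about how lamindb stores them internally.

```python
modalities = ["rna", "adt", "hto", "gdo"]

# One feature set for the shared (top-level) obs columns ...
labels = ["obs"]
# ... plus one for each modality's var index and one for its obs columns.
for mod in modalities:
    labels += [f"['{mod}'].var", f"['{mod}'].obs"]

print(len(labels))  # 9
print(labels[:3])  # ['obs', "['rna'].var", "['rna'].obs"]
```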
# clean up test instance
!rm -r test-multimodal
!lamin delete --force test-multimodal
• deleting instance testuser1/test-multimodal