Multi-modal

This guide shows how to curate and register ECCITE-seq data from Papalexi21 as a MuData object.

ECCITE-seq profiles single-cell transcriptomes together with surface protein markers in the context of CRISPR screens.

MuData objects build on top of AnnData objects to store multimodal data.

%load_ext autoreload
%autoreload 2
# !pip install 'lamindb[jupyter,bionty]'
!lamin init --storage ./test-multimodal --schema bionty
 connected lamindb: testuser1/test-multimodal
import lamindb as ln
import bionty as bt
 connected lamindb: testuser1/test-multimodal
mdata = ln.core.datasets.mudata_papalexi21_subset()
mdata
MuData object with n_obs × n_vars = 200 × 300
  obs:	'perturbation', 'replicate'
  var:	'name'
  4 modalities
    rna:	200 x 173
      obs:	'nCount_RNA', 'nFeature_RNA', 'percent.mito'
      var:	'name'
    adt:	200 x 4
      obs:	'nCount_ADT', 'nFeature_ADT'
      var:	'name'
    hto:	200 x 12
      obs:	'nCount_HTO', 'nFeature_HTO', 'technique'
      var:	'name'
    gdo:	200 x 111
      obs:	'nCount_GDO'
      var:	'name'

Validate annotations

curate = ln.Curator.from_mudata(
    mdata,
    var_index={
        "rna": bt.Gene.symbol,  # gene expression
        "adt": bt.CellMarker.name,  # antibody-derived tags reflecting surface proteins
        "hto": ln.Feature.name,  # cell hashing
        "gdo": ln.Feature.name,  # guide RNAs
    },
    categoricals={
        "perturbation": ln.ULabel.name,  # shared categorical
        "replicate": ln.ULabel.name,  # shared categorical
        "hto:technique": bt.ExperimentalFactor.name,  # note: this is a modality-specific categorical
    },
    organism="human",
)
 added 2 records with Feature.name for "columns": 'perturbation', 'replicate'
! indexing datasets with gene symbols can be problematic: https://docs.lamin.ai/faq/symbol-mapping
 added 1 record with Feature.name for "columns": 'technique'
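
The `categoricals` mapping above mixes plain keys such as "perturbation", which refer to shared columns, with prefixed keys such as "hto:technique", which refer to a column inside one modality. The following is a small illustrative sketch of that naming convention in plain Python. The helper `split_categorical_key` is hypothetical and is not part of lamindb; how the library parses these keys internally is not shown in this guide.

```python
from typing import Optional, Tuple


def split_categorical_key(key: str) -> Tuple[Optional[str], str]:
    """Split a 'modality:column' key; plain keys carry no modality."""
    modality, sep, column = key.partition(":")
    if not sep:
        # No prefix: the column is shared across modalities
        return None, key
    return modality, column


for key in ["perturbation", "replicate", "hto:technique"]:
    modality, column = split_categorical_key(key)
    scope = "shared obs" if modality is None else f"obs of modality '{modality}'"
    print(f"{key!r}: column {column!r} in {scope}")
```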
# optional: register additional columns we'd like to curate
# note: deprecated, as this step now runs by default during initialization
curate.add_new_from_columns(modality="rna")
curate.add_new_from_columns(modality="adt")
curate.add_new_from_columns(modality="hto")
curate.add_new_from_columns(modality="gdo")
/tmp/ipykernel_3815/1003816735.py:2: DeprecationWarning: `.add_new_from_columns()` is deprecated and will be removed in a future version. It's run by default during initialization.
  curate.add_new_from_columns(modality="rna")
/tmp/ipykernel_3815/1003816735.py:3: DeprecationWarning: `.add_new_from_columns()` is deprecated and will be removed in a future version. It's run by default during initialization.
  curate.add_new_from_columns(modality="adt")
/tmp/ipykernel_3815/1003816735.py:4: DeprecationWarning: `.add_new_from_columns()` is deprecated and will be removed in a future version. It's run by default during initialization.
  curate.add_new_from_columns(modality="hto")
/tmp/ipykernel_3815/1003816735.py:5: DeprecationWarning: `.add_new_from_columns()` is deprecated and will be removed in a future version. It's run by default during initialization.
  curate.add_new_from_columns(modality="gdo")
curate.validate()
 saving validated records of 'var_index'
 saving validated records of 'var_index'
 saving validated records of 'technique'
 validating categoricals in "obs"...
 mapping "perturbation" on ULabel.name
!   2 terms are not validated: 'Perturbed', 'NT'
    → fix typos, remove non-existent values, or save terms via .add_new_from("perturbation")
 mapping "replicate" on ULabel.name
!   3 terms are not validated: 'rep3', 'rep1', 'rep2'
    → fix typos, remove non-existent values, or save terms via .add_new_from("replicate")

 validating categoricals in modality "adt"...
 "var_index" is validated against CellMarker.name

 validating categoricals in modality "rna"...
 mapping "var_index" on Gene.symbol
!   96 terms are not validated: 'RP5-827C21.6', 'XX-CR54.1', 'RP11-379B18.5', 'RP11-778D9.12', 'RP11-703G6.1', 'AC005150.1', 'RP11-717H13.1', 'CTC-498J12.1', 'CTC-467M3.1', 'HIST1H4K', 'RP11-524H19.2', 'AC006042.7', 'AC002066.1', 'AC073934.6', 'RP11-268G12.1', 'U52111.14', 'RP11-235C23.5', 'RP11-12J10.3', 'CASC1', 'RP11-324E6.9', ...
    12 synonyms found: "CTC-467M3.1" → "MEF2C-AS2", "HIST1H4K" → "H4C12", "CASC1" → "DNAI7", "LARGE" → "LARGE1", "NBPF16" → "NBPF15", "C1orf65" → "CCDC185", "IBA57-AS1" → "IBA57-DT", "KIAA1239" → "NWD2", "TMEM75" → "LINC02912", "AP003419.16" → "RPS6KB2-AS1", "FAM65C" → "RIPOR3", "C14orf177" → "LINC02914"
    → curate synonyms via .standardize("var_index")    for remaining terms:
    → fix typos, remove non-existent values, or save terms via .add_new_from_var_index()

 validating categoricals in modality "hto"...
 mapping "var_index" on Feature.name
!   12 terms are not validated: 'rep1-tx', 'rep1-ctrl', 'rep2-tx', 'rep2-ctrl', 'PDL1g1-tx', 'PDL1g1-ctrl', 'PDL1g2-tx', 'PDL1g2-ctrl', 'rep3-tx', 'rep3-ctrl', 'rep4-tx', 'rep4-ctrl'
    → fix typos, remove non-existent values, or save terms via .add_new_from_var_index()
 "technique" is validated against ExperimentalFactor.name

 validating categoricals in modality "gdo"...
 mapping "var_index" on Feature.name
!   111 terms are not validated: 'eGFPg1', 'CUL3g1', 'CUL3g2', 'CUL3g3', 'CMTM6g1', 'CMTM6g2', 'CMTM6g3', 'NTg1', 'NTg2', 'NTg3', 'NTg4', 'NTg5', 'NTg7', 'PDL1g1', 'PDL1g2', 'PDL1g3', 'ATF2g1', 'ATF2g2', 'ATF2g3', 'ATF2g4', ...
    → fix typos, remove non-existent values, or save terms via .add_new_from_var_index()

False
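
The validation log above separates terms into three groups: terms already present in the registry, terms for which a synonym is known, and terms that are unknown. A rough conceptual sketch of that split in plain Python follows. The `check_terms` function and the toy registry are assumptions for illustration only and do not reflect how lamindb implements validation; the example symbols are copied from the log.

```python
def check_terms(terms, registry, synonyms):
    """Partition terms into validated, non-validated, and synonym suggestions."""
    validated = [t for t in terms if t in registry]
    non_validated = [t for t in terms if t not in registry]
    # Suggest a standardized name where a synonym mapping exists
    suggestions = {t: synonyms[t] for t in non_validated if t in synonyms}
    return validated, non_validated, suggestions


# Toy registry and synonym table (hypothetical subset)
registry = {"CD86", "H4C12", "DNAI7"}
synonyms = {"HIST1H4K": "H4C12", "CASC1": "DNAI7"}
terms = ["CD86", "HIST1H4K", "CASC1", "RP11-12J10.3"]

validated, non_validated, suggestions = check_terms(terms, registry, synonyms)
print("validated:", validated)
print("not validated:", non_validated)
print("synonyms found:", suggestions)
# Validation passes only if nothing is left unvalidated
print("passes:", not non_validated)
```

In this sketch, terms with a synonym could be renamed to their standardized form, while the remaining term would have to be corrected, removed, or registered as a new record, mirroring the three options the log suggests.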
# add new var index
curate.add_new_from_var_index("rna")
curate.add_new_from_var_index("hto")
curate.add_new_from_var_index("gdo")

# add new categories
curate.add_new_from("perturbation")
curate.add_new_from("replicate")
 added 96 records with Gene.symbol for "var_index": 'RP5-827C21.6', 'XX-CR54.1', 'RP11-379B18.5', 'RP11-778D9.12', 'RP11-703G6.1', 'AC005150.1', 'RP11-717H13.1', 'CTC-498J12.1', 'CTC-467M3.1', 'HIST1H4K', 'RP11-524H19.2', 'AC006042.7', 'AC002066.1', 'AC073934.6', 'RP11-268G12.1', 'U52111.14', 'RP11-235C23.5', 'RP11-12J10.3', 'CASC1', 'RP11-324E6.9', ...
 added 12 records with Feature.name for "var_index": 'rep1-tx', 'rep1-ctrl', 'rep2-tx', 'rep2-ctrl', 'PDL1g1-tx', 'PDL1g1-ctrl', 'PDL1g2-tx', 'PDL1g2-ctrl', 'rep3-tx', 'rep3-ctrl', 'rep4-tx', 'rep4-ctrl'
 added 111 records with Feature.name for "var_index": 'eGFPg1', 'CUL3g1', 'CUL3g2', 'CUL3g3', 'CMTM6g1', 'CMTM6g2', 'CMTM6g3', 'NTg1', 'NTg2', 'NTg3', 'NTg4', 'NTg5', 'NTg7', 'PDL1g1', 'PDL1g2', 'PDL1g3', 'ATF2g1', 'ATF2g2', 'ATF2g3', 'ATF2g4', ...
 added 2 records with ULabel.name for "perturbation": 'Perturbed', 'NT'
 added 3 records with ULabel.name for "replicate": 'rep3', 'rep2', 'rep1'
curate.validate()
 validating categoricals in "obs"...
 "perturbation" is validated against ULabel.name
 "replicate" is validated against ULabel.name

 validating categoricals in modality "adt"...
 "var_index" is validated against CellMarker.name

 validating categoricals in modality "rna"...
 "var_index" is validated against Gene.symbol

 validating categoricals in modality "hto"...
 "var_index" is validated against Feature.name
 "technique" is validated against ExperimentalFactor.name

 validating categoricals in modality "gdo"...
 "var_index" is validated against Feature.name

True

Register curated artifact

artifact = curate.save_artifact(description="Sub-sampled MuData from Papalexi21")
! no run & transform got linked, call `ln.track()` & re-run
! run input wasn't tracked, call `ln.track()` and re-run
! did not create Feature records for 37 non-validated names: 'adt:G2M.Score', 'adt:HTO_classification', 'adt:MULTI_ID', 'adt:NT', 'adt:Phase', 'adt:S.Score', 'adt:gene_target', 'adt:guide_ID', 'adt:orig.ident', 'adt:percent.mito', 'adt:perturbation', 'adt:replicate', 'gdo:G2M.Score', 'gdo:HTO_classification', 'gdo:MULTI_ID', 'gdo:NT', 'gdo:Phase', 'gdo:S.Score', 'gdo:gene_target', 'gdo:guide_ID', ...
!    3 unique terms (100.00%) are not validated for name: 'nCount_RNA', 'nFeature_RNA', 'percent.mito'
! skip linking features to artifact in slot 'obs'
!    2 unique terms (100.00%) are not validated for name: 'nCount_ADT', 'nFeature_ADT'
! skip linking features to artifact in slot 'obs'
!    2 unique terms (66.70%) are not validated for name: 'nCount_HTO', 'nFeature_HTO'
!    did not create Feature records for 2 non-validated names: 'nCount_HTO', 'nFeature_HTO'
!    1 unique term (100.00%) is not validated for name: 'nCount_GDO'
! skip linking features to artifact in slot 'obs'
artifact.describe()
Artifact .h5mu/MuData
├── General
│   ├── .uid = 'E23fsaGkZ4qrRdDF0000'
│   ├── .size = 549984
│   ├── .hash = 'aFIJ7G9AIcxoEib8kecChw'
│   ├── .n_observations = 200
│   ├── .path = 
│   │   /home/runner/work/lamin-usecases/lamin-usecases/docs/test-multimodal/.lamindb/E23fsaGkZ4qrRdDF0000.h5mu
│   ├── .created_by = testuser1 (Test User1)
│   └── .created_at = 2024-12-20 15:08:00
├── Dataset features/.feature_sets
│   ├── obs • 2                     [Feature]
│   │   perturbation                cat[ULabel]                NT, Perturbed                            
│   │   replicate                   cat[ULabel]                rep1, rep2, rep3                         
│   ├── ['rna'].var • 184           [bionty.Gene]
│   │   SH2D6                       float                                                               
│   │   ARHGAP26-AS1                float                                                               
│   │   GABRA1                      float                                                               
│   │   HLA-DQB1-AS1                float                                                               
│   │   HLA-DQB1-AS1                float                                                               
│   │   HLA-DQB1-AS1                float                                                               
│   │   HLA-DQB1-AS1                float                                                               
│   │   HLA-DQB1-AS1                float                                                               
│   │   HLA-DQB1-AS1                float                                                               
│   │   HLA-DQB1-AS1                float                                                               
│   │   SPACA1                      float                                                               
│   │   VNN1                        float                                                               
│   │   CTAGE15                     float                                                               
│   │   CTAGE15                     float                                                               
│   │   PFKFB1                      float                                                               
│   │   TRPC5                       float                                                               
│   │   RBPMS-AS1                   float                                                               
│   │   CA8                         float                                                               
│   │   CSMD3                       float                                                               
│   │   ZNF483                      float                                                               
│   ├── ['adt'].var • 4             [bionty.CellMarker]
│   │   CD86                        float                                                               
│   │   PDL1                        float                                                               
│   │   PDL2                        float                                                               
│   │   CD366                       float                                                               
│   ├── ['hto'].var • 12            [Feature]
│   │   rep1-tx                     cat                                                                 
│   │   rep1-ctrl                   cat                                                                 
│   │   rep2-tx                     cat                                                                 
│   │   rep2-ctrl                   cat                                                                 
│   │   PDL1g1-tx                   cat                                                                 
│   │   PDL1g1-ctrl                 cat                                                                 
│   │   PDL1g2-tx                   cat                                                                 
│   │   PDL1g2-ctrl                 cat                                                                 
│   │   rep3-tx                     cat                                                                 
│   │   rep3-ctrl                   cat                                                                 
│   │   rep4-tx                     cat                                                                 
│   │   rep4-ctrl                   cat                                                                 
│   ├── ['hto'].obs • 1             [Feature]
│   │   technique                   cat[bionty.ExperimentalF…  cell hashing                             
│   └── ['gdo'].var • 111           [Feature]
│       eGFPg1                      cat
│       CUL3g1                      cat
│       CUL3g2                      cat
│       CUL3g3                      cat
│       CMTM6g1                     cat
│       CMTM6g2                     cat
│       CMTM6g3                     cat
│       NTg1                        cat
│       NTg2                        cat
│       NTg3                        cat
│       NTg4                        cat
│       NTg5                        cat
│       NTg7                        cat
│       PDL1g1                      cat
│       PDL1g2                      cat
│       PDL1g3                      cat
│       ATF2g1                      cat
│       ATF2g2                      cat
│       ATF2g3                      cat
│       ATF2g4                      cat
└── Labels
    └── .experimental_factors       bionty.ExperimentalFactor  cell hashing                             
        .ulabels                    ULabel                     Perturbed, NT, rep3, rep2, rep1          
# clean up test instance
!rm -r test-multimodal
!lamin delete --force test-multimodal
 deleting instance testuser1/test-multimodal