
Multi-modal

Here, we’ll showcase how to curate and register ECCITE-seq data from Papalexi21 in the form of MuData objects.

ECCITE-seq is designed to enable interrogation of single-cell transcriptomes together with surface protein markers in the context of CRISPR screens.

MuData objects build on top of AnnData objects to store multimodal data: a MuData object holds one AnnData object per modality, plus annotations shared across modalities.
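To make the container structure concrete, here is a toy model in plain Python. The classes `ToyAnnData` and `ToyMuData`, the barcodes and the count values are hypothetical stand-ins, not the `anndata` or `mudata` APIs. As in the dataset loaded below, a shared annotation such as `perturbation` lives on the container, while a modality-specific one such as `technique` lives on its modality.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAnnData:
    """Hypothetical stand-in for one modality: a cells x features table."""
    obs_names: list[str]  # cell barcodes (rows)
    var_names: list[str]  # feature names (columns)
    obs: dict[str, list] = field(default_factory=dict)  # annotations specific to this modality


@dataclass
class ToyMuData:
    """Hypothetical stand-in for a multimodal container."""
    mod: dict[str, ToyAnnData]  # one table per modality, keyed by modality name
    obs: dict[str, list] = field(default_factory=dict)  # annotations shared across modalities

    def obs_columns(self) -> dict[str, list[str]]:
        """Report where each annotation column lives."""
        columns = {"shared": list(self.obs)}
        for name, adata in self.mod.items():
            columns[name] = list(adata.obs)
        return columns


cells = ["cell1", "cell2"]  # made-up barcodes
toy = ToyMuData(
    mod={
        "rna": ToyAnnData(cells, ["SH2D6", "GABRA1"], {"nCount_RNA": [10, 12]}),
        "hto": ToyAnnData(cells, ["rep1-tx"], {"technique": ["cell hashing"] * 2}),
    },
    obs={"perturbation": ["NT", "Perturbed"]},
)
print(toy.obs_columns())
# {'shared': ['perturbation'], 'rna': ['nCount_RNA'], 'hto': ['technique']}
```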

%load_ext autoreload
%autoreload 2
# !pip install 'lamindb[jupyter,bionty]'
!lamin init --storage ./test-multimodal --schema bionty
 connected lamindb: testuser1/test-multimodal
import lamindb as ln
import bionty as bt
 connected lamindb: testuser1/test-multimodal
mdata = ln.core.datasets.mudata_papalexi21_subset()
mdata
MuData object with n_obs × n_vars = 200 × 300
  obs:	'perturbation', 'replicate'
  var:	'name'
  4 modalities
    rna:	200 x 173
      obs:	'nCount_RNA', 'nFeature_RNA', 'percent.mito'
      var:	'name'
    adt:	200 x 4
      obs:	'nCount_ADT', 'nFeature_ADT'
      var:	'name'
    hto:	200 x 12
      obs:	'nCount_HTO', 'nFeature_HTO', 'technique'
      var:	'name'
    gdo:	200 x 111
      obs:	'nCount_GDO'
      var:	'name'
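The global shape in the summary above follows from the per-modality shapes: all four modalities cover the same 200 cells, and the 300 global variables add up to 173 + 4 + 12 + 111. A quick plain-Python check, with the numbers copied from the printed summary:

```python
# per-modality shapes (n_obs, n_vars) copied from the printed MuData summary
shapes = {"rna": (200, 173), "adt": (200, 4), "hto": (200, 12), "gdo": (200, 111)}

cell_counts = {n_obs for n_obs, _ in shapes.values()}
assert len(cell_counts) == 1, "modalities cover different numbers of cells"

# in this dataset the global variable count is the sum of the per-modality feature counts
global_shape = (cell_counts.pop(), sum(n_vars for _, n_vars in shapes.values()))
print(global_shape)  # (200, 300)
```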

Validate annotations

curate = ln.Curator.from_mudata(
    mdata,
    var_index={
        "rna": bt.Gene.symbol,  # gene expression
        "adt": bt.CellMarker.name,  # antibody derived tags reflecting surface proteins
        "hto": ln.Feature.name,  # cell hashing
        "gdo": ln.Feature.name,  # guide RNAs
    },
    categoricals={
        "perturbation": ln.ULabel.name,  # shared categorical
        "replicate": ln.ULabel.name,  # shared categorical
        "hto:technique": bt.ExperimentalFactor.name,  # modality-specific categorical: column 'technique' of the 'hto' modality
    },
    organism="human",
)
 added 2 records with Feature.name for "columns": 'perturbation', 'replicate'
! indexing datasets with gene symbols can be problematic: https://docs.lamin.ai/faq/symbol-mapping
 added 1 record with Feature.name for "columns": 'technique'
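In the `categoricals` mapping passed to `ln.Curator.from_mudata` above, plain keys such as `"perturbation"` name columns of the shared `obs`, while `"hto:technique"` uses a `modality:column` form to target a column of the `hto` modality only. The helper `split_categorical_key` below is a hypothetical sketch of that naming convention in plain Python, not a lamindb function:

```python
from typing import Optional, Tuple


def split_categorical_key(key: str, modalities: set) -> Tuple[Optional[str], str]:
    """Split 'modality:column' into (modality, column); shared obs columns get modality None."""
    prefix, sep, column = key.partition(":")
    if sep and prefix in modalities:
        return prefix, column
    return None, key  # no known modality prefix: treat as a column of the shared obs


modalities = {"rna", "adt", "hto", "gdo"}
categoricals = ["perturbation", "replicate", "hto:technique"]
parsed = {key: split_categorical_key(key, modalities) for key in categoricals}
print(parsed)
# {'perturbation': (None, 'perturbation'), 'replicate': (None, 'replicate'), 'hto:technique': ('hto', 'technique')}
```

Checking the prefix against the known modality names keeps a column whose own name happens to contain a colon from being misread as modality-specific.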
# deprecated and redundant: these columns are already registered during initialization (see the DeprecationWarning below)
curate.add_new_from_columns(modality="rna")
curate.add_new_from_columns(modality="adt")
curate.add_new_from_columns(modality="hto")
curate.add_new_from_columns(modality="gdo")
/tmp/ipykernel_3219/1003816735.py:2: DeprecationWarning: `.add_new_from_columns()` is deprecated and will be removed in a future version. It's run by default during initialization.
  curate.add_new_from_columns(modality="rna")
/tmp/ipykernel_3219/1003816735.py:3: DeprecationWarning: `.add_new_from_columns()` is deprecated and will be removed in a future version. It's run by default during initialization.
  curate.add_new_from_columns(modality="adt")
/tmp/ipykernel_3219/1003816735.py:4: DeprecationWarning: `.add_new_from_columns()` is deprecated and will be removed in a future version. It's run by default during initialization.
  curate.add_new_from_columns(modality="hto")
/tmp/ipykernel_3219/1003816735.py:5: DeprecationWarning: `.add_new_from_columns()` is deprecated and will be removed in a future version. It's run by default during initialization.
  curate.add_new_from_columns(modality="gdo")
curate.validate()
 saving validated records of 'var_index'
 saving validated records of 'var_index'
 saving validated records of 'technique'
 validating categoricals in "obs"...
 mapping "perturbation" on ULabel.name
!   2 terms are not validated: 'Perturbed', 'NT'
    → fix typos, remove non-existent values, or save terms via .add_new_from("perturbation")
 mapping "replicate" on ULabel.name
!   3 terms are not validated: 'rep3', 'rep1', 'rep2'
    → fix typos, remove non-existent values, or save terms via .add_new_from("replicate")

 validating categoricals in modality "gdo"...
 mapping "var_index" on Feature.name
!   111 terms are not validated: 'eGFPg1', 'CUL3g1', 'CUL3g2', 'CUL3g3', 'CMTM6g1', 'CMTM6g2', 'CMTM6g3', 'NTg1', 'NTg2', 'NTg3', 'NTg4', 'NTg5', 'NTg7', 'PDL1g1', 'PDL1g2', 'PDL1g3', 'ATF2g1', 'ATF2g2', 'ATF2g3', 'ATF2g4', ...
    → fix typos, remove non-existent values, or save terms via .add_new_from_var_index()

 validating categoricals in modality "rna"...
 mapping "var_index" on Gene.symbol
!   96 terms are not validated: 'RP5-827C21.6', 'XX-CR54.1', 'RP11-379B18.5', 'RP11-778D9.12', 'RP11-703G6.1', 'AC005150.1', 'RP11-717H13.1', 'CTC-498J12.1', 'CTC-467M3.1', 'HIST1H4K', 'RP11-524H19.2', 'AC006042.7', 'AC002066.1', 'AC073934.6', 'RP11-268G12.1', 'U52111.14', 'RP11-235C23.5', 'RP11-12J10.3', 'CASC1', 'RP11-324E6.9', ...
    12 synonyms found: "CTC-467M3.1" → "MEF2C-AS2", "HIST1H4K" → "H4C12", "CASC1" → "DNAI7", "LARGE" → "LARGE1", "NBPF16" → "NBPF15", "C1orf65" → "CCDC185", "IBA57-AS1" → "IBA57-DT", "KIAA1239" → "NWD2", "TMEM75" → "LINC02912", "AP003419.16" → "RPS6KB2-AS1", "FAM65C" → "RIPOR3", "C14orf177" → "LINC02914"
    → curate synonyms via .standardize("var_index")
    for remaining terms:
    → fix typos, remove non-existent values, or save terms via .add_new_from_var_index()

 validating categoricals in modality "adt"...
 "var_index" is validated against CellMarker.name

 validating categoricals in modality "hto"...
 mapping "var_index" on Feature.name
!   12 terms are not validated: 'rep1-tx', 'rep1-ctrl', 'rep2-tx', 'rep2-ctrl', 'PDL1g1-tx', 'PDL1g1-ctrl', 'PDL1g2-tx', 'PDL1g2-ctrl', 'rep3-tx', 'rep3-ctrl', 'rep4-tx', 'rep4-ctrl'
    → fix typos, remove non-existent values, or save terms via .add_new_from_var_index()
 "technique" is validated against ExperimentalFactor.name

False
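Conceptually, each validation step above checks every value against the terms already stored in a registry, reports a synonym where one is known, and returns `False` if anything is left over. A minimal plain-Python sketch of that logic, not lamindb's implementation; `validate_terms`, the `registry` set and the `synonyms` dict are hypothetical stand-ins for the `Gene` registry, seeded with two synonym pairs from the log:

```python
def validate_terms(values, registry, synonyms):
    """Return (all_valid, non_validated, synonym_hits) for a list of values."""
    non_validated = [v for v in dict.fromkeys(values) if v not in registry]  # unique, order kept
    synonym_hits = {v: synonyms[v] for v in non_validated if v in synonyms}
    return not non_validated, non_validated, synonym_hits


registry = {"SH2D6", "GABRA1", "H4C12", "DNAI7"}    # hypothetical registered symbols
synonyms = {"HIST1H4K": "H4C12", "CASC1": "DNAI7"}  # outdated symbol -> current symbol
values = ["SH2D6", "HIST1H4K", "CASC1", "RP5-827C21.6", "GABRA1"]

ok, missing, hits = validate_terms(values, registry, synonyms)
print(ok)       # False
print(missing)  # ['HIST1H4K', 'CASC1', 'RP5-827C21.6']
print(hits)     # {'HIST1H4K': 'H4C12', 'CASC1': 'DNAI7'}
```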
# register the non-validated var index terms of each modality (adt is already validated)
curate.add_new_from_var_index("rna")
curate.add_new_from_var_index("hto")
curate.add_new_from_var_index("gdo")

# add new categories
curate.add_new_from("perturbation")
curate.add_new_from("replicate")
 added 96 records with Gene.symbol for "var_index": 'RP5-827C21.6', 'XX-CR54.1', 'RP11-379B18.5', 'RP11-778D9.12', 'RP11-703G6.1', 'AC005150.1', 'RP11-717H13.1', 'CTC-498J12.1', 'CTC-467M3.1', 'HIST1H4K', 'RP11-524H19.2', 'AC006042.7', 'AC002066.1', 'AC073934.6', 'RP11-268G12.1', 'U52111.14', 'RP11-235C23.5', 'RP11-12J10.3', 'CASC1', 'RP11-324E6.9', ...
 added 12 records with Feature.name for "var_index": 'rep1-tx', 'rep1-ctrl', 'rep2-tx', 'rep2-ctrl', 'PDL1g1-tx', 'PDL1g1-ctrl', 'PDL1g2-tx', 'PDL1g2-ctrl', 'rep3-tx', 'rep3-ctrl', 'rep4-tx', 'rep4-ctrl'
 added 111 records with Feature.name for "var_index": 'eGFPg1', 'CUL3g1', 'CUL3g2', 'CUL3g3', 'CMTM6g1', 'CMTM6g2', 'CMTM6g3', 'NTg1', 'NTg2', 'NTg3', 'NTg4', 'NTg5', 'NTg7', 'PDL1g1', 'PDL1g2', 'PDL1g3', 'ATF2g1', 'ATF2g2', 'ATF2g3', 'ATF2g4', ...
 added 2 records with ULabel.name for "perturbation": 'Perturbed', 'NT'
 added 3 records with ULabel.name for "replicate": 'rep3', 'rep1', 'rep2'
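Here all 96 non-validated gene symbols are registered as new records. The `rna` log above also points to a second route: map the 12 known synonyms to their current symbols via `.standardize("var_index")`, then register only what remains. A plain-Python sketch of that order; `standardize_then_register` and the in-memory `registry` are hypothetical, and the synonym pairs are taken from the log:

```python
def standardize_then_register(values, registry, synonyms):
    """Replace known synonyms, then add the remaining unknown terms to the registry."""
    standardized = [v if v in registry else synonyms.get(v, v) for v in values]
    new_terms = [v for v in dict.fromkeys(standardized) if v not in registry]
    registry.update(new_terms)  # stands in for registering new records
    return standardized, new_terms


registry = {"SH2D6", "H4C12", "LARGE1"}              # hypothetical registered symbols
synonyms = {"HIST1H4K": "H4C12", "LARGE": "LARGE1"}  # outdated symbol -> current symbol
values = ["SH2D6", "HIST1H4K", "LARGE", "RP5-827C21.6"]

standardized, new_terms = standardize_then_register(values, registry, synonyms)
print(standardized)  # ['SH2D6', 'H4C12', 'LARGE1', 'RP5-827C21.6']
print(new_terms)     # ['RP5-827C21.6']
```

Standardizing first avoids registering outdated symbols such as `HIST1H4K` next to their current names.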
curate.validate()
 validating categoricals in "obs"...
 "perturbation" is validated against ULabel.name
 "replicate" is validated against ULabel.name

 validating categoricals in modality "gdo"...
 "var_index" is validated against Feature.name

 validating categoricals in modality "rna"...
 "var_index" is validated against Gene.symbol

 validating categoricals in modality "adt"...
 "var_index" is validated against CellMarker.name

 validating categoricals in modality "hto"...
 "var_index" is validated against Feature.name
 "technique" is validated against ExperimentalFactor.name

True

Register curated artifact

artifact = curate.save_artifact(description="Sub-sampled MuData from Papalexi21")
! no run & transform got linked, call `ln.track()` & re-run
! run input wasn't tracked, call `ln.track()` and re-run
! did not create Feature records for 37 non-validated names: 'adt:G2M.Score', 'adt:HTO_classification', 'adt:MULTI_ID', 'adt:NT', 'adt:Phase', 'adt:S.Score', 'adt:gene_target', 'adt:guide_ID', 'adt:orig.ident', 'adt:percent.mito', 'adt:perturbation', 'adt:replicate', 'gdo:G2M.Score', 'gdo:HTO_classification', 'gdo:MULTI_ID', 'gdo:NT', 'gdo:Phase', 'gdo:S.Score', 'gdo:gene_target', 'gdo:guide_ID', ...
!    3 unique terms (100.00%) are not validated for name: 'nCount_RNA', 'nFeature_RNA', 'percent.mito'
! skip linking features to artifact in slot 'obs'
!    2 unique terms (100.00%) are not validated for name: 'nCount_ADT', 'nFeature_ADT'
! skip linking features to artifact in slot 'obs'
!    2 unique terms (66.70%) are not validated for name: 'nCount_HTO', 'nFeature_HTO'
!    did not create Feature records for 2 non-validated names: 'nCount_HTO', 'nFeature_HTO'
!    1 unique term (100.00%) is not validated for name: 'nCount_GDO'
! skip linking features to artifact in slot 'obs'
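The warnings above follow a pattern: in each modality's `obs`, only columns that match a registered `Feature` can be linked, and a slot with no validated column is skipped. In `hto`, `technique` was registered during initialization, so only 2 of its 3 columns (about 66.7%) fail and the slot is still linked, which matches the `['hto'].obs1` feature set in the description below. This plain-Python sketch is a reading of the log, not lamindb's implementation; `registered` is a hypothetical stand-in for the Feature registry and the column names are copied from the printed MuData summary:

```python
# obs columns per modality, copied from the printed MuData summary
obs_columns = {
    "rna": ["nCount_RNA", "nFeature_RNA", "percent.mito"],
    "adt": ["nCount_ADT", "nFeature_ADT"],
    "hto": ["nCount_HTO", "nFeature_HTO", "technique"],
    "gdo": ["nCount_GDO"],
}
registered = {"perturbation", "replicate", "technique"}  # Feature names added during initialization

report = {}
for modality, columns in obs_columns.items():
    missing = [c for c in columns if c not in registered]
    pct = round(100 * len(missing) / len(columns), 1)
    linked = len(missing) < len(columns)  # link the slot only if at least one column validated
    report[modality] = (pct, linked)
    print(f"{modality}: {pct}% not validated, link obs features: {linked}")
```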
artifact.describe()
Artifact .h5mu/MuData
├── General
│   ├── .uid = 'UU7jZzW5pLRGNcGg0000'
│   ├── .size = 549984
│   ├── .hash = 'aFIJ7G9AIcxoEib8kecChw'
│   ├── .n_observations = 200
│   ├── .path = 
│   │   /home/runner/work/lamin-usecases/lamin-usecases/docs/test-multimodal/.lamindb/UU7jZzW5pLRGNcGg0000.h5mu
│   ├── .created_by = testuser1 (Test User1)
│   └── .created_at = 2024-12-03 08:35:15
├── Dataset/.feature_sets
│   ├── obs2                     [Feature]                                                           
│   │   perturbation                cat[ULabel]                NT, Perturbed                            
│   │   replicate                   cat[ULabel]                rep1, rep2, rep3                         
│   ├── ['rna'].var184           [bionty.Gene]                                                       
│   │   SH2D6                       float                                                               
│   │   ARHGAP26-AS1                float                                                               
│   │   GABRA1                      float                                                               
│   │   HLA-DQB1-AS1                float                                                               
│   │   HLA-DQB1-AS1                float                                                               
│   │   HLA-DQB1-AS1                float                                                               
│   │   HLA-DQB1-AS1                float                                                               
│   │   HLA-DQB1-AS1                float                                                               
│   │   HLA-DQB1-AS1                float                                                               
│   │   HLA-DQB1-AS1                float                                                               
│   │   SPACA1                      float                                                               
│   │   VNN1                        float                                                               
│   │   CTAGE15                     float                                                               
│   │   CTAGE15                     float                                                               
│   │   PFKFB1                      float                                                               
│   │   TRPC5                       float                                                               
│   │   RBPMS-AS1                   float                                                               
│   │   CA8                         float                                                               
│   │   CSMD3                       float                                                               
│   │   ZNF483                      float                                                               
│   ├── ['adt'].var4             [bionty.CellMarker]                                                 
│   │   CD86                        float                                                               
│   │   PDL1                        float                                                               
│   │   PDL2                        float                                                               
│   │   CD366                       float                                                               
│   ├── ['hto'].var12            [Feature]                                                           
│   │   rep1-tx                     float                                                               
│   │   rep1-ctrl                   float                                                               
│   │   rep2-tx                     float                                                               
│   │   rep2-ctrl                   float                                                               
│   │   PDL1g1-tx                   float                                                               
│   │   PDL1g1-ctrl                 float                                                               
│   │   PDL1g2-tx                   float                                                               
│   │   PDL1g2-ctrl                 float                                                               
│   │   rep3-tx                     float                                                               
│   │   rep3-ctrl                   float                                                               
│   │   rep4-tx                     float                                                               
│   │   rep4-ctrl                   float                                                               
│   ├── ['hto'].obs1             [Feature]                                                           
│   │   technique                   cat[bionty.ExperimentalF…  cell hashing                             
│   └── ['gdo'].var111           [Feature]                                                           
│       eGFPg1                      float
│       CUL3g1                      float
│       CUL3g2                      float
│       CUL3g3                      float
│       CMTM6g1                     float
│       CMTM6g2                     float
│       CMTM6g3                     float
│       NTg1                        float
│       NTg2                        float
│       NTg3                        float
│       NTg4                        float
│       NTg5                        float
│       NTg7                        float
│       PDL1g1                      float
│       PDL1g2                      float
│       PDL1g3                      float
│       ATF2g1                      float
│       ATF2g2                      float
│       ATF2g3                      float
│       ATF2g4                      float
└── Annotations
    └── Labels                                                                                          
        .experimental_factors       bionty.ExperimentalFactor  'cell hashing'                           
        .ulabels                    ULabel                     'Perturbed', 'NT', 'rep3', 'rep1', 'rep2'
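The `['rna'].var184` feature set lists 184 entries for the 173 variables of the `rna` modality, and some symbols such as `HLA-DQB1-AS1` appear several times. A likely explanation, and it is an assumption here, is that one gene symbol can match more than one `Gene` record (for example, several Ensembl gene IDs sharing a symbol), and every matching record gets linked. A toy plain-Python sketch with made-up record IDs:

```python
from collections import Counter

# hypothetical gene records (record id, symbol); two records share one symbol
gene_records = [("g1", "SH2D6"), ("g2", "HLA-DQB1-AS1"), ("g3", "HLA-DQB1-AS1"), ("g4", "GABRA1")]
var_index = ["SH2D6", "HLA-DQB1-AS1", "GABRA1"]  # 3 variables in the data

wanted = set(var_index)
linked = [rid for rid, symbol in gene_records if symbol in wanted]  # every matching record
counts = Counter(symbol for _, symbol in gene_records if symbol in wanted)
duplicated = [symbol for symbol, n in counts.items() if n > 1]
print(len(var_index), len(linked))  # 3 4
print(duplicated)                   # ['HLA-DQB1-AS1']
```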
# clean up test instance
!rm -r test-multimodal
!lamin delete --force test-multimodal
 deleting instance testuser1/test-multimodal