Multi-modal

Here, we’ll showcase how to curate and register ECCITE-seq data from Papalexi21 in the form of MuData objects.

ECCITE-seq is designed to enable interrogation of single-cell transcriptomes together with surface protein markers in the context of CRISPR screens.

MuData objects build on top of AnnData objects to store multimodal data.

# !pip install 'lamindb[jupyter,bionty]'
!lamin init --storage ./test-multimodal --modules bionty
 initialized lamindb: testuser1/test-multimodal
import lamindb as ln
import bionty as bt

bt.settings.organism = "human"
ln.track()
 connected lamindb: testuser1/test-multimodal
 created Transform('SjhyOCDTNDCP0000', key='multimodal.ipynb'), started new Run('eDwPCyqTgTyE2NCu') at 2026-01-10 23:49:33 UTC
 notebook imports: bionty==2.0a4 lamindb==2.0a2
 recommendation: to identify the notebook across renames, pass the uid: ln.track("SjhyOCDTNDCP")

Creating MuData Artifacts

lamindb provides the Artifact.from_mudata() method to create an Artifact from a MuData object.

mdata = ln.core.datasets.mudata_papalexi21_subset()
mdata
MuData object with n_obs × n_vars = 200 × 300
  obs:	'perturbation', 'replicate'
  var:	'name'
  4 modalities
    rna:	200 x 173
      obs:	'nCount_RNA', 'nFeature_RNA', 'percent.mito'
      var:	'name'
    adt:	200 x 4
      obs:	'nCount_ADT', 'nFeature_ADT'
      var:	'name'
    hto:	200 x 12
      obs:	'nCount_HTO', 'nFeature_HTO', 'technique'
      var:	'name'
    gdo:	200 x 111
      obs:	'nCount_GDO'
      var:	'name'
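
As a quick sanity check on the summary above, the 300 variables of the MuData object should equal the sum of the variables across its four modalities, and every modality should cover the same 200 observations. The sketch below only copies the printed shapes into a plain dictionary (modality_shapes is an illustrative name, not a MuData attribute) and does not touch the real object.

```python
# Per-modality (n_obs, n_vars) shapes copied by hand from the summary above.
# `modality_shapes` is a hypothetical helper dict, not part of the MuData API.
modality_shapes = {
    "rna": (200, 173),
    "adt": (200, 4),
    "hto": (200, 12),
    "gdo": (200, 111),
}

# The global variable count is the sum over modalities.
total_vars = sum(n_vars for _, n_vars in modality_shapes.values())

# All modalities describe the same cells, so only one n_obs value should appear.
shared_obs = {n_obs for n_obs, _ in modality_shapes.values()}

print(total_vars)  # 300
print(shared_obs)  # {200}
```
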
mdata_artifact = ln.Artifact.from_mudata(mdata, key="papalexi.h5mu")
mdata_artifact
 writing the in-memory object into cache
Artifact(uid='WRzSCuSnwCYaOyFk0000', version_tag=None, is_latest=True, key='papalexi.h5mu', description=None, suffix='.h5mu', kind='dataset', otype='MuData', size=550136, hash='as4mRWTdRo1z6ppZhxQlzw', n_files=None, n_observations=200, branch_id=1, space_id=1, storage_id=3, run_id=1, schema_id=None, created_by_id=3, created_at=<django.db.models.expressions.DatabaseDefault object at 0x7fc0b074e180>, is_locked=False)
# MuData Artifacts have the corresponding otype
mdata_artifact.otype
'MuData'
# MuData Artifacts can easily be loaded back into memory
papalexi_in_memory = mdata_artifact.load()
papalexi_in_memory
MuData object with n_obs × n_vars = 200 × 300
  obs:	'perturbation', 'replicate'
  var:	'name'
  4 modalities
    rna:	200 x 173
      obs:	'nCount_RNA', 'nFeature_RNA', 'percent.mito'
      var:	'name'
    adt:	200 x 4
      obs:	'nCount_ADT', 'nFeature_ADT'
      var:	'name'
    hto:	200 x 12
      obs:	'nCount_HTO', 'nFeature_HTO', 'technique'
      var:	'name'
    gdo:	200 x 111
      obs:	'nCount_GDO'
      var:	'name'

Schema

# define labels
perturbation = ln.ULabel(name="Perturbation", is_type=True).save()
ln.ULabel(name="Perturbed", type=perturbation).save()
ln.ULabel(name="NT", type=perturbation).save()

replicate = ln.ULabel(name="Replicate", is_type=True).save()
ln.ULabel(name="rep1", type=replicate).save()
ln.ULabel(name="rep2", type=replicate).save()
ln.ULabel(name="rep3", type=replicate).save()

# define obs schema
obs_schema = ln.Schema(
    name="mudata_papalexi21_subset_obs_schema",
    features=[
        ln.Feature(name="perturbation", dtype="cat[ULabel[Perturbation]]").save(),
        ln.Feature(name="replicate", dtype="cat[ULabel[Replicate]]").save(),
    ],
).save()

obs_schema_rna = ln.Schema(
    name="mudata_papalexi21_subset_rna_obs_schema",
    features=[
        ln.Feature(name="nCount_RNA", dtype=int).save(),
        ln.Feature(name="nFeature_RNA", dtype=int).save(),
        ln.Feature(name="percent.mito", dtype=float).save(),
    ],
    coerce_dtype=True,
).save()

obs_schema_hto = ln.Schema(
    name="mudata_papalexi21_subset_hto_obs_schema",
    features=[
        ln.Feature(name="nCount_HTO", dtype=float).save(),
        ln.Feature(name="nFeature_HTO", dtype=int).save(),
        ln.Feature(name="technique", dtype=bt.ExperimentalFactor).save(),
    ],
    coerce_dtype=True,
).save()

var_schema_rna = ln.Schema(
    name="mudata_papalexi21_subset_rna_var_schema",
    itype=bt.Gene.symbol,
    dtype=float,
).save()

# define composite schema
mudata_schema = ln.Schema(
    name="mudata_papalexi21_subset_mudata_schema",
    otype="MuData",
    slots={
        "obs": obs_schema,
        "rna:obs": obs_schema_rna,
        "hto:obs": obs_schema_hto,
        "rna:var": var_schema_rna,
    },
).save()
! rather than passing a string 'cat[ULabel[Perturbation]]' to dtype, consider passing a Python object
! rather than passing a string 'cat[ULabel[Replicate]]' to dtype, consider passing a Python object
! you are trying to create a record with name='nFeature_HTO' but a record with similar name exists: 'nFeature_RNA'. Did you mean to load it?
/tmp/ipykernel_3439/3132910678.py:20: DeprecationWarning: `coerce_dtype` argument was renamed to `coerce` and will be removed in a future release.
  obs_schema_rna = ln.Schema(
/tmp/ipykernel_3439/3132910678.py:30: DeprecationWarning: `coerce_dtype` argument was renamed to `coerce` and will be removed in a future release.
  obs_schema_hto = ln.Schema(
mudata_schema.describe()
Schema: mudata_papalexi21_subset_mudata_schema
├── uid: fDrLiEjJ0aKa5V7T                run: eDwPCyq (multimodal.ipynb)
│   itype: None                          otype: MuData
│   hash: TCIdnC013H2A1KoT5oCuSA         ordered_set: False
│   maximal_set: False                   minimal_set: True
│   branch: main                         space: all
│   created_at: 2026-01-10 23:49:34 UTC  created_by: testuser1
├── obs: mudata_papalexi21_subset_obs_schema
│   ├── uid: SWrtJpyzPhHroKhe                run: eDwPCyq (multimodal.ipynb)
│   │   itype: Feature                       otype: None
│   │   hash: dnU72LWsCOXZSpvyMXG18A         ordered_set: False
│   │   maximal_set: False                   minimal_set: True
│   │   branch: main                         space: all
│   │   created_at: 2026-01-10 23:49:34 UTC  created_by: testuser1
│   └── Features (2)
│       └── name          dtype             optional  nullable  coerce  default_value
│           perturbation  ULabel[rnBzLKYc]  ✗         ✓         ✗       unset
│           replicate     ULabel[AwEn1Jbr]  ✗         ✓         ✗       unset
├── rna:obs: mudata_papalexi21_subset_rna_obs_schema
│   ├── uid: cEnMBNti19B4uHu9                run: eDwPCyq (multimodal.ipynb)
│   │   itype: Feature                       otype: None
│   │   hash: hSrqLWx3g1qgi8X7dhsnlA         ordered_set: False
│   │   maximal_set: False                   minimal_set: True
│   │   branch: main                         space: all
│   │   created_at: 2026-01-10 23:49:34 UTC  created_by: testuser1
│   └── Features (3)
│       └── name          dtype  optional  nullable  coerce  default_value
│           nCount_RNA    int    ✗         ✓         ✓       unset
│           nFeature_RNA  int    ✗         ✓         ✓       unset
│           percent.mito  float  ✗         ✓         ✓       unset
├── hto:obs: mudata_papalexi21_subset_hto_obs_schema
│   ├── uid: 4LzzR859KWT4QBWu                run: eDwPCyq (multimodal.ipynb)
│   │   itype: Feature                       otype: None
│   │   hash: x6FGX4tuP71LKS6gkyrKpA         ordered_set: False
│   │   maximal_set: False                   minimal_set: True
│   │   branch: main                         space: all
│   │   created_at: 2026-01-10 23:49:34 UTC  created_by: testuser1
│   └── Features (3)
│       └── name          dtype                      optional  nullable  coerce  default_value
│           nCount_HTO    float                      ✗         ✓         ✓       unset
│           nFeature_HTO  int                        ✗         ✓         ✓       unset
│           technique     bionty.ExperimentalFactor  ✗         ✓         ✓       unset
└── rna:var: mudata_papalexi21_subset_rna_var_schema
    ├── uid: 2AvYJGWPVu7V2nul                run: eDwPCyq (multimodal.ipynb)
    │   itype: bionty.Gene.symbol            otype: None
    │   hash: rooz5mfOcfQvgjRu-gGnvA         ordered_set: False
    │   maximal_set: False                   minimal_set: True
    │   branch: main                         space: all
    │   created_at: 2026-01-10 23:49:34 UTC  created_by: testuser1
    └── bionty.Gene.symbol
        └── dtype: float
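
The slot keys of the composite schema follow a naming pattern that is visible in the definition and in the description above: a bare key such as "obs" addresses the global table of the MuData object, while a prefixed key such as "rna:obs" or "rna:var" addresses a table inside one modality. The following is a minimal sketch of that convention in plain Python. The helper split_slot is hypothetical and written only for illustration; it is not how lamindb parses slots internally.

```python
from typing import Optional, Tuple


def split_slot(slot: str) -> Tuple[Optional[str], str]:
    """Split a slot key into (modality, attribute).

    Hypothetical helper for illustration: 'rna:obs' -> ('rna', 'obs'),
    while a bare 'obs' has no modality and yields (None, 'obs').
    """
    modality, sep, attribute = slot.rpartition(":")
    return (modality if sep else None, attribute)


# Slot keys copied from the composite schema defined above.
for key in ["obs", "rna:obs", "hto:obs", "rna:var"]:
    print(key, "->", split_slot(key))
```

Read this way, the composite schema constrains the global obs table plus three per-modality tables. The adt and gdo modalities have no slots in this schema.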

Validate MuData annotations

curator = ln.curators.MuDataCurator(mdata, mudata_schema)
! auto-transposed `var` for backward compat, please indicate transposition in the schema definition by calling out `.T`: slots={'var.T': itype=bt.Gene.ensembl_gene_id}
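try:
    curator.validate()
except ln.errors.ValidationError:
    # validation is expected to fail on this first pass;
    # the warnings below list the terms that still need curation
    pass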
! 37 terms not validated in feature 'columns' in slot 'obs': 'adt:NT', 'hto:G2M.Score', 'hto:guide_ID', 'gdo:MULTI_ID', 'adt:S.Score', 'gdo:HTO_classification', 'adt:guide_ID', 'gdo:guide_ID', 'hto:percent.mito', 'hto:technique', 'adt:percent.mito', 'adt:replicate', 'hto:MULTI_ID', 'hto:perturbation', 'hto:replicate', 'hto:NT', 'hto:S.Score', 'adt:Phase', 'gdo:NT', 'adt:HTO_classification', ...
    → fix typos, remove non-existent values, or save terms via: curator.slots['obs'].cat.add_new_from('columns')
! 96 terms not validated in feature 'columns' in slot 'rna:var': 'RP5-827C21.6', 'XX-CR54.1', 'RP11-379B18.5', 'RP11-778D9.12', 'RP11-703G6.1', 'AC005150.1', 'RP11-717H13.1', 'CTC-498J12.1', 'CTC-467M3.1', 'HIST1H4K', 'RP11-524H19.2', 'AC006042.7', 'AC002066.1', 'AC073934.6', 'RP11-268G12.1', 'U52111.14', 'RP11-235C23.5', 'RP11-12J10.3', 'CASC1', 'RP11-324E6.9', ...
    12 synonyms found: "CTC-467M3.1" → "MEF2C-AS2", "HIST1H4K" → "H4C12", "CASC1" → "DNAI7", "LARGE" → "LARGE1", "NBPF16" → "NBPF15", "C1orf65" → "CCDC185", "IBA57-AS1" → "IBA57-DT", "KIAA1239" → "NWD2", "TMEM75" → "LINC02912", "AP003419.16" → "RPS6KB2-AS1", "FAM65C" → "RIPOR3", "C14orf177" → "LINC02914"
    → curate synonyms via: .standardize("columns")
    for remaining terms:
    → fix organism 'Organism(uid='1dpCL6TduFJ3AP', name='human', ontology_id='NCBITaxon:9606', abbr=None, synonyms=None, description=None, scientific_name='Homo sapiens', branch_id=1, space_id=1, created_by_id=3, run_id=None, source_id=34, created_at=2026-01-10 23:49:32 UTC, is_locked=False)', fix typos, remove non-existent values, or save terms via: curator.slots['rna:var'].cat.add_new_from('columns')
curator.slots["rna:var"].cat.standardize("columns")
curator.slots["rna:var"].cat.add_new_from("columns")
curator.validate()
! 37 terms not validated in feature 'columns' in slot 'obs': 'adt:NT', 'hto:G2M.Score', 'hto:guide_ID', 'gdo:MULTI_ID', 'adt:S.Score', 'gdo:HTO_classification', 'adt:guide_ID', 'gdo:guide_ID', 'hto:percent.mito', 'hto:technique', 'adt:percent.mito', 'adt:replicate', 'hto:MULTI_ID', 'hto:perturbation', 'hto:replicate', 'hto:NT', 'hto:S.Score', 'adt:Phase', 'gdo:NT', 'adt:HTO_classification', ...
    → fix typos, remove non-existent values, or save terms via: curator.slots['obs'].cat.add_new_from('columns')
! 12 terms not validated in feature 'columns' in slot 'rna:var': 'CTC-467M3.1', 'HIST1H4K', 'CASC1', 'LARGE', 'NBPF16', 'C1orf65', 'IBA57-AS1', 'KIAA1239', 'TMEM75', 'AP003419.16', 'FAM65C', 'C14orf177'
    12 synonyms found: "CTC-467M3.1" → "MEF2C-AS2", "HIST1H4K" → "H4C12", "CASC1" → "DNAI7", "LARGE" → "LARGE1", "NBPF16" → "NBPF15", "C1orf65" → "CCDC185", "IBA57-AS1" → "IBA57-DT", "KIAA1239" → "NWD2", "TMEM75" → "LINC02912", "AP003419.16" → "RPS6KB2-AS1", "FAM65C" → "RIPOR3", "C14orf177" → "LINC02914"
    → curate synonyms via: .standardize("columns")
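
Conceptually, standardizing gene symbols means replacing each outdated synonym with its current symbol and leaving every other name untouched. The sketch below illustrates that idea with a plain dictionary built from a few of the synonym pairs reported in the log above. The function standardize_symbols and the example list var_names are hypothetical, and this is not a description of how lamindb or bionty implement standardization.

```python
# A few synonym -> current symbol pairs copied from the validation log above.
synonyms = {
    "CTC-467M3.1": "MEF2C-AS2",
    "HIST1H4K": "H4C12",
    "CASC1": "DNAI7",
    "LARGE": "LARGE1",
}


def standardize_symbols(symbols, mapping):
    """Replace known synonyms; names without a mapping pass through unchanged."""
    return [mapping.get(symbol, symbol) for symbol in symbols]


# Hypothetical input: two outdated symbols and one symbol without a synonym entry.
var_names = ["HIST1H4K", "CA8", "CASC1"]
standardized = standardize_symbols(var_names, synonyms)
print(standardized)  # ['H4C12', 'CA8', 'DNAI7']
```
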

Register curated Artifact

artifact = curator.save_artifact(key="mudata_papalexi21_subset.h5mu")
 writing the in-memory object into cache
 returning schema with same hash: Schema(uid='SWrtJpyzPhHroKhe', is_type=False, name='mudata_papalexi21_subset_obs_schema', description=None, n_members=2, coerce=None, flexible=False, itype='Feature', otype=None, hash='dnU72LWsCOXZSpvyMXG18A', minimal_set=True, ordered_set=False, maximal_set=False, branch_id=1, space_id=1, created_by_id=3, run_id=1, type_id=None, created_at=2026-01-10 23:49:34 UTC, is_locked=False)
 returning schema with same hash: Schema(uid='cEnMBNti19B4uHu9', is_type=False, name='mudata_papalexi21_subset_rna_obs_schema', description=None, n_members=3, coerce=True, flexible=False, itype='Feature', otype=None, hash='hSrqLWx3g1qgi8X7dhsnlA', minimal_set=True, ordered_set=False, maximal_set=False, branch_id=1, space_id=1, created_by_id=3, run_id=1, type_id=None, created_at=2026-01-10 23:49:34 UTC, is_locked=False)
 returning schema with same hash: Schema(uid='4LzzR859KWT4QBWu', is_type=False, name='mudata_papalexi21_subset_hto_obs_schema', description=None, n_members=3, coerce=True, flexible=False, itype='Feature', otype=None, hash='x6FGX4tuP71LKS6gkyrKpA', minimal_set=True, ordered_set=False, maximal_set=False, branch_id=1, space_id=1, created_by_id=3, run_id=1, type_id=None, created_at=2026-01-10 23:49:34 UTC, is_locked=False)
artifact.describe()
Artifact: mudata_papalexi21_subset.h5mu (0000)
├── uid: 8VTFCAjPfNMleDAt0000            run: eDwPCyq (multimodal.ipynb)
│   kind: dataset                        otype: MuData
│   hash: as4mRWTdRo1z6ppZhxQlzw         size: 537.2 KB
│   branch: main                         space: all
│   created_at: 2026-01-10 23:49:36 UTC  created_by: testuser1
│   n_observations: 200
├── storage/path:
│   /home/runner/work/lamin-usecases/lamin-usecases/docs/test-multimodal/.lamindb/8VTFCAjPfNMleDAt0000.h5mu
├── Dataset features
│   ├── obs (2)
│   │   perturbation                    ULabel[Perturbation]               NT, Perturbed
│   │   replicate                       ULabel[Replicate]                  rep1, rep2, rep3
│   ├── rna:obs (3)
│   │   nCount_RNA                      int
│   │   nFeature_RNA                    int
│   │   percent.mito                    float
│   ├── hto:obs (3)
│   │   nCount_HTO                      float
│   │   nFeature_HTO                    int
│   │   technique                       bionty.ExperimentalFactor          cell hashing
│   └── rna:var (184 bionty.Gene.symb…
│       ARHGAP26-AS1                    num
│       CA8                             num
│       CTAGE15                         num
│       CTAGE15                         num
│       GABRA1                          num
│       H4C12                           num
│       HLA-DQB1-AS1                    num
│       HLA-DQB1-AS1                    num
│       HLA-DQB1-AS1                    num
│       HLA-DQB1-AS1                    num
│       HLA-DQB1-AS1                    num
│       HLA-DQB1-AS1                    num
│       HLA-DQB1-AS1                    num
│       MEF2C-AS2                       num
│       PFKFB1                          num
│       RBPMS-AS1                       num
│       SH2D6                           num
│       SPACA1                          num
│       TRPC5                           num
│       VNN1                            num
└── Labels
    └── .ulabels                        ULabel                             Perturbed, NT, rep1, rep2, rep3
        .experimental_factors           bionty.ExperimentalFactor          cell hashing
ln.finish()
! cells [(15, 17)] were not run consecutively
 finished Run('eDwPCyqTgTyE2NCu') after 4s at 2026-01-10 23:49:38 UTC
# clean up test instance
bt.settings.organism = None
!rm -r test-multimodal
!lamin delete --force test-multimodal