Multi-modal

Here, we’ll showcase how to curate and register ECCITE-seq data from Papalexi21 in the form of MuData objects.

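ECCITE-seq is designed to enable interrogation of single-cell transcriptomes together with surface protein markers in the context of CRISPR screens. The dataset therefore has four modalities: gene expression (rna), antibody-derived tags measuring surface proteins (adt), cell hashtag oligos used for sample multiplexing (hto), and guide-derived oligos identifying the CRISPR perturbation (gdo).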

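MuData objects build on top of AnnData objects to store multimodal data: each modality is stored as its own AnnData object, and the MuData object additionally holds annotations shared across modalities, such as the global obs table.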

# pip install lamindb
!lamin init --storage ./test-multimodal --modules bionty
 initialized lamindb: testuser1/test-multimodal
import lamindb as ln
import bionty as bt

bt.settings.organism = "human"
ln.track()
 connected lamindb: testuser1/test-multimodal
 created Transform('EpZouI0iB3l70000', key='multimodal.ipynb'), started new Run('3LzXqnlZQebfyZZx') at <django.db.models.expressions.DatabaseDefault object at 0x7fbf4076e900>
 notebook imports: bionty==2.4.0 lamindb-core==2.5.1
 recommendation: to identify the notebook across renames, pass the uid: ln.track("EpZouI0iB3l7")

Creating MuData Artifacts

lamindb provides the Artifact.from_mudata() method to create an Artifact from a MuData object.

mdata = ln.core.datasets.mudata_papalexi21_subset()
mdata
/opt/hostedtoolcache/Python/3.12.13/x64/lib/python3.12/site-packages/mudata/_core/mudata.py:1786: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  mod_df.rename(
MuData object with n_obs × n_vars = 200 × 300
  obs:	'perturbation', 'replicate'
  var:	'name'
  4 modalities
    rna:	200 × 173
      obs:	'nCount_RNA', 'nFeature_RNA', 'percent.mito'
      var:	'name'
    adt:	200 × 4
      obs:	'nCount_ADT', 'nFeature_ADT'
      var:	'name'
    hto:	200 × 12
      obs:	'nCount_HTO', 'nFeature_HTO', 'technique'
      var:	'name'
    gdo:	200 × 111
      obs:	'nCount_GDO'
      var:	'name'
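The summary above shows how MuData combines modalities: all four modalities describe the same 200 cells, and the global variable count is the sum over modalities (173 + 4 + 12 + 111 = 300). The following stdlib-only sketch reproduces that bookkeeping with a hypothetical Modality class that tracks only shapes; it is not the mudata API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Modality:
    """Hypothetical stand-in for an AnnData object: only the shape is tracked."""
    n_obs: int
    n_vars: int


def multimodal_shape(modalities: dict[str, Modality]) -> tuple[int, int]:
    """Return (n_obs, n_vars) of a MuData-like container.

    Assumes every modality measures the same cells, as in this dataset.
    """
    n_obs_values = {m.n_obs for m in modalities.values()}
    if len(n_obs_values) != 1:
        raise ValueError(f"modalities disagree on n_obs: {n_obs_values}")
    n_obs = n_obs_values.pop()
    # Variables are concatenated across modalities, so their counts add up.
    n_vars = sum(m.n_vars for m in modalities.values())
    return n_obs, n_vars


papalexi_shapes = {
    "rna": Modality(200, 173),
    "adt": Modality(200, 4),
    "hto": Modality(200, 12),
    "gdo": Modality(200, 111),
}
print(multimodal_shape(papalexi_shapes))  # (200, 300)
```

Real MuData objects can also hold modalities that measure only partially overlapping sets of cells; the sketch covers only the fully aligned case seen here.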
mdata_artifact = ln.Artifact.from_mudata(mdata, key="papalexi.h5mu")
mdata_artifact
Artifact(uid='sQChDXJtBhpCKlRk0000', key='papalexi.h5mu', description=None, suffix='.h5mu', kind='dataset', otype='MuData', size=550048, hash='kI4NFS6HLF8185Y4_E_lsg', n_files=None, n_observations=200, branch_id=1, created_on_id=1, space_id=1, storage_id=1, run_id=1, schema_id=None, created_by_id=1, created_at=<django.db.models.expressions.DatabaseDefault object at 0x7fbf3040a030>, is_locked=False, version_tag=None, is_latest=True)
# MuData Artifacts have the corresponding otype
mdata_artifact.otype
'MuData'
# MuData Artifacts can easily be loaded back into memory
papalexi_in_memory = mdata_artifact.load()
papalexi_in_memory
MuData object with n_obs × n_vars = 200 × 300
  obs:	'perturbation', 'replicate'
  var:	'name'
  4 modalities
    rna:	200 × 173
      obs:	'nCount_RNA', 'nFeature_RNA', 'percent.mito'
      var:	'name'
    adt:	200 × 4
      obs:	'nCount_ADT', 'nFeature_ADT'
      var:	'name'
    hto:	200 × 12
      obs:	'nCount_HTO', 'nFeature_HTO', 'technique'
      var:	'name'
    gdo:	200 × 111
      obs:	'nCount_GDO'
      var:	'name'

Schema

# define labels
perturbation = ln.ULabel(name="Perturbation", is_type=True).save()
ln.ULabel(name="Perturbed", type=perturbation).save()
ln.ULabel(name="NT", type=perturbation).save()

replicate = ln.ULabel(name="Replicate", is_type=True).save()
ln.ULabel(name="rep1", type=replicate).save()
ln.ULabel(name="rep2", type=replicate).save()
ln.ULabel(name="rep3", type=replicate).save()

# define obs schema
obs_schema = ln.Schema(
    name="mudata_papalexi21_subset_obs_schema",
    features=[
        ln.Feature(name="perturbation", dtype=perturbation).save(),
        ln.Feature(name="replicate", dtype=replicate).save(),
    ],
).save()

obs_schema_rna = ln.Schema(
    name="mudata_papalexi21_subset_rna_obs_schema",
    features=[
        ln.Feature(name="nCount_RNA", dtype=int).save(),
        ln.Feature(name="nFeature_RNA", dtype=int).save(),
        ln.Feature(name="percent.mito", dtype=float).save(),
    ],
    coerce=True,
).save()

obs_schema_hto = ln.Schema(
    name="mudata_papalexi21_subset_hto_obs_schema",
    features=[
        ln.Feature(name="nCount_HTO", dtype=int).save(),
        ln.Feature(name="nFeature_HTO", dtype=int).save(),
        ln.Feature(name="technique", dtype=bt.ExperimentalFactor).save(),
    ],
    coerce=True,
).save()

var_schema_rna = ln.Schema(
    name="mudata_papalexi21_subset_rna_var_schema",
    itype=bt.Gene.symbol,
    dtype=float,
).save()

# define composite schema
mudata_schema = ln.Schema(
    name="mudata_papalexi21_subset_mudata_schema",
    otype="MuData",
    slots={
        "obs": obs_schema,
        "rna:obs": obs_schema_rna,
        "hto:obs": obs_schema_hto,
        "rna:var": var_schema_rna,
    },
).save()
! you are trying to create a record with name='nFeature_HTO' but a record with similar name exists: 'nFeature_RNA'. Did you mean to load it?
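The slots of the composite schema are addressed by keys: a bare attribute such as "obs" refers to the global MuData table, while "rna:obs" or "rna:var" refers to an attribute of one modality. A minimal sketch of how such a key splits into modality and attribute, using a hypothetical parse_slot helper (lamindb's own parsing may differ):

```python
from __future__ import annotations


def parse_slot(slot: str) -> tuple[str | None, str]:
    """Split a slot key such as 'rna:obs' into (modality, attribute).

    Keys without a colon address the global MuData attribute, so the
    modality part is None. Hypothetical helper, not lamindb code.
    """
    modality, sep, attribute = slot.rpartition(":")
    if not sep:
        return None, slot
    if not modality or not attribute:
        raise ValueError(f"malformed slot key: {slot!r}")
    return modality, attribute


for key in ["obs", "rna:obs", "hto:obs", "rna:var"]:
    print(key, "->", parse_slot(key))
```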
mudata_schema.describe()
Schema: mudata_papalexi21_subset_mudata_schema
├── uid: YfAhZLMBGWf3GblO                run: 3LzXqnl (multimodal.ipynb)
│   itype: None                          otype: MuData
│   hash: DYd4bgyzknQCvKJShi_9mA         ordered_set: False
│   maximal_set: False                   minimal_set: True
│   branch: main                         space: all
│   created_at: 2026-06-04 13:12:30 UTC  created_by: testuser1
├── obs: mudata_papalexi21_subset_obs_schema
│   ├── uid: tIiZpfvplm2FKqAF                run: 3LzXqnl (multimodal.ipynb)
│   │   itype: Feature                       otype: None                    
│   │   hash: kMVwXYqm2N5UkUdpx42ijw         ordered_set: False             
│   │   maximal_set: False                   minimal_set: True              
│   │   branch: main                         space: all                     
│   │   created_at: 2026-06-04 13:12:30 UTC  created_by: testuser1          
│   └── Features (2)
│       └── name          dtype                 optional  nullable  coerce  default_value
│           perturbation  ULabel[Perturbation]  ✗         ✓         ✗       unset
│           replicate     ULabel[Replicate]     ✗         ✓         ✗       unset
├── rna:obs: mudata_papalexi21_subset_rna_obs_schema
│   ├── uid: REFV7QBzy65dFfo3                run: 3LzXqnl (multimodal.ipynb)
│   │   itype: Feature                       otype: None                    
│   │   hash: utl9N4sOe3UD5rR0DnbFWQ         ordered_set: False             
│   │   maximal_set: False                   minimal_set: True              
│   │   branch: main                         space: all                     
│   │   created_at: 2026-06-04 13:12:30 UTC  created_by: testuser1          
│   └── Features (3)
│       └── name          dtype  optional  nullable  coerce  default_value
│           nCount_RNA    int    ✗         ✓         ✓       unset
│           nFeature_RNA  int    ✗         ✓         ✓       unset
│           percent.mito  float  ✗         ✓         ✓       unset
├── hto:obs: mudata_papalexi21_subset_hto_obs_schema
│   ├── uid: uiebSpq3I60cW4Ul                run: 3LzXqnl (multimodal.ipynb)
│   │   itype: Feature                       otype: None                    
│   │   hash: Pk8Y5OZ2rsr53FQ9CICvhA         ordered_set: False             
│   │   maximal_set: False                   minimal_set: True              
│   │   branch: main                         space: all                     
│   │   created_at: 2026-06-04 13:12:30 UTC  created_by: testuser1          
│   └── Features (3)
│       └── name          dtype                      optional  nullable  coerce  default_value
│           nCount_HTO    int                        ✗         ✓         ✓       unset
│           nFeature_HTO  int                        ✗         ✓         ✓       unset
│           technique     bionty.ExperimentalFactor  ✗         ✓         ✓       unset
└── rna:var: mudata_papalexi21_subset_rna_var_schema
    ├── uid: 24UseN5KzZ8hHkLU                run: 3LzXqnl (multimodal.ipynb)
    │   itype: bionty.Gene.symbol            otype: None
    │   hash: rooz5mfOcfQvgjRu-gGnvA         ordered_set: False
    │   maximal_set: False                   minimal_set: True
    │   branch: main                         space: all
    │   created_at: 2026-06-04 13:12:30 UTC  created_by: testuser1
    └── bionty.Gene.symbol
        └── dtype: float
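The described schema combines two kinds of checks on obs tables. Categorical features such as perturbation may only take values registered under their ULabel type (here Perturbed and NT), and features with coerce enabled, such as nCount_RNA, are cast to the declared dtype before validation. A simplified sketch of both checks on plain lists; the function names and toy values are hypothetical, and the real curator runs these checks through its own machinery:

```python
from __future__ import annotations

# Allowed values per categorical feature, mirroring the ULabel types defined above.
ALLOWED = {
    "perturbation": {"Perturbed", "NT"},
    "replicate": {"rep1", "rep2", "rep3"},
}


def invalid_categories(feature: str, values: list[str]) -> set[str]:
    """Return the values that are not registered for this categorical feature."""
    return set(values) - ALLOWED[feature]


def coerce_to_int(values: list[float]) -> list[int]:
    """Cast integral floats such as 412.0 to int; reject lossy values like 3.5."""
    result = []
    for v in values:
        if not float(v).is_integer():
            raise ValueError(f"cannot coerce {v!r} to int without loss")
        result.append(int(v))
    return result


print(invalid_categories("perturbation", ["NT", "Perturbed", "KO"]))  # {'KO'}
print(coerce_to_int([412.0, 389.0]))  # [412, 389]
```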

Validate MuData annotations

curator = ln.curators.MuDataCurator(mdata, mudata_schema)
! auto-transposed `var` for backward compat, please indicate transposition in the schema definition by calling out `.T`: slots={'var.T': itype=bt.Gene.ensembl_gene_id}
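The warning concerns the rna:var slot: its schema validates the variable names themselves against bt.Gene.symbol, so the curator treats var as transposed, with the gene symbols as column labels. This is why the following cells address the gene symbols via "columns". A toy illustration of that transposition, using a hypothetical helper and placeholder values:

```python
from __future__ import annotations


def transpose(table: dict[str, dict[str, object]]) -> dict[str, dict[str, object]]:
    """Turn a row-oriented table {row: {column: value}} into {column: {row: value}}."""
    transposed: dict[str, dict[str, object]] = {}
    for row, cells in table.items():
        for column, value in cells.items():
            transposed.setdefault(column, {})[row] = value
    return transposed


# Toy rna:var table: gene symbols are the row labels (the var index).
rna_var = {"CA8": {"name": "CA8"}, "VNN1": {"name": "VNN1"}}
rna_var_t = transpose(rna_var)
# After transposition, the former row labels (gene symbols) are column labels.
print(list(rna_var_t["name"]))  # ['CA8', 'VNN1']
```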
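# the first validation is expected to fail on gene symbols in rna:var
# that are not yet in the Gene registry
try:
    curator.validate()
except ln.errors.ValidationError:
    pass
# replace synonyms among the rna:var gene symbols with standardized names
curator.slots["rna:var"].cat.standardize("columns")
# register the gene symbols that still cannot be validated
curator.slots["rna:var"].cat.add_new_from("columns")
# validation now passes
curator.validate()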
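The cell above follows a common curation loop: the first validation fails on values the registry does not know, standardize() maps synonyms to their canonical names, add_new_from() registers whatever is still unknown, and the second validate() passes. A toy model of that loop with an in-memory registry; the class, names, and synonyms are hypothetical and do not reflect bionty's gene data:

```python
from __future__ import annotations


class ToyRegistry:
    """Hypothetical in-memory registry of canonical names plus synonyms."""

    def __init__(self, names: set[str], synonyms: dict[str, str]) -> None:
        self.names = set(names)
        self.synonyms = dict(synonyms)  # synonym -> canonical name

    def invalid(self, values: list[str]) -> list[str]:
        """Return the values that are not registered (validation failures)."""
        return [v for v in values if v not in self.names]

    def standardize(self, values: list[str]) -> list[str]:
        """Replace known synonyms with their canonical names."""
        return [self.synonyms.get(v, v) for v in values]

    def add_new_from(self, values: list[str]) -> None:
        """Register every value that is still unknown."""
        self.names.update(self.invalid(values))


registry = ToyRegistry(names={"GENE_A", "GENE_B"}, synonyms={"OLD_A": "GENE_A"})
columns = ["OLD_A", "GENE_B", "NOVEL_1"]

print(registry.invalid(columns))  # ['OLD_A', 'NOVEL_1']: first validation fails
columns = registry.standardize(columns)  # ['GENE_A', 'GENE_B', 'NOVEL_1']
registry.add_new_from(columns)  # registers 'NOVEL_1'
print(registry.invalid(columns))  # []: validation passes
```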

Register curated Artifact

artifact = curator.save_artifact(key="mudata_papalexi21_subset.h5mu")
 returning schema with same hash: Schema(uid='tIiZpfvplm2FKqAF', is_type=False, name='mudata_papalexi21_subset_obs_schema', description=None, n_members=2, coerce=None, flexible=False, itype='Feature', otype=None, hash='kMVwXYqm2N5UkUdpx42ijw', minimal_set=True, ordered_set=False, maximal_set=False, branch_id=1, created_on_id=1, space_id=1, created_by_id=1, run_id=1, type_id=None, created_at=2026-06-04 13:12:30 UTC, is_locked=False)
 returning schema with same hash: Schema(uid='REFV7QBzy65dFfo3', is_type=False, name='mudata_papalexi21_subset_rna_obs_schema', description=None, n_members=3, coerce=True, flexible=False, itype='Feature', otype=None, hash='utl9N4sOe3UD5rR0DnbFWQ', minimal_set=True, ordered_set=False, maximal_set=False, branch_id=1, created_on_id=1, space_id=1, created_by_id=1, run_id=1, type_id=None, created_at=2026-06-04 13:12:30 UTC, is_locked=False)
 returning schema with same hash: Schema(uid='uiebSpq3I60cW4Ul', is_type=False, name='mudata_papalexi21_subset_hto_obs_schema', description=None, n_members=3, coerce=True, flexible=False, itype='Feature', otype=None, hash='Pk8Y5OZ2rsr53FQ9CICvhA', minimal_set=True, ordered_set=False, maximal_set=False, branch_id=1, created_on_id=1, space_id=1, created_by_id=1, run_id=1, type_id=None, created_at=2026-06-04 13:12:30 UTC, is_locked=False)
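The "returning schema with same hash" messages show that save_artifact() reuses the three obs schemas saved earlier instead of creating duplicates: a schema is identified by a hash of its definition, so an identical definition resolves to the existing record. A sketch of that idea that hashes an order-independent description of the feature set; the hashing scheme and the get_or_create helper are assumptions for illustration, not lamindb's actual algorithm:

```python
from __future__ import annotations

import hashlib
import json


def schema_hash(features: dict[str, str], coerce: bool = False) -> str:
    """Hash a schema definition independently of feature order (illustrative scheme)."""
    payload = json.dumps({"features": sorted(features.items()), "coerce": coerce})
    return hashlib.md5(payload.encode()).hexdigest()


registry: dict[str, str] = {}  # hash -> name of the first schema saved with it


def get_or_create(name: str, features: dict[str, str], coerce: bool = False) -> str:
    """Return the existing schema name for an identical definition, else register it."""
    h = schema_hash(features, coerce)
    if h in registry:
        return registry[h]  # identical definition: reuse the existing record
    registry[h] = name
    return name


first = get_or_create(
    "rna_obs", {"nCount_RNA": "int", "nFeature_RNA": "int", "percent.mito": "float"}, coerce=True
)
again = get_or_create(
    "rna_obs_copy", {"percent.mito": "float", "nCount_RNA": "int", "nFeature_RNA": "int"}, coerce=True
)
print(first, again)  # rna_obs rna_obs
```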
artifact.describe()
Artifact: mudata_papalexi21_subset.h5mu (0000)
├── uid: 04LrJ1UOUi0u4aEr0000            run: 3LzXqnl (multimodal.ipynb)               
│   kind: dataset                        otype: MuData
│   hash: kI4NFS6HLF8185Y4_E_lsg         size: 537.2 KB
│   branch: main                         space: all
│   created_at: 2026-06-04 13:12:33 UTC  created_by: testuser1
│   n_observations: 200                  schema: mudata_papalexi21_subset_mudata_schema
├── storage/path: /home/runner/work/lamin-usecases/lamin-usecases/docs/test-multimodal/.lamindb/04LrJ1UOUi0u4aEr0000.h5mu
├── Dataset features
├── obs (2)                                                                                                    
│   perturbation                   ULabel[Perturbation]                 NT, Perturbed                          
│   replicate                      ULabel[Replicate]                    rep1, rep2, rep3                       
├── rna:obs (3)                                                                                                
│   nCount_RNA                     int                                                                         
│   nFeature_RNA                   int                                                                         
│   percent.mito                   float                                                                       
├── hto:obs (3)                                                                                                
│   nCount_HTO                     int                                                                         
│   nFeature_HTO                   int                                                                         
│   technique                      bionty.ExperimentalFactor            cell hashing                           
└── rna:var (184 bionty.Gene.sym…                                                                              
    ARHGAP26-AS1                   num                                                                         
    CA8                            num                                                                         
    CTAGE15                        num                                                                         
    CTAGE15                        num                                                                         
    GABRA1                         num                                                                         
    H4C12                          num                                                                         
    HLA-DQB1-AS1                   num                                                                         
    HLA-DQB1-AS1                   num                                                                         
    HLA-DQB1-AS1                   num                                                                         
    HLA-DQB1-AS1                   num                                                                         
    HLA-DQB1-AS1                   num                                                                         
    HLA-DQB1-AS1                   num                                                                         
    HLA-DQB1-AS1                   num                                                                         
    MEF2C-AS2                      num                                                                         
    PFKFB1                         num                                                                         
    RBPMS-AS1                      num                                                                         
    SH2D6                          num                                                                         
    SPACA1                         num                                                                         
    TRPC5                          num                                                                         
    VNN1                           num                                                                         
└── Labels
    └── .ulabels                       ULabel                               Perturbed, NT, rep1, rep2, rep3        
        .experimental_factors          bionty.ExperimentalFactor            cell hashing                           
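Both artifacts in this notebook report the same hash, kI4NFS6HLF8185Y4_E_lsg, which indicates that the file written for the curated artifact has the same content as the one behind papalexi.h5mu: the hash is derived from file content, not from the key or the annotations. Since mdata_artifact was never saved, the curated artifact is a separate record with its own uid. A sketch of content hashing, assuming an MD5 digest encoded as unpadded URL-safe base64; this is consistent with the 22-character hashes shown above but is an assumption, not lamindb's documented algorithm (which may, for instance, treat large files differently):

```python
import base64
import hashlib


def content_hash(data: bytes) -> str:
    """MD5 of the bytes, URL-safe base64 encoded without '=' padding (22 chars)."""
    digest = hashlib.md5(data).digest()  # 16 bytes
    return base64.urlsafe_b64encode(digest).decode().rstrip("=")


original = b"same serialized MuData bytes"
copy = bytes(original)
print(content_hash(original) == content_hash(copy))  # True
print(len(content_hash(original)))  # 22
print(content_hash(original) == content_hash(original + b"!"))  # False
```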
ln.finish()
! cells [(15, 17)] were not run consecutively
 finished Run('3LzXqnlZQebfyZZx') after 5s at 2026-06-04 13:12:34 UTC
# clean up test instance
bt.settings.organism = None
!rm -r test-multimodal
!lamin delete --force test-multimodal