
Append a new dataset

We have one dataset in storage and are about to receive a new dataset.

In this notebook, we validate and register the new dataset, then append it to the existing collection as a new version.

import lamindb as ln
import bionty as bt
import readfcs

bt.settings.organism = "human"

ln.track("SmQmhrhigFPL0000")
 connected lamindb: testuser1/test-facs
 created Transform('SmQmhrhigFPL0000'), started new Run('Ww93zihY...') at 2025-01-20 07:39:06 UTC
 notebook imports: bionty==1.0.0 lamindb==1.0.2 pytometry==0.1.6 readfcs==1.1.9 scanpy==1.10.4

Ingest a new artifact

Access

Let us validate and register another .fcs file from Oetjen18:

filepath = readfcs.datasets.Oetjen18_t1()

adata = readfcs.read(filepath)
adata
AnnData object with n_obs × n_vars = 241552 × 20
    var: 'n', 'channel', 'marker', '$PnR', '$PnB', '$PnE', '$PnV', '$PnG'
    uns: 'meta'

Transform: normalize

import pytometry as pm
pm.pp.split_signal(adata, var_key="channel")
pm.pp.compensate(adata)
pm.tl.normalize_biExp(adata)
adata = adata[  # subset to rows that do not have nan values
    adata.to_df().isna().sum(axis=1) == 0
]
adata.to_df().describe()
CD95 CD8 CD27 CXCR4 CCR7 LIVE/DEAD CD4 CD45RA CD3 CD49B CD14/19 CD69 CD103
count 241552.000000 241552.000000 241552.000000 241552.000000 241552.000000 241552.000000 241552.000000 241552.000000 241552.000000 241552.000000 241552.000000 241552.000000 241552.000000
mean 887.579860 1302.985717 1221.257257 877.533482 977.505533 1883.358298 556.687953 929.493316 941.166747 966.012244 1210.769935 741.523184 1003.064857
std 573.549695 827.850302 672.851319 411.966073 584.217139 932.113729 480.875917 795.550133 658.984751 456.437094 694.622980 473.287558 642.728024
min 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
25% 462.757715 493.413744 605.463427 588.047798 495.437303 1063.670965 240.623098 404.087640 477.932659 592.294399 575.401173 380.247262 475.108131
50% 774.350833 1207.624048 1110.367681 782.939692 782.981430 1951.855099 484.355203 557.904360 655.909639 800.280049 1124.574275 705.802991 775.101973
75% 1327.792103 2036.849496 1721.730010 1070.479036 1453.929567 2623.975657 729.754419 1345.771633 1218.445208 1347.042403 1742.288464 1069.175380 1420.744291
max 4053.903716 4065.495666 4095.351322 4025.827267 3999.075551 4096.000000 4088.719985 3961.255364 3940.061146 4089.445928 3982.769373 3810.774988 4023.968008
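
The row filter above keeps only events without missing values. As a minimal sketch on a toy table (the values and the names `df`, `keep`, and `clean` are made up for illustration), the same boolean mask looks like this in plain pandas:

```python
import numpy as np
import pandas as pd

# Toy stand-in for adata.to_df(): three events, two channels (made-up values).
df = pd.DataFrame(
    {"CD3": [1.0, np.nan, 3.0], "CD8": [4.0, 5.0, 6.0]},
    index=["event0", "event1", "event2"],
)

# Same mask as in the notebook: a row is kept if it has zero NaN entries.
keep = df.isna().sum(axis=1) == 0

# Equivalent formulation: every entry in the row is non-missing.
assert keep.equals(df.notna().all(axis=1))

# .copy() yields an independent table rather than a view of the original.
clean = df[keep].copy()
print(clean.index.tolist())
```

In the `describe()` table, `count` is 241552 for every marker, the same as the number of events read from the file, so the filter removed no rows for this dataset.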

Validate cell markers

Let’s see how many markers validate:

validated = bt.CellMarker.validate(adata.var.index)
! 9 unique terms (69.20%) are not validated for name: 'CD95', 'CXCR4', 'CCR7', 'LIVE/DEAD', 'CD4', 'CD49B', 'CD14/19', 'CD69', 'CD103'
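
The percentage in the message is the share of unique names without a match in the registry: 9 of 13 names, or about 69.2%. The sketch below reproduces that arithmetic in plain Python. The `known` set is a hypothetical stand-in for the `CellMarker` registry, filled with the four names that the message does not list; it does not show how bionty validates internally.

```python
# The 13 column names of this dataset, as shown in the describe() table.
names = [
    "CD95", "CD8", "CD27", "CXCR4", "CCR7", "LIVE/DEAD", "CD4",
    "CD45RA", "CD3", "CD49B", "CD14/19", "CD69", "CD103",
]

# Hypothetical stand-in for the registry: the names that validated above.
known = {"CD8", "CD27", "CD45RA", "CD3"}

# One boolean per name, analogous to the boolean array returned by validate().
validated = [name in known for name in names]

not_validated = [n for n, ok in zip(names, validated) if not ok]
share = 100 * len(not_validated) / len(names)
print(f"{len(not_validated)} of {len(names)} names not validated ({share:.1f}%)")
```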

Let’s standardize and re-validate:

adata.var.index = bt.CellMarker.standardize(adata.var.index)
validated = bt.CellMarker.validate(adata.var.index)
! 7 unique terms (53.80%) are not validated for name: 'CD95', 'CXCR4', 'LIVE/DEAD', 'CD49B', 'CD14/19', 'CD69', 'CD103'
/tmp/ipykernel_3660/92294437.py:1: ImplicitModificationWarning: Trying to modify index of attribute `.var` of view, initializing view as actual.
  adata.var.index = bt.CellMarker.standardize(adata.var.index)
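
Standardizing maps synonyms to the names stored in the registry. Comparing the two messages, `CCR7` and `CD4` are the names that no longer fail, and the `artifact.describe()` output further down lists them as `Ccr7` and `Cd4`. A minimal sketch of the idea, using a hypothetical synonym table (`synonyms`) and a hypothetical registry (`known`) in place of bionty:

```python
# Hypothetical synonym table: maps an input name to the registry name.
synonyms = {"CCR7": "Ccr7", "CD4": "Cd4"}

# Hypothetical registry content, written with the registry's own spelling.
known = {"CD8", "CD27", "CD45RA", "CD3", "Ccr7", "Cd4"}

names = [
    "CD95", "CD8", "CD27", "CXCR4", "CCR7", "LIVE/DEAD", "CD4",
    "CD45RA", "CD3", "CD49B", "CD14/19", "CD69", "CD103",
]

# Replace a name if a synonym is known; leave it unchanged otherwise.
standardized = [synonyms.get(name, name) for name in names]

still_missing = [name for name in standardized if name not in known]
share = 100 * len(still_missing) / len(standardized)
print(f"{len(still_missing)} names left ({share:.1f}%)")
```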

Next, create records from Bionty for the markers that are not yet validated and save them:

records = bt.CellMarker.from_values(adata.var.index[~validated])
ln.save(records)
! did not create CellMarker records for 2 non-validated names: 'CD14/19', 'LIVE/DEAD'

Manually create a record for one of the two remaining names, CD14/19:

bt.CellMarker(name="CD14/19").save()
CellMarker(uid='3ZFziy5ims8J', name='CD14/19', created_by_id=1, run_id=2, space_id=1, organism_id=1, created_at=2025-01-20 07:39:09 UTC)

Move the remaining non-validated column, LIVE/DEAD, to obs as metadata:

validated = bt.CellMarker.validate(adata.var.index)
adata.obs = adata[:, ~validated].to_df()
adata = adata[:, validated].copy()
! 1 unique term (7.70%) is not validated for name: 'LIVE/DEAD'
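
The boolean array from `validate()` splits the columns into two groups: validated markers stay as variables, everything else becomes per-event metadata. On a toy pandas table (made-up values, with a hand-written mask in place of the `validate()` result), the same split can be sketched as:

```python
import numpy as np
import pandas as pd

# Toy measurement table: rows are events, columns are channels (made-up values).
df = pd.DataFrame(
    {"CD3": [0.1, 0.2], "LIVE/DEAD": [0.9, 0.8], "CD8": [0.3, 0.4]}
)

# Hand-written stand-in for the validate() result, one boolean per column.
validated = np.array([True, False, True])

# Columns that failed validation become per-event metadata (the role of .obs).
obs = df.loc[:, ~validated]

# Only validated columns remain as measured variables.
markers = df.loc[:, validated].copy()

print(list(obs.columns), list(markers.columns))
```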

Now all markers pass validation:

validated = bt.CellMarker.validate(adata.var.index)
assert all(validated)

Register

curate = ln.Curator.from_anndata(adata, var_index=bt.CellMarker.name, categoricals={})
curate.validate()
 "var_index" is validated against CellMarker.name
True
artifact = curate.save_artifact(description="Oetjen18_t1")
!    1 unique term (100.00%) is not validated for name: 'LIVE/DEAD'
! skip linking features to artifact in slot 'obs'

Annotate with more labels:

efs = bt.ExperimentalFactor.lookup()
organism = bt.Organism.lookup()

artifact.labels.add(efs.fluorescence_activated_cell_sorting)
artifact.labels.add(organism.human)
artifact.describe()
Artifact .h5ad/AnnData
├── General
│   ├── .uid = '5s56cWrWhpiAwh5h0000'
│   ├── .size = 46506448
│   ├── .hash = 'WbPHGIMM_5GT68rC8ZydHA'
│   ├── .n_observations = 241552
│   ├── .path = /home/runner/work/lamin-usecases/lamin-usecases/docs/test-facs/.lamindb/5s56cWrWhpiAwh5h0000.h5ad
│   ├── .created_by = testuser1 (Test User1)
│   ├── .created_at = 2025-01-20 07:39:09
│   └── .transform = 'Append a new dataset'
├── Dataset features/._schemas_m2m
│   └── var • 12                    [bionty.CellMarker]
│       Cd4                         float
│       CD8                         float
│       CD3                         float
│       CD27                        float
│       Ccr7                        float
│       CD45RA                      float
│       CD95                        float
│       CXCR4                       float
│       CD49B                       float
│       CD69                        float
│       CD103                       float
│       CD14/19                     float
└── Labels
    └── .organisms                  bionty.Organism            human
        .experimental_factors       bionty.ExperimentalFactor  fluorescence-activated cell sorting

Inspect a PCA for QC. This dataset looks much like noise:

import scanpy as sc

markers = bt.CellMarker.lookup()

sc.pp.pca(adata)
sc.pl.pca(adata, color=markers.cd8.name)
[Figure: PCA plot of the cells, colored by CD8]

Create a new version of the collection by appending an artifact

Query the old version and create a new version that also contains the new artifact:

collection_v1 = ln.Collection.get(name="My versioned cytometry collection")
collection_v2 = ln.Collection(
    [artifact, collection_v1.ordered_artifacts[0]],
    revises=collection_v1,
    version="2",
)
collection_v2.describe()
 adding collection ids [1] as inputs for run 2, adding parent transform 1
 adding artifact ids [1] as inputs for run 2, adding parent transform 1
Collection 
└── General
    ├── .uid = '10GVN0SBTT6Cqg0n0001'
    ├── .key = 'My versioned cytometry collection'
    ├── .hash = 'aIyjTZDm9LEyi4udLlQ-FA'
    ├── .version = '2'
    ├── .created_by = testuser1 (Test User1)
    ├── .created_at = timestamp of unsaved record not available
    └── .transform = 'Append a new dataset'
collection_v2.save()
Collection(uid='10GVN0SBTT6Cqg0n0001', version='2', is_latest=True, key='My versioned cytometry collection', hash='aIyjTZDm9LEyi4udLlQ-FA', created_by_id=1, space_id=1, run_id=2, created_at=2025-01-20 07:39:10 UTC)
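
The saved record keeps the key of the collection it revises and reports `version='2'` and `is_latest=True`. The toy model below illustrates the bookkeeping that `revises` implies, under the assumption that only one version is flagged as latest at a time. `ToyCollection`, its fields, `toy_revise`, and the artifact names are hypothetical and for illustration only; they do not reproduce lamindb's implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyCollection:
    """Hypothetical stand-in for a versioned collection record."""

    key: str
    artifacts: List[str]
    version: str
    is_latest: bool = True
    revises: Optional["ToyCollection"] = field(default=None, repr=False)


def toy_revise(old: ToyCollection, new_artifact: str, version: str) -> ToyCollection:
    """Create a new version holding the new artifact plus the old ones."""
    new = ToyCollection(
        key=old.key,  # the key is shared across versions
        artifacts=[new_artifact, *old.artifacts],  # new artifact listed first
        version=version,
        revises=old,
    )
    old.is_latest = False  # assumption: only one version is the latest
    return new


# Placeholder artifact names, not the names used in the instance.
v1 = ToyCollection("My versioned cytometry collection", ["old_artifact"], "1")
v2 = toy_revise(v1, "new_artifact", version="2")
print(v2.version, v2.is_latest, v1.is_latest, v2.artifacts)
```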