facs2/4 Jupyter Notebook lamindata

Append a new dataset

We have one dataset in storage and are about to receive a new dataset.

In this notebook, we’ll see how to manage the situation.

import lamindb as ln
import bionty as bt
import readfcs

bt.settings.organism = "human"

ln.track("SmQmhrhigFPL0000")
→ connected lamindb: testuser1/test-facs
→ created Transform('SmQmhrhi'), started new Run('LTWrnnk2') at 2024-11-21 06:55:30 UTC
→ notebook imports: bionty==0.53.1 lamindb==0.76.16 pytometry==0.1.6 readfcs==1.1.9 scanpy==1.10.4

Ingest a new artifact

Access

Let us validate and register another .fcs file from Oetjen18:

filepath = readfcs.datasets.Oetjen18_t1()

adata = readfcs.read(filepath)
adata
Hide code cell output
AnnData object with n_obs × n_vars = 241552 × 20
    var: 'n', 'channel', 'marker', '$PnR', '$PnB', '$PnE', '$PnV', '$PnG'
    uns: 'meta'

Transform: normalize

import pytometry as pm
pm.pp.split_signal(adata, var_key="channel")
pm.pp.compensate(adata)
pm.tl.normalize_biExp(adata)
adata = adata[  # subset to rows that do not have nan values
    adata.to_df().isna().sum(axis=1) == 0
]
adata.to_df().describe()
Hide code cell output
CD95 CD8 CD27 CXCR4 CCR7 LIVE/DEAD CD4 CD45RA CD3 CD49B CD14/19 CD69 CD103
count 241552.000000 241552.000000 241552.000000 241552.000000 241552.000000 241552.000000 241552.000000 241552.000000 241552.000000 241552.000000 241552.000000 241552.000000 241552.000000
mean 887.579860 1302.985717 1221.257257 877.533482 977.505533 1883.358298 556.687953 929.493316 941.166747 966.012244 1210.769935 741.523184 1003.064857
std 573.549695 827.850302 672.851319 411.966073 584.217139 932.113729 480.875917 795.550133 658.984751 456.437094 694.622980 473.287558 642.728024
min 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
25% 462.757715 493.413744 605.463427 588.047798 495.437303 1063.670965 240.623098 404.087640 477.932659 592.294399 575.401173 380.247262 475.108131
50% 774.350833 1207.624048 1110.367681 782.939692 782.981430 1951.855099 484.355203 557.904360 655.909639 800.280049 1124.574275 705.802991 775.101973
75% 1327.792103 2036.849496 1721.730010 1070.479036 1453.929567 2623.975657 729.754419 1345.771633 1218.445208 1347.042403 1742.288464 1069.175380 1420.744291
max 4053.903716 4065.495666 4095.351322 4025.827267 3999.075551 4096.000000 4088.719985 3961.255364 3940.061146 4089.445928 3982.769373 3810.774988 4023.968008

Validate cell markers

Let’s see how many markers validate:

validated = bt.CellMarker.validate(adata.var.index)
Hide code cell output
! 9 unique terms (69.20%) are not validated for name: 'CD95', 'CXCR4', 'CCR7', 'LIVE/DEAD', 'CD4', 'CD49B', 'CD14/19', 'CD69', 'CD103'

Let’s standardize and re-validate:

adata.var.index = bt.CellMarker.standardize(adata.var.index)
validated = bt.CellMarker.validate(adata.var.index)
Hide code cell output
! 7 unique terms (53.80%) are not validated for name: 'CD95', 'CXCR4', 'LIVE/DEAD', 'CD49B', 'CD14/19', 'CD69', 'CD103'
/tmp/ipykernel_3108/92294437.py:1: ImplicitModificationWarning: Trying to modify index of attribute `.var` of view, initializing view as actual.
  adata.var.index = bt.CellMarker.standardize(adata.var.index)

Next, register non-validated markers from Bionty:

records = bt.CellMarker.from_values(adata.var.index[~validated])
ln.save(records)
Hide code cell output
! did not create CellMarker records for 2 non-validated names: 'CD14/19', 'LIVE/DEAD'

Manually create 1 marker:

bt.CellMarker(name="CD14/19").save()
Hide code cell output
CellMarker(uid='3ZFziy5ims8J', name='CD14/19', created_by_id=1, run_id=2, organism_id=1, created_at=2024-11-21 06:55:33 UTC)

Move metadata to obs:

validated = bt.CellMarker.validate(adata.var.index)
adata.obs = adata[:, ~validated].to_df()
adata = adata[:, validated].copy()
Hide code cell output
! 1 unique term (7.70%) is not validated for name: 'LIVE/DEAD'

Now all markers pass validation:

validated = bt.CellMarker.validate(adata.var.index)
assert all(validated)

Register

curate = ln.Curator.from_anndata(adata, var_index=bt.CellMarker.name, categoricals={})
curate.validate()
Hide code cell output
✓ 'var_index' is validated against CellMarker.name
True
artifact = curate.save_artifact(description="Oetjen18_t1")
Hide code cell output
!    1 unique term (100.00%) is not validated for name: 'LIVE/DEAD'
! skip linking features to artifact in slot 'obs'

Annotate with more labels:

efs = bt.ExperimentalFactor.lookup()
organism = bt.Organism.lookup()

artifact.labels.add(efs.fluorescence_activated_cell_sorting)
artifact.labels.add(organism.human)
artifact.describe()
Hide code cell output
Artifact(uid='pNVChGDo9gJecbF00000', is_latest=True, description='Oetjen18_t1', suffix='.h5ad', type='dataset', size=46506448, hash='WbPHGIMM_5GT68rC8ZydHA', n_observations=241552, _hash_type='md5', _accessor='AnnData', visibility=1, _key_is_virtual=True, created_at=2024-11-21 06:55:34 UTC)
  Provenance
    .storage = '/home/runner/work/lamin-usecases/lamin-usecases/docs/test-facs'
    .transform = 'Append a new dataset'
    .run = 2024-11-21 06:55:30 UTC
    .created_by = 'testuser1'
  Labels
    .organisms = 'human'
    .experimental_factors = 'fluorescence-activated cell sorting'
  Feature sets
    'var' = 'Cd4', 'CD8', 'CD3', 'CD27', 'Ccr7', 'CD45RA', 'CD95', 'CXCR4', 'CD49B', 'CD69', 'CD103', 'CD14/19'

Inspect a PCA fo QC - this collection looks much like noise:

import scanpy as sc

markers = bt.CellMarker.lookup()

sc.pp.pca(adata)
sc.pl.pca(adata, color=markers.cd8.name)
Hide code cell output
_images/6f0b83c97d50bbb8ff95d006bce67f5258061ecd896bfa6ae0b1b09c86d94e70.png

Create a new version of the collection by appending a artifact

Query the old version:

collection_v1 = ln.Collection.get(name="My versioned cytometry collection")
collection_v2 = ln.Collection(
    [artifact, collection_v1.ordered_artifacts[0]],
    is_new_version_of=collection_v1,
    version="2",
)
collection_v2.describe()
Hide code cell output
! `is_new_version_of` will be removed soon, please use `revises`
• adding collection ids [1] as inputs for run 2, adding parent transform 1
• adding artifact ids [1] as inputs for run 2, adding parent transform 1
Collection(uid='XxPj4nx9D7IZLyUF0001', version='2', is_latest=True, name='My versioned cytometry collection', hash='aIyjTZDm9LEyi4udLlQ-FA', visibility=1)
  Provenance
    .created_by = 'testuser1'
    .transform = 'Append a new dataset'
    .run = 2024-11-21 06:55:30 UTC
collection_v2.save()
Hide code cell output
Collection(uid='XxPj4nx9D7IZLyUF0001', version='2', is_latest=True, name='My versioned cytometry collection', hash='aIyjTZDm9LEyi4udLlQ-FA', visibility=1, created_by_id=1, transform_id=2, run_id=2, created_at=2024-11-21 06:55:35 UTC)