Flow cytometry
You’ll learn how to manage a growing number of flow cytometry datasets as a single queryable collection.
Specifically, you will:
read a single .fcs file as an AnnData object and seed a versioned collection with it (current page)
append a new dataset (a new .fcs file) to create a new version of the collection
# !pip install 'lamindb[jupyter,bionty]'
!lamin init --storage ./test-facs --modules bionty
import lamindb as ln
import bionty as bt
import readfcs
bt.settings.organism = "human" # globally set organism to human
ln.track("OWuTtS4SApon0000")
Ingest a first artifact
Access
We start with a flow cytometry file from Alpert et al., Nat. Med. (2019).
Calling the following function downloads the artifact and pre-populates a few relevant registries:
ln.core.datasets.file_fcs_alpert19(populate_registries=True)
We use readfcs to read the raw fcs file into memory and create an AnnData object:
adata = readfcs.read("Alpert19.fcs")
adata
It has the following features:
adata.var.head(10)
Transform: normalize
Because we want to ingest and store curated data, we split the signal and normalize it using the pytometry package.
import pytometry as pm
First, we’ll split the signal from the height and area metadata:
pm.pp.split_signal(adata, var_key="channel", data_type="cytof")
adata
Normalize the data:
pm.tl.normalize_arcsinh(adata, cofactor=150)
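To give a rough idea of what this transform does to raw intensities, here is a minimal standard-library sketch. It assumes the transform is asinh(x / cofactor) applied element-wise, which is the usual convention for cytometry data. The helper `arcsinh_normalize` and the sample values are hypothetical and not part of pytometry.

```python
import math


def arcsinh_normalize(values, cofactor=150.0):
    """Apply asinh(x / cofactor) to each raw intensity (illustrative helper)."""
    if cofactor <= 0:
        raise ValueError("cofactor must be positive")
    return [math.asinh(v / cofactor) for v in values]


raw = [0.0, 150.0, 15000.0]  # made-up intensities, not from Alpert19.fcs
normalized = arcsinh_normalize(raw, cofactor=150)
print(normalized)
```

The transform is roughly linear near zero and roughly logarithmic for values much larger than the cofactor, so the cofactor sets where that transition happens.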
Note
If this were a flow cytometry dataset, you would also have to compensate the data, if possible. The metadata should contain a compensation matrix, which can then be applied with the pytometry compensation function. Here, it’s a CyTOF dataset, which doesn’t (really) require compensation.
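For context, compensation is commonly described as multiplying the observed signal by the inverse of the spillover matrix. The sketch below shows this for two channels using only the standard library. The spillover values, the channel layout, and the helper names are invented for illustration; this is not the pytometry implementation.

```python
def invert_2x2(m):
    """Invert a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("spillover matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul_row(row, m):
    """Multiply a row vector by a matrix."""
    return [sum(row[k] * m[k][j] for k in range(len(row))) for j in range(len(m[0]))]


# Hypothetical spillover matrix: row i holds the fraction of
# fluorochrome i's signal that is detected in each channel.
spillover = [[1.0, 0.2], [0.1, 1.0]]
true_signal = [100.0, 50.0]

# What the instrument would record if spillover were the only distortion
observed = matmul_row(true_signal, spillover)

# Compensation recovers the true signal (up to floating-point error)
compensated = matmul_row(observed, invert_2x2(spillover))
print(observed, compensated)
```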
Validate: cell markers
First, we validate features in .var using CellMarker:
validated = bt.CellMarker.validate(adata.var.index)
We see that many features aren’t validated because they’re not standardized.
Hence, let’s standardize feature names & validate again:
adata.var.index = bt.CellMarker.standardize(adata.var.index)
validated = bt.CellMarker.validate(adata.var.index)
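Conceptually, standardizing maps known synonyms to canonical names, and validating checks whether each name exists in the registry. The toy sketch below illustrates that idea with a hypothetical in-memory registry; the names, synonyms, and helper functions are assumptions and do not reflect how bionty implements these methods.

```python
# Hypothetical registry: canonical marker name -> known synonyms
registry = {
    "CD14": {"cd14", "CD-14"},
    "CD3": {"cd3", "CD-3"},
}


def standardize(names):
    """Replace known synonyms with their canonical name; leave others unchanged."""
    synonym_to_name = {
        synonym: name for name, synonyms in registry.items() for synonym in synonyms
    }
    return [synonym_to_name.get(n, n) for n in names]


def validate(names):
    """Return one boolean per name: True if it is a canonical registry name."""
    return [n in registry for n in names]


raw_names = ["cd14", "CD3", "Time"]
print(validate(raw_names))  # [False, True, False]

standardized = standardize(raw_names)
print(validate(standardized))  # [True, True, False]
```

In this toy example, "Time" stays unvalidated after standardization, which mirrors the situation where leftover names turn out to be metadata rather than markers.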
The remaining non-validated features don’t appear to be cell markers but rather metadata features.
Let’s move them into adata.obs:
adata.obs = adata[:, ~validated].to_df()
adata = adata[:, validated].copy()
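The split above relies on a boolean mask over the columns: columns where the mask is True stay as markers, and the rest become metadata. A minimal sketch of the same idea on plain Python lists, with made-up column names and values and a hypothetical helper `select_columns`:

```python
columns = ["CD14", "CD3", "Time"]
validated_mask = [True, True, False]
rows = [[0.5, 1.2, 10.0], [0.7, 0.9, 11.0]]  # one inner list per cell


def select_columns(rows, columns, mask):
    """Keep only the columns whose mask entry is True."""
    keep = [i for i, flag in enumerate(mask) if flag]
    return [columns[i] for i in keep], [[row[i] for i in keep] for row in rows]


# Validated columns stay in the measurement matrix
marker_names, marker_values = select_columns(rows, columns, validated_mask)

# The inverted mask selects the columns that move to per-cell metadata
meta_names, meta_values = select_columns(
    rows, columns, [not flag for flag in validated_mask]
)
print(marker_names, meta_names)
```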
Now we have a clean panel of 35 validated cell markers:
validated = bt.CellMarker.validate(adata.var.index)
assert all(validated) # all markers are validated
Register: metadata
Next, let’s register the metadata features we moved to .obs.
For this, we create one feature record for each column in the .obs dataframe:
features = ln.Feature.from_dataframe(adata.obs)
ln.save(features)
We use the Experimental Factor Ontology through Bionty to create a “FACS” label:
bt.ExperimentalFactor.public().search("FACS").head(2) # search the public ontology
We found one for “FACS”; let’s save it to our in-house registry:
# import the FACS record from the public ontology and save it to the registry
facs = bt.ExperimentalFactor.from_source(ontology_id="EFO:0009108")
facs.save()
We don’t find one for “CyTOF”, however, so let’s create it without importing from a public ontology and label it as a child of “is_cytometry_assay”:
cytof = bt.ExperimentalFactor(name="CyTOF")
cytof.save()
is_cytometry_assay = bt.ExperimentalFactor(name="is_cytometry_assay")
is_cytometry_assay.save()
cytof.parents.add(is_cytometry_assay)
facs.parents.add(is_cytometry_assay)
is_cytometry_assay.view_parents(with_children=True)
Let us look at the content of the registry:
bt.ExperimentalFactor.to_dataframe()
Register: save & annotate with metadata
var_schema = ln.Schema(
    name="FACS-cell-markers",
    itype=bt.CellMarker,
).save()
obs_schema = ln.Schema(
    name="FACS-sample-metadata",
    itype=ln.Feature,
    flexible=True,
).save()
schema = ln.Schema(
    name="FACS-AnnData-schema",
    otype="AnnData",
    slots={"obs": obs_schema, "var.T": var_schema},
).save()
curator = ln.curators.AnnDataCurator(adata, schema=schema)
artifact = curator.save_artifact(description="Alpert19")
Add more labels:
experimental_factors = bt.ExperimentalFactor.lookup()
organisms = bt.Organism.lookup()
artifact.labels.add(experimental_factors.cytof)
artifact.labels.add(organisms.human)
Inspect the saved artifact
Inspect features on a high level:
artifact.features
Inspect low-level features in .var:
artifact.features.slots["var.T"].members.to_dataframe().head()
Use auto-complete for marker names in the var featureset:
markers = artifact.features.slots["var.T"].members.lookup()
markers.cd14
In a plot, we can now also easily show the gene symbol and UniProt ID:
import scanpy as sc
sc.pp.pca(adata)
sc.pl.pca(
    adata,
    color=markers.cd14.name,
    title=(
        f"{markers.cd14.name} / {markers.cd14.gene_symbol} /"
        f" {markers.cd14.uniprotkb_id}"
    ),
)
Create a collection from the artifact
ln.Collection(artifact, key="My versioned cytometry collection", version="1").save()