Append a new dataset¶

We have one dataset in storage and are about to receive a new dataset.

This notebook shows how to validate and register the new dataset, then append it to the existing collection as a new version.

import lamindb as ln
import bionty as bt
import readfcs

bt.settings.organism = "human"

ln.track("SmQmhrhigFPL0000")

→ connected lamindb: testuser1/test-facs

→ created Transform('SmQmhrhigFPL0000', key='facs2.ipynb'), started new Run('YD95QA7iuL7l7mB8') at 2025-11-26 11:09:03 UTC

→ notebook imports: bionty==1.9.1 lamindb==1.16.1 pytometry==0.1.6 readfcs==2.1.0 scanpy==1.11.5

Ingest a new artifact¶

Access ¶

Let us validate and register another .fcs file from Oetjen18:

filepath = readfcs.datasets.Oetjen18_t1()

adata = readfcs.read(filepath)
# since anndata>=0.12.0, `/` is not allowed in keys
adata.var.index = adata.var.index.str.replace("/", "|")
adata.var["marker"] = adata.var["marker"].str.replace("/", "|")
adata.uns["meta"]["spill"].index = adata.uns["meta"]["spill"].index.str.replace(
    "/", "|"
)
adata.uns["meta"]["spill"].columns = adata.uns["meta"]["spill"].columns.str.replace(
    "/", "|"
)
adata
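The replacements above apply one rule to every label that ends up as a key. As a minimal sketch, assuming plain Python lists stand in for the pandas indexes, the rule plus a collision check could look like this. The helper name `replace_slashes` and the example labels are hypothetical and not part of readfcs or anndata.

```python
def replace_slashes(labels, replacement="|"):
    """Return labels with every "/" swapped for `replacement`."""
    cleaned = [label.replace("/", replacement) for label in labels]
    # Renaming must not merge two distinct labels into one.
    if len(set(cleaned)) != len(set(labels)):
        raise ValueError("replacement produced duplicate labels")
    return cleaned


# Hypothetical channel labels, for illustration only.
labels = ["CD3", "LIVE/DEAD", "CD14/19"]
cleaned = replace_slashes(labels)
print(cleaned)
```

The collision check matters because the same renaming is applied to the var index and to both axes of the spill matrix, and they only stay aligned if the mapping is one-to-one.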

Transform: normalize ¶

import pytometry as pm

pm.pp.split_signal(adata, var_key="channel")

pm.pp.compensate(adata)

pm.tl.normalize_biExp(adata)

adata = adata[  # subset to rows that do not have nan values
    adata.to_df().isna().sum(axis=1) == 0
]

adata.to_df().describe()


	CD95	CD8	CD27	CXCR4	CCR7	LIVE|DEAD	CD4	CD45RA	CD3	CD49B	CD14|19	CD69	CD103
count	241552.000000	241552.000000	241552.000000	241552.000000	241552.000000	241552.000000	241552.000000	241552.000000	241552.000000	241552.000000	241552.000000	241552.000000	241552.000000
mean	887.579860	1302.985717	1221.257257	877.533482	977.505533	1883.358298	556.687953	929.493316	941.166747	966.012244	1210.769935	741.523184	1003.064857
std	573.549695	827.850302	672.851319	411.966073	584.217139	932.113729	480.875917	795.550133	658.984751	456.437094	694.622980	473.287558	642.728024
min	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000
25%	462.757715	493.413744	605.463427	588.047798	495.437303	1063.670965	240.623098	404.087640	477.932659	592.294399	575.401173	380.247262	475.108131
50%	774.350833	1207.624048	1110.367681	782.939692	782.981430	1951.855099	484.355203	557.904360	655.909639	800.280049	1124.574275	705.802991	775.101973
75%	1327.792103	2036.849496	1721.730010	1070.479036	1453.929567	2623.975657	729.754419	1345.771633	1218.445208	1347.042403	1742.288464	1069.175380	1420.744291
max	4053.903716	4065.495666	4095.351322	4025.827267	3999.075551	4096.000000	4088.719985	3961.255364	3940.061146	4089.445928	3982.769373	3810.774988	4023.968008
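To give some intuition for the compensation step in the transform above, here is a minimal sketch of the standard linear spillover model for two channels. It assumes that each detector measures its own dye plus a fixed fraction of the other dye, so that compensation amounts to multiplying by the inverse of the spillover matrix. The matrix values and helper names are hypothetical, and this is not pytometry's implementation.

```python
def invert_2x2(m):
    """Invert a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("spillover matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def row_times_matrix(row, m):
    """Multiply a row vector by a matrix."""
    return [
        sum(row[i] * m[i][j] for i in range(len(row)))
        for j in range(len(m[0]))
    ]


# Hypothetical spillover matrix: rows are dyes, columns are detectors.
# 10% of dye A leaks into detector B, 5% of dye B leaks into detector A.
spill = [[1.0, 0.1],
         [0.05, 1.0]]

true_signal = [1000.0, 200.0]
observed = row_times_matrix(true_signal, spill)          # what the detectors record
compensated = row_times_matrix(observed, invert_2x2(spill))  # spillover removed
print(observed, compensated)
```

In this toy case the compensated values recover the true signal up to floating-point error. Real panels have more channels, so a general matrix solver would replace the hand-written 2x2 inverse.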

Validate cell markers ¶

Let’s see how many markers validate:

validated = bt.CellMarker.validate(adata.var.index)

Let’s standardize and re-validate:

adata.var.index = bt.CellMarker.standardize(adata.var.index)
validated = bt.CellMarker.validate(adata.var.index)
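Conceptually, validation is a membership test against a registry of canonical names, and standardization maps known synonyms onto those names first. The sketch below illustrates that idea with a hypothetical in-memory registry. The `REGISTRY` contents and both functions are assumptions for illustration and do not reflect how Bionty stores or resolves markers.

```python
# Hypothetical registry: canonical marker name -> known synonyms.
REGISTRY = {
    "CD8": {"CD8A"},
    "Ccr7": {"CCR7", "CD197"},
    "CD3": set(),
}


def validate(names):
    """Return one boolean per name: True if it is a canonical name."""
    return [name in REGISTRY for name in names]


def standardize(names):
    """Map known synonyms to canonical names; leave unknown names as-is."""
    synonym_to_canonical = {
        synonym: canonical
        for canonical, synonyms in REGISTRY.items()
        for synonym in synonyms
    }
    return [synonym_to_canonical.get(name, name) for name in names]


names = ["CD8", "CCR7", "CD3", "FSC-A"]
before = validate(names)
standardized = standardize(names)
after = validate(standardized)
print(before, standardized, after)
```

Names that are neither canonical nor a known synonym, such as the hypothetical `FSC-A` here, still fail after standardization and need to be handled separately.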

Next, create records for the non-validated markers from Bionty and save them:

records = bt.CellMarker.from_values(adata.var.index[~validated])
ln.save(records)

Manually create one marker for the combined CD14|19 channel:

bt.CellMarker(name="CD14|19").save()

Move the remaining non-validated columns, which hold metadata rather than markers, to obs:

validated = bt.CellMarker.validate(adata.var.index)
adata.obs = adata[:, ~validated].to_df()
adata = adata[:, validated].copy()
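The split above partitions the columns by a boolean mask: validated columns stay as features, the rest become per-observation metadata. A minimal sketch of that partition on a plain dictionary of columns follows. The column names and the `split_columns` helper are hypothetical, and a dict stands in for the AnnData object.

```python
def split_columns(table, mask):
    """Split a column-oriented table into (kept, moved) by a boolean mask."""
    if len(mask) != len(table):
        raise ValueError("mask must have one entry per column")
    kept = {name: col for (name, col), ok in zip(table.items(), mask) if ok}
    moved = {name: col for (name, col), ok in zip(table.items(), mask) if not ok}
    return kept, moved


# Hypothetical measurements: two marker channels and two metadata columns.
table = {
    "CD3": [1.0, 2.0],
    "FSC-A": [10.0, 20.0],
    "CD8": [3.0, 4.0],
    "Time": [0.1, 0.2],
}
mask = [True, False, True, False]  # True where the column name validated

markers, metadata = split_columns(table, mask)
print(list(markers), list(metadata))
```

Both halves keep the same row order, which is what allows the moved columns to be attached to the observations without re-aligning anything.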

Now all markers pass validation:

validated = bt.CellMarker.validate(adata.var.index)
assert all(validated)

Register ¶

curate = ln.Curator.from_anndata(adata, var_index=bt.CellMarker.name, categoricals={})
curate.validate()

artifact = curate.save_artifact(description="Oetjen18_t1")

Annotate with more labels:

efs = bt.ExperimentalFactor.lookup()
organism = bt.Organism.lookup()

artifact.labels.add(efs.fluorescence_activated_cell_sorting)
artifact.labels.add(organism.human)

artifact.describe()


Artifact:  (0000)
|   description: Oetjen18_t1
├── uid: uct9SsfDvjiPNini0000            run: YD95QA7 (facs2.ipynb)
│   kind: dataset                        otype: AnnData            
│   hash: BkQOx3xp3OR4FoOq4CsuJA         size: 44.4 MB             
│   branch: main                         space: all                
│   created_at: 2025-11-26 11:09:06 UTC  created_by: testuser1     
│   n_observations: 241552                                         
├── storage/path: /home/runner/work/lamin-usecases/lamin-usecases/docs/test-facs/.lamindb/uct9SsfDvjiPNini0000.h5ad
├── Dataset features
│   └── var (12 bionty.CellMarker)                                                                                 
│       Cd4                             float                                                                      
│       CD8                             float                                                                      
│       CD3                             float                                                                      
│       CD27                            float                                                                      
│       Ccr7                            float                                                                      
│       CD45RA                          float                                                                      
│       CD95                            float                                                                      
│       CXCR4                           float                                                                      
│       CD49B                           float                                                                      
│       CD69                            float                                                                      
│       CD103                           float                                                                      
│       CD14|19                         float                                                                      
└── Labels
    └── .organisms                      bionty.Organism                    human                                   
        .experimental_factors           bionty.ExperimentalFactor          fluorescence-activated cell sorting

Inspect a PCA for QC. This collection looks much like noise:

import scanpy as sc

markers = bt.CellMarker.lookup()

sc.pp.pca(adata)
sc.pl.pca(adata, color=markers.cd8.name)

Create a new version of the collection by appending an artifact¶

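Query the old version, then create a new version that contains the new artifact together with the artifact from the old version: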

collection_v1 = ln.Collection.get(key="My versioned cytometry collection")

collection_v2 = ln.Collection(
    [artifact, collection_v1.ordered_artifacts[0]],
    revises=collection_v1,
    version="2",
)
collection_v2.describe()

collection_v2.save()
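The versioning pattern used here is append-only: the old version is left untouched, and the new version lists the new artifact plus the old ones and points back at what it revises. A minimal sketch of that pattern with a hypothetical immutable record type follows. `CollectionVersion` and `append_artifact` are illustrative names and not part of lamindb.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CollectionVersion:
    """Hypothetical immutable record of one collection version."""
    artifacts: tuple
    version: str
    revises: Optional["CollectionVersion"] = None


def append_artifact(previous, new_artifact, version):
    """Create a new version holding the new artifact plus all previous ones."""
    return CollectionVersion(
        artifacts=(new_artifact, *previous.artifacts),
        version=version,
        revises=previous,
    )


v1 = CollectionVersion(artifacts=("artifact_a",), version="1")
v2 = append_artifact(v1, "artifact_b", version="2")
print(v2.version, v2.artifacts, v2.revises.version)
```

Because the first version is never mutated, anything that referenced it keeps seeing exactly the artifacts it was built from, while new work can pick up the second version.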