sc-imaging¶
Here, you will learn how to structure, featurize, and make a large imaging collection queryable for large-scale machine learning:
Load and annotate a Collection of microscopy images
First, we load and annotate a collection of microscopy images in TIFF format that was previously uploaded.
The images used here were acquired as part of a study on autophagy, a cellular process during which cells recycle their components in autophagosomes. The study tracked genetic determinants of autophagy through fluorescence microscopy of human U2OS cells.
# pip install 'lamindb[jupyter,bionty]'
!lamin init --storage ./test-sc-imaging --modules bionty
→ initialized lamindb: testuser1/test-sc-imaging
import lamindb as ln
import bionty as bt
from tifffile import imread
import matplotlib.pyplot as plt
ln.track()
→ connected lamindb: testuser1/test-sc-imaging
→ created Transform('zoWuUQp9TxFF0000'), started new Run('2gf1XFsb...') at 2025-07-08 11:00:10 UTC
→ notebook imports: bionty==1.6.0 lamindb==1.7.1 matplotlib==3.10.3 tifffile==2025.6.11
• recommendation: to identify the notebook across renames, pass the uid: ln.track("zoWuUQp9TxFF")
All image metadata is stored in an already ingested .csv file on the scportrait/examples instance.
metadata_files = (
    ln.Artifact.using("scportrait/examples")
    .get(key="input_data_imaging_usecase/metadata_files.csv")
    .load()
)
metadata_files.head(2)
→ transferred: Artifact(uid='jdjwcUB0w1QAHAYm0000'), Storage(uid='r7YUayXjktSb')
| | image_path | genotype | stimulation | cell_line | cell_line_clone | channel | FOV | magnification | microscope | imaged structure | resolution |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | input_data_imaging_usecase/images/Timepoint001... | WT | 14h Torin-1 | U2OS | U2OS lcklip-mNeon mCherryLC3B clone 1 | Alexa488 | FOV1 | 20X | Opera Phenix | LckLip-mNeon | 0.597976 |
| 1 | input_data_imaging_usecase/images/Timepoint001... | WT | 14h Torin-1 | U2OS | U2OS lcklip-mNeon mCherryLC3B clone 1 | Alexa488 | FOV2 | 20X | Opera Phenix | LckLip-mNeon | 0.597976 |
metadata_files.apply(lambda col: col.unique())
image_path [input_data_imaging_usecase/images/Timepoint00...
genotype [WT, EI24KO]
stimulation [14h Torin-1, untreated]
cell_line [U2OS]
cell_line_clone [U2OS lcklip-mNeon mCherryLC3B clone 1, U2OS l...
channel [Alexa488, DAPI, mCherry]
FOV [FOV1, FOV2]
magnification [20X]
microscope [Opera Phenix]
imaged structure [LckLip-mNeon, DNA, mCherry-LC3B]
resolution [0.597976081]
dtype: object
Curating artifacts¶
All images show U2OS cells and were acquired on an Opera Phenix microscope at 20X magnification.
Cells were imaged under two conditions, one of which induces autophagy:
Treated: Exposed to Torin-1 (a starvation-mimicking small molecule) for 14 hours
Control: Left untreated
The U2OS cells were genetically engineered with fluorescently tagged proteins to visualize the process of autophagosome formation:
LC3B -> Autophagosome marker (visible in mCherry channel)
LckLip -> Membrane-targeted fluorescent protein for cell boundary visualization (visible in Alexa488 channel)
Hoechst -> DNA stain for nucleus identification (visible in DAPI channel)
Each image contains three separate channels:
Channel | Imaged Structure | Fluorescent Marker |
---|---|---|
1 | DNA | Hoechst |
2 | Autophagosomes | mCherry-LC3B |
3 | Plasma Membrane | LckLip-mNeon |
Two genotypes were analyzed:
WT (Wild-type cells)
EI24KO (EI24 gene knockout cells)
For each genotype, two different clonal cell lines were studied, with multiple fields of view (FOVs) captured per experimental condition.
All images are annotated with corresponding metadata to enable efficient querying and analysis.
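As a quick sanity check on this design, the factor levels found in the metadata can be enumerated. The sketch below assumes a fully crossed design (every clone imaged under both conditions, in both FOVs, in all three channels, with one file per combination). That assumption is not stated in the metadata itself, and the variable names are illustrative.

```python
from itertools import product

# Factor levels as they appear in the metadata.
clones = [
    "U2OS lcklip-mNeon mCherryLC3B clone 1",
    "U2OS lcklip-mNeon mCherryLC3B clone 2",
    "U2OS lcklip-mNeon mCherryLC3B EI24 KO clone 1",
    "U2OS lcklip-mNeon mCherryLC3B EI24 KO clone 2",
]
stimulations = ["14h Torin-1", "untreated"]
fovs = ["FOV1", "FOV2"]
channels = ["Alexa488", "DAPI", "mCherry"]

# Assumption: fully crossed design, one single-channel image per combination.
expected_images = list(product(clones, stimulations, fovs, channels))
print(len(expected_images))  # 4 * 2 * 2 * 3 = 48
```

Under that assumption the design yields 48 single-channel images, a number that can be compared against `len(metadata_files)`.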
Define a schema¶
We define a Schema to curate metadata.
# Categorical columns whose values are validated against ULabel names
ulabel_names = ["genotype", "stimulation", "cell_line_clone", "channel",
                "FOV", "magnification", "microscope", "imaged structure"]
autophagy_imaging_schema = ln.Schema(
    name="Autophagy imaging schema",
    features=[
        *[ln.Feature(name=name, dtype=ln.ULabel.name).save() for name in ulabel_names],
        ln.Feature(name="image_path", dtype=str, description="image path").save(),
        # Cell lines are validated against the bionty CellLine registry
        ln.Feature(name="cell_line", dtype=bt.CellLine.name).save(),
        ln.Feature(name="resolution", dtype=float, description="conversion factor for px to µm").save(),
    ],
    coerce_dtype=True,
).save()
! you are trying to create a record with name='cell_line' but a record with similar name exists: 'cell_line_clone'. Did you mean to load it?
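The `resolution` feature is described as a conversion factor from pixels to µm. A minimal sketch of how such a factor might be applied follows, assuming it is expressed in µm per pixel and is identical along both image axes. The helper names are hypothetical and not part of lamindb.

```python
# Value taken from the metadata; assumed here to mean µm per pixel.
RESOLUTION_UM_PER_PX = 0.597976081


def px_to_um(length_px: float, resolution: float = RESOLUTION_UM_PER_PX) -> float:
    """Convert a length measured in pixels to micrometers."""
    return length_px * resolution


def px_area_to_um2(area_px: float, resolution: float = RESOLUTION_UM_PER_PX) -> float:
    """Convert an area measured in pixels to square micrometers."""
    return area_px * resolution**2


print(px_to_um(100))         # length of a 100 px line, in µm
print(px_area_to_um2(2500))  # area of a 50 x 50 px region, in µm²
```

Storing the factor as a float feature on every image keeps downstream measurements, such as cell or autophagosome sizes, comparable across acquisitions.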
Curate the dataset¶
curator = ln.curators.DataFrameCurator(metadata_files, autophagy_imaging_schema)
try:
    curator.validate()
except ln.core.exceptions.ValidationError as e:
    print(e)
! 2 terms not validated in feature 'genotype': 'WT', 'EI24KO'
→ fix typos, remove non-existent values, or save terms via: curator.cat.add_new_from('genotype')
! 2 terms not validated in feature 'stimulation': '14h Torin-1', 'untreated'
→ fix typos, remove non-existent values, or save terms via: curator.cat.add_new_from('stimulation')
! 4 terms not validated in feature 'cell_line_clone': 'U2OS lcklip-mNeon mCherryLC3B clone 1', 'U2OS lcklip-mNeon mCherryLC3B clone 2', 'U2OS lcklip-mNeon mCherryLC3B EI24 KO clone 1', 'U2OS lcklip-mNeon mCherryLC3B EI24 KO clone 2'
→ fix typos, remove non-existent values, or save terms via: curator.cat.add_new_from('cell_line_clone')
! 3 terms not validated in feature 'channel': 'Alexa488', 'DAPI', 'mCherry'
→ fix typos, remove non-existent values, or save terms via: curator.cat.add_new_from('channel')
! 2 terms not validated in feature 'FOV': 'FOV1', 'FOV2'
→ fix typos, remove non-existent values, or save terms via: curator.cat.add_new_from('FOV')
! 1 term not validated in feature 'magnification': '20X'
→ fix typos, remove non-existent values, or save terms via: curator.cat.add_new_from('magnification')
! 1 term not validated in feature 'microscope': 'Opera Phenix'
→ fix typos, remove non-existent values, or save terms via: curator.cat.add_new_from('microscope')
! 3 terms not validated in feature 'imaged structure': 'LckLip-mNeon', 'DNA', 'mCherry-LC3B'
→ fix typos, remove non-existent values, or save terms via: curator.cat.add_new_from('imaged structure')
! 1 term not validated in feature 'cell_line': 'U2OS'
1 synonym found: "U2OS" → "U-2 OS cell"
→ curate synonyms via: .standardize("cell_line")
1 term not validated in feature 'cell_line': 'U2OS'
1 synonym found: "U2OS" → "U-2 OS cell"
→ curate synonyms via: .standardize("cell_line")
Add and standardize missing terms:
curator.cat.standardize("cell_line")
for key in curator.cat.non_validated.keys():
    curator.cat.add_new_from(key)
curator.validate()
! you are trying to create a record with name='mCherry' but records with similar names exist: 'U2OS lcklip-mNeon mCherryLC3B clone 1', 'U2OS lcklip-mNeon mCherryLC3B clone 2', 'U2OS lcklip-mNeon mCherryLC3B EI24 KO clone 1'. Did you mean to load one of them?
! you are trying to create a record with name='LckLip-mNeon' but records with similar names exist: 'U2OS lcklip-mNeon mCherryLC3B clone 1', 'U2OS lcklip-mNeon mCherryLC3B clone 2', 'U2OS lcklip-mNeon mCherryLC3B EI24 KO clone 1'. Did you mean to load one of them?
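The validate, standardize, register cycle used above can be pictured with plain Python. The sketch below is a toy model and not lamindb's implementation: the `registry` and `synonyms` dictionaries stand in for the ULabel and bionty.CellLine registries, and all function names are hypothetical.

```python
# Toy registries standing in for ULabel and bionty.CellLine (illustrative only).
registry = {
    "genotype": set(),
    "cell_line": {"U-2 OS cell"},
}
synonyms = {"cell_line": {"U2OS": "U-2 OS cell"}}

records = [
    {"genotype": "WT", "cell_line": "U2OS"},
    {"genotype": "EI24KO", "cell_line": "U2OS"},
]


def non_validated(records, registry):
    """Return, per feature, the sorted values that are missing from its registry."""
    missing = {}
    for feature, known in registry.items():
        unknown = {rec[feature] for rec in records} - known
        if unknown:
            missing[feature] = sorted(unknown)
    return missing


def standardize(records, feature, synonyms):
    """Replace known synonyms of a feature by their canonical names, in place."""
    mapping = synonyms.get(feature, {})
    for rec in records:
        rec[feature] = mapping.get(rec[feature], rec[feature])


before = non_validated(records, registry)

# Step 1: map synonyms onto terms that already exist.
standardize(records, "cell_line", synonyms)
# Step 2: register whatever is still unknown as new terms.
for feature, values in non_validated(records, registry).items():
    registry[feature].update(values)

after = non_validated(records, registry)
print(before)
print(after)
```

Standardizing before registering matters in this toy model: otherwise "U2OS" would be added as a new term next to the existing "U-2 OS cell" entry instead of being mapped onto it.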
Annotate images with metadata¶
We add images to our lamindb instance and annotate them with their metadata.
# Create study feature and associated label
ln.Feature(name="study", dtype=ln.ULabel).save()
ln.ULabel(name="autophagy imaging").save()
artifacts = []
for _, row in metadata_files.iterrows():
    # Look up the image on the source instance; saving it transfers it to the current instance
    artifact = ln.Artifact.using("scportrait/examples").filter(key__icontains=row["image_path"]).one()
    artifact.save()
    # Link the cell line record from the bionty registry
    artifact.cell_lines.add(bt.CellLine.filter(name=row.cell_line).one())
    # Annotate with the remaining metadata columns plus the study label
    artifact.features.add_values({
        col: row[col] for col in ["genotype", "stimulation", "cell_line_clone",
                                  "channel", "FOV", "magnification", "microscope", "resolution"]
    } | {
        "imaged structure": row["imaged structure"],
        "study": "autophagy imaging"
    })
    artifacts.append(artifact)
→ transferred: Artifact(uid='Md4OouMExlWS2YfZ0000')
→ transferred: Artifact(uid='KKbVRkOjQ1jdA2fx0000')
→ transferred: Artifact(uid='CiQYTBNZrj0CPejK0000')
→ transferred: Artifact(uid='W6tzE7JNiM80Ruho0000')
→ transferred: Artifact(uid='YGiNq6DPfIEjtt9j0000')
→ transferred: Artifact(uid='uuh41FAHEz0ASL2N0000')
→ transferred: Artifact(uid='Gtwi9Pcyx8maQEWB0000')
→ transferred: Artifact(uid='nSZhAypqiNZ2Ylbe0000')
→ transferred: Artifact(uid='sHHpiiFYWsIXMZNV0000')
→ transferred: Artifact(uid='jzqxtoduIJ3hCbB40000')
→ transferred: Artifact(uid='hNMkrIHce1XrLZHY0000')
→ transferred: Artifact(uid='M06liaIzh2OVEuJ40000')
→ transferred: Artifact(uid='yiIMSAddDWgLgki70000')
→ transferred: Artifact(uid='2ie2Kjzn1O7UYhuq0000')
→ transferred: Artifact(uid='DEzw4QQAsjVZ010b0000')
→ transferred: Artifact(uid='fOQSb7JCK67aeN6a0000')
→ transferred: Artifact(uid='hbVyCGFARHU91Kax0000')
→ transferred: Artifact(uid='qHQpdWcFu7FzF6l50000')
→ transferred: Artifact(uid='mXWwV1x42Jz9RoSO0000')
→ transferred: Artifact(uid='Ov8FnKzHMNY0XVJa0000')
→ transferred: Artifact(uid='gj0HHnoVpEqbaUJb0000')
→ transferred: Artifact(uid='Lwk8shsYe0V5bMgd0000')
→ transferred: Artifact(uid='1XnEyqVt6UGXCTmV0000')
→ transferred: Artifact(uid='YuyVn060M4FxATPz0000')
→ transferred: Artifact(uid='jVytS8AyAHmHkYR30000')
→ transferred: Artifact(uid='vCVbKkzz4CnJPPKF0000')
→ transferred: Artifact(uid='cw4F6bUB9zuMthCY0000')
→ transferred: Artifact(uid='5kMhlcDNek4RMeQF0000')
→ transferred: Artifact(uid='7TZGXvbA0JLL68hR0000')
→ transferred: Artifact(uid='9BmbViqMmlVhpfS00000')
→ transferred: Artifact(uid='ThuJnRAhqkp54kyU0000')
→ transferred: Artifact(uid='h4EKWveW36LIzXez0000')
→ transferred: Artifact(uid='PquYNyshQTDd24Vw0000')
→ transferred: Artifact(uid='fdem35nw5ztUnEIM0000')
→ transferred: Artifact(uid='8SFUmW0RhBNySxBO0000')
→ transferred: Artifact(uid='VkmKLUCaMsYFCuGE0000')
→ transferred: Artifact(uid='OS0wBE7bviIlW7qj0000')
→ transferred: Artifact(uid='9ZVngbl0JUS0XdZ70000')
→ transferred: Artifact(uid='ixOpuSTsyrPXdYuA0000')
→ transferred: Artifact(uid='IzP3IAwIhmM7OORD0000')
→ transferred: Artifact(uid='6uMjKAk1aYlAV7Cf0000')
→ transferred: Artifact(uid='RRVS8qVx3VSw02Xu0000')
→ transferred: Artifact(uid='QDPX1ljp0eCMz80o0000')
→ transferred: Artifact(uid='Oww4y0yYuR8pxV9q0000')
→ transferred: Artifact(uid='Cvamog4G3a2XYGM80000')
→ transferred: Artifact(uid='AhBvnNKg5yJcG6LU0000')
→ transferred: Artifact(uid='6uUjyphUD4D1Hixc0000')
→ transferred: Artifact(uid='AVRTVX9gEu4LrTAP0000')
artifacts[0].describe()
Artifact .tif
├── General
│   ├── uid: Md4OouMExlWS2YfZ0000          hash: 0aoXxT857VvKAGo9UQo-8g
│   ├── size: 2.2 MB                       space: all
│   ├── branch: main                       created_at: 2025-03-07 13:51:25
│   ├── created_by: testuser1 (Test User1)
│   ├── key: input_data_imaging_usecase/images/Timepoint001_Row01_Well01_Alexa488_zstack001_r003_c005.tif
│   ├── storage location / path: s3://lamin-eu-central-1/r7YUayXjktSb/.lamindb/Md4OouMExlWS2YfZ0000.tif
│   ├── description: raw image of U2OS cells stained for autophagy markers
│   └── transform: __lamindb_transfer__/5WAovnZ5l7ij
├── Linked features
│   └── FOV                 cat[ULabel]         FOV1
│       cell_line_clone     cat[ULabel]         U2OS lcklip-mNeon mCherryLC3B clone 1
│       channel             cat[ULabel]         Alexa488
│       genotype            cat[ULabel]         WT
│       imaged structure    cat[ULabel]         LckLip-mNeon
│       magnification       cat[ULabel]         20X
│       microscope          cat[ULabel]         Opera Phenix
│       stimulation         cat[ULabel]         14h Torin-1
│       study               cat[ULabel]         autophagy imaging
│       resolution          float               0.597976081
└── Labels
    └── .cell_lines         bionty.CellLine     U-2 OS cell
        .ulabels            ULabel              WT, 14h Torin-1, U2OS lcklip-mNeon mChe…
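The artifact key itself encodes acquisition details such as the channel, which allows a cross-check against the `channel` feature. The sketch below parses the key shown above with a regular expression. The naming pattern is inferred from this single key, the meaning of the `r` and `c` fields is not documented here, and the `parse_image_key` helper is hypothetical.

```python
import re
from pathlib import PurePosixPath

key = (
    "input_data_imaging_usecase/images/"
    "Timepoint001_Row01_Well01_Alexa488_zstack001_r003_c005.tif"
)

# Assumed layout: Timepoint<N>_Row<N>_Well<N>_<channel>_zstack<N>_r<N>_c<N>.tif
PATTERN = re.compile(
    r"Timepoint(?P<timepoint>\d+)_Row(?P<row>\d+)_Well(?P<well>\d+)"
    r"_(?P<channel>[^_]+)_zstack(?P<zstack>\d+)_r(?P<r>\d+)_c(?P<c>\d+)\.tif$"
)


def parse_image_key(key: str) -> dict:
    """Split the file name of an image key into its underscore-separated fields."""
    match = PATTERN.search(PurePosixPath(key).name)
    if match is None:
        raise ValueError(f"unexpected file name: {key}")
    # Keep the channel as text; convert the numeric fields to integers.
    return {k: v if k == "channel" else int(v) for k, v in match.groupdict().items()}


fields = parse_image_key(key)
print(fields)
```

For this artifact, the parsed channel agrees with the `channel` feature (Alexa488) listed in the description above.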
In addition, we create a Collection to hold all Artifact records that belong to this specific imaging study.
collection = ln.Collection(
    artifacts,
    key="Annotated autophagy imaging raw images",
    description="annotated microscopy images of cells stained for autophagy markers",
).save()
Let’s look at some example images where we match images from the same clone, stimulation condition, and FOV to ensure correct channel alignment.
def plot_example_images(df, n_images=3, title_prefix=""):
    """Plot example images from dataframe"""
    fig, axs = plt.subplots(1, n_images, figsize=(15, 5))
    if n_images == 1:
        axs = [axs]
    # Pair axes with rows by position so plotting does not depend on the index labels
    for ax, (_, row) in zip(axs, df.iterrows()):
        path = ln.Artifact.using("scportrait/examples").get(key=row["image_path"]).cache()
        image = imread(path)
        ax.imshow(image)
        ax.set_title(f"{title_prefix}{row['imaged structure']}")
        ax.axis("off")
    return fig, axs


sorted_metadata = metadata_files.sort_values(by=["cell_line_clone", "stimulation", "FOV"])
# Plot first 3 and last 3
plot_example_images(sorted_metadata.head(3).reset_index(drop=True));
plot_example_images(sorted_metadata.tail(3).reset_index(drop=True));
→ mapped: Artifact(uid='jVytS8AyAHmHkYR30000')
→ mapped: Artifact(uid='cw4F6bUB9zuMthCY0000')
→ mapped: Artifact(uid='7TZGXvbA0JLL68hR0000')
→ mapped: Artifact(uid='Ov8FnKzHMNY0XVJa0000')
→ mapped: Artifact(uid='Lwk8shsYe0V5bMgd0000')
→ mapped: Artifact(uid='YuyVn060M4FxATPz0000')
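Taking the first or last three rows of the sorted table only yields one image per channel if every combination of clone, stimulation, and FOV has exactly three rows. A minimal sketch of making that check explicit is shown below. It uses plain dictionaries with placeholder clone names and paths instead of the real metadata, and the helper names are hypothetical.

```python
from collections import defaultdict

EXPECTED_CHANNELS = {"Alexa488", "DAPI", "mCherry"}


def make_row(path, fov, channel):
    """Build a row shaped like the metadata table (placeholder values)."""
    return {"image_path": path, "cell_line_clone": "clone 1",
            "stimulation": "untreated", "FOV": fov, "channel": channel}


rows = [
    make_row("a_488.tif", "FOV1", "Alexa488"),
    make_row("a_dapi.tif", "FOV1", "DAPI"),
    make_row("a_mcherry.tif", "FOV1", "mCherry"),
    make_row("b_488.tif", "FOV2", "Alexa488"),
    make_row("b_dapi.tif", "FOV2", "DAPI"),  # mCherry image deliberately missing
]


def group_channels(rows):
    """Map each (clone, stimulation, FOV) site to a {channel: image_path} dict."""
    groups = defaultdict(dict)
    for row in rows:
        site = (row["cell_line_clone"], row["stimulation"], row["FOV"])
        groups[site][row["channel"]] = row["image_path"]
    return dict(groups)


def incomplete_sites(groups):
    """Return the sites that lack at least one expected channel."""
    result = {}
    for site, by_channel in groups.items():
        missing = EXPECTED_CHANNELS - set(by_channel)
        if missing:
            result[site] = sorted(missing)
    return result


groups = group_channels(rows)
incomplete = incomplete_sites(groups)
print(incomplete)
```

Grouping by site rather than relying on row order also gives direct access to the three channel images of one field of view, which is the unit needed for channel alignment.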


ln.finish()
→ finished Run('2gf1XFsb') after 34s at 2025-07-08 11:00:45 UTC