sc-imaging

Here, you will learn how to structure and featurize a large imaging collection and make it queryable for large-scale machine learning:

  1. Load and annotate a Collection of microscopy images (sc-imaging1/4)

  2. Generate single-cell images (sc-imaging2/4)

  3. Featurize single-cell images (sc-imaging3/4)

  4. Train model to identify autophagy positive cells (sc-imaging4/4)

First, we load and annotate a collection of microscopy images in TIFF format that was previously uploaded.

The images used here were acquired as part of a study on autophagy, a cellular process during which cells recycle their components in autophagosomes. The study tracked genetic determinants of autophagy through fluorescence microscopy of human U2OS cells.

# pip install 'lamindb[jupyter,bionty]'
!lamin init --storage ./test-sc-imaging --modules bionty
 initialized lamindb: testuser1/test-sc-imaging
import lamindb as ln
import bionty as bt
from tifffile import imread
import matplotlib.pyplot as plt

ln.track()
 connected lamindb: testuser1/test-sc-imaging
 created Transform('zoWuUQp9TxFF0000'), started new Run('2gf1XFsb...') at 2025-07-08 11:00:10 UTC
 notebook imports: bionty==1.6.0 lamindb==1.7.1 matplotlib==3.10.3 tifffile==2025.6.11
 recommendation: to identify the notebook across renames, pass the uid: ln.track("zoWuUQp9TxFF")

All image metadata is stored in an already ingested .csv file on the scportrait/examples instance.

metadata_files = (
    ln.Artifact.using("scportrait/examples")
    .get(key="input_data_imaging_usecase/metadata_files.csv")
    .load()
)

metadata_files.head(2)
 transferred: Artifact(uid='jdjwcUB0w1QAHAYm0000'), Storage(uid='r7YUayXjktSb')
image_path genotype stimulation cell_line cell_line_clone channel FOV magnification microscope imaged structure resolution
0 input_data_imaging_usecase/images/Timepoint001... WT 14h Torin-1 U2OS U2OS lcklip-mNeon mCherryLC3B clone 1 Alexa488 FOV1 20X Opera Phenix LckLip-mNeon 0.597976
1 input_data_imaging_usecase/images/Timepoint001... WT 14h Torin-1 U2OS U2OS lcklip-mNeon mCherryLC3B clone 1 Alexa488 FOV2 20X Opera Phenix LckLip-mNeon 0.597976
metadata_files.apply(lambda col: col.unique())
image_path          [input_data_imaging_usecase/images/Timepoint00...
genotype                                                 [WT, EI24KO]
stimulation                                  [14h Torin-1, untreated]
cell_line                                                      [U2OS]
cell_line_clone     [U2OS lcklip-mNeon mCherryLC3B clone 1, U2OS l...
channel                                     [Alexa488, DAPI, mCherry]
FOV                                                      [FOV1, FOV2]
magnification                                                   [20X]
microscope                                             [Opera Phenix]
imaged structure                    [LckLip-mNeon, DNA, mCherry-LC3B]
resolution                                              [0.597976081]
dtype: object
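
The `resolution` value can be used to convert pixel measurements into physical units. A minimal sketch, assuming the value is expressed in µm per pixel (the schema defined later in this notebook describes it only as a "conversion factor for px to µm"); the helper names `px_to_um` and `px_area_to_um2` are hypothetical:

```python
# Assumption: the factor is micrometres per pixel, as the feature description suggests.
UM_PER_PX = 0.597976081  # the single unique value of the `resolution` column


def px_to_um(length_px: float, um_per_px: float = UM_PER_PX) -> float:
    """Convert a length in pixels to micrometres."""
    return length_px * um_per_px


def px_area_to_um2(area_px: float, um_per_px: float = UM_PER_PX) -> float:
    """Convert an area in square pixels to square micrometres."""
    return area_px * um_per_px**2


# Example: physical length of a 100 px line and area of a 10 x 10 px square
print(px_to_um(100))
print(px_area_to_um2(10 * 10))
```

Areas scale with the square of the factor, which is why the two conversions are kept as separate helpers.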

Curating artifacts

All images show U2OS cells and were acquired on an Opera Phenix microscope at 20X magnification.

Cells were imaged under two conditions, one of which induces autophagy:

  • Treated: Exposed to Torin-1 (a starvation-mimicking small molecule) for 14 hours

  • Control: Left untreated

The U2OS cells were genetically engineered with fluorescently tagged proteins to visualize the process of autophagosome formation:

  • LC3B -> Autophagosome marker (visible in mCherry channel)

  • LckLip -> Membrane-targeted fluorescent protein for cell boundary visualization (visible in Alexa488 channel)

  • Hoechst -> DNA stain for nucleus identification (visible in DAPI channel)

Each image contains three separate channels:

Channel   Imaged Structure   Fluorescent Marker
1         DNA                Hoechst (DAPI)
2         Autophagosomes     LC3B (mCherry)
3         Plasma Membrane    LckLip (Alexa488)

Two genotypes were analyzed:

  • WT (Wild-type cells)

  • EI24KO (EI24 gene knockout cells)

For each genotype, two different clonal cell lines were studied, with multiple fields of view (FOVs) captured per experimental condition.

All images are annotated with corresponding metadata to enable efficient querying and analysis.
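
The experimental design described above can be written out as a small table. The sketch below assumes a fully crossed design (every clone imaged under both conditions, in both FOVs, and in all three channels) and assumes that the clones with "EI24 KO" in their name carry the EI24KO genotype. It is built from the category values listed in the metadata summary and is an illustration, not the actual metadata file:

```python
from itertools import product

import pandas as pd

# Clone name -> genotype (assumed from the clone names)
clones = {
    "U2OS lcklip-mNeon mCherryLC3B clone 1": "WT",
    "U2OS lcklip-mNeon mCherryLC3B clone 2": "WT",
    "U2OS lcklip-mNeon mCherryLC3B EI24 KO clone 1": "EI24KO",
    "U2OS lcklip-mNeon mCherryLC3B EI24 KO clone 2": "EI24KO",
}
stimulations = ["14h Torin-1", "untreated"]
fovs = ["FOV1", "FOV2"]
channels = ["Alexa488", "DAPI", "mCherry"]

# One row per image under the fully crossed assumption
design = pd.DataFrame(
    [
        {
            "genotype": genotype,
            "cell_line_clone": clone,
            "stimulation": stimulation,
            "FOV": fov,
            "channel": channel,
        }
        for (clone, genotype), stimulation, fov, channel in product(
            clones.items(), stimulations, fovs, channels
        )
    ]
)

# 4 clones x 2 conditions x 2 FOVs x 3 channels
n_images = len(design)

# Example query: autophagosome channel of stimulated knockout cells
selection = design[
    (design["genotype"] == "EI24KO")
    & (design["stimulation"] == "14h Torin-1")
    & (design["channel"] == "mCherry")
]
print(n_images, len(selection))
```

Under these assumptions the design contains 48 images, which matches the number of artifacts transferred later in this notebook.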

Define a schema

We define a Schema to curate metadata.

ulabel_names = ["genotype", "stimulation", "cell_line_clone", "channel", 
                "FOV", "magnification", "microscope", "imaged structure"]

autophagy_imaging_schema = ln.Schema(
    name="Autophagy imaging schema",
    features=[
        *[ln.Feature(name=name, dtype=ln.ULabel.name).save() for name in ulabel_names],
        ln.Feature(name="image_path", dtype=str, description="image path").save(),
        ln.Feature(name="cell_line", dtype=bt.CellLine.name).save(),
        ln.Feature(name="resolution", dtype=float, description="conversion factor for px to µm").save(),
    ],
    coerce_dtype=True,
).save()
! you are trying to create a record with name='cell_line' but a record with similar name exists: 'cell_line_clone'. Did you mean to load it?

Curate the dataset

curator = ln.curators.DataFrameCurator(metadata_files, autophagy_imaging_schema)

try:
    curator.validate()
except ln.core.exceptions.ValidationError as e:
    print(e)
! 2 terms not validated in feature 'genotype': 'WT', 'EI24KO'
    → fix typos, remove non-existent values, or save terms via: curator.cat.add_new_from('genotype')
! 2 terms not validated in feature 'stimulation': '14h Torin-1', 'untreated'
    → fix typos, remove non-existent values, or save terms via: curator.cat.add_new_from('stimulation')
! 4 terms not validated in feature 'cell_line_clone': 'U2OS lcklip-mNeon mCherryLC3B clone 1', 'U2OS lcklip-mNeon mCherryLC3B clone 2', 'U2OS lcklip-mNeon mCherryLC3B EI24 KO clone 1', 'U2OS lcklip-mNeon mCherryLC3B EI24 KO clone 2'
    → fix typos, remove non-existent values, or save terms via: curator.cat.add_new_from('cell_line_clone')
! 3 terms not validated in feature 'channel': 'Alexa488', 'DAPI', 'mCherry'
    → fix typos, remove non-existent values, or save terms via: curator.cat.add_new_from('channel')
! 2 terms not validated in feature 'FOV': 'FOV1', 'FOV2'
    → fix typos, remove non-existent values, or save terms via: curator.cat.add_new_from('FOV')
! 1 term not validated in feature 'magnification': '20X'
    → fix typos, remove non-existent values, or save terms via: curator.cat.add_new_from('magnification')
! 1 term not validated in feature 'microscope': 'Opera Phenix'
    → fix typos, remove non-existent values, or save terms via: curator.cat.add_new_from('microscope')
! 3 terms not validated in feature 'imaged structure': 'LckLip-mNeon', 'DNA', 'mCherry-LC3B'
    → fix typos, remove non-existent values, or save terms via: curator.cat.add_new_from('imaged structure')
! 1 term not validated in feature 'cell_line': 'U2OS'
    1 synonym found: "U2OS" → "U-2 OS cell"
    → curate synonyms via: .standardize("cell_line")
1 term not validated in feature 'cell_line': 'U2OS'
    1 synonym found: "U2OS" → "U-2 OS cell"
    → curate synonyms via: .standardize("cell_line")

Add and standardize missing terms:

curator.cat.standardize("cell_line")

for key in curator.cat.non_validated.keys():
    curator.cat.add_new_from(key)

curator.validate()
! you are trying to create a record with name='mCherry' but records with similar names exist: 'U2OS lcklip-mNeon mCherryLC3B clone 1', 'U2OS lcklip-mNeon mCherryLC3B clone 2', 'U2OS lcklip-mNeon mCherryLC3B EI24 KO clone 1'. Did you mean to load one of them?
! you are trying to create a record with name='LckLip-mNeon' but records with similar names exist: 'U2OS lcklip-mNeon mCherryLC3B clone 1', 'U2OS lcklip-mNeon mCherryLC3B clone 2', 'U2OS lcklip-mNeon mCherryLC3B EI24 KO clone 1'. Did you mean to load one of them?
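
Conceptually, the two steps above map known synonyms onto canonical names and then register any remaining values as new terms. The following is a simplified, plain-Python illustration of that idea and not how lamindb implements it; the `registries` and `synonyms` dictionaries and both helper functions are hypothetical:

```python
# Hypothetical in-memory stand-ins for the registries behind each feature
registries = {
    "cell_line": {"U-2 OS cell"},  # a public ontology term that already exists
    "genotype": set(),             # custom labels, none registered yet
}
synonyms = {"cell_line": {"U2OS": "U-2 OS cell"}}


def standardize(feature, values):
    """Replace known synonyms with their canonical name."""
    mapping = synonyms.get(feature, {})
    return [mapping.get(value, value) for value in values]


def non_validated(feature, values):
    """Return the values that are not present in the feature's registry."""
    return sorted(set(values) - registries[feature])


# "U2OS" fails validation until it is standardized to its canonical name
print(non_validated("cell_line", ["U2OS", "U2OS"]))
cell_lines = standardize("cell_line", ["U2OS", "U2OS"])
missing_cell_lines = non_validated("cell_line", cell_lines)

# Custom labels have no synonym to fall back on, so they are added as new terms
genotypes = ["WT", "EI24KO", "WT"]
missing_genotypes = non_validated("genotype", genotypes)
registries["genotype"].update(missing_genotypes)
remaining = non_validated("genotype", genotypes)
print(missing_cell_lines, missing_genotypes, remaining)
```

Standardizing before adding new terms avoids registering a synonym such as "U2OS" as a separate term next to its canonical record.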

Annotate images with metadata

We add images to our lamindb instance and annotate them with their metadata.

# Create study feature and associated label
ln.Feature(name="study", dtype=ln.ULabel).save()
ln.ULabel(name="autophagy imaging").save()

artifacts = []

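# Metadata columns that are attached to each image as feature values
feature_columns = ["genotype", "stimulation", "cell_line_clone", "channel",
                   "FOV", "magnification", "microscope", "imaged structure", "resolution"]

for _, row in metadata_files.iterrows():
    # Look up the image on the source instance; saving it transfers the record to the current instance
    artifact = ln.Artifact.using("scportrait/examples").filter(key__icontains=row["image_path"]).one()
    artifact.save()

    # Link the matching bionty CellLine record
    artifact.cell_lines.add(bt.CellLine.filter(name=row.cell_line).one())

    # Annotate with the metadata columns plus the study label
    artifact.features.add_values(
        {col: row[col] for col in feature_columns} | {"study": "autophagy imaging"}
    )

    artifacts.append(artifact)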
 transferred: Artifact(uid='Md4OouMExlWS2YfZ0000')
 transferred: Artifact(uid='KKbVRkOjQ1jdA2fx0000')
 transferred: Artifact(uid='CiQYTBNZrj0CPejK0000')
 transferred: Artifact(uid='W6tzE7JNiM80Ruho0000')
 transferred: Artifact(uid='YGiNq6DPfIEjtt9j0000')
 transferred: Artifact(uid='uuh41FAHEz0ASL2N0000')
 transferred: Artifact(uid='Gtwi9Pcyx8maQEWB0000')
 transferred: Artifact(uid='nSZhAypqiNZ2Ylbe0000')
 transferred: Artifact(uid='sHHpiiFYWsIXMZNV0000')
 transferred: Artifact(uid='jzqxtoduIJ3hCbB40000')
 transferred: Artifact(uid='hNMkrIHce1XrLZHY0000')
 transferred: Artifact(uid='M06liaIzh2OVEuJ40000')
 transferred: Artifact(uid='yiIMSAddDWgLgki70000')
 transferred: Artifact(uid='2ie2Kjzn1O7UYhuq0000')
 transferred: Artifact(uid='DEzw4QQAsjVZ010b0000')
 transferred: Artifact(uid='fOQSb7JCK67aeN6a0000')
 transferred: Artifact(uid='hbVyCGFARHU91Kax0000')
 transferred: Artifact(uid='qHQpdWcFu7FzF6l50000')
 transferred: Artifact(uid='mXWwV1x42Jz9RoSO0000')
 transferred: Artifact(uid='Ov8FnKzHMNY0XVJa0000')
 transferred: Artifact(uid='gj0HHnoVpEqbaUJb0000')
 transferred: Artifact(uid='Lwk8shsYe0V5bMgd0000')
 transferred: Artifact(uid='1XnEyqVt6UGXCTmV0000')
 transferred: Artifact(uid='YuyVn060M4FxATPz0000')
 transferred: Artifact(uid='jVytS8AyAHmHkYR30000')
 transferred: Artifact(uid='vCVbKkzz4CnJPPKF0000')
 transferred: Artifact(uid='cw4F6bUB9zuMthCY0000')
 transferred: Artifact(uid='5kMhlcDNek4RMeQF0000')
 transferred: Artifact(uid='7TZGXvbA0JLL68hR0000')
 transferred: Artifact(uid='9BmbViqMmlVhpfS00000')
 transferred: Artifact(uid='ThuJnRAhqkp54kyU0000')
 transferred: Artifact(uid='h4EKWveW36LIzXez0000')
 transferred: Artifact(uid='PquYNyshQTDd24Vw0000')
 transferred: Artifact(uid='fdem35nw5ztUnEIM0000')
 transferred: Artifact(uid='8SFUmW0RhBNySxBO0000')
 transferred: Artifact(uid='VkmKLUCaMsYFCuGE0000')
 transferred: Artifact(uid='OS0wBE7bviIlW7qj0000')
 transferred: Artifact(uid='9ZVngbl0JUS0XdZ70000')
 transferred: Artifact(uid='ixOpuSTsyrPXdYuA0000')
 transferred: Artifact(uid='IzP3IAwIhmM7OORD0000')
 transferred: Artifact(uid='6uMjKAk1aYlAV7Cf0000')
 transferred: Artifact(uid='RRVS8qVx3VSw02Xu0000')
 transferred: Artifact(uid='QDPX1ljp0eCMz80o0000')
 transferred: Artifact(uid='Oww4y0yYuR8pxV9q0000')
 transferred: Artifact(uid='Cvamog4G3a2XYGM80000')
 transferred: Artifact(uid='AhBvnNKg5yJcG6LU0000')
 transferred: Artifact(uid='6uUjyphUD4D1Hixc0000')
 transferred: Artifact(uid='AVRTVX9gEu4LrTAP0000')
artifacts[0].describe()
Artifact .tif
├── General
│   ├── uid: Md4OouMExlWS2YfZ0000          hash: 0aoXxT857VvKAGo9UQo-8g
│   ├── size: 2.2 MB                       space: all
│   ├── branch: main                       created_at: 2025-03-07 13:51:25
│   ├── created_by: testuser1 (Test User1)
│   ├── key: input_data_imaging_usecase/images/Timepoint001_Row01_Well01_Alexa488_zstack001_r003_c005.tif
│   ├── storage location / path: s3://lamin-eu-central-1/r7YUayXjktSb/.lamindb/Md4OouMExlWS2YfZ0000.tif
│   ├── description: raw image of U2OS cells stained for autophagy markers
│   └── transform: __lamindb_transfer__/5WAovnZ5l7ij
├── Linked features
│   └── FOV                             cat[ULabel]                        FOV1                                    
│       cell_line_clone                 cat[ULabel]                        U2OS lcklip-mNeon mCherryLC3B clone 1
│       channel                         cat[ULabel]                        Alexa488
│       genotype                        cat[ULabel]                        WT
│       imaged structure                cat[ULabel]                        LckLip-mNeon
│       magnification                   cat[ULabel]                        20X
│       microscope                      cat[ULabel]                        Opera Phenix
│       stimulation                     cat[ULabel]                        14h Torin-1
│       study                           cat[ULabel]                        autophagy imaging
│       resolution                      float                              0.597976081
└── Labels
    └── .cell_lines                     bionty.CellLine                    U-2 OS cell                             
        .ulabels                        ULabel                             WT, 14h Torin-1, U2OS lcklip-mNeon mChe…

In addition, we create a Collection to hold all Artifact records that belong to this specific imaging study.

collection = ln.Collection(
    artifacts,
    key="Annotated autophagy imaging raw images",
    description="annotated microscopy images of cells stained for autophagy markers",
).save()

Let’s look at some example images. Sorting the metadata by clone, stimulation condition, and FOV places the three channels of the same field of view next to each other, so that the plotted channels belong together.

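def plot_example_images(df, n_images=3, title_prefix=""):
    """Plot the first `n_images` rows of `df`, one panel per image."""
    fig, axs = plt.subplots(1, n_images, figsize=(15, 5))
    if n_images == 1:
        axs = [axs]  # plt.subplots returns a single Axes when only one panel is requested
    # Pair axes with rows by position so the function does not depend on the DataFrame index
    for ax, (_, row) in zip(axs, df.head(n_images).iterrows()):
        # Fetch the image from the source instance (or reuse the cached copy) and read it
        path = ln.Artifact.using("scportrait/examples").get(key=row["image_path"]).cache()
        image = imread(path)
        ax.imshow(image)
        ax.set_title(f"{title_prefix}{row['imaged structure']}")
        ax.axis("off")
    return fig, axs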

sorted_metadata = metadata_files.sort_values(by=["cell_line_clone", "stimulation", "FOV"])

# Plot first 3 and last 3
plot_example_images(sorted_metadata.head(3).reset_index(drop=True));
plot_example_images(sorted_metadata.tail(3).reset_index(drop=True));
 mapped: Artifact(uid='jVytS8AyAHmHkYR30000')
 mapped: Artifact(uid='cw4F6bUB9zuMthCY0000')
 mapped: Artifact(uid='7TZGXvbA0JLL68hR0000')
 mapped: Artifact(uid='Ov8FnKzHMNY0XVJa0000')
 mapped: Artifact(uid='Lwk8shsYe0V5bMgd0000')
 mapped: Artifact(uid='YuyVn060M4FxATPz0000')
[Figures: two rows of three example images, one for the first three and one for the last three rows of the sorted metadata; each panel is titled with its imaged structure.]
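
Once the three channels of a field of view are matched, they can be overlaid as a single RGB image to check their alignment visually. A minimal sketch using synthetic arrays in place of the TIFF files; the function names, the percentile-based normalization, and the channel-to-color assignment are illustrative choices and not part of the original workflow:

```python
import numpy as np


def normalize(channel, low=1.0, high=99.0):
    """Rescale an image to [0, 1], clipping at the given percentiles."""
    channel = channel.astype(np.float64)
    lower, upper = np.percentile(channel, [low, high])
    if upper <= lower:
        # Constant image: nothing to rescale
        return np.zeros_like(channel)
    return np.clip((channel - lower) / (upper - lower), 0.0, 1.0)


def to_rgb_composite(mcherry, alexa488, dapi):
    """Stack three single-channel images as red, green, and blue."""
    return np.stack(
        [normalize(mcherry), normalize(alexa488), normalize(dapi)], axis=-1
    )


# Synthetic 16-bit images standing in for the three channels of one FOV
rng = np.random.default_rng(0)
shape = (64, 64)
dapi, alexa488, mcherry = (
    rng.integers(0, 65535, size=shape, dtype=np.uint16) for _ in range(3)
)

composite = to_rgb_composite(mcherry, alexa488, dapi)
print(composite.shape, composite.dtype)
```

Because the values lie in [0, 1], the composite can be passed directly to `plt.imshow`. With real data, the three arrays would come from `imread` on the cached paths of the matching artifacts.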
ln.finish()
 finished Run('2gf1XFsb') after 34s at 2025-07-08 11:00:45 UTC