imaging4/4 Jupyter Notebook

Identify autophagy-positive cells

The goal of this notebook is to train a machine learning model to distininguish between autophagy-positive and autophagy-negative cells based on pre-calculated image features.

We have defined our classes as follows:

Class 0 : unstimulated WT cells
Class 1 : 14h Torin-1 stimulated WT cells

After training and evaluating our model, we want to compare cells without a functional EI24 gene (EI24KO cells) to WT cells.

import lamindb as ln
import matplotlib.pyplot as plt
import scportrait
import seaborn as sns
import numpy as np

from anndata import concat
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, roc_curve, auc

ln.track()
Hide code cell output
 connected lamindb: testuser1/test-sc-imaging
 created Transform('OCpnLJujls9F0000'), started new Run('aJLqJvFT...') at 2025-03-31 12:54:20 UTC
 notebook imports: anndata==0.11.3 lamindb==1.3.1 matplotlib==3.10.1 numpy==1.26.4 scikit-learn==1.6.1 scportrait==1.3.3 seaborn==0.13.2
def _get_cells(dataframe):
    """Extract and concatenate single-cell data for cells specified in the input dataframe."""
    sc_data = None
    for uid in dataframe.dataset.unique():
        _selected_cells = dataframe[dataframe.dataset == uid].copy()
        _sc_data = scportrait.io.read_h5sc(
            ln.Artifact.using("scportrait/examples").get(uid).cache()
        )
        _sc_data = _sc_data[
            _sc_data.obs.scportrait_cell_id.isin(_selected_cells.cell_id.values)
        ].copy()
        _sc_data.obs["score"] = _selected_cells.prob_class1.values

        if sc_data is None:
            sc_data = _sc_data
        else:
            sc_data = concat([sc_data, _sc_data], uns_merge="first", index_unique="-")
            sc_data.obs.reset_index(inplace=True, drop=True)
            sc_data.obs.index = sc_data.obs.index.values.astype(str)
    return sc_data
# Define parameters for our RandomForest Classifier
ln.Param(name="random_state", dtype="int").save()
ln.Param(name="n_estimators", dtype="int").save()
ln.Param(name="max_depth", dtype="int").save()
ln.Param(name="min_samples_split", dtype="int").save()
ln.Param(name="min_samples_leaf", dtype="int").save()
ln.Param(name="max_features", dtype="str").save()
ln.Param(name="criterion", dtype="str").save()
ln.Param(name="bootstrap", dtype="bool").save()

# Define parameter values
rfc_params = {
    "random_state": 42,
    "n_estimators": 100,
    "max_depth": 10,
    "min_samples_split": 2,
    "min_samples_leaf": 1,
    "max_features": "sqrt",
    "criterion": "gini",
    "bootstrap": True,
}

ln.track(params=rfc_params)
Hide code cell output
! record with similar name exists! did you mean to load it?
name dtype is_type _expect_many space_id type_id run_id created_at created_by_id _aux _branch_code
id
5 min_samples_split int None False 1 None 135 2025-03-31 12:54:20.828000+00:00 1 None 1
 loaded Transform('OCpnLJujls9F0000'), re-started Run('aJLqJvFT...') at 2025-03-31 12:54:20 UTC
→ params: random_state=42, n_estimators=100, max_depth=10, min_samples_split=2, min_samples_leaf=1, max_features=sqrt, criterion=gini, bootstrap=True
 notebook imports: anndata==0.11.3 lamindb==1.3.1 matplotlib==3.10.1 numpy==1.26.4 scikit-learn==1.6.1 scportrait==1.3.3 seaborn==0.13.2
study = ln.ULabel.using("scportrait/examples").get(name="autophagy imaging")

sc_datasets = (
    ln.Artifact.using("scportrait/examples")
    .filter(ulabels=study)
    .filter(ulabels__name="scportrait single-cell images")
)
featurized_datasets = (
    ln.Artifact.using("scportrait/examples")
    .filter(ulabels=study)
    .filter(description="featurized single-cell images", is_latest=True)
)

class_lookup = {0: "untreated", 1: "14h Torin-1"}
label_lookup = {v: k for k, v in class_lookup.items()}

First, lets look at some example images from our two classes. As we can see the cells look very distinct to one another. Hopefully our ML model will be able to seperate them as well.

# get example images for positive and negative autophagy
num_rows, num_cols = 4, 4
n_cells = num_rows * num_cols
channel_of_interest = 4  # LC3B channel a key autophagosome marker

autophagy_positive_example = scportrait.io.read_h5sc(
    sc_datasets.filter(ulabels__name="WT")
    .filter(ulabels__name="14h Torin-1")[0]
    .cache()
)
autophagy_negative_example = scportrait.io.read_h5sc(
    sc_datasets.filter(ulabels__name="WT").filter(ulabels__name="untreated")[0].cache()
)

# create a figure with two panels for negative and postive examples
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(15, 7))
scportrait.pl.cell_grid_single_channel(
    autophagy_negative_example,
    select_channel=channel_of_interest,
    ax=axes[0],
    title="Autophagy negative cells (LC3B distribution)",
    show_fig=False,
)
scportrait.pl.cell_grid_single_channel(
    autophagy_positive_example,
    select_channel=channel_of_interest,
    ax=axes[1],
    title="Autophagy postive cells (LC3B distribution)",
    show_fig=False,
)
Hide code cell output
 completing transfer to track Artifact('zGFV103h') as input
 returning existing schema with same hash: Schema(uid='3Ajy5VY7H6PVoAhsFLfn', name='single-cell image dataset schema obs', n=1, itype='Feature', is_type=False, hash='0Wb4Qaes_RIx5s4g5SWeDA', minimal_set=True, ordered_set=False, maximal_set=False, space_id=1, created_by_id=1, run_id=3, created_at=2025-03-31 12:47:28 UTC)
 mapped records: Artifact(uid='zGFV103h7KW1AbmE0000'), ULabel(uid='9hHptuyb'), ULabel(uid='A2945i5P'), ULabel(uid='KH1brNUW'), ULabel(uid='CrR7fgIZ'), ULabel(uid='QrU6fxsG'), ULabel(uid='xhpmj7p7'), ULabel(uid='xHqZKcIG'), ULabel(uid='JWE2jNdk'), ULabel(uid='PKiCEP1h'), ULabel(uid='HRRTqARL'), ULabel(uid='e82fx2wm')
 transferred records: 
 completing transfer to track Artifact('iuuMnf7x') as input
 returning existing schema with same hash: Schema(uid='3Ajy5VY7H6PVoAhsFLfn', name='single-cell image dataset schema obs', n=1, itype='Feature', is_type=False, hash='0Wb4Qaes_RIx5s4g5SWeDA', minimal_set=True, ordered_set=False, maximal_set=False, space_id=1, created_by_id=1, run_id=3, created_at=2025-03-31 12:47:28 UTC)
 mapped records: Artifact(uid='iuuMnf7xC4wYmkv80000'), ULabel(uid='9hHptuyb'), ULabel(uid='EgihgpJC'), ULabel(uid='KH1brNUW'), ULabel(uid='CrR7fgIZ'), ULabel(uid='QrU6fxsG'), ULabel(uid='xhpmj7p7'), ULabel(uid='xHqZKcIG'), ULabel(uid='JWE2jNdk'), ULabel(uid='PKiCEP1h'), ULabel(uid='HRRTqARL'), ULabel(uid='e82fx2wm')
 transferred records: 
_images/98b2f531cd3ca416454fb51af0f8dfb793445ecd919eba68ea7351c8effa41ff.png

Train ML model

# load data
wt_cells_afs = (
    featurized_datasets.filter(ulabels__name="WT", is_latest=True).distinct().one()
)
features_wt = wt_cells_afs.load()

ko_cells_afs = (
    featurized_datasets.filter(ulabels__name="EI24KO", is_latest=True).distinct().one()
)
features_ko = ko_cells_afs.load()
Hide code cell output
 completing transfer to track Artifact('JE2LqNnN') as input
 mapped records: CellLine(uid='4oLoqaUP'), ULabel(uid='9hHptuyb'), ULabel(uid='PKiCEP1h')
 transferred records: Artifact(uid='JE2LqNnNFZf1fhYz0003')
 completing transfer to track Artifact('M9qK19vd') as input
 mapped records: CellLine(uid='4oLoqaUP'), ULabel(uid='Aj8KGwbh'), ULabel(uid='PKiCEP1h')
 transferred records: Artifact(uid='M9qK19vdRPprxvA40003')
# split data into training and testing
data_train, data_test = train_test_split(features_wt, test_size=0.4, random_state=42)

# prepare data for model training by removing columns we don't want to train on
_data_train = data_train.drop(columns=["dataset", "cell_id"])
_data_train = _data_train.drop(
    columns=[col for col in data_train.columns if "mCherry" in col]
)  # subset to only include features from our channel of interest

_data_test = data_test.drop(columns=["dataset", "cell_id"])
_data_test = _data_test.drop(
    columns=[col for col in data_test.columns if "mCherry" in col]
)  # subset to only include features from our channel of interest

# Separate features and target
X_train = _data_train.drop("class", axis=1)
y_train = _data_train["class"]

X_test = _data_test.drop("class", axis=1)
y_test = _data_test["class"]

# Train model
clf = RandomForestClassifier(**rfc_params)
clf.fit(X_train, y_train)

# Make predictions
y_pred = clf.predict(X_test)
y_scores = clf.predict_proba(X_test)

data_test["predicted_class"] = y_pred
data_test["prob_class1"] = y_scores[:, 1]
# compute confusion matrix
cm = confusion_matrix(y_test, y_pred)
cm_normalized = cm.astype("float") / cm.sum(axis=1)[:, np.newaxis]
labels = np.unique(y_test)
labels = [class_lookup[label] for label in labels]

# Compute ROC curve
fpr, tpr, _ = roc_curve(y_test, y_scores[:, 1])
roc_auc = auc(fpr, tpr)

# Plot the confusion matrix and ROC Curve
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(12, 4))
sns.heatmap(
    cm_normalized,
    annot=True,
    fmt=".2f",
    cmap="Blues",
    xticklabels=labels,
    yticklabels=labels,
    ax=axes[1],
    cbar=False,
)
axes[1].set_xlabel("Predicted Label")
axes[1].set_ylabel("True Label")
axes[1].set_title("Confusion Matrix [% of true label]")

axes[0].plot(fpr, tpr, color="blue", lw=2, label=f"ROC curve (AUC = {roc_auc:.2f})")
axes[0].plot([0, 1], [0, 1], color="gray", linestyle="--")  # Diagonal line
axes[0].set_xlim([0.0, 1.0])
axes[0].set_ylim([0.0, 1.0])
axes[0].set_xlabel("False Positive Rate")
axes[0].set_ylabel("True Positive Rate")
axes[0].set_title("ROC Curve")
axes[0].legend(loc="lower right", frameon=False)
fig.tight_layout()
_images/4e21136de5932e21383194bd2e4c57ecbdd691aac4688c995cf4d7dd11a9a304.png

Considering that our model was only trained on very few input images, it performs well with an AUC of 0.77.

However, since the AUC is < 1, our model is still appears to make some mistakes. We can further investigate this through a confusion matrix. Our classifier is performing well at recognizing autophagy negative cells, correctly identifying 92% of the examples in our dataset. Unfortunately, the classifier is not yet very good at correctly identifying autophagy-postive cells. Less than 50% of these examples are classified correctly.

Let’s visualize some of the cells from each class to see if we can gain some insights into what our model is doing.

# visualize some example cells with classification scores from the test dataset
num_rows, num_cols = 3, 3
n_cells = num_rows * num_cols
channel_of_interest = 4  # LC3B channel a key autophagosome marker

# annotate dataset with TP, TN, FN, FP
data_test["FP"] = (data_test["class"] == 0) & (data_test["predicted_class"] == 1)
data_test["FN"] = (data_test["class"] == 1) & (data_test["predicted_class"] == 0)
data_test["TP"] = (data_test["class"] == 1) & (data_test["predicted_class"] == 1)
data_test["TN"] = (data_test["class"] == 0) & (data_test["predicted_class"] == 0)

# get example cells for each class
cells_TP = (
    data_test[data_test.TP].sort_values("prob_class1", ascending=False).head(n_cells)
)
cells_TN = (
    data_test[data_test.TN].sort_values("prob_class1", ascending=False).tail(n_cells)
)
cells_FN = (
    data_test[data_test.FN].sort_values("prob_class1", ascending=False).tail(n_cells)
)
cells_FP = (
    data_test[data_test.FP].sort_values("prob_class1", ascending=False).head(n_cells)
)

cell_sets = {
    "TP": _get_cells(cells_TP),
    "TN": _get_cells(cells_TN),
    "FN": _get_cells(cells_FN),
    "FP": _get_cells(cells_FP),
}

# make the plot
n_panel_rows = 2
n_panel_cols = 2
fig, axes = plt.subplots(nrows=n_panel_rows, ncols=n_panel_cols, figsize=(13, 13))

for j in range(n_panel_rows):
    for i in range(n_panel_cols):
        ax = axes[j, i]
        title, cells = cell_sets.popitem()
        scportrait.pl.cell_grid_single_channel(
            cells,
            cell_ids=cells.obs.scportrait_cell_id,
            cell_labels=cells.obs.score.round(2).values,
            select_channel=channel_of_interest,
            ax=ax,
            title=f"{title} with Prob(stimulated)",
            show_fig=False,
        )
Hide code cell output
 completing transfer to track Artifact('89C8kQyV') as input
 returning existing schema with same hash: Schema(uid='3Ajy5VY7H6PVoAhsFLfn', name='single-cell image dataset schema obs', n=1, itype='Feature', is_type=False, hash='0Wb4Qaes_RIx5s4g5SWeDA', minimal_set=True, ordered_set=False, maximal_set=False, space_id=1, created_by_id=1, run_id=3, created_at=2025-03-31 12:47:28 UTC)
 mapped records: Artifact(uid='89C8kQyV4Kjzj4SB0000'), ULabel(uid='9hHptuyb'), ULabel(uid='A2945i5P'), ULabel(uid='KH1brNUW'), ULabel(uid='1KPobNqK'), ULabel(uid='QrU6fxsG'), ULabel(uid='xhpmj7p7'), ULabel(uid='xHqZKcIG'), ULabel(uid='JWE2jNdk'), ULabel(uid='PKiCEP1h'), ULabel(uid='HRRTqARL'), ULabel(uid='e82fx2wm')
 transferred records: 
 completing transfer to track Artifact('zGFV103h') as input
 returning existing schema with same hash: Schema(uid='3Ajy5VY7H6PVoAhsFLfn', name='single-cell image dataset schema obs', n=1, itype='Feature', is_type=False, hash='0Wb4Qaes_RIx5s4g5SWeDA', minimal_set=True, ordered_set=False, maximal_set=False, space_id=1, created_by_id=1, run_id=3, created_at=2025-03-31 12:47:28 UTC)
 mapped records: Artifact(uid='zGFV103h7KW1AbmE0000'), ULabel(uid='9hHptuyb'), ULabel(uid='A2945i5P'), ULabel(uid='KH1brNUW'), ULabel(uid='CrR7fgIZ'), ULabel(uid='QrU6fxsG'), ULabel(uid='xhpmj7p7'), ULabel(uid='xHqZKcIG'), ULabel(uid='JWE2jNdk'), ULabel(uid='PKiCEP1h'), ULabel(uid='HRRTqARL'), ULabel(uid='e82fx2wm')
 transferred records: 
 completing transfer to track Artifact('p8J4ly0v') as input
 returning existing schema with same hash: Schema(uid='3Ajy5VY7H6PVoAhsFLfn', name='single-cell image dataset schema obs', n=1, itype='Feature', is_type=False, hash='0Wb4Qaes_RIx5s4g5SWeDA', minimal_set=True, ordered_set=False, maximal_set=False, space_id=1, created_by_id=1, run_id=3, created_at=2025-03-31 12:47:28 UTC)
 mapped records: Artifact(uid='p8J4ly0vv0QjuPEe0000'), ULabel(uid='9hHptuyb'), ULabel(uid='A2945i5P'), ULabel(uid='GadMJgvN'), ULabel(uid='1KPobNqK'), ULabel(uid='QrU6fxsG'), ULabel(uid='xhpmj7p7'), ULabel(uid='xHqZKcIG'), ULabel(uid='JWE2jNdk'), ULabel(uid='PKiCEP1h'), ULabel(uid='HRRTqARL'), ULabel(uid='e82fx2wm')
 transferred records: 
 completing transfer to track Artifact('1GKwxrAp') as input
 returning existing schema with same hash: Schema(uid='3Ajy5VY7H6PVoAhsFLfn', name='single-cell image dataset schema obs', n=1, itype='Feature', is_type=False, hash='0Wb4Qaes_RIx5s4g5SWeDA', minimal_set=True, ordered_set=False, maximal_set=False, space_id=1, created_by_id=1, run_id=3, created_at=2025-03-31 12:47:28 UTC)
 mapped records: Artifact(uid='1GKwxrAp7XJmAqpt0000'), ULabel(uid='9hHptuyb'), ULabel(uid='A2945i5P'), ULabel(uid='GadMJgvN'), ULabel(uid='CrR7fgIZ'), ULabel(uid='QrU6fxsG'), ULabel(uid='xhpmj7p7'), ULabel(uid='xHqZKcIG'), ULabel(uid='JWE2jNdk'), ULabel(uid='PKiCEP1h'), ULabel(uid='HRRTqARL'), ULabel(uid='e82fx2wm')
 transferred records: 
 completing transfer to track Artifact('iuuMnf7x') as input
 returning existing schema with same hash: Schema(uid='3Ajy5VY7H6PVoAhsFLfn', name='single-cell image dataset schema obs', n=1, itype='Feature', is_type=False, hash='0Wb4Qaes_RIx5s4g5SWeDA', minimal_set=True, ordered_set=False, maximal_set=False, space_id=1, created_by_id=1, run_id=3, created_at=2025-03-31 12:47:28 UTC)
 mapped records: Artifact(uid='iuuMnf7xC4wYmkv80000'), ULabel(uid='9hHptuyb'), ULabel(uid='EgihgpJC'), ULabel(uid='KH1brNUW'), ULabel(uid='CrR7fgIZ'), ULabel(uid='QrU6fxsG'), ULabel(uid='xhpmj7p7'), ULabel(uid='xHqZKcIG'), ULabel(uid='JWE2jNdk'), ULabel(uid='PKiCEP1h'), ULabel(uid='HRRTqARL'), ULabel(uid='e82fx2wm')
 transferred records: 
 completing transfer to track Artifact('uJ9W0phl') as input
 returning existing schema with same hash: Schema(uid='3Ajy5VY7H6PVoAhsFLfn', name='single-cell image dataset schema obs', n=1, itype='Feature', is_type=False, hash='0Wb4Qaes_RIx5s4g5SWeDA', minimal_set=True, ordered_set=False, maximal_set=False, space_id=1, created_by_id=1, run_id=3, created_at=2025-03-31 12:47:28 UTC)
 mapped records: Artifact(uid='uJ9W0phl9z0QhFOY0000'), ULabel(uid='9hHptuyb'), ULabel(uid='EgihgpJC'), ULabel(uid='GadMJgvN'), ULabel(uid='1KPobNqK'), ULabel(uid='QrU6fxsG'), ULabel(uid='xhpmj7p7'), ULabel(uid='xHqZKcIG'), ULabel(uid='JWE2jNdk'), ULabel(uid='PKiCEP1h'), ULabel(uid='HRRTqARL'), ULabel(uid='e82fx2wm')
 transferred records: 
 completing transfer to track Artifact('uTNKe0Um') as input
 returning existing schema with same hash: Schema(uid='3Ajy5VY7H6PVoAhsFLfn', name='single-cell image dataset schema obs', n=1, itype='Feature', is_type=False, hash='0Wb4Qaes_RIx5s4g5SWeDA', minimal_set=True, ordered_set=False, maximal_set=False, space_id=1, created_by_id=1, run_id=3, created_at=2025-03-31 12:47:28 UTC)
 mapped records: Artifact(uid='uTNKe0UmY5IOowhC0000'), ULabel(uid='9hHptuyb'), ULabel(uid='EgihgpJC'), ULabel(uid='GadMJgvN'), ULabel(uid='CrR7fgIZ'), ULabel(uid='QrU6fxsG'), ULabel(uid='xhpmj7p7'), ULabel(uid='xHqZKcIG'), ULabel(uid='JWE2jNdk'), ULabel(uid='PKiCEP1h'), ULabel(uid='HRRTqARL'), ULabel(uid='e82fx2wm')
 transferred records: 
 completing transfer to track Artifact('9m0dxLtx') as input
 returning existing schema with same hash: Schema(uid='3Ajy5VY7H6PVoAhsFLfn', name='single-cell image dataset schema obs', n=1, itype='Feature', is_type=False, hash='0Wb4Qaes_RIx5s4g5SWeDA', minimal_set=True, ordered_set=False, maximal_set=False, space_id=1, created_by_id=1, run_id=3, created_at=2025-03-31 12:47:28 UTC)
 mapped records: Artifact(uid='9m0dxLtxu35ludr70000'), ULabel(uid='9hHptuyb'), ULabel(uid='EgihgpJC'), ULabel(uid='KH1brNUW'), ULabel(uid='1KPobNqK'), ULabel(uid='QrU6fxsG'), ULabel(uid='xhpmj7p7'), ULabel(uid='xHqZKcIG'), ULabel(uid='JWE2jNdk'), ULabel(uid='PKiCEP1h'), ULabel(uid='HRRTqARL'), ULabel(uid='e82fx2wm')
 transferred records: 
 completing transfer to track Artifact('89C8kQyV') as input
 returning existing schema with same hash: Schema(uid='3Ajy5VY7H6PVoAhsFLfn', name='single-cell image dataset schema obs', n=1, itype='Feature', is_type=False, hash='0Wb4Qaes_RIx5s4g5SWeDA', minimal_set=True, ordered_set=False, maximal_set=False, space_id=1, created_by_id=1, run_id=3, created_at=2025-03-31 12:47:28 UTC)
 mapped records: Artifact(uid='89C8kQyV4Kjzj4SB0000'), ULabel(uid='9hHptuyb'), ULabel(uid='A2945i5P'), ULabel(uid='KH1brNUW'), ULabel(uid='1KPobNqK'), ULabel(uid='QrU6fxsG'), ULabel(uid='xhpmj7p7'), ULabel(uid='xHqZKcIG'), ULabel(uid='JWE2jNdk'), ULabel(uid='PKiCEP1h'), ULabel(uid='HRRTqARL'), ULabel(uid='e82fx2wm')
 transferred records: 
 completing transfer to track Artifact('zGFV103h') as input
 returning existing schema with same hash: Schema(uid='3Ajy5VY7H6PVoAhsFLfn', name='single-cell image dataset schema obs', n=1, itype='Feature', is_type=False, hash='0Wb4Qaes_RIx5s4g5SWeDA', minimal_set=True, ordered_set=False, maximal_set=False, space_id=1, created_by_id=1, run_id=3, created_at=2025-03-31 12:47:28 UTC)
 mapped records: Artifact(uid='zGFV103h7KW1AbmE0000'), ULabel(uid='9hHptuyb'), ULabel(uid='A2945i5P'), ULabel(uid='KH1brNUW'), ULabel(uid='CrR7fgIZ'), ULabel(uid='QrU6fxsG'), ULabel(uid='xhpmj7p7'), ULabel(uid='xHqZKcIG'), ULabel(uid='JWE2jNdk'), ULabel(uid='PKiCEP1h'), ULabel(uid='HRRTqARL'), ULabel(uid='e82fx2wm')
 transferred records: 
 completing transfer to track Artifact('p8J4ly0v') as input
 returning existing schema with same hash: Schema(uid='3Ajy5VY7H6PVoAhsFLfn', name='single-cell image dataset schema obs', n=1, itype='Feature', is_type=False, hash='0Wb4Qaes_RIx5s4g5SWeDA', minimal_set=True, ordered_set=False, maximal_set=False, space_id=1, created_by_id=1, run_id=3, created_at=2025-03-31 12:47:28 UTC)
 mapped records: Artifact(uid='p8J4ly0vv0QjuPEe0000'), ULabel(uid='9hHptuyb'), ULabel(uid='A2945i5P'), ULabel(uid='GadMJgvN'), ULabel(uid='1KPobNqK'), ULabel(uid='QrU6fxsG'), ULabel(uid='xhpmj7p7'), ULabel(uid='xHqZKcIG'), ULabel(uid='JWE2jNdk'), ULabel(uid='PKiCEP1h'), ULabel(uid='HRRTqARL'), ULabel(uid='e82fx2wm')
 transferred records: 
 completing transfer to track Artifact('uJ9W0phl') as input
 returning existing schema with same hash: Schema(uid='3Ajy5VY7H6PVoAhsFLfn', name='single-cell image dataset schema obs', n=1, itype='Feature', is_type=False, hash='0Wb4Qaes_RIx5s4g5SWeDA', minimal_set=True, ordered_set=False, maximal_set=False, space_id=1, created_by_id=1, run_id=3, created_at=2025-03-31 12:47:28 UTC)
 mapped records: Artifact(uid='uJ9W0phl9z0QhFOY0000'), ULabel(uid='9hHptuyb'), ULabel(uid='EgihgpJC'), ULabel(uid='GadMJgvN'), ULabel(uid='1KPobNqK'), ULabel(uid='QrU6fxsG'), ULabel(uid='xhpmj7p7'), ULabel(uid='xHqZKcIG'), ULabel(uid='JWE2jNdk'), ULabel(uid='PKiCEP1h'), ULabel(uid='HRRTqARL'), ULabel(uid='e82fx2wm')
 transferred records: 
 completing transfer to track Artifact('uTNKe0Um') as input
 returning existing schema with same hash: Schema(uid='3Ajy5VY7H6PVoAhsFLfn', name='single-cell image dataset schema obs', n=1, itype='Feature', is_type=False, hash='0Wb4Qaes_RIx5s4g5SWeDA', minimal_set=True, ordered_set=False, maximal_set=False, space_id=1, created_by_id=1, run_id=3, created_at=2025-03-31 12:47:28 UTC)
 mapped records: Artifact(uid='uTNKe0UmY5IOowhC0000'), ULabel(uid='9hHptuyb'), ULabel(uid='EgihgpJC'), ULabel(uid='GadMJgvN'), ULabel(uid='CrR7fgIZ'), ULabel(uid='QrU6fxsG'), ULabel(uid='xhpmj7p7'), ULabel(uid='xHqZKcIG'), ULabel(uid='JWE2jNdk'), ULabel(uid='PKiCEP1h'), ULabel(uid='HRRTqARL'), ULabel(uid='e82fx2wm')
 transferred records: 
 completing transfer to track Artifact('9m0dxLtx') as input
 returning existing schema with same hash: Schema(uid='3Ajy5VY7H6PVoAhsFLfn', name='single-cell image dataset schema obs', n=1, itype='Feature', is_type=False, hash='0Wb4Qaes_RIx5s4g5SWeDA', minimal_set=True, ordered_set=False, maximal_set=False, space_id=1, created_by_id=1, run_id=3, created_at=2025-03-31 12:47:28 UTC)
 mapped records: Artifact(uid='9m0dxLtxu35ludr70000'), ULabel(uid='9hHptuyb'), ULabel(uid='EgihgpJC'), ULabel(uid='KH1brNUW'), ULabel(uid='1KPobNqK'), ULabel(uid='QrU6fxsG'), ULabel(uid='xhpmj7p7'), ULabel(uid='xHqZKcIG'), ULabel(uid='JWE2jNdk'), ULabel(uid='PKiCEP1h'), ULabel(uid='HRRTqARL'), ULabel(uid='e82fx2wm')
 transferred records: 
_images/9dccade3f13d9aa80cde32095e71beb06bd5eba0a2635b1e36ebbb1c98f3d958.png

Looking at the FPs (i.e. cells that look stimulated despite not having been stimulated) we can see two different types of cells:

  1. Very small cells that do not show any visible autophagosomes

  2. Larger cells that clearly show some autophagosomes

The type 1 cells look like they are the result of our model making mistakes, but the type 2 cells look much more similar to TP cells than TN cells. Based on literature, we know that cells can under go spontaneous autophagy even in the absence of Torin-1, for example, as a result of nutrient scarcity. Since our class labelling is just based on cells having not been treated with Torin-1, we would be annotating these cells incorrectly. In these cases our model does not make a mistake, but has in fact uncovered mislabelled examples in the dataset.

Looking at the FNs (i.e. cells that the model identifies as having been unstimulated despite having been stimulated) the cells appear homogenous and look more comparable to the TP Population. So in this case it seems our model is making a mistake.

Before applying this model in a biological context, we would therefore probably need to invest some more time and effort into improving it. We could for example:

  1. Train our model on additional input data to make it more robust

  2. Perform a pre-screening of our training data to ensure we remove any incorrectly labelled cells

  3. Improve the image features we are training our model on to better reflect the biology we are interested in

If you are interested in finding out more you can check out the original paper where the authors trained a deep learning model to classify these cells with a much higher accuracy.

Investigate the EI24 KO cells

Now lets take a look at the EI24-deficient cells.

data_ko = features_ko.drop(columns=["dataset", "cell_id"])
data_ko = data_ko.drop(columns=[x for x in data_ko.columns if "mCherry" in x])
X_ko = data_ko.drop("class", axis=1)
y_true = data_ko["class"]
predictions_ko = clf.predict(X_ko)
# compute confusion matrix
cm = confusion_matrix(y_true, predictions_ko)
cm_normalized = cm.astype("float") / cm.sum(axis=1)[:, np.newaxis]
labels = np.unique(y_test)
labels = [class_lookup[label] for label in labels]

# Plot the confusion matrix and ROC Curve
fig, axes = plt.subplots(nrows=1, ncols=1, figsize=(4, 2))
sns.heatmap(
    cm_normalized,
    annot=True,
    fmt=".2f",
    cmap="Blues",
    xticklabels=labels,
    yticklabels=labels,
    ax=axes,
    cbar=False,
)
axes.set_xlabel("Predicted Label")
axes.set_ylabel("True Label")
axes.set_title("Confusion Matrix [% of true label]")
fig.tight_layout()
_images/7ffedeb72db6c0c2581fce6db6f0db3a64bc4ff9a366a33d60a4d66a105bb81e.png

Interestingly, our model classifies a high percentage of stimulated EI24-KO cells as being unstimulated. Lets take a look at the images again.

# compare WT And EI24KO cells
num_rows, num_cols = 4, 4
n_cells = num_rows * num_cols
channel_of_interest = 4  # LC3B channel a key autophagosome marker

EI24_KO_stimulated = scportrait.io.read_h5sc(
    sc_datasets.filter(ulabels__name="EI24KO")
    .filter(ulabels__name="14h Torin-1")[0]
    .cache()
)
EI24_KO_unstimulated = scportrait.io.read_h5sc(
    sc_datasets.filter(ulabels__name="EI24KO")
    .filter(ulabels__name="untreated")[0]
    .cache()
)

# create a figure with two panels for negative and postive examples
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(12, 12))
scportrait.pl.cell_grid_single_channel(
    autophagy_negative_example,
    select_channel=channel_of_interest,
    ax=axes[0, 0],
    title="WT autophagy negative cells",
    show_fig=False,
)
scportrait.pl.cell_grid_single_channel(
    autophagy_positive_example,
    select_channel=channel_of_interest,
    ax=axes[0, 1],
    title="WT autophagy postive cells",
    show_fig=False,
)
scportrait.pl.cell_grid_single_channel(
    EI24_KO_unstimulated,
    select_channel=channel_of_interest,
    ax=axes[1, 0],
    title="EI24KO unstimulated cells",
    show_fig=False,
)
scportrait.pl.cell_grid_single_channel(
    EI24_KO_stimulated,
    select_channel=channel_of_interest,
    ax=axes[1, 1],
    title="EI24KO stimulated cells",
    show_fig=False,
)
 completing transfer to track Artifact('XaTalaUN') as input
 returning existing schema with same hash: Schema(uid='3Ajy5VY7H6PVoAhsFLfn', name='single-cell image dataset schema obs', n=1, itype='Feature', is_type=False, hash='0Wb4Qaes_RIx5s4g5SWeDA', minimal_set=True, ordered_set=False, maximal_set=False, space_id=1, created_by_id=1, run_id=3, created_at=2025-03-31 12:47:28 UTC)
 mapped records: Artifact(uid='XaTalaUNv7d3QwXc0000'), ULabel(uid='Aj8KGwbh'), ULabel(uid='A2945i5P'), ULabel(uid='joRCMMWX'), ULabel(uid='CrR7fgIZ'), ULabel(uid='QrU6fxsG'), ULabel(uid='xhpmj7p7'), ULabel(uid='xHqZKcIG'), ULabel(uid='JWE2jNdk'), ULabel(uid='PKiCEP1h'), ULabel(uid='HRRTqARL'), ULabel(uid='e82fx2wm')
 transferred records: 
 completing transfer to track Artifact('cHKg1yCq') as input
 returning existing schema with same hash: Schema(uid='3Ajy5VY7H6PVoAhsFLfn', name='single-cell image dataset schema obs', n=1, itype='Feature', is_type=False, hash='0Wb4Qaes_RIx5s4g5SWeDA', minimal_set=True, ordered_set=False, maximal_set=False, space_id=1, created_by_id=1, run_id=3, created_at=2025-03-31 12:47:28 UTC)
 mapped records: Artifact(uid='cHKg1yCqvJgShsKc0001'), ULabel(uid='Aj8KGwbh'), ULabel(uid='EgihgpJC'), ULabel(uid='joRCMMWX'), ULabel(uid='CrR7fgIZ'), ULabel(uid='QrU6fxsG'), ULabel(uid='xhpmj7p7'), ULabel(uid='xHqZKcIG'), ULabel(uid='JWE2jNdk'), ULabel(uid='PKiCEP1h'), ULabel(uid='HRRTqARL'), ULabel(uid='e82fx2wm')
 transferred records: 
_images/a81238a1635c37a9e2e2e5083474174bc6a2e9c54dd6e681cfbacb0e0db6423f.png

The EI24 KO cells have fewer LC3 puncta, and seem to show a defect in the formation of autophagosomes. Even when stimulated, EI24 KO cells look comparable to unstimated cells. In this instance, out model appears to correctly identify the biological effect of a deficiency in the EI24 gene.

Trace cells back to see them in their original context

As image analysis advances, obtaining the full context of a small section of the original image is often essential.

# select a random cell from the WT dataset
cell = features_wt.sample(1, random_state=42)
dataset = cell["dataset"].values[0]
cell_id = cell["cell_id"].values[0]

# get SpatialData object and single-cell image dataset
sdata = (
    ln.Artifact.using("scportrait/examples")
    .get(
        key=ln.Artifact.using("scportrait/examples")
        .get(dataset)
        .key.replace("single_cell_data.h5ad", "spatialdata.zarr")
    )
    .load()
)
single_cell_images = scportrait.io.read_h5sc(ln.Artifact.get(dataset).cache())

# lookup location of cell of interest
x, y = sdata["centers_seg_all_nucleus"].compute().loc[cell_id, :]

# plot single-cell images and spatial context
fig, axs = plt.subplots(1, 2, figsize=(12, 3.5))
scportrait.pl.cell_grid_multi_channel(
    single_cell_images, cell_ids=cell_id, axs=axs[0], show_fig=False
)
scportrait.pl.plot_segmentation_mask(
    sdata,
    masks=["seg_all_nucleus", "seg_all_cytosol"],
    select_region=(x, y),
    max_width=100,
    axs=axs[1],
    show_fig=False,
)
axs[1].scatter(
    x,
    y,
    color="red",
    s=200,
)
fig.tight_layout()
 completing transfer to track Artifact('2OIRJYiQ') as input
 mapped records: 
 transferred records: Artifact(uid='2OIRJYiQBNZ2Mpl30003')
_images/522f1ca356f20a6dbe596b42285c7c10850b226b28e978d402ce6d0156bd9416.png
ln.finish()
Hide code cell output
! cells [(13, 15)] were not run consecutively
 finished Run('aJLqJvFT') after 2m at 2025-03-31 12:57:19 UTC
Hide code cell content
!rm -rf test-imaging
!lamin delete --force test-imaging
'testuser1/test-imaging' not found: 'instance-not-found'
Check your permissions: https://lamin.ai/testuser1/test-imaging