AW Dev Rethought

"Programs must be written for people to read, and only incidentally for machines to execute." - Harold Abelson

🧠 AI with Python – 📈 Model Calibration Curves


Description:

Accuracy alone does not tell the full story of a classification model. In many real-world systems, what truly matters is whether predicted probabilities reflect reality.

If a model predicts a 70% probability of an event, does that event actually occur about 70% of the time?

In this project, we explore Model Calibration Curves, also known as Reliability Diagrams, to evaluate how trustworthy predicted probabilities really are.


Understanding the Problem

Most classification models output probabilities using predict_proba().

However:

  • A highly accurate model can still be poorly calibrated.
  • Some models are overconfident.
  • Others are systematically under-confident.

This becomes critical in domains like:

  • Healthcare risk prediction
  • Credit scoring
  • Fraud detection
  • Insurance pricing

Calibration ensures probability estimates are meaningful.
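The gap between accuracy and calibration can be seen with a toy example (synthetic numbers, not from this project): two probability vectors that produce identical hard predictions, yet the overconfident one scores worse on the Brier score, a proper scoring rule that penalises miscalibrated probabilities.

```python
import numpy as np
from sklearn.metrics import accuracy_score, brier_score_loss

# Ten samples where the event actually occurs 80% of the time
y_true = np.array([1] * 8 + [0] * 2)

# A calibrated model says 0.8; an overconfident one says 0.99
p_calibrated = np.full(10, 0.80)
p_overconfident = np.full(10, 0.99)

# Both give the same hard predictions at a 0.5 threshold -> same accuracy (0.8)
acc_cal = accuracy_score(y_true, (p_calibrated >= 0.5).astype(int))
acc_over = accuracy_score(y_true, (p_overconfident >= 0.5).astype(int))

# But the Brier score exposes the overconfidence
brier_cal = brier_score_loss(y_true, p_calibrated)    # 0.160
brier_over = brier_score_loss(y_true, p_overconfident)  # ~0.196 (worse)
```

Same accuracy, different trustworthiness: exactly the situation calibration curves are designed to reveal.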


What Is a Calibration Curve?

A calibration curve compares:

  • Predicted probabilities
  • Observed true frequencies

If a model is perfectly calibrated, the curve follows the diagonal line y = x.

If it deviates:

  • Curve below the diagonal → overconfident model (predicted probabilities higher than observed frequencies)
  • Curve above the diagonal → under-confident model (predicted probabilities lower than observed frequencies)
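These reading rules can be wrapped in a small helper. This is purely illustrative: the function name and the `tol` tolerance are assumptions for the sketch, not a standard API.

```python
def diagnose_bins(prob_pred, prob_true, tol=0.05):
    """Label each calibration-curve point relative to the diagonal."""
    labels = []
    for p, t in zip(prob_pred, prob_true):
        if t < p - tol:            # observed frequency well below prediction
            labels.append("overconfident")
        elif t > p + tol:          # observed frequency well above prediction
            labels.append("under-confident")
        else:
            labels.append("well calibrated")
    return labels

print(diagnose_bins([0.1, 0.5, 0.9], [0.12, 0.40, 0.70]))
# ['well calibrated', 'overconfident', 'overconfident']
```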

1. Train a Classification Model

We first train a classifier on structured data.

from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(
    n_estimators=200,
    random_state=42
)

model.fit(X_train, y_train)

2. Generate Predicted Probabilities

We use the probabilities for the positive class: column 1 of predict_proba, which corresponds to model.classes_[1].

y_probs = model.predict_proba(X_test)[:, 1]

3. Compute the Calibration Curve

from sklearn.calibration import calibration_curve

prob_true, prob_pred = calibration_curve(
    y_test,
    y_probs,
    n_bins=10
)

The predicted probabilities are divided into bins; within each bin, the average predicted probability is compared to the observed fraction of positives.
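Under the hood, this binning can be reproduced by hand. The sketch below assumes the default strategy="uniform" (ten equal-width bins over [0, 1], all non-empty here); sklearn's exact edge handling may differ slightly. The last two lines also compute a common scalar summary of miscalibration, often called expected calibration error (ECE), which is not part of calibration_curve itself.

```python
import numpy as np
from sklearn.calibration import calibration_curve

# Synthetic, well-calibrated probabilities for illustration
rng = np.random.default_rng(42)
probs = rng.uniform(0, 1, 500)
labels = (rng.uniform(0, 1, 500) < probs).astype(int)

prob_true, prob_pred = calibration_curve(labels, probs, n_bins=10)

# Manual equivalent: uniform-width bins over [0, 1]
edges = np.linspace(0, 1, 11)
bin_ids = np.digitize(probs, edges[1:-1])
manual_pred = np.array([probs[bin_ids == b].mean() for b in range(10)])
manual_true = np.array([labels[bin_ids == b].mean() for b in range(10)])

# Scalar summary: bin-weighted average gap between prediction and reality
weights = np.array([(bin_ids == b).mean() for b in range(10)])
ece = np.sum(weights * np.abs(manual_pred - manual_true))
```

For well-calibrated synthetic data like this, the ECE stays small; a strongly miscalibrated model pushes it up.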


4. Plot the Calibration Curve

import matplotlib.pyplot as plt

plt.plot([0, 1], [0, 1], linestyle="--")
plt.plot(prob_pred, prob_true, marker="o")

The dashed line represents perfect calibration.


Why Calibration Matters

Even a model with high ROC-AUC can be poorly calibrated.

Calibration affects:

  • Risk thresholds
  • Decision policies
  • Business costs
  • Regulatory compliance

Understanding calibration is essential for building trustworthy ML systems.
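When a curve reveals miscalibration, sklearn also offers a remedy: CalibratedClassifierCV, which refits the probabilities with a sigmoid or isotonic mapping. The sketch below uses the same dataset and model as the full snippet at the end; note that recalibration is not guaranteed to improve every model, so the Brier scores should be compared on held-out data.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=42, stratify=data.target
)

# Raw model vs. the same model wrapped in cross-validated isotonic calibration
raw = RandomForestClassifier(n_estimators=200, random_state=42).fit(X_train, y_train)
calibrated = CalibratedClassifierCV(
    RandomForestClassifier(n_estimators=200, random_state=42),
    method="isotonic",
    cv=5,
).fit(X_train, y_train)

brier_raw = brier_score_loss(y_test, raw.predict_proba(X_test)[:, 1])
brier_cal = brier_score_loss(y_test, calibrated.predict_proba(X_test)[:, 1])
```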


Key Takeaways

  1. Calibration measures probability reliability.
  2. High accuracy does not guarantee trustworthy probabilities.
  3. Overconfident models can be dangerous in production.
  4. Calibration curves visually diagnose prediction bias.
  5. Essential for risk-sensitive and decision-based ML systems.

Conclusion

Model calibration curves help ensure that predicted probabilities truly reflect real-world outcomes. In probability-driven systems, calibration is just as important as accuracy — making this technique a critical addition to the Advanced Visualisation & Interpretability toolkit.


Code Snippet:

# 📦 Import Required Libraries
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.calibration import calibration_curve


# 🧩 Load Dataset
data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target


# ✂️ Train/Test Split
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.3,
    random_state=42,
    stratify=y
)


# 🤖 Train Model
model = RandomForestClassifier(
    n_estimators=200,
    random_state=42
)

model.fit(X_train, y_train)


# 📊 Generate Predicted Probabilities
y_probs = model.predict_proba(X_test)[:, 1]


# 📈 Compute Calibration Curve
prob_true, prob_pred = calibration_curve(
    y_test,
    y_probs,
    n_bins=10
)


# 📊 Plot Calibration Curve
plt.figure(figsize=(6, 6))

# Perfect calibration reference
plt.plot([0, 1], [0, 1], linestyle="--", label="Perfect Calibration")

# Model curve
plt.plot(prob_pred, prob_true, marker="o", label="Model")

plt.xlabel("Predicted Probability")
plt.ylabel("Observed Frequency (Fraction of Positives)")
plt.title("Model Calibration Curve")
plt.legend()
plt.tight_layout()
plt.show()
