AW Dev Rethought

🌟 The best way to predict the future is to invent it - Alan Kay

⚡️ Saturday ML Spark – 🎯 Threshold Tuning


Description:

Most classification models return probabilities, but the final decision — whether something is classified as 0 or 1 — depends on a threshold. By default, this threshold is set to 0.5. But in real-world systems, this default is often not optimal.

In this project, we explore how to tune the classification threshold to better align model predictions with real-world needs.


Understanding the Problem

A classifier outputs probabilities like:

  • 0.2 → likely negative
  • 0.8 → likely positive

To convert these into labels, we apply a rule:

If probability ≥ 0.5 → classify as positive

However, this default cutoff does not always match the costs of the errors involved.

For example:

  • In fraud detection → missing fraud is costly → prefer high recall
  • In spam detection → false alarms are costly → prefer high precision

The threshold directly controls this trade-off.
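To make the trade-off concrete, here is a tiny sketch with hypothetical probabilities (the values are illustrative only):

```python
import numpy as np

# Hypothetical predicted probabilities for five cases
probs = np.array([0.15, 0.35, 0.55, 0.65, 0.90])

# Default cutoff: three cases flagged positive
default = (probs >= 0.5).astype(int)   # [0, 0, 1, 1, 1]

# Lower cutoff (recall-oriented, fraud-style): one more case flagged
low = (probs >= 0.3).astype(int)       # [0, 1, 1, 1, 1]

# Higher cutoff (precision-oriented, spam-style): only the clearest case
high = (probs >= 0.7).astype(int)      # [0, 0, 0, 0, 1]

print(default.sum(), low.sum(), high.sum())  # 3 4 1
```

Nothing about the model changes between these three lines; only the decision rule does.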


What Is Threshold Tuning?

Threshold tuning is the process of selecting a cutoff value that determines how probabilities are converted into class labels.

Instead of using:

y_pred = model.predict(X_test)

we use probabilities:

y_probs = model.predict_proba(X_test)[:, 1]

and apply custom thresholds.


1. Applying the Default Threshold

We first evaluate the model using the default threshold of 0.5.

y_pred_default = (y_probs >= 0.5).astype(int)

This gives us a baseline for comparison.


2. Evaluating Multiple Thresholds

We test different threshold values and observe how metrics change.

thresholds = np.arange(0.1, 0.91, 0.1)

for threshold in thresholds:
    y_pred = (y_probs >= threshold).astype(int)

For each threshold, we compute:

  • Precision
  • Recall
  • F1 Score
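Instead of looping over a hand-picked grid, scikit-learn's `precision_recall_curve` returns precision and recall at every distinct score cutoff in one call. A small sketch with hypothetical labels and scores standing in for `y_test` and `y_probs`:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Hypothetical true labels and predicted scores (illustrative only)
y_true = np.array([0, 0, 1, 0, 1, 1, 0, 1])
y_score = np.array([0.1, 0.3, 0.4, 0.45, 0.6, 0.7, 0.2, 0.9])

# Precision and recall at every candidate cutoff in one call
precision, recall, thresholds = precision_recall_curve(y_true, y_score)

# F1 at each cutoff; precision/recall carry one extra trailing element
# (precision=1, recall=0), so slice it off to align with thresholds
f1 = 2 * precision[:-1] * recall[:-1] / (precision[:-1] + recall[:-1] + 1e-12)

print(float(thresholds[np.argmax(f1)]))  # → 0.4
```

This is handy for fine-grained tuning, since it considers every cutoff the data supports rather than a coarse step of 0.1.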

3. Understanding the Trade-Off

Changing the threshold affects model behaviour:

  • Lower threshold → more positives → higher recall, lower precision
  • Higher threshold → fewer positives → higher precision, lower recall

There is no single “best” threshold — it depends on the problem.
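One common way to encode "it depends on the problem" is to assign explicit costs to the two error types and pick the cutoff that minimises total cost. A minimal sketch, assuming hypothetical costs where a missed positive (FN) hurts ten times more than a false alarm (FP):

```python
import numpy as np

# Hypothetical error costs: a false negative is 10x worse than a false positive
COST_FP, COST_FN = 1.0, 10.0

# Illustrative labels and predicted probabilities
y_true = np.array([0, 1, 0, 1, 1, 0, 0, 1])
y_probs = np.array([0.22, 0.41, 0.12, 0.8, 0.55, 0.32, 0.05, 0.7])

def total_cost(threshold):
    y_pred = (y_probs >= threshold).astype(int)
    fp = np.sum((y_pred == 1) & (y_true == 0))  # false alarms
    fn = np.sum((y_pred == 0) & (y_true == 1))  # missed positives
    return COST_FP * fp + COST_FN * fn

thresholds = np.arange(0.1, 0.91, 0.1)
costs = [total_cost(t) for t in thresholds]
best = thresholds[int(np.argmin(costs))]

print(round(float(best), 2))  # → 0.4
```

Flipping the cost ratio (making false positives expensive instead) pushes the chosen threshold up rather than down, which is exactly the precision-vs-recall trade-off described above.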


4. Visualising Threshold Impact

Plotting metrics across thresholds makes the trade-off clearer.

plt.plot(thresholds, precision)
plt.plot(thresholds, recall)
plt.plot(thresholds, f1)

This helps identify the point where the desired balance is achieved.


Why Threshold Tuning Matters

Threshold tuning allows us to:

  • align model decisions with business goals
  • control false positives and false negatives
  • improve practical usefulness of models
  • move beyond generic accuracy metrics

It is widely used in:

  • fraud detection
  • medical diagnosis
  • recommendation systems
  • lead scoring

Key Takeaways

  1. Classification models output probabilities, not final decisions.
  2. The default threshold of 0.5 is not always optimal.
  3. Lower thresholds increase recall but reduce precision.
  4. Higher thresholds increase precision but reduce recall.
  5. Threshold tuning aligns ML models with real-world objectives.

Conclusion

Threshold tuning is a simple yet powerful technique that transforms how classification models are used in practice. Instead of relying on default settings, we can tailor model behaviour to match real-world priorities and constraints.

This makes threshold tuning an essential concept in the Saturday ML Spark ⚡️ – Advanced & Practical series.


Code Snippet:

# 📦 Import Required Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score, f1_score, confusion_matrix


# 🧩 Load Dataset
data = load_breast_cancer()

X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target


# ✂️ Split Data
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.3,
    random_state=42,
    stratify=y
)


# 🤖 Train Model
model = LogisticRegression(max_iter=5000)
model.fit(X_train, y_train)


# 📊 Generate Predicted Probabilities
y_probs = model.predict_proba(X_test)[:, 1]


# 🎯 Default Threshold (0.5)
y_pred_default = (y_probs >= 0.5).astype(int)

print("=== Default Threshold = 0.5 ===")
print("Precision:", precision_score(y_test, y_pred_default))
print("Recall:", recall_score(y_test, y_pred_default))
print("F1 Score:", f1_score(y_test, y_pred_default))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred_default))


# 🔁 Evaluate Multiple Thresholds
thresholds = np.arange(0.1, 0.91, 0.1)

results = []

for threshold in thresholds:
    y_pred = (y_probs >= threshold).astype(int)

    results.append({
        "threshold": threshold,
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred)
    })

results_df = pd.DataFrame(results)

print("\n=== Threshold Tuning Results ===")
print(results_df)


# 📈 Plot Metrics vs Threshold
plt.figure(figsize=(8, 5))

plt.plot(results_df["threshold"], results_df["precision"], marker="o", label="Precision")
plt.plot(results_df["threshold"], results_df["recall"], marker="o", label="Recall")
plt.plot(results_df["threshold"], results_df["f1"], marker="o", label="F1 Score")

plt.xlabel("Threshold")
plt.ylabel("Score")
plt.title("Threshold Tuning for Classification")
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()


# ✅ Select Best Threshold (based on F1 Score)
best_threshold = results_df.loc[results_df["f1"].idxmax(), "threshold"]

print("\nBest Threshold based on F1 Score:", best_threshold)


# 🔍 Predictions using Best Threshold
y_pred_best = (y_probs >= best_threshold).astype(int)

print("\n=== Using Best Threshold ===")
print("Precision:", precision_score(y_test, y_pred_best))
print("Recall:", recall_score(y_test, y_pred_best))
print("F1 Score:", f1_score(y_test, y_pred_best))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred_best))
