⚡️ Saturday ML Spark – 🎯 Threshold Tuning
Posted on: April 11, 2026
Description:
Most classification models return probabilities, but the final decision — whether something is classified as 0 or 1 — depends on a threshold. By default, this threshold is set to 0.5. But in real-world systems, this default is often not optimal.
In this project, we explore how to tune the classification threshold to better align model predictions with real-world needs.
Understanding the Problem
A classifier outputs probabilities like:
- 0.2 → likely negative
- 0.8 → likely positive
To convert these into labels, we apply a rule:
If probability ≥ 0.5 → classify as positive
However, this fixed cutoff does not always match the problem at hand.
For example:
- In fraud detection → missing fraud is costly → prefer high recall
- In spam detection → false alarms are costly → prefer high precision
The threshold directly controls this trade-off.
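To make this concrete, here is a minimal sketch with made-up fraud scores (toy values, not from a real model), showing how the same probabilities produce different labels at different cutoffs:

```python
import numpy as np

# Hypothetical model scores for five transactions (illustrative only)
probs = np.array([0.15, 0.42, 0.58, 0.87, 0.33])

# A low threshold flags more items as positive (favours recall)...
flags_low = (probs >= 0.3).astype(int)

# ...while a high threshold flags fewer (favours precision)
flags_high = (probs >= 0.7).astype(int)

print("Positives at 0.3:", flags_low.sum())   # 4 of 5 flagged
print("Positives at 0.7:", flags_high.sum())  # 1 of 5 flagged
```

The probabilities never change; only the decision rule does.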
What Is Threshold Tuning?
Threshold tuning is the process of selecting a cutoff value that determines how probabilities are converted into class labels.
Instead of using:
y_pred = model.predict(X_test)
we use probabilities:
y_probs = model.predict_proba(X_test)[:, 1]
and apply custom thresholds.
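For a binary scikit-learn classifier such as LogisticRegression, `predict` is effectively `predict_proba` with a hard-coded 0.5 cutoff (ignoring exact-0.5 ties, which are vanishingly rare with floats). A quick sanity-check sketch, using the same breast-cancer data as the full snippet at the end:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)
model = LogisticRegression(max_iter=5000).fit(X, y)

# Manually thresholding the positive-class probability at 0.5...
manual = (model.predict_proba(X)[:, 1] >= 0.5).astype(int)

# ...reproduces predict() exactly on this data
assert np.array_equal(manual, model.predict(X))
```

Once we hold the probabilities ourselves, the 0.5 stops being baked in and becomes a parameter we can tune.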
1. Applying the Default Threshold
We first evaluate the model using the default threshold of 0.5.
y_pred_default = (y_probs >= 0.5).astype(int)
This gives us a baseline for comparison.
2. Evaluating Multiple Thresholds
We test different threshold values and observe how metrics change.
for threshold in thresholds:
    y_pred = (y_probs >= threshold).astype(int)
For each threshold, we compute:
- Precision
- Recall
- F1 Score
3. Understanding the Trade-Off
Changing the threshold affects model behaviour:
- Lower threshold → more positives → higher recall, lower precision
- Higher threshold → fewer positives → higher precision, lower recall
There is no single “best” threshold — it depends on the problem.
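One way to make "best" concrete is to attach explicit costs to each error type and pick the cheapest cutoff. In the sketch below, both the labels/scores and the costs (a false negative is five times as costly as a false positive) are illustrative assumptions, not values from this post:

```python
import numpy as np

# Toy ground truth and model scores (illustrative only)
y_true  = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_probs = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.9, 0.6])

COST_FP, COST_FN = 1.0, 5.0  # assumed business costs per error

def expected_cost(threshold):
    """Total cost of the errors made at a given cutoff."""
    y_pred = (y_probs >= threshold).astype(int)
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    return COST_FP * fp + COST_FN * fn

# Rounding avoids float-accumulation artefacts from arange
grid = np.round(np.arange(0.05, 1.0, 0.05), 2)
best = min(grid, key=expected_cost)
print(f"Cheapest threshold: {best:.2f}")
```

With costs this asymmetric, the cheapest cutoff sits well below 0.5: missing a positive hurts far more than a false alarm.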
4. Visualising Threshold Impact
Plotting metrics across thresholds makes the trade-off clearer.
plt.plot(thresholds, precision)
plt.plot(thresholds, recall)
plt.plot(thresholds, f1)
This helps identify the point where the desired balance is achieved.
Why Threshold Tuning Matters
Threshold tuning allows us to:
- align model decisions with business goals
- control false positives and false negatives
- improve practical usefulness of models
- move beyond generic accuracy metrics
It is widely used in:
- fraud detection
- medical diagnosis
- recommendation systems
- lead scoring
Key Takeaways
- Classification models output probabilities, not final decisions.
- The default threshold of 0.5 is not always optimal.
- Lower thresholds increase recall but reduce precision.
- Higher thresholds increase precision but reduce recall.
- Threshold tuning aligns ML models with real-world objectives.
Conclusion
Threshold tuning is a simple yet powerful technique that transforms how classification models are used in practice. Instead of relying on default settings, we can tailor model behaviour to match real-world priorities and constraints.
This makes threshold tuning an essential concept in the Saturday ML Spark ⚡️ – Advanced & Practical series.
Code Snippet:
# 📦 Import Required Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score, f1_score, confusion_matrix
# 🧩 Load Dataset
data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target
# ✂️ Split Data
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.3,
    random_state=42,
    stratify=y
)
# 🤖 Train Model
model = LogisticRegression(max_iter=5000)
model.fit(X_train, y_train)
# 📊 Generate Predicted Probabilities
y_probs = model.predict_proba(X_test)[:, 1]
# 🎯 Default Threshold (0.5)
y_pred_default = (y_probs >= 0.5).astype(int)
print("=== Default Threshold = 0.5 ===")
print("Precision:", precision_score(y_test, y_pred_default))
print("Recall:", recall_score(y_test, y_pred_default))
print("F1 Score:", f1_score(y_test, y_pred_default))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred_default))
# 🔁 Evaluate Multiple Thresholds
thresholds = np.arange(0.1, 0.91, 0.1)
results = []
for threshold in thresholds:
    y_pred = (y_probs >= threshold).astype(int)
    results.append({
        "threshold": threshold,
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred)
    })
results_df = pd.DataFrame(results)
print("\n=== Threshold Tuning Results ===")
print(results_df)
# 📈 Plot Metrics vs Threshold
plt.figure(figsize=(8, 5))
plt.plot(results_df["threshold"], results_df["precision"], marker="o", label="Precision")
plt.plot(results_df["threshold"], results_df["recall"], marker="o", label="Recall")
plt.plot(results_df["threshold"], results_df["f1"], marker="o", label="F1 Score")
plt.xlabel("Threshold")
plt.ylabel("Score")
plt.title("Threshold Tuning for Classification")
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
# ✅ Select Best Threshold (based on F1 Score)
best_threshold = results_df.loc[results_df["f1"].idxmax(), "threshold"]
print("\nBest Threshold based on F1 Score:", best_threshold)
# 🔍 Predictions using Best Threshold
y_pred_best = (y_probs >= best_threshold).astype(int)
print("\n=== Using Best Threshold ===")
print("Precision:", precision_score(y_test, y_pred_best))
print("Recall:", recall_score(y_test, y_pred_best))
print("F1 Score:", f1_score(y_test, y_pred_best))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred_best))