🧠 AI with Python – 📊 Precision, Recall, and F1-Score Explained
Posted On: October 21, 2025
When evaluating a machine learning model, accuracy is often the first metric we look at.
However, in real-world scenarios — especially when dealing with imbalanced datasets — accuracy alone can be misleading.
Metrics like Precision, Recall, and F1-Score give a much deeper insight into how well your model is performing across different types of errors.
Why Accuracy Isn’t Always Enough
Imagine a dataset where 95% of samples belong to one class.
A model that simply predicts everything as that class would achieve 95% accuracy, yet it completely fails to identify the minority class.
That’s where Precision, Recall, and F1-Score come in — helping us understand how a model performs, not just how often it’s right.
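To see this concretely, here is a minimal sketch (using a synthetic label array, separate from the Iris example below) of a majority-class predictor: accuracy looks excellent while recall on the minority class is zero.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Synthetic imbalanced labels: 95 negatives, 5 positives
y_true = np.array([0] * 95 + [1] * 5)

# A "model" that always predicts the majority class
y_pred = np.zeros_like(y_true)

print(accuracy_score(y_true, y_pred))  # 0.95 -- looks impressive
print(recall_score(y_true, y_pred))    # 0.0  -- every positive is missed
```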
Dataset and Model Setup
We’ll use the Iris dataset, restricting it to its first two classes (setosa and versicolor) for simplicity.
A Logistic Regression model will serve as our classifier.
```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Load the full Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Keep only classes 0 and 1 for binary classification
X, y = X[y != 2], y[y != 2]

# Hold out 30% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Train a Logistic Regression classifier
model = LogisticRegression()
model.fit(X_train, y_train)
```
Evaluating Model Predictions
After training the model, we can compute Precision, Recall, and F1-Score using scikit-learn’s built-in metrics.
```python
from sklearn.metrics import precision_score, recall_score, f1_score, classification_report

# Predict on the held-out test set
y_pred = model.predict(X_test)

# Binary metrics treat class 1 as the positive class by default
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print(f"F1-Score: {f1:.2f}")

print("\nClassification Report:\n")
print(classification_report(y_test, y_pred))
```
The classification report provides per-class metrics, along with overall averages — making it a one-stop summary for model evaluation.
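If you need those numbers programmatically rather than as printed text, classification_report also accepts output_dict=True and returns a nested dictionary instead; a quick sketch, continuing with y_test and y_pred from the code above:

```python
# Same report as a nested dict, keyed by class label and average type
report = classification_report(y_test, y_pred, output_dict=True)
print(report["1"]["recall"])            # recall for class 1
print(report["macro avg"]["f1-score"])  # macro-averaged F1
```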
Understanding the Metrics
| Metric | Formula | Meaning |
|---|---|---|
| Precision | TP / (TP + FP) | Out of all predicted positives, how many were correct. |
| Recall | TP / (TP + FN) | Out of all actual positives, how many were identified correctly. |
| F1-Score | 2 × (Precision × Recall) / (Precision + Recall) | Harmonic mean of Precision and Recall — balances both metrics. |
- TP (True Positive): Model predicted positive and was correct.
- FP (False Positive): Model predicted positive but was wrong.
- FN (False Negative): Model predicted negative but missed a positive.
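To tie the formulas back to these counts, here is a small sketch that pulls TP, FP, and FN out of scikit-learn’s confusion_matrix and recomputes the metrics by hand; the results should match the built-in scoring functions used above.

```python
from sklearn.metrics import confusion_matrix

# For binary labels, ravel() yields the counts in this order
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()

precision_manual = tp / (tp + fp)
recall_manual = tp / (tp + fn)
f1_manual = 2 * (precision_manual * recall_manual) / (precision_manual + recall_manual)

print(f"Precision: {precision_manual:.2f}")  # matches precision_score
print(f"Recall: {recall_manual:.2f}")        # matches recall_score
print(f"F1-Score: {f1_manual:.2f}")          # matches f1_score
```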
When to Focus on Each Metric
- High Precision Needed: when false positives are costly (e.g., spam detection, where flagging a legitimate email as spam means it is lost).
- High Recall Needed: when missing a positive instance is more critical (e.g., fraud detection, cancer screening).
- F1-Score Balances Both: useful when both false positives and false negatives carry real costs.
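If one side of the trade-off matters more, scikit-learn’s fbeta_score generalizes F1: beta > 1 weights recall more heavily, beta < 1 favors precision, and beta = 1 recovers the F1-Score. A quick sketch, reusing y_test and y_pred from above:

```python
from sklearn.metrics import fbeta_score

f2 = fbeta_score(y_test, y_pred, beta=2)     # recall-oriented
f05 = fbeta_score(y_test, y_pred, beta=0.5)  # precision-oriented

print(f"F2-Score: {f2:.2f}")
print(f"F0.5-Score: {f05:.2f}")
```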
Example Output
A sample result from the Iris dataset:
```
Precision: 1.00
Recall: 0.93
F1-Score: 0.96

Classification Report:

              precision    recall  f1-score   support

           0       0.94      1.00      0.97        16
           1       1.00      0.93      0.96        14

    accuracy                           0.97        30
```
Key Takeaways
- Precision measures quality of positive predictions.
- Recall measures completeness: how many of the actual positives were captured.
- F1-Score balances both, especially when data is imbalanced.
- Use all three metrics together for a complete evaluation picture.
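As a convenience, all three metrics can also come from a single call to scikit-learn’s precision_recall_fscore_support; a minimal sketch:

```python
from sklearn.metrics import precision_recall_fscore_support

# With average="binary", the support element is returned as None
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average="binary"
)
print(f"Precision: {precision:.2f}, Recall: {recall:.2f}, F1: {f1:.2f}")
```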
Conclusion
Accuracy may tell you how often your model is correct, but Precision, Recall, and F1-Score reveal why it performs the way it does.
These metrics are indispensable in classification — especially when the cost of errors differs across outcomes.
By understanding and tracking them, you move from just “building models” to truly evaluating intelligence.
Code Snippet:
```python
# Import libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score, f1_score, classification_report

# Load dataset
iris = load_iris()
X, y = iris.data, iris.target

# Use only two classes (binary classification)
X, y = X[y != 2], y[y != 2]

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train the model
model = LogisticRegression()
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Calculate metrics
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print(f"F1-Score: {f1:.2f}")

# Complete classification report
print("\nClassification Report:\n")
print(classification_report(y_test, y_pred))
```