AW Dev Rethought

Truth can only be found in one place: the code - Robert C. Martin

🧠 AI with Python – 🧩 Stacking Models


Description:

In machine learning, no single model is perfect for every problem. Different algorithms capture different kinds of patterns, and each comes with its own strengths and weaknesses.

What if we could combine multiple models into one smarter system? This idea leads to stacking, one of the most powerful ensemble learning techniques in machine learning.

In this project, we explore how stacking works and how a meta-model combines predictions from multiple base models.


Understanding the Problem

Different models behave differently on the same dataset.

For example:

  • RandomForest → strong at non-linear relationships
  • SVM → strong margin-based classifier
  • Logistic Regression → simple and stable linear model

Each model may perform well in certain areas while failing in others.

Instead of choosing only one model, stacking combines them.
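To see that the three models really do behave differently, the sketch below fits each one on the same split and counts the test rows where they do not all agree. It assumes the breast cancer dataset and the split settings used in the full script at the end of this post; the names `models`, `predictions`, and `disagree` are illustrative only.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Illustrative dictionary of the three model families discussed above
models = {
    "rf": RandomForestClassifier(random_state=42),
    "svc": SVC(),
    "lr": LogisticRegression(max_iter=5000),
}

predictions = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions[name] = model.predict(X_test)
    print(f"{name}: test accuracy = {model.score(X_test, y_test):.3f}")

# One column per model; a row "disagrees" if any column differs from the first
stacked = np.column_stack(list(predictions.values()))
disagree = (stacked != stacked[:, [0]]).any(axis=1)
print("Rows where the models disagree:", int(disagree.sum()), "of", len(y_test))
```

The rows where the models disagree are the ones where a combiner has something to learn; where all models agree, stacking has little to add.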


What Is Stacking?

Stacking is an ensemble learning technique where:

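  1. Multiple base models are trained independently on the same training data
  2. Their predictions are collected, typically as out-of-fold predictions from cross-validation so that no model is scored on rows it was trained on
  3. A meta-model is trained using those predictions as its input features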

The final prediction is generated by the meta-model.
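The three steps above can be sketched by hand before reaching for a ready-made class. This is a minimal illustration, not how any particular library implements it: it assumes the breast cancer dataset from the full script, uses `cross_val_predict` to get out-of-fold probabilities, and the names `meta_train` and `meta_test` are made up for this example.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

base_models = [
    RandomForestClassifier(random_state=42),
    SVC(probability=True, random_state=42),
    LogisticRegression(max_iter=5000),
]

# Steps 1-2: collect out-of-fold positive-class probabilities,
# one column per base model
meta_train = np.column_stack([
    cross_val_predict(m, X_train, y_train, cv=5, method="predict_proba")[:, 1]
    for m in base_models
])

# Step 3: the meta-model learns from the base-model predictions
meta_model = LogisticRegression(max_iter=5000)
meta_model.fit(meta_train, y_train)

# For new data, refit each base model on the full training set,
# then feed its probabilities to the meta-model
meta_test = np.column_stack([
    m.fit(X_train, y_train).predict_proba(X_test)[:, 1] for m in base_models
])
manual_accuracy = meta_model.score(meta_test, y_test)
print("Manual stacking accuracy:", manual_accuracy)
```

Using out-of-fold predictions in step 2 matters: if the meta-model were trained on predictions the base models made for their own training rows, it would see overly optimistic inputs.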


1. Define Base Models

We first create multiple base learners.

base_models = [
    ("rf", RandomForestClassifier()),
    ("svc", SVC(probability=True)),
    ("lr", LogisticRegression())
]

Each model contributes different predictive behavior.


2. Define the Meta-Model

The meta-model combines outputs from base models.

meta_model = LogisticRegression()

This model learns how much weight to give each base model's predictions.


3. Build the Stacking Ensemble

stacking_model = StackingClassifier(
    estimators=base_models,
    final_estimator=meta_model
)

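StackingClassifier handles the prediction flow between models internally. By default it uses 5-fold cross-validation to generate out-of-fold predictions from each base model and trains the meta-model on those predictions.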
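The prediction flow is controlled by a few constructor options. The sketch below writes three of them out explicitly (`cv`, `stack_method`, and `passthrough`); the values chosen here are meant to mirror what the defaults resolve to for these base models, and the variable name `explicit_stack` is illustrative.

```python
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

base_models = [
    ("rf", RandomForestClassifier(random_state=42)),
    ("svc", SVC(probability=True, random_state=42)),
    ("lr", LogisticRegression(max_iter=5000)),
]

explicit_stack = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(max_iter=5000),
    cv=5,                          # folds used to build the meta-model's training data
    stack_method="predict_proba",  # feed class probabilities to the meta-model
    passthrough=False,             # meta-model sees predictions only, not raw features
)

params = explicit_stack.get_params(deep=False)
print(params["cv"], params["stack_method"], params["passthrough"])
```

Setting `passthrough=True` would additionally hand the original features to the meta-model, which can help or hurt depending on the data.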


4. Train the Ensemble

stacking_model.fit(X_train, y_train)

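During fit, the meta-model is trained on cross-validated predictions from the base models, and each base model is also fitted on the full training set for use at prediction time.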
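After fitting, it can be useful to look inside the ensemble. The sketch below, which assumes the same dataset and models as the full script, inspects the fitted base models, the features the meta-model receives, and the meta-model's coefficients. The variable names `meta_features` and `weights` are illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

stacking_model = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(random_state=42)),
        ("svc", SVC(probability=True, random_state=42)),
        ("lr", LogisticRegression(max_iter=5000)),
    ],
    final_estimator=LogisticRegression(max_iter=5000),
)
stacking_model.fit(X_train, y_train)

# Fitted copies of the base models, keyed by the names given above
print(list(stacking_model.named_estimators_.keys()))

# What the meta-model sees: for a binary problem, one probability column per base model
meta_features = stacking_model.transform(X_test)
print("Meta-feature shape:", meta_features.shape)

# One coefficient per base model in the logistic-regression meta-model
weights = stacking_model.final_estimator_.coef_
print("Meta-model coefficients:", weights)
```

A larger coefficient suggests the meta-model leans more on that base model, though correlated base models make the individual values hard to interpret on their own.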


5. Generate Predictions

y_pred = stacking_model.predict(X_test)

At prediction time, each base model scores the new samples and the meta-model turns those outputs into the final class labels.
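Class labels are not the only output available. Because the meta-model here is a logistic regression, the ensemble can also return class probabilities. The sketch below assumes the same dataset and models as the full script; `proba` and `labels` are illustrative names.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

stacking_model = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(random_state=42)),
        ("svc", SVC(probability=True, random_state=42)),
        ("lr", LogisticRegression(max_iter=5000)),
    ],
    final_estimator=LogisticRegression(max_iter=5000),
)
stacking_model.fit(X_train, y_train)

# Hard labels and class probabilities for the first five test rows
labels = stacking_model.predict(X_test[:5])
proba = stacking_model.predict_proba(X_test[:5])

for label, row in zip(labels, proba):
    print(f"predicted class {label}, probabilities {np.round(row, 3)}")
```

Probabilities are handy when a decision threshold other than 0.5 is needed, for example when false negatives are more costly than false positives.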


Why Stacking Works

Stacking works because:

  • different models learn different patterns
  • weaknesses of one model may be compensated by another
  • combining models can improve generalization when their errors are not strongly correlated

This often, though not always, leads to better predictive performance than any single base model.
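Whether stacking actually helps on a given dataset is an empirical question. The sketch below is one way to check, assuming the breast cancer dataset from the full script: it scores each base model and the stacked ensemble with the same 3-fold cross-validation. The dictionary names `candidates` and `scores` are illustrative, and no particular ranking is implied.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

base_models = [
    ("rf", RandomForestClassifier(random_state=42)),
    ("svc", SVC(probability=True, random_state=42)),
    ("lr", LogisticRegression(max_iter=5000)),
]

stacking_model = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(max_iter=5000),
)

# Evaluate every base model and the ensemble under the same protocol
candidates = dict(base_models)
candidates["stack"] = stacking_model

scores = {}
for name, model in candidates.items():
    fold_scores = cross_val_score(model, X, y, cv=3, scoring="accuracy")
    scores[name] = fold_scores.mean()
    print(f"{name}: mean accuracy = {scores[name]:.3f}")
```

If the ensemble's score is no better than the best base model's, the simpler model is usually the more practical choice.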


Where Stacking Is Used

Stacking is widely used in:

  • Kaggle competitions
  • fraud detection
  • recommendation systems
  • financial prediction
  • high-performance ML systems

It is especially useful when maximizing predictive quality is important.


Advantages of Stacking

  • often improved accuracy over the best single model
  • stronger generalization when base models make different errors
  • individual model weaknesses are partly offset by other models
  • flexible model combinations

Key Takeaways

  1. Stacking combines multiple ML models into one ensemble.
  2. Base models learn independently from the same dataset.
  3. A meta-model learns from base model predictions.
  4. Different algorithms contribute different strengths.
  5. Stacking is a powerful advanced ensemble technique.

Conclusion

Stacking is one of the most effective ways to combine machine learning models into a stronger predictive system. By using a meta-model to learn from multiple base learners, we can build ensembles that outperform individual algorithms in many real-world scenarios.


Code Snippet:

# 📦 Import Required Libraries
import pandas as pd

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.svm import SVC


# 🧩 Load Dataset
data = load_breast_cancer()

X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target


# ✂️ Split Data
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.3,
    random_state=42,
    stratify=y
)


# =========================================================
# 🤖 Define Base Models
# =========================================================

base_models = [
    ("rf", RandomForestClassifier(random_state=42)),
    ("svc", SVC(probability=True, random_state=42)),
    ("lr", LogisticRegression(max_iter=5000))
]


# =========================================================
# 🧠 Define Meta-Model
# =========================================================

meta_model = LogisticRegression(max_iter=5000)


# =========================================================
# 🚀 Build Stacking Classifier
# =========================================================

stacking_model = StackingClassifier(
    estimators=base_models,
    final_estimator=meta_model
)


# =========================================================
# 🤖 Train Stacking Model
# =========================================================

stacking_model.fit(X_train, y_train)


# =========================================================
# 📊 Generate Predictions
# =========================================================

y_pred = stacking_model.predict(X_test)


# =========================================================
# ✅ Evaluate Model
# =========================================================

print("Accuracy:", accuracy_score(y_test, y_pred))

print("\nClassification Report:\n")
print(classification_report(y_test, y_pred))


# =========================================================
# 🔍 Predict on a Few Samples (first 5 test rows, standing in for new data)
# =========================================================

sample = X_test.iloc[:5]

sample_predictions = stacking_model.predict(sample)

print("\nSample Predictions:")
print(sample_predictions)
