AW Dev Rethought

"Programs must be written for people to read, and only incidentally for machines to execute." - Harold Abelson

⚡️ Saturday ML Spark – 🤝 Ensemble Voting Classifier


Description:

In real-world machine learning, relying on a single model is often risky. Different algorithms learn different patterns, and each has its own strengths and weaknesses.

In this project, we explore the Ensemble Voting Classifier — a practical technique that combines multiple models to make more stable and accurate predictions.


Understanding the Problem

Single models can suffer from:

  • high variance (overfitting)
  • bias toward certain patterns
  • instability across different data splits

Instead of betting on one model, ensemble methods aggregate decisions from multiple models, reducing individual weaknesses and improving overall reliability.
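A quick toy calculation (illustrative numbers, not from a real dataset) shows why aggregating helps: if three independent classifiers are each correct 70% of the time, a majority vote is correct whenever at least two of them are.

```python
from math import comb

p = 0.70  # assumed accuracy of each individual classifier
n = 3     # number of independent voters

# The majority vote is correct when at least 2 of the 3 models are correct.
majority_acc = sum(
    comb(n, k) * p**k * (1 - p) ** (n - k)
    for k in range(n // 2 + 1, n + 1)
)

print(f"single model: {p:.3f}, majority of {n}: {majority_acc:.3f}")
# → single model: 0.700, majority of 3: 0.784
```

The gain depends on the models being reasonably independent; three copies of the same model vote identically and gain nothing.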


What Is a Voting Classifier?

A Voting Classifier is an ensemble technique where:

  • multiple base classifiers are trained independently
  • predictions are combined using a voting strategy

There are two common approaches:

  • Hard voting → majority vote of predicted labels
  • Soft voting → average of predicted probabilities (usually preferred when every base model can output probabilities)
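The two strategies can disagree. In this small sketch (hypothetical probabilities for one sample), two models lean weakly toward class 1, but one model is very confident in class 0, so soft voting flips the decision:

```python
import numpy as np

# Hypothetical class probabilities from three binary classifiers
# for a single sample (columns = classes 0 and 1).
probs = np.array([
    [0.90, 0.10],   # very confident in class 0
    [0.40, 0.60],   # leans toward class 1
    [0.45, 0.55],   # leans toward class 1
])

hard_votes = probs.argmax(axis=1)             # per-model labels: [0, 1, 1]
hard_pred = np.bincount(hard_votes).argmax()  # majority label → 1

soft_pred = probs.mean(axis=0).argmax()       # averaged probabilities → 0

print(f"hard voting: {hard_pred}, soft voting: {soft_pred}")
# → hard voting: 1, soft voting: 0
```

Soft voting lets a confident model outweigh two lukewarm ones, which is why it often works better with well-calibrated probabilities.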

1. Preparing Multiple Base Models

We begin by training different types of classifiers so each contributes a unique perspective.

lr = LogisticRegression(max_iter=1000)
knn = KNeighborsClassifier(n_neighbors=5)
dt = DecisionTreeClassifier(random_state=42)

Using diverse models is key to an effective ensemble.


2. Building the Voting Classifier

We combine all base models into a single ensemble using soft voting.

voting_clf = VotingClassifier(
    estimators=[
        ("lr", lr),
        ("knn", knn),
        ("dt", dt)
    ],
    voting="soft"
)

Soft voting works by averaging prediction probabilities across models.
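We can check this averaging directly. The sketch below (using a two-model ensemble on the Iris data for brevity) averages the fitted base models' `predict_proba` outputs by hand and compares the result with what `VotingClassifier` reports:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import VotingClassifier

X, y = load_iris(return_X_y=True)

clf = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier(random_state=42)),
    ],
    voting="soft",
)
clf.fit(X, y)

# Average the fitted base models' probabilities by hand...
manual = np.mean([est.predict_proba(X) for est in clf.estimators_], axis=0)

# ...and compare with the ensemble's own output.
print(np.allclose(manual, clf.predict_proba(X)))
# → True
```

Note that the fitted base models live in `clf.estimators_`; the original `lr` and `dt` objects are left untouched because `VotingClassifier` fits clones.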


3. Training the Ensemble

The Voting Classifier trains each base model internally.

voting_clf.fit(X_train, y_train)

Once trained, it behaves like a single unified model.


4. Comparing Ensemble vs Individual Models

To understand the benefit of ensembling, we compare accuracy across models.

for name, model in models.items():
    preds = model.predict(X_test)
    acc = accuracy_score(y_test, preds)
    print(f"{name}: {acc:.3f}")

In most cases, the ensemble matches or outperforms individual classifiers.
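A single train/test split can be noisy, so a fairer comparison uses cross-validation. This is a sketch (not part of the walkthrough above) that reports a mean score and a spread for each model:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import VotingClassifier

X, y = load_iris(return_X_y=True)

lr = LogisticRegression(max_iter=1000)
knn = KNeighborsClassifier(n_neighbors=5)
dt = DecisionTreeClassifier(random_state=42)
voting = VotingClassifier(
    estimators=[("lr", lr), ("knn", knn), ("dt", dt)],
    voting="soft",
)

# 5-fold CV gives a mean accuracy plus its spread across folds,
# which is more informative than one train/test split.
for name, model in [("lr", lr), ("knn", knn), ("dt", dt), ("voting", voting)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

A low standard deviation for the ensemble is exactly the "stability across data splits" benefit described earlier.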


Why Ensemble Voting Works

  • Reduces overfitting by averaging predictions
  • Balances bias and variance
  • Improves robustness to noise
  • Widely used in Kaggle competitions and production systems

Voting classifiers are simple yet extremely effective.


Key Takeaways

  1. Ensemble models combine strengths of multiple algorithms.
  2. Voting classifiers improve prediction stability.
  3. Soft voting is preferred when probabilities are available.
  4. Ensembles often outperform single models.
  5. Voting is a practical, production-ready ML technique.

Conclusion

The Ensemble Voting Classifier demonstrates how combining multiple models can lead to better, more reliable machine learning systems. By aggregating predictions instead of trusting a single algorithm, we build models that generalize better and perform more consistently in real-world scenarios.

This makes ensemble voting a core technique in Saturday ML Spark ⚡️ – Advanced & Practical machine learning workflows.


Code Snippet:

# 📦 Import Required Libraries
import pandas as pd

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.metrics import accuracy_score


# 🧩 Load Dataset
data = load_iris()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target


# ✂️ Train/Test Split
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.3,
    random_state=42,
    stratify=y
)


# 🤖 Define Base Classifiers
lr = LogisticRegression(max_iter=1000)
knn = KNeighborsClassifier(n_neighbors=5)
dt = DecisionTreeClassifier(random_state=42)


# 🤝 Create Voting Classifier (Soft Voting)
voting_clf = VotingClassifier(
    estimators=[
        ("lr", lr),
        ("knn", knn),
        ("dt", dt)
    ],
    voting="soft"
)


# 🚀 Train Ensemble Model
voting_clf.fit(X_train, y_train)


# 📊 Evaluate Individual Models vs Ensemble
models = {
    "Logistic Regression": lr,
    "KNN": knn,
    "Decision Tree": dt,
    "Voting Classifier": voting_clf
}

print("Model Accuracy Comparison:\n")

for name, model in models.items():
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    acc = accuracy_score(y_test, preds)
    print(f"{name}: {acc:.3f}")
