⚡ Saturday ML Sparks – Logistic Regression Classifier 🧠📈


Description:

Logistic Regression is one of the simplest and most widely used algorithms in machine learning.

Despite its name, it is a classification method: the “regression” part refers to the linear model it fits to the log-odds of a class.

It predicts discrete outcomes, such as “spam vs. ham” or “positive vs. negative,” based on input features.

In this post, we’ll explore how to train a Logistic Regression model using scikit-learn, visualize its decision boundaries, and interpret its performance metrics.


Understanding Logistic Regression

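Unlike Linear Regression, which fits a straight line to predict continuous values, Logistic Regression models the probability that a sample belongs to a certain class.

It first computes a linear score z = w·x + b from the input features, then passes that score through the sigmoid (logistic) function σ(z) = 1 / (1 + e^(−z)), which squashes any real number into the range (0, 1).

This lets the model express probabilities and classify samples with a threshold (commonly 0.5): a sample is assigned to the positive class when its probability exceeds 0.5, which is the same as its score z being positive.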
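As a minimal sketch of the idea, the snippet below implements the sigmoid with NumPy and applies a 0.5 threshold to a few linear scores. The weights w and bias b are made-up values chosen for illustration, not parameters learned from any data, and the helper name sigmoid is our own.

```python
import numpy as np


def sigmoid(z):
    """Logistic function: maps any real score to a value in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


# Hypothetical weights and bias for a 2-feature model (made up for illustration)
w = np.array([1.5, -0.5])
b = 0.2

# Three example samples, one per row
X_demo = np.array([[0.0, 0.0],
                   [2.0, 1.0],
                   [-1.0, 2.0]])

z = X_demo @ w + b                # linear scores z = w.x + b
p = sigmoid(z)                    # probability of class 1 for each sample
labels = (p > 0.5).astype(int)    # apply the 0.5 threshold

for score, prob, label in zip(z, p, labels):
    print(f"z = {score:+.2f}  ->  p = {prob:.3f}  ->  class {label}")
```

The first sample scores z = 0.20 and lands just above the threshold (p ≈ 0.55), while the third scores z = −2.30 and is assigned class 0.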


Dataset and Model Setup

To demonstrate, we’ll use synthetic data generated with scikit-learn’s make_classification(). With only two features, both the data and the model’s decision boundary can be plotted directly.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Generate a simple binary classification dataset
X, y = make_classification(
    n_samples=200, n_features=2, n_redundant=0, n_clusters_per_class=1, random_state=42
)

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

This dataset contains 200 samples with two numerical features and a binary target (classes 0 and 1), which is enough to illustrate the fundamentals of logistic regression.
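A quick sanity check of the generated data confirms the shapes and class balance. This sketch repeats the same make_classification and train_test_split calls so it runs on its own.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Same data generation and split as above
X, y = make_classification(
    n_samples=200, n_features=2, n_redundant=0, n_clusters_per_class=1, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

print("Feature matrix shape:", X.shape)             # (n_samples, n_features)
print("Train / test sizes:", len(X_train), len(X_test))
print("Class counts (all data):", np.bincount(y))   # samples in class 0 and class 1
```

With test_size=0.3, the 200 samples split into 140 for training and 60 for testing. make_classification also flips a small fraction of labels at random by default (flip_y=0.01), so the class counts are close to, but not guaranteed to be, exactly balanced.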


Training the Model

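We initialize and train a Logistic Regression model using scikit-learn. Fitting chooses the weights that minimize the log-loss (cross-entropy) on the training data, with L2 regularization applied by default; since the score is a linear function of the features, the resulting decision boundary is linear.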

from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
model.fit(X_train, y_train)
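To make the training step concrete, here is a minimal from-scratch sketch under simplifying assumptions: plain batch gradient descent on the unregularized mean log-loss, with a hand-picked learning rate and iteration count (lr=0.1, n_iter=2000). scikit-learn itself uses the lbfgs solver with L2 regularization by default, so these weights will not match model.coef_ exactly. The helper names sigmoid and fit_logreg_gd are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logreg_gd(X, y, lr=0.1, n_iter=2000):
    """Batch gradient descent on the mean log-loss (no regularization)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iter):
        p = sigmoid(X @ w + b)             # current predicted probabilities
        error = p - y                      # gradient of log-loss w.r.t. each score
        w -= lr * (X.T @ error) / len(y)   # average gradient for the weights
        b -= lr * error.mean()             # average gradient for the bias
    return w, b


X, y = make_classification(
    n_samples=200, n_features=2, n_redundant=0, n_clusters_per_class=1, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

w, b = fit_logreg_gd(X_train, y_train)
y_pred_gd = (sigmoid(X_test @ w + b) > 0.5).astype(int)
print("weights:", w, "bias:", b)
print("test accuracy:", (y_pred_gd == y_test).mean())
```

The update relies on the fact that the gradient of the log-loss with respect to the linear score is simply p - y, which is a large part of why logistic regression is cheap to train.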

Once trained, the learned weights (model.coef_) and bias (model.intercept_) define the separating line between classes: the set of points where w1·x1 + w2·x2 + b = 0.
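The sketch below reads those parameters from the fitted model, solves w1·x1 + w2·x2 + b = 0 for x2 to get points on the boundary line, and checks that those points score about 0 (a predicted probability of 0.5). It refits the same model so it runs on its own, and it assumes w2 is nonzero, which holds unless the model ignores the second feature entirely.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(
    n_samples=200, n_features=2, n_redundant=0, n_clusters_per_class=1, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
model = LogisticRegression().fit(X_train, y_train)

# coef_ has shape (1, n_features) and intercept_ has shape (1,) for a binary problem
(w1, w2), b = model.coef_[0], model.intercept_[0]
print(f"w1 = {w1:.3f}, w2 = {w2:.3f}, b = {b:.3f}")

# Boundary: w1*x1 + w2*x2 + b = 0  ->  x2 = -(w1*x1 + b) / w2  (requires w2 != 0)
x1_line = np.linspace(X[:, 0].min(), X[:, 0].max(), 5)
x2_line = -(w1 * x1_line + b) / w2

# Points on this line should get a score of ~0, i.e. a probability of 0.5
scores = model.decision_function(np.c_[x1_line, x2_line])
print(np.round(scores, 6))
```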


Evaluating Model Performance

To evaluate how well the model performs on unseen data, we’ll compute the accuracy, a per-class classification report, and the confusion matrix on the test set.

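from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

y_pred = model.predict(X_test)  # predicted labels for the held-out test set
print(f"Accuracy: {accuracy_score(y_test, y_pred):.2f}")
print(classification_report(y_test, y_pred))  # per-class precision, recall, F1-score
print(confusion_matrix(y_test, y_pred))  # rows = true class, columns = predicted class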

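Accuracy gives an overall measure of correctness, while the classification report and the confusion matrix break performance down by class. In the confusion matrix, rows are true classes and columns are predicted classes, so the diagonal counts correct predictions and the off-diagonal cells count each kind of error.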
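To see how per-class metrics come out of the confusion matrix, here is a small worked example on hand-made labels (hypothetical values, not the output of the model above). For binary labels, scikit-learn’s confusion_matrix is laid out as [[TN, FP], [FN, TP]], so precision and recall can be computed directly from its cells and compared with precision_score and recall_score.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical true and predicted labels, made up for illustration
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
y_hat = np.array([0, 0, 0, 1, 1, 1, 1, 1, 0, 0])

cm = confusion_matrix(y_true, y_hat)  # rows = true class, columns = predicted class
tn, fp, fn, tp = cm.ravel()           # binary layout: [[TN, FP], [FN, TP]]

precision = tp / (tp + fp)            # of the predicted positives, how many are correct
recall = tp / (tp + fn)               # of the actual positives, how many were found
accuracy = (tp + tn) / cm.sum()       # correct predictions over all predictions

print(cm)
print(f"precision = {precision:.2f}, recall = {recall:.2f}, accuracy = {accuracy:.2f}")
```

Here the matrix is [[3, 1], [2, 4]], giving precision 4/5 = 0.80, recall 4/6 ≈ 0.67 and accuracy 7/10 = 0.70.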


Visualizing the Decision Boundary

Because the decision boundary is linear, in two dimensions it is a straight line. Plotting the predicted class for every point of a fine grid shows exactly where the model switches from one label to the other.

import numpy as np
import matplotlib.pyplot as plt

# Create a grid covering the data range plus a 1-unit margin (step 0.02)
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.02),
                     np.arange(y_min, y_max, 0.02))

# Predict a class for every grid point, then reshape to the grid's shape
Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

# Shade the predicted regions and overlay all samples colored by their true class
plt.contourf(xx, yy, Z, alpha=0.3, cmap=plt.cm.coolwarm)
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k', cmap=plt.cm.coolwarm)
plt.title("Decision Boundary - Logistic Regression")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.show()

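The shaded regions show the class the model predicts for each grid point, and the line between them is the decision boundary, where the predicted probability equals 0.5. Points on different sides of the boundary receive different predicted labels, so any sample whose color differs from the region it sits in is misclassified.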
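To connect the plot with the probabilities the model outputs, the sketch below checks two facts for this binary case: predict_proba is the sigmoid of decision_function, and predict returns class 1 exactly when the score is positive, i.e. when a point lies on the positive side of the boundary. It refits the same model so it runs on its own.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(
    n_samples=200, n_features=2, n_redundant=0, n_clusters_per_class=1, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
model = LogisticRegression().fit(X_train, y_train)

scores = model.decision_function(X_test)   # signed score w.x + b for each test point
proba = model.predict_proba(X_test)[:, 1]  # predicted probability of class 1

# The probability is the sigmoid of the score...
print(np.allclose(proba, 1.0 / (1.0 + np.exp(-scores))))
# ...and the predicted label is the side of the boundary (score > 0) a point falls on
print(np.array_equal(model.predict(X_test), (scores > 0).astype(int)))
```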


Key Takeaways

  • Logistic Regression is a simple yet powerful classifier for binary problems.
  • It outputs probabilities that are later mapped to class labels.
  • Visualizing decision boundaries offers insight into model behavior.
  • The algorithm works best when the classes are roughly linearly separable, and it serves as a strong baseline model for many ML tasks.

Conclusion

Logistic Regression remains a core building block in machine learning.

Understanding its mechanics helps you appreciate more advanced algorithms that build upon it.

With just a few lines of Python, you can train, evaluate, and visualize a classifier and see how it separates the classes.


Full Script

This post covers the essentials. The complete notebook with all snippets and extras is in the GitHub repo 👉 ML Sparks


