AW Dev Rethought

⚖️ There are two ways of constructing a software design: one way is to make it so simple that there are obviously no deficiencies - C.A.R. Hoare

⚡️ Saturday ML Sparks – 🧩 Customer Segmentation with Clustering


Description:

Customer segmentation is one of the most practical and widely used applications of unsupervised learning.

Instead of predicting labels, clustering helps businesses understand patterns in customer behavior and group similar customers together.

In this project, we use KMeans clustering to segment customers based on income and spending behavior, a classic real-world analytics use case.


Understanding the Problem

Businesses often work with large customer bases where behavior varies significantly.

Without segmentation, all customers are treated the same, leading to inefficient marketing and poor personalization.

Customer segmentation helps answer questions like:

  • Which customers spend the most?
  • Are there distinct buying patterns?
  • How can marketing strategies be tailored for different groups?

Since there are no predefined labels, this becomes an unsupervised learning problem.


1. Preparing Customer Data

For demonstration, we simulate customer attributes commonly used in segmentation.

import numpy as np
import pandas as pd

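# Fix the seed so the simulated data is reproducible.
np.random.seed(42)

# randint excludes the upper bound: incomes span 20000-119999, scores 1-99.
data = {
    "annual_income": np.random.randint(20000, 120000, 300),
    "spending_score": np.random.randint(1, 100, 300)
}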

df = pd.DataFrame(data)

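Income and spending score are typical of the features found in real customer analytics datasets. Because the values here are drawn uniformly at random, the data has no built-in group structure; it serves only to demonstrate the workflow.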


2. Feature Scaling Before Clustering

KMeans assigns points to clusters using Euclidean distance, so features should be on comparable scales before fitting.

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_scaled = scaler.fit_transform(df)

Without scaling, features with larger ranges (like income) would dominate the clustering process.
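
To make the effect concrete, here is a minimal sketch using three hypothetical customers (the values and the `euclidean` helper are invented for illustration and are not part of the dataset above). It compares Euclidean distances before and after standardization.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical mini-sample: columns are annual_income and spending_score.
X = np.array([
    [30000.0, 90.0],   # customer A
    [30500.0, 10.0],   # customer B: similar income, very different spending
    [60000.0, 90.0],   # customer C: very different income, same spending as A
])


def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return float(np.linalg.norm(a - b))


# Raw units: the income gap swamps the spending gap.
raw_ab = euclidean(X[0], X[1])
raw_ac = euclidean(X[0], X[2])

# Standardized units: each column has mean 0 and unit variance.
X_std = StandardScaler().fit_transform(X)
std_ab = euclidean(X_std[0], X_std[1])
std_ac = euclidean(X_std[0], X_std[2])

print(f"raw:    d(A,B)={raw_ab:.1f}  d(A,C)={raw_ac:.1f}")
print(f"scaled: d(A,B)={std_ab:.2f}  d(A,C)={std_ac:.2f}")
```

In raw units, A and B look almost identical because an 80-point spending gap is tiny next to income differences measured in thousands. After standardization the two distances are of similar size, so spending behavior carries as much weight as income.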


3. Applying KMeans Clustering

We cluster customers into four segments. The number of clusters is a modeling choice; KMeans does not infer it from the data.

from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters=4, random_state=42)
df["cluster"] = kmeans.fit_predict(X_scaled)

Each customer is now assigned to a cluster based on similarity.
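
The choice of four clusters can be sanity-checked. The following is a minimal sketch, assuming the same simulated data as above: it refits KMeans for k from 2 to 8 and records the inertia and the silhouette score for each fit. The range of k, the explicit `n_init=10`, and the names `results` and `scores` are choices made for this illustration rather than part of the original walkthrough.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Rebuild the simulated data so this block runs on its own.
np.random.seed(42)
df = pd.DataFrame({
    "annual_income": np.random.randint(20000, 120000, 300),
    "spending_score": np.random.randint(1, 100, 300),
})
X_scaled = StandardScaler().fit_transform(df)

results = []
for k in range(2, 9):
    # n_init=10 is set explicitly (an assumption of this sketch) so the
    # number of restarts does not depend on the scikit-learn version.
    model = KMeans(n_clusters=k, n_init=10, random_state=42)
    labels = model.fit_predict(X_scaled)
    results.append({
        "k": k,
        "inertia": model.inertia_,  # within-cluster sum of squared distances
        "silhouette": silhouette_score(X_scaled, labels),
    })

scores = pd.DataFrame(results).set_index("k")
print(scores.round(3))
```

Inertia generally falls as k grows, so it is usually read by looking for an "elbow" where the improvement levels off, while higher silhouette values indicate better-separated clusters. Since the simulated values are uniformly random, no value of k should be expected to stand out sharply here.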


4. Visualizing Customer Segments

Visualization makes clusters interpretable and actionable.

import matplotlib.pyplot as plt

# Sort the labels so the legend lists clusters in numeric order.
for cluster in sorted(df["cluster"].unique()):
    subset = df[df["cluster"] == cluster]
    plt.scatter(
        subset["annual_income"],
        subset["spending_score"],
        label=f"Cluster {cluster}"
    )

plt.xlabel("Annual Income")
plt.ylabel("Spending Score")
plt.title("Customer Segmentation using KMeans")
plt.legend()
plt.show()

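The plot shows how KMeans partitions customers by income and spending. With the uniformly random data used here, the boundaries reflect the algorithm dividing an even spread of points rather than naturally separated groups; on real data, the clusters are more likely to line up with visible concentrations of customers.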


Interpreting the Clusters

Each cluster typically represents a unique customer segment, such as:

  • High income, high spenders
  • High income, low spenders
  • Low income, high spenders
  • Low income, low spenders

These insights enable targeted promotions, pricing strategies, and personalized experiences.
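
One way to turn cluster labels into descriptions like these is to summarize each cluster in the original units. The sketch below assumes the same simulated data and model settings as above. The labelling rule (comparing each cluster mean with the overall median) and the names `profile`, `income_level`, and `spend_level` are hypothetical choices for illustration, not an established method.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Rebuild the simulated data and clusters so this block runs on its own.
np.random.seed(42)
df = pd.DataFrame({
    "annual_income": np.random.randint(20000, 120000, 300),
    "spending_score": np.random.randint(1, 100, 300),
})
features = ["annual_income", "spending_score"]
X_scaled = StandardScaler().fit_transform(df[features])
df["cluster"] = KMeans(n_clusters=4, random_state=42).fit_predict(X_scaled)

# Average feature values and customer count per cluster, in original units.
profile = df.groupby("cluster")[features].mean()
profile["customers"] = df["cluster"].value_counts().sort_index()

# Hypothetical rule: label a cluster "high" when its mean is at or above
# the overall median of that feature, otherwise "low".
income_level = np.where(
    profile["annual_income"] >= df["annual_income"].median(),
    "high income", "low income",
)
spend_level = np.where(
    profile["spending_score"] >= df["spending_score"].median(),
    "high spenders", "low spenders",
)
profile["segment"] = [f"{i}, {s}" for i, s in zip(income_level, spend_level)]

print(profile.round(1))
```

Nothing guarantees that the four clusters map one-to-one onto the four combinations, so two clusters can receive the same label. When that happens, it is a sign that the segments need a closer look before they are used for targeting.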


Key Takeaways

  1. Customer segmentation is a core unsupervised learning use case.
  2. KMeans clusters customers based on similarity, not labels.
  3. Feature scaling matters for distance-based algorithms whenever features have very different ranges.
  4. Visualizing clusters helps translate ML output into business insight.
  5. Segmentation supports personalization and data-driven decision-making.

Conclusion

Customer segmentation demonstrates the real power of unsupervised learning in business contexts.

By grouping customers based on behavior, organizations can move beyond one-size-fits-all strategies and deliver more personalized, effective experiences.

This Saturday ML Spark highlights how a simple clustering technique like KMeans can turn raw customer data into actionable segments, making it a foundational skill for applied machine learning and analytics.


Code Snippet:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


np.random.seed(42)

data = {
    "annual_income": np.random.randint(20000, 120000, 300),
    "spending_score": np.random.randint(1, 100, 300)
}

df = pd.DataFrame(data)


scaler = StandardScaler()
X_scaled = scaler.fit_transform(df)


kmeans = KMeans(
    n_clusters=4,
    random_state=42
)

df["cluster"] = kmeans.fit_predict(X_scaled)


plt.figure(figsize=(8, 5))

# Sort the labels so the legend lists clusters in numeric order.
for cluster in sorted(df["cluster"].unique()):
    subset = df[df["cluster"] == cluster]
    plt.scatter(
        subset["annual_income"],
        subset["spending_score"],
        label=f"Cluster {cluster}"
    )

plt.xlabel("Annual Income")
plt.ylabel("Spending Score")
plt.title("Customer Segmentation using KMeans")
plt.legend()
plt.grid(True)
plt.show()
