⚡️ Saturday ML Sparks – 🧩 Customer Segmentation with Clustering
Posted on: January 10, 2026
Description:
Customer segmentation is one of the most practical and widely used applications of unsupervised learning.
Instead of predicting labels, clustering helps businesses understand patterns in customer behavior and group similar customers together.
In this project, we use KMeans clustering to segment customers based on income and spending behavior — a classic real-world analytics use case.
Understanding the Problem
Businesses often work with large customer bases where behavior varies significantly.
Without segmentation, all customers are treated the same, leading to inefficient marketing and poor personalization.
Customer segmentation helps answer questions like:
- Which customers spend the most?
- Are there distinct buying patterns?
- How can marketing strategies be tailored for different groups?
Since there are no predefined labels, this becomes an unsupervised learning problem.
1. Preparing Customer Data
For demonstration, we simulate customer attributes commonly used in segmentation.
import numpy as np
import pandas as pd
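np.random.seed(42)  # fixed seed so the simulated data is reproducible
data = {
    # 300 customers with annual income drawn uniformly from 20,000 to 119,999
    "annual_income": np.random.randint(20000, 120000, 300),
    # spending score from 1 to 99 (randint excludes the upper bound)
    "spending_score": np.random.randint(1, 100, 300),
}
df = pd.DataFrame(data)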
These features are representative of real-world customer analytics datasets.
2. Feature Scaling Before Clustering
KMeans assigns each point to its nearest centroid by Euclidean distance, so features measured on very different scales need to be standardized first.
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_scaled = scaler.fit_transform(df)
Without scaling, features with larger ranges (like income) would dominate the clustering process.
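To see why, compare the spread of the two raw features with their spread after standardization. The following is a minimal sketch that regenerates the simulated data from step 1; the names `raw_std`, `income_share`, and `X_check` are illustrative and not part of the original walkthrough.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Same simulated data as in step 1
np.random.seed(42)
df = pd.DataFrame({
    "annual_income": np.random.randint(20000, 120000, 300),
    "spending_score": np.random.randint(1, 100, 300),
})

# Raw standard deviations: income varies by tens of thousands, score by tens
raw_std = df.std(ddof=0)

# Share of the total raw variance contributed by income. On average, squared
# Euclidean distances between customers split in the same proportion.
income_share = raw_std["annual_income"] ** 2 / (raw_std ** 2).sum()

# After standardization each column has mean 0 and standard deviation 1
X_check = StandardScaler().fit_transform(df)

print(raw_std.round(1))
print(f"Income share of raw variance: {income_share:.6f}")
print("Scaled means:", X_check.mean(axis=0).round(6))
print("Scaled stds: ", X_check.std(axis=0).round(6))
```

On the raw data almost all of the distance between two customers comes from income, so spending score would barely influence the clusters. After scaling, both features contribute on equal footing.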
3. Applying KMeans Clustering
We cluster customers into four segments.
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)  # n_init set explicitly; its default differs across scikit-learn versions
df["cluster"] = kmeans.fit_predict(X_scaled)
Each customer is now assigned to a cluster based on similarity.
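The choice of four clusters is fixed here for demonstration. One common way to sanity-check a value of k, not covered in the original walkthrough, is to compare inertia (the "elbow" heuristic) and the silhouette score across several candidates. The sketch below assumes the same simulated data; `results` and `k_report` are illustrative names.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Same simulated data and scaling as in the steps above
np.random.seed(42)
df = pd.DataFrame({
    "annual_income": np.random.randint(20000, 120000, 300),
    "spending_score": np.random.randint(1, 100, 300),
})
X_scaled = StandardScaler().fit_transform(df)

# Fit KMeans for several candidate values of k and record two diagnostics
results = []
for k in range(2, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=42)
    labels = model.fit_predict(X_scaled)
    results.append({
        "k": k,
        "inertia": model.inertia_,                         # within-cluster sum of squares
        "silhouette": silhouette_score(X_scaled, labels),  # in [-1, 1], higher is better
    })

k_report = pd.DataFrame(results).set_index("k")
print(k_report.round(3))
```

Inertia tends to fall as k grows, so look for the point where the improvement levels off rather than for the minimum. On uniformly random data like this simulation, neither metric is likely to single out a clear best k; on real customer data the signal is often stronger.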
4. Visualizing Customer Segments
Visualization makes clusters interpretable and actionable.
import matplotlib.pyplot as plt
for cluster in df["cluster"].unique():
    subset = df[df["cluster"] == cluster]
    plt.scatter(
        subset["annual_income"],
        subset["spending_score"],
        label=f"Cluster {cluster}"
    )
plt.xlabel("Annual Income")
plt.ylabel("Spending Score")
plt.title("Customer Segmentation using KMeans")
plt.legend()
plt.show()
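This plot shows how KMeans divides customers by income and spending. Because the simulated values are uniformly random, the four groups are roughly equal regions of the plane rather than naturally separated clusters; real customer data may show clearer structure.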
Interpreting the Clusters
Each cluster typically represents a unique customer segment, such as:
- High income, high spenders
- High income, low spenders
- Low income, high spenders
- Low income, low spenders
These insights enable targeted promotions, pricing strategies, and personalized experiences.
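To attach descriptions like these to actual clusters, it helps to summarize each one in the original units. The following is a rough sketch under the same simulated data; the median-based labeling rule and the names `centers`, `profile`, and `segment` are assumptions made for illustration, not part of the original project.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Same simulated data, scaling, and clustering as in the steps above
np.random.seed(42)
df = pd.DataFrame({
    "annual_income": np.random.randint(20000, 120000, 300),
    "spending_score": np.random.randint(1, 100, 300),
})
features = ["annual_income", "spending_score"]

scaler = StandardScaler()
X_scaled = scaler.fit_transform(df[features])
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
df["cluster"] = kmeans.fit_predict(X_scaled)

# Cluster centers mapped back from scaled units to the original units
centers = pd.DataFrame(
    scaler.inverse_transform(kmeans.cluster_centers_), columns=features
)

# Per-cluster averages and sizes
profile = df.groupby("cluster")[features].mean()
profile["customers"] = df.groupby("cluster").size()

# Hypothetical labeling rule: compare each cluster average with the overall median
income_label = np.where(
    profile["annual_income"] >= df["annual_income"].median(),
    "High income", "Low income",
)
spend_label = np.where(
    profile["spending_score"] >= df["spending_score"].median(),
    "high spenders", "low spenders",
)
profile["segment"] = [f"{i}, {s}" for i, s in zip(income_label, spend_label)]

print(centers.round(1))
print(profile.round(1))
```

Nothing guarantees that the four clusters map one-to-one onto the four descriptions, so two clusters may receive the same label. Treat the labels as a starting point for inspection rather than a final naming.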
Key Takeaways
- Customer segmentation is a core unsupervised learning use case.
- KMeans clusters customers based on similarity, not labels.
- Feature scaling is essential for distance-based algorithms when features are measured on different scales.
- Visualizing clusters helps translate ML output into business insight.
- Segmentation supports personalization and data-driven decision-making.
Conclusion
Customer segmentation demonstrates the real power of unsupervised learning in business contexts.
By grouping customers based on behavior, organizations can move beyond one-size-fits-all strategies and deliver more personalized, effective experiences.
This Saturday ML Spark highlights how a simple clustering technique like KMeans can unlock actionable insights from raw customer data — making it a foundational skill for applied machine learning and analytics.
Code Snippet:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
np.random.seed(42)
data = {
    "annual_income": np.random.randint(20000, 120000, 300),
    "spending_score": np.random.randint(1, 100, 300)
}
df = pd.DataFrame(data)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(df)
kmeans = KMeans(
    n_clusters=4,
    n_init=10,  # set explicitly; its default differs across scikit-learn versions
    random_state=42
)
df["cluster"] = kmeans.fit_predict(X_scaled)
plt.figure(figsize=(8, 5))
for cluster in df["cluster"].unique():
    subset = df[df["cluster"] == cluster]
    plt.scatter(
        subset["annual_income"],
        subset["spending_score"],
        label=f"Cluster {cluster}"
    )
plt.xlabel("Annual Income")
plt.ylabel("Spending Score")
plt.title("Customer Segmentation using KMeans")
plt.legend()
plt.grid(True)
plt.show()