⚡️ Saturday ML Sparks – Elbow & Silhouette Scores 📊🧠
Posted on: December 20, 2025
Description:
Choosing the right number of clusters is one of the most important—and most confusing—parts of unsupervised learning.
Unlike supervised models, clustering algorithms don’t come with labels to validate results automatically.
In this ML Spark, we explore two widely used techniques—the Elbow Method and the Silhouette Score—to evaluate and select the optimal number of clusters when using KMeans.
Understanding the Problem
KMeans requires you to specify the number of clusters (k) before training.
Choosing too small a value oversimplifies the data, while too large a value fragments it into meaningless groupings.
To make this decision more objective, we rely on evaluation metrics that quantify clustering quality instead of guessing.
1. Generate Unlabeled Data
We start by creating a synthetic dataset with natural cluster structure.
from sklearn.datasets import make_blobs

X, _ = make_blobs(
    n_samples=400,
    centers=4,
    cluster_std=0.60,
    random_state=42
)
This simulates a real-world scenario where labels are unavailable.
2. Elbow Method – Measuring Inertia
The Elbow Method evaluates how compact clusters are using inertia, the within-cluster sum of squared distances between each point and its assigned centroid.
from sklearn.cluster import KMeans

inertias = []
for k in range(2, 11):
    kmeans = KMeans(n_clusters=k, random_state=42)
    kmeans.fit(X)
    inertias.append(kmeans.inertia_)
As k increases, inertia always decreases—but at a diminishing rate.
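To make the metric concrete, here is a small sketch that recomputes inertia by hand and checks it against what KMeans reports; the dataset and parameters mirror the ones used above.

```python
# Sketch: inertia is the sum of squared distances from each point to
# its assigned centroid. Recomputing it manually matches kmeans.inertia_.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=400, centers=4, cluster_std=0.60, random_state=42)

kmeans = KMeans(n_clusters=4, n_init=10, random_state=42).fit(X)
manual_inertia = sum(
    np.sum((X[kmeans.labels_ == c] - center) ** 2)
    for c, center in enumerate(kmeans.cluster_centers_)
)
print(np.isclose(manual_inertia, kmeans.inertia_))  # True
```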
3. Visualizing the Elbow Curve
Plotting inertia against the number of clusters reveals the “elbow”.
import matplotlib.pyplot as plt

plt.plot(range(2, 11), inertias, marker="o")
plt.xlabel("Number of Clusters (k)")
plt.ylabel("Inertia")
plt.title("Elbow Method for Optimal k")
plt.show()
The elbow point indicates where adding more clusters no longer yields significant improvement.
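If you want a programmatic stand-in for eyeballing the plot, one rough heuristic (an assumption of this sketch, not part of KMeans itself) is to pick the k where the curvature of the inertia curve, its discrete second difference, is largest. It works on clearly elbowed curves; for borderline cases, inspecting the plot is still advisable.

```python
# Heuristic elbow detection: the largest second difference of the
# inertia curve marks the sharpest bend. This is a rough sketch, not
# a substitute for looking at the plot on ambiguous data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=400, centers=4, cluster_std=0.60, random_state=42)

ks = list(range(2, 11))
inertias = [
    KMeans(n_clusters=k, n_init=10, random_state=42).fit(X).inertia_
    for k in ks
]

# Second difference: inertias[i] - 2*inertias[i+1] + inertias[i+2]
curvature = np.diff(inertias, 2)
elbow_k = ks[int(np.argmax(curvature)) + 1]
print(elbow_k)
```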
4. Silhouette Score – Measuring Cluster Separation
The Silhouette Score evaluates how well-separated clusters are by comparing intra-cluster and inter-cluster distances.
from sklearn.metrics import silhouette_score
scores = []
for k in range(2, 11):
    labels = KMeans(n_clusters=k, random_state=42).fit_predict(X)
    scores.append(silhouette_score(X, labels))
Higher silhouette scores indicate better-defined clusters.
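Under the hood, each point gets a value s = (b - a) / max(a, b), where a is its mean distance to points in the same cluster and b is its mean distance to the nearest other cluster; silhouette_score is simply the mean of these per-point values, as this sketch verifies with scikit-learn's silhouette_samples.

```python
# The silhouette score is the mean of per-sample values in [-1, 1],
# where each sample's value is (b - a) / max(a, b).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_samples, silhouette_score

X, _ = make_blobs(n_samples=400, centers=4, cluster_std=0.60, random_state=42)
labels = KMeans(n_clusters=4, n_init=10, random_state=42).fit_predict(X)

per_point = silhouette_samples(X, labels)  # one value per sample
overall = silhouette_score(X, labels)      # mean over all samples
print(np.isclose(per_point.mean(), overall))  # True
```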
5. Visualizing Silhouette Scores
plt.plot(range(2, 11), scores, marker="o")
plt.xlabel("Number of Clusters (k)")
plt.ylabel("Silhouette Score")
plt.title("Silhouette Score vs k")
plt.show()
The value of k with the highest score is often the best choice.
How to Use Both Together
- Use the Elbow Method to narrow down a reasonable range
- Use the Silhouette Score to validate cluster separation
- Prefer values where both methods agree
This combined approach leads to more confident clustering decisions.
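The combined workflow above can be sketched in a few lines: the elbow plot narrows the candidate range, and the silhouette score picks within it. The candidate range here is an assumption for illustration, you would read it off your own elbow plot.

```python
# Minimal sketch of the combined workflow: pick the best k within a
# candidate range (narrowed down by the elbow plot) by silhouette score.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=400, centers=4, cluster_std=0.60, random_state=42)

candidates = list(range(2, 11))  # range suggested by the elbow plot
scores = [
    silhouette_score(X, KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X))
    for k in candidates
]
best_k = candidates[int(np.argmax(scores))]
print(best_k)
```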
Key Takeaways
- Selecting the correct number of clusters is critical in unsupervised learning.
- The Elbow Method highlights diminishing returns as clusters increase.
- Silhouette Score quantifies how well clusters are separated.
- Both metrics should be used together for reliable evaluation.
- These techniques are essential before finalizing any KMeans model.
Conclusion
Clustering without evaluation can easily lead to misleading insights.
By using Elbow and Silhouette methods, you replace guesswork with measurable evidence—making unsupervised learning more reliable and interpretable.
These techniques are foundational tools for anyone working with clustering algorithms and real-world unlabeled data.
Code Snippet:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
X, _ = make_blobs(
    n_samples=400,
    centers=4,
    cluster_std=0.60,
    random_state=42
)

inertias = []
k_range = range(2, 11)
for k in k_range:
    kmeans = KMeans(n_clusters=k, random_state=42)
    kmeans.fit(X)
    inertias.append(kmeans.inertia_)

plt.figure(figsize=(8, 5))
plt.plot(k_range, inertias, marker="o")
plt.xlabel("Number of Clusters (k)")
plt.ylabel("Inertia")
plt.title("Elbow Method for Optimal k")
plt.grid(True)
plt.show()

silhouette_scores = []
for k in k_range:
    kmeans = KMeans(n_clusters=k, random_state=42)
    labels = kmeans.fit_predict(X)
    score = silhouette_score(X, labels)
    silhouette_scores.append(score)

plt.figure(figsize=(8, 5))
plt.plot(k_range, silhouette_scores, marker="o", color="green")
plt.xlabel("Number of Clusters (k)")
plt.ylabel("Silhouette Score")
plt.title("Silhouette Score vs Number of Clusters")
plt.grid(True)
plt.show()