🧠 AI with Python – 📉 Customer Churn Prediction (RF + SMOTE)
Posted on: January 6, 2026
Description:
Customer churn is one of the most impactful problems businesses face today. Retaining an existing customer is often far cheaper than acquiring a new one, which makes early churn prediction a critical business capability.
In this project, we build a customer churn prediction model using Random Forest while addressing a common real-world challenge: class imbalance. To handle it, we use SMOTE (Synthetic Minority Over-sampling Technique) so the model can learn churn patterns effectively.
Understanding the Problem
In churn datasets, the number of customers who do not churn is usually much higher than those who do. This imbalance creates two major issues:
- Models become biased toward the majority (non-churn) class
- Accuracy appears high even when churned customers are poorly detected
To solve this, we must:
- Balance the dataset
- Use evaluation metrics that focus on minority-class performance (the sketch below shows why raw accuracy misleads)
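As a minimal illustration (a sketch with a hypothetical 90/10 class split, not drawn from this project's dataset), consider a baseline that never predicts churn: accuracy looks strong while churn recall is zero.
import numpy as np
from sklearn.metrics import accuracy_score, recall_score
# Hypothetical labels: 900 retained (0), 100 churned (1)
y_true = np.array([0] * 900 + [1] * 100)
# A "model" that always predicts the majority class
y_pred = np.zeros_like(y_true)
print(accuracy_score(y_true, y_pred))  # 0.9 -- looks impressive
print(recall_score(y_true, y_pred))    # 0.0 -- not a single churner caught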
1. Preparing a Churn-Like Dataset
For demonstration, we simulate a churn-style dataset with imbalance.
from sklearn.datasets import make_classification
import pandas as pd
X, y = make_classification(
    n_samples=5000,
    n_features=10,
    n_informative=6,
    n_redundant=2,
    weights=[0.75, 0.25],  # ~75% retained vs ~25% churned
    random_state=42
)
df = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(10)])
df["churn"] = y
This structure closely mirrors real churn datasets where churned users form a minority.
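A quick sanity check (a one-line sketch) confirms the imbalance; note that make_classification treats the weights as approximate, so the proportions land close to, not exactly on, 75/25.
# Roughly 75% non-churn vs 25% churn
print(df["churn"].value_counts(normalize=True))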
2. Train/Test Split with Stratification
We preserve class proportions while splitting the data.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    df.drop("churn", axis=1),
    df["churn"],
    test_size=0.3,
    stratify=df["churn"],  # keep the churn ratio identical in both splits
    random_state=42
)
Stratification prevents accidental class skew in the test set.
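To verify that stratification worked, here is a short sketch comparing churn rates across the two splits; both numbers should be nearly identical.
# The mean of a 0/1 label is the churn rate in each split
print(y_train.mean(), y_test.mean())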
3. Handling Class Imbalance with SMOTE
SMOTE generates synthetic minority-class examples by interpolating between existing churn samples and their nearest neighbors, rather than simply duplicating rows.
from imblearn.over_sampling import SMOTE
smote = SMOTE(random_state=42)
X_resampled, y_resampled = smote.fit_resample(X_train, y_train)
This helps the model learn decision boundaries for churned customers. Note that SMOTE is fit only on the training split; resampling before the split would leak synthetic information into the test set and inflate evaluation metrics.
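A before-and-after count makes the effect visible; a minimal sketch using collections.Counter:
from collections import Counter
# SMOTE equalizes the two classes in the training data
print("before:", Counter(y_train))
print("after: ", Counter(y_resampled))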
4. Training a Random Forest Model
Random Forest is well-suited for tabular business data with non-linear relationships.
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier(
    n_estimators=200,
    max_depth=8,
    random_state=42
)
model.fit(X_resampled, y_resampled)
Ensemble methods like Random Forest are robust and handle feature interactions naturally.
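As an aside, class weighting is a common alternative (or complement) to SMOTE. The sketch below uses scikit-learn's built-in class_weight="balanced" option, which penalizes mistakes on the rare churn class more heavily instead of generating synthetic rows; the weighted_model name is ours, for illustration.
from sklearn.ensemble import RandomForestClassifier
# Train on the original imbalanced data, but weight classes
# inversely to their frequency
weighted_model = RandomForestClassifier(
    n_estimators=200,
    max_depth=8,
    class_weight="balanced",
    random_state=42
)
weighted_model.fit(X_train, y_train)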
5. Evaluating the Churn Model
Evaluation focuses on minority class performance.
from sklearn.metrics import classification_report, confusion_matrix
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
Metrics like recall and F1-score are more important than raw accuracy for churn prediction.
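Since churn is the positive class, these numbers can also be pulled out directly; a minimal sketch:
from sklearn.metrics import f1_score, recall_score
# Recall: share of actual churners the model catches
print("churn recall:", recall_score(y_test, y_pred, pos_label=1))
# F1: balance between catching churners and avoiding false alarms
print("churn F1:", f1_score(y_test, y_pred, pos_label=1))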
6. Interpreting Feature Importance
Understanding why customers churn is as important as predicting it.
import pandas as pd
importances = pd.Series(
    model.feature_importances_,
    index=X_train.columns
).sort_values(ascending=False)
print(importances.head())
This insight helps teams design targeted retention strategies.
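One caveat: the impurity-based importances above can overstate high-cardinality or correlated features. Permutation importance on the held-out test set is a standard cross-check; a sketch using sklearn.inspection:
from sklearn.inspection import permutation_importance
# Measure the test-set score drop when each feature is shuffled
result = permutation_importance(
    model, X_test, y_test, n_repeats=10, random_state=42
)
perm = pd.Series(result.importances_mean, index=X_test.columns)
print(perm.sort_values(ascending=False).head())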
Key Takeaways
- Customer churn prediction is inherently an imbalanced classification problem.
- SMOTE helps models learn patterns from minority churned users.
- Random Forest performs strongly on tabular customer data.
- Recall and F1-score are critical metrics for churn use cases.
- Feature importance bridges ML predictions with business decisions.
Conclusion
Customer churn prediction is a classic example of how machine learning directly supports business outcomes.
By combining SMOTE for imbalance handling and Random Forest for modeling, we build a practical, real-world churn prediction system.
This project demonstrates the transition from academic ML examples to production-relevant ML workflows, making it a strong addition to any applied machine learning toolkit.
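One refinement worth noting for production work (a sketch going beyond the walkthrough above): wrapping SMOTE and the classifier in an imblearn Pipeline, so that resampling happens inside each cross-validation fold and validation data is never resampled.
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
# SMOTE runs only on each fold's training portion
pipe = Pipeline([
    ("smote", SMOTE(random_state=42)),
    ("rf", RandomForestClassifier(n_estimators=200, max_depth=8, random_state=42)),
])
scores = cross_val_score(pipe, df.drop("churn", axis=1), df["churn"], cv=5, scoring="f1")
print(scores.mean())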
Code Snippet:
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE

# 1. Simulate an imbalanced churn-style dataset
X, y = make_classification(
    n_samples=5000,
    n_features=10,
    n_informative=6,
    n_redundant=2,
    weights=[0.75, 0.25],  # churn imbalance
    random_state=42
)
df = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(10)])
df["churn"] = y

# 2. Stratified train/test split
X_train, X_test, y_train, y_test = train_test_split(
    df.drop("churn", axis=1),
    df["churn"],
    test_size=0.3,
    stratify=df["churn"],
    random_state=42
)

# 3. Oversample the minority (churn) class in the training data only
smote = SMOTE(random_state=42)
X_resampled, y_resampled = smote.fit_resample(X_train, y_train)

# 4. Train the Random Forest on the balanced training data
model = RandomForestClassifier(
    n_estimators=200,
    max_depth=8,
    random_state=42
)
model.fit(X_resampled, y_resampled)

# 5. Evaluate on the untouched test set
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))

# 6. Inspect feature importance
importances = pd.Series(
    model.feature_importances_,
    index=X_train.columns
).sort_values(ascending=False)
print(importances.head())