🧠 AI with Python – 📈 Partial Dependence Plots (PDP)
Posted on: February 10, 2026
Description:
As machine learning models become more powerful, understanding how individual features influence predictions becomes increasingly important. Accuracy alone is not enough; we also need insight into model behavior.
In this project, we explore Partial Dependence Plots (PDP), a classic interpretability technique that helps visualise the average effect of a feature on model predictions.
Understanding the Problem
Many machine learning models learn complex, non-linear relationships between features and the target. While this improves predictive power, it also makes models harder to interpret.
Key questions often arise:
- Does increasing this feature increase or decrease predictions?
- Is the relationship linear or non-linear?
- Are there thresholds where behaviour changes?
Partial Dependence Plots help answer these questions in a simple, visual way.
What Are Partial Dependence Plots?
A Partial Dependence Plot shows how the average predicted outcome changes as the value of one feature varies, while the effects of all other features are marginalized (averaged) out.
In simple terms:
“If this feature changes, what happens to the prediction on average?”
PDPs can be created for:
- single features
- pairs of features (2D PDPs)
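Conceptually, the averaging step is simple enough to sketch by hand. The helper below is illustrative only (it is not part of any library): for each grid value, it overwrites the chosen feature in every row, predicts, and averages.

```python
import numpy as np

def partial_dependence_1d(predict, X, feature_idx, grid):
    """Average model prediction as one feature is swept over a grid.

    For each grid value, every row's feature is overwritten with that
    value, and predictions are averaged over the whole dataset.
    """
    averages = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature_idx] = value  # force the feature to this value
        averages.append(predict(X_mod).mean())
    return np.array(averages)

# Toy "model" that depends linearly on feature 0 only
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
grid = np.array([-1.0, 0.0, 1.0])
curve = partial_dependence_1d(lambda X: 2.0 * X[:, 0] + X[:, 1], X, 0, grid)
# The curve rises by 2.0 per unit of feature 0, recovering the true slope
```

Because the toy model is linear in feature 0, the recovered curve has exactly the slope 2.0, which is the behavior a PDP is meant to surface.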
1. Training a Model
To demonstrate PDPs, we first train a machine learning model on tabular data.
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(
    n_estimators=200,
    random_state=42
)
model.fit(X_train, y_train)
Tree-based models are well-suited for PDP analysis.
2. Generating Partial Dependence Plots
Once the model is trained, we generate PDPs for selected features.
from sklearn.inspection import PartialDependenceDisplay
features = ["mean radius", "mean texture"]
PartialDependenceDisplay.from_estimator(
    model,
    X_train,
    features=features,
    kind="average"
)
Each plot shows how the model’s prediction changes as the feature value varies.
How to Read a PDP
- Upward curve → increasing feature value increases prediction
- Downward curve → increasing feature value decreases prediction
- Flat line → feature has little influence
- Curves / bends → non-linear relationships
These insights are extremely useful for validating whether the model’s behavior aligns with domain knowledge.
When PDPs Can Be Misleading
PDPs assume that the feature of interest is independent of the other features.
If features are strongly correlated, the averaging step evaluates the model on unrealistic feature combinations, and the resulting curve can be misleading.
Because of this:
- PDPs work best when features are weakly correlated
- They should be used alongside other techniques like SHAP or ICE plots
Why PDPs Matter in Practice
Partial Dependence Plots help:
- build trust with stakeholders
- debug unexpected model behavior
- validate feature engineering decisions
- support explainability requirements
They are often used in regulated or high-stakes domains.
Key Takeaways
- Partial Dependence Plots show average feature effects on predictions.
- They help visualize non-linear relationships.
- PDPs marginalize all other features.
- Best used when features are not highly correlated.
- A foundational interpretability tool for ML practitioners.
Conclusion
Partial Dependence Plots provide a clear and intuitive way to understand how individual features influence machine learning models. While they do not capture every interaction, they offer valuable insights into model behavior at a global level.
When combined with techniques like SHAP and permutation importance, PDPs become a powerful part of the Advanced Visualization & Interpretability toolkit in the AI with Python series.
Code Snippet:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import PartialDependenceDisplay

# Load the breast cancer dataset as a DataFrame so PDPs can use feature names
data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target

# Stratified split to preserve the class balance in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.3,
    random_state=42,
    stratify=y
)

# Train a random forest classifier
model = RandomForestClassifier(
    n_estimators=200,
    random_state=42
)
model.fit(X_train, y_train)

# Plot the average partial dependence for two features
features = ["mean radius", "mean texture"]
PartialDependenceDisplay.from_estimator(
    model,
    X_train,
    features=features,
    kind="average"
)
plt.tight_layout()
plt.show()