🧠 AI with Python – Polynomial Regression with scikit-learn
Posted On: October 9, 2025
Introduction
Linear regression is powerful, but it struggles when relationships are non-linear.
Polynomial Regression extends linear regression by first transforming the features into polynomial terms (e.g., x, x^2, x^3), then fitting a linear model on those expanded features.
In this post, we’ll build a Pipeline with PolynomialFeatures and LinearRegression, evaluate with standard metrics (MAE, MSE, R²), and visualize how well the curve fits a synthetic quadratic dataset.
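To make the transformation concrete, here is what PolynomialFeatures produces for a single input feature. This is a toy illustration, separate from the main snippet below:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X_toy = np.array([[2.0], [3.0]])  # two samples, one feature
poly = PolynomialFeatures(degree=3, include_bias=False)
print(poly.fit_transform(X_toy))
# [[ 2.  4.  8.]
#  [ 3.  9. 27.]]  -> columns are x, x^2, x^3

A linear model fit on these expanded columns can then represent a cubic curve in the original x.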
When to Use Polynomial Regression
- Your residuals show curvature (systematic under- or over-prediction with a straight line); see the quick check after this list.
- You want a lightweight alternative to tree models for smooth trends.
- You need an interpretable curve with a small number of terms.
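For the first point, a quick diagnostic is to fit a plain line and plot the residuals: a curved band around zero is the classic sign that polynomial terms will help. A minimal sketch, assuming X and y arrays like those in the full snippet below:

import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

lin = LinearRegression().fit(X, y)  # straight-line baseline
residuals = y - lin.predict(X)      # curvature here means the line misfits
plt.scatter(X, residuals, s=15, alpha=0.7)
plt.axhline(0, color="gray", lw=1)
plt.xlabel("X")
plt.ylabel("Residual")
plt.title("Residuals of a linear fit")
plt.show()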
Minimal Implementation (Core Idea)
Build a Pipeline:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
degree = 2
model = Pipeline([
    ("poly", PolynomialFeatures(degree=degree, include_bias=False)),
    ("linreg", LinearRegression())
])
Fit & Evaluate (sketch):
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
Use MAE, MSE, and R² to compare different degrees (e.g., 1–5).
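A minimal sketch of that comparison, reusing the Pipeline pieces imported above; it assumes X_train, X_test, y_train, y_test from a train/test split like the one in the full snippet below:

from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

for d in range(1, 6):  # degrees 1 through 5
    m = Pipeline([
        ("poly", PolynomialFeatures(degree=d, include_bias=False)),
        ("linreg", LinearRegression())
    ])
    m.fit(X_train, y_train)
    pred = m.predict(X_test)
    print(f"degree={d}  MAE={mean_absolute_error(y_test, pred):.3f}  "
          f"MSE={mean_squared_error(y_test, pred):.3f}  "
          f"R²={r2_score(y_test, pred):.3f}")

Expect the test metrics to improve sharply from degree 1 to 2 on this quadratic data, then plateau or worsen as higher degrees start fitting noise.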
Visualizing the Fit
To see how well the model captures the pattern, build a dense, sorted grid of X values across the range, predict on that grid, and overlay the fitted curve on the scatter plot of noisy points and the true function.
Practical Tips
- Bias–Variance tradeoff: Higher degree reduces bias but can overfit.
- Try a small grid of degrees with cross-validation (GridSearchCV over poly__degree); see the sketch after this list.
- For multi-feature data, polynomial terms explode combinatorially → consider regularization (Ridge/Lasso).
- Keep include_bias=False because LinearRegression already learns an intercept.
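A minimal sketch of the cross-validated degree search mentioned above, assuming the same training split; note that the step name "poly" must match the prefix in the parameter grid keys:

from sklearn.model_selection import GridSearchCV

pipe = Pipeline([
    ("poly", PolynomialFeatures(include_bias=False)),
    ("linreg", LinearRegression())
])
param_grid = {"poly__degree": [1, 2, 3, 4, 5]}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="neg_mean_squared_error")
search.fit(X_train, y_train)
print("Best degree:", search.best_params_["poly__degree"])
# For many features or higher degrees, swap LinearRegression() for Ridge()
# and add e.g. "linreg__alpha": [0.1, 1.0, 10.0] to the grid.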
Key Takeaways
- Polynomial Regression = Linear Regression on engineered polynomial features.
- Use a Pipeline to keep transformations consistent and avoid leakage.
- Select the degree carefully; validate with proper metrics and CV.
- Add regularization for higher-degree polynomials or many features.
Code Snippet:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
rng = np.random.RandomState(42)
# Feature (single input) and target
X = np.linspace(-5, 5, 120).reshape(-1, 1)
true_y = 0.5 * X ** 2 - 2 * X + 3
y = (true_y + rng.normal(scale=2.0, size=X.shape)).ravel() # add Gaussian noise
# Quick visual check of raw data
plt.figure(figsize=(6.5, 4.5))
plt.scatter(X, y, s=18, alpha=0.7, label="Noisy observations")
plt.plot(X, true_y, lw=2, label="True function", color="orange")
plt.title("Synthetic Non-Linear Data")
plt.xlabel("X")
plt.ylabel("y")
plt.legend()
plt.tight_layout()
plt.show()
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
degree = 2  # quadratic matches the true function; experiment with 3 or 4 as well
poly_reg = Pipeline(steps=[
    ("poly", PolynomialFeatures(degree=degree, include_bias=False)),
    ("linreg", LinearRegression())
])
poly_reg.fit(X_train, y_train)
y_pred = poly_reg.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Degree: {degree}")
print(f"MAE: {mae:.3f}")
print(f"MSE: {mse:.3f}")
print(f"R² : {r2:.3f}")
# Dense, evenly spaced grid for a smooth fitted curve
X_plot = np.linspace(X.min(), X.max(), 400).reshape(-1, 1)
y_fit = poly_reg.predict(X_plot)
plt.figure(figsize=(6.8, 4.8))
plt.scatter(X_train, y_train, s=18, alpha=0.6, label="Train")
plt.scatter(X_test, y_test, s=18, alpha=0.8, label="Test")
plt.plot(X_plot, y_fit, lw=2.5, color="crimson", label=f"Polynomial fit (degree={degree})")
plt.plot(X, true_y, lw=2, color="orange", alpha=0.8, label="True function")
plt.title("Polynomial Regression Fit")
plt.xlabel("X")
plt.ylabel("y")
plt.legend()
plt.tight_layout()
plt.show()