🧠 AI with Python – Polynomial Regression with scikit-learn



Introduction

Linear regression is powerful, but it struggles when relationships are non-linear.

Polynomial Regression extends linear regression by first transforming the features into polynomial terms (e.g., x, x^2, x^3), then fitting a linear model on those expanded features.
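
As a concrete illustration, here is a minimal sketch of what that expansion looks like for a single feature (expected output shown in the comments):

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[2.0], [3.0]])  # two samples, one feature
poly = PolynomialFeatures(degree=3, include_bias=False)
print(poly.fit_transform(X))
# [[ 2.  4.  8.]    -> x, x^2, x^3 for x = 2
#  [ 3.  9. 27.]]   -> x, x^2, x^3 for x = 3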

In this post, we’ll build a Pipeline with PolynomialFeatures and LinearRegression, evaluate with standard metrics (MAE, MSE, R²), and visualize how well the curve fits a synthetic quadratic dataset.


When to Use Polynomial Regression

  • Your residuals show curvature (systematic under-/over-prediction with a straight line); see the residual-plot sketch after this list.
  • You want a lightweight alternative to tree models for smooth trends.
  • You need an interpretable curve with a small number of terms.
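
As a quick check for the first point, here is a minimal residual-plot sketch from a plain linear fit (assumes X and y as defined in the full snippet at the end of this post). A curved band of residuals, rather than a flat cloud around zero, is the tell-tale sign that a line is not enough:

import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

lin = LinearRegression().fit(X, y)
residuals = y - lin.predict(X)

plt.scatter(X, residuals, s=15, alpha=0.7)
plt.axhline(0, color="gray", lw=1)
plt.xlabel("X")
plt.ylabel("Residual")
plt.title("Residuals of a Linear Fit")
plt.show()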

Minimal Implementation (Core Idea)

Build a Pipeline:

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

degree = 2
model = Pipeline([
    ("poly", PolynomialFeatures(degree=degree, include_bias=False)),
    ("linreg", LinearRegression())
])

Fit & Evaluate (sketch):

model.fit(X_train, y_train)
y_pred = model.predict(X_test)

Use MAE, MSE, and R² to compare different degrees (e.g., 1–5).
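
A minimal sketch of such a degree sweep (assumes X_train, X_test, y_train, y_test already exist from a train/test split):

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

for d in range(1, 6):
    m = Pipeline([
        ("poly", PolynomialFeatures(degree=d, include_bias=False)),
        ("linreg", LinearRegression()),
    ])
    m.fit(X_train, y_train)
    pred = m.predict(X_test)
    print(f"degree={d}  MAE={mean_absolute_error(y_test, pred):.3f}  "
          f"MSE={mean_squared_error(y_test, pred):.3f}  "
          f"R²={r2_score(y_test, pred):.3f}")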


Visualising the Fit

To see how well the model captures the pattern, build a dense grid of X values across the range, predict on that grid, and overlay the fitted curve on the scatter plot of noisy points and the true function. The full code snippet below does exactly this.


Practical Tips

  • Bias–Variance tradeoff: Higher degree reduces bias but can overfit.
  • Try a small grid of degrees with cross-validation (GridSearchCV over poly__degree); see the sketch after this list.
  • For multi-feature data, polynomial terms explode combinatorially → consider regularization (Ridge/Lasso).
  • Keep include_bias=False because LinearRegression already learns an intercept.
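
A minimal sketch combining the cross-validation and regularization tips, using Ridge in place of plain LinearRegression (the StandardScaler step and the alpha grid are illustrative choices, not requirements):

from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import Ridge

pipe = Pipeline([
    ("poly", PolynomialFeatures(include_bias=False)),
    ("scale", StandardScaler()),  # scaling matters once regularization is in play
    ("model", Ridge()),
])
param_grid = {
    "poly__degree": [1, 2, 3, 4, 5],
    "model__alpha": [0.01, 0.1, 1.0, 10.0],
}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="neg_mean_squared_error")
search.fit(X_train, y_train)
print(search.best_params_)

Searching over poly__degree and the regularization strength together lets cross-validation pick the bias–variance tradeoff for you, rather than eyeballing one degree at a time.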

Key Takeaways

  • Polynomial Regression = Linear Regression on engineered polynomial features.
  • Use a Pipeline to keep transformations consistent and avoid leakage.
  • Select the degree carefully; validate with proper metrics and CV.
  • Add regularization for higher-degree polynomials or many features.

Code Snippet:

import numpy as np
import matplotlib.pyplot as plt

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


rng = np.random.RandomState(42)

# Feature (single input) and target
X = np.linspace(-5, 5, 120).reshape(-1, 1)
true_y = 0.5 * X ** 2 - 2 * X + 3
y = (true_y + rng.normal(scale=2.0, size=X.shape)).ravel()  # add Gaussian noise

# Quick visual check of raw data
plt.figure(figsize=(6.5, 4.5))
plt.scatter(X, y, s=18, alpha=0.7, label="Noisy observations")
plt.plot(X, true_y, lw=2, label="True function", color="orange")
plt.title("Synthetic Non-Linear Data")
plt.xlabel("X")
plt.ylabel("y")
plt.legend()
plt.tight_layout()
plt.show()


X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)


degree = 2  # try 2 for quadratic; experiment with 3 or 4 as well

poly_reg = Pipeline(steps=[
    ("poly", PolynomialFeatures(degree=degree, include_bias=False)),
    ("linreg", LinearRegression())
])


poly_reg.fit(X_train, y_train)


y_pred = poly_reg.predict(X_test)

mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Degree: {degree}")
print(f"MAE: {mae:.3f}")
print(f"MSE: {mse:.3f}")
print(f"R² : {r2:.3f}")


# Dense grid of X values for a smooth fitted curve
X_plot = np.linspace(X.min(), X.max(), 400).reshape(-1, 1)
y_fit = poly_reg.predict(X_plot)

plt.figure(figsize=(6.8, 4.8))
plt.scatter(X_train, y_train, s=18, alpha=0.6, label="Train")
plt.scatter(X_test, y_test, s=18, alpha=0.8, label="Test")
plt.plot(X_plot, y_fit, lw=2.5, color="crimson", label=f"Polynomial fit (degree={degree})")
plt.plot(X, true_y, lw=2, color="orange", alpha=0.8, label="True function")
plt.title("Polynomial Regression Fit")
plt.xlabel("X")
plt.ylabel("y")
plt.legend()
plt.tight_layout()
plt.show()
