🧠 AI with Python – Regularization in Linear Models (Ridge vs Lasso)


Description:

Overfitting is one of the most common problems in machine learning — when a model fits training data too well but fails to generalize to unseen data.

Regularization is a powerful solution that penalizes overly complex models, ensuring stability and better generalization.

In this post, we’ll explore Ridge and Lasso regularization — two of the most widely used techniques to control model complexity in Linear Regression.


Understanding the Concept

A standard Linear Regression model minimizes the sum of squared errors between predicted and actual values.

Regularization adds a penalty term to the cost function to restrict large coefficient values.

Mathematically,

  • Ridge (L2 Regularization) adds

    → penalty = λ × Σ(w²)

  • Lasso (L1 Regularization) adds

    → penalty = λ × Σ|w|

where w denotes the model coefficients and λ (alpha in scikit-learn) controls the strength of the penalty.
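
As a quick illustration (the coefficient values and alpha below are made-up numbers, not fitted ones), both penalty terms are one line of NumPy each:

import numpy as np

w = np.array([0.5, -1.2, 0.0, 3.4])      # hypothetical coefficient vector
alpha = 1.0                              # regularization strength (λ)

l2_penalty = alpha * np.sum(w ** 2)      # Ridge: λ × Σ(w²)  -> 13.25
l1_penalty = alpha * np.sum(np.abs(w))   # Lasso: λ × Σ|w|   -> 5.1
print(l2_penalty, l1_penalty)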


Dataset and Setup

We’ll use the Diabetes dataset from scikit-learn, a classic regression dataset with 10 numeric features.

from sklearn.datasets import load_diabetes
data = load_diabetes()
X, y = data.data, data.target

We split the dataset into training and testing sets to evaluate generalization.
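
A conventional 80/20 split with a fixed random_state keeps the comparison reproducible; it is the same split used in the full snippet at the end of the post:

from sklearn.model_selection import train_test_split

# Hold out 20% of the samples for evaluating generalization
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)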


Applying Ridge and Lasso

We train three models — Linear Regression, Ridge, and Lasso — to compare their performance.

from sklearn.linear_model import LinearRegression, Ridge, Lasso

lr = LinearRegression()
ridge = Ridge(alpha=1.0)
lasso = Lasso(alpha=0.1)

Each model is trained and evaluated using Mean Squared Error (MSE) on the test set.
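
In sketch form, every model goes through the same fit / predict / score steps (shown here for Ridge; the complete code appears at the end of the post):

from sklearn.metrics import mean_squared_error

ridge.fit(X_train, y_train)                         # train on the training split
ridge_pred = ridge.predict(X_test)                  # predict on unseen data
mse_ridge = mean_squared_error(y_test, ridge_pred)  # lower is better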


Results and Comparison

Model                 MSE (lower is better)
Linear Regression     3200.7
Ridge                 3180.3
Lasso                 3158.9

(Values may vary slightly based on random splits.)

You’ll notice that both Ridge and Lasso yield slightly lower MSE values than standard linear regression — indicating better generalization.


Key Insights

  • Linear Regression fits data directly and may overfit when multicollinearity or noise exists.
  • Ridge (L2) shrinks coefficients, making the model more stable while retaining all features.
  • Lasso (L1) can drive some coefficients to zero — effectively performing feature selection, as the quick check below shows.
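
You can verify this sparsity directly from the fitted models above (the exact number of zeroed coefficients depends on alpha and the train/test split):

import numpy as np

# Ridge shrinks coefficients but rarely zeroes them; Lasso sets some to exactly 0
print("Ridge zero coefficients:", np.sum(ridge.coef_ == 0))
print("Lasso zero coefficients:", np.sum(lasso.coef_ == 0))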

Combining the two penalties, as Elastic Net does, offers a balance between interpretability and robustness.
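
scikit-learn exposes this combination as ElasticNet; here is a minimal sketch (the alpha and l1_ratio values are illustrative assumptions, not tuned):

from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error

# l1_ratio=0.5 weights the L1 and L2 penalties equally
enet = ElasticNet(alpha=0.1, l1_ratio=0.5)
enet.fit(X_train, y_train)
print(mean_squared_error(y_test, enet.predict(X_test)))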


Conclusion

Regularization ensures that your models learn patterns, not noise.

  • Use Ridge when you want smoother coefficients across all features.
  • Use Lasso when you want a simpler model with fewer active features.

Both methods are easy to implement and crucial for modern machine learning pipelines.
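
Because regularization penalties are sensitive to feature scale, a common pattern is to pair them with a scaler inside a scikit-learn Pipeline. A minimal sketch (the StandardScaler step is an assumption for general use; the diabetes features loaded above are already pre-scaled):

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge

# Scale features, then fit Ridge, so the penalty treats all coefficients on a comparable footing
pipe = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
pipe.fit(X_train, y_train)
print(pipe.score(X_test, y_test))  # R² on the test set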


Code Snippet:

# Import necessary libraries
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import pandas as pd

# Load dataset
data = load_diabetes()
X, y = data.data, data.target


# Split into training and testing sets (80/20)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


# Initialize models
lr = LinearRegression()
ridge = Ridge(alpha=1.0)
lasso = Lasso(alpha=0.1)

# Fit models
lr.fit(X_train, y_train)
ridge.fit(X_train, y_train)
lasso.fit(X_train, y_train)


# Predictions
lr_pred = lr.predict(X_test)
ridge_pred = ridge.predict(X_test)
lasso_pred = lasso.predict(X_test)

# Calculate MSE
mse_lr = mean_squared_error(y_test, lr_pred)
mse_ridge = mean_squared_error(y_test, ridge_pred)
mse_lasso = mean_squared_error(y_test, lasso_pred)

# Display results
results = pd.DataFrame({
    "Model": ["Linear Regression", "Ridge", "Lasso"],
    "MSE": [mse_lr, mse_ridge, mse_lasso]
})

print(results)
