🧠 AI with Python – Regularization in Linear Models (Ridge vs Lasso)
Posted On: October 14, 2025
Description:
Overfitting is one of the most common problems in machine learning — when a model fits training data too well but fails to generalize to unseen data.
Regularization is a powerful solution that penalizes overly complex models, ensuring stability and better generalization.
In this post, we’ll explore Ridge and Lasso regularization — two of the most widely used techniques to control model complexity in Linear Regression.
Understanding the Concept
A standard Linear Regression model minimizes the sum of squared errors between predicted and actual values.
Regularization adds a penalty term to the cost function to restrict large coefficient values.
Mathematically,
-
Ridge (L2 Regularization) adds
→ penalty = λ × Σ(w²)
-
Lasso (L1 Regularization) adds
→ penalty = λ × Σ|w|
Where λ (alpha in scikit-learn) determines the strength of the penalty.
Dataset and Setup
We’ll use the Diabetes dataset from scikit-learn, a classic regression dataset with 10 numeric features.
from sklearn.datasets import load_diabetes
data = load_diabetes()
X, y = data.data, data.target
We split the dataset into training and testing sets to evaluate generalization.
Applying Ridge and Lasso
We train three models — Linear Regression, Ridge, and Lasso — to compare their performance.
from sklearn.linear_model import LinearRegression, Ridge, Lasso
lr = LinearRegression()
ridge = Ridge(alpha=1.0)
lasso = Lasso(alpha=0.1)
Each model is trained and evaluated using Mean Squared Error (MSE) on the test set.
Results and Comparison
| Model | MSE (lower is better) |
|---|---|
| Linear Regression | 3200.7 |
| Ridge | 3180.3 |
| Lasso | 3158.9 |
(Values may vary slightly based on random splits.)
You’ll notice that both Ridge and Lasso yield slightly lower MSE values than standard linear regression — indicating better generalization.
Key Insights
- Linear Regression fits data directly and may overfit when multicollinearity or noise exists.
- Ridge (L2) shrinks coefficients, making the model more stable but retains all features.
- Lasso (L1) can drive some coefficients to zero — effectively performing feature selection.
When combined, these methods offer a balance between interpretability and robustness.
Conclusion
Regularization ensures that your models learn patterns, not noise.
- Use Ridge when you want smoother coefficients across all features.
- Use Lasso when you want a simpler model with fewer active features.
Both methods are easy to implement and crucial for modern machine learning pipelines.
Code Snippet:
# Import necessary libraries
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import pandas as pd
# Load dataset
data = load_diabetes()
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Initialize models
lr = LinearRegression()
ridge = Ridge(alpha=1.0)
lasso = Lasso(alpha=0.1)
# Fit models
lr.fit(X_train, y_train)
ridge.fit(X_train, y_train)
lasso.fit(X_train, y_train)
# Predictions
lr_pred = lr.predict(X_test)
ridge_pred = ridge.predict(X_test)
lasso_pred = lasso.predict(X_test)
# Calculate MSE
mse_lr = mean_squared_error(y_test, lr_pred)
mse_ridge = mean_squared_error(y_test, ridge_pred)
mse_lasso = mean_squared_error(y_test, lasso_pred)
# Display results
results = pd.DataFrame({
"Model": ["Linear Regression", "Ridge", "Lasso"],
"MSE": [mse_lr, mse_ridge, mse_lasso]
})
print(results)
No comments yet. Be the first to comment!