⚡ Saturday ML Sparks – Linear Regression on Real Data 🧠📊

Posted On: October 18, 2025

Description:

Linear Regression is one of the most fundamental algorithms in machine learning.

It’s used to model the relationship between a dependent variable (the target) and one or more independent variables (the features).

In this post, we’ll explore how to apply Linear Regression to a real-world dataset and interpret its performance.

What Is Linear Regression?

Linear Regression attempts to fit a straight line (or hyperplane in higher dimensions) that best describes the relationship between features and the target.

Mathematically, it follows the equation:

y = β_0 + β_1x_1 + β_2x_2 + … + β_nx_n

Where:

y → predicted value
β_0 → intercept
β_1, β_2, … β_n → coefficients for each feature
x_1, x_2, … x_n → feature values

The goal is to minimize the Mean Squared Error (MSE) between predicted and actual values.

Dataset and Setup

We’ll use the California Housing dataset — a popular regression dataset that predicts median house value based on features such as income, house age, and population.

In case of connectivity or SSL issues, we’ll fall back to the Diabetes dataset, which is included locally with scikit-learn.

from sklearn.datasets import fetch_california_housing, load_diabetes
import pandas as pd

try:
    data = fetch_california_housing(as_frame=True)
    df = data.frame
    target_col = "MedHouseVal"
    print("Using California Housing dataset.")
except Exception:
    data = load_diabetes(as_frame=True)
    df = pd.concat([data.data, data.target.rename("target")], axis=1)
    target_col = "target"
    print("Using Diabetes dataset (fallback).")

This ensures the script works seamlessly both online and offline.

Building and Training the Model

We split the data into training and testing sets and fit a Linear Regression model.

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

X = df.drop(columns=[target_col])
y = df[target_col]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression()
model.fit(X_train, y_train)

The model learns a set of coefficients (β) that define how each feature contributes to the prediction.

Evaluating Model Performance

Once trained, we evaluate how well the model generalizes to unseen data using Mean Squared Error (MSE) and R² Score.

from sklearn.metrics import mean_squared_error, r2_score

y_pred = model.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Mean Squared Error: {mse:.2f}")
print(f"R² Score: {r2:.2f}")

MSE measures average squared differences between predictions and actual values. Lower is better.
R² (Coefficient of Determination) shows how much variance in the target is explained by the features. Closer to 1 indicates a better fit.

Visualizing Predictions

A scatter plot of actual vs predicted values helps visualize model accuracy.

Points lying close to the diagonal line indicate strong predictive performance.

import matplotlib.pyplot as plt

plt.figure(figsize=(8, 6))
plt.scatter(y_test, y_pred, alpha=0.5, color='teal')
plt.xlabel("Actual Values")
plt.ylabel("Predicted Values")
plt.title("Actual vs Predicted – Linear Regression")
plt.plot([0, 5], [0, 5], color="orange", linestyle="--")
plt.grid(True, alpha=0.3)
plt.show()

Sample Output

The resulting plot looks like this:

Actual vs Predicted – Linear Regression

Figure: Actual vs Predicted – Linear Regression.

Interpreting the Results

Good Fit: Points closely align along the diagonal line.
Underfitting: High MSE, low R² — model is too simple or lacks features.
Overfitting: Perfect fit on training data but poor generalization on test data — consider regularization or feature reduction.

Even with its simplicity, Linear Regression remains one of the most useful baselines for regression tasks.

Key Takeaways

Linear Regression models the linear relationship between features and a continuous target.
Use MSE and R² to measure performance.
Always visualize predictions to spot anomalies or bias.
For larger or non-linear datasets, consider extensions such as Ridge, Lasso, or Polynomial Regression.

Conclusion

Linear Regression is the foundation of many predictive modeling techniques.

By learning to train, evaluate, and interpret such models, you gain essential skills for tackling more advanced machine learning problems.

Start simple — and let the data guide your next steps.

Full Script

The blog covers the essentials — find the complete notebook with all snippets & extras on GitHub Repo 👉 ML Sparks

← →	move
↑	rotate
↓	soft drop
Space	hard drop
P	pause / resume

⚡ Saturday ML Sparks – Linear Regression on Real Data 🧠📊

Description:

What Is Linear Regression?

Dataset and Setup

Building and Training the Model

Evaluating Model Performance

Visualizing Predictions

Sample Output

Interpreting the Results

Key Takeaways

Conclusion

Full Script

Comments

Add Your Comment

⚡ Saturday ML Sparks – Linear Regression on Real Data 🧠📊

Description:

What Is Linear Regression?

Dataset and Setup

Building and Training the Model

Evaluating Model Performance

Visualizing Predictions

Sample Output

Interpreting the Results

Key Takeaways

Conclusion

Full Script

Comments Show Comments

Add Your Comment

Related Posts

⚡ Saturday ML Sparks – Logistic Regression Classifier 🧠📈

Comments