AW Dev Rethought

⚖️ There are two ways of constructing a software design: one way is to make it so simple that there are obviously no deficiencies - C.A.R. Hoare

🧠 AI with Python – ❤️ Predict Heart Disease using Logistic Regression


Description:

Predicting heart disease is one of the classic applications of machine learning in healthcare.

In this project, we train a Logistic Regression model using the UCI Heart Disease dataset — demonstrating how data scaling, class balancing, and evaluation metrics combine to build a reliable predictive model.


1. Load and Explore the Dataset

The dataset includes attributes like age, cholesterol, blood pressure, maximum heart rate, and more.

The target variable (target) indicates the presence (1) or absence (0) of heart disease.

We first import and preview the dataset using pandas:

import pandas as pd

df = pd.read_csv("heart.csv")
df.head()  # preview the first five rows
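Beyond head(), a few quick checks are worth running before modeling. The sketch below is only illustrative: it uses a tiny made-up inline CSV as a stand-in for heart.csv, and the column names (age, chol, thalach, target) are assumptions about the file layout rather than values read from it.

```python
import io
import pandas as pd

# Tiny made-up sample standing in for heart.csv (values are illustrative only).
sample_csv = io.StringIO(
    "age,chol,thalach,target\n"
    "63,233,150,1\n"
    "45,210,172,0\n"
    "58,,140,1\n"
    "51,260,165,0\n"
)
df = pd.read_csv(sample_csv)

n_rows, n_cols = df.shape                                  # dataset size
missing_per_column = df.isna().sum()                       # missing values per column
class_share = df["target"].value_counts(normalize=True)    # fraction of each class

print(f"{n_rows} rows, {n_cols} columns")
print(missing_per_column)
print(class_share)
```

On the real file, the same three checks show whether any imputation is needed and how balanced the two classes are.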

2. Split and Scale the Data

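Feature scaling puts all attributes on a comparable scale, so that features with large numeric ranges (such as cholesterol) do not dominate the regularized fit and the solver converges more easily.

After a stratified train/test split, we standardize the features with StandardScaler. The scaler is fitted on the training set only, so no test-set statistics leak into training. The full code at the end also wraps the scaled arrays back into DataFrames to keep the feature names: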

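from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from the training set
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics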
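To make the fit/transform distinction concrete, here is a minimal pure-Python sketch of the same standardization, z = (x - mean) / std, using the population standard deviation as StandardScaler does. The helper names and the cholesterol readings are hypothetical and chosen only for illustration.

```python
from statistics import fmean, pstdev

def fit_standardizer(values):
    """Learn the mean and population standard deviation from training values."""
    return fmean(values), pstdev(values)

def standardize(values, mean, std):
    """Apply z = (x - mean) / std using statistics learned elsewhere."""
    return [(v - mean) / std for v in values]

# Hypothetical cholesterol readings (illustrative values only).
chol_train = [200.0, 220.0, 240.0, 260.0, 280.0]
chol_test = [240.0, 300.0]

mean, std = fit_standardizer(chol_train)          # fit on training data only
train_scaled = standardize(chol_train, mean, std)
test_scaled = standardize(chol_test, mean, std)   # reuse the training statistics

print(round(mean, 2), round(std, 2))
print([round(z, 2) for z in test_scaled])
```

The scaled training values have mean 0 and unit variance; the test values generally do not, because they are measured against the training distribution.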

Class imbalance matters in healthcare data, where a missed positive case is costly. We pass class_weight='balanced' during model training so that each class is weighted inversely to its frequency.
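For intuition about what class_weight='balanced' does, scikit-learn documents the weight of each class as n_samples / (n_classes * count of that class). The sketch below reproduces that formula in plain Python; the function name and the label list are hypothetical and not part of the project.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Return n_samples / (n_classes * count_of_class) for each class."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Hypothetical imbalanced labels: 6 without disease (0), 2 with disease (1).
y_example = [0, 0, 0, 0, 0, 0, 1, 1]
weights = balanced_class_weights(y_example)
print(weights)
```

The rarer class receives the larger weight, so errors on it count more in the training loss. When the classes are already close to 50/50, the weights are close to 1 and the option has little effect.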


3. Train the Logistic Regression Model

We fit a Logistic Regression model using scaled data.

It’s a fast, interpretable choice for binary classification tasks like this:

from sklearn.linear_model import LogisticRegression

model = LogisticRegression(max_iter=1000, class_weight='balanced', random_state=42)
model.fit(X_train_scaled, y_train)
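Under the hood, a fitted logistic regression turns a weighted sum of the scaled features into a probability with the sigmoid function. The following is a minimal sketch of that prediction step; the coefficients, bias, and patient values are hypothetical and are not taken from the trained model.

```python
import math

def predict_proba(features, weights, bias):
    """Logistic regression: sigmoid of the weighted sum of the features."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical coefficients for two scaled features (not from the trained model).
weights = [0.8, -1.2]
bias = 0.1

patient = [0.5, -1.0]        # scaled feature values for one hypothetical patient
prob = predict_proba(patient, weights, bias)
label = int(prob >= 0.5)     # cut-off equivalent to the default predict() behavior

print(round(prob, 3), label)
```

Because the mapping from features to probability is this simple, each coefficient can be read directly as the feature's push toward or away from the positive class.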

4. Evaluate the Model

The model achieves 81% accuracy on the held-out test set (205 samples), a solid baseline for this dataset.

Here are the key metrics from the classification report (precision, recall, and F1-score are for the positive class, target = 1):

Metric       Score
Precision    0.76
Recall       0.91
F1-score     0.83
Accuracy     0.81

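The confusion matrix (rows are true classes, columns are predicted classes) shows that the model misses only 9 of 105 patients with heart disease, at the cost of 30 false positives among 100 patients without it: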

Confusion Matrix:
[[70 30]
 [ 9 96]]
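The reported scores can be re-derived from this matrix by hand, which is a useful sanity check. The sketch below assumes scikit-learn's layout of [[TN, FP], [FN, TP]] and computes the positive-class metrics directly from the four counts.

```python
# Confusion matrix from the article, in scikit-learn's layout:
# rows = true class, columns = predicted class, i.e. [[TN, FP], [FN, TP]].
cm = [[70, 30],
      [9, 96]]
(tn, fp), (fn, tp) = cm

total = tn + fp + fn + tp
accuracy = (tp + tn) / total
precision = tp / (tp + fp)          # of predicted positives, how many are real
recall = tp / (tp + fn)             # of real positives, how many were found
f1 = 2 * precision * recall / (precision + recall)
false_positive_rate = fp / (fp + tn)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f} fpr={false_positive_rate:.2f}")
# prints: accuracy=0.81 precision=0.76 recall=0.91 f1=0.83 fpr=0.30
```

Note that precision (0.76) and the false-positive rate (0.30) answer different questions: the first is measured over positive predictions, the second over truly negative patients.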

5. Insights & Takeaways

Observation                     Insight
🔹 High Recall (0.91)           The model identifies most patients with heart disease (96 of 105).
🔹 Moderate Precision (0.76)    About one in four positive predictions is a false alarm, which is often tolerable in screening, where a missed case costs more than a follow-up test.
🔹 Overall Accuracy (0.81)      A well-performing baseline with interpretable coefficients.

These results suggest that, with proper scaling and class weighting, even a simple algorithm like Logistic Regression can deliver a strong baseline for medical prediction tasks.
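One way to act on the interpretability point is to list the fitted coefficients next to their feature names. The sketch below runs on synthetic stand-in data from make_classification, and the four feature names are hypothetical labels, so the printed values say nothing about the real heart dataset.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; the feature names are hypothetical labels.
feature_names = ["age", "chol", "trestbps", "thalach"]
X_raw, y = make_classification(
    n_samples=200, n_features=4, n_informative=3, n_redundant=1, random_state=0
)
X = pd.DataFrame(StandardScaler().fit_transform(X_raw), columns=feature_names)

model = LogisticRegression(max_iter=1000, class_weight="balanced").fit(X, y)

# One coefficient per feature; sort by absolute size to see the strongest effects.
coefs = pd.Series(model.coef_[0], index=feature_names)
ranked = coefs.reindex(coefs.abs().sort_values(ascending=False).index)
print(ranked.round(3))
```

Because the features are standardized, each coefficient is the change in log-odds per one standard deviation of that feature, which makes the magnitudes roughly comparable across features.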


Conclusion

This project highlights how clean preprocessing and consistent evaluation help transform a medical dataset into a meaningful AI model.

You can easily extend it to other classifiers like RandomForest or XGBoost for comparison and feature importance analysis.
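A comparison along those lines could look like the sketch below. It is only a starting point under stated assumptions: the data is a synthetic stand-in generated with make_classification, recall is chosen as the scoring metric for illustration, and XGBoost is left out because it is a separate third-party package.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the heart data (13 features, binary target).
X, y = make_classification(
    n_samples=300, n_features=13, n_informative=6, random_state=42
)

candidates = {
    # Scaling lives inside the pipeline so it is re-fitted on each training fold.
    "logistic_regression": make_pipeline(
        StandardScaler(),
        LogisticRegression(max_iter=1000, class_weight="balanced", random_state=42),
    ),
    # Tree ensembles do not need feature scaling.
    "random_forest": RandomForestClassifier(
        n_estimators=200, class_weight="balanced", random_state=42
    ),
}

# 5-fold cross-validated recall for each candidate model.
results = {
    name: cross_val_score(model, X, y, cv=5, scoring="recall").mean()
    for name, model in candidates.items()
}

for name, score in results.items():
    print(f"{name}: mean recall = {score:.3f}")
```

Cross-validation gives a steadier comparison than a single train/test split, which matters on a dataset of this size.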


Code Snippet:

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, ConfusionMatrixDisplay, classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


# Load the dataset and preview the first rows
csv_path = "datasets/heart.csv"
df = pd.read_csv(csv_path)
df.head()


# Separate the features from the binary target
X = df.drop(columns=['target'])
y = df['target'].astype(int)

# Check the class balance
y.value_counts(normalize=True).rename({0:'no disease', 1:'disease'})


# Stratified 80/20 split keeps the class proportions in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)


# Fit the scaler on the training set only, then apply it to both sets
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled  = scaler.transform(X_test)

# Keep feature names
X_train_scaled = pd.DataFrame(X_train_scaled, columns=X_train.columns, index=X_train.index)
X_test_scaled  = pd.DataFrame(X_test_scaled,  columns=X_test.columns,  index=X_test.index)


# Train the class-weighted logistic regression model
model = LogisticRegression(max_iter=1000, class_weight='balanced', random_state=42)
model.fit(X_train_scaled, y_train)


# Evaluate on the held-out test set
y_pred = model.predict(X_test_scaled)

print(f"Model Accuracy: {accuracy_score(y_test, y_pred):.3f}\n")
print("Classification Report:\n", classification_report(y_test, y_pred, digits=3))

# Plot the confusion matrix (rows: true class, columns: predicted class)
cm = confusion_matrix(y_test, y_pred)
disp = ConfusionMatrixDisplay(confusion_matrix=cm)
disp.plot(cmap='Reds')
plt.title("Confusion Matrix – Heart Disease Prediction")
plt.tight_layout()
plt.show()
