🧠 AI with Python – 🫀 Heart Disease Prediction (UCI Dataset)

Posted on: January 13, 2026

Description:

Heart disease remains one of the leading causes of mortality worldwide.

Early detection and risk assessment play a crucial role in improving patient outcomes — and this is where machine learning can assist healthcare professionals.

In this project, we build a heart disease prediction model using the UCI Heart Disease dataset, demonstrating how supervised learning can be applied to real-world medical data.

Understanding the Problem

Medical datasets present unique challenges:

features have different scales (age, cholesterol, blood pressure)
false negatives are costly (missing a high-risk patient)
interpretability is critical
accuracy alone is not sufficient

The goal is not to replace clinicians, but to support decision-making with data-driven risk signals.

1. Loading the Heart Disease Dataset

We begin by loading a UCI-style heart disease dataset stored in CSV format.

import pandas as pd

df = pd.read_csv("heart.csv")
df.head()

The dataset includes clinical features such as age, sex, chest pain type, cholesterol, maximum heart rate, and a binary target indicating disease presence.

2. Inspecting Class Balance and Features

Before modeling, it’s important to understand the dataset structure.

df.info()  # prints its own summary and returns None, so no print() is needed
print(df["target"].value_counts())

This helps confirm that the dataset is suitable for binary classification and reveals whether class imbalance is present.
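To put a number on class balance, the raw counts can be turned into shares. The sketch below uses a small synthetic "target" column as a stand-in, since the contents of heart.csv are not shown here; the name `toy` and its values are assumptions made for illustration only.

```python
import pandas as pd

# Synthetic stand-in for the "target" column (NOT the real heart.csv data).
toy = pd.DataFrame({"target": [1, 1, 1, 0, 0, 1, 0, 1, 0, 1]})

# normalize=True returns the share of each class instead of raw counts.
class_share = toy["target"].value_counts(normalize=True)
print(class_share)

# Majority share divided by minority share; values near 1 indicate balance.
imbalance_ratio = class_share.max() / class_share.min()
print(f"majority/minority ratio: {imbalance_ratio:.2f}")
```

On the real dataset, the same two lines would run against `df["target"]`.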

3. Train/Test Split with Stratification

We separate features and labels while preserving class distribution.

from sklearn.model_selection import train_test_split

X = df.drop("target", axis=1)
y = df["target"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.3,
    stratify=y,
    random_state=42
)

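Stratification keeps the proportion of positive and negative cases the same in the training and test sets, so test metrics are not skewed by an unlucky split. This matters most when a dataset is small or imbalanced, as medical datasets often are.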
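A quick way to see what `stratify=y` does is to compare the positive rate before and after the split. This is a minimal sketch on synthetic labels (80 negatives, 20 positives); `X_toy` and `y_toy` are hypothetical names and are not part of the heart disease data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic labels: 80 negatives and 20 positives (20% positive rate).
y_toy = np.array([0] * 80 + [1] * 20)
X_toy = np.arange(100).reshape(-1, 1)  # placeholder feature column

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy,
    test_size=0.3,
    stratify=y_toy,   # preserve the class ratio in both subsets
    random_state=42
)

# The mean of a 0/1 label vector is the positive rate.
print("full :", y_toy.mean())
print("train:", y_tr.mean())
print("test :", y_te.mean())
```

All three rates should agree closely; without `stratify`, the test rate can drift away from the full-data rate purely by chance.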

4. Feature Scaling

Clinical features often exist on different numeric scales.

Scaling ensures that no single feature dominates model learning.

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

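This step matters for Logistic Regression because scikit-learn applies L2 regularization by default, which penalizes coefficients unevenly when features sit on different scales; scaling also helps the solver converge. Note that the scaler is fit on the training set only, so no test-set statistics leak into training.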
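The effect of standardization can be checked directly. The sketch below generates two synthetic columns on very different scales (loosely imitating age and cholesterol; the numbers are made up) and confirms that the training data ends up with mean 0 and standard deviation 1, while the test data is transformed with the training statistics.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Two synthetic columns on different scales (illustrative values only).
train = np.column_stack([rng.normal(55, 9, 200), rng.normal(245, 50, 200)])
test = np.column_stack([rng.normal(55, 9, 50), rng.normal(245, 50, 50)])

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)  # learns mean/std from train only
test_scaled = scaler.transform(test)        # reuses the training statistics

print("train mean:", train_scaled.mean(axis=0).round(3))
print("train std :", train_scaled.std(axis=0).round(3))
print("test mean :", test_scaled.mean(axis=0).round(3))
```

The test means will be close to zero but not exactly zero, which is expected: the test set is scaled with parameters it did not contribute to.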

5. Training a Baseline Medical Model

We use Logistic Regression, a widely accepted baseline in medical ML due to its interpretability.

from sklearn.linear_model import LogisticRegression

model = LogisticRegression(max_iter=1000)
model.fit(X_train_scaled, y_train)

Logistic Regression provides clear probability estimates that clinicians can reason about.
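One common way to read a fitted Logistic Regression is to exponentiate its coefficients into odds ratios. The following is a sketch on synthetic data, not on heart.csv: the column names `age`, `chol`, and `noise`, and the rule that generates the labels, are assumptions chosen so the expected pattern is known in advance.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 500

# Synthetic features; names and distributions are illustrative only.
X_demo = pd.DataFrame({
    "age": rng.normal(55, 9, n),
    "chol": rng.normal(245, 50, n),
    "noise": rng.normal(0, 1, n),
})

# The synthetic label depends on age and chol, but not on noise.
logit = 0.08 * (X_demo["age"] - 55) + 0.01 * (X_demo["chol"] - 245)
y_demo = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X_std = StandardScaler().fit_transform(X_demo)
clf = LogisticRegression(max_iter=1000).fit(X_std, y_demo)

# exp(coefficient) = multiplicative change in the odds of the positive
# class per one standard deviation increase in that feature.
odds_ratios = pd.Series(np.exp(clf.coef_[0]), index=X_demo.columns)
print(odds_ratios.sort_values(ascending=False))
```

An odds ratio above 1 means the feature pushes the predicted risk up, below 1 means it pushes it down, and a value near 1 (as expected for `noise` here) means little effect. Because the features are standardized, the ratios are per standard deviation, not per original unit.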

6. Evaluating Medical Predictions

Model evaluation focuses on metrics relevant to healthcare decision-making.

from sklearn.metrics import classification_report, confusion_matrix

y_pred = model.predict(X_test_scaled)

print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))

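Accuracy can hide the errors that matter most. Recall on the disease class shows how many true cases the model catches (each missed case is a false negative), while precision shows how many flagged patients actually have the disease.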
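Because false negatives are costly, one option is to lower the decision threshold below the default of 0.5 and accept more false alarms in exchange for higher recall. This is a sketch on data from `make_classification`, not on the heart disease dataset, and the threshold 0.3 is an arbitrary illustrative value, not a clinical recommendation.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (stand-in for the real features).
X_syn, y_syn = make_classification(n_samples=600, n_features=8, random_state=0)
X_a, X_b, y_a, y_b = train_test_split(
    X_syn, y_syn, test_size=0.3, stratify=y_syn, random_state=42
)

clf = LogisticRegression(max_iter=1000).fit(X_a, y_a)
proba = clf.predict_proba(X_b)[:, 1]  # probability of the positive class

results = {}
for threshold in (0.5, 0.3):
    pred = (proba >= threshold).astype(int)
    results[threshold] = (recall_score(y_b, pred), precision_score(y_b, pred))
    print(f"threshold={threshold}: recall={results[threshold][0]:.3f}, "
          f"precision={results[threshold][1]:.3f}")
```

Lowering the threshold can only add predicted positives, so recall never decreases; precision usually drops in return. Where to set the threshold is a judgment call that depends on the relative cost of the two error types.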

7. ROC–AUC for Risk Separation

ROC–AUC measures how well the model separates patients with and without heart disease.

from sklearn.metrics import roc_auc_score

y_proba = model.predict_proba(X_test_scaled)[:, 1]
auc = roc_auc_score(y_test, y_proba)

print("ROC–AUC:", auc)

This metric is threshold-independent and widely used in healthcare ML studies.
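The threshold-independence of ROC–AUC comes from the fact that it depends only on how the scores rank patients: it equals the fraction of (positive, negative) pairs in which the positive case receives the higher score. The sketch below checks this on six made-up scores; the arrays `y_true` and `scores` are illustrative and do not come from the model above.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Made-up labels and scores for illustration.
y_true = np.array([0, 0, 1, 1, 0, 1])
scores = np.array([0.10, 0.40, 0.35, 0.80, 0.20, 0.70])

auc = roc_auc_score(y_true, scores)

# Pairwise view: share of (positive, negative) pairs ranked correctly,
# counting ties as half.
pos = scores[y_true == 1]
neg = scores[y_true == 0]
pairwise = np.mean([(p > q) + 0.5 * (p == q) for p in pos for q in neg])

# A strictly increasing transform keeps the ranking, so the AUC is unchanged.
auc_transformed = roc_auc_score(y_true, scores ** 3)

print("AUC              :", auc)
print("pairwise estimate:", pairwise)
print("AUC after cubing :", auc_transformed)
```

All three values agree (8 of the 9 pairs are ranked correctly). This also shows a limitation: AUC says nothing about whether the probabilities themselves are well calibrated, only about their ordering.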

Key Takeaways

Heart disease prediction is a high-impact medical ML application.
Logistic Regression provides an interpretable and reliable baseline.
Feature scaling matters for scale-sensitive models such as regularized Logistic Regression.
ROC–AUC and recall are more meaningful than accuracy in healthcare.
ML models should assist — not replace — clinical judgment.

Conclusion

Heart disease prediction highlights how machine learning can deliver real-world value when applied responsibly.

By combining careful preprocessing, interpretable models, and appropriate evaluation metrics, ML systems can support early detection and informed medical decisions.

This project demonstrates a complete end-to-end healthcare ML workflow, making it a strong addition to the AI with Python – Real-World Mini Projects (Advanced) series.

