AW Dev Rethought

⚖️ There are two ways of constructing a software design: one way is to make it so simple that there are obviously no deficiencies - C.A.R. Hoare

🧠 AI with Python – 🏦 Loan Approval Prediction


Description:

Loan approval is one of the most common and impactful decision-making problems in banking and fintech.

Every loan approval involves balancing business growth with financial risk, making accurate and explainable predictions essential.

In this project, we build a loan approval prediction model using supervised machine learning on tabular financial data, demonstrating how ML assists real-world credit decisioning systems.


Understanding the Problem

Loan datasets typically contain information about applicants such as income, credit history, employment status, and loan amount.

The challenge lies in:

  • handling missing and categorical data
  • managing class imbalance
  • minimizing risky approvals (false positives)
  • ensuring interpretability for regulatory compliance

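Unlike many textbook exercises, accuracy alone is not a sufficient measure in financial decision systems: on an imbalanced dataset, a model can score high accuracy while still getting almost every minority-class case wrong.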
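To make the point about accuracy concrete, here is a minimal sketch using made-up labels (90 approvals, 10 rejections) and scikit-learn's DummyClassifier. The data and variable names are illustrative assumptions, not taken from the loan dataset.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical labels: 1 = approved (90 rows), 0 = rejected (10 rows).
y = np.array([1] * 90 + [0] * 10)
X = np.zeros((100, 1))  # features are irrelevant for this baseline

# A "model" that always predicts the majority class, i.e. approves everyone.
baseline = DummyClassifier(strategy="most_frequent").fit(X, y)
y_pred = baseline.predict(X)

accuracy = accuracy_score(y, y_pred)
# Recall on the rejected class: share of true rejections the model caught.
rejected_recall = recall_score(y, y_pred, pos_label=0)

print(f"accuracy={accuracy:.2f}, recall on rejections={rejected_recall:.2f}")
```

The baseline looks strong on accuracy yet never flags a single rejection, which is why the later sections look at precision, recall, and ROC–AUC.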


1. Loading the Loan Dataset

We start with a loan approval dataset in CSV format.

import pandas as pd

# Assumes loan_data.csv sits in the working directory.
df = pd.read_csv("loan_data.csv")
print(df.head())

The target variable indicates whether a loan was approved or rejected.


2. Inspecting Data Quality

Before modeling, we examine data structure and class distribution.

df.info()  # info() prints its summary itself and returns None
print(df["loan_status"].value_counts())

Loan datasets often contain missing values and mixed data types, which must be addressed carefully.
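A quick way to quantify both issues is to compute the share of missing values per column and the class shares. The sketch below uses a small hypothetical frame standing in for loan_data.csv; the column names are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the loan dataset; column names are illustrative.
df_demo = pd.DataFrame({
    "income": [4200.0, np.nan, 3100.0, 5800.0],
    "employment_status": ["salaried", "self_employed", None, "salaried"],
    "loan_status": [1, 0, 1, 1],
})

# Fraction of missing entries in each column, worst first.
missing_share = df_demo.isna().mean().sort_values(ascending=False)

# Fraction of rows in each target class (reveals class imbalance).
class_share = df_demo["loan_status"].value_counts(normalize=True)

print(missing_share)
print(class_share)
```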


3. Handling Missing Values

Missing data is common in real-world financial records.

We use two simple imputation strategies: the median for numeric columns and the most frequent value for categorical columns.

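from sklearn.impute import SimpleImputer

# Exclude the target up front so it is never imputed, whether it is numeric or text.
feature_cols = df.columns.drop("loan_status")
num_cols = df[feature_cols].select_dtypes(include="number").columns
cat_cols = df[feature_cols].select_dtypes(exclude="number").columns

# Median is robust to outliers such as very high incomes; the mode suits categorical fields.
num_imputer = SimpleImputer(strategy="median")
cat_imputer = SimpleImputer(strategy="most_frequent")

df[num_cols] = num_imputer.fit_transform(df[num_cols])
df[cat_cols] = cat_imputer.fit_transform(df[cat_cols])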

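After this step the feature columns contain no missing values. Note that fitting the imputers on the full dataset lets test-set statistics influence training; for a stricter evaluation, fit them on the training split only.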
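One way to keep imputation free of test-set information is to split first and fit the imputers on the training rows only. The sketch below is one possible arrangement using scikit-learn's ColumnTransformer, not the method used in this project; the data and column names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split

# Hypothetical data; column names are illustrative.
demo = pd.DataFrame({
    "income": [4200.0, np.nan, 3100.0, 5800.0, 2500.0, np.nan, 6100.0, 3900.0],
    "employment_status": ["salaried", "self_employed", np.nan, "salaried",
                          "salaried", "self_employed", "salaried", np.nan],
    "loan_status": [1, 0, 1, 1, 0, 0, 1, 1],
})
X_demo = demo.drop(columns="loan_status")
y_demo = demo["loan_status"]

# Split before any statistics are computed.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, stratify=y_demo, random_state=0
)

imputer = ColumnTransformer([
    ("num", SimpleImputer(strategy="median"), ["income"]),
    ("cat", SimpleImputer(strategy="most_frequent"), ["employment_status"]),
])

X_tr_filled = imputer.fit_transform(X_tr)  # median and mode learned from training rows
X_te_filled = imputer.transform(X_te)      # test rows reuse those training statistics

print(X_tr_filled)
print(X_te_filled)
```

The same transformer can be placed at the front of a Pipeline so that cross-validation refits it on each fold.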


4. Encoding Categorical Features

Machine learning models require numeric inputs, so categorical fields must be encoded.

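from sklearn.preprocessing import LabelEncoder

# Fit one encoder per column and keep it, so each mapping can be reused or inverted later.
encoders = {}
for col in cat_cols:
    encoders[col] = LabelEncoder()
    df[col] = encoders[col].fit_transform(df[col])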

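This step converts categories like employment type or marital status into integer codes. The codes follow the sorted order of the category values, so a linear model will read them as an ordering; for nominal fields with more than two values, one-hot encoding is usually the safer choice.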


5. Train/Test Split with Stratification

We split the dataset while preserving class proportions.

from sklearn.model_selection import train_test_split

X = df.drop("loan_status", axis=1)
y = df["loan_status"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.3,
    stratify=y,
    random_state=42
)

Stratification keeps the approval/rejection ratio the same in the training and test sets, so the evaluation is not skewed by an unlucky split.
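The effect of stratify can be checked directly by comparing class shares in the two splits. This is a small sketch on synthetic labels (80 approvals, 20 rejections), chosen only for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic labels: 1 = approved (80 rows), 0 = rejected (20 rows).
y_demo = np.array([1] * 80 + [0] * 20)
X_demo = np.arange(100).reshape(-1, 1)

_, _, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, stratify=y_demo, random_state=42
)

# Share of approvals in each split.
train_rate = y_tr.mean()
test_rate = y_te.mean()

print(f"train approval rate={train_rate:.2f}, test approval rate={test_rate:.2f}")
```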


6. Training a Loan Approval Model

We use Logistic Regression as a baseline model due to its interpretability and stability.

from sklearn.linear_model import LogisticRegression

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

Logistic Regression provides probability outputs that align well with financial risk scoring.
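Interpretability here mostly comes from the coefficients: exponentiating a coefficient gives an odds ratio. The sketch below fits a scaled logistic regression on synthetic data; the feature names, the data-generating rule, and the use of StandardScaler are assumptions for illustration, not part of the project's code.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 500

# Synthetic applicants; both features are hypothetical.
X_demo = pd.DataFrame({
    "income": rng.normal(5000, 1500, n),
    "credit_history": rng.integers(0, 2, n),
})

# Synthetic rule: higher income and a good credit history raise approval odds.
logit = 0.001 * (X_demo["income"] - 5000) + 2.0 * (X_demo["credit_history"] - 0.5)
y_demo = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Scaling puts coefficients on a comparable footing and helps the solver converge.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipe.fit(X_demo, y_demo)

coefs = pipe.named_steps["logisticregression"].coef_[0]
# Odds ratio per one standard deviation increase in each feature.
odds_ratios = pd.Series(np.exp(coefs), index=X_demo.columns)

print(odds_ratios)
```

An odds ratio above 1 means the feature pushes toward approval; below 1, toward rejection.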


7. Evaluating Model Performance

Evaluation focuses on metrics relevant to financial risk.

from sklearn.metrics import classification_report, confusion_matrix

y_pred = model.predict(X_test)

print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))

Precision on the approved class is especially important: it is the share of predicted approvals that are correct, so higher precision means fewer risky approvals.
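Precision depends on the decision threshold, which model.predict fixes at 0.5. As a rough illustration with hand-written scores (not output from the loan model), raising the threshold trades recall for precision:

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# Hypothetical true labels (1 = approved) and predicted approval probabilities.
y_true = np.array([1, 1, 1, 1, 0, 1, 0, 0, 1, 0])
y_score = np.array([0.95, 0.90, 0.85, 0.70, 0.65, 0.60, 0.55, 0.40, 0.35, 0.20])


def precision_recall_at(threshold):
    """Approve when the score reaches the threshold, then score the decisions."""
    y_hat = (y_score >= threshold).astype(int)
    return precision_score(y_true, y_hat), recall_score(y_true, y_hat)


p_default, r_default = precision_recall_at(0.5)
p_strict, r_strict = precision_recall_at(0.8)

print(f"threshold 0.5: precision={p_default:.2f}, recall={r_default:.2f}")
print(f"threshold 0.8: precision={p_strict:.2f}, recall={r_strict:.2f}")
```

A stricter threshold approves fewer applicants but makes fewer risky approvals; where to set it is a business decision rather than a modeling one.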


8. ROC–AUC for Approval Ranking

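ROC–AUC measures how well the model ranks approved applicants above rejected ones: it is the probability that a randomly chosen approved applicant receives a higher score than a randomly chosen rejected one.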

from sklearn.metrics import roc_auc_score

# Column 1 is the probability of model.classes_[1]; confirm this is the approved label.
y_proba = model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, y_proba)

print("ROC–AUC:", auc)

This metric is threshold-independent and widely used in credit modeling.
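The ranking interpretation can be verified by hand on a tiny example. The labels and scores below are made up for illustration; the sketch compares a direct pairwise count with roc_auc_score.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical labels (1 = approved) and model scores.
y_true = np.array([1, 1, 0, 1, 0, 0])
y_score = np.array([0.9, 0.8, 0.7, 0.6, 0.3, 0.2])

pos = y_score[y_true == 1]
neg = y_score[y_true == 0]

# Compare every (approved, rejected) pair; a tie counts as half a win.
diff = pos[:, None] - neg[None, :]
pairwise = ((diff > 0) + 0.5 * (diff == 0)).mean()

auc = roc_auc_score(y_true, y_score)

print(f"pairwise ranking rate={pairwise:.4f}, roc_auc_score={auc:.4f}")
```

Eight of the nine pairs are ordered correctly, and both calculations agree.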


Key Takeaways

  1. Loan approval prediction is a core real-world ML use case in finance.
  2. Data preprocessing plays a critical role in model reliability.
  3. Logistic Regression provides interpretable and stable risk scores.
  4. Precision and ROC–AUC are more meaningful than accuracy in credit decisions.
  5. ML assists decision-making but does not replace business rules or compliance checks.

Conclusion

Loan approval prediction showcases how machine learning supports high-stakes financial decisions.

By combining careful preprocessing, interpretable models, and appropriate evaluation metrics, ML systems can help institutions make faster, more consistent, and data-driven credit decisions.

This project represents a realistic end-to-end fintech ML workflow, making it a strong addition to the AI with Python – Real-World Mini Projects (Advanced) series.

