🧠 AI with Python – 🛠️ Handle missing data with SimpleImputer

Posted on: July 17, 2025

Description:

Why Care About Missing Data?

Missing values can break your ML pipeline: many scikit-learn estimators raise an error as soon as their input contains NaN, and handling the gaps poorly can skew model performance.
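As a quick illustration, here is a minimal sketch with made-up toy data. LinearRegression is an arbitrary example estimator. Fitting on a feature matrix that contains a NaN fails during input validation, before any training happens:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data invented for this demo: one NaN in the feature matrix
X = np.array([[1.0], [2.0], [np.nan], [4.0]])
y = np.array([2.0, 4.0, 6.0, 8.0])

error_message = None
try:
    LinearRegression().fit(X, y)
except ValueError as exc:
    # scikit-learn validates the input and rejects NaN values
    error_message = str(exc)

print(error_message)
```

The exact wording of the error depends on the scikit-learn version, but it is a ValueError that mentions NaN.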

Solution: SimpleImputer

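SimpleImputer (from sklearn.impute) fills missing values using the strategy you choose with its strategy parameter:

Mean, strategy='mean' (for numerical data)
Median, strategy='median' (robust for skewed data)
Most frequent value, strategy='most_frequent' (for categorical data)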
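To compare these strategies, here is a minimal sketch on made-up data. The Income and Color columns are hypothetical and exist only for this example:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical skewed numeric column: one large outlier, one missing value
income = pd.DataFrame({"Income": [30.0, 32.0, 35.0, np.nan, 400.0]})

filled_values = {}
for strategy in ["mean", "median"]:
    imputer = SimpleImputer(strategy=strategy)
    filled = imputer.fit_transform(income)
    filled_values[strategy] = filled[3, 0]  # the value that replaced the NaN

print(filled_values)

# Hypothetical categorical column with one missing value
colors = pd.DataFrame({"Color": ["red", "blue", np.nan, "red"]})
cat_imputer = SimpleImputer(strategy="most_frequent")
filled_colors = cat_imputer.fit_transform(colors).ravel()
print(filled_colors)
```

The mean strategy fills the gap with 124.25 because the outlier pulls the average up, while the median strategy fills it with 33.5, which is much closer to the typical values. For the Color column, most_frequent fills the gap with "red", the value that appears most often.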

Example (mean strategy): NaN → mean of the non-missing values in the same column
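This rule can be checked by hand. The sketch below reuses the Age/Salary values from the code snippet further down. It computes each column's mean over the observed values with pandas, fills the NaNs with those means, and compares the result with what SimpleImputer learns (its statistics_ attribute):

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({"Age": [25, 30, np.nan, 45, 50],
                   "Salary": [50000, np.nan, 60000, 80000, np.nan]})

# By hand: mean of the observed values per column (pandas skips NaN by default)
manual_means = df.mean()
# Apply the rule: replace every NaN with its column's mean
manual_filled = df.fillna(manual_means)

imputer = SimpleImputer(strategy="mean")
sk_filled = imputer.fit_transform(df)

print(manual_means.to_numpy())   # Age: 150 / 4, Salary: 190000 / 3
print(imputer.statistics_)       # the per-column values SimpleImputer learned
print(np.allclose(manual_filled.to_numpy(), sk_filled))
```

For Age the observed values sum to 150 over 4 entries (mean 37.5); for Salary they sum to 190000 over 3 entries (mean about 63333.33).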

Where Is It Useful?

Datasets with occasional NaN values in numeric or categorical columns.
When dropping rows isn’t an option, for example because the dataset is small.
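To see why dropping rows can be costly, here is a small sketch using the same toy values as the code snippet below. It compares pandas dropna with mean imputation:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Same toy values as the post's code snippet
df = pd.DataFrame({"Age": [25, 30, np.nan, 45, 50],
                   "Salary": [50000, np.nan, 60000, 80000, np.nan]})

# Option 1: drop every row that has at least one NaN
dropped = df.dropna()

# Option 2: impute, which keeps all rows
imputed = SimpleImputer(strategy="mean").fit_transform(df)

print(f"rows after dropna:     {len(dropped)} of {len(df)}")
print(f"rows after imputation: {imputed.shape[0]} of {len(df)}")
```

Here dropna keeps only 2 of the 5 rows, because three rows each have one missing value. Imputation keeps all 5.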

Practical Takeaway

Never leave missing values untreated. Imputing keeps every row, so you don't throw away the observed values in incomplete records, and it gives your model NaN-free input to train on.
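One way to apply this in practice is to put the imputer inside a scikit-learn Pipeline. This is a sketch, not the post's own code. With this setup, the fill values are learned from the training data only and reused on new data. LinearRegression and the toy numbers are illustrative choices:

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Toy training data (invented): y = 2 * x, and the last row's x is missing
X_train = np.array([[1.0], [2.0], [3.0], [np.nan]])
y_train = np.array([2.0, 4.0, 6.0, 4.0])

# The imputer learns the column mean from the training data only
model = make_pipeline(SimpleImputer(strategy="mean"), LinearRegression())
model.fit(X_train, y_train)

# make_pipeline names each step after its class, in lowercase
learned_mean = model.named_steps["simpleimputer"].statistics_[0]
print("mean learned from training data:", learned_mean)  # (1 + 2 + 3) / 3

# New data with a NaN: it is filled with the training mean, then predicted
X_new = np.array([[np.nan], [10.0]])
predictions = model.predict(X_new)
print("predictions:", predictions)
```

Because the missing training value is filled with 2.0, the imputed training data follows y = 2x exactly. The NaN in the new data is also filled with 2.0, so the predictions are 4.0 and 20.0.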

Code Snippet:

# Import SimpleImputer to handle missing values
# Import pandas to create and display the dataset
from sklearn.impute import SimpleImputer
import pandas as pd

# Create a dataset with some missing values (pandas stores None as NaN in numeric columns)
data = {
    'Age': [25, 30, None, 45, 50],
    'Salary': [50000, None, 60000, 80000, None]
}

df = pd.DataFrame(data)

# Initialize the SimpleImputer with mean strategy
imputer = SimpleImputer(strategy='mean')

# Fit and transform the data
imputed_array = imputer.fit_transform(df)

# Convert the imputed NumPy array back to a DataFrame
# Keep the original column names and row index
imputed_df = pd.DataFrame(imputed_array, columns=df.columns, index=df.index)

print("Original Data (with NaNs):")
print(df)

print("\nImputed Data (filled using mean):")
print(imputed_df)
