🧠 AI with Python – 🛠️ Handle missing data with SimpleImputer
Posted On: July 17, 2025
Description:
Why Care About Missing Data?
Missing values can break your ML pipeline: many estimators raise an error on NaN input, and handling the gaps carelessly can skew model performance.
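To make the "training errors" point concrete, here is a minimal sketch with made-up numbers (the arrays are illustrative assumptions, not data from this post). It tries to fit scikit-learn's LinearRegression on input that contains a NaN. Note that some estimators, such as the histogram-based gradient boosting models, accept NaN natively, so this failure is typical rather than universal.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data: one feature with a single missing value
X = np.array([[25.0], [30.0], [np.nan], [45.0]])
y = np.array([50.0, 55.0, 60.0, 80.0])

try:
    LinearRegression().fit(X, y)
    fit_failed = False
except ValueError as err:
    # LinearRegression validates its input and rejects NaN
    fit_failed = True
    print("Training failed:", err)
```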
Solution: SimpleImputer
SimpleImputer fills missing values with a strategy of your choice:
- Mean (strategy='mean', for numerical data)
- Median (strategy='median', robust for skewed data)
- Most frequent value (strategy='most_frequent', also works for categorical data)
For example, with the mean strategy: NaN → mean(column)
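The other two strategies can be sketched the same way. The DataFrame below is a hypothetical example (the column names "Income" and "City" and all values are assumptions chosen for illustration): one numeric column with an outlier, where the median is a safer fill than the mean, and one text column filled with its most frequent value.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy data: a skewed numeric column and a categorical column
toy = pd.DataFrame({
    "Income": [30000.0, 32000.0, np.nan, 31000.0, 500000.0],
    "City": pd.Series(["Pune", "Delhi", "Pune", np.nan, "Pune"], dtype=object),
})

# Median is far less affected by the 500000 outlier than the mean would be
median_imputer = SimpleImputer(strategy="median")
income_filled = median_imputer.fit_transform(toy[["Income"]])

# most_frequent fills the gap with the most common value in the column
mode_imputer = SimpleImputer(strategy="most_frequent")
city_filled = mode_imputer.fit_transform(toy[["City"]])

print(income_filled.ravel())
print(city_filled.ravel())
```

Here the missing income is filled with the median of the four observed values, and the missing city with the value that occurs most often.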
Where Is It Useful?
- Datasets with occasional NaN values in numeric or categorical columns.
- When dropping rows isn’t an option (due to small dataset size).
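A rough way to see why dropping rows can be costly on a small dataset is to compare row counts. The sketch below uses the same Age/Salary values as the snippet in this post; the variable names are illustrative.

```python
import pandas as pd
from sklearn.impute import SimpleImputer

# Same five rows used in the code snippet of this post
small_df = pd.DataFrame({
    "Age": [25, 30, None, 45, 50],
    "Salary": [50000, None, 60000, 80000, None],
})

# Option 1: drop every row that has at least one missing value
dropped = small_df.dropna()

# Option 2: keep all rows and fill the gaps with column means
imputed = pd.DataFrame(
    SimpleImputer(strategy="mean").fit_transform(small_df),
    columns=small_df.columns,
)

print("Rows before:", len(small_df))
print("Rows after dropna:", len(dropped))
print("Rows after imputing:", len(imputed))
```

With this data, only the rows where both columns are present survive dropna, while imputing keeps all of them.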
Practical Takeaway
Don't leave missing values untreated. Imputing gives you a complete dataset to train on without discarding rows that still carry useful information.
Code Snippet:
# Import SimpleImputer to handle missing values
from sklearn.impute import SimpleImputer
# Import pandas to create and display the dataset
import pandas as pd
# Create a dataset with some missing values (NaN)
data = {
'Age': [25, 30, None, 45, 50],
'Salary': [50000, None, 60000, 80000, None]
}
df = pd.DataFrame(data)
df
# Initialize the SimpleImputer with mean strategy
imputer = SimpleImputer(strategy='mean')
# Fit and transform the data
imputed_array = imputer.fit_transform(df)
# Convert the imputed NumPy array back to a DataFrame
# Use the original column names
imputed_df = pd.DataFrame(imputed_array, columns=df.columns)
imputed_df
print("Original Data (with NaNs):")
print(df)
print("\nImputed Data (filled using mean):")
print(imputed_df)
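As a follow-up sketch, the fitted imputer stores the value it learned for each column in its statistics_ attribute, and transform reuses those values on data it has not seen. The new_rows frame below is a hypothetical example added for illustration; the training values match the snippet above.

```python
import pandas as pd
from sklearn.impute import SimpleImputer

# Same data as the snippet above, treated here as training data
train_df = pd.DataFrame({
    "Age": [25, 30, None, 45, 50],
    "Salary": [50000, None, 60000, 80000, None],
})

mean_imputer = SimpleImputer(strategy="mean")
mean_imputer.fit(train_df)

# Per-column means learned during fit, in column order (Age, Salary)
print("Learned means:", mean_imputer.statistics_)

# Hypothetical new rows: gaps are filled with the means learned from train_df
new_rows = pd.DataFrame({"Age": [None, 40], "Salary": [70000, None]})
new_filled = pd.DataFrame(
    mean_imputer.transform(new_rows),
    columns=new_rows.columns,
)
print(new_filled)
```

Fitting once and then calling transform keeps the fill values consistent between the data the imputer was fitted on and any later data.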