🧠 AI with Python – ⚖️ Standardize Your Data with StandardScaler


Description:

Why Standardization?

Many machine learning algorithms perform better when input features are centered around zero and have unit variance. Unlike normalization (min–max scaling to the 0–1 range), standardization transforms each feature to have a mean of 0 and a standard deviation of 1.
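To see the difference in practice, here is a minimal sketch (the values are illustrative, not from the article) comparing min–max normalization with standardization on a small feature that contains an outlier:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# A single feature with an outlier (illustrative values)
x = np.array([[1.0], [2.0], [3.0], [100.0]])

# Min-max normalization squeezes every value into [0, 1]
normalized = MinMaxScaler().fit_transform(x)

# Standardization centers the feature on 0 with unit variance
standardized = StandardScaler().fit_transform(x)

print(normalized.ravel())                        # all values in [0, 1]
print(standardized.mean(), standardized.std())   # ≈ 0.0 and 1.0
```

Both rescale the data, but only standardization guarantees zero mean and unit variance, which is what distance- and gradient-based methods typically assume.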


What is StandardScaler?

StandardScaler from scikit-learn transforms each feature as:

X' = (X - μ) / σ

where μ is the mean and σ is the standard deviation of the feature, computed column-wise.
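As a sanity check, the formula can be reproduced by hand with NumPy and compared against scikit-learn's result (illustrative values; note that StandardScaler uses the population standard deviation, i.e. ddof=0):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[10.0], [20.0], [30.0]])

# Manual computation: subtract the mean, divide by the (population) std
mu = X.mean(axis=0)
sigma = X.std(axis=0)          # ddof=0, matching StandardScaler's default
manual = (X - mu) / sigma

# scikit-learn's result should match
scaled = StandardScaler().fit_transform(X)
print(np.allclose(manual, scaled))  # True
```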

Where Is It Useful?

  • Linear Regression, Logistic Regression
  • PCA (Principal Component Analysis)
  • Neural Networks
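For models like these, a common pattern is to bundle the scaler and the estimator in a scikit-learn Pipeline, so that scaling is learned only from the training split. A minimal sketch, using a synthetic dataset for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative synthetic data (not from the article)
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The pipeline fits the scaler on the training data,
# then applies the same scaling before the model sees the features
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)
print(model.score(X_test, y_test))
```

The pipeline also applies the stored scaling automatically at prediction time, so you cannot forget to transform new data.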

Practical Takeaway

When your model is sensitive to the scale or distribution of its input features (as PCA and regression-based models are), standardization puts all features on a comparable scale, prevents large-magnitude features from dominating, and often improves accuracy and training stability.


Code Snippet:

# Import StandardScaler for standardization
# Import pandas for handling tabular data
from sklearn.preprocessing import StandardScaler
import pandas as pd

# Create a sample dataset
data = {
    'Age': [22, 25, 47, 52, 46],        # Small scale feature
    'Income': [18000, 24000, 52000, 58000, 60000]  # Large scale feature
}

# Convert the dictionary into a pandas DataFrame
df = pd.DataFrame(data)

# Initialize the scaler
scaler = StandardScaler()

# Fit the scaler on the data and transform it
# Returns a NumPy array with standardized values
standardized = scaler.fit_transform(df)

# Convert the standardized array back into a DataFrame
# Use original column names for readability
standardized_df = pd.DataFrame(standardized, columns=df.columns)

# Compare the original and standardized values
print("Original Data:")
print(df)

print("\nStandardized Data:")
print(standardized_df)
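One caveat the snippet above does not cover: in a real train/test workflow, fit the scaler on the training data only, then reuse its learned statistics on the test data, so that no information from the test set leaks into preprocessing. A sketch with an illustrative manual split:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Illustrative split; in practice, use train_test_split
X_train = np.array([[22, 18000], [25, 24000], [47, 52000]], dtype=float)
X_test = np.array([[52, 58000], [46, 60000]], dtype=float)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mu and sigma from training data only
X_test_scaled = scaler.transform(X_test)        # reuse the same mu and sigma

print(X_train_scaled.mean(axis=0))  # ≈ [0, 0]
print(X_test_scaled)                # scaled with the training statistics
```

Calling fit_transform on the test set instead would silently recompute μ and σ from test data, a common source of data leakage.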
