🧠 AI with Python – 🔤 Encode Categorical Data Using LabelEncoder

Posted On: July 15, 2025

Description:

The Problem with Categorical Features

Most machine learning models expect numeric input, not text. Yet real-world datasets often include non-numeric features such as “Country”, “Gender”, or “Category”.

Solution: Label Encoding

LabelEncoder from scikit-learn converts categorical values into integer codes. The codes are assigned in sorted order of the distinct values, not in order of appearance. Example:

['Red', 'Blue', 'Red'] → [1, 0, 1]
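
To make the code assignment concrete, here is a minimal sketch using made-up size labels (the values and variable names are illustrative assumptions, not part of the original example). It suggests that the integer codes follow alphabetical order of the labels rather than any real-world ordering:

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample values, chosen so that alphabetical order
# differs from the natural Small < Medium < Large order
sizes = ['Small', 'Large', 'Medium', 'Small']

encoder = LabelEncoder()
codes = encoder.fit_transform(sizes)

# classes_ holds the distinct labels in sorted order;
# each label's position in classes_ is its integer code
print(list(encoder.classes_))  # ['Large', 'Medium', 'Small']
print(codes.tolist())          # [2, 0, 1, 2]
```

Because 'Small' ends up with the largest code, the numbers should not be read as a ranking.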

Where Is It Useful?

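With tree-based models like Decision Trees and Random Forests, which split on thresholds and so tolerate arbitrary integer codes better than linear or distance-based models.
As a preprocessing step before One-Hot Encoding (required by older scikit-learn versions; current versions of OneHotEncoder accept strings directly).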
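
For models that would read integer codes as magnitudes, one-hot encoding is a common alternative. The following is a minimal sketch, assuming a small made-up city column (the DataFrame and variable names are illustrative). It passes the strings straight to OneHotEncoder, which current scikit-learn releases allow:

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical sample data with one categorical column
cities = pd.DataFrame({'City': ['New York', 'Paris', 'Berlin', 'Paris']})

# OneHotEncoder returns a sparse matrix by default; convert it to a dense array
ohe = OneHotEncoder()
one_hot = ohe.fit_transform(cities[['City']]).toarray()

# categories_[0] lists the distinct values of the first (only) column, sorted
columns = list(ohe.categories_[0])
one_hot_df = pd.DataFrame(one_hot, columns=columns, dtype=int)
print(one_hot_df)
```

Each row contains a single 1 in the column matching its city, so no ordering between cities is implied.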

Practical Takeaway

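Always convert categorical data to numbers before model training. Label encoding is fast and easy to implement, but the integer codes imply an order that nominal categories do not have, so use it with tree-based models or follow it with one-hot encoding. Note that scikit-learn documents LabelEncoder as a tool for target labels (y); for input features, OrdinalEncoder provides the same kind of mapping.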

Code Snippet:

# Import LabelEncoder for encoding categories
from sklearn.preprocessing import LabelEncoder
# Import pandas to create and manage the dataset
import pandas as pd

# Create a sample dataset with a categorical column
data = {'City': ['New York', 'Paris', 'Berlin', 'Paris', 'Berlin', 'New York']}
df = pd.DataFrame(data)

# Display the original dataset (print so it also shows outside a notebook)
print(df)

# Initialize the encoder
le = LabelEncoder()

# Apply encoding on the 'City' column
df['City_encoded'] = le.fit_transform(df['City'])

# Display the DataFrame with encoded values
print(df)

# View the mapping of labels to encoded values
# (each label's code is its position in le.classes_)
label_map = {label: code for code, label in enumerate(le.classes_)}
print("Label Mapping:", label_map)
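
As a follow-up, the fitted encoder can also map codes back to labels, and it rejects labels it has not seen. This is a minimal sketch, assuming the same three city names; `city_encoder` and 'Tokyo' are illustrative names introduced here:

```python
from sklearn.preprocessing import LabelEncoder

# Fit a separate encoder on the three city names used above
city_encoder = LabelEncoder()
city_encoder.fit(['New York', 'Paris', 'Berlin'])

# Decode integer codes back to the original labels
decoded = city_encoder.inverse_transform([0, 2, 1])
print(list(decoded))  # ['Berlin', 'Paris', 'New York']

# A label that was not present during fit raises ValueError
try:
    city_encoder.transform(['Tokyo'])
    unseen_accepted = True
except ValueError as err:
    unseen_accepted = False
    print("Unseen label:", err)
```

In practice this means the encoder should be fitted on training data that covers every category expected at prediction time, or unseen values need separate handling.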
