🧠 AI with Python – 🔤 Encode Categorical Data Using LabelEncoder
Posted On: July 15, 2025
Description:
The Problem with Categorical Features
Machine learning models work with numbers, not text. But real-world datasets often include features like “Country”, “Gender”, or “Category” that are non-numeric.
Solution: Label Encoding
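LabelEncoder from scikit-learn converts categorical values into integer codes. Codes are assigned by the sorted order of the distinct labels, not by order of appearance. Example:
['Red', 'Blue', 'Red'] → [1, 0, 1] ('Blue' → 0, 'Red' → 1)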
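The codes follow the sorted order of the labels, which is easy to confirm with a quick check. This is a minimal sketch; the variable names `colors`, `encoder`, and `codes` are illustrative only.

```python
from sklearn.preprocessing import LabelEncoder

colors = ['Red', 'Blue', 'Red']

encoder = LabelEncoder()
codes = encoder.fit_transform(colors)

# classes_ holds the distinct labels in sorted order; the index is the code
print(encoder.classes_.tolist())  # ['Blue', 'Red']
print(codes.tolist())             # [1, 0, 1]
```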
Where Is It Useful?
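- Tree-based models like Decision Trees and Random Forests, which split on thresholds and are less sensitive to the arbitrary order that integer codes imply.
- As a preprocessing step before One-Hot Encoding (older scikit-learn versions required integer input; current versions of OneHotEncoder accept strings directly).
- Encoding the target column of a classification problem, which is the use LabelEncoder is designed for according to the scikit-learn documentation.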
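For feature columns, scikit-learn also provides OrdinalEncoder (integer codes for 2D input) and OneHotEncoder. The sketch below shows both on a small hypothetical `City` column; it assumes a scikit-learn version in which OneHotEncoder accepts string input and returns a sparse matrix by default.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical sample data for illustration
df = pd.DataFrame({'City': ['New York', 'Paris', 'Berlin', 'Paris']})

# OrdinalEncoder expects a 2D input, hence the double brackets
ordinal = OrdinalEncoder()
city_codes = ordinal.fit_transform(df[['City']])

# OneHotEncoder returns a sparse matrix by default; convert it for display
onehot = OneHotEncoder()
city_onehot = onehot.fit_transform(df[['City']]).toarray()

print(onehot.categories_[0].tolist())  # ['Berlin', 'New York', 'Paris']
print(city_codes.ravel().tolist())     # [1.0, 2.0, 0.0, 2.0]
print(city_onehot.tolist())
```

One-hot columns carry no implied order, which is why they are often preferred for nominal features in linear or distance-based models.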
Practical Takeaway
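Convert categorical data to numbers before model training. Label encoding is fast and easy to implement, but the integer codes imply an order that nominal categories do not have. That is usually harmless for tree-based models; for linear or distance-based models, One-Hot Encoding is typically the safer choice.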
Code Snippet:
# Import LabelEncoder for encoding categories
from sklearn.preprocessing import LabelEncoder
# Import pandas to create and manage the dataset
import pandas as pd
# Create a sample dataset with a categorical column
data = {'City': ['New York', 'Paris', 'Berlin', 'Paris', 'Berlin', 'New York']}
df = pd.DataFrame(data)
# Display the original dataset
print(df)
# Initialize the encoder
le = LabelEncoder()
# Apply encoding on the 'City' column
df['City_encoded'] = le.fit_transform(df['City'])
# Display the DataFrame with encoded values
print(df)
# View the mapping of labels to encoded values
label_map = {label: code for code, label in enumerate(le.classes_.tolist())}
print("Label Mapping:", label_map)
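Two follow-up steps often come up after encoding: decoding integers back to labels, and dealing with a label the encoder never saw during fitting. The sketch below illustrates both; the city 'Tokyo' and the variable names `decoded` and `unseen_error` are made up for the example.

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
le.fit(['New York', 'Paris', 'Berlin'])

# Decode integer codes back to the original labels
decoded = le.inverse_transform([0, 2, 1]).tolist()
print(decoded)  # ['Berlin', 'Paris', 'New York']

# A label that was not present during fit raises ValueError
unseen_error = None
try:
    le.transform(['Tokyo'])
except ValueError as err:
    unseen_error = str(err)
    print("Unseen label:", unseen_error)
```

Because of this behavior, it helps to fit the encoder on data that covers every expected category, or to handle unseen values explicitly before calling transform.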