🧠 AI with Python – 🔤 Encode Categorical Data Using LabelEncoder


Description:

The Problem with Categorical Features

Most machine learning models operate on numbers, not text. But real-world datasets often include non-numeric features such as “Country”, “Gender”, or “Category”.


Solution: Label Encoding

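LabelEncoder from scikit-learn converts categorical values into integer codes. It sorts the distinct values and numbers them from 0, so the codes follow alphabetical order rather than order of appearance. Example:

['Red', 'Blue', 'Red'] → [1, 0, 1]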
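
A quick way to check what the encoder does with this list is to run it and inspect `classes_`. This is a minimal sketch; the variable names are only illustrative:

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
codes = le.fit_transform(["Red", "Blue", "Red"])

# classes_ holds the sorted unique labels; a label's position is its code
print(list(le.classes_))  # ['Blue', 'Red']
print(codes.tolist())     # [1, 0, 1]
```

Because 'Blue' sorts before 'Red', it receives code 0 even though 'Red' appears first in the input.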

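Where It’s Useful

  • Tree-based models like Decision Trees and Random Forests. They split on thresholds, so the arbitrary order of the integer codes tends to matter less than it does for linear or distance-based models.
  • As a preprocessing step before One-Hot Encoding in workflows that expect integer input. Recent scikit-learn versions of OneHotEncoder also accept string columns directly.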
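
For comparison, here is a small sketch of One-Hot Encoding applied directly to a string column, assuming a reasonably recent scikit-learn. The sample `City` values are made up for illustration:

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Illustrative data, not taken from a real dataset
df = pd.DataFrame({"City": ["New York", "Paris", "Berlin", "Paris"]})

ohe = OneHotEncoder()
# fit_transform returns a sparse matrix by default; toarray() makes it dense
one_hot = ohe.fit_transform(df[["City"]]).toarray()

print(list(ohe.categories_[0]))  # column order of the one-hot matrix
print(one_hot)
```

Each row contains a single 1 in the column of its category, so no ordering between cities is implied.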

Practical Takeaway

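Convert categorical data to numbers before training any model that requires numeric input. Label encoding is fast and easy to implement, but the integer codes imply an order that nominal categories do not have. That is usually acceptable for tree-based models and for target labels, while linear or distance-based models are generally better served by One-Hot Encoding.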
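
One related point: scikit-learn documents LabelEncoder as a transformer for target labels (y). For input feature columns, OrdinalEncoder produces the same kind of integer codes and handles several columns at once. A minimal sketch, with a made-up two-column DataFrame:

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical feature table used only for this example
df = pd.DataFrame({
    "City": ["New York", "Paris", "Berlin"],
    "Size": ["S", "L", "M"],
})

enc = OrdinalEncoder()
# Each column gets its own sorted category list and its own codes
codes = enc.fit_transform(df[["City", "Size"]])

print(enc.categories_)
print(codes)
```

The result is a float array with one column of codes per input column.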


Code Snippet:

# Import LabelEncoder for encoding categories
from sklearn.preprocessing import LabelEncoder
# Import pandas to create and manage the dataset
import pandas as pd

# Create a sample dataset with a categorical column
data = {'City': ['New York', 'Paris', 'Berlin', 'Paris', 'Berlin', 'New York']}
df = pd.DataFrame(data)

# Display the original dataset (print so it also shows outside a notebook)
print(df)

# Initialize the encoder
le = LabelEncoder()

# Apply encoding on the 'City' column
df['City_encoded'] = le.fit_transform(df['City'])

# Display the DataFrame with encoded values
print(df)

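# View the mapping of labels to encoded values
# (cast to plain str/int so the printed dict is free of NumPy type wrappers)
label_map = {str(label): int(code)
             for label, code in zip(le.classes_, le.transform(le.classes_))}
print("Label Mapping:", label_map)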
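
Two follow-up operations are often useful once the encoder is fitted: turning codes back into labels, and seeing what happens with a label that was not present during fitting. The sketch below is self-contained; "Tokyo" is just an example of an unseen value:

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
codes = le.fit_transform(["New York", "Paris", "Berlin", "Paris"])

# Map integer codes back to the original labels
restored = le.inverse_transform(codes)
print(list(restored))

# transform() rejects labels that were not present during fit()
try:
    le.transform(["Tokyo"])
    unseen_ok = True
except ValueError as err:
    unseen_ok = False
    print("Unseen label:", err)
```

In practice this means the encoder should be fitted on data that covers every category expected at prediction time, or unseen values need to be handled before calling transform().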
