🧠 AI with Python – 🧪 Train/Test Split using train_test_split
Posted On: July 22, 2025
Description:
Why Split Data?
To evaluate how well your model generalizes, you need separate training and testing sets. If you train and evaluate on the same data, an overfit model (one that memorizes the training examples but fails on new data) will still look accurate, so the problem goes unnoticed.
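As a rough illustration of this point, the sketch below fits an unconstrained decision tree and scores it on both the rows it was trained on and the rows it never saw. The synthetic dataset from make_classification, the choice of DecisionTreeClassifier, and all parameter values are assumptions made for the example, not part of the original post.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data (an assumption for illustration); flip_y adds label noise
X, y = make_classification(
    n_samples=300, n_features=10, flip_y=0.2, random_state=0
)

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# A tree with no depth limit can memorize the training rows
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

train_acc = model.score(X_train, y_train)  # accuracy on seen data
test_acc = model.score(X_test, y_test)     # accuracy on unseen data

print(f"Train accuracy: {train_acc:.2f}")
print(f"Test accuracy:  {test_acc:.2f}")
```

The training score alone would suggest a near-perfect model. Only the held-out score shows how it is likely to behave on new data.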
The train_test_split Method
Scikit-learn’s train_test_split quickly divides data into:
- Training set (for model training)
- Testing set (for performance evaluation)
Dataset → Train (80%) + Test (20%), which corresponds to test_size=0.2
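The 80/20 proportions can be checked by looking at the shapes of the returned pieces. This is a minimal sketch on a made-up NumPy array of 50 rows; the array contents and the variable names are placeholders chosen for the example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 50 rows with 2 features each, plus an alternating 0/1 target (placeholder data)
X = np.arange(100).reshape(50, 2)
y = np.arange(50) % 2

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape)  # (40, 2) -> 80% of the rows
print(X_test.shape)   # (10, 2) -> 20% of the rows
print(y_train.shape)  # (40,)
print(y_test.shape)   # (10,)
```

Features and target are split with the same row assignment, so each row in X_test still lines up with its label in y_test.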
Practical Takeaway
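Always keep a dedicated testing dataset and never use it for training or tuning. A simple train_test_split then gives an estimate of real-world performance that is not inflated by data the model has already seen, although with a very small test set that estimate will be noisy.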
Code Snippet:
# Import train_test_split from model_selection for splitting your dataset
from sklearn.model_selection import train_test_split
import pandas as pd
# Creating a simple DataFrame
data = {
    'Age': [22, 25, 47, 52, 46],
    'Salary': [18000, 24000, 52000, 58000, 60000],
    'Purchased': [0, 0, 1, 1, 1]
}
df = pd.DataFrame(data)
print("Original Data:")
print(df)
X = df[['Age', 'Salary']]  # Features
y = df['Purchased']  # Target
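# Hold out 20% of the rows for testing (1 of the 5 rows here);
# random_state fixes the shuffle so the split is reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)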
print("X_train:")
print(X_train)
print("\nX_test:")
print(X_test)
print("\ny_train:")
print(y_train)
print("\ny_test:")
print(y_test)
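The snippet above passes random_state=42. As a small sketch of what that argument does, the example below runs the same split twice with the same seed and checks that the test rows come out identical. The array and the variable names are placeholders for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 10 rows, 2 features, alternating 0/1 target
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)

# Two independent calls with the same seed
a_train, a_test, _, _ = train_test_split(X, y, test_size=0.2, random_state=42)
b_train, b_test, _, _ = train_test_split(X, y, test_size=0.2, random_state=42)

# The same seed produces the same shuffle, so both splits match
same_split = np.array_equal(a_train, b_train) and np.array_equal(a_test, b_test)
print(same_split)  # True
```

Without random_state the rows are shuffled differently on each run, so results can vary between runs of the same script.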