🧠 AI with Python – 🍷🌲 Predict Wine Quality using RandomForestClassifier
Posted on: November 4, 2025
Description:
Wine tasting might be an art — but what if we let machine learning do the judging? 🍷
In this post, we’ll use a Random Forest Classifier to predict the quality of wine based on its chemical properties such as acidity, residual sugar, and alcohol content.
The goal is to explore how ensemble learning can classify wines into quality tiers — demonstrating the power of Random Forests in handling structured data.
Why Random Forests?
Random Forests are ensemble models that combine multiple decision trees to improve accuracy and generalization.
Each tree learns from a random subset of features and data samples, reducing overfitting and improving robustness.
Advantages of Random Forests:
- Handle both classification and regression tasks
- Tolerate missing values and noise
- Provide built-in feature importance scores
- Require little to no feature scaling
These qualities make them an excellent choice for tabular datasets like wine quality analysis; the short comparison below shows the ensemble effect in action.
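Here's a minimal sketch (nothing beyond scikit-learn needed) that cross-validates a single decision tree against a 300-tree forest on the built-in wine data:

# Compare one tree vs. an ensemble of trees via 5-fold cross-validation
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)

tree = DecisionTreeClassifier(random_state=42)
forest = RandomForestClassifier(n_estimators=300, random_state=42)

print(f"Single tree  : {cross_val_score(tree, X, y, cv=5).mean():.3f}")
print(f"Random forest: {cross_val_score(forest, X, y, cv=5).mean():.3f}")

On this dataset the forest usually posts a clearly higher mean score, which is exactly the variance reduction that ensembling buys you.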
Dataset Overview
The original Wine Quality dataset (from the UCI Machine Learning Repository) contains two variants — red and white wines.
Each record includes physicochemical properties (inputs) and a quality score (output), rated on a 0–10 scale; in practice the observed scores range from about 3 to 9.
If the dataset isn’t available locally, the script uses scikit-learn’s load_wine() dataset as a fallback.
This ensures the notebook runs smoothly in any environment.
from sklearn.datasets import load_wine
import pandas as pd

# Load the built-in wine dataset as a DataFrame
wine = load_wine(as_frame=True)
df = wine.frame

# Separate the features (chemical measurements) from the class label
X = df.drop(columns=["target"])
y = df["target"]
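If you'd rather train on the real quality scores, the UCI CSVs can also be fetched over HTTP. A quick sketch (the URL below is the long-standing UCI mirror, but treat it as an assumption that may move):

import pandas as pd

# Red-wine variant; the UCI files are semicolon-separated
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv"
df_uci = pd.read_csv(url, sep=";")
print(df_uci.shape)  # expected (1599, 12): 11 physicochemical features + quality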
Training the Random Forest Model
Once the data is ready, we train a RandomForestClassifier with multiple trees (estimators) to make collective decisions.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hold out 20% for testing; stratify keeps class proportions consistent across splits
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)

# 300 trees vote on each prediction; n_jobs=-1 uses all CPU cores
model = RandomForestClassifier(n_estimators=300, random_state=42, n_jobs=-1)
model.fit(X_train, y_train)
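To see where we stand, we can score the trained model on the held-out split; a minimal check using scikit-learn's accuracy_score:

from sklearn.metrics import accuracy_score

# Predict on the unseen 20% and report plain accuracy
y_pred = model.predict(X_test)
print(f"Test accuracy: {accuracy_score(y_test, y_pred):.3f}")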
A high accuracy score indicates that the model effectively captures the relationship between chemical features and wine quality.
Feature Importance Visualization
One of the major strengths of Random Forests is their ability to rank features by importance.
This tells us which chemical properties contribute the most to quality prediction.
import pandas as pd
import matplotlib.pyplot as plt

# Impurity-based importances, averaged across all trees, largest first
importances = pd.Series(model.feature_importances_, index=X.columns).sort_values(ascending=False)

importances.plot(kind="barh", figsize=(8, 5), color="maroon")
plt.gca().invert_yaxis()  # barh plots bottom-up, so flip to put the top feature on top
plt.title("Top Feature Importances – Wine Quality Prediction")
plt.xlabel("Importance Score")
plt.tight_layout()
plt.show()
Interpretation:
- On the UCI data, features like alcohol content, volatile acidity, and sulphates often rank high.
- Higher alcohol and balanced acidity typically correlate with better-quality wines (see the quick check below).
- These insights help wineries and researchers understand what drives a higher quality score.
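As a quick sanity check of the alcohol claim, a sketch assuming the UCI winequality-red.csv is available locally; the mean alcohol content generally climbs with the quality score:

import pandas as pd

# Average alcohol content per quality score on the red-wine data
df = pd.read_csv("winequality-red.csv", sep=";")
print(df.groupby("quality")["alcohol"].mean().round(2))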
Key Takeaways
- Random Forests provide high accuracy and interpretability for structured datasets.
- No feature scaling is needed, simplifying the preprocessing workflow.
- The feature importance plot provides actionable insights into real-world properties.
- You can easily adapt the same approach for regression on the raw quality scores using RandomForestRegressor; a minimal sketch follows.
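For that last point, here is a minimal regression sketch, again assuming the UCI winequality-red.csv is available locally (the file path is an assumption; adjust as needed):

import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Treat quality as a continuous score instead of discrete classes
df = pd.read_csv("winequality-red.csv", sep=";")  # assumed local path
X, y = df.drop(columns=["quality"]), df["quality"].astype(float)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

reg = RandomForestRegressor(n_estimators=300, random_state=42, n_jobs=-1)
reg.fit(X_train, y_train)
print(f"R^2 on the test split: {reg.score(X_test, y_test):.3f}")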
Conclusion
Machine learning isn’t just for numbers and pixels — it can even evaluate taste. 🍷
By using a Random Forest Classifier, we can predict wine quality with impressive accuracy and understand which factors make a wine “better.”
This combination of accuracy, interpretability, and ease of use makes Random Forests one of the most practical algorithms in any ML toolkit.
Code Snippet:
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report, ConfusionMatrixDisplay
# Try local UCI CSVs first
csv_candidates = ["winequality-red.csv", "winequality-white.csv"]
csv_path = next((p for p in csv_candidates if os.path.exists(p)), None)

if csv_path:
    df = pd.read_csv(csv_path, sep=";")  # UCI CSVs are semicolon-separated
    print(f"✅ Using local dataset: {csv_path}")
    # UCI wine quality is an integer score (typically 3–9); we model it as multiclass classification
    X = df.drop(columns=["quality"])
    y = df["quality"].astype(int)
    feature_names = X.columns.tolist()
    dataset_name = f"UCI Wine Quality ({'red' if 'red' in csv_path else 'white'})"
else:
    print("⚠️ Local UCI CSV not found. Falling back to scikit-learn `load_wine()`.")
    wine = load_wine(as_frame=True)
    df = wine.frame.copy()
    # Use the built-in class labels (0/1/2) as quality tiers for a drop-in classification demo
    X = df.drop(columns=["target"])
    y = df["target"]
    feature_names = X.columns.tolist()
    dataset_name = "sklearn load_wine (class tiers as quality)"

df.head()

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

rf = RandomForestClassifier(
    n_estimators=300,    # more trees → stabler estimates
    max_depth=None,      # let trees grow fully; the ensemble curbs overfitting
    min_samples_leaf=1,
    random_state=42,
    n_jobs=-1
)
rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)
acc = accuracy_score(y_test, y_pred)
print(f"Dataset: {dataset_name}")
print(f"Accuracy: {acc:.3f}\n")
print("Classification Report:")
print(classification_report(y_test, y_pred))
disp = ConfusionMatrixDisplay.from_predictions(y_test, y_pred, cmap="Blues")
disp.ax_.set_title("Confusion Matrix — RandomForest (Wine Quality)")
plt.tight_layout()
plt.show()
importances = pd.Series(rf.feature_importances_, index=feature_names).sort_values(ascending=False)
plt.figure(figsize=(8, 5))
importances.head(15).plot(kind="barh")
plt.gca().invert_yaxis()
plt.title("Top Feature Importances — RandomForest")
plt.xlabel("Importance")
plt.tight_layout()
plt.show()
importances.head(15)