🧠 AI with Python – 🐍💾 Export & Load Model with joblib


Description:

Training a machine learning model takes time and compute. In real-world applications, you don’t want to retrain the model every time your app starts. Instead, you train once, save it, and reload it whenever needed.

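Scikit-learn's documentation covers joblib as one of its options for model persistence. joblib builds on pickle and is optimized for objects containing large NumPy arrays, which is what most fitted estimators hold. Because it is pickle-based, loading a file can execute arbitrary code, so only load model files from sources you trust.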


Saving and Loading a Model

Train a simple classifier, then export it with joblib (model here is the fitted classifier from the Code Snippet section at the end):

import joblib
joblib.dump(model, "iris_logreg_model.joblib")

Reload it later:

loaded_model = joblib.load("iris_logreg_model.joblib")
predictions = loaded_model.predict(X_test)

Why joblib?

  • Efficient for models that hold large NumPy arrays.
  • Lightweight & fast → one dump call to save and one load call to restore, with no extra setup for scikit-learn estimators.
  • Deployment-friendly → ship the model file with your app.
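
To make the "large NumPy arrays" point concrete, here is a minimal sketch that saves the same object with and without joblib's compress option and compares file sizes. The payload dictionary is a hypothetical stand-in for a fitted model, and the file names are made up for illustration; real models will compress far less than an array of zeros.

```python
import os
import tempfile

import joblib
import numpy as np

# Hypothetical stand-in for a fitted model: an object holding one large array.
payload = {"weights": np.zeros((1000, 1000)), "label": "demo"}

with tempfile.TemporaryDirectory() as tmp_dir:
    raw_path = os.path.join(tmp_dir, "payload.joblib")
    packed_path = os.path.join(tmp_dir, "payload_compressed.joblib")

    joblib.dump(payload, raw_path)                 # default: no compression
    joblib.dump(payload, packed_path, compress=3)  # compression level 3

    raw_size = os.path.getsize(raw_path)
    packed_size = os.path.getsize(packed_path)

    # Load the compressed file back into memory before the directory is removed.
    restored = joblib.load(packed_path)

print(f"uncompressed: {raw_size} bytes, compressed: {packed_size} bytes")
print("round trip OK:", np.array_equal(restored["weights"], payload["weights"]))
```

Compression trades a little CPU time at save and load for a smaller file, which can matter when the model ships inside an app bundle or container image.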

Best Practices for Saving Models

  • Save the entire pipeline (preprocessing + model) instead of only the model, to ensure consistent transformations at inference.

    joblib.dump(pipeline, "iris_pipeline.joblib")

  • Version your models → store with filenames like model_v1.joblib, model_v2.joblib to avoid overwriting and track improvements.

  • Store metadata → save additional info like training date, dataset version, and metrics in a JSON/YAML file alongside the model.
  • Test after loading → always run a quick prediction after reloading to ensure model integrity.
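
The four practices above can be sketched together in one short script. Treat this as an illustration under assumptions, not a fixed recipe: the StandardScaler step, the iris_pipeline_v1 file names, the metadata keys, and the use of a temporary directory are all choices made for this example. The scikit-learn version is recorded as an extra field because pickled estimators are not guaranteed to load across library versions.

```python
import json
import os
import tempfile
from datetime import date

import joblib
import sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# Preprocessing and model travel together in one object.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=200))
pipeline.fit(X_train, y_train)

version = "v1"  # hypothetical versioning scheme
with tempfile.TemporaryDirectory() as out_dir:  # temp dir keeps the sketch self-cleaning
    model_path = os.path.join(out_dir, f"iris_pipeline_{version}.joblib")
    meta_path = os.path.join(out_dir, f"iris_pipeline_{version}.json")

    joblib.dump(pipeline, model_path)

    # Metadata lives in a JSON file next to the model file.
    metadata = {
        "version": version,
        "trained_on": date.today().isoformat(),
        "dataset": "sklearn load_iris",
        "sklearn_version": sklearn.__version__,  # extra field, not in the list above
        "test_accuracy": float(pipeline.score(X_test, y_test)),
    }
    with open(meta_path, "w") as fh:
        json.dump(metadata, fh, indent=2)

    # Smoke test: reload both files and run a quick prediction.
    loaded_pipeline = joblib.load(model_path)
    with open(meta_path) as fh:
        loaded_meta = json.load(fh)

sample_predictions = loaded_pipeline.predict(X_test[:5])
matches_original = bool((sample_predictions == pipeline.predict(X_test[:5])).all())
print("Loaded version:", loaded_meta["version"])
print("Reloaded pipeline matches original:", matches_original)
```

In a real project the temporary directory would be replaced by a models folder or artifact store, and the smoke test would typically run in CI or at app startup.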

Key Takeaways

  • Save models once, reuse many times.
  • joblib works seamlessly with scikit-learn classifiers, regressors, and pipelines.
  • Crucial for production workflows where retraining is costly.

Code Snippet:

# Data & split
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Model
from sklearn.linear_model import LogisticRegression

# Save/load utility
import joblib


# Load the Iris dataset
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

# Train a Logistic Regression classifier
model = LogisticRegression(max_iter=200, random_state=42)
model.fit(X_train, y_train)


# Save model to disk
joblib.dump(model, "iris_logreg_model.joblib")
print("Model saved successfully!")


# Load model back from disk
loaded_model = joblib.load("iris_logreg_model.joblib")

# Use it for prediction
print("Predictions on first 5 test samples:", loaded_model.predict(X_test[:5]))
