AW Dev Rethought

⚖️ There are two ways of constructing a software design: one way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. - C.A.R. Hoare

🧠 AI with Python – 🐳 Running ML Inference Inside Docker


Description:

Once a machine learning model is containerized, the next critical step is to ensure that inference actually works inside the container — not just that the container runs.

In this project, we focus on validating ML inference from within a running Docker container by sending real prediction requests and confirming consistent outputs. This step ensures your deployment is not just build-ready, but runtime-ready.


Understanding the Problem

It’s common to successfully build a Docker image but still encounter runtime issues such as:

  • model not loading correctly
  • missing dependencies
  • incorrect input shapes
  • API endpoints failing inside containers

That’s why running and testing inference inside Docker is a mandatory checkpoint before cloud deployment.

This step answers one key question:

Does my ML model behave the same way inside Docker as it does locally?


1. Install Required Packages

Before container execution, we ensure the inference API works locally.

pip install fastapi uvicorn scikit-learn joblib numpy

This keeps the workflow consistent with earlier AI with Python scripts.
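
As a quick optional sanity check (not part of the original workflow), you can confirm that the installed packages resolve and record their versions; pinning the same versions inside the container later helps keep local and containerized behavior aligned.

# Optional sanity check: confirm the packages resolve and note their versions
# so the same versions can be pinned in the Docker image later.
from importlib.metadata import version

for pkg in ["fastapi", "uvicorn", "scikit-learn", "joblib", "numpy"]:
    print(f"{pkg}=={version(pkg)}")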


2. Load the Trained Model in the API

The model is loaded once when the application starts, ensuring efficient inference.

import joblib
import numpy as np
from fastapi import FastAPI
from pydantic import BaseModel

# Load the trained model once, at startup, so every request reuses it
model = joblib.load("iris_model.joblib")

app = FastAPI(title="ML Inference Inside Docker")

# Request schema: a flat list of feature values for a single sample
class InputData(BaseModel):
    features: list

Loading at startup avoids repeated disk reads during inference.
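
If iris_model.joblib is not already available from the earlier AI with Python script, a minimal sketch like the one below recreates a compatible artifact. The exact estimator here is an assumption; any scikit-learn classifier trained on the Iris dataset and saved with joblib will work with the API above.

# Minimal sketch to (re)create iris_model.joblib.
# Assumption: a simple scikit-learn classifier on the Iris dataset,
# standing in for whatever the earlier script produced.
import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200).fit(X, y)

joblib.dump(model, "iris_model.joblib")  # saved next to the API code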


3. Define the Inference Endpoint

We define a simple prediction endpoint that reshapes inputs correctly and returns the model output.

@app.post("/predict")
def predict(data: InputData):
    arr = np.array(data.features).reshape(1, -1)
    prediction = model.predict(arr).tolist()
    return {"prediction": prediction}

This endpoint is identical to the local inference logic — ensuring consistency.
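
Before building the image, a quick local check confirms the endpoint logic. The sketch below calls the endpoint function directly, bypassing HTTP, and assumes the API code above is saved as main.py; adjust the import if your module name differs.

# Local sanity check: call the endpoint function directly.
# Assumes the API code above is saved as main.py.
from main import InputData, predict

sample = InputData(features=[5.8, 2.7, 5.1, 1.9])
print(predict(sample))  # expected shape: {"prediction": [<class index>]}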


4. Run the Inference API Inside Docker

We now run the containerized ML application, using the iris-ml-api image that was built when the model was containerized earlier.

docker run -p 8000:8000 iris-ml-api

At this point:

  • the FastAPI server runs inside Docker
  • the model loads inside the container
  • port 8000 is exposed for inference requests (the server inside the container must listen on 0.0.0.0 for this mapping to work)
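
If you script this step, a small helper (a convenience sketch, not part of the API) can wait until the containerized server accepts connections before any prediction requests are sent.

# Convenience sketch: block until the container's port accepts TCP connections.
import socket
import time

def wait_for_port(host="127.0.0.1", port=8000, timeout=30.0):
    """Return True once host:port accepts a connection, False after timeout."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with socket.create_connection((host, port), timeout=2):
                return True
        except OSError:
            time.sleep(0.5)
    return False

print("API reachable:", wait_for_port())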

5. Send Inference Requests to the Container

We test inference using a real prediction request.

curl -X POST "http://127.0.0.1:8000/predict" \
     -H "Content-Type: application/json" \
     -d '{"features":[5.8,2.7,5.1,1.9]}'

A valid prediction response confirms that inference works end-to-end inside Docker.


Why This Validation Step Is Important

This step ensures that:

  • the ML model loads correctly inside Docker
  • the API receives and parses input correctly
  • inference logic behaves consistently
  • the container is ready for cloud hosting

Skipping this validation often leads to failures later during cloud deployment.
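
One way to make the key question above concrete ("does the model behave the same way inside Docker as it does locally?") is to compare predictions from the local model file and from the running container for the same input. The sketch below uses only the standard library for the HTTP call and assumes both iris_model.joblib and the container from step 4 are available.

# Sketch: compare the local prediction with the containerized prediction.
# Assumes iris_model.joblib is present locally and the step 4 container is running.
import json
from urllib import request

import joblib
import numpy as np

features = [5.8, 2.7, 5.1, 1.9]

# Prediction from the local model file
local_pred = joblib.load("iris_model.joblib").predict(
    np.array(features).reshape(1, -1)
).tolist()

# Prediction from the running container
payload = json.dumps({"features": features}).encode("utf-8")
req = request.Request(
    "http://127.0.0.1:8000/predict",
    data=payload,
    headers={"Content-Type": "application/json"},
    method="POST",
)
with request.urlopen(req) as resp:
    docker_pred = json.loads(resp.read())["prediction"]

print("local:", local_pred, "docker:", docker_pred, "match:", local_pred == docker_pred)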


Key Takeaways

  1. Running inference inside Docker verifies real deployment readiness.
  2. Model loading should happen once at application startup.
  3. Containerized inference must match local inference behavior.
  4. Testing with real requests prevents hidden runtime failures.
  5. This step is essential before moving to cloud platforms.

Conclusion

Running ML inference inside a Docker container is more than a technical check — it’s a confidence checkpoint.

By validating predictions inside Docker, you ensure your model, API, and environment are truly aligned.

Once this step is complete, your ML application is no longer tied to your local system and is ready for cloud deployment, scaling, and real-world usage.

