🧠 AI with Python – 🐳 Running ML Inference Inside Docker
Posted on: December 25, 2025
Description:
Once a machine learning model is containerized, the next critical step is to ensure that inference actually works inside the container — not just that the container runs.
In this project, we focus on validating ML inference from within a running Docker container by sending real prediction requests and confirming consistent outputs. This step ensures your deployment is not just build-ready, but runtime-ready.
Understanding the Problem
It’s common to successfully build a Docker image but still encounter runtime issues such as:
- model not loading correctly
- missing dependencies
- incorrect input shapes
- API endpoints failing inside containers
That’s why running and testing inference inside Docker is a mandatory checkpoint before cloud deployment.
This step answers one key question:
Does my ML model behave the same way inside Docker as it does locally?
1. Install Required Packages
Before running anything inside a container, we first make sure the inference API works locally.
pip install fastapi uvicorn scikit-learn joblib numpy
This keeps the workflow consistent with earlier AI with Python scripts.
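Because missing or mismatched dependencies are one of the most common causes of container-only failures, it can also help to record the exact versions you tested with locally and pin the same ones in the image. A small helper sketch (the file name and package list are assumptions):
# record_versions.py - print the installed versions of the inference dependencies
# so they can be pinned in the Docker image (e.g. copied into a requirements file)
from importlib.metadata import version

packages = ["fastapi", "uvicorn", "scikit-learn", "joblib", "numpy"]

for pkg in packages:
    # prints lines like "scikit-learn==1.4.2"
    print(f"{pkg}=={version(pkg)}")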
2. Load the Trained Model in the API
The model is loaded once when the application starts, ensuring efficient inference.
import joblib
import numpy as np
from fastapi import FastAPI
from pydantic import BaseModel

# Load the trained model once at application startup
model = joblib.load("iris_model.joblib")

app = FastAPI(title="ML Inference Inside Docker")

# Request schema: a flat list of feature values
class InputData(BaseModel):
    features: list
Loading at startup avoids repeated disk reads during inference.
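If you are following along without the earlier post, here is a minimal sketch of how an iris_model.joblib file could be produced. The classifier choice is an assumption; the original model may differ.
# train_iris.py - minimal sketch that produces iris_model.joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
import joblib

X, y = load_iris(return_X_y=True)

# Any scikit-learn classifier works here; logistic regression is just an example
model = LogisticRegression(max_iter=200)
model.fit(X, y)

joblib.dump(model, "iris_model.joblib")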
3. Define the Inference Endpoint
We define a simple prediction endpoint that reshapes inputs correctly and returns the model output.
@app.post("/predict")
def predict(data: InputData):
    # Reshape the flat feature list into a 2D array: 1 sample, n features
    arr = np.array(data.features).reshape(1, -1)
    prediction = model.predict(arr).tolist()
    return {"prediction": prediction}
This endpoint is identical to the local inference logic — ensuring consistency.
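Before building the image, a quick in-process check with FastAPI's TestClient can confirm the endpoint responds as expected. A minimal sketch, assuming the API code above is saved as app.py and that the httpx package (used by TestClient) is installed:
# test_local.py - sanity-check the endpoint in-process before containerizing
# Assumptions: the API code above lives in app.py, and httpx is installed
from fastapi.testclient import TestClient
from app import app

client = TestClient(app)

response = client.post("/predict", json={"features": [5.8, 2.7, 5.1, 1.9]})
print(response.status_code)  # expected: 200
print(response.json())       # expected: {"prediction": [...]}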
4. Run the Inference API Inside Docker
We now run the containerized ML application.
docker run -p 8000:8000 iris-ml-api
At this point:
- the FastAPI server runs inside Docker
- the model loads inside the container
- port 8000 is exposed for inference requests
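Before firing prediction requests, a short polling script can confirm the server inside the container is actually reachable. A minimal sketch using only the standard library; it assumes FastAPI's default /docs route is enabled.
# wait_for_api.py - poll the containerized API until it responds (standard library only)
import time
import urllib.error
import urllib.request

URL = "http://127.0.0.1:8000/docs"  # FastAPI serves its interactive docs here by default

for attempt in range(30):
    try:
        with urllib.request.urlopen(URL, timeout=2) as resp:
            print(f"API is up (HTTP {resp.status}) after {attempt + 1} attempt(s)")
            break
    except (urllib.error.URLError, TimeoutError):
        time.sleep(1)  # the container may still be starting up; retry
else:
    raise SystemExit("API did not become reachable on port 8000")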
5. Send Inference Requests to the Container
We test inference using a real prediction request.
curl -X POST "http://127.0.0.1:8000/predict" \
-H "Content-Type: application/json" \
-d '{"features":[5.8,2.7,5.1,1.9]}'
A valid prediction response confirms that inference works end-to-end inside Docker.
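The same request can be sent from Python, which also makes it easy to compare the container's output against a prediction from the same model file loaded locally. A minimal consistency-check sketch (the file name check_consistency.py is hypothetical), using the feature values from the curl example:
# check_consistency.py - compare container inference with local inference
import json
import urllib.request

import joblib
import numpy as np

features = [5.8, 2.7, 5.1, 1.9]

# 1. Prediction from the containerized API
req = urllib.request.Request(
    "http://127.0.0.1:8000/predict",
    data=json.dumps({"features": features}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req, timeout=5) as resp:
    container_pred = json.loads(resp.read())["prediction"]

# 2. Prediction from the same model file loaded locally
local_model = joblib.load("iris_model.joblib")
local_pred = local_model.predict(np.array(features).reshape(1, -1)).tolist()

print("container:", container_pred)
print("local:    ", local_pred)
assert container_pred == local_pred, "Container and local predictions differ"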
Why This Validation Step Is Important
This step ensures that:
- the ML model loads correctly inside Docker
- the API receives and parses input correctly
- inference logic behaves consistently
- the container is ready for cloud hosting
Skipping this validation often leads to failures later during cloud deployment.
Key Takeaways
- Running inference inside Docker verifies real deployment readiness.
- Model loading should happen once at application startup.
- Containerized inference must match local inference behavior.
- Testing with real requests prevents hidden runtime failures.
- This step is essential before moving to cloud platforms.
Conclusion
Running ML inference inside a Docker container is more than a technical check — it’s a confidence checkpoint.
By validating predictions inside Docker, you ensure your model, API, and environment are truly aligned.
Once this step is complete, your ML application is no longer tied to your local system and is ready for cloud deployment, scaling, and real-world usage.