Traefik is a natural fit as the reverse proxy for containerized ML services. It discovers services automatically through Docker labels – no config file reloads, no NGINX upstream blocks to maintain. You add a new model service to your Compose file, slap on a few labels, and Traefik starts routing traffic to it. Add Let’s Encrypt TLS on top, and you have production-grade model serving without touching a single certificate manually.
Here’s the minimal setup: a FastAPI model server, a Dockerfile, and a docker-compose.yml with Traefik routing to two model services at different paths.
```yaml
# docker-compose.yml
services:
  traefik:
    image: traefik:v3.2
    command:
      - "--api.insecure=true"
      - "--providers.docker=true"
      - "--providers.docker.exposedbydefault=false"
      - "--entrypoints.web.address=:80"
    ports:
      - "80:80"
      - "8080:8080"  # Traefik dashboard
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    restart: unless-stopped

  embeddings:
    build: .
    environment:
      - MODEL_TYPE=embeddings
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.embeddings.rule=PathPrefix(`/models/embeddings`)"
      - "traefik.http.routers.embeddings.entrypoints=web"
      - "traefik.http.services.embeddings.loadbalancer.server.port=8000"
      - "traefik.http.middlewares.strip-embeddings.stripprefix.prefixes=/models/embeddings"
      - "traefik.http.routers.embeddings.middlewares=strip-embeddings"
    restart: unless-stopped

  classification:
    build: .
    environment:
      - MODEL_TYPE=classification
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.classification.rule=PathPrefix(`/models/classification`)"
      - "traefik.http.routers.classification.entrypoints=web"
      - "traefik.http.services.classification.loadbalancer.server.port=8000"
      - "traefik.http.middlewares.strip-classification.stripprefix.prefixes=/models/classification"
      - "traefik.http.routers.classification.middlewares=strip-classification"
    restart: unless-stopped
```
Requests to /models/embeddings/predict hit the embeddings service at /predict. The stripprefix middleware removes the path prefix before forwarding, so your FastAPI app doesn’t need to know about the routing structure.
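The path rewrite the middleware performs can be sketched in a few lines of plain Python – a hypothetical helper, not Traefik’s actual implementation, but it shows what happens to the path before the request is forwarded:

```python
def strip_prefix(path: str, prefix: str) -> str:
    """Sketch of Traefik's stripprefix behavior: remove the prefix, keep a leading slash."""
    if path == prefix or path.startswith(prefix + "/"):
        stripped = path[len(prefix):]
        # An exact prefix match leaves an empty path; forward "/" instead
        return stripped if stripped.startswith("/") else "/" + stripped
    return path  # no match: the path is forwarded unchanged


print(strip_prefix("/models/embeddings/predict", "/models/embeddings"))  # /predict
print(strip_prefix("/models/embeddings", "/models/embeddings"))          # /
```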
The FastAPI Model Server
Both services run the same application. The MODEL_TYPE environment variable controls which model loads at startup. This keeps the codebase simple – one image, multiple deployments.
```python
# app.py
import os
import time
from contextlib import asynccontextmanager
from typing import Any

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

ml_state: dict[str, Any] = {}


@asynccontextmanager
async def lifespan(app: FastAPI):
    model_type = os.getenv("MODEL_TYPE", "embeddings")
    print(f"Loading model: {model_type}")
    start = time.time()

    if model_type == "embeddings":
        # Replace with your actual model, e.g.:
        # from sentence_transformers import SentenceTransformer
        # ml_state["model"] = SentenceTransformer("all-MiniLM-L6-v2")
        ml_state["model"] = lambda text: [0.1, 0.2, 0.3, 0.4] * 96  # 384-dim placeholder
        ml_state["model_name"] = "all-MiniLM-L6-v2"
    elif model_type == "classification":
        # Replace with your actual model, e.g.:
        # from transformers import pipeline
        # ml_state["model"] = pipeline("text-classification", model="distilbert-base-uncased-finetuned-sst-2-english")
        ml_state["model"] = lambda text: {"label": "POSITIVE", "score": 0.97}
        ml_state["model_name"] = "distilbert-sst2"
    else:
        raise ValueError(f"Unknown MODEL_TYPE: {model_type}")

    ml_state["model_type"] = model_type
    ml_state["loaded_at"] = time.time()
    print(f"Model loaded in {time.time() - start:.2f}s")

    yield

    ml_state.clear()
    print("Model unloaded")


app = FastAPI(title="Model Serving API", lifespan=lifespan)


class PredictRequest(BaseModel):
    text: str


@app.get("/health")
async def health():
    if "model" not in ml_state:
        raise HTTPException(status_code=503, detail="Model not loaded")
    return {
        "status": "healthy",
        "model_type": ml_state["model_type"],
        "model_name": ml_state["model_name"],
        "uptime_seconds": round(time.time() - ml_state["loaded_at"], 1),
    }


@app.post("/predict")
async def predict(req: PredictRequest):
    if "model" not in ml_state:
        raise HTTPException(status_code=503, detail="Model not loaded")
    model = ml_state["model"]
    result = model(req.text)
    return {
        "model_type": ml_state["model_type"],
        "model_name": ml_state["model_name"],
        "result": result,
    }
```
The lifespan context manager is the right way to handle startup/shutdown in FastAPI. The old @app.on_event("startup") decorator is deprecated and will be removed. The lifespan pattern also gives you proper cleanup on shutdown – important when your model holds GPU memory or file handles.
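The same startup/cleanup ordering can be sketched with the standard library alone – an async context manager that loads state on entry and releases it on exit. The names here are illustrative, not FastAPI APIs, but the shape mirrors what FastAPI does around the app’s lifetime:

```python
import asyncio
from contextlib import asynccontextmanager

state: dict = {}
events: list[str] = []


@asynccontextmanager
async def lifespan():
    state["model"] = "loaded"  # startup: load the model once
    events.append("startup")
    try:
        yield
    finally:
        state.clear()          # shutdown: always release memory/handles
        events.append("shutdown")


async def main():
    async with lifespan():
        events.append("handling requests")


asyncio.run(main())
print(events)  # ['startup', 'handling requests', 'shutdown']
```

The `try/finally` is what guarantees cleanup even if the serving phase raises – the property you want when the state is GPU memory rather than a string.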
The Dockerfile
Keep it lean. Install only what the model needs, run as non-root, and add curl for health checks.
```dockerfile
# Dockerfile
FROM python:3.12-slim

WORKDIR /app

RUN apt-get update && \
    apt-get install -y --no-install-recommends curl && \
    rm -rf /var/lib/apt/lists/*

RUN pip install --no-cache-dir fastapi uvicorn pydantic

COPY app.py .

RUN useradd -m appuser && chown -R appuser:appuser /app
USER appuser

EXPOSE 8000

CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "1"]
```
One Uvicorn worker per container is deliberate. Scale horizontally by running multiple containers behind Traefik’s load balancer – not by spawning workers inside a single container. Each container gets its own health check, resource limits, and restart policy.
Health Checks, Resource Limits, and Scaling
The basic Compose file works, but production needs health checks so Traefik only routes to healthy backends, resource limits so a runaway model doesn’t eat all your RAM, and deploy.replicas for horizontal scaling.
```yaml
# docker-compose.prod.yml
services:
  traefik:
    image: traefik:v3.2
    command:
      - "--api.insecure=true"  # unauthenticated dashboard -- restrict before exposing publicly
      - "--providers.docker=true"
      - "--providers.docker.exposedbydefault=false"
      - "--entrypoints.web.address=:80"
      - "--entrypoints.websecure.address=:443"
      - "--ping=true"
    ports:
      - "80:80"
      - "443:443"
      - "8080:8080"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - letsencrypt:/letsencrypt
    healthcheck:
      test: ["CMD", "traefik", "healthcheck", "--ping"]
      interval: 10s
      timeout: 5s
      retries: 3
    restart: unless-stopped
    deploy:
      resources:
        limits:
          cpus: "0.50"
          memory: 256M

  embeddings:
    build: .
    environment:
      - MODEL_TYPE=embeddings
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.embeddings.rule=PathPrefix(`/models/embeddings`)"
      - "traefik.http.routers.embeddings.entrypoints=web"
      - "traefik.http.services.embeddings.loadbalancer.server.port=8000"
      - "traefik.http.services.embeddings.loadbalancer.healthcheck.path=/health"
      - "traefik.http.services.embeddings.loadbalancer.healthcheck.interval=15s"
      - "traefik.http.middlewares.strip-embeddings.stripprefix.prefixes=/models/embeddings"
      # Rate limiting: 100 requests per second average, burst of 200
      - "traefik.http.middlewares.embeddings-ratelimit.ratelimit.average=100"
      - "traefik.http.middlewares.embeddings-ratelimit.ratelimit.burst=200"
      # Defining a middleware is not enough -- attach both to the router
      - "traefik.http.routers.embeddings.middlewares=strip-embeddings,embeddings-ratelimit"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 15s
      timeout: 5s
      retries: 3
      start_period: 30s
    restart: unless-stopped
    deploy:
      replicas: 2
      resources:
        limits:
          cpus: "2.0"
          memory: 4G
        reservations:
          memory: 2G

  classification:
    build: .
    environment:
      - MODEL_TYPE=classification
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.classification.rule=PathPrefix(`/models/classification`)"
      - "traefik.http.routers.classification.entrypoints=web"
      - "traefik.http.services.classification.loadbalancer.server.port=8000"
      - "traefik.http.services.classification.loadbalancer.healthcheck.path=/health"
      - "traefik.http.services.classification.loadbalancer.healthcheck.interval=15s"
      - "traefik.http.middlewares.strip-classification.stripprefix.prefixes=/models/classification"
      - "traefik.http.middlewares.classification-ratelimit.ratelimit.average=100"
      - "traefik.http.middlewares.classification-ratelimit.ratelimit.burst=200"
      - "traefik.http.routers.classification.middlewares=strip-classification,classification-ratelimit"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 15s
      timeout: 5s
      retries: 3
      start_period: 30s
    restart: unless-stopped
    deploy:
      replicas: 2
      resources:
        limits:
          cpus: "2.0"
          memory: 4G
        reservations:
          memory: 2G

volumes:
  letsencrypt:
```

Note that the rate-limit middlewares are listed in each router’s middlewares label – a middleware defined via labels does nothing until a router references it.
The loadbalancer.healthcheck.path label tells Traefik to check /health on each container. If a container fails the health check, Traefik removes it from the rotation without dropping in-flight requests. The Docker-level healthcheck handles container restarts separately – they work together but serve different purposes.
With replicas: 2, Traefik automatically load-balances across both instances using round-robin. No extra configuration needed.
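Round-robin is simple enough to sketch in a few lines – each request goes to the next backend in order, wrapping around. The backend names below are illustrative:

```python
from itertools import cycle

# Two replicas of the embeddings service, as Traefik would see them
backends = cycle(["embeddings-1", "embeddings-2"])

# Five incoming requests alternate between the replicas
picks = [next(backends) for _ in range(5)]
print(picks)  # ['embeddings-1', 'embeddings-2', 'embeddings-1', 'embeddings-2', 'embeddings-1']
```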
Build and start everything:
```bash
docker compose -f docker-compose.prod.yml up --build -d

# Verify both replicas are running
docker compose -f docker-compose.prod.yml ps

# Test the embeddings endpoint
curl -X POST http://localhost/models/embeddings/predict \
  -H "Content-Type: application/json" \
  -d '{"text": "Traefik handles routing automatically"}'

# Test the classification endpoint
curl -X POST http://localhost/models/classification/predict \
  -H "Content-Type: application/json" \
  -d '{"text": "This deployment pipeline is solid"}'

# Check the Traefik dashboard
curl http://localhost:8080/api/http/services | python3 -m json.tool
```
Let’s Encrypt TLS with Traefik
Traefik’s killer feature for production is automatic TLS. Add a certificate resolver, point it at Let’s Encrypt, and every router with the tls.certresolver label gets a real certificate. No certbot cron jobs, no manual renewal.
Update the Traefik service command and labels:
```yaml
# Add to the traefik service in docker-compose.prod.yml
services:
  traefik:
    image: traefik:v3.2
    command:
      - "--api.insecure=true"
      - "--providers.docker=true"
      - "--providers.docker.exposedbydefault=false"
      - "--entrypoints.web.address=:80"
      - "--entrypoints.websecure.address=:443"
      - "--certificatesresolvers.letsencrypt.acme.httpchallenge=true"
      - "--certificatesresolvers.letsencrypt.acme.httpchallenge.entrypoint=web"
      - "--certificatesresolvers.letsencrypt.acme.email=your-email@example.com"
      - "--certificatesresolvers.letsencrypt.acme.storage=/letsencrypt/acme.json"
      - "--ping=true"
    ports:
      - "80:80"
      - "443:443"
      - "8080:8080"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - letsencrypt:/letsencrypt
```
Then update each model service’s labels to use HTTPS and the certificate resolver:
```yaml
# Updated labels for the embeddings service (same pattern for classification)
labels:
  - "traefik.enable=true"
  # HTTP router -- redirects to HTTPS
  - "traefik.http.routers.embeddings-http.rule=PathPrefix(`/models/embeddings`)"
  - "traefik.http.routers.embeddings-http.entrypoints=web"
  - "traefik.http.middlewares.redirect-to-https.redirectscheme.scheme=https"
  - "traefik.http.routers.embeddings-http.middlewares=redirect-to-https"
  # HTTPS router
  - "traefik.http.routers.embeddings.rule=PathPrefix(`/models/embeddings`) && Host(`models.yourdomain.com`)"
  - "traefik.http.routers.embeddings.entrypoints=websecure"
  - "traefik.http.routers.embeddings.tls.certresolver=letsencrypt"
  - "traefik.http.services.embeddings.loadbalancer.server.port=8000"
  - "traefik.http.middlewares.strip-embeddings.stripprefix.prefixes=/models/embeddings"
  - "traefik.http.routers.embeddings.middlewares=strip-embeddings"
```
Replace models.yourdomain.com with your actual domain. Traefik requests a certificate from Let’s Encrypt as soon as the router configuration loads, stores it in /letsencrypt/acme.json, and renews it automatically before expiry.
The HTTP-to-HTTPS redirect middleware ensures nobody accidentally hits the unencrypted endpoint. Every request on port 80 gets redirected to port 443 – a 302 by default; set redirectscheme.permanent=true if you want a 301.
Common Errors and Fixes
404 page not found when hitting /models/embeddings/predict.
PathPrefix matching is case-sensitive, and the prefix in the label must match the request path character for character – a trailing-slash mismatch is the most common cause. Also verify the stripprefix middleware is attached to the router. Without it, your FastAPI app receives /models/embeddings/predict instead of /predict and returns its own 404.
```bash
# Debug routing by checking Traefik's API
curl http://localhost:8080/api/http/routers | python3 -m json.tool
```
502 Bad Gateway after starting Docker Compose.
Traefik tried to route to a container that hasn’t finished starting. The Docker-level health check and start_period handle the container restart side, but Traefik needs its own health check label (loadbalancer.healthcheck.path) to know when the backend is ready. Without it, Traefik routes traffic as soon as the container port is open – before the model finishes loading.
Gateway Timeout on large model inference.
A 504 means something in the chain gave up waiting before the model responded, so check the client and anything in front of Traefik as well. If Traefik itself is cutting the connection, note that its backend forwarding timeouts live in the static serversTransport configuration, not in per-service labels. To give slow backends more time to start returning response headers, add this to the Traefik command (it applies globally):

```yaml
# Add to the traefik command in docker-compose.prod.yml
- "--serversTransport.forwardingTimeouts.responseHeaderTimeout=120s"
```
too many certificates already issued from Let’s Encrypt.
You hit the rate limit – 50 certificates per registered domain per week. During development, use Let’s Encrypt’s staging server to avoid this:
```yaml
# Add to the traefik command
- "--certificatesresolvers.letsencrypt.acme.caserver=https://acme-staging-v02.api.letsencrypt.org/directory"
```
Switch back to the production server once your setup is confirmed working. Staging certificates are not trusted by browsers but the issuance flow is identical.
Traefik dashboard shows the service but with 0 servers.
The container is running but Traefik can’t reach it. Check that exposedbydefault=false is set and the service has traefik.enable=true. Also verify the loadbalancer.server.port label matches the port your FastAPI app listens on. If you changed Uvicorn’s port from 8000 to something else but didn’t update the label, Traefik sends traffic to the wrong port.
```bash
# Check which containers Traefik sees
docker compose logs traefik | grep -i "server"
```