## What Blue-Green Gets You
Blue-green deployment runs two identical environments. Blue is your current production model. Green is the new version you want to ship. You deploy the green version, validate it with health checks and test traffic, then flip a switch to route all production traffic from blue to green. If green breaks, you flip back to blue in seconds.
This is different from canary deployments where you gradually shift traffic. Blue-green is all-or-nothing: 100% blue, then 100% green. The tradeoff is simplicity. No traffic splitting logic, no weighted routing. Just two environments and a switch.
First, install the dependencies:

```shell
pip install fastapi uvicorn httpx pydantic
```
## The Model Server
Each environment (blue and green) runs the same FastAPI server, but loads a different model version. The lifespan context manager handles model loading at startup and cleanup at shutdown.
```python
# model_server.py
import os
import time
from contextlib import asynccontextmanager

from fastapi import FastAPI
from fastapi.responses import JSONResponse
from pydantic import BaseModel

MODEL_VERSION = os.getenv("MODEL_VERSION", "v1")
MODEL_NAME = os.getenv("MODEL_NAME", "sentiment-v1")

model_store = {}


def load_model(version: str) -> dict:
    """Simulate loading a model. Replace with your actual model loading logic."""
    # In production: model = torch.load(f"models/{version}/model.pt")
    return {
        "version": version,
        "loaded_at": time.time(),
        "weights": f"weights-for-{version}",
    }


@asynccontextmanager
async def lifespan(app: FastAPI):
    # Startup: load the model
    print(f"Loading model {MODEL_NAME} {MODEL_VERSION}...")
    model_store["model"] = load_model(MODEL_VERSION)
    model_store["request_count"] = 0
    model_store["error_count"] = 0
    print(f"Model {MODEL_VERSION} ready.")
    yield
    # Shutdown: clean up
    model_store.clear()
    print(f"Model {MODEL_VERSION} unloaded.")


app = FastAPI(lifespan=lifespan)


class PredictRequest(BaseModel):
    text: str


class PredictResponse(BaseModel):
    prediction: str
    confidence: float
    model_version: str


@app.get("/health")
async def health():
    model = model_store.get("model")
    if model is None:
        return JSONResponse(
            status_code=503,
            content={"status": "unhealthy", "reason": "model not loaded"},
        )
    return {
        "status": "healthy",
        "model_version": model["version"],
        "uptime_seconds": round(time.time() - model["loaded_at"], 1),
        "request_count": model_store["request_count"],
        "error_rate": (
            model_store["error_count"] / max(model_store["request_count"], 1)
        ),
    }


@app.post("/predict", response_model=PredictResponse)
async def predict(req: PredictRequest):
    model_store["request_count"] += 1
    model = model_store["model"]
    # Replace with your actual inference logic
    if model["version"] == "v2":
        result = {"prediction": "positive", "confidence": 0.93}
    else:
        result = {"prediction": "positive", "confidence": 0.87}
    return PredictResponse(
        prediction=result["prediction"],
        confidence=result["confidence"],
        model_version=model["version"],
    )
```
Run it locally to test:
```shell
MODEL_VERSION=v1 uvicorn model_server:app --host 0.0.0.0 --port 8001
```
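Once it's up, a quick smoke test with curl (ports as configured above):

```shell
curl -s http://localhost:8001/health
curl -s -X POST http://localhost:8001/predict \
  -H "Content-Type: application/json" \
  -d '{"text": "blue-green is straightforward"}'
```

The prediction response includes `"model_version": "v1"`, which is how you'll tell blue and green apart later.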
## Docker Compose for Blue and Green
The two environments run as separate Docker Compose services on different ports. An Nginx reverse proxy sits in front and routes traffic to whichever environment is active.
```yaml
# docker-compose.yml
services:
  blue:
    build: .
    environment:
      - MODEL_VERSION=v1
      - MODEL_NAME=sentiment-v1
    ports:
      - "8001:8000"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 10s
      timeout: 5s
      retries: 3
  green:
    build: .
    environment:
      - MODEL_VERSION=v2
      - MODEL_NAME=sentiment-v2
    ports:
      - "8002:8000"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 10s
      timeout: 5s
      retries: 3
  router:
    image: nginx:1.25-alpine
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
    depends_on:
      - blue
      - green
```
```dockerfile
# Dockerfile
FROM python:3.11-slim

RUN apt-get update && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app
RUN pip install fastapi uvicorn httpx pydantic
COPY model_server.py .

CMD ["uvicorn", "model_server:app", "--host", "0.0.0.0", "--port", "8000"]
```
The Nginx config starts by routing to blue:
```nginx
# nginx.conf
events {
    worker_connections 1024;
}

http {
    upstream active_model {
        server blue:8000;
    }

    server {
        listen 80;

        location / {
            proxy_pass http://active_model;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_connect_timeout 5s;
            proxy_read_timeout 30s;
        }
    }
}
```
Start everything:
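With the compose file, Dockerfile, and nginx.conf above in one directory, a single command (assuming Docker Compose v2) builds and starts blue, green, and the router:

```shell
docker compose up -d --build
```

`docker compose ps` should show all three services running.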
At this point, all traffic goes to blue (v1) on port 8001. Green (v2) is running on port 8002 but not receiving production traffic.
## The Deployment Script
This is the core of blue-green. The script validates green, swaps the Nginx config to point at green, and rolls back if error rates spike.
```python
# deploy.py
import subprocess
import sys
import time

import httpx

BLUE_URL = "http://localhost:8001"
GREEN_URL = "http://localhost:8002"
ROUTER_URL = "http://localhost:80"

HEALTH_CHECK_RETRIES = 10
HEALTH_CHECK_INTERVAL = 3  # seconds
ERROR_RATE_THRESHOLD = 0.05  # 5% error rate triggers rollback
VALIDATION_REQUESTS = 50  # number of test requests to send
MONITORING_DURATION = 60  # seconds to monitor after swap

NGINX_TEMPLATE = """events {{
    worker_connections 1024;
}}

http {{
    upstream active_model {{
        server {target}:8000;
    }}

    server {{
        listen 80;

        location / {{
            proxy_pass http://active_model;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_connect_timeout 5s;
            proxy_read_timeout 30s;
        }}
    }}
}}
"""


def wait_for_healthy(url: str, label: str) -> bool:
    """Poll the health endpoint until the service is ready."""
    for attempt in range(1, HEALTH_CHECK_RETRIES + 1):
        try:
            resp = httpx.get(f"{url}/health", timeout=5.0)
            data = resp.json()
            if data.get("status") == "healthy":
                print(f"  {label} is healthy (attempt {attempt})")
                return True
            print(f"  {label} not ready: {data}")
        except httpx.RequestError as e:
            print(f"  {label} unreachable (attempt {attempt}): {e}")
        time.sleep(HEALTH_CHECK_INTERVAL)
    return False


def send_test_traffic(url: str, count: int) -> dict:
    """Send test requests and return error stats."""
    errors = 0
    latencies = []
    for i in range(count):
        start = time.time()
        try:
            resp = httpx.post(
                f"{url}/predict",
                json={"text": f"test input {i}"},
                timeout=10.0,
            )
            latencies.append(time.time() - start)
            if resp.status_code != 200:
                errors += 1
        except httpx.RequestError:
            errors += 1
            latencies.append(time.time() - start)
    avg_latency = sum(latencies) / max(len(latencies), 1)
    error_rate = errors / max(count, 1)
    return {"errors": errors, "error_rate": error_rate, "avg_latency": avg_latency}


def swap_to(target: str):
    """Rewrite nginx.conf and reload the router."""
    config = NGINX_TEMPLATE.format(target=target)
    with open("nginx.conf", "w") as f:
        f.write(config)
    subprocess.run(
        ["docker", "compose", "exec", "router", "nginx", "-s", "reload"],
        check=True,
    )
    print(f"  Router now points to {target}")


def deploy():
    print("=== Blue-Green Deployment ===\n")

    # Step 1: Verify blue is healthy
    print("Step 1: Checking blue (current production)...")
    if not wait_for_healthy(BLUE_URL, "Blue"):
        print("ERROR: Blue is not healthy. Aborting deployment.")
        sys.exit(1)

    # Step 2: Verify green is healthy
    print("\nStep 2: Checking green (new version)...")
    if not wait_for_healthy(GREEN_URL, "Green"):
        print("ERROR: Green failed health checks. Aborting deployment.")
        sys.exit(1)

    # Step 3: Send test traffic to green
    print(f"\nStep 3: Sending {VALIDATION_REQUESTS} test requests to green...")
    stats = send_test_traffic(GREEN_URL, VALIDATION_REQUESTS)
    print(f"  Results: {stats['errors']} errors, "
          f"{stats['error_rate']:.1%} error rate, "
          f"{stats['avg_latency']*1000:.0f}ms avg latency")
    if stats["error_rate"] > ERROR_RATE_THRESHOLD:
        print(f"ERROR: Green error rate {stats['error_rate']:.1%} exceeds "
              f"threshold {ERROR_RATE_THRESHOLD:.1%}. Aborting.")
        sys.exit(1)

    # Step 4: Swap traffic to green
    print("\nStep 4: Swapping traffic to green...")
    swap_to("green")
    time.sleep(2)  # let nginx settle

    # Step 5: Monitor green under production traffic
    print(f"\nStep 5: Monitoring green for {MONITORING_DURATION}s...")
    end_time = time.time() + MONITORING_DURATION
    check_interval = 10
    while time.time() < end_time:
        resp = httpx.get(f"{GREEN_URL}/health", timeout=5.0)
        health = resp.json()
        error_rate = health.get("error_rate", 0)
        remaining = int(end_time - time.time())
        print(f"  error_rate={error_rate:.3f}, "
              f"requests={health.get('request_count', 0)}, "
              f"{remaining}s remaining")
        if error_rate > ERROR_RATE_THRESHOLD:
            print(f"\nROLLBACK: Error rate {error_rate:.1%} exceeds threshold!")
            swap_to("blue")
            print("Rolled back to blue. Green deployment failed.")
            sys.exit(1)
        time.sleep(check_interval)

    print("\n=== Deployment successful! Green is now production. ===")


if __name__ == "__main__":
    deploy()
```
Run the deployment:
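The script only needs httpx (installed earlier) and the compose stack running:

```shell
python deploy.py
```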
The script follows this flow: check blue is alive, check green is alive, send test traffic directly to green, swap the router, monitor for errors, and roll back if anything goes wrong. Every step has a clear exit point.
## Rollback Mechanics
Rollback is just rewriting the Nginx config and reloading. Because blue is still running and serving zero traffic, it stays warm and ready. The swap takes under a second.
If you need to roll back manually:
```shell
# Point nginx back to blue
cat > nginx.conf << 'EOF'
events {
    worker_connections 1024;
}

http {
    upstream active_model {
        server blue:8000;
    }

    server {
        listen 80;

        location / {
            proxy_pass http://active_model;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_connect_timeout 5s;
            proxy_read_timeout 30s;
        }
    }
}
EOF

docker compose exec router nginx -s reload
```
After a successful deployment, you can update blue to match green for the next deployment cycle, or just flip the roles: next time, deploy to blue while green serves production.
## Common Errors and Fixes
**Green passes health checks but fails under load**

Health checks only verify the model is loaded. They don't test throughput. Add a load test phase to your deployment script between the health check and the swap. Send 500+ concurrent requests with `httpx.AsyncClient` and check p99 latency, not just error rate.
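A minimal sketch of such a load phase, assuming the `GREEN_URL` constant from deploy.py; the percentile helper is pure Python, so it can be dropped into the script as-is:

```python
import asyncio
import time


def p99(latencies: list[float]) -> float:
    """Return the 99th-percentile latency (nearest-rank method)."""
    if not latencies:
        return 0.0
    ordered = sorted(latencies)
    # Nearest-rank: pick the value at (or just above) the 99% position
    idx = min(len(ordered) - 1, int(len(ordered) * 0.99))
    return ordered[idx]


async def load_test(url: str, total: int = 500, concurrency: int = 50) -> float:
    """Fire `total` requests with bounded concurrency; return p99 latency."""
    import httpx  # imported here so the helper above stays stdlib-only

    sem = asyncio.Semaphore(concurrency)
    latencies: list[float] = []

    async def one(client: "httpx.AsyncClient", i: int) -> None:
        async with sem:
            start = time.time()
            await client.post(f"{url}/predict", json={"text": f"load test {i}"})
            latencies.append(time.time() - start)

    async with httpx.AsyncClient(timeout=30.0) as client:
        await asyncio.gather(*(one(client, i) for i in range(total)))
    return p99(latencies)
```

In `deploy()` this would slot in as a Step 3.5 gate, e.g. abort if `asyncio.run(load_test(GREEN_URL))` exceeds your latency SLO.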
**Nginx reload drops connections**

`nginx -s reload` is graceful: it finishes in-flight requests before swapping workers. But if your model inference takes longer than `proxy_read_timeout`, those requests will be killed. Set `proxy_read_timeout` higher than your worst-case inference time. For LLMs, 120s is a reasonable starting point.
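In the config shown earlier, that means raising the timeout in the location block (values here are illustrative; tune to your model):

```nginx
location / {
    proxy_pass http://active_model;
    # Must exceed worst-case inference time; requests still in
    # flight when this expires are cut off with a 504.
    proxy_read_timeout 120s;
    proxy_connect_timeout 5s;
}
```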
**Docker Compose port conflicts**

If ports 8001 or 8002 are already in use, the containers fail silently. Check with `docker compose ps` after starting. If a container is in "Exit" state, run `docker compose logs blue` or `docker compose logs green` to see the actual error.
**Green model loads a stale version**

Docker caches layers aggressively. If your model weights are baked into the image, make sure you rebuild with `--no-cache` or use a unique tag per version:
```shell
docker compose build --no-cache green
docker compose up -d green
```
**Rollback doesn't work because blue crashed while idle**
Blue sits idle during green’s serving period, but it can still crash from OOM or GPU memory issues. The deployment script checks blue’s health at the start, but you should also run a background health monitor that alerts if the standby environment goes down.
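Such a monitor can be a small loop polling the standby's /health endpoint. A sketch, assuming the URLs from deploy.py; the `standby_is_ok` helper is a hypothetical name for the health-classification check:

```python
import time


def standby_is_ok(payload: dict) -> bool:
    """Decide whether a /health response payload looks healthy."""
    return payload.get("status") == "healthy"


def monitor_standby(url: str, interval: int = 30) -> None:
    """Poll the standby environment; print an alert when it goes down.

    Swap the print() calls for your real alerting (PagerDuty, Slack, ...).
    """
    import httpx  # lazy import keeps the check above dependency-free

    while True:
        try:
            payload = httpx.get(f"{url}/health", timeout=5.0).json()
            if not standby_is_ok(payload):
                print(f"ALERT: standby at {url} unhealthy: {payload}")
        except httpx.RequestError as e:
            print(f"ALERT: standby at {url} unreachable: {e}")
        time.sleep(interval)
```

Run it in a separate process against whichever environment is currently idle, e.g. `monitor_standby("http://localhost:8001")` after a blue-to-green swap.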
**Error rate spikes briefly during nginx reload**

This usually happens when the upstream isn't fully connected yet. Add a `proxy_next_upstream error timeout` directive so Nginx retries failed requests within the upstream. For blue-green this is safe because the upstream only ever contains one server at a time.
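A sketch of how that could look in the location block (retry counts are illustrative):

```nginx
location / {
    proxy_pass http://active_model;
    # Retry on connection errors and timeouts rather than
    # surfacing a 502 to the client during the reload window
    proxy_next_upstream error timeout;
    proxy_next_upstream_tries 2;
    proxy_next_upstream_timeout 10s;
}
```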