Your Prometheus fires an alert at 3 AM. The on-call engineer opens PagerDuty, stares at a spike in http_request_duration_seconds, manually queries five other metrics in Grafana, cross-references the deploy log, and eventually figures out a bad config rollout increased upstream latency. That whole investigation loop is exactly what an LLM agent can automate.

Here’s the architecture: a FastAPI service receives Alertmanager webhooks, pulls the relevant metrics window from Prometheus, hands everything to an LLM, and gets back a diagnosis with remediation steps. Let’s build it.

The Webhook Receiver

Alertmanager sends a JSON payload when an alert fires. We need a FastAPI app that accepts that webhook, extracts the alert details, and kicks off the diagnosis pipeline.

# agent.py
import json
from contextlib import asynccontextmanager
from datetime import datetime, timedelta

from fastapi import FastAPI, Request
from openai import OpenAI
from prometheus_api_client import PrometheusConnect

# Configuration
PROMETHEUS_URL = "http://prometheus:9090"
OPENAI_MODEL = "gpt-4o"

prom: PrometheusConnect | None = None
llm: OpenAI | None = None


@asynccontextmanager
async def lifespan(app: FastAPI):
    global prom, llm
    prom = PrometheusConnect(url=PROMETHEUS_URL, disable_ssl=True)
    llm = OpenAI()  # reads OPENAI_API_KEY from env
    yield
    prom = None
    llm = None


app = FastAPI(lifespan=lifespan)


@app.post("/webhook")
async def receive_alert(request: Request):
    payload = await request.json()
    alerts = payload.get("alerts", [])
    results = []

    for alert in alerts:
        if alert.get("status") != "firing":
            continue

        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        alert_name = labels.get("alertname", "unknown")
        severity = labels.get("severity", "warning")
        summary = annotations.get("summary", "No summary provided")

        # Pull metrics around the alert window
        metrics_context = query_related_metrics(labels)

        # Ask the LLM for diagnosis
        diagnosis = diagnose_with_llm(
            alert_name=alert_name,
            severity=severity,
            summary=summary,
            labels=labels,
            metrics=metrics_context,
        )

        results.append({
            "alert": alert_name,
            "diagnosis": diagnosis,
        })

    return {"status": "processed", "results": results}

The lifespan context manager initializes both the Prometheus client and the OpenAI client on startup. No deprecated @app.on_event decorators here.
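One caveat: query_related_metrics and diagnose_with_llm are synchronous, so calling them directly inside the async endpoint blocks the event loop for the duration of the Prometheus and OpenAI round-trips. A minimal mitigation, assuming Python 3.9+, is to push them onto a worker thread:

```python
import asyncio


async def run_blocking(func, /, *args, **kwargs):
    """Run a blocking callable in a worker thread so the event loop
    stays free to accept new webhooks (thin asyncio.to_thread wrapper)."""
    return await asyncio.to_thread(func, *args, **kwargs)

# Inside receive_alert, the two calls then become:
#   metrics_context = await run_blocking(query_related_metrics, labels)
#   diagnosis = await run_blocking(diagnose_with_llm, alert_name=alert_name,
#                                  severity=severity, summary=summary,
#                                  labels=labels, metrics=metrics_context)
```

FastAPI also offers run_in_threadpool, which does the same job; the stdlib version keeps the sketch dependency-free.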

Querying Prometheus for Context

A raw alert name alone isn’t enough for a good diagnosis. The agent needs to pull surrounding metrics – the same ones an engineer would check manually. We query a 30-minute window around the alert and grab metrics matching the alert’s labels.

def query_related_metrics(labels: dict) -> str:
    """Pull metrics from Prometheus that share the alert's labels."""
    end_time = datetime.now()
    start_time = end_time - timedelta(minutes=30)

    # Build a label matcher from alert labels (skip meta labels)
    skip_keys = {"alertname", "severity", "prometheus", "alertstate"}
    matchers = {k: v for k, v in labels.items() if k not in skip_keys}

    # Queries to run -- adjust these to match your stack
    queries = [
        'rate(http_request_duration_seconds_sum{__MATCHERS__}[5m]) / rate(http_request_duration_seconds_count{__MATCHERS__}[5m])',
        'rate(http_requests_total{__MATCHERS__, status=~"5.."}[5m])',
        'process_resident_memory_bytes{__MATCHERS__}',
        'rate(container_cpu_usage_seconds_total{__MATCHERS__}[5m])',
    ]

    matcher_str = ", ".join(f'{k}="{v}"' for k, v in matchers.items())
    sections = []

    for query_template in queries:
        # Guard against an empty matcher set: '{, status=~"5.."}' is invalid PromQL
        if matcher_str:
            query = query_template.replace("__MATCHERS__", matcher_str)
        else:
            query = query_template.replace("__MATCHERS__, ", "").replace("__MATCHERS__", "")
        try:
            result = prom.custom_query_range(
                query=query,
                start_time=start_time,
                end_time=end_time,
                step="60s",
            )
            if result:
                metric_name = query.split("{")[0].split("(")[-1]
                # Only the first matching series is summarized; extend this
                # if your matchers select multiple series per query
                values = result[0].get("values", [])
                recent = values[-5:] if len(values) >= 5 else values
                formatted = [f"  t={v[0]}: {float(v[1]):.4f}" for v in recent]
                sections.append(f"{metric_name} (last {len(recent)} points):\n" + "\n".join(formatted))
        except Exception as e:
            sections.append(f"Query failed ({query[:60]}...): {e}")

    if not sections:
        return "No metric data retrieved. Prometheus may be unreachable or labels didn't match."

    return "\n\n".join(sections)

This pulls four key signals: request latency, 5xx error rate, memory usage, and CPU. You should customize these queries based on what your infrastructure actually exposes. The function returns a formatted string of recent data points that goes straight into the LLM prompt.

LLM Diagnosis

Now the interesting part. We take the alert metadata and the metric snapshots, build a structured prompt, and ask the LLM to act as an SRE.

def diagnose_with_llm(
    alert_name: str,
    severity: str,
    summary: str,
    labels: dict,
    metrics: str,
) -> dict:
    """Send alert context to the LLM and get a structured diagnosis."""
    # json_object mode requires the word "JSON" to appear in the prompt,
    # so spell out the expected output keys here
    system_prompt = (
        "You are an expert SRE agent. You receive Prometheus alerts with "
        "associated metric data. Your job is to:\n"
        "1. Identify the most likely root cause\n"
        "2. Explain the evidence from the metrics\n"
        "3. Suggest specific remediation steps\n"
        "4. Rate confidence as low/medium/high\n\n"
        "Be concise. Use metric values to support your reasoning. "
        "Respond with a JSON object with keys: root_cause, evidence, "
        "remediation, confidence."
    )

    user_prompt = (
        f"ALERT: {alert_name}\n"
        f"SEVERITY: {severity}\n"
        f"SUMMARY: {summary}\n"
        f"LABELS: {json.dumps(labels)}\n\n"
        f"METRIC DATA (30-minute window):\n{metrics}"
    )

    response = llm.chat.completions.create(
        model=OPENAI_MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        temperature=0.2,
        response_format={"type": "json_object"},
    )

    raw = response.choices[0].message.content
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return {"raw_diagnosis": raw, "parse_error": True}

Setting temperature=0.2 keeps the output focused. We use response_format={"type": "json_object"} so the model returns structured JSON we can pipe into a ticketing system, Slack message, or runbook trigger. One gotcha: json_object mode requires the word "JSON" to appear somewhere in your prompt, or the API rejects the request. If JSON parsing fails anyway, the raw text still comes through.
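To actually pipe the diagnosis somewhere, you need a renderer. Here's a sketch that assumes the JSON keys the prompt asks for (root_cause, evidence, remediation, confidence); adjust it to whatever schema you settle on:

```python
def format_diagnosis(alert_name: str, diagnosis: dict) -> str:
    """Render the LLM diagnosis as a plain-text message suitable for
    Slack, a ticket body, or a log line. Keys mirror the prompt's schema."""
    lines = [f"*{alert_name}* diagnosis:"]
    for key in ("root_cause", "evidence", "remediation", "confidence"):
        value = diagnosis.get(key)
        if value is not None:
            lines.append(f"- {key}: {value}")
    return "\n".join(lines)
```

Missing keys are simply skipped, so a partial or fallback diagnosis still produces a readable message.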

Alertmanager Configuration

Point your Alertmanager at the agent by adding a webhook receiver. Here’s the relevant section of alertmanager.yml:

# alertmanager.yml
route:
  receiver: "default"
  group_by: ["alertname", "namespace"]
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    # "matchers" supersedes the deprecated match/match_re syntax (Alertmanager 0.22+)
    - matchers:
        - severity =~ "critical|warning"
      receiver: "llm-agent"

receivers:
  - name: "default"
    webhook_configs:
      - url: "http://monitoring-agent:8000/webhook"
  - name: "llm-agent"
    webhook_configs:
      - url: "http://monitoring-agent:8000/webhook"
        send_resolved: true

With send_resolved: true, Alertmanager also notifies the agent when alerts recover, which lets you analyze recovery patterns. The agent code filters for status == "firing", so resolved notifications are skipped unless you extend the handler.
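If you do extend it, one approach is to branch on status and prompt for recovery analysis instead of skipping. A sketch (alert_header is a made-up helper, not part of the code above):

```python
def alert_header(alert: dict) -> str:
    """Build the prompt header for firing vs. resolved alerts so that
    resolved notifications can be analyzed for recovery patterns."""
    status = alert.get("status", "firing")
    name = alert.get("labels", {}).get("alertname", "unknown")
    if status == "resolved":
        return f"RESOLVED ALERT: {name} (analyze what changed during recovery)"
    return f"FIRING ALERT: {name}"
```

The webhook loop would then call diagnose_with_llm for both statuses, swapping this header into the user prompt.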

Running the Agent

Start the FastAPI server alongside your Prometheus stack:

pip install fastapi uvicorn openai prometheus-api-client
uvicorn agent:app --host 0.0.0.0 --port 8000

To test without waiting for a real alert, send a fake Alertmanager payload:

curl -X POST http://localhost:8000/webhook \
  -H "Content-Type: application/json" \
  -d '{
    "status": "firing",
    "alerts": [
      {
        "status": "firing",
        "labels": {
          "alertname": "HighLatency",
          "severity": "critical",
          "job": "api-server",
          "instance": "api-server:8080"
        },
        "annotations": {
          "summary": "P99 latency for api-server exceeded 2s for 5 minutes"
        }
      }
    ]
  }'

You’ll get back a JSON response with the LLM’s diagnosis, root cause analysis, and suggested remediation steps.

Common Errors and Fixes

prometheus_api_client.exceptions.PrometheusApiClientException: Connection refused

Your Prometheus URL is wrong or the instance isn’t reachable from the agent container. If running in Docker Compose, use the service name (http://prometheus:9090), not localhost.
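For reference, a minimal Compose layout that makes that hostname resolve might look like this (image, build context, and ports are illustrative; Compose's default network provides DNS by service name):

```yaml
services:
  prometheus:
    image: prom/prometheus
    ports:
      - "9090:9090"
  monitoring-agent:
    build: .  # the FastAPI agent from this post
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
    ports:
      - "8000:8000"
```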

openai.AuthenticationError: Incorrect API key

The OpenAI() client reads OPENAI_API_KEY from the environment. Make sure it’s set before starting uvicorn:

export OPENAI_API_KEY="sk-..."
uvicorn agent:app --host 0.0.0.0 --port 8000

Empty metrics context – “No metric data retrieved”

This happens when your PromQL label matchers don’t match any time series. Check that the label names in your alert rules (job, instance, namespace) actually exist on the metrics you’re querying. Run the queries manually in the Prometheus UI first.
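To script that check instead of clicking through the UI, you can query Prometheus's series API, which lists every time series matching a selector. A small sketch (series_url is a made-up helper):

```python
from urllib.parse import urlencode


def series_url(base_url: str, selector: str) -> str:
    """Build a /api/v1/series URL that lists the time series matching a
    label selector -- useful for verifying matchers before trusting them."""
    return f"{base_url}/api/v1/series?" + urlencode({"match[]": selector})
```

For example, curl the result of series_url("http://prometheus:9090", 'http_requests_total{job="api-server"}'); an empty data array means your matchers select nothing.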

json.JSONDecodeError from the LLM response

Even with response_format={"type": "json_object"}, unusual edge cases can still produce malformed JSON. The code already handles this with a fallback, but if it happens often, add a retry loop or switch to the stricter Structured Outputs mode (response_format with a json_schema), which gpt-4o supports.
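If malformed JSON turns out to be frequent, a small retry wrapper is usually enough (a sketch; call_llm stands in for whatever zero-argument callable re-invokes the completion and returns its text):

```python
import json


def parse_with_retry(call_llm, max_attempts: int = 3) -> dict:
    """Call the LLM up to max_attempts times, returning the first response
    that parses as JSON; fall back to the raw text otherwise."""
    last = None
    for _ in range(max_attempts):
        last = call_llm()
        try:
            return json.loads(last)
        except json.JSONDecodeError:
            continue
    return {"raw_diagnosis": last, "parse_error": True}
```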

Alertmanager sends alerts but the agent never receives them

Check Alertmanager's /api/v2/alerts endpoint to confirm alerts are active, hit /-/healthy to confirm Alertmanager itself is up, and check its logs for webhook delivery errors. Network policies or firewalls between Alertmanager and your agent service are the usual culprit in Kubernetes clusters.