OpenRouter gives you a single API endpoint that routes to 400+ models across OpenAI, Anthropic, Google, Meta, Mistral, and dozens of other providers. It’s OpenAI SDK-compatible, so you swap one base URL and your existing code works against any model.

Here’s the fastest path to a working call:

pip install openai
export OPENROUTER_API_KEY="sk-or-v1-your-key-here"

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4",
    messages=[{"role": "user", "content": "Explain gradient descent in two sentences."}],
)

print(response.choices[0].message.content)

That’s it. Same OpenAI SDK, different base_url, access to every major model family.

Authentication and Model Selection

Sign up at openrouter.ai and grab an API key from the dashboard. Keys start with sk-or-v1-. You can optionally pass attribution headers so OpenRouter’s leaderboard tracks your app:

response = client.chat.completions.create(
    extra_headers={
        "HTTP-Referer": "https://yourapp.com",
        "X-Title": "Your App Name",
    },
    model="google/gemini-2.5-pro-preview",
    messages=[{"role": "user", "content": "Summarize this paper for me."}],
)

Model IDs follow a provider/model-name pattern. Some commonly used ones:

Model ID                        Provider
openai/gpt-4o                   OpenAI
anthropic/claude-sonnet-4       Anthropic
google/gemini-2.5-pro-preview   Google
meta-llama/llama-4-maverick     Meta
mistralai/mistral-large         Mistral

Browse the full list at openrouter.ai/models. Pricing varies per model and OpenRouter adds no markup on most models.
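
Since every ID follows the same provider/model-name shape, a small validator can catch typos before a request ever leaves your app. This is a sketch; parse_model_id is a hypothetical helper, not part of any SDK:

```python
def parse_model_id(model_id: str) -> tuple[str, str]:
    # Split a provider/model-name ID into its two parts,
    # rejecting anything that doesn't match the pattern.
    provider, sep, name = model_id.partition("/")
    if not sep or not provider or not name:
        raise ValueError(f"invalid model ID: {model_id!r}")
    return provider, name
```

For example, parse_model_id("openai/gpt-4o") returns ("openai", "gpt-4o"), while a bare "gpt-4o" raises immediately instead of producing a 404 at request time.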

Streaming Responses

Streaming works exactly like the OpenAI SDK. Set stream=True and iterate over chunks:

stream = client.chat.completions.create(
    model="anthropic/claude-sonnet-4",
    messages=[{"role": "user", "content": "Write a Python function to merge two sorted lists."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)

print()  # newline after stream finishes

Each chunk follows the standard OpenAI SSE format. The final chunk has choices[0].finish_reason set to "stop". If you’re already handling OpenAI streams, nothing changes.

Fallback Routing and Provider Preferences

This is where OpenRouter shines over calling providers directly. You can define fallback models so if your primary model is down, rate-limited, or refuses the request, the next model in the list handles it automatically.

Pass the models array through extra_body:

response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4",
    messages=[{"role": "user", "content": "Explain the CAP theorem."}],
    extra_body={
        "models": [
            "anthropic/claude-sonnet-4",
            "openai/gpt-4o",
            "google/gemini-2.5-pro-preview",
        ],
    },
)

# Check which model actually served the request
print(f"Served by: {response.model}")

If Anthropic returns an error, OpenRouter tries GPT-4o, then Gemini. You only pay for the model that actually succeeds.

You can also control provider ordering and performance thresholds:

response = client.chat.completions.create(
    model="meta-llama/llama-4-maverick",
    messages=[{"role": "user", "content": "What is RLHF?"}],
    extra_body={
        "provider": {
            "order": ["Fireworks", "Together", "Lepton"],
            "allow_fallbacks": True,
            "require_parameters": True,
        },
    },
)

The order field sets which infrastructure providers to try first for that model. allow_fallbacks (default True) controls whether OpenRouter tries other providers if your preferred ones fail. Setting require_parameters to True ensures the provider supports all features you’ve requested (like function calling or JSON mode) before routing to it.
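
Because these preferences are plain JSON inside extra_body, a tiny builder can keep call sites tidy and defaults explicit. A sketch; provider_prefs is a hypothetical helper:

```python
def provider_prefs(order, allow_fallbacks=True, require_parameters=False):
    # Assemble the extra_body "provider" block with explicit defaults.
    return {
        "provider": {
            "order": list(order),
            "allow_fallbacks": allow_fallbacks,
            "require_parameters": require_parameters,
        }
    }
```

Then a call site reduces to extra_body=provider_prefs(["Fireworks", "Together"], require_parameters=True).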

For latency-sensitive workloads, you can sort providers by throughput:

response = client.chat.completions.create(
    model="meta-llama/llama-4-maverick",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={
        "provider": {
            "sort": "throughput",
            "preferred_min_throughput": 50,  # tokens/sec minimum
        },
    },
)

Cost Tracking

Every response includes a usage field with token counts:

response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "What is a transformer?"}],
)

print(f"Prompt tokens: {response.usage.prompt_tokens}")
print(f"Completion tokens: {response.usage.completion_tokens}")
print(f"Total tokens: {response.usage.total_tokens}")

For detailed cost breakdowns after the fact, query the generation endpoint with the response ID:

import requests

generation_id = response.id  # from the chat completion response
headers = {"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"}

stats = requests.get(
    f"https://openrouter.ai/api/v1/generation?id={generation_id}",
    headers=headers,
)

data = stats.json()["data"]
print(f"Model used: {data['model']}")
print(f"Total cost: ${data['total_cost']}")
print(f"Generation time: {data['generation_time']}ms")
print(f"Prompt tokens (native): {data['native_tokens_prompt']}")
print(f"Completion tokens (native): {data['native_tokens_completion']}")

This gives you native token counts (the actual tokenizer’s count, not the OpenAI-normalized count), total cost in USD, and generation latency. Useful for building dashboards or setting budget alerts.
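
For budget alerts, accumulating total_cost is all you need. A minimal sketch, assuming you call record() with the total_cost value from each generation lookup; BudgetTracker is hypothetical, not an OpenRouter feature:

```python
class BudgetTracker:
    # Accumulate per-generation total_cost values (USD) and
    # report whether spending is still within budget.
    def __init__(self, budget_usd):
        self.budget_usd = budget_usd
        self.spent = 0.0

    def record(self, total_cost):
        self.spent += total_cost
        return self.spent <= self.budget_usd  # False once over budget
```

After each generation lookup, tracker.record(data["total_cost"]) returning False is your signal to alert or stop issuing requests.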

Common Errors and Fixes

401 Unauthorized – Your API key is missing or invalid. Double-check the OPENROUTER_API_KEY env var. Keys must start with sk-or-v1-.

402 Payment Required – Your OpenRouter account has insufficient credits. Add credits at openrouter.ai/credits. Free-tier models exist but have rate limits.

404 Model Not Found – The model ID is wrong. Model IDs are case-sensitive and follow provider/model-name format. Check openrouter.ai/models for the exact ID.

429 Rate Limited – You’ve hit the rate limit for a specific provider. Use fallback models to route around this automatically:

response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4",
    messages=[{"role": "user", "content": "Hello"}],
    extra_body={
        "models": [
            "anthropic/claude-sonnet-4",
            "openai/gpt-4o",
        ],
    },
)
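
If you'd rather retry the same model than fall back, exponential backoff with jitter is the usual pattern for 429s. A generic sketch, not an OpenRouter-specific policy:

```python
import random

def backoff_delays(retries, base=1.0, cap=30.0):
    # Full-jitter exponential backoff: each delay is drawn uniformly
    # from [0, min(cap, base * 2**attempt)].
    return [random.uniform(0, min(cap, base * 2 ** i)) for i in range(retries)]
```

Sleep for each delay between attempts, e.g. for delay in backoff_delays(5): time.sleep(delay) before retrying, and give up once the list is exhausted.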

Streaming hangs or returns empty chunks – Some models don’t support streaming. Check the model’s page on OpenRouter for supported features. If you’re behind a proxy or CDN, make sure it’s not buffering SSE responses.

extra_body ignored silently – If provider preferences aren’t taking effect, make sure you’re passing them as a top-level extra_body kwarg, not nested inside messages or headers.