The Bedrock Converse API is the right way to call models on AWS if you want to swap between Claude, Llama, and Mistral without rewriting your request format every time. Unlike invoke_model where each model family has its own JSON schema, converse() gives you one message format that works everywhere. You change the model ID string and everything else stays the same.

Set Up the Bedrock Runtime Client

Install boto3 and make sure you have AWS credentials configured. You need the bedrock-runtime service client – not the bedrock management client.

pip install boto3
aws configure  # set access key, secret key, and region

Enable model access in the Bedrock console first. AWS requires you to explicitly request access for each model family (Claude, Llama, Mistral) before you can call them.

Here is a basic Converse API call to Claude:

import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="anthropic.claude-sonnet-4-20250514-v1:0",
    messages=[
        {
            "role": "user",
            "content": [{"text": "What are the tradeoffs between REST and GraphQL?"}],
        }
    ],
    inferenceConfig={
        "maxTokens": 512,
        "temperature": 0.5,
    },
)

output_text = response["output"]["message"]["content"][0]["text"]
print(output_text)
print(f"Total tokens: {response['usage']['totalTokens']}")

The content block format for Converse is {"text": "..."} – a plain dict with just the text key. This is different from Claude’s native invoke_model format which uses {"type": "text", "text": "..."}. Do not mix these up or you will get validation errors.
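If you are migrating code from invoke_model, it can help to translate at the boundary rather than hunt for stray blocks. A minimal sketch (the function name `to_converse_blocks` is mine, not part of any SDK):

```python
def to_converse_blocks(native_blocks):
    """Convert Claude-native content blocks ({"type": "text", "text": ...})
    into Converse-style blocks ({"text": ...})."""
    converse = []
    for block in native_blocks:
        if block.get("type") == "text":
            converse.append({"text": block["text"]})
        else:
            # Image/tool blocks have their own Converse shapes; handle as needed
            raise ValueError(f"Unhandled block type: {block.get('type')}")
    return converse

print(to_converse_blocks([{"type": "text", "text": "hi"}]))  # [{'text': 'hi'}]
```

Running converted history through a helper like this fails loudly on block types you have not mapped yet, instead of surfacing later as a ValidationException.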

To call Llama or Mistral instead, swap the model ID. The rest of the code is identical:

# Llama 3.1 70B
response = client.converse(
    modelId="meta.llama3-1-70b-instruct-v1:0",
    messages=[
        {"role": "user", "content": [{"text": "Explain container orchestration."}]}
    ],
    inferenceConfig={"maxTokens": 512, "temperature": 0.7},
)

# Mistral Large
response = client.converse(
    modelId="mistral.mistral-large-2407-v1:0",
    messages=[
        {"role": "user", "content": [{"text": "Explain container orchestration."}]}
    ],
    inferenceConfig={"maxTokens": 512, "temperature": 0.7},
)

That is the whole point of the Converse API. No more juggling prompt vs messages, max_gen_len vs max_tokens, or different response parsing per model.
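To see what that saves you, here is roughly what the same request looks like as native invoke_model bodies for Claude versus Llama. A sketch of the two schemas, not an exhaustive mapping:

```python
prompt = "Explain container orchestration."

# Claude's native invoke_model body: messages array + max_tokens
claude_body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 512,
    "messages": [
        {"role": "user", "content": [{"type": "text", "text": prompt}]}
    ],
}

# Llama's native invoke_model body: a single prompt string + max_gen_len
llama_body = {
    "prompt": prompt,  # real calls need Llama's chat template wrapped around this
    "max_gen_len": 512,
    "temperature": 0.7,
}
```

Different top-level keys, different token-limit names, different conversation shapes. Converse absorbs all of that behind one interface.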

Stream Responses with converse_stream

For anything user-facing, you want streaming so tokens appear as they are generated instead of waiting for the full response. Use converse_stream():

import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse_stream(
    modelId="anthropic.claude-sonnet-4-20250514-v1:0",
    messages=[
        {"role": "user", "content": [{"text": "Write a Python binary search implementation."}]}
    ],
    system=[{"text": "You are a concise coding assistant. Return only code with brief comments."}],
    inferenceConfig={"maxTokens": 1024, "temperature": 0.2},
)

for event in response["stream"]:
    if "contentBlockDelta" in event:
        print(event["contentBlockDelta"]["delta"]["text"], end="", flush=True)
    if "metadata" in event:
        usage = event["metadata"]["usage"]
        print(f"\nInput tokens: {usage['inputTokens']}")
        print(f"Output tokens: {usage['outputTokens']}")

The stream yields several event types: messageStart, contentBlockStart, contentBlockDelta (the actual text chunks), contentBlockStop, messageStop, and metadata. You only need to handle contentBlockDelta for printing and metadata for token usage. The same streaming code works for Llama and Mistral – just change the model ID.
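The delta-handling logic is easy to factor out into a small generator. A sketch (`iter_stream_text` is my name; the fake events below mirror the shapes the real stream yields):

```python
def iter_stream_text(stream):
    """Yield text chunks from a Converse stream, ignoring other event types."""
    for event in stream:
        delta = event.get("contentBlockDelta", {}).get("delta", {})
        if "text" in delta:
            yield delta["text"]

# Simulated events in the shape converse_stream() returns
fake_stream = [
    {"messageStart": {"role": "assistant"}},
    {"contentBlockDelta": {"delta": {"text": "def search"}}},
    {"contentBlockDelta": {"delta": {"text": "(arr, x):"}}},
    {"messageStop": {"stopReason": "end_turn"}},
]
print("".join(iter_stream_text(fake_stream)))  # def search(arr, x):
```

Because the generator only looks for contentBlockDelta, new event types AWS adds later pass through harmlessly.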

System prompts go in the system parameter as a list of text blocks. This is separate from the messages array, which keeps your conversation history clean.

Handle Tool Use Through Converse

The Converse API supports tool use (function calling) across models that support it, including Claude and Mistral. You define tools with a JSON schema and the model decides when to call them.

import boto3
import json

client = boto3.client("bedrock-runtime", region_name="us-east-1")

# Define a tool
tools = [
    {
        "toolSpec": {
            "name": "get_weather",
            "description": "Get the current weather for a given city.",
            "inputSchema": {
                "json": {
                    "type": "object",
                    "properties": {
                        "city": {
                            "type": "string",
                            "description": "The city name, e.g. 'San Francisco'",
                        }
                    },
                    "required": ["city"],
                }
            },
        }
    }
]

messages = [
    {"role": "user", "content": [{"text": "What's the weather in Seattle?"}]}
]

response = client.converse(
    modelId="anthropic.claude-sonnet-4-20250514-v1:0",
    messages=messages,
    toolConfig={"tools": tools},
    inferenceConfig={"maxTokens": 1024},
)

# Check if the model wants to use a tool
response_message = response["output"]["message"]
stop_reason = response["stopReason"]

if stop_reason == "tool_use":
    # Extract the tool use block
    for block in response_message["content"]:
        if "toolUse" in block:
            tool_name = block["toolUse"]["name"]
            tool_input = block["toolUse"]["input"]
            tool_use_id = block["toolUse"]["toolUseId"]
            print(f"Model wants to call: {tool_name}({json.dumps(tool_input)})")

            # Simulate calling the tool and return the result
            tool_result = {"temperature": "54F", "condition": "cloudy", "humidity": "82%"}

            # Send tool result back to the model
            messages.append(response_message)
            messages.append({
                "role": "user",
                "content": [
                    {
                        "toolResult": {
                            "toolUseId": tool_use_id,
                            "content": [{"json": tool_result}],
                        }
                    }
                ],
            })

            final_response = client.converse(
                modelId="anthropic.claude-sonnet-4-20250514-v1:0",
                messages=messages,
                toolConfig={"tools": tools},
                inferenceConfig={"maxTokens": 1024},
            )
            print(final_response["output"]["message"]["content"][0]["text"])

The flow is: send the message with tool definitions, check if stopReason is "tool_use", extract the tool call details, execute your function, then send the result back as a toolResult block. The model then generates a natural language response using the tool output.
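The extraction step generalizes to any number of tool calls in one response. A sketch (`extract_tool_uses` is my name; the fake message mirrors the Converse response shape):

```python
def extract_tool_uses(message):
    """Return (toolUseId, name, input) tuples from an assistant message."""
    return [
        (b["toolUse"]["toolUseId"], b["toolUse"]["name"], b["toolUse"]["input"])
        for b in message["content"]
        if "toolUse" in b
    ]

# Simulated assistant message: text block followed by a tool call
fake_message = {
    "role": "assistant",
    "content": [
        {"text": "Let me check the weather."},
        {"toolUse": {"toolUseId": "tooluse_abc", "name": "get_weather",
                     "input": {"city": "Seattle"}}},
    ],
}
print(extract_tool_uses(fake_message))
# [('tooluse_abc', 'get_weather', {'city': 'Seattle'})]
```

Models can interleave text and multiple toolUse blocks in a single turn, so iterating over all content blocks (as above) is safer than grabbing only the first one.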

The tool schema format is consistent across models. Claude, Mistral, and any future models that support tool use through Bedrock will all use the same toolSpec / toolResult structure.

Model ID Reference

Here are the model IDs you will use most often with the Converse API on Bedrock:

Model                     Model ID
Claude Sonnet 4           anthropic.claude-sonnet-4-20250514-v1:0
Claude Haiku 3.5          anthropic.claude-3-5-haiku-20241022-v1:0
Claude Opus 4             anthropic.claude-opus-4-20250514-v1:0
Llama 3.1 70B Instruct    meta.llama3-1-70b-instruct-v1:0
Llama 3.1 8B Instruct     meta.llama3-1-8b-instruct-v1:0
Mistral Large (24.07)     mistral.mistral-large-2407-v1:0
Mistral Small (24.02)     mistral.mistral-small-2402-v1:0

Not all models are available in every region. us-east-1 and us-west-2 have the broadest model selection. Some newer models are served only through cross-region inference profiles, in which case the model ID takes a geography prefix (for example us.anthropic.claude-sonnet-4-20250514-v1:0). Check the Bedrock console for region-specific availability.

A quick helper to call any of these models:

import boto3

def chat(model_id: str, prompt: str, system: str = "", max_tokens: int = 512) -> str:
    client = boto3.client("bedrock-runtime", region_name="us-east-1")

    kwargs = {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": max_tokens, "temperature": 0.5},
    }
    if system:
        kwargs["system"] = [{"text": system}]

    response = client.converse(**kwargs)
    return response["output"]["message"]["content"][0]["text"]


# Same function works for all models
print(chat("anthropic.claude-sonnet-4-20250514-v1:0", "Explain RLHF in two sentences."))
print(chat("meta.llama3-1-70b-instruct-v1:0", "Explain RLHF in two sentences."))
print(chat("mistral.mistral-large-2407-v1:0", "Explain RLHF in two sentences."))

Common Errors and Fixes

AccessDeniedException: You don’t have access to the model

Go to the Bedrock console, click “Model access” in the sidebar, and request access for the model family. Some models require EULA acceptance. Access grants can take a few minutes to propagate.

ValidationException: Malformed input request

You are probably mixing up Converse format and invoke_model format. Converse content blocks are {"text": "..."}. If you accidentally pass {"type": "text", "text": "..."} to converse(), you will get this error.

ResourceNotFoundException: Could not resolve the foundation model

Double-check your model ID string and region. Model IDs are exact – meta.llama3-1-70b-instruct-v1:0 is not the same as meta.llama3-70b-instruct-v1:0. The version suffix matters.

ThrottlingException: Rate exceeded

Bedrock applies per-model, per-account rate limits. Request a quota increase through AWS Service Quotas, or implement exponential backoff:

from botocore.config import Config

config = Config(
    retries={"max_attempts": 5, "mode": "adaptive"},
    read_timeout=300,
)
client = boto3.client("bedrock-runtime", region_name="us-east-1", config=config)

The adaptive retry mode handles throttling with exponential backoff automatically. For production workloads with predictable volume, look into Bedrock provisioned throughput.
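If you need backoff above the botocore layer (say, around a higher-level function that makes several Bedrock calls), the pattern is a loop with exponentially growing sleeps plus jitter. A generic sketch (`with_backoff` is my name; it retries a deliberately flaky function here rather than a real Bedrock call):

```python
import random
import time

def with_backoff(fn, max_attempts=5, base_delay=0.1):
    """Call fn, retrying with exponential backoff plus jitter on exceptions."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts; let the caller see the error
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))

# Demo: fails twice, then succeeds
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("ThrottlingException (simulated)")
    return "ok"

print(with_backoff(flaky, base_delay=0.01))  # ok
```

In real code you would catch only botocore's ThrottlingException (via ClientError) rather than bare Exception, so genuine bugs still surface immediately.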

Tool use returning unexpected stop reason

If you send tools in toolConfig but the model responds with stopReason: "end_turn" instead of "tool_use", it means the model decided it could answer without calling the tool. Always handle both stop reasons in your code.
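A dispatch covering both cases might look like this sketch (`handle_response` is my name; the fake response mirrors the Converse shape):

```python
def handle_response(response):
    """Return ('tool', toolUse_block) for a tool call, else ('text', answer)."""
    message = response["output"]["message"]
    if response["stopReason"] == "tool_use":
        for block in message["content"]:
            if "toolUse" in block:
                return ("tool", block["toolUse"])
    # end_turn (or any other stop reason): fall back to the text content
    return ("text", "".join(b.get("text", "") for b in message["content"]))

# Simulated response where the model answered directly
fake_end_turn = {
    "stopReason": "end_turn",
    "output": {"message": {"content": [{"text": "It is 54F in Seattle."}]}},
}
print(handle_response(fake_end_turn))  # ('text', 'It is 54F in Seattle.')
```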