The Bedrock Converse API is the right way to call models on AWS if you want to swap between Claude, Llama, and Mistral without rewriting your request format every time. Unlike invoke_model where each model family has its own JSON schema, converse() gives you one message format that works everywhere. You change the model ID string and everything else stays the same.
## Set Up the Bedrock Runtime Client
Install boto3 and make sure you have AWS credentials configured. You need the bedrock-runtime service client – not the bedrock management client.
Enable model access in the Bedrock console first. AWS requires you to explicitly request access for each model family (Claude, Llama, Mistral) before you can call them.
Here is a basic Converse API call to Claude:
The content block format for Converse is {"text": "..."} – a plain dict with just the text key. This is different from Claude’s native invoke_model format which uses {"type": "text", "text": "..."}. Do not mix these up or you will get validation errors.
To call Llama or Mistral instead, swap the model ID. The rest of the code is identical:
That is the whole point of the Converse API. No more juggling prompt vs messages, max_gen_len vs max_tokens, or different response parsing per model.
## Stream Responses with converse_stream
For anything user-facing, you want streaming so tokens appear as they are generated instead of waiting for the full response. Use converse_stream():
The stream yields several event types: messageStart, contentBlockStart, contentBlockDelta (the actual text chunks), contentBlockStop, messageStop, and metadata. You only need to handle contentBlockDelta for printing and metadata for token usage. The same streaming code works for Llama and Mistral – just change the model ID.
System prompts go in the system parameter as a list of text blocks. This is separate from the messages array, which keeps your conversation history clean.
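For example (request dicts only; the wording of the prompts is illustrative, and the actual call is left as a comment):

```python
# system sits alongside messages as its own top-level parameter.
system = [{"text": "You are a terse assistant. Answer in one sentence."}]

# messages holds only the actual conversation turns.
messages = [{"role": "user", "content": [{"text": "What is DNS?"}]}]

# response = client.converse(modelId=model_id, system=system, messages=messages)
```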
## Handle Tool Use Through Converse
The Converse API supports tool use (function calling) across models that support it, including Claude and Mistral. You define tools with a JSON schema and the model decides when to call them.
The flow is: send the message with tool definitions, check if stopReason is "tool_use", extract the tool call details, execute your function, then send the result back as a toolResult block. The model then generates a natural language response using the tool output.
The tool schema format is consistent across models. Claude, Mistral, and any future models that support tool use through Bedrock will all use the same toolSpec / toolResult structure.
## Model ID Reference
Here are the model IDs you will use most often with the Converse API on Bedrock:
| Model | Model ID |
|---|---|
| Claude Sonnet 4 | anthropic.claude-sonnet-4-20250514-v1:0 |
| Claude Haiku 3.5 | anthropic.claude-3-5-haiku-20241022-v1:0 |
| Claude Opus 4 | anthropic.claude-opus-4-20250514-v1:0 |
| Llama 3.1 70B Instruct | meta.llama3-1-70b-instruct-v1:0 |
| Llama 3.1 8B Instruct | meta.llama3-1-8b-instruct-v1:0 |
| Mistral Large (24.07) | mistral.mistral-large-2407-v1:0 |
| Mistral Small (24.02) | mistral.mistral-small-2402-v1:0 |
Not all models are available in every region. us-east-1 and us-west-2 have the broadest model selection. Check the Bedrock console for region-specific availability.
A quick helper to call any of these models:
## Common Errors and Fixes
### AccessDeniedException: You don’t have access to the model

Go to the Bedrock console, click “Model access” in the sidebar, and request access for the model family. Some models require EULA acceptance. Access grants can take a few minutes to propagate.
### ValidationException: Malformed input request
You are probably mixing up Converse format and invoke_model format. Converse content blocks are {"text": "..."}. If you accidentally pass {"type": "text", "text": "..."} to converse(), you will get this error.
### ResourceNotFoundException: Could not resolve the foundation model
Double-check your model ID string and region. Model IDs are exact – meta.llama3-1-70b-instruct-v1:0 is not the same as meta.llama3-70b-instruct-v1:0. The version suffix matters.
### ThrottlingException: Rate exceeded

Bedrock applies per-model, per-account rate limits. Request a quota increase through AWS Service Quotas, or implement exponential backoff:
The adaptive retry mode handles throttling with exponential backoff automatically. For production workloads with predictable volume, look into Bedrock provisioned throughput.
### Tool use returning unexpected stop reason
If you send tools in toolConfig but the model responds with stopReason: "end_turn" instead of "tool_use", it means the model decided it could answer without calling the tool. Always handle both stop reasons in your code.
## Related Guides
- How to Use Amazon Bedrock for Foundation Model APIs
- How to Use the Cerebras API for Fast LLM Inference
- How to Use the Anthropic Prompt Caching API with Context Blocks
- How to Use the Anthropic Tool Use API for Agentic Workflows
- How to Use the Stability AI API for Image and Video Generation
- How to Use the OpenAI Realtime API for Voice Applications
- How to Use the Anthropic PDF Processing API for Document Analysis
- How to Use the Weights and Biases Prompts API for LLM Tracing
- How to Use the Mistral API for Code Generation and Chat
- How to Use Claude’s Model Context Protocol (MCP)