The Message Batches API lets you fire off up to 100,000 Claude requests in a single batch at 50% of the standard per-token price. The trade-off is straightforward: instead of sub-second responses, your results arrive asynchronously within a 24-hour window (most batches finish in under an hour). If you’re running evaluations, processing documents, generating content at scale, or doing any workload where you don’t need an answer right now, this is the most cost-effective path through the Anthropic API.
This guide walks through the full lifecycle: creating a batch, polling until it finishes, and parsing every result into something useful.
Creating a Batch
Install the SDK and set your API key first:
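Something like the following, assuming a Unix-like shell and that you keep the key in the ANTHROPIC_API_KEY environment variable (which the SDK reads automatically):

```shell
# Install the official Python SDK
pip install anthropic

# Export your API key so the client can find it
export ANTHROPIC_API_KEY="sk-ant-..."
```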
A batch is a list of individual message requests, each tagged with a custom_id you choose. The API processes them all asynchronously and returns results keyed by those IDs. Here’s a real example – classifying 100 product reviews into sentiment categories:
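A sketch of that review-classification batch, assuming the reviews are already loaded into a Python list; the model name below is illustrative, so substitute whichever Claude model you're targeting:

```python
import anthropic

# Hypothetical input: the 100 reviews you want classified.
reviews = [
    "Arrived quickly and works great.",
    "Broke after two days. Very disappointed.",
    # ...
]

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": f"review-{i}",  # must be unique within the batch
            "params": {
                # Same fields as a regular messages.create() call
                "model": "claude-sonnet-4-20250514",
                "max_tokens": 16,
                "messages": [
                    {
                        "role": "user",
                        "content": "Classify the sentiment of this review as "
                                   f"positive, negative, or neutral: {review}",
                    }
                ],
            },
        }
        for i, review in enumerate(reviews)
    ]
)

print(batch.id)                 # msgbatch_...
print(batch.processing_status)  # in_progress
```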
A few things to note. Every custom_id must be unique within the batch – duplicates cause a validation error. The params dict takes the same fields as a regular client.messages.create() call: model, max_tokens, messages, system, temperature, tools, and so on. You can mix different models and parameters within a single batch if you want.
The response gives you a batch ID (starts with msgbatch_) and a processing_status of in_progress. Save that batch ID – you need it for everything that follows.
Polling for Results
Batches are fire-and-forget. You submit the work, then check back. The simplest approach is a polling loop that calls retrieve() until the batch status flips to ended:
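A minimal polling loop along these lines, assuming batch_id holds the ID that batches.create() returned:

```python
import time

import anthropic

client = anthropic.Anthropic()
batch_id = "msgbatch_..."  # the ID returned by batches.create()

while True:
    batch = client.messages.batches.retrieve(batch_id)
    counts = batch.request_counts
    print(
        f"status={batch.processing_status} "
        f"processing={counts.processing} "
        f"succeeded={counts.succeeded} "
        f"errored={counts.errored}"
    )
    if batch.processing_status == "ended":
        break
    time.sleep(60)  # check once a minute; batches can take minutes to hours
```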
The processing_status field has three possible values:
- in_progress – still working through the queue.
- ended – all requests have reached a terminal result (succeeded, errored, expired, or canceled).
- canceling – you called cancel and the batch is winding down.
The request_counts object updates as the batch progresses: counts.processing ticks down while the terminal counters (succeeded, errored, expired, canceled) tick up. When counts.processing hits zero and the status is ended, you're ready to pull results.
For production systems, consider a webhook or cron job instead of a blocking loop. But for scripts and notebooks, polling works perfectly.
Processing Batch Results
Once status is ended, iterate over the results. Each entry maps back to a custom_id from your original requests. Here’s a complete example that sorts results into successes and failures, then writes to a JSON file:
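A sketch of that pattern; it assumes the succeeded responses contain a single text block (as they will for the classification prompts above), and writes everything to batch_results.json:

```python
import json

import anthropic

client = anthropic.Anthropic()
batch_id = "msgbatch_..."  # the batch you polled to completion

successes, failures = {}, {}

# results() streams entries back in arbitrary order, keyed by custom_id.
for entry in client.messages.batches.results(batch_id):
    result = entry.result
    if result.type == "succeeded":
        # result.message is a full message object, same shape as a
        # synchronous messages.create() response.
        successes[entry.custom_id] = result.message.content[0].text
    else:
        # One of: errored, canceled, expired
        failures[entry.custom_id] = result.type

with open("batch_results.json", "w") as f:
    json.dump({"successes": successes, "failures": failures}, f, indent=2)

print(f"{len(successes)} succeeded, {len(failures)} failed")
```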
Results stream back in arbitrary order – never assume they match your input sequence. The custom_id is your only reliable key for matching results to requests.
Each succeeded result contains the full message object, identical to what you’d get from a synchronous client.messages.create() call. That means you have access to usage, stop_reason, content, and everything else.
Common Errors and Fixes
Duplicate custom_id values
The API rejects the entire batch if any two requests share the same custom_id. If you’re generating IDs from a loop, double-check for collisions:
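One way to guard against collisions is a small pre-flight check before submission (check_unique_ids is a hypothetical helper, not part of the SDK):

```python
from collections import Counter


def check_unique_ids(requests):
    """Raise before submission if any custom_id repeats."""
    counts = Counter(r["custom_id"] for r in requests)
    dupes = [cid for cid, n in counts.items() if n > 1]
    if dupes:
        raise ValueError(f"Duplicate custom_id values: {dupes}")


# A collision like this would cause the API to reject the whole batch:
requests = [
    {"custom_id": "item-1", "params": {}},
    {"custom_id": "item-1", "params": {}},
]
try:
    check_unique_ids(requests)
except ValueError as e:
    print(e)  # Duplicate custom_id values: ['item-1']
```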
Batch too large (413 error)
Individual batches can’t exceed 256 MB total payload. If you’re sending long prompts or many requests, split into chunks:
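A sketch of chunked submission; chunked() and submit_in_chunks() are hypothetical helpers, not SDK functions:

```python
def chunked(requests, size=2000):
    """Yield successive slices of at most `size` requests."""
    for start in range(0, len(requests), size):
        yield requests[start:start + size]


def submit_in_chunks(client, all_requests, size=2000):
    """Submit one batch per chunk; returns the batch IDs in order."""
    return [
        client.messages.batches.create(requests=chunk).id
        for chunk in chunked(all_requests, size)
    ]


# 7,500 requests -> three chunks of 2,000 and one of 1,500
demo = [{"custom_id": f"r-{i}", "params": {}} for i in range(7500)]
print([len(c) for c in chunked(demo)])  # [2000, 2000, 2000, 1500]
```

Splitting by request count is a rough proxy for payload size; if individual prompts are very long, you may need smaller chunks to stay under 256 MB.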
Invalid request parameters in individual items
A malformed request inside a batch (missing model, empty messages, bad parameter name) shows up as an errored result for that specific custom_id. The rest of the batch processes normally. Fix the parameters and resubmit only the failed requests in a new batch.
Expired requests
If requests expire (hit the 24-hour ceiling), it usually means the batch was too large during a high-demand period. Break your next batch into smaller chunks (2,000–5,000 requests each) and retry the expired ones separately.
Rate limits on batch creation
You can’t create unlimited batches simultaneously. Anthropic enforces concurrency limits on in-flight batches. If you hit a rate limit on create(), wait for an existing batch to finish before submitting more. The client.messages.batches.list() endpoint shows all your active batches so you can track what’s still running.
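A quick sketch of checking what's still in flight before submitting more work:

```python
import anthropic

client = anthropic.Anthropic()

# List recent batches and surface any that are still processing.
for batch in client.messages.batches.list(limit=20):
    if batch.processing_status != "ended":
        print(batch.id, batch.processing_status)
```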
Related Guides
- How to Use the Anthropic Extended Thinking API for Complex Reasoning
- How to Use the Anthropic Batch API for High-Volume Processing
- How to Use the Anthropic Token Efficient Tool Use API
- How to Use the DeepSeek API for Code and Reasoning Tasks
- How to Use the Anthropic Token Streaming API for Real-Time UIs
- How to Use the xAI Grok API for Chat and Function Calling
- How to Use the Anthropic Citations API for Grounded Responses
- How to Use the OpenRouter API for Multi-Provider LLM Access
- How to Use LiteLLM as a Universal LLM API Gateway
- How to Use the Together AI API for Open-Source LLMs