The Message Batches API lets you fire off up to 100,000 Claude requests in a single batch at 50% of the standard per-token price. The trade-off is straightforward: instead of sub-second responses, your results arrive asynchronously within a 24-hour window (most batches finish in under an hour). If you’re running evaluations, processing documents, generating content at scale, or doing any workload where you don’t need an answer right now, this is the most cost-effective path through the Anthropic API.

This guide walks through the full lifecycle: creating a batch, polling until it finishes, and parsing every result into something useful.

Creating a Batch

Install the SDK and set your API key first:

pip install anthropic
export ANTHROPIC_API_KEY="sk-ant-..."

A batch is a list of individual message requests, each tagged with a custom_id you choose. The API processes them all asynchronously and returns results keyed by those IDs. Here’s a real example – classifying 100 product reviews into sentiment categories:

import anthropic
from anthropic.types.message_create_params import MessageCreateParamsNonStreaming
from anthropic.types.messages.batch_create_params import Request

client = anthropic.Anthropic()

# Simulated product reviews -- in production, load these from a database or CSV
reviews = [
    {"id": f"review-{i:04d}", "text": text}
    for i, text in enumerate([
        "Absolutely love this keyboard. The mechanical switches feel great.",
        "Broke after two weeks. Total waste of money.",
        "It's fine. Does what it says. Nothing special.",
        "Best purchase I've made all year. The battery lasts forever.",
        "Arrived damaged, customer support ghosted me.",
    ] * 20)  # 100 reviews total
]

system_prompt = (
    "Classify the product review sentiment. "
    "Respond with exactly one word: positive, negative, or neutral."
)

batch_requests = [
    Request(
        custom_id=review["id"],
        params=MessageCreateParamsNonStreaming(
            model="claude-sonnet-4-20250514",
            max_tokens=8,
            system=system_prompt,
            messages=[{"role": "user", "content": review["text"]}],
        ),
    )
    for review in reviews
]

batch = client.messages.batches.create(requests=batch_requests)
print(f"Batch ID: {batch.id}")
print(f"Status: {batch.processing_status}")
print(f"Requests queued: {batch.request_counts.processing}")

A few things to note. Every custom_id must be unique within the batch – duplicates cause a validation error. The params dict takes the same fields as a regular client.messages.create() call: model, max_tokens, messages, system, temperature, tools, and so on. You can mix different models and parameters within a single batch if you want.
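
For instance, here's a quick sketch of a mixed batch that pairs a long-form task with a one-word classification (the IDs and prompts are placeholders; swap in whichever models your account can use):

mixed_requests = [
    Request(
        custom_id="summarize-0001",
        params=MessageCreateParamsNonStreaming(
            model="claude-sonnet-4-20250514",
            max_tokens=500,
            messages=[{"role": "user", "content": "Summarize this support ticket: ..."}],
        ),
    ),
    Request(
        custom_id="label-0001",
        params=MessageCreateParamsNonStreaming(
            model="claude-3-5-haiku-20241022",  # smaller, cheaper model for the easy task
            max_tokens=8,
            temperature=0.0,
            messages=[{"role": "user", "content": "One word, spam or not-spam: ..."}],
        ),
    ),
]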

The response gives you a batch ID (starts with msgbatch_) and a processing_status of in_progress. Save that batch ID – you need it for everything that follows.

Polling for Results

Batches are fire-and-forget. You submit the work, then check back. The simplest approach is a polling loop that calls retrieve() until the batch status flips to ended:

import anthropic
import time

client = anthropic.Anthropic()

batch_id = "msgbatch_01ABC123..."  # from the create response above

while True:
    batch = client.messages.batches.retrieve(batch_id)
    counts = batch.request_counts

    print(
        f"Processing: {counts.processing} | "
        f"Succeeded: {counts.succeeded} | "
        f"Errored: {counts.errored} | "
        f"Expired: {counts.expired} | "
        f"Canceled: {counts.canceled}"
    )

    if batch.processing_status == "ended":
        break

    # Poll every 60 seconds -- no need to hammer the API
    time.sleep(60)

print("Batch finished. Fetching results...")

The processing_status field has three possible values:

  • in_progress – still working through the queue.
  • ended – all requests have a terminal result (succeeded, errored, expired, or canceled).
  • canceling – you called cancel and it’s winding down.

The request_counts object updates as the batch progresses. During processing, counts.processing decreases while the other counters go up. When processing hits zero and status is ended, you’re ready to pull results.

For production systems, consider a webhook or cron job instead of a blocking loop. But for scripts and notebooks, polling works perfectly.
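
For the cron route, a non-blocking single check like this sketch is enough; the exit-code convention is just one possible signal to a wrapper script, not part of the SDK:

import sys
import anthropic

def batch_is_done(batch_id: str) -> bool:
    """Check once, report progress, and return True when the batch has ended."""
    client = anthropic.Anthropic()
    batch = client.messages.batches.retrieve(batch_id)
    print(
        f"{batch_id}: {batch.processing_status} "
        f"({batch.request_counts.succeeded} succeeded, "
        f"{batch.request_counts.processing} remaining)"
    )
    return batch.processing_status == "ended"

if __name__ == "__main__":
    # Exit 0 when finished so a downstream step knows to fetch results
    sys.exit(0 if batch_is_done(sys.argv[1]) else 1)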

Processing Batch Results

Once status is ended, iterate over the results. Each entry maps back to a custom_id from your original requests. Here’s a complete example that sorts results into successes and failures, then writes to a JSON file:

import anthropic
import json

client = anthropic.Anthropic()

batch_id = "msgbatch_01ABC123..."

successes = {}
failures = []

for entry in client.messages.batches.results(batch_id):
    if entry.result.type == "succeeded":
        response_text = entry.result.message.content[0].text
        successes[entry.custom_id] = {
            "sentiment": response_text.strip().lower(),
            "input_tokens": entry.result.message.usage.input_tokens,
            "output_tokens": entry.result.message.usage.output_tokens,
        }
    elif entry.result.type == "errored":
        # The error envelope nests the details: result.error is the API error
        # response, and result.error.error carries the actual type and message
        failures.append({
            "custom_id": entry.custom_id,
            "error_type": entry.result.error.error.type,
            "error_message": entry.result.error.error.message,
        })
    elif entry.result.type == "expired":
        failures.append({
            "custom_id": entry.custom_id,
            "error_type": "expired",
            "error_message": "Request did not complete within 24-hour window",
        })
    elif entry.result.type == "canceled":
        failures.append({
            "custom_id": entry.custom_id,
            "error_type": "canceled",
            "error_message": "Batch was canceled before this request processed",
        })

# Write results to file
output = {
    "total_succeeded": len(successes),
    "total_failed": len(failures),
    "results": successes,
    "failures": failures,
}

with open("batch_results.json", "w") as f:
    json.dump(output, f, indent=2)

print(f"Saved {len(successes)} results, {len(failures)} failures to batch_results.json")

# Quick summary
for review_id, data in list(successes.items())[:5]:
    print(f"  {review_id}: {data['sentiment']} ({data['input_tokens']}+{data['output_tokens']} tokens)")

Results stream back in arbitrary order – never assume they match your input sequence. The custom_id is your only reliable key for matching results to requests.

Each succeeded result contains the full message object, identical to what you’d get from a synchronous client.messages.create() call. That means you have access to usage, stop_reason, content, and everything else.
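
As a quick sketch of what that unlocks, here's one way to total token usage across the batch and flag replies that hit the max_tokens cap (reusing client and batch_id from the example above):

total_in = total_out = 0
truncated = []

for entry in client.messages.batches.results(batch_id):
    if entry.result.type != "succeeded":
        continue
    message = entry.result.message
    total_in += message.usage.input_tokens
    total_out += message.usage.output_tokens
    # A max_tokens stop_reason means the model was cut off mid-answer
    if message.stop_reason == "max_tokens":
        truncated.append(entry.custom_id)

print(f"Token totals: {total_in} in / {total_out} out")
print(f"Truncated responses: {len(truncated)}")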

Common Errors and Fixes

Duplicate custom_id values

The API rejects the entire batch if any two requests share the same custom_id. If you’re generating IDs from a loop, double-check for collisions:

# Safe: derive IDs from enumerate (deterministic) or uuid4 (globally unique)
import uuid

batch_requests = [
    Request(custom_id=f"item-{i:05d}", params=...)  # or str(uuid.uuid4())
    for i, item in enumerate(items)
]

Batch too large (413 error)

Individual batches can’t exceed 256 MB total payload. If you’re sending long prompts or many requests, split into chunks:

def chunk(items, size):
    for i in range(0, len(items), size):
        yield items[i:i + size]

batch_ids = []
for group in chunk(batch_requests, 5000):
    b = client.messages.batches.create(requests=group)
    batch_ids.append(b.id)
    print(f"Created batch {b.id} with {len(group)} requests")

Invalid request parameters in individual items

A malformed request inside a batch (missing model, empty messages, bad parameter name) shows up as an errored result for that specific custom_id. The rest of the batch processes normally. Fix the parameters and resubmit only the failed requests in a new batch.
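
Here's a rough sketch of that resubmission, assuming the original batch_requests list is still in scope along with the failures list from the results-processing example, and that the underlying parameter bug has been fixed:

# Collect the IDs that errored (excluding expired/canceled entries)
errored_ids = {
    f["custom_id"] for f in failures
    if f["error_type"] not in ("expired", "canceled")
}

retry_requests = [r for r in batch_requests if r["custom_id"] in errored_ids]

if retry_requests:
    retry_batch = client.messages.batches.create(requests=retry_requests)
    print(f"Resubmitted {len(retry_requests)} requests as {retry_batch.id}")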

Expired requests

If requests expire (hit the 24-hour ceiling), it usually means the batch was too large during a high-demand period. Break your next batch into smaller chunks (2,000–5,000 requests each) and retry the expired ones separately.

Rate limits on batch creation

You can’t create unlimited batches simultaneously. Anthropic enforces concurrency limits on in-flight batches. If you hit a rate limit on create(), wait for an existing batch to finish before submitting more. The client.messages.batches.list() endpoint shows all your active batches so you can track what’s still running.
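
One rough way to throttle submissions is sketched below; the concurrency cap is illustrative rather than an official number, so check your account's limits in the Console:

import time

MAX_IN_FLIGHT = 5  # illustrative cap, not an official limit

def wait_for_capacity(client):
    while True:
        in_flight = [
            b for b in client.messages.batches.list(limit=20)
            if b.processing_status == "in_progress"
        ]
        if len(in_flight) < MAX_IN_FLIGHT:
            return
        print(f"{len(in_flight)} batches in flight, waiting...")
        time.sleep(120)

# Reuses the chunk() helper from the 413 example above
for group in chunk(batch_requests, 5000):
    wait_for_capacity(client)
    batch = client.messages.batches.create(requests=group)
    print(f"Created batch {batch.id}")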