The Anthropic Message Batches API lets you send up to 100,000 Claude requests in a single batch and pay 50% less per token. Batches process asynchronously – most finish within an hour, with a maximum window of 24 hours. If you’re running evals, classifying documents, or generating summaries at scale, this is the cheapest way to use Claude.

Here’s a minimal batch with two requests:

import anthropic
from anthropic.types.message_create_params import MessageCreateParamsNonStreaming
from anthropic.types.messages.batch_create_params import Request

client = anthropic.Anthropic()

batch = client.messages.batches.create(
    requests=[
        Request(
            custom_id="ticket-001",
            params=MessageCreateParamsNonStreaming(
                model="claude-sonnet-4-20250514",
                max_tokens=256,
                messages=[{"role": "user", "content": "Classify this support ticket: 'My payment failed at checkout.' Respond with one label: billing, technical, account, or general."}],
            ),
        ),
        Request(
            custom_id="ticket-002",
            params=MessageCreateParamsNonStreaming(
                model="claude-sonnet-4-20250514",
                max_tokens=256,
                messages=[{"role": "user", "content": "Classify this support ticket: 'I can't reset my password.' Respond with one label: billing, technical, account, or general."}],
            ),
        ),
    ]
)

print(f"Batch ID: {batch.id}")
print(f"Status: {batch.processing_status}")

You get back a batch ID and a status of in_progress. Each request needs a unique custom_id so you can match results back to inputs – the API doesn’t guarantee result order.

Setting Up

Install the Anthropic Python SDK:

pip install anthropic

Set your API key as an environment variable:

export ANTHROPIC_API_KEY="sk-ant-..."

The SDK reads ANTHROPIC_API_KEY automatically, so anthropic.Anthropic() works without passing the key explicitly.

Building a Real Batch: Classifying Support Tickets

A two-request example is fine for syntax. Here’s something closer to production – classifying a batch of customer support tickets in one call. The sample below uses five tickets to stay readable; the same loop scales to thousands.

import anthropic
from anthropic.types.message_create_params import MessageCreateParamsNonStreaming
from anthropic.types.messages.batch_create_params import Request

client = anthropic.Anthropic()

# Sample support tickets
tickets = [
    {"id": "t-001", "text": "My credit card was charged twice for the same order."},
    {"id": "t-002", "text": "The app crashes when I try to upload a profile photo."},
    {"id": "t-003", "text": "How do I change the email address on my account?"},
    {"id": "t-004", "text": "I want to cancel my subscription and get a refund."},
    {"id": "t-005", "text": "Your website is showing a 502 error on the pricing page."},
]

system_prompt = (
    "You are a support ticket classifier. "
    "Respond with exactly one label: billing, technical, account, or general. "
    "No explanation, just the label."
)

requests = []
for ticket in tickets:
    requests.append(
        Request(
            custom_id=ticket["id"],
            params=MessageCreateParamsNonStreaming(
                model="claude-sonnet-4-20250514",
                max_tokens=16,
                system=system_prompt,
                messages=[{"role": "user", "content": ticket["text"]}],
            ),
        )
    )

batch = client.messages.batches.create(requests=requests)
print(f"Created batch {batch.id} with {batch.request_counts.processing} requests")

Each request in the batch takes the same parameters as a regular client.messages.create() call – model, max_tokens, system, messages, temperature, tools, all of it. You can even mix different models and parameter combinations within the same batch.
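The SDK also accepts plain dicts in place of the typed helpers, which makes model mixing easy to sketch. Here’s an illustrative routing function – the length threshold and the smaller fallback model are assumptions for the example, not a recommendation:

```python
def build_requests(tickets,
                   default_model="claude-sonnet-4-20250514",
                   small_model="claude-3-5-haiku-20241022"):
    """Build batch request dicts, routing very short tickets to a smaller model."""
    requests = []
    for ticket in tickets:
        # Illustrative heuristic: short tickets go to the cheaper model.
        model = small_model if len(ticket["text"]) < 50 else default_model
        requests.append({
            "custom_id": ticket["id"],
            "params": {
                "model": model,
                "max_tokens": 16,
                "messages": [{"role": "user", "content": ticket["text"]}],
            },
        })
    return requests

reqs = build_requests([
    {"id": "t-001", "text": "Payment failed."},
    {"id": "t-002", "text": "The app crashes when I try to upload a profile photo from my phone."},
])
```

The resulting list can be passed straight to `client.messages.batches.create(requests=reqs)`.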

Polling for Completion

Batches are asynchronous. You submit the work, then check back later. Here’s a polling loop:

import anthropic
import time

client = anthropic.Anthropic()

batch_id = "msgbatch_01ABC123..."  # from the create response

while True:
    batch = client.messages.batches.retrieve(batch_id)
    counts = batch.request_counts

    print(
        f"Processing: {counts.processing} | "
        f"Succeeded: {counts.succeeded} | "
        f"Errored: {counts.errored} | "
        f"Expired: {counts.expired}"
    )

    if batch.processing_status == "ended":
        break

    time.sleep(60)

print("Batch complete. Results ready.")

The request_counts object gives you a tally of how many requests are still processing, how many succeeded, and how many hit errors or expired, updating as the batch works through its requests.

Most batches complete in under an hour. The hard ceiling is 24 hours – anything still running after that expires.
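Rather than looping forever, you can wrap the polling logic in a helper with its own deadline. This sketch takes a `fetch_batch` callable so it can be exercised without a live client – in real use you’d pass `lambda: client.messages.batches.retrieve(batch_id)`:

```python
import time

def wait_for_batch(fetch_batch, poll_interval=60, timeout=24 * 3600,
                   sleep=time.sleep):
    """Poll until the batch ends or the deadline passes.

    fetch_batch: zero-arg callable returning an object with .processing_status.
    Returns the final batch object, or raises TimeoutError.
    """
    deadline = time.monotonic() + timeout
    while True:
        batch = fetch_batch()
        if batch.processing_status == "ended":
            return batch
        if time.monotonic() >= deadline:
            raise TimeoutError("batch did not finish before the deadline")
        sleep(poll_interval)
```

Injecting `sleep` as a parameter keeps the helper testable; production code can leave the default.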

Retrieving and Processing Results

Once the batch status is ended, stream the results:

import anthropic

client = anthropic.Anthropic()

batch_id = "msgbatch_01ABC123..."

results = {}
errors = []

for entry in client.messages.batches.results(batch_id):
    if entry.result.type == "succeeded":
        text = entry.result.message.content[0].text
        results[entry.custom_id] = text
    elif entry.result.type == "errored":
        errors.append({
            "id": entry.custom_id,
            "error": entry.result.error.type,
        })
    elif entry.result.type == "expired":
        errors.append({
            "id": entry.custom_id,
            "error": "expired",
        })

print(f"Got {len(results)} successful results, {len(errors)} failures")

for ticket_id, label in results.items():
    print(f"  {ticket_id}: {label}")

if errors:
    print("Errors:")
    for err in errors:
        print(f"  {err['id']}: {err['error']}")

Results come back as a stream of individual entries. Each entry has a custom_id matching your original request and a result with one of four types:

  • succeeded – the request completed. The message field contains the full Claude response.
  • errored – something went wrong. Check result.error.type for invalid_request (fix your input) or a server error (retry).
  • canceled – you canceled the batch before this request was processed.
  • expired – the 24-hour window ran out before this request was reached.

Results are not in the same order as your input requests. Always match on custom_id.
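In practice that matching step is a small join: map each label back onto its source ticket and flag anything that never came back. A sketch over the `results` dict built above (the ticket list here is illustrative):

```python
def join_results(tickets, results):
    """Attach each batch result to its source ticket by custom_id."""
    labeled, missing = [], []
    for ticket in tickets:
        label = results.get(ticket["id"])
        if label is None:
            missing.append(ticket["id"])  # errored, expired, or canceled
        else:
            labeled.append({**ticket, "label": label.strip()})
    return labeled, missing

tickets = [
    {"id": "t-001", "text": "Charged twice."},
    {"id": "t-002", "text": "502 error."},
]
labeled, missing = join_results(tickets, {"t-001": "billing\n"})
```

Stripping whitespace from the label guards against trailing newlines in the model output.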

Cost Comparison: Standard vs. Batch

The batch API charges exactly 50% of standard per-token pricing. Here’s what that looks like for Claude Sonnet 4:

Method        | Input Cost   | Output Cost
Standard API  | $3 / MTok    | $15 / MTok
Batch API     | $1.50 / MTok | $7.50 / MTok

If you’re processing 10,000 tickets at roughly 100 input tokens and 10 output tokens each:

  • Standard: (1M input tokens * $3) + (100K output tokens * $15) = $4.50
  • Batch: (1M input tokens * $1.50) + (100K output tokens * $7.50) = $2.25

That’s a straight 50% savings with no change to output quality. The tradeoff is latency – you wait minutes to hours instead of getting a response in seconds.
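The arithmetic above generalizes to a small estimator. The prices are hardcoded to the Claude Sonnet 4 figures in the table; check current pricing before relying on it:

```python
def estimate_cost(n_requests, input_tokens_each, output_tokens_each,
                  input_per_mtok=3.00, output_per_mtok=15.00, batch=False):
    """Rough dollar cost; batch=True applies the flat 50% discount."""
    input_cost = n_requests * input_tokens_each / 1_000_000 * input_per_mtok
    output_cost = n_requests * output_tokens_each / 1_000_000 * output_per_mtok
    total = input_cost + output_cost
    return total / 2 if batch else total
```

For the 10,000-ticket example: `estimate_cost(10_000, 100, 10)` gives $4.50 standard, and `batch=True` gives $2.25.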

You can stack batch pricing with prompt caching for even deeper discounts. Cache reads already cost 90% less than standard input pricing, and the batch discount halves that again – so repeated context costs just 5% of the base input rate.

Common Errors and Fixes

anthropic.BadRequestError: requests: each request must have a unique custom_id

Every custom_id in a single batch must be unique. If you’re generating IDs from a list, make sure there are no duplicates:

# Wrong: every request gets the same hardcoded custom_id
requests = [Request(custom_id="request", params=...) for t in tickets]

# Right: use unique identifiers
requests = [Request(custom_id=f"ticket-{i}", params=...) for i, t in enumerate(tickets)]
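A cheap guard before submitting catches duplicates locally instead of on the API round trip. A sketch – `requests` is the list of Request objects or plain dicts you built:

```python
def check_unique_ids(requests):
    """Raise ValueError if any custom_id repeats within the batch."""
    seen = set()
    for r in requests:
        cid = r["custom_id"] if isinstance(r, dict) else r.custom_id
        if cid in seen:
            raise ValueError(f"duplicate custom_id: {cid}")
        seen.add(cid)
```

Run it right before `client.messages.batches.create(requests=requests)`.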

413 request_too_large

Your batch exceeds the 256 MB size limit. Split it into smaller batches:

def chunk_list(items, chunk_size):
    for i in range(0, len(items), chunk_size):
        yield items[i : i + chunk_size]

for i, chunk in enumerate(chunk_list(all_requests, 5000)):
    batch = client.messages.batches.create(requests=chunk)
    print(f"Batch {i}: {batch.id}")
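Chunking by count doesn’t directly bound payload size, so if your prompts vary a lot in length, splitting by estimated serialized size is safer. A sketch that caps each chunk conservatively below the 256 MB limit (the cap value is an assumption, chosen to leave headroom for request framing):

```python
import json

def chunk_by_size(requests, max_bytes=200 * 1024 * 1024):
    """Yield lists of request dicts whose JSON size stays under max_bytes."""
    chunk, size = [], 0
    for req in requests:
        req_bytes = len(json.dumps(req).encode("utf-8"))
        if chunk and size + req_bytes > max_bytes:
            yield chunk
            chunk, size = [], 0
        chunk.append(req)
        size += req_bytes
    if chunk:
        yield chunk
```

Each yielded chunk goes to its own `batches.create()` call, as in the loop above.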

result.error.type == "invalid_request"

One or more requests in the batch had malformed parameters. The batch still processes – other requests aren’t affected. Common causes:

  • Missing required fields (model, max_tokens, messages)
  • Empty messages array
  • Invalid model name (typo or deprecated model)

Fix the request params and resubmit just the failed ones in a new batch.
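In practice that means keeping your original request list around and filtering it by the failed IDs. A sketch using the `errors` list shape built in the results section, skipping invalid_request failures (which need fixing first, not a blind retry):

```python
def build_retry_requests(requests, errors):
    """Keep only the requests whose failure is worth retrying as-is."""
    retry_ids = {e["id"] for e in errors if e["error"] != "invalid_request"}
    def cid(r):
        return r["custom_id"] if isinstance(r, dict) else r.custom_id
    return [r for r in requests if cid(r) in retry_ids]
```

The result feeds straight into a new `batches.create()` call.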

Requests showing as expired

Your batch didn’t finish within 24 hours. This happens during high-demand periods or with very large batches. Solutions:

  • Break large batches into smaller ones (2,000-5,000 requests each)
  • Retry expired requests in a new batch
  • Check the Anthropic status page for capacity issues

anthropic.AuthenticationError

Your ANTHROPIC_API_KEY environment variable isn’t set or is invalid. Double-check:

echo $ANTHROPIC_API_KEY
# Should print sk-ant-...