You want to know how much a Claude API call will cost before you send it. Maybe you’re building a product where users submit arbitrary prompts, and you need to reject requests that blow past a spending limit. Or you’re routing between models and need the token count to pick the cheapest option. The Anthropic token counting API gives you exact input token counts without running inference – it’s free, fast, and uses the same parameters as the Messages API.
Count Tokens for Messages
The client.messages.count_tokens() method accepts the same inputs as client.messages.create() – model, system prompt, messages, and tools. It returns a MessageTokensCount object with a single input_tokens field.
Install the SDK if you haven’t: `pip install anthropic`.
Here’s a basic example counting tokens for a single message:
```python
from anthropic import Anthropic

client = Anthropic()

response = client.messages.count_tokens(
    model="claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Explain gradient descent in three sentences."}],
)

print(response.input_tokens)  # e.g., 16
```
The ANTHROPIC_API_KEY environment variable is read automatically. No need to pass it explicitly.
System prompts and multi-turn conversations
System prompts and conversation history add tokens. Here’s how to count a multi-turn conversation with a system prompt:
```python
from anthropic import Anthropic

client = Anthropic()

response = client.messages.count_tokens(
    model="claude-sonnet-4-20250514",
    system="You are a Python expert. Answer concisely with code examples.",
    messages=[
        {"role": "user", "content": "How do I read a CSV file?"},
        {"role": "assistant", "content": "Use pandas: `pd.read_csv('data.csv')`"},
        {"role": "user", "content": "How do I filter rows where column 'age' is greater than 30?"},
    ],
)

print(f"Input tokens: {response.input_tokens}")  # e.g., 65
```
Each turn adds tokens. The assistant’s previous response counts as input tokens for the next request. This is the number you’ll be billed for on the input side when you call messages.create() with the same parameters.
Estimate Costs from Token Counts
Token counts alone aren’t useful unless you map them to dollars. Here’s a cost calculator using real Anthropic pricing:
```python
from anthropic import Anthropic

# Pricing per million tokens (as of 2025)
MODEL_PRICING = {
    "claude-sonnet-4-20250514": {"input": 3.00, "output": 15.00},
    "claude-3-5-haiku-20241022": {"input": 0.80, "output": 4.00},
    "claude-opus-4-20250514": {"input": 15.00, "output": 75.00},
}

def estimate_cost(
    model: str,
    messages: list,
    system: str | None = None,
    estimated_output_tokens: int = 500,
) -> dict:
    """Estimate the cost of a Claude API call before sending it."""
    client = Anthropic()

    kwargs = {"model": model, "messages": messages}
    if system:
        kwargs["system"] = system

    token_count = client.messages.count_tokens(**kwargs)
    input_tokens = token_count.input_tokens

    pricing = MODEL_PRICING[model]
    input_cost = (input_tokens / 1_000_000) * pricing["input"]
    output_cost = (estimated_output_tokens / 1_000_000) * pricing["output"]

    return {
        "model": model,
        "input_tokens": input_tokens,
        "estimated_output_tokens": estimated_output_tokens,
        "input_cost_usd": round(input_cost, 6),
        "estimated_output_cost_usd": round(output_cost, 6),
        "estimated_total_cost_usd": round(input_cost + output_cost, 6),
    }

result = estimate_cost(
    model="claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Write a Python function to merge two sorted lists."}],
    estimated_output_tokens=300,
)

for key, value in result.items():
    print(f"{key}: {value}")
```
Output tokens are always an estimate since you can’t know the exact response length beforehand. Use max_tokens as a worst-case ceiling, or track your average output length per use case and plug that in.
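If you log actual usage, you can replace a flat default with a measured average. Here is a minimal sketch of that idea, where `OutputTracker` is a hypothetical helper (not part of the SDK) that keeps a running mean of the `output_tokens` values you observe in `response.usage`:

```python
# Sketch: track observed output lengths, then feed the running average
# back in as estimated_output_tokens. OutputTracker is illustrative only.
class OutputTracker:
    def __init__(self, default_estimate: int = 500):
        self.default_estimate = default_estimate
        self.total_output_tokens = 0
        self.samples = 0

    def record(self, output_tokens: int) -> None:
        """Record an actual output token count from response.usage."""
        self.total_output_tokens += output_tokens
        self.samples += 1

    def estimate(self) -> int:
        """Return the running average, or the default before any samples."""
        if self.samples == 0:
            return self.default_estimate
        return round(self.total_output_tokens / self.samples)

tracker = OutputTracker()
print(tracker.estimate())  # 500 -- no samples yet, falls back to the default
tracker.record(320)
tracker.record(480)
print(tracker.estimate())  # 400
```

After each real request, call `tracker.record(response.usage.output_tokens)` and pass `tracker.estimate()` as `estimated_output_tokens` on the next one.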
Build a Budget Guard
Here’s a practical wrapper that checks estimated cost against a budget before sending a request. If the input alone would exceed your limit, the request gets rejected without burning any inference tokens:
```python
from anthropic import Anthropic

MODEL_PRICING = {
    "claude-sonnet-4-20250514": {"input": 3.00, "output": 15.00},
    "claude-3-5-haiku-20241022": {"input": 0.80, "output": 4.00},
    "claude-opus-4-20250514": {"input": 15.00, "output": 75.00},
}

class BudgetGuard:
    def __init__(self, budget_usd: float, model: str = "claude-sonnet-4-20250514"):
        self.budget_remaining = budget_usd
        self.model = model
        self.client = Anthropic()
        self.pricing = MODEL_PRICING[model]

    def _estimate_max_cost(self, input_tokens: int, max_output_tokens: int) -> float:
        input_cost = (input_tokens / 1_000_000) * self.pricing["input"]
        output_cost = (max_output_tokens / 1_000_000) * self.pricing["output"]
        return input_cost + output_cost

    def send(
        self,
        messages: list,
        system: str | None = None,
        max_tokens: int = 1024,
    ):
        kwargs = {"model": self.model, "messages": messages}
        if system:
            kwargs["system"] = system

        token_count = self.client.messages.count_tokens(**kwargs)
        input_tokens = token_count.input_tokens

        worst_case_cost = self._estimate_max_cost(input_tokens, max_tokens)
        if worst_case_cost > self.budget_remaining:
            raise RuntimeError(
                f"Request rejected: worst-case cost ${worst_case_cost:.4f} "
                f"exceeds remaining budget ${self.budget_remaining:.4f}"
            )

        response = self.client.messages.create(
            max_tokens=max_tokens,
            **kwargs,
        )

        # Calculate actual cost from usage
        actual_input = response.usage.input_tokens
        actual_output = response.usage.output_tokens
        actual_cost = (
            (actual_input / 1_000_000) * self.pricing["input"]
            + (actual_output / 1_000_000) * self.pricing["output"]
        )
        self.budget_remaining -= actual_cost

        print(f"Cost: ${actual_cost:.6f} | Remaining: ${self.budget_remaining:.4f}")
        return response

# Usage
guard = BudgetGuard(budget_usd=0.50, model="claude-sonnet-4-20250514")

response = guard.send(
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    max_tokens=100,
)
print(response.content[0].text)
The guard counts tokens first, calculates worst-case cost using max_tokens, and only proceeds if it fits within budget. After the response, it deducts the actual cost. This gives you a hard spending limit per session, user, or request batch.
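To scope the limit per user rather than per process, one option is to keep a separate guard for each user ID. A sketch of that pattern follows; `SessionBudgets` is an illustrative name, not part of the SDK, and it takes a factory so it stays decoupled from any particular guard class:

```python
# Sketch: one independent budget per user. In practice the factory would be
# something like `lambda b: BudgetGuard(budget_usd=b)` using the class above.
class SessionBudgets:
    def __init__(self, guard_factory, budget_per_user_usd: float):
        self.guard_factory = guard_factory
        self.budget_per_user_usd = budget_per_user_usd
        self.guards = {}

    def guard_for(self, user_id: str):
        """Create a guard on first use, then reuse it for later requests."""
        if user_id not in self.guards:
            self.guards[user_id] = self.guard_factory(self.budget_per_user_usd)
        return self.guards[user_id]

# budgets = SessionBudgets(lambda b: BudgetGuard(budget_usd=b), budget_per_user_usd=0.25)
# budgets.guard_for("user-123").send(messages=[{"role": "user", "content": "Hi"}])
```

Each user then has their own `budget_remaining` that only their requests draw down.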
Measure Tool Overhead
Tool definitions add significant token overhead. Each tool’s name, description, and JSON schema are serialized into the prompt. Counting tokens with tools tells you exactly how much that overhead costs:
```python
from anthropic import Anthropic

client = Anthropic()

tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather in a given location",
        "input_schema": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA",
                }
            },
            "required": ["location"],
        },
    },
    {
        "name": "get_stock_price",
        "description": "Get the current stock price for a ticker symbol",
        "input_schema": {
            "type": "object",
            "properties": {
                "ticker": {
                    "type": "string",
                    "description": "Stock ticker symbol, e.g. AAPL",
                }
            },
            "required": ["ticker"],
        },
    },
]

# Count WITHOUT tools
no_tools = client.messages.count_tokens(
    model="claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "What's the weather in NYC?"}],
)

# Count WITH tools
with_tools = client.messages.count_tokens(
    model="claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "What's the weather in NYC?"}],
    tools=tools,
)

tool_overhead = with_tools.input_tokens - no_tools.input_tokens
print(f"Without tools: {no_tools.input_tokens} tokens")
print(f"With tools: {with_tools.input_tokens} tokens")
print(f"Tool overhead: {tool_overhead} tokens")
```
You’ll typically see tool definitions add hundreds of tokens. Two simple tools can easily add 400+ tokens. If you’re defining 10-20 tools, that overhead can reach thousands of tokens per request. Use this measurement to decide which tools to include – strip out tools the model won’t need for a given query.
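One simple way to strip unneeded tools is keyword routing before the request is built. This is a rough sketch under stated assumptions: the keyword map and `select_tools` are illustrative, not an Anthropic feature, and real routing logic would likely be more robust than substring matching:

```python
# Sketch: only include tools whose keywords appear in the user's query,
# so unrelated queries don't pay every tool definition's token overhead.
TOOL_KEYWORDS = {
    "get_weather": ["weather", "temperature", "forecast", "rain"],
    "get_stock_price": ["stock", "price", "ticker", "share"],
}

def select_tools(query: str, tools: list) -> list:
    """Return only the tools whose keywords match the query text."""
    q = query.lower()
    return [
        t for t in tools
        if any(kw in q for kw in TOOL_KEYWORDS.get(t["name"], []))
    ]

# selected = select_tools("What's the weather in NYC?", tools)
# Only get_weather would be sent for this query, cutting the overhead roughly
# in half; measure the difference with count_tokens as shown above.
```

Pass the filtered list as `tools=` to both `count_tokens` and `create` so the count matches what you actually send.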
Common Errors and Fixes
anthropic.AuthenticationError: 401
Your API key is missing or invalid.
```python
from anthropic import Anthropic

# Wrong -- key not set
client = Anthropic()

# Fix -- set the environment variable:
#   export ANTHROPIC_API_KEY=sk-ant-api03-...
# Or pass it directly:
client = Anthropic(api_key="sk-ant-api03-...")
```
Check that ANTHROPIC_API_KEY is exported in your shell, not just assigned. Run echo $ANTHROPIC_API_KEY to verify.
anthropic.BadRequestError: 400 - messages: roles must alternate
Your messages array has consecutive user or assistant turns:
```python
# Wrong -- two user messages in a row
messages = [
    {"role": "user", "content": "Hello"},
    {"role": "user", "content": "How are you?"},
]

# Fix -- combine them or add an assistant turn between
messages = [
    {"role": "user", "content": "Hello. How are you?"},
]
```
The Anthropic API requires strict user / assistant alternation. The SDK will combine consecutive same-role messages automatically in some cases, but it’s safer to structure them correctly upfront.
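If you aggregate history from multiple sources and can’t guarantee alternation, a small pre-processing pass can normalize it. The helper below is a hypothetical sketch (it assumes plain-string content, not content-block lists):

```python
# Sketch: collapse consecutive same-role messages into single turns so the
# alternation rule is always satisfied before the request is sent.
def merge_turns(messages: list) -> list:
    """Merge consecutive messages with the same role into one turn each."""
    merged = []
    for msg in messages:
        if merged and merged[-1]["role"] == msg["role"]:
            merged[-1]["content"] += "\n\n" + msg["content"]
        else:
            merged.append({"role": msg["role"], "content": msg["content"]})
    return merged

messages = merge_turns([
    {"role": "user", "content": "Hello"},
    {"role": "user", "content": "How are you?"},
])
# -> [{"role": "user", "content": "Hello\n\nHow are you?"}]
```

Run the same normalized list through both `count_tokens` and `create` so the counted request matches the sent one.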
anthropic.RateLimitError: 429 - rate_limit_error
Token counting has its own rate limits separate from message creation. Limits depend on your usage tier:
| Tier | Requests per minute |
|---|---|
| 1 | 100 |
| 2 | 2,000 |
| 3 | 4,000 |
| 4 | 8,000 |
Add retry logic with exponential backoff:
```python
import time
from anthropic import Anthropic, RateLimitError

client = Anthropic()

def count_with_retry(max_retries: int = 3, **kwargs) -> int:
    for attempt in range(max_retries):
        try:
            response = client.messages.count_tokens(**kwargs)
            return response.input_tokens
        except RateLimitError:
            wait = 2 ** attempt
            print(f"Rate limited. Retrying in {wait}s...")
            time.sleep(wait)
    raise RuntimeError("Max retries exceeded for token counting")

tokens = count_with_retry(
    model="claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Hello"}],
)
print(f"Tokens: {tokens}")
```
anthropic.BadRequestError: 400 - model: invalid model
You passed a model name that doesn’t exist. Double-check the model ID:
```python
# Wrong
model = "claude-3-sonnet"

# Right -- use full model IDs
model = "claude-sonnet-4-20250514"   # Sonnet 4
model = "claude-3-5-haiku-20241022"  # Haiku 3.5
model = "claude-opus-4-20250514"     # Opus 4
```
All active models support token counting. Use the exact model ID string from the Anthropic docs.