Install the SDK

pip install anthropic

Set your API key as an environment variable:

export ANTHROPIC_API_KEY="sk-ant-..."

The SDK picks up ANTHROPIC_API_KEY automatically — no need to pass it in code.

Your First API Call

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-5-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

print(message.content[0].text)
# "The capital of France is Paris."

The messages.create() method is the core of everything. Every interaction with Claude goes through this endpoint.

System Prompts

System prompts set Claude’s behavior before the conversation starts. Use them for role definition, output formatting, and constraints.

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-5-20250514",
    max_tokens=1024,
    system="You are a senior Python developer. Respond with code only, no explanations. Use type hints and docstrings.",
    messages=[
        {"role": "user", "content": "Write a function to merge two sorted lists"}
    ]
)

print(message.content[0].text)

The system parameter is separate from messages — it’s a top-level argument, not a message with role: "system".

Streaming Responses

For real-time applications, stream tokens as they’re generated instead of waiting for the full response.

import anthropic

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-sonnet-4-5-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain how hash tables work"}
    ]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

print()  # Newline at the end

The stream() context manager gives you a text_stream iterator. Each iteration yields a chunk of text as soon as it’s generated. The flush=True ensures characters appear immediately in the terminal.

Multi-Turn Conversations

Pass the full conversation history in the messages array. Claude doesn’t remember previous calls — you manage the context.

import anthropic

client = anthropic.Anthropic()

conversation = []

def chat(user_message: str) -> str:
    """Send a message and get a response, maintaining conversation history."""
    conversation.append({"role": "user", "content": user_message})

    response = client.messages.create(
        model="claude-sonnet-4-5-20250514",
        max_tokens=1024,
        system="You are a helpful coding tutor. Keep answers concise.",
        messages=conversation,
    )

    assistant_message = response.content[0].text
    conversation.append({"role": "assistant", "content": assistant_message})

    return assistant_message


# Multi-turn conversation
print(chat("What's a decorator in Python?"))
print(chat("Show me an example of a timing decorator"))
print(chat("How do I make it work with functions that have arguments?"))

Each call includes the full history, so Claude has context from earlier messages.
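Because the full history is re-sent (and billed) on every call, long conversations grow expensive and eventually hit the context window. One common fix is to trim old turns before each request. A minimal sketch (trim_history and max_turns are illustrative names, not SDK features):

```python
def trim_history(conversation: list[dict], max_turns: int = 10) -> list[dict]:
    """Keep only the most recent user/assistant turns.

    Drops the oldest messages first, and makes sure the trimmed
    history still starts with a "user" message so roles alternate
    correctly.
    """
    trimmed = conversation[-max_turns * 2:]
    while trimmed and trimmed[0]["role"] != "user":
        trimmed = trimmed[1:]
    return trimmed


# Example: three full turns, keep the last two
history = [
    {"role": "user", "content": "q1"}, {"role": "assistant", "content": "a1"},
    {"role": "user", "content": "q2"}, {"role": "assistant", "content": "a2"},
    {"role": "user", "content": "q3"}, {"role": "assistant", "content": "a3"},
]
print(trim_history(history, max_turns=2))
```

In the chat() function above, you would pass trim_history(conversation) as the messages argument instead of the raw list. Smarter strategies (summarizing old turns instead of dropping them) build on the same idea.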

Error Handling

The SDK raises typed exceptions you can catch individually.

import anthropic

client = anthropic.Anthropic()

try:
    message = client.messages.create(
        model="claude-sonnet-4-5-20250514",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello"}],
    )
    print(message.content[0].text)

except anthropic.AuthenticationError:
    print("Invalid API key. Check ANTHROPIC_API_KEY.")

except anthropic.RateLimitError:
    print("Rate limited. Wait and retry.")

except anthropic.APIStatusError as e:
    print(f"API error {e.status_code}: {e.message}")

except anthropic.APIConnectionError:
    print("Could not connect to the API. Check your network.")

For production, use the built-in retry logic:

# The SDK automatically retries transient errors (connection failures
# and 408/409/429/5xx responses) with exponential backoff.
# The default is 2 retries; raise it if needed:
client = anthropic.Anthropic(max_retries=5)
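If you need behavior the client's max_retries setting doesn't give you (custom logging, jitter, per-call limits), you can wrap calls yourself. This is a generic sketch, not part of the SDK; with_backoff is an illustrative name:

```python
import random
import time


def with_backoff(fn, retries: int = 5, base_delay: float = 1.0,
                 retryable: tuple = (Exception,)):
    """Call fn(), retrying on the given exception types.

    Waits base_delay * 2**attempt seconds between attempts, plus up to
    100% jitter. Re-raises the exception after the final attempt.
    """
    for attempt in range(retries):
        try:
            return fn()
        except retryable:
            if attempt == retries - 1:
                raise
            delay = base_delay * (2 ** attempt)
            time.sleep(delay + random.random() * delay)


# Usage with the SDK's typed exception:
# message = with_backoff(
#     lambda: client.messages.create(...),
#     retryable=(anthropic.RateLimitError,),
# )
```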

Working with Images

Claude can analyze images passed as base64 or URLs.

import anthropic
import base64

client = anthropic.Anthropic()

# Read image and encode to base64
with open("screenshot.png", "rb") as f:
    image_data = base64.standard_b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-sonnet-4-5-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data,
                    },
                },
                {
                    "type": "text",
                    "text": "Describe what you see in this image.",
                },
            ],
        }
    ],
)

print(message.content[0].text)
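For images already hosted somewhere, the API also accepts a url source type, which skips the download-and-encode step. A sketch of the content payload (the URL is a placeholder):

```python
def url_image_block(url: str) -> dict:
    """Build an image content block that references a hosted image by URL."""
    return {
        "type": "image",
        "source": {"type": "url", "url": url},
    }


content = [
    url_image_block("https://example.com/screenshot.png"),
    {"type": "text", "text": "Describe what you see in this image."},
]
# Pass `content` as the user message's content, exactly as in the
# base64 example above.
```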

Response Metadata

Every response includes useful metadata beyond the text.

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-5-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}],
)

print(f"Model: {message.model}")
print(f"Stop reason: {message.stop_reason}")
print(f"Input tokens: {message.usage.input_tokens}")
print(f"Output tokens: {message.usage.output_tokens}")
print(f"Response: {message.content[0].text}")

Track usage.input_tokens and usage.output_tokens to monitor costs. Claude Sonnet is $3 per million input tokens and $15 per million output tokens.
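At those rates, a usage object converts directly into dollars. A small helper (rates are hardcoded here; check current pricing before relying on them):

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_per_mtok: float = 3.00,
                  output_per_mtok: float = 15.00) -> float:
    """Estimated USD cost of one request at Sonnet's per-million-token rates."""
    return (input_tokens * input_per_mtok
            + output_tokens * output_per_mtok) / 1_000_000


# e.g. 2,000 input tokens and 500 output tokens:
print(f"${estimate_cost(2000, 500):.4f}")  # $0.0135
```

In practice you would pass message.usage.input_tokens and message.usage.output_tokens from a real response.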

Common Issues

AuthenticationError on first call. Make sure ANTHROPIC_API_KEY is exported in your shell, not just set in a .env file. The SDK reads the environment variable directly.
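If your key lives in a .env file, load it into the process environment before constructing the client. The python-dotenv package does this, or you can use a minimal stdlib version like the sketch below (load_env_file is an illustrative helper, not an SDK function); alternatively, pass api_key=... to anthropic.Anthropic() directly.

```python
import os


def load_env_file(path: str = ".env") -> None:
    """Minimal .env loader: parse KEY=VALUE lines into os.environ.

    Skips blanks and comments, strips surrounding quotes, and never
    overwrites variables that are already set.
    """
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip().strip('"').strip("'"))


# load_env_file()                 # run this before anthropic.Anthropic()
# client = anthropic.Anthropic()  # now sees ANTHROPIC_API_KEY
```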

Responses cut off mid-sentence. Increase max_tokens. It's a required parameter with no default, and Claude stops generating the moment it hits the limit, so set it high enough for the longest answer you expect.
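You can detect truncation programmatically: a response that hit the limit has stop_reason set to "max_tokens" instead of the usual "end_turn". A tiny check, demonstrated on a stand-in object rather than a live response:

```python
from types import SimpleNamespace


def was_truncated(message) -> bool:
    """True if the response stopped because it hit max_tokens."""
    return message.stop_reason == "max_tokens"


# Stand-ins for real Message objects:
complete = SimpleNamespace(stop_reason="end_turn")
cut_off = SimpleNamespace(stop_reason="max_tokens")

print(was_truncated(complete))  # False
print(was_truncated(cut_off))   # True
```

When was_truncated() is true, retry with a larger max_tokens or ask Claude to continue in a follow-up turn.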

Slow responses. Use streaming for better perceived latency. The total generation time is the same, but the user sees tokens immediately instead of waiting for the full response.