How to Use the Anthropic Claude Vision API for Image Understanding

Claude can see images. Send it a screenshot, a photo of a whiteboard, or an architecture diagram and it will describe what it sees, extract text, or answer specific questions about the visual content. The API accepts images as base64-encoded data or URLs, and you can send multiple images in a single request.

Install the SDK and set your API key:

pip install anthropic
export ANTHROPIC_API_KEY="sk-ant-..."

Here is a minimal example that sends an image URL and asks Claude to describe it:

from anthropic import Anthropic

client = Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "url",
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/3/3a/Cat03.jpg/1200px-Cat03.jpg",
                    },
                },
                {
                    "type": "text",
                    "text": "What is in this image? Be specific.",
                },
            ],
        }
    ],
)

print(message.content[0].text)

That is the entire flow. The rest of this guide covers base64 encoding, multi-image requests, structured extraction, and the errors you will actually hit in production.

Sending Images as Base64

When you have a local file – a screenshot, a scanned document, a photo from your phone – you need to encode it as base64 before sending it to the API. This is the most common pattern for server-side applications.

import base64
from pathlib import Path
from anthropic import Anthropic

client = Anthropic()

image_path = Path("receipt.png")
image_data = base64.standard_b64encode(image_path.read_bytes()).decode("utf-8")

# Determine media type from file extension
suffix_to_media_type = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".gif": "image/gif",
    ".webp": "image/webp",
}
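media_type = suffix_to_media_type.get(image_path.suffix.lower())
if media_type is None:
    # Fail loudly instead of mislabeling an unsupported format as PNG
    raise ValueError(f"Unsupported image extension: {image_path.suffix}")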

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": media_type,
                        "data": image_data,
                    },
                },
                {
                    "type": "text",
                    "text": "Extract all text visible in this image.",
                },
            ],
        }
    ],
)

print(message.content[0].text)

A few things to note. The media_type must match the actual image format – Claude supports image/png, image/jpeg, image/gif, and image/webp. The base64 string should not include the data:image/png;base64, prefix you see in data URIs. Just the raw base64 payload.

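Image size matters. The API enforces a per-image size cap; Anthropic's documentation currently lists 5 MB per image for API requests, so check the limit for your platform before relying on a specific number. Base64 encoding inflates the payload by about 33%, so large files also make requests slower. If you are sending high-resolution photos, resize them first. Claude rescales images internally anyway: anything over roughly 1568 pixels on the long side gets downscaled, so the extra pixels cost upload time without improving results.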
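
To make the resizing advice concrete, here is a minimal sketch of the arithmetic involved. The helper names (fit_within, base64_length) are hypothetical, and the 1568-pixel default simply mirrors the rough figure above rather than a guaranteed constant. It only computes target dimensions; the actual resize is left to whatever imaging library you already use.

```python
from __future__ import annotations


def fit_within(width: int, height: int, max_side: int = 1568) -> tuple[int, int]:
    """Return (width, height) scaled down so the longer side is at most max_side.

    Images already within the limit are returned unchanged. The aspect ratio
    is preserved, rounded to whole pixels and never below 1.
    """
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def base64_length(num_bytes: int) -> int:
    """Length in characters of the padded base64 encoding of num_bytes bytes."""
    # Every 3 input bytes become 4 output characters; the last group is padded.
    return 4 * ((num_bytes + 2) // 3)
```

For example, fit_within(4032, 3024) gives the dimensions to resize a typical phone photo to, and base64_length(path.stat().st_size) tells you how large the encoded payload will be before you build the request.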

Sending Images by URL

If your image is already hosted somewhere publicly accessible, skip the base64 encoding and pass a URL directly. This keeps your request payload small.

from anthropic import Anthropic

client = Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "url",
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/4/47/PNG_transparency_demonstration_1.png/300px-PNG_transparency_demonstration_1.png",
                    },
                },
                {
                    "type": "text",
                    "text": "Describe the transparency effects in this image.",
                },
            ],
        }
    ],
)

print(message.content[0].text)

The URL must be publicly reachable. Anthropic’s servers fetch the image at request time, so URLs that require authentication, signed S3 links that have expired, and localhost addresses will all fail. If you need to send private images, use the base64 approach.
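
If you accept image URLs from users or config files, a quick pre-check can catch the obvious failures before you spend an API call. The sketch below is a heuristic only, and looks_publicly_fetchable is a hypothetical helper name. It cannot detect expired signatures or authentication requirements, and it does not resolve DNS names.

```python
import ipaddress
from urllib.parse import urlparse


def looks_publicly_fetchable(url: str) -> bool:
    """Heuristic: True if the URL is http(s) and not obviously local or private."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    host = parsed.hostname
    if host == "localhost" or host.endswith(".local"):
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal, so it is a DNS name; we cannot tell more
        # without resolving it, and we assume it is public.
        return True
    return not (ip.is_private or ip.is_loopback or ip.is_link_local)
```

When the check fails, fall back to downloading the image yourself and sending it as base64.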

Multi-Image Analysis

You can send multiple images in a single message. This is useful for comparing two versions of a UI, analyzing a set of charts, or processing a batch of photos.

import base64
from pathlib import Path
from anthropic import Anthropic

client = Anthropic()

def encode_image(file_path: str) -> dict:
    """Encode a local image file into a Claude API image content block."""
    path = Path(file_path)
    data = base64.standard_b64encode(path.read_bytes()).decode("utf-8")
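    suffix_map = {
        ".png": "image/png",
        ".jpg": "image/jpeg",
        ".jpeg": "image/jpeg",
        ".gif": "image/gif",
        ".webp": "image/webp",
    }
    media_type = suffix_map.get(path.suffix.lower())
    if media_type is None:
        # Fail loudly instead of mislabeling an unsupported format as PNG
        raise ValueError(f"Unsupported image extension: {path.suffix}")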
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": data,
        },
    }

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    messages=[
        {
            "role": "user",
            "content": [
                encode_image("dashboard_v1.png"),
                encode_image("dashboard_v2.png"),
                {
                    "type": "text",
                    "text": "Compare these two dashboard screenshots. List every visual difference you can find, including layout changes, color changes, and any added or removed elements.",
                },
            ],
        }
    ],
)

print(message.content[0].text)

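The API accepts many images in one request. Anthropic's documentation currently lists up to 100 per API request, while the 20-image cap applies to the claude.ai app, so confirm the limit for your platform. Each image consumes input tokens: the documented approximation is width × height / 750, so a 1000x1000 image costs roughly 1,300 tokens. Those tokens count against the context window and your input cost. Note that max_tokens only caps the length of Claude's reply, so it does not need to grow with the number of images.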
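
For rough budgeting, the per-image cost can be estimated from pixel dimensions. This sketch assumes the approximation given in Anthropic's vision documentation (tokens ≈ width × height / 750) and ignores any internal downscaling of large images, so treat the output as a ballpark figure. The function names are hypothetical.

```python
def estimate_image_tokens(width: int, height: int) -> int:
    """Approximate input tokens for one image (assumed formula: w * h / 750)."""
    return round(width * height / 750)


def estimate_request_image_tokens(sizes: list) -> int:
    """Sum the estimate over a list of (width, height) pairs."""
    return sum(estimate_image_tokens(w, h) for w, h in sizes)
```

A 1092x1092 image comes out at about 1,590 tokens under this formula. Multiply the total by your model's input price per token to get a cost estimate.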

You can mix URL-based and base64-based images in the same request. Just include both source types in the content array.
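
One way to mix the two source types is a small builder that picks the block shape based on whether each entry is a URL or a local path. This is a sketch under stated assumptions: image_block and build_content are hypothetical helpers, and the media type is guessed from the file extension only.

```python
import base64
from pathlib import Path

MEDIA_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def image_block(source: str) -> dict:
    """Build an image content block from an http(s) URL or a local file path."""
    if source.startswith(("http://", "https://")):
        return {"type": "image", "source": {"type": "url", "url": source}}
    path = Path(source)
    media_type = MEDIA_TYPES.get(path.suffix.lower())
    if media_type is None:
        raise ValueError(f"Unsupported image extension: {path.suffix}")
    data = base64.standard_b64encode(path.read_bytes()).decode("utf-8")
    return {
        "type": "image",
        "source": {"type": "base64", "media_type": media_type, "data": data},
    }


def build_content(sources: list, prompt: str) -> list:
    """Return a content array: one image block per source, then the text prompt."""
    return [image_block(s) for s in sources] + [{"type": "text", "text": prompt}]
```

The returned list goes straight into the "content" field of a user message, in the same position as the hand-written arrays in the earlier examples.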

Structured Data Extraction from Images

The real power of vision APIs shows up when you extract structured data from images. Receipts, invoices, tables in screenshots, form fields – Claude can pull out the data and return it as JSON.

The trick is to ask for JSON explicitly in your prompt and then parse the response.

import base64
import json
from pathlib import Path

from anthropic import Anthropic

client = Anthropic()

image_data = base64.standard_b64encode(Path("receipt.jpg").read_bytes()).decode("utf-8")

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/jpeg",
                        "data": image_data,
                    },
                },
                {
                    "type": "text",
                    "text": """Extract all data from this receipt as JSON. Use this exact schema:

{
  "store_name": "string",
  "date": "YYYY-MM-DD",
  "items": [
    {"name": "string", "quantity": number, "price": number}
  ],
  "subtotal": number,
  "tax": number,
  "total": number
}

Return ONLY the JSON object, no other text.""",
                },
            ],
        }
    ],
)

# Trim surrounding whitespace so the fence check below is reliable
raw_response = message.content[0].text.strip()

# Strip markdown code fences if Claude wraps the JSON
if raw_response.startswith("```"):
    raw_response = raw_response.split("\n", 1)[1].rsplit("```", 1)[0].strip()

receipt_data = json.loads(raw_response)

print(f"Store: {receipt_data['store_name']}")
print(f"Date: {receipt_data['date']}")
print(f"Total: ${receipt_data['total']:.2f}")
print("Items:")
for item in receipt_data["items"]:
    print(f"  - {item['name']}: {item['quantity']}x ${item['price']:.2f}")

A few tips for reliable structured extraction. First, always provide the exact JSON schema you want – Claude follows schemas well when you spell them out. Second, tell it to return only JSON with no surrounding text. Third, handle the case where Claude wraps the output in markdown code fences anyway, as shown above. Fourth, for critical applications, validate the parsed JSON against your schema with something like Pydantic.
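
The third and fourth tips can be combined into one parsing step. The sketch below uses only the standard library as a stand-in for Pydantic. The names strip_code_fences, parse_receipt, and REQUIRED_FIELDS are hypothetical, and the checks cover only the receipt schema from the prompt above.

```python
import json

FENCE = "`" * 3  # a markdown code fence, built this way to keep the listing readable

# Top-level fields from the receipt schema and the Python types they must have
REQUIRED_FIELDS = {
    "store_name": str,
    "date": str,
    "items": list,
    "subtotal": (int, float),
    "tax": (int, float),
    "total": (int, float),
}
ITEM_FIELDS = {"name": str, "quantity": (int, float), "price": (int, float)}


def strip_code_fences(text: str) -> str:
    """Remove a surrounding markdown code fence, if present."""
    text = text.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (it may carry a language tag such as "json")
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit(FENCE, 1)[0]
    return text.strip()


def _check(obj: dict, fields: dict, where: str) -> None:
    for name, expected in fields.items():
        if name not in obj:
            raise ValueError(f"{where}: missing field {name!r}")
        value = obj[name]
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, expected):
            raise ValueError(f"{where}: wrong type for {name!r}")


def parse_receipt(raw: str) -> dict:
    """Parse a model response into a validated receipt dict or raise ValueError."""
    data = json.loads(strip_code_fences(raw))
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    _check(data, REQUIRED_FIELDS, "receipt")
    for index, item in enumerate(data["items"]):
        if not isinstance(item, dict):
            raise ValueError(f"items[{index}]: expected an object")
        _check(item, ITEM_FIELDS, f"items[{index}]")
    return data
```

Calling parse_receipt(message.content[0].text) then either returns a dict you can trust structurally or raises, which is the point at which to retry the request or flag the image for manual review. Malformed JSON also surfaces as a ValueError, because json.JSONDecodeError is a subclass of it.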

Common Errors and Fixes

Image Too Large

Error: Invalid image size: Image exceeds maximum allowed size of 20 MB

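Each image must fit under the API's per-image size limit, which is the figure reported in the error. Resize or compress before sending. For screenshots, PNG is often unnecessarily large, so convert to JPEG first (the command below uses ImageMagick):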

convert screenshot.png -quality 85 screenshot.jpg

Unsupported Media Type

Error: Invalid image media type: image/tiff. Supported types: image/jpeg, image/png, image/gif, image/webp

Claude only supports JPEG, PNG, GIF, and WebP. Convert TIFF, BMP, SVG, or other formats before sending. Also make sure the media_type field in your request matches the actual file format. If you send a JPEG but label it image/png, you will get unexpected behavior.
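
File extensions are easy to get wrong, so one option is to derive the media type from the file's leading bytes instead. The sketch below checks the well-known signatures of the four supported formats. sniff_media_type is a hypothetical helper, and it inspects only the header, not whether the rest of the file is valid.

```python
from __future__ import annotations


def sniff_media_type(data: bytes) -> str | None:
    """Return the media type implied by the file signature, or None if unknown."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "image/gif"
    # WebP is a RIFF container: "RIFF", a 4-byte size, then "WEBP"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    return None
```

A None result means the file is not one of the supported formats and needs converting before you send it.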

Rate Limiting

Error: 429 Rate limit exceeded. Please retry after 30 seconds.

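The Anthropic API enforces rate limits that depend on your usage tier. When you hit them, back off and retry. The SDK already retries 429 responses automatically (twice by default, configurable with the max_retries client option), but for bulk image processing you should also pace your requests explicitly: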

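import base64
import time
from pathlib import Path

import anthropic

client = anthropic.Anthropic()

image_paths = ["img1.png", "img2.png", "img3.png"]

for path in image_paths:
    try:
        # Read and encode the file; Path.read_bytes() closes the file for us
        image_data = base64.standard_b64encode(Path(path).read_bytes()).decode("utf-8")
        message = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            messages=[
                {
                    "role": "user",
                    "content": [
                        {
                            "type": "image",
                            "source": {
                                "type": "base64",
                                "media_type": "image/png",
                                "data": image_data,
                            },
                        },
                        {"type": "text", "text": "Describe this image briefly."},
                    ],
                }
            ],
        )
        print(f"{path}: {message.content[0].text[:100]}")
    except anthropic.RateLimitError as e:
        # The SDK's own retries are exhausted; pause longer before moving on
        # (10 seconds is an arbitrary choice for this example)
        print(f"{path}: Rate limited - {e}")
        time.sleep(10)
    except (OSError, anthropic.APIError) as e:
        # Unreadable file or any other API failure: report it and continue
        print(f"{path}: Error - {e}")
    time.sleep(1)  # Simple rate limiting: 1 second between requests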
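
A fixed one-second pause is the simplest option. If you want the "back off and retry" behavior spelled out, a minimal sketch of exponential backoff with jitter looks like the following. call_with_backoff and its parameters are hypothetical and not part of the SDK. The exception to retry on is passed in, so the helper itself has no dependency on the anthropic package.

```python
import random
import time


def call_with_backoff(fn, retry_on, max_attempts=5, base_delay=1.0,
                      max_delay=30.0, sleep=time.sleep):
    """Call fn() and retry on the given exception type(s) with growing delays.

    The delay doubles on each attempt (base_delay, 2x, 4x, ...) up to
    max_delay, plus up to 25% random jitter. The last failure is re-raised.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 4))
```

In the loop above you would wrap the request as call_with_backoff(lambda: client.messages.create(...), retry_on=anthropic.RateLimitError). Because the SDK also retries internally, consider lowering max_retries on the client so the two layers do not multiply.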

Invalid Base64 Data

Error: Could not process image: Invalid base64 encoding

This usually means you included the data:image/png;base64, prefix in the data field. Strip it. The data field should be the raw base64 string only. Also check that you are using standard_b64encode, not urlsafe_b64encode.
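
Both checks can be done locally before the request goes out. The sketch below strips a data URI prefix if one is present and then verifies that what remains is standard base64. clean_base64_payload is a hypothetical helper name. The strict validation also rejects embedded newlines, so remove line wrapping first if your source adds it.

```python
import base64
import binascii


def clean_base64_payload(value: str) -> str:
    """Return the raw base64 payload, without any data URI prefix.

    Raises ValueError if the payload is not valid standard base64
    (for example, if it was produced with urlsafe_b64encode).
    """
    value = value.strip()
    if value.startswith("data:"):
        # A data URI looks like "data:image/png;base64,<payload>"
        _header, sep, payload = value.partition(",")
        if not sep:
            raise ValueError("data URI has no comma-separated payload")
        value = payload
    try:
        # validate=True rejects characters outside the standard alphabet
        base64.b64decode(value, validate=True)
    except binascii.Error as exc:
        raise ValueError(f"not valid standard base64: {exc}") from exc
    return value
```

Pass the return value as the data field of the image source.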
