Claude can now point to exact sentences when answering questions about your documents. The Anthropic Citations API bakes source attribution directly into the response format, so you get verifiable quotes instead of vague summaries. No prompt hacking required.

You pass documents into the API, ask a question, and the response comes back with citations arrays that reference specific character ranges or page numbers in your source material. The cited text doesn’t count toward output tokens, which makes this cheaper than asking Claude to quote sources in its own words.

Basic Citation Usage

Install the SDK and set your API key:

pip install anthropic
export ANTHROPIC_API_KEY="your-key-here"

Here’s the simplest possible citation request. You pass a plain text document with citations.enabled set to True, then ask a question about it:

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "text",
                        "media_type": "text/plain",
                        "data": (
                            "Transformers were introduced in the 2017 paper 'Attention Is All You Need' "
                            "by Vaswani et al. The architecture replaced recurrent layers entirely with "
                            "self-attention mechanisms. This reduced training time significantly compared "
                            "to LSTM-based models. The original Transformer used 6 encoder and 6 decoder "
                            "layers with 512-dimensional embeddings."
                        ),
                    },
                    "title": "Transformer Architecture Overview",
                    "citations": {"enabled": True},
                },
                {
                    "type": "text",
                    "text": "What architecture did Transformers replace, and what were the key dimensions?",
                },
            ],
        }
    ],
)

for block in response.content:
    if hasattr(block, "text"):
        print(block.text)
    if hasattr(block, "citations") and block.citations:
        for cite in block.citations:
            print(f"  [Source: {cite.document_title}, chars {cite.start_char_index}-{cite.end_char_index}]")
            print(f"  Cited: \"{cite.cited_text}\"")

The response comes back as a list of content blocks. Some blocks are plain text, others carry a citations array. Each citation includes the cited_text (the exact quote from your document), the document_index (which document it came from), and character position indices.

Parsing the Response Structure

The response format deserves a closer look because it’s different from standard Messages API responses. Instead of one big text string, you get interleaved blocks:

import anthropic
import json

client = anthropic.Anthropic()

source_text = (
    "Python 3.12 introduced several performance improvements. "
    "The new interpreter uses a more efficient dispatch mechanism. "
    "Comprehension scoping was also fixed, closing a long-standing bug. "
    "Type parameter syntax with PEP 695 simplified generic class definitions."
)

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "text",
                        "media_type": "text/plain",
                        "data": source_text,
                    },
                    "title": "Python 3.12 Release Notes",
                    "citations": {"enabled": True},
                },
                {
                    "type": "text",
                    "text": "What changed in the Python 3.12 interpreter?",
                },
            ],
        }
    ],
)

# Build a plain text answer with footnotes
full_text = ""
footnotes = []

for block in response.content:
    if block.type == "text":
        full_text += block.text
        if hasattr(block, "citations") and block.citations:
            for cite in block.citations:
                footnote_num = len(footnotes) + 1
                full_text += f"[{footnote_num}]"
                footnotes.append(cite.cited_text)

print(full_text)
print("\n--- Sources ---")
for i, note in enumerate(footnotes, 1):
    print(f"[{i}] \"{note}\"")

This pattern is useful when you need to render citations as footnotes in a web UI or a report.
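If the target is HTML rather than plain text, the same footnote data renders directly. A minimal sketch, where render_footnotes_html is a hypothetical helper (not part of the SDK) that takes the full_text and footnotes built above:

```python
import html


def render_footnotes_html(answer_text: str, footnotes: list[str]) -> str:
    """Render an answer plus its footnotes as a minimal HTML fragment."""
    items = "".join(f"<li>{html.escape(note)}</li>" for note in footnotes)
    return f"<p>{html.escape(answer_text)}</p><ol>{items}</ol>"
```

Escaping the cited text matters here: quotes pulled from arbitrary documents can contain angle brackets or ampersands.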

Multiple Documents

You can pass several documents in a single request. Each citation’s document_index tells you which document it came from (0-indexed, in the order you provided them):

import anthropic

client = anthropic.Anthropic()

docs = [
    {
        "type": "document",
        "source": {
            "type": "text",
            "media_type": "text/plain",
            "data": (
                "GPT-4 was released by OpenAI in March 2023. "
                "It supports multimodal input including images. "
                "The model achieves human-level performance on many professional benchmarks."
            ),
        },
        "title": "GPT-4 Overview",
        "citations": {"enabled": True},
    },
    {
        "type": "document",
        "source": {
            "type": "text",
            "media_type": "text/plain",
            "data": (
                "Claude 3.5 Sonnet was released by Anthropic in June 2024. "
                "It outperformed GPT-4o on several coding and reasoning benchmarks. "
                "The model costs $3 per million input tokens and $15 per million output tokens."
            ),
        },
        "title": "Claude 3.5 Sonnet Overview",
        "citations": {"enabled": True},
    },
]

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                *docs,
                {
                    "type": "text",
                    "text": "Compare the release dates and capabilities of these models.",
                },
            ],
        }
    ],
)

for block in response.content:
    if block.type == "text":
        print(block.text, end="")
        if hasattr(block, "citations") and block.citations:
            sources = [f"[{c.document_title}]" for c in block.citations]
            print(f" {', '.join(sources)}", end="")
print()

This is the foundation of multi-source Q&A. You could feed in product docs, research papers, and internal wikis, then let citations track which answer came from where.
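To report which sources actually contributed to an answer, you can bucket citations by document_index once you have flattened them into dicts. A sketch, assuming each dict carries the document_index and cited_text fields described above (group_citations_by_document is a hypothetical helper):

```python
from collections import defaultdict


def group_citations_by_document(citations: list[dict]) -> dict[int, list[str]]:
    """Bucket cited_text snippets by the 0-indexed document they came from."""
    grouped: dict[int, list[str]] = defaultdict(list)
    for cite in citations:
        grouped[cite["document_index"]].append(cite["cited_text"])
    return dict(grouped)
```

A document index that never appears in the output is a useful signal too: that source contributed nothing to the answer.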

PDF Document Citations

PDFs work the same way, except you base64-encode the file and get page numbers instead of character indices:

import anthropic
import base64

client = anthropic.Anthropic()

with open("research_paper.pdf", "rb") as f:
    pdf_data = base64.standard_b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=2048,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": "application/pdf",
                        "data": pdf_data,
                    },
                    "title": "Research Paper",
                    "citations": {"enabled": True},
                },
                {
                    "type": "text",
                    "text": "What methodology did the authors use, and what were the key results?",
                },
            ],
        }
    ],
)

for block in response.content:
    if block.type == "text":
        print(block.text, end="")
        if hasattr(block, "citations") and block.citations:
            for cite in block.citations:
                # PDF citations use page_location type with page numbers
                print(f"\n  >> Page {cite.start_page_number}: \"{cite.cited_text[:80]}...\"")
print()

PDF citations return page_location objects with start_page_number (1-indexed) and end_page_number (exclusive). Scanned PDFs without extractable text won’t produce citations, so make sure your PDFs have a text layer.
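Because the end page is exclusive, a citation that spans only page 3 arrives as start 3, end 4. A hypothetical formatter that turns those pairs into human-readable labels:

```python
def format_page_range(start_page: int, end_page: int) -> str:
    """Format a 1-indexed start page and exclusive end page for display."""
    if end_page - start_page <= 1:
        return f"p. {start_page}"
    return f"pp. {start_page}-{end_page - 1}"
```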

Custom Content Documents for Fine-Grained Control

When automatic sentence chunking doesn’t match your data shape (bullet lists, dialogue transcripts, table rows), use custom content documents. You define the chunks yourself, and Claude cites by block index:

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "content",
                        "content": [
                            {"type": "text", "text": "Q1 2025 revenue: $4.2B (+18% YoY)"},
                            {"type": "text", "text": "Q1 2025 operating margin: 22.5%"},
                            {"type": "text", "text": "Active users: 180M monthly"},
                            {"type": "text", "text": "Guidance: $4.5-4.7B for Q2 2025"},
                        ],
                    },
                    "title": "Earnings Summary",
                    "citations": {"enabled": True},
                },
                {
                    "type": "text",
                    "text": "What was the revenue growth and user count?",
                },
            ],
        }
    ],
)

for block in response.content:
    if block.type == "text":
        print(block.text, end="")
        if hasattr(block, "citations") and block.citations:
            for cite in block.citations:
                # Custom content citations use content_block_location
                print(f" [block {cite.start_block_index}]", end="")
print()

Each item in the content array becomes one citable unit. No further chunking happens. This gives you exact control over citation granularity.
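Building those blocks by hand gets tedious for long transcripts or table exports. A small hypothetical helper can wrap each row as one citable unit and return a ready-to-send document block:

```python
def rows_to_document(rows: list[str], title: str) -> dict:
    """Wrap each row string as one citable content block inside a document."""
    return {
        "type": "document",
        "source": {
            "type": "content",
            "content": [{"type": "text", "text": row} for row in rows],
        },
        "title": title,
        "citations": {"enabled": True},
    }
```

Usage: pass rows_to_document(["row 1", "row 2"], "CSV Export") directly into the message's content list, alongside your question block.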

Building a Cited Q&A System

Here’s a practical function that takes a list of text sources and a question, then returns a structured answer with citations:

import anthropic
from dataclasses import dataclass


@dataclass
class CitedAnswer:
    text: str
    citations: list[dict]


def ask_with_citations(
    question: str,
    sources: list[dict],
    model: str = "claude-sonnet-4-20250514",
) -> CitedAnswer:
    """Ask a question across multiple sources and get a cited answer.

    Args:
        question: The question to ask.
        sources: List of dicts with 'title' and 'content' keys.
        model: Model ID to use.

    Returns:
        CitedAnswer with full text and list of citation dicts.
    """
    client = anthropic.Anthropic()

    doc_blocks = [
        {
            "type": "document",
            "source": {
                "type": "text",
                "media_type": "text/plain",
                "data": src["content"],
            },
            "title": src["title"],
            "citations": {"enabled": True},
        }
        for src in sources
    ]

    response = client.messages.create(
        model=model,
        max_tokens=2048,
        messages=[
            {
                "role": "user",
                "content": [
                    *doc_blocks,
                    {"type": "text", "text": question},
                ],
            }
        ],
    )

    full_text = ""
    all_citations = []

    for block in response.content:
        if block.type == "text":
            full_text += block.text
            if hasattr(block, "citations") and block.citations:
                for cite in block.citations:
                    all_citations.append({
                        "cited_text": cite.cited_text,
                        "document_title": cite.document_title,
                        "document_index": cite.document_index,
                    })

    return CitedAnswer(text=full_text, citations=all_citations)


# Usage
answer = ask_with_citations(
    question="What are the key differences between these frameworks?",
    sources=[
        {
            "title": "FastAPI Docs",
            "content": (
                "FastAPI is a modern Python web framework built on Starlette and Pydantic. "
                "It generates OpenAPI docs automatically and supports async request handlers. "
                "Type hints drive both validation and documentation."
            ),
        },
        {
            "title": "Flask Docs",
            "content": (
                "Flask is a lightweight WSGI framework with a simple core. "
                "Extensions handle features like database access and authentication. "
                "It uses decorators for URL routing and supports Jinja2 templates natively."
            ),
        },
    ],
)

print(answer.text)
print(f"\n{len(answer.citations)} citations found:")
for c in answer.citations:
    print(f"  - [{c['document_title']}]: \"{c['cited_text'][:60]}...\"")

You can plug this into a RAG pipeline. Retrieve chunks from your vector store, pass them as sources, and let the citations API handle attribution. No need to prompt-engineer Claude into quoting things.
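Wiring this into retrieval is mostly a shape conversion. Assuming your retriever returns chunk dicts with a "text" field and an optional "source" label (both hypothetical names, depending on your vector store), they map straight onto the sources parameter of ask_with_citations:

```python
def chunks_to_sources(chunks: list[dict]) -> list[dict]:
    """Convert retrieved chunks into the sources shape ask_with_citations expects."""
    return [
        {"title": chunk.get("source", f"chunk-{i}"), "content": chunk["text"]}
        for i, chunk in enumerate(chunks)
    ]
```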

Using Prompt Caching with Citations

For repeated queries against the same documents, combine citations with prompt caching to avoid re-processing the source material every time:

import anthropic

client = anthropic.Anthropic()

# This document content will be cached after the first request
large_doc = "Your very long document content here... " * 500

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "text",
                        "media_type": "text/plain",
                        "data": large_doc,
                    },
                    "citations": {"enabled": True},
                    "cache_control": {"type": "ephemeral"},
                },
                {
                    "type": "text",
                    "text": "Summarize the key points with sources.",
                },
            ],
        }
    ],
)

The cache_control field goes on the document block itself. Subsequent requests with the same document hit the cache, which cuts input token costs. The citations still work normally.
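To confirm the cache is actually hitting, check the usage object on the response; the cache-related token counts are reported as cache_creation_input_tokens and cache_read_input_tokens. A hypothetical helper that reads them defensively, defaulting to 0 when a field is absent:

```python
def cache_stats(response) -> dict[str, int]:
    """Pull cache token counts from a Messages API response, defaulting to 0."""
    usage = response.usage
    return {
        "cache_write": getattr(usage, "cache_creation_input_tokens", 0) or 0,
        "cache_read": getattr(usage, "cache_read_input_tokens", 0) or 0,
    }
```

On the first request you should see a nonzero cache_write; on repeats within the cache lifetime, a nonzero cache_read.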

Common Errors and Fixes

citations must be enabled on all or none of the documents

If you enable citations on one document, every document in the request must have citations.enabled set to True. You can’t mix cited and uncited documents in the same call.
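A cheap pre-flight check catches this before the request goes out. This validator is a hypothetical helper operating on the raw content list you are about to send:

```python
def validate_citation_flags(content_blocks: list[dict]) -> None:
    """Raise if document blocks mix citations-enabled and citations-disabled."""
    docs = [b for b in content_blocks if b.get("type") == "document"]
    flags = [bool(b.get("citations", {}).get("enabled")) for b in docs]
    if any(flags) and not all(flags):
        raise ValueError("Enable citations on all documents or on none.")
```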

400 error when using citations with structured outputs

Citations and structured outputs (the output_config.format parameter) are incompatible. Citations produce interleaved text and citation blocks, which can’t conform to a strict JSON schema. Pick one or the other. If you need structured output with citations, parse the citation response yourself into your desired format.

AttributeError: 'TextBlock' object has no attribute 'citations'

Not every text block in the response carries citations. Some blocks are just connective text like “According to the document,” with no source reference. Always check hasattr(block, "citations") and block.citations before accessing the list.
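The getattr builtin collapses that two-step check into one expression. A sketch that works on any block-like object, SDK type or not:

```python
def iter_citations(blocks):
    """Yield every citation across blocks, skipping blocks without any."""
    for block in blocks:
        # getattr handles a missing attribute; `or []` handles citations=None
        for cite in getattr(block, "citations", None) or []:
            yield cite
```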

Empty citations on PDF documents

Scanned PDFs without an embedded text layer won’t produce citations. The API extracts text from the PDF and chunks it. If there’s no extractable text, there’s nothing to cite. Run your PDFs through OCR first if they’re image-based scans.

cited_text not matching your original document exactly

Character indices are 0-indexed with exclusive end indices. If you’re trying to verify citations by slicing your original string, use source_text[start_char_index:end_char_index]. The end index is exclusive, just like Python slice notation.
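Verification then reduces to a slice comparison. A hypothetical checker over citation dicts shaped like the API's char_location fields:

```python
def verify_citation(source_text: str, cite: dict) -> bool:
    """Check that the cited span matches the source slice (exclusive end index)."""
    span = source_text[cite["start_char_index"]:cite["end_char_index"]]
    return span == cite["cited_text"]
```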

High input token count with citations enabled

Enabling citations adds a small overhead from system prompt additions and document chunking metadata. For large documents you’re querying repeatedly, use prompt caching to offset the cost. The cited_text in responses doesn’t count toward output tokens, so the output side is actually cheaper than prompt-based quoting.