The Perplexity API gives you LLM responses grounded in live web search. You send a question, and the Sonar models search the web, synthesize results, and return an answer with citations. The best part: the API is OpenAI-compatible, so you can use the openai Python SDK with a base URL swap.

Install and Configure

Install the OpenAI Python SDK if you don’t have it already:

pip install openai

Grab your API key from perplexity.ai/settings/api. Set it as an environment variable:

export PERPLEXITY_API_KEY="pplx-xxxxxxxxxxxxxxxxxxxx"

Now point the OpenAI client at Perplexity’s endpoint:

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["PERPLEXITY_API_KEY"],
    base_url="https://api.perplexity.ai",
)

That’s it. Every call through this client hits Perplexity’s servers instead of OpenAI’s.

Make a Search-Grounded Query

The Sonar models automatically search the web before generating a response. You don’t need to configure anything special – just send a message like you would with any chat completions API:

response = client.chat.completions.create(
    model="sonar",
    messages=[
        {
            "role": "system",
            "content": "Be precise and concise. Cite your sources.",
        },
        {
            "role": "user",
            "content": "What are the main differences between GPT-4o and Claude 3.5 Sonnet?",
        },
    ],
)

print(response.choices[0].message.content)

The response text will include inline citation markers like [1], [2], etc. These map to URLs in the response metadata.
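Once you have the citations list (extracting it is covered next), you can turn those numbered markers into clickable links. A minimal sketch, assuming the markers always take the form [n]:

```python
import re

def link_citations(text: str, citations: list[str]) -> str:
    """Replace inline [n] markers with markdown links to the matching source URL."""
    def repl(match: re.Match) -> str:
        idx = int(match.group(1))
        if 1 <= idx <= len(citations):
            return f"[{idx}]({citations[idx - 1]})"
        return match.group(0)  # leave markers with no matching source untouched
    return re.sub(r"\[(\d+)\]", repl, text)
```

Markers pointing past the end of the list are left as-is, which keeps the function safe if the model emits a stray reference.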

Extract Citations from Responses

Every Sonar response includes a citations field with the source URLs. The OpenAI SDK doesn't declare this field, but it preserves unknown fields on the response object. Here's how to access it:

response = client.chat.completions.create(
    model="sonar-pro",
    messages=[
        {"role": "user", "content": "What is the current state of quantum computing?"}
    ],
)

answer = response.choices[0].message.content
print(answer)

# Citations are available as a top-level field on the response
citations = getattr(response, "citations", None)
if citations:
    print("\nSources:")
    for i, url in enumerate(citations, 1):
        print(f"  [{i}] {url}")

If you need the raw JSON (including search_results with titles and snippets), use the HTTP client directly or parse the raw response:

import os
import httpx

headers = {
    "Authorization": f"Bearer {os.environ['PERPLEXITY_API_KEY']}",
    "Content-Type": "application/json",
}

payload = {
    "model": "sonar-pro",
    "messages": [
        {"role": "user", "content": "Latest breakthroughs in fusion energy"}
    ],
}

resp = httpx.post(
    "https://api.perplexity.ai/chat/completions",
    headers=headers,
    json=payload,
)
data = resp.json()

# Full citation URLs
for i, url in enumerate(data.get("citations", []), 1):
    print(f"[{i}] {url}")

# Detailed search results (title, URL, snippet)
for result in data.get("search_results", []):
    print(f"- {result['title']}: {result['url']}")

Sonar Models: Which One to Pick

Perplexity offers several Sonar model tiers:

Model                  Best For                               Input Cost     Output Cost
sonar                  Fast Q&A, lightweight search           $1/M tokens    $1/M tokens
sonar-pro              Multi-step research, complex queries   $3/M tokens    $15/M tokens
sonar-reasoning-pro    Reasoning-heavy tasks with search      $2/M tokens    $8/M tokens
sonar-deep-research    In-depth research sessions             $2/M tokens    $8/M tokens

Per-request search fees also apply on top of token costs (starting at $5 per 1,000 requests at the default context size).
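Because billing combines per-token and per-request charges, a back-of-envelope estimator helps when budgeting. A sketch using the prices above as assumed defaults (verify current pricing before relying on it):

```python
def estimate_cost(input_tokens: int, output_tokens: int, n_requests: int,
                  in_price: float = 1.0, out_price: float = 1.0,
                  search_fee_per_1k: float = 5.0) -> float:
    """Rough cost in dollars: token charges plus the per-request search fee.

    Defaults assume sonar pricing ($1/M in, $1/M out) and the $5 per 1,000
    requests search fee at default context size.
    """
    token_cost = input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price
    search_cost = n_requests / 1000 * search_fee_per_1k
    return token_cost + search_cost
```

For example, 1,000 sonar requests that consume a million tokens each way come out to roughly $7 under these assumptions.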

Use sonar for quick lookups where speed matters. Use sonar-pro when you need deeper research with more citations – it returns roughly double the number of sources compared to sonar.
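That decision can be captured as a tiny dispatch helper. This is a heuristic sketch based on the tiers above, not official guidance; the task labels are made up for illustration:

```python
def pick_model(task: str) -> str:
    """Map a rough task category to a Sonar tier (hypothetical categories)."""
    if task == "quick_lookup":
        return "sonar"                 # cheapest and fastest
    if task == "reasoning":
        return "sonar-reasoning-pro"   # search plus chain-of-thought style work
    if task == "deep_research":
        return "sonar-deep-research"   # long-running research sessions
    return "sonar-pro"                 # default: multi-step research, more citations
```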

You can also disable web search entirely by passing disable_search: true in the request body. This turns Sonar into a standard chat model without grounding:

# No web search -- pure LLM response
response = client.chat.completions.create(
    model="sonar",
    messages=[
        {"role": "user", "content": "Explain the transformer architecture"}
    ],
    extra_body={"disable_search": True},
)

Filter Search by Domain and Recency

You can restrict which sites Sonar searches and how recent the results should be:

response = client.chat.completions.create(
    model="sonar-pro",
    messages=[
        {"role": "user", "content": "Recent papers on LLM alignment techniques"}
    ],
    extra_body={
        "search_domain_filter": ["arxiv.org", "openreview.net"],
        "search_recency_filter": "month",
    },
)

print(response.choices[0].message.content)

The search_recency_filter accepts hour, day, week, month, or year. The search_domain_filter takes a list of domains to restrict results to. You can also prefix a domain with - to exclude it (e.g., ["-reddit.com"]).

For academic research specifically, set the search_mode to academic:

response = client.chat.completions.create(
    model="sonar-pro",
    messages=[
        {"role": "user", "content": "State of the art in protein structure prediction"}
    ],
    extra_body={"search_mode": "academic"},
)

Build a Research Assistant

Here’s a practical example: a research assistant that takes a topic, searches the web, and produces a structured summary with sources.

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["PERPLEXITY_API_KEY"],
    base_url="https://api.perplexity.ai",
)

def research(topic: str, domains: list[str] | None = None) -> dict:
    """Search the web and return a structured research summary."""
    system_prompt = (
        "You are a research assistant. For every query:\n"
        "1. Search for the most recent and authoritative sources.\n"
        "2. Summarize findings in 3-5 bullet points.\n"
        "3. Note any conflicting information between sources.\n"
        "4. End with a one-sentence overall assessment."
    )

    extra = {}
    if domains:
        extra["search_domain_filter"] = domains
    extra["search_recency_filter"] = "week"

    response = client.chat.completions.create(
        model="sonar-pro",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": f"Research this topic: {topic}"},
        ],
        extra_body=extra,
    )

    citations = getattr(response, "citations", []) or []

    return {
        "summary": response.choices[0].message.content,
        "citations": citations,
        "model": response.model,
        "tokens_used": response.usage.total_tokens,
    }


result = research("open-source alternatives to GPT-4 in 2026")
print(result["summary"])
print(f"\nSources ({len(result['citations'])}):")
for i, url in enumerate(result["citations"], 1):
    print(f"  [{i}] {url}")
print(f"\nTokens used: {result['tokens_used']}")

Handle Streaming Responses

For longer research queries, streaming gives your users faster time-to-first-token. The pattern is identical to OpenAI streaming:

stream = client.chat.completions.create(
    model="sonar-pro",
    messages=[
        {"role": "user", "content": "Compare the top 5 vector databases in 2026"}
    ],
    stream=True,
)

full_response = ""
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
        full_response += delta

print()  # newline after streaming completes

One caveat with streaming: citations arrive in the final chunk, not inline during streaming. If you need citations from a streamed response, you’ll need to parse the raw SSE events or make a follow-up non-streaming call.

Common Errors and Fixes

401 Unauthorized: Your API key is missing or invalid. Double-check that PERPLEXITY_API_KEY is set; valid keys start with pplx-.

429 Too Many Requests: You've hit the rate limit. The basic tier allows 50 requests/minute for sonar and sonar-pro, but only 5/minute for sonar-deep-research. Add retry logic with exponential backoff:

import time
from openai import RateLimitError

max_retries = 3
for attempt in range(max_retries):
    try:
        response = client.chat.completions.create(
            model="sonar",
            messages=[{"role": "user", "content": "your query here"}],
        )
        break
    except RateLimitError:
        if attempt == max_retries - 1:
            raise  # out of retries; surface the error instead of failing silently
        wait = 2 ** attempt
        print(f"Rate limited. Retrying in {wait}s...")
        time.sleep(wait)

Model not found: The old model names like llama-3.1-sonar-small-128k-online are deprecated. Use the current names: sonar, sonar-pro, sonar-reasoning-pro, or sonar-deep-research.

Empty citations: If citations is None or empty, you likely passed disable_search: true or the query didn't trigger a web search. Make sure you're using a search-capable Sonar model and haven't disabled search.

Timeout on deep research: sonar-deep-research can take 30+ seconds. Increase your client timeout:

client = OpenAI(
    api_key=os.environ["PERPLEXITY_API_KEY"],
    base_url="https://api.perplexity.ai",
    timeout=120.0,
)

extra_body parameters ignored: If search_domain_filter or search_recency_filter seem to have no effect, make sure you're passing them inside extra_body, not as top-level keyword arguments. The OpenAI SDK doesn't know about Perplexity-specific parameters, so they must go through extra_body.
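If your call sites build request parameters as a flat dict, a small helper can sort the Perplexity-specific keys into extra_body automatically. A sketch covering the parameters used in this post (the key set is an assumption; extend it as you adopt more parameters):

```python
# Perplexity-specific keys the OpenAI SDK won't accept as top-level kwargs
PPLX_KEYS = {"search_domain_filter", "search_recency_filter",
             "search_mode", "disable_search"}

def split_perplexity_kwargs(params: dict) -> dict:
    """Move Perplexity-specific keys under extra_body so the SDK forwards them."""
    out = {k: v for k, v in params.items() if k not in PPLX_KEYS}
    extra = {k: v for k, v in params.items() if k in PPLX_KEYS}
    if extra:
        out.setdefault("extra_body", {}).update(extra)
    return out
```

Then client.chat.completions.create(**split_perplexity_kwargs(params)) works whether or not the flat dict contained search filters.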