The Perplexity API gives you LLM responses grounded in live web search. You send a question, and the Sonar models search the web, synthesize results, and return an answer with citations. The best part: the API is OpenAI-compatible, so you can use the openai Python SDK with a base URL swap.
Install and Configure
Install the OpenAI Python SDK if you don’t have it already:
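Assuming a working Python environment with pip on your PATH:

```shell
pip install --upgrade openai
```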
Grab your API key from perplexity.ai/settings/api. Set it as an environment variable:
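A typical shell setup (the key value here is a placeholder — use your real key):

```shell
export PERPLEXITY_API_KEY="pplx-..."
```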
Now point the OpenAI client at Perplexity’s endpoint:
That’s it. Every call through this client hits Perplexity’s servers instead of OpenAI’s.
Make a Search-Grounded Query
The Sonar models automatically search the web before generating a response. You don’t need to configure anything special – just send a message like you would with any chat completions API:
The response text will include inline citation markers like [1], [2], etc. These map to URLs in the response metadata.
Extract Citations from Responses
Every Sonar response includes a citations field with the source URLs. The OpenAI SDK doesn’t model this Perplexity-specific field, but it is preserved on the response object. Here’s how to access it:
If you need the raw JSON (including search_results with titles and snippets), use the HTTP client directly or parse the raw response:
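A stdlib-only sketch of the direct-HTTP route. The `search_results` field with titles is described above; the exact keys printed here (`title`, `url`) are assumptions, so check the raw JSON for your account:

```python
import json
import os
import urllib.request


def raw_completion(question: str) -> dict:
    """POST directly to the chat completions endpoint and return parsed JSON."""
    payload = {
        "model": "sonar",
        "messages": [{"role": "user", "content": question}],
    }
    req = urllib.request.Request(
        "https://api.perplexity.ai/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['PERPLEXITY_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example:
# data = raw_completion("What is the latest LTS Linux kernel version?")
# for hit in data.get("search_results", []):
#     print(hit.get("title"), "->", hit.get("url"))
```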
Sonar Models: Which One to Pick
Perplexity offers several Sonar model tiers:
| Model | Best For | Input Cost | Output Cost |
|---|---|---|---|
| sonar | Fast Q&A, lightweight search | $1/M tokens | $1/M tokens |
| sonar-pro | Multi-step research, complex queries | $3/M tokens | $15/M tokens |
| sonar-reasoning-pro | Reasoning-heavy tasks with search | $2/M tokens | $8/M tokens |
| sonar-deep-research | In-depth research sessions | $2/M tokens | $8/M tokens |
Per-request search fees also apply on top of token costs (starting at $5 per 1,000 requests at the default context size).
Use sonar for quick lookups where speed matters. Use sonar-pro when you need deeper research with more citations – it returns roughly double the number of sources compared to sonar.
You can also disable web search entirely by passing disable_search: true in the request body. This turns Sonar into a standard chat model without grounding:
Filter Search by Domain and Recency
You can restrict which sites Sonar searches and how recent the results should be:
The search_recency_filter accepts hour, day, week, month, or year. The search_domain_filter takes a list of domains to restrict results to. You can also prefix a domain with - to exclude it (e.g., ["-reddit.com"]).
For academic research specifically, set the search_mode to academic:
Build a Research Assistant
Here’s a practical example: a research assistant that takes a topic, searches the web, and produces a structured summary with sources.
Handle Streaming Responses
For longer research queries, streaming gives your users faster time-to-first-token. The pattern is identical to OpenAI streaming:
One caveat with streaming: citations arrive in the final chunk, not inline during streaming. If you need citations from a streamed response, you’ll need to parse the raw SSE events or make a follow-up non-streaming call.
Common Errors and Fixes
401 Unauthorized
Your API key is missing or invalid. Double-check that PERPLEXITY_API_KEY is set; valid Perplexity keys start with pplx- (e.g., pplx-abc123...).
429 Too Many Requests
You’ve hit the rate limit. The basic tier allows 50 requests/minute for sonar and sonar-pro, but only 5/minute for sonar-deep-research. Add retry logic with exponential backoff:
Model not found
The old model names like llama-3.1-sonar-small-128k-online are deprecated. Use the current names: sonar, sonar-pro, sonar-reasoning-pro, or sonar-deep-research.
Empty citations
If citations is None or empty, you likely passed disable_search: true or the query didn’t trigger a web search. Make sure you’re using an online-capable model and haven’t disabled search.
Timeout on deep research
sonar-deep-research can take 30+ seconds. Increase your client timeout:
extra_body parameters ignored
If search_domain_filter or search_recency_filter seem to have no effect, make sure you’re passing them inside extra_body, not as top-level keyword arguments. The OpenAI SDK doesn’t know about Perplexity-specific parameters, so they must go through extra_body.
Related Guides
- How to Use the OpenRouter API for Multi-Provider LLM Access
- How to Use the xAI Grok API for Chat and Function Calling
- How to Use the Fireworks AI API for Fast Open-Source LLMs
- How to Use the Together AI API for Open-Source LLMs
- How to Use the AWS Bedrock Converse API for Multi-Model Chat
- How to Use the OpenAI Realtime API for Voice Applications
- How to Use the Cerebras API for Fast LLM Inference
- How to Use the Anthropic Token Counting API for Cost Estimation
- How to Use the Anthropic Claude Files API for Large Document Processing
- How to Use the Anthropic Token Streaming API for Real-Time UIs