DeepSeek gives you two strong models through a single OpenAI-compatible API: deepseek-chat (the V3 line: fast, general-purpose chat and code generation) and deepseek-reasoner (the R1 line: chain-of-thought reasoning for math, logic, and hard problems). Because the API follows the OpenAI spec, you use the standard OpenAI Python SDK and just swap the base URL. Here’s the fastest way to get started:
Sign up at platform.deepseek.com to grab your API key. New accounts get 5 million free tokens, no credit card required.
Generate Code with DeepSeek-V3
deepseek-chat is what you want for everyday code generation. It’s fast, cheap, and handles most programming tasks well. Here’s a more practical example: ask it to write a function, then ask it to write tests for that function in the same conversation.
Multi-turn conversations work exactly like OpenAI’s API. You append the assistant’s previous response, then add your follow-up. The model keeps full context of the conversation.
Solve Reasoning Problems with DeepSeek-R1
Switch to deepseek-reasoner when you need the model to think through a problem step by step. R1 generates an internal chain-of-thought before answering, which makes it much better at math, logic puzzles, and multi-step debugging.
The key difference from deepseek-chat: the response includes a reasoning_content field that shows you the model’s thinking process.
A few things to know about deepseek-reasoner:
- Max output is 64K tokens (including the chain-of-thought), compared to 8K for deepseek-chat.
- Temperature and top_p have no effect. The model controls its own sampling during reasoning.
- Multi-turn conversations drop the reasoning. When you build a conversation history, only include the content field from the assistant’s response. If you pass reasoning_content back in the messages array, the API returns a 400 error.
- No function calling or tool use. R1 is purely for text-in, text-out reasoning.
Stream Responses
Both models support streaming. For deepseek-reasoner, the stream sends reasoning chunks first, then content chunks. You can tell them apart by checking which field is populated on each delta.
For deepseek-chat, streaming works identically to OpenAI’s API – there’s no reasoning_content field, just delta.content on each chunk.
When to Use R1 vs V3
Pick the right model and you save both time and money.
Use deepseek-chat (V3) when you need:
- Code generation, refactoring, or translation between languages
- Summarization, rewriting, or general text tasks
- Fast responses where latency matters
- High-volume batch processing on a budget
Use deepseek-reasoner (R1) when you need:
- Multi-step math or logic problems
- Debugging code where the bug isn’t obvious
- Planning and architectural decisions that require weighing tradeoffs
- Any task where “think step by step” actually helps
Pricing is the same for both models right now: $0.28 per million input tokens (cache miss), $0.028 per million on cache hits, and $0.42 per million output tokens. That’s roughly 10x cheaper than GPT-4o on input and significantly cheaper on output. The cache hit discount is automatic – DeepSeek caches common prompt prefixes, so repeated calls with the same system prompt get the lower rate without you doing anything.
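Those rates are easy to sanity-check with a small calculator. This sketch just encodes the per-million figures quoted above; the cached_fraction parameter is a stand-in for however much of your prompt hits the prefix cache on a given call.

```python
# Back-of-envelope cost check using the per-million-token rates quoted above.
INPUT_MISS = 0.28 / 1_000_000   # $ per input token, cache miss
INPUT_HIT = 0.028 / 1_000_000   # $ per input token, cache hit
OUTPUT = 0.42 / 1_000_000       # $ per output token

def estimate_cost(input_tokens: int, output_tokens: int,
                  cached_fraction: float = 0.0) -> float:
    """Dollar cost of one call, given what fraction of the input hit the cache."""
    cached = input_tokens * cached_fraction
    uncached = input_tokens - cached
    return cached * INPUT_HIT + uncached * INPUT_MISS + output_tokens * OUTPUT

# e.g. a 10K-token prompt where a 2K system prefix is cached, with 1K of output:
cost = estimate_cost(10_000, 1_000, cached_fraction=0.2)
```

Even at full cache-miss rates, a million input tokens costs $0.28, which is why batch workloads land on this API.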
Context window is 128K tokens for both models. The main tradeoff is speed: R1 is slower because it generates the full reasoning chain before answering, so expect higher latency on complex prompts.
Common Errors and Fixes
AuthenticationError: Incorrect API key
Double-check your key at platform.deepseek.com/api_keys. Keys start with sk-. Make sure you’re reading from the right environment variable.
400 Bad Request on multi-turn with R1
You’re probably passing reasoning_content back in the messages. Strip it out – only include the content field from the assistant’s previous turn:
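A small helper makes this hard to get wrong. The function name here is made up for illustration; it simply rebuilds the assistant turn from whole cloth instead of echoing the API response back.

```python
def to_history_entry(assistant_message: dict) -> dict:
    """Keep only role and content; dropping reasoning_content avoids the 400."""
    return {"role": "assistant", "content": assistant_message["content"]}

# After a deepseek-reasoner call you might rebuild the turn like:
#   history.append(to_history_entry(response.choices[0].message.model_dump()))
raw = {
    "role": "assistant",
    "content": "The ball costs $0.05.",
    "reasoning_content": "Let the ball cost x. Then the bat costs x + 1.00...",
}
clean = to_history_entry(raw)  # {"role": "assistant", "content": "The ball costs $0.05."}
```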
model_not_found error
The model names are deepseek-chat and deepseek-reasoner. Not deepseek-v3, not deepseek-r1. Those are the marketing names, not the API identifiers.
Timeout on long R1 responses
R1 can generate up to 64K tokens of reasoning plus answer. Set a longer timeout on your client:
Rate limiting (429 errors)
DeepSeek’s rate limits are generous but not unlimited. Add retry logic with exponential backoff, or use the max_retries parameter built into the OpenAI SDK:
Related Guides
- How to Use the Anthropic Extended Thinking API for Complex Reasoning
- How to Use the Anthropic Message Batches API for Async Workloads
- How to Use the Anthropic Batch API for High-Volume Processing
- How to Use the xAI Grok API for Chat and Function Calling
- How to Use the Anthropic Token Efficient Tool Use API
- How to Use the Together AI API for Open-Source LLMs
- How to Use the OpenRouter API for Multi-Provider LLM Access
- How to Use the Anthropic Token Counting API for Cost Estimation
- How to Use the Anthropic Token Streaming API for Real-Time UIs
- How to Use the Anthropic Claude Files API for Large Document Processing