DeepSeek gives you two strong models through a single OpenAI-compatible API: deepseek-chat (the V3 line: fast, general-purpose chat and code generation) and deepseek-reasoner (the R1 line: chain-of-thought reasoning for math, logic, and hard problems). Because the API follows the OpenAI spec, you use the standard OpenAI Python SDK and just swap the base URL. Here’s the fastest way to get started:

pip install openai
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Write a Python function that merges two sorted lists."}],
)
print(response.choices[0].message.content)

Sign up at platform.deepseek.com to grab your API key. New accounts get 5 million free tokens, no credit card required.

Generate Code with DeepSeek-V3

deepseek-chat is what you want for everyday code generation. It’s fast, cheap, and handles most programming tasks well. Here’s a more practical example: ask it to write a function, then ask it to write tests for that function in the same conversation.

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

messages = [
    {
        "role": "user",
        "content": (
            "Write a Python function called `parse_duration` that takes a string "
            "like '2h30m15s' and returns the total number of seconds as an integer. "
            "Handle hours (h), minutes (m), and seconds (s). Raise ValueError for invalid input."
        ),
    }
]

# Step 1: Generate the function
response = client.chat.completions.create(model="deepseek-chat", messages=messages)
function_code = response.choices[0].message.content
print(function_code)

# Step 2: Ask for tests in the same conversation
messages.append({"role": "assistant", "content": function_code})
messages.append({
    "role": "user",
    "content": "Now write pytest tests for this function. Cover normal cases, edge cases, and invalid input.",
})

response = client.chat.completions.create(model="deepseek-chat", messages=messages)
test_code = response.choices[0].message.content
print(test_code)

Multi-turn conversations work exactly like OpenAI’s API. You append the assistant’s previous response, then add your follow-up. The model keeps full context of the conversation.
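If you find yourself repeating that append-call-append bookkeeping, it can be wrapped in a small helper. This is just a sketch of one way to do it – `Conversation` is our own name, not part of the OpenAI SDK – and it works with any client object that exposes the SDK’s `chat.completions.create` interface:

```python
class Conversation:
    """Minimal multi-turn wrapper around an OpenAI-compatible client.

    `client` is any object with the OpenAI SDK's chat.completions.create
    interface; `model` defaults to deepseek-chat.
    """

    def __init__(self, client, model="deepseek-chat"):
        self.client = client
        self.model = model
        self.messages = []

    def ask(self, prompt):
        # Record the user turn, call the API, record the assistant turn.
        self.messages.append({"role": "user", "content": prompt})
        response = self.client.chat.completions.create(
            model=self.model, messages=self.messages
        )
        reply = response.choices[0].message.content
        self.messages.append({"role": "assistant", "content": reply})
        return reply
```

With this in place, the two-step generate-then-test flow above becomes two `ask` calls on the same object, and the history stays consistent automatically.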

Solve Reasoning Problems with DeepSeek-R1

Switch to deepseek-reasoner when you need the model to think through a problem step by step. R1 generates an internal chain-of-thought before answering, which makes it much better at math, logic puzzles, and multi-step debugging.

The key difference from deepseek-chat: the response includes a reasoning_content field that shows you the model’s thinking process.

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[
        {
            "role": "user",
            "content": (
                "A server processes requests in batches. Each batch takes 200ms base time "
                "plus 15ms per request. The system queues incoming requests and starts a new "
                "batch every 500ms. If 120 requests arrive uniformly over 2 seconds, what is "
                "the average latency per request? Show your reasoning."
            ),
        }
    ],
)

# The chain-of-thought reasoning (R1's internal thinking)
reasoning = response.choices[0].message.reasoning_content
print("=== Reasoning ===")
print(reasoning)

# The final answer
answer = response.choices[0].message.content
print("\n=== Answer ===")
print(answer)

A few things to know about deepseek-reasoner:

  • Max output is 64K tokens (including the chain-of-thought), compared to 8K for deepseek-chat.
  • Temperature and top_p have no effect. The model controls its own sampling during reasoning.
  • Multi-turn conversations drop the reasoning. When you build a conversation history, only include the content field from the assistant’s response. If you pass reasoning_content back in the messages array, the API returns a 400 error.
  • No function calling or tool use. R1 is purely for text-in, text-out reasoning.

Stream Responses

Both models support streaming. For deepseek-reasoner, the stream sends reasoning chunks first, then content chunks. You can tell them apart by checking which field is populated on each delta.

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

stream = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": "Prove that the square root of 2 is irrational."}],
    stream=True,
)

reasoning_content = ""
content = ""

for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.reasoning_content:
        reasoning_content += delta.reasoning_content
        print(delta.reasoning_content, end="", flush=True)
    elif delta.content:
        if not content:
            print("\n\n--- Final Answer ---\n")
        content += delta.content
        print(delta.content, end="", flush=True)

print()

For deepseek-chat, streaming works identically to OpenAI’s API – there’s no reasoning_content field, just delta.content on each chunk.
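A deepseek-chat streaming loop therefore reduces to accumulating `delta.content`. Here’s a minimal sketch – `collect_stream` is our own helper name – that works on any chat-completions stream with that shape:

```python
def collect_stream(stream):
    """Echo and accumulate delta.content from a chat-completions stream.

    deepseek-chat chunks carry no reasoning_content, so checking
    delta.content is all that's needed.
    """
    text = ""
    for chunk in stream:
        delta = chunk.choices[0].delta
        if delta.content:  # skip empty/None deltas (e.g. role-only chunks)
            text += delta.content
            print(delta.content, end="", flush=True)
    print()
    return text

# Usage (assumes `client` is configured as in the examples above):
# stream = client.chat.completions.create(
#     model="deepseek-chat",
#     messages=[{"role": "user", "content": "Summarize HTTP/2 in one paragraph."}],
#     stream=True,
# )
# summary = collect_stream(stream)
```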

When to Use R1 vs V3

Pick the right model and you save both time and money.

Use deepseek-chat (V3) when you need:

  • Code generation, refactoring, or translation between languages
  • Summarization, rewriting, or general text tasks
  • Fast responses where latency matters
  • High-volume batch processing on a budget

Use deepseek-reasoner (R1) when you need:

  • Multi-step math or logic problems
  • Debugging code where the bug isn’t obvious
  • Planning and architectural decisions that require weighing tradeoffs
  • Any task where “think step by step” actually helps

Pricing is the same for both models right now: $0.28 per million input tokens (cache miss), $0.028 per million on cache hits, and $0.42 per million output tokens. That’s roughly 10x cheaper than GPT-4o on input and significantly cheaper on output. The cache hit discount is automatic – DeepSeek caches common prompt prefixes, so repeated calls with the same system prompt get the lower rate without you doing anything.
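To estimate what a call cost you, the arithmetic is simple enough to put in a function. The rates below are the ones quoted in this section and may change; the token counts come from `response.usage`, which in DeepSeek’s API reports cache hits and misses separately (check the current API docs for the exact field names):

```python
def estimate_cost(cache_miss_tokens, cache_hit_tokens, output_tokens):
    """Rough cost in USD at the rates quoted above (subject to change):
    $0.28/M input on cache miss, $0.028/M on cache hit, $0.42/M output."""
    return (
        cache_miss_tokens * 0.28
        + cache_hit_tokens * 0.028
        + output_tokens * 0.42
    ) / 1_000_000

# Usage with a response object (field names per DeepSeek's usage schema;
# verify against the current docs):
# usage = response.usage
# cost = estimate_cost(
#     usage.prompt_cache_miss_tokens,
#     usage.prompt_cache_hit_tokens,
#     usage.completion_tokens,
# )
```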

Context window is 128K tokens for both models. The main tradeoff is speed: R1 is slower because it generates the full reasoning chain before answering, so expect higher latency on complex prompts.

Common Errors and Fixes

AuthenticationError: Incorrect API key
Double-check your key at platform.deepseek.com/api_keys. Keys start with sk-. Make sure you’re reading from the right environment variable.

400 Bad Request on multi-turn with R1
You’re probably passing reasoning_content back in the messages. Strip it out – only include the content field from the assistant’s previous turn:

# Wrong: passing the full response object fields
messages.append({"role": "assistant", "content": answer, "reasoning_content": reasoning})

# Right: only pass content
messages.append({"role": "assistant", "content": answer})

model_not_found error
The model names are deepseek-chat and deepseek-reasoner. Not deepseek-v3, not deepseek-r1 – those are the marketing names, not the API identifiers.

Timeout on long R1 responses
R1 can generate up to 64K tokens of reasoning plus answer. Set a longer timeout on your client:

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
    timeout=300.0,  # 5 minutes
)

Rate limiting (429 errors)
DeepSeek’s rate limits are generous but not unlimited. Add retry logic with exponential backoff, or use the max_retries parameter built into the OpenAI SDK:

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
    max_retries=3,
)
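If you want more control than max_retries gives you, a hand-rolled backoff loop is a few lines. This is a generic sketch – `with_backoff` is our own name, and in real code you would catch openai.RateLimitError specifically rather than a bare Exception:

```python
import random
import time

def with_backoff(call, max_attempts=5, base=1.0):
    """Retry `call` with jittered exponential backoff.

    Sleeps base * (2**attempt + jitter) seconds between attempts and
    re-raises the last error once max_attempts is exhausted. In real
    code, catch openai.RateLimitError instead of bare Exception.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base * (2 ** attempt + random.random()))

# Usage:
# result = with_backoff(lambda: client.chat.completions.create(
#     model="deepseek-chat",
#     messages=[{"role": "user", "content": "Hello"}],
# ))
```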