Install the SDK and Set Your Key

```shell
pip install mistralai
```

Grab your API key from console.mistral.ai and export it:

```shell
export MISTRAL_API_KEY="your-key-here"
```

The SDK reads MISTRAL_API_KEY from your environment automatically. You can also pass it directly to the client constructor if you prefer explicit configuration.

Chat Completions

The simplest call you can make. This is the bread and butter of the API.

```python
from mistralai import Mistral

client = Mistral()  # reads MISTRAL_API_KEY from the environment

response = client.chat.complete(
    model="mistral-large-latest",
    messages=[
        {"role": "user", "content": "Explain Python generators in 3 sentences."}
    ],
)

print(response.choices[0].message.content)
```

The model parameter takes a model alias like mistral-large-latest or a specific version like mistral-large-2407. Stick with the -latest aliases unless you need reproducibility.

Adding System Prompts

System prompts go as the first message with role: "system". Use them to set tone, constraints, and output format.

```python
from mistralai import Mistral

client = Mistral()

response = client.chat.complete(
    model="mistral-large-latest",
    messages=[
        {"role": "system", "content": "You are a senior backend engineer. Reply with code only. Use type hints."},
        {"role": "user", "content": "Write a retry decorator with exponential backoff."}
    ],
    temperature=0.1,
)

print(response.choices[0].message.content)
```

Lower temperature values (0.0-0.3) produce more deterministic outputs. For creative tasks, bump it to 0.7-0.9.

Code Generation with Codestral

Codestral is Mistral’s code-specific model. It beats the general-purpose models on code generation tasks and runs faster for code completions. Use it whenever you need to generate, review, or refactor code.

```python
from mistralai import Mistral

client = Mistral()

response = client.chat.complete(
    model="codestral-latest",
    messages=[
        {"role": "system", "content": "Generate clean, production-ready Python. Include docstrings."},
        {"role": "user", "content": "Write an async HTTP client that retries failed requests with backoff."}
    ],
    temperature=0.1,
    max_tokens=2048,
)

print(response.choices[0].message.content)
```

Codestral also supports fill-in-the-middle (FIM) completions, which is what IDE integrations like Continue and VS Code use. FIM lets you provide a prefix and suffix, and the model fills the gap:

```python
from mistralai import Mistral

client = Mistral()

response = client.fim.complete(
    model="codestral-latest",
    prompt="def fibonacci(n: int) -> int:\n    ",
    suffix="\n\nprint(fibonacci(10))",
)

print(response.choices[0].message.content)
```

FIM is great for autocomplete-style features. The model sees both what comes before and after the cursor.
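For editor-style autocomplete, the prompt and suffix typically come from splitting the buffer at the cursor. A minimal sketch of that split — the `split_at_cursor` helper is illustrative, not part of the SDK:

```python
def split_at_cursor(buffer: str, cursor: int) -> tuple[str, str]:
    """Split an editor buffer into the FIM prompt (text before the
    cursor) and suffix (text after the cursor)."""
    return buffer[:cursor], buffer[cursor:]

code = "def add(a, b):\n    \nprint(add(1, 2))"
cursor = code.index("\n    ") + len("\n    ")  # cursor sits after the indent
prompt, suffix = split_at_cursor(code, cursor)

# With the real SDK, pass these straight through:
#   client.fim.complete(model="codestral-latest", prompt=prompt, suffix=suffix)
print(repr(prompt))
print(repr(suffix))
```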

Streaming Responses

For anything user-facing, stream the response. Nobody wants to stare at a blank screen waiting for 2000 tokens to generate.

```python
from mistralai import Mistral

client = Mistral()

stream = client.chat.stream(
    model="mistral-large-latest",
    messages=[
        {"role": "user", "content": "Write a guide to Python asyncio patterns."}
    ],
)

for chunk in stream:
    content = chunk.data.choices[0].delta.content
    if content:
        print(content, end="", flush=True)

print()
```

Each chunk arrives as soon as the model generates it. The delta.content field contains the new text fragment.
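If you also want the full text once streaming finishes (for logging or caching), accumulate the fragments as they arrive. A sketch with the SDK stream replaced by a stand-in list of fragments:

```python
def accumulate(fragments) -> str:
    """Print fragments as they arrive and return the assembled text."""
    parts = []
    for content in fragments:
        if content:  # skip None/empty deltas
            parts.append(content)
            print(content, end="", flush=True)
    print()
    return "".join(parts)

# With the real SDK, the fragments come from the stream:
#   (chunk.data.choices[0].delta.content for chunk in stream)
full_text = accumulate(["Async", " generators", " are", " lazy."])
```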

Function Calling

Function calling lets the model decide when to call your tools and what arguments to pass. Define your tools as JSON schemas, and Mistral figures out the rest.

```python
import json
from mistralai import Mistral

client = Mistral()

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit",
                    },
                },
                "required": ["city"],
            },
        },
    }
]

response = client.chat.complete(
    model="mistral-large-latest",
    messages=[{"role": "user", "content": "What's the weather in Berlin?"}],
    tools=tools,
    tool_choice="auto",
)

# Check if the model wants to call a function
message = response.choices[0].message
if message.tool_calls:
    tool_call = message.tool_calls[0]
    args = json.loads(tool_call.function.arguments)
    print(f"Function: {tool_call.function.name}")
    print(f"Arguments: {args}")
    # -> Function: get_weather
    # -> Arguments: {'city': 'Berlin', 'unit': 'celsius'}

    # After executing the function, send the result back:
    follow_up = client.chat.complete(
        model="mistral-large-latest",
        messages=[
            {"role": "user", "content": "What's the weather in Berlin?"},
            message,
            {
                "role": "tool",
                "name": "get_weather",
                "content": json.dumps({"temp": 8, "condition": "cloudy"}),
                "tool_call_id": tool_call.id,
            },
        ],
        tools=tools,
    )
    print(follow_up.choices[0].message.content)
```

Set tool_choice="auto" to let the model decide whether to call a function. Use tool_choice="any" to force a tool call, or tool_choice="none" to prevent them.
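In a real application you typically map tool names to Python callables and dispatch on whatever the model picked. A minimal dispatch sketch — the `get_weather` stub and `TOOL_REGISTRY` are illustrative, not SDK features:

```python
import json

def get_weather(city: str, unit: str = "celsius") -> dict:
    """Stub implementation; swap in a real weather lookup."""
    return {"city": city, "temp": 8, "unit": unit, "condition": "cloudy"}

# Map each tool name from your JSON schemas to its implementation.
TOOL_REGISTRY = {"get_weather": get_weather}

def dispatch(name: str, arguments: str) -> str:
    """Run the tool the model requested with the JSON-encoded
    arguments it supplied, and return a JSON string for the
    tool message's content field."""
    func = TOOL_REGISTRY[name]
    args = json.loads(arguments)
    return json.dumps(func(**args))

# With the real SDK:
#   dispatch(tool_call.function.name, tool_call.function.arguments)
print(dispatch("get_weather", '{"city": "Berlin"}'))
# -> {"city": "Berlin", "temp": 8, "unit": "celsius", "condition": "cloudy"}
```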

Choosing the Right Model

Mistral offers several models, and picking the right one matters for both cost and quality.

Mistral Large (mistral-large-latest) is your best general-purpose model. It handles complex reasoning, multi-step instructions, and nuanced tasks well. Use it when quality matters more than speed. This is the model I reach for first.

Mistral Small (mistral-small-latest) is fast and cheap. It handles straightforward tasks like classification, extraction, and simple Q&A. Use it for high-volume workloads where you need to keep costs down. It’s significantly cheaper per token than Large.

Codestral (codestral-latest) is purpose-built for code. It outperforms Mistral Large on coding benchmarks and supports FIM completions. If your task is code generation, completion, or review, always pick Codestral over the general-purpose models.

Mistral Medium has been deprecated. If you were using it, migrate to Mistral Small for cost savings or Mistral Large for quality.

My recommendation: start with Mistral Large for prototyping, then drop down to Small once you have evals that confirm it works for your use case. For any code task, go straight to Codestral.
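If you route requests by task type, that guidance can be captured in a small helper — the task categories here are illustrative, not an official taxonomy:

```python
def pick_model(task: str) -> str:
    """Map a rough task category to a model alias, following the
    guidance above: Codestral for code, Small for simple high-volume
    work, Large for everything else."""
    if task == "code":
        return "codestral-latest"
    if task in ("classification", "extraction", "simple-qa"):
        return "mistral-small-latest"
    return "mistral-large-latest"

print(pick_model("code"))            # -> codestral-latest
print(pick_model("classification"))  # -> mistral-small-latest
print(pick_model("reasoning"))       # -> mistral-large-latest
```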

Common Errors

MistralAPIStatusException: 401 Unauthorized – Your API key is missing or invalid. Verify it’s exported correctly with echo $MISTRAL_API_KEY. If you just created the key, wait a minute for it to propagate.

MistralAPIStatusException: 429 Too Many Requests – You’ve hit the rate limit. The SDK does not auto-retry by default. Wrap your calls in a retry loop with exponential backoff, or use a library like tenacity:

```python
from tenacity import retry, wait_exponential, stop_after_attempt
from mistralai import Mistral

client = Mistral()

@retry(wait=wait_exponential(min=1, max=60), stop=stop_after_attempt(5))
def call_mistral(prompt: str) -> str:
    response = client.chat.complete(
        model="mistral-large-latest",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(call_mistral("What is 2 + 2?"))
```
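If you'd rather not pull in a dependency, a hand-rolled retry loop does the same job. A sketch that catches a generic `Exception` — in real code, narrow that to the SDK's rate-limit exception:

```python
import random
import time

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential backoff with full jitter: cap base * 2^attempt,
    then pick a random delay up to that ceiling so concurrent
    clients don't retry in lockstep."""
    return random.uniform(0, min(cap, base * 2 ** attempt))

def call_with_retries(fn, max_attempts: int = 5):
    """Call fn(), sleeping with exponential backoff between failures."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:  # narrow to the SDK's 429 exception in practice
            if attempt == max_attempts - 1:
                raise
            time.sleep(backoff_delay(attempt))

# Usage, assuming call_mistral from the tenacity example above:
#   call_with_retries(lambda: call_mistral("What is 2 + 2?"))
```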

MistralAPIStatusException: 400 with tool calls – This usually means the tool_call_id in your tool response message doesn’t match the one the model returned. Always use tool_call.id from the original response, don’t generate your own.

AttributeError: 'NoneType' object has no attribute 'content' – The model returned an empty response. This happens when max_tokens is too low for the task, or when the model decides to call a tool instead of responding with text. Check message.tool_calls before accessing message.content.

Codestral FIM returns garbage – Make sure you’re using the fim.complete method, not chat.complete. FIM uses a different endpoint and prompt format. Also verify your prompt ends where you want the completion to start.