The xAI Grok API is OpenAI-compatible. That means you can use the OpenAI Python SDK you already know, point it at https://api.x.ai/v1, and talk to Grok models without learning a new client library. Same request format, same response objects, same tools parameter for function calling.

Here’s what you need to get started: an xAI API key from console.x.ai and the OpenAI Python package.

pip install openai

Set your API key as an environment variable:

export XAI_API_KEY="xai-your-key-here"

Basic Chat Completion

Point the OpenAI client at xAI’s endpoint and make a standard chat completion call. The grok-3-latest model gives you Grok 3 with a 131K token context window.

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["XAI_API_KEY"],
    base_url="https://api.x.ai/v1",
)

response = client.chat.completions.create(
    model="grok-3-latest",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain gradient checkpointing in two sentences."},
    ],
)

print(response.choices[0].message.content)

That’s it. The response object has the same shape as what you’d get from OpenAI: response.choices[0].message.content for the text, response.usage for token counts, and everything else you’d expect.

Streaming Responses

For long responses, streaming prints tokens as they arrive instead of waiting for the full completion. Pass stream=True and iterate over chunks.

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["XAI_API_KEY"],
    base_url="https://api.x.ai/v1",
)

stream = client.chat.completions.create(
    model="grok-3-latest",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a Python function to compute cosine similarity between two vectors."},
    ],
    stream=True,
)

for chunk in stream:
    content = chunk.choices[0].delta.content
    if content is not None:
        print(content, end="", flush=True)
print()

Each chunk has chunk.choices[0].delta.content. It can be None on some chunks (for example, the first chunk that carries only the role, and the final chunk), so always check before printing. The flush=True makes sure tokens appear immediately in the terminal.

Multi-Turn Conversations

Grok handles multi-turn conversations the same way as OpenAI. Keep appending messages to your list and send the full history with each request.

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["XAI_API_KEY"],
    base_url="https://api.x.ai/v1",
)

messages = [
    {"role": "system", "content": "You are a senior Python developer."},
    {"role": "user", "content": "What's the difference between a list and a tuple?"},
]

response = client.chat.completions.create(
    model="grok-3-latest",
    messages=messages,
)

assistant_reply = response.choices[0].message.content
print(assistant_reply)

# Add the assistant's reply and a follow-up question
messages.append({"role": "assistant", "content": assistant_reply})
messages.append({"role": "user", "content": "When should I use a named tuple instead?"})

response = client.chat.completions.create(
    model="grok-3-latest",
    messages=messages,
)

print(response.choices[0].message.content)

The model sees the entire conversation each time. Keep the history growing and trim older messages when you approach the context limit.
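What "trim older messages" looks like in practice: drop the oldest non-system turns until the history fits a budget. Here is one possible sketch, using a rough 4-characters-per-token heuristic instead of a real tokenizer (use a proper tokenizer in production for accurate counts):

```python
def trim_history(messages, max_tokens=131_000, chars_per_token=4):
    """Drop the oldest non-system messages until a rough token
    estimate fits the budget. Sketch only -- the chars/token ratio
    is an approximation, not a real token count."""
    def estimate(msgs):
        return sum(len(m["content"]) for m in msgs) // chars_per_token

    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    while rest and estimate(system + rest) > max_tokens:
        rest.pop(0)  # drop the oldest turn first, keep the system prompt
    return system + rest
```

Call this on your messages list before each request; the system prompt always survives, and only the oldest user/assistant turns are discarded.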

Function Calling with Tools

Function calling lets Grok request data from your code mid-conversation. You define tools with JSON Schema, the model decides when to call them, and you execute the function locally and return the result.

Here’s a complete weather tool example:

import os
import json
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["XAI_API_KEY"],
    base_url="https://api.x.ai/v1",
)

# Define the tool
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "City name, e.g. San Francisco",
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit",
                    },
                },
                "required": ["city"],
            },
        },
    }
]


# Simulated weather lookup (replace with a real API in production)
def get_weather(city: str, unit: str = "fahrenheit") -> dict:
    weather_data = {
        "San Francisco": {"temp": 59, "condition": "foggy"},
        "New York": {"temp": 35, "condition": "cloudy"},
        "Austin": {"temp": 72, "condition": "sunny"},
    }
    data = weather_data.get(city, {"temp": 65, "condition": "unknown"})
    if unit == "celsius":
        data["temp"] = round((data["temp"] - 32) * 5 / 9)
    return {"city": city, "temperature": data["temp"], "unit": unit, "condition": data["condition"]}


messages = [
    {"role": "user", "content": "What's the weather like in San Francisco and Austin?"},
]

# Step 1: Send the request with tools
response = client.chat.completions.create(
    model="grok-3-latest",
    messages=messages,
    tools=tools,
    tool_choice="auto",
)

assistant_message = response.choices[0].message

# Step 2: Check if the model wants to call functions
if assistant_message.tool_calls:
    messages.append(assistant_message)

    for tool_call in assistant_message.tool_calls:
        function_name = tool_call.function.name
        arguments = json.loads(tool_call.function.arguments)

        # Execute the function
        result = get_weather(**arguments)

        # Append the tool result
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": json.dumps(result),
        })

    # Step 3: Send the results back to get a natural language response
    final_response = client.chat.completions.create(
        model="grok-3-latest",
        messages=messages,
        tools=tools,
    )

    print(final_response.choices[0].message.content)
else:
    print(assistant_message.content)

The flow is three steps. First, send the user message with the tools parameter. Second, if the response contains tool_calls, execute each function and append the results as role: "tool" messages. Third, send the updated messages back so the model can generate a final answer using the tool results.

Grok supports parallel function calling by default. When you ask about multiple cities, it may return multiple tool_calls in a single response. The loop handles that automatically.
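As you add more tools, a dispatch dict keeps that execution loop generic instead of hard-coding get_weather. A minimal sketch (get_time here is a hypothetical second tool, not part of the example above):

```python
import json

def get_weather(city, unit="fahrenheit"):
    # Stub standing in for the real lookup above.
    return {"city": city, "temp": 59, "unit": unit}

def get_time(city):
    # Hypothetical second tool for illustration.
    return {"city": city, "time": "12:00"}

# Map tool names (as declared in the tools schema) to local callables.
TOOL_REGISTRY = {"get_weather": get_weather, "get_time": get_time}

def execute_tool_call(name, arguments_json):
    """Look up the tool by name and call it with the model's arguments."""
    fn = TOOL_REGISTRY.get(name)
    if fn is None:
        # Return an error payload rather than raising, so the model can recover.
        return {"error": f"unknown tool: {name}"}
    return fn(**json.loads(arguments_json))
```

In the loop, `result = execute_tool_call(tool_call.function.name, tool_call.function.arguments)` then handles any registered tool, including parallel calls to different tools in one response.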

Choosing the Right Model

xAI offers several Grok variants at different price points:

| Model | Context | Input Price | Output Price | Best For |
| --- | --- | --- | --- | --- |
| grok-3-latest | 131K | $3.00/M | $15.00/M | General tasks, high quality |
| grok-3-mini-latest | 131K | $0.30/M | $0.50/M | Fast, cheaper tasks |

The grok-3-mini-latest model is roughly 10x cheaper on input tokens (and 30x cheaper on output) than grok-3-latest, and works well for straightforward tasks. Use the full grok-3-latest when you need stronger reasoning or complex function calling chains.
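Estimating a request's cost is simple arithmetic on the per-million-token prices in the table. A sketch using those numbers (check console.x.ai for current rates before relying on them):

```python
# Dollars per million tokens, taken from the pricing table above.
PRICES = {
    "grok-3-latest": {"input": 3.00, "output": 15.00},
    "grok-3-mini-latest": {"input": 0.30, "output": 0.50},
}

def estimate_cost(model, input_tokens, output_tokens):
    """Dollar cost for one request: tokens * (price per million) / 1M."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# e.g. a 10K-token prompt with a 1K-token reply on grok-3-latest:
# 10_000 * 3.00/1M + 1_000 * 15.00/1M = 0.03 + 0.015 = $0.045
```

Feed it response.usage.prompt_tokens and response.usage.completion_tokens to track spend per call.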

Common Errors and Fixes

openai.AuthenticationError: Error code: 401

Your API key is missing or invalid. Double-check you exported it correctly:

echo $XAI_API_KEY
# Should print your key, not empty

openai.NotFoundError: Error code: 404

Wrong model name. The xAI API is strict about model IDs. Use exactly grok-3-latest or grok-3-mini-latest. Common mistakes include grok3, grok-3, or grok-latest.
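A cheap way to catch these typos is to validate the model ID locally before making the request. A sketch using only the two IDs covered in this post (on an OpenAI-compatible endpoint you can also fetch the live list with client.models.list()):

```python
# The two model IDs used in this post; extend as xAI adds models.
KNOWN_MODELS = {"grok-3-latest", "grok-3-mini-latest"}

def check_model(model: str) -> str:
    """Fail fast with a readable error instead of waiting for a 404."""
    if model not in KNOWN_MODELS:
        raise ValueError(
            f"Unknown model {model!r}; expected one of {sorted(KNOWN_MODELS)}"
        )
    return model
```

Wrapping your create() calls with `model=check_model(model)` turns a vague 404 into an immediate, descriptive ValueError.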

openai.APITimeoutError

Grok can take longer on complex prompts. Increase the timeout:

import os

import httpx
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["XAI_API_KEY"],
    base_url="https://api.x.ai/v1",
    timeout=httpx.Timeout(120.0),
)

Tool calls return empty or malformed JSON

If json.loads(tool_call.function.arguments) fails, the model generated invalid JSON. This is rare but happens with vague tool descriptions. Fix it by making your description and parameters fields as specific as possible. Adding enum constraints and explicit descriptions for each property helps the model generate correct arguments.
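Even with good descriptions, it's worth parsing defensively so one bad tool call doesn't crash the loop. One possible sketch: on malformed JSON, return an error payload as the tool result so the model can retry or explain the failure:

```python
import json

def parse_tool_arguments(arguments: str):
    """Return (args, error). On malformed arguments, hand the error
    back to the model as the tool result instead of raising."""
    try:
        args = json.loads(arguments)
    except json.JSONDecodeError as e:
        return None, {"error": f"invalid JSON arguments: {e}"}
    if not isinstance(args, dict):
        return None, {"error": "arguments must be a JSON object"}
    return args, None
```

In the loop: `args, err = parse_tool_arguments(tool_call.function.arguments)`, then append `json.dumps(err)` as the tool message when err is set, otherwise execute the function with `**args`.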

tool_call_id mismatch

Every tool result message must include the exact tool_call_id from the corresponding tool call. If you mix them up, the API returns a 400 error. Always pair them in a loop like the example above.