The OpenAI Assistants API gives you a stateful, server-side agent runtime. Unlike the Chat Completions API where you manage conversation history yourself, Assistants handle threading, tool execution, and state persistence on OpenAI’s servers. You create an assistant once, spin up threads per conversation, and let the API manage the back-and-forth – including automatic code execution and file handling.
This is the right choice when you need persistent conversations, built-in code execution, or file retrieval without building that infrastructure yourself. If you want something lighter-weight or fully client-side, stick with Chat Completions and function calling. But for anything with multi-turn tool use and state, Assistants save you a lot of plumbing.
Set Up the Client
Set your API key by exporting the OPENAI_API_KEY environment variable – the client picks it up automatically. That's all you need: the openai package handles auth, retries, and streaming. Make sure you're on version 1.x or later – the Assistants API doesn't exist in the 0.x SDK.
Create an Assistant with Tools
An assistant is a configured agent that persists across conversations. You define its instructions, model, and which tools it can use. The three built-in tool types are code_interpreter, file_search, and custom function tools.
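As a sketch, pairing the built-in code_interpreter with a hypothetical get_weather function (the function name and schema are illustrative, not part of the API):

```python
# One built-in tool plus one custom function tool. The get_weather schema is
# a hypothetical example; define whatever functions your agent needs.
TOOLS = [
    {"type": "code_interpreter"},
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name, e.g. 'Paris'"},
                },
                "required": ["city"],
            },
        },
    },
]

def create_assistant(client):
    # Explicit instructions keep tool use predictable.
    return client.beta.assistants.create(
        name="Data Helper",
        model="gpt-4o",
        instructions=(
            "You are a concise data assistant. Use get_weather for live weather "
            "and code_interpreter for any calculation or chart."
        ),
        tools=TOOLS,
    )
```

The assistant ID that comes back is reusable across every conversation, so create it once at deploy time rather than per request.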
A few opinions here: always set explicit instructions. The default behavior without instructions is unpredictable. And use gpt-4o for agents that need tool calling – the smaller models hallucinate tool arguments more often and struggle with multi-step reasoning.
Run a Conversation with Threads
Threads are conversation containers. Each thread holds a message history, and you can create as many as you need per assistant. OpenAI stores the thread server-side, so you just need the thread ID to resume later.
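Sketched as a helper that takes the client and an assistant ID created earlier:

```python
def start_run(client, assistant_id, user_text):
    """Create a thread, add the user's message, and kick off a run."""
    thread = client.beta.threads.create()
    client.beta.threads.messages.create(
        thread_id=thread.id, role="user", content=user_text
    )
    run = client.beta.threads.runs.create(
        thread_id=thread.id, assistant_id=assistant_id
    )
    return thread, run  # persist thread.id to resume the conversation later
```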
The run kicks off the agent loop on OpenAI’s servers. The assistant reads the thread, decides which tools to call, and starts executing. But it doesn’t return the result immediately – you need to poll for completion.
Poll for Run Completion
Runs are asynchronous. After creating one, you poll until it reaches a terminal status. The key statuses are completed, requires_action (needs you to handle a function call), failed, and expired.
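A polling loop along these lines (one-second interval, per the guidance below):

```python
import time

# Statuses after which polling should stop: either the run is done or it is
# waiting on you to submit tool outputs.
STOP_STATUSES = {"completed", "requires_action", "failed", "cancelled", "expired"}

def wait_for_run(client, thread_id, run_id, interval=1.0):
    """Poll the run once per interval until it stops making progress on its own."""
    while True:
        run = client.beta.threads.runs.retrieve(thread_id=thread_id, run_id=run_id)
        if run.status in STOP_STATUSES:
            return run
        time.sleep(interval)  # don't poll faster than ~1/s
```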
Polling every second is fine for most use cases. If you need real-time updates in a UI, use the streaming API instead (covered below). Don’t poll faster than once per second – you’ll hit rate limits and waste API calls.
Handle Function Tool Calls
When the assistant decides to call your custom function, the run enters the requires_action status. You extract the function call arguments, execute your function locally, and submit the results back.
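A sketch, reusing the hypothetical get_weather function from earlier (a real implementation would call an actual weather service):

```python
import json

def get_weather(city):
    # Hypothetical local implementation; swap in a real weather API call.
    return {"city": city, "temp_c": 21, "conditions": "clear"}

LOCAL_FUNCTIONS = {"get_weather": get_weather}

def build_tool_outputs(tool_calls):
    """Execute each requested function and package results for submit_tool_outputs."""
    outputs = []
    for call in tool_calls:
        args = json.loads(call.function.arguments)
        try:
            result = LOCAL_FUNCTIONS[call.function.name](**args)
        except Exception as exc:
            # Return the error as output so the assistant can recover and explain it.
            result = {"error": str(exc)}
        outputs.append({"tool_call_id": call.id, "output": json.dumps(result)})
    return outputs

# When run.status == "requires_action":
# calls = run.required_action.submit_tool_outputs.tool_calls
# client.beta.threads.runs.submit_tool_outputs(
#     thread_id=thread.id, run_id=run.id, tool_outputs=build_tool_outputs(calls))
```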
Always return JSON strings in the output field. The assistant parses the output and uses it in its next reasoning step. If your function fails, return an error message in the output rather than raising an exception – the assistant can often recover and explain the error to the user.
The Full Agent Loop
Putting it all together, here’s the complete autonomous agent loop that handles multiple rounds of tool calls:
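One way to sketch the whole loop (the helper names and the functions dict are illustrative; pass in whatever local functions your assistant's tools declare):

```python
import json
import time

def run_agent(client, assistant_id, thread_id, functions, interval=1.0):
    """Drive one user turn to completion, resolving tool calls as they come up."""
    run = client.beta.threads.runs.create(thread_id=thread_id, assistant_id=assistant_id)
    while True:
        run = client.beta.threads.runs.retrieve(thread_id=thread_id, run_id=run.id)
        if run.status == "completed":
            # Messages come back newest-first; return the assistant's final answer.
            return client.beta.threads.messages.list(thread_id=thread_id).data[0]
        if run.status == "requires_action":
            outputs = []
            for call in run.required_action.submit_tool_outputs.tool_calls:
                try:
                    result = functions[call.function.name](**json.loads(call.function.arguments))
                except Exception as exc:
                    result = {"error": str(exc)}
                outputs.append({"tool_call_id": call.id, "output": json.dumps(result)})
            run = client.beta.threads.runs.submit_tool_outputs(
                thread_id=thread_id, run_id=run.id, tool_outputs=outputs
            )
            continue
        if run.status in {"failed", "cancelled", "expired"}:
            raise RuntimeError(f"run ended with status {run.status}")
        time.sleep(interval)
```

Calling run_agent(client, assistant.id, thread.id, {"get_weather": get_weather}) resolves any number of chained tool calls before returning the final message.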
The key thing to notice: the while True loop handles cases where the assistant calls a function, gets the result, then decides it needs to call another function (or use code_interpreter) before producing a final answer. Real agents often chain 3-5 tool calls per query.
Use Streaming for Real-Time Output
Polling works but the user sees nothing until the run completes. Streaming gives you token-by-token output and real-time tool call events:
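Sketched with the SDK's text_deltas helper; the event-handler interface gives finer control over tool call events if you need it:

```python
def stream_reply(client, thread_id, assistant_id):
    """Print the assistant's reply token-by-token as it is generated."""
    with client.beta.threads.runs.stream(
        thread_id=thread_id, assistant_id=assistant_id
    ) as stream:
        for text in stream.text_deltas:
            print(text, end="", flush=True)
        print()
```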
Streaming is strictly better for user-facing applications. The only reason to use polling is batch processing where latency doesn’t matter.
Common Errors
Run expired after 10 minutes. Runs expire roughly ten minutes after creation, and the expiry is not configurable. If your function calls take too long, the run dies before you can submit outputs – so return quickly, and if a function is genuinely slow, run the work asynchronously and hand back a short status the assistant can report.
Invalid tool_call_id when submitting outputs. You must use the exact tool_call.id from the run’s required_action. Don’t generate your own IDs. Each tool call has a unique ID and the API validates it.
Thread not found errors. Thread IDs are scoped to your organization. If you’re switching between API keys or orgs, threads from one won’t be visible in another. Store thread IDs alongside the org context.
Rate limit exceeded during polling. If you’re polling too aggressively across many concurrent runs, you’ll burn through your rate limit. Use exponential backoff or switch to streaming, which uses a single long-lived connection.
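A backoff sketch (fetch_status stands in for any callable that retrieves the run's current status):

```python
import time

# Stop polling once the run is terminal or waiting on tool outputs.
TERMINAL = {"completed", "requires_action", "failed", "cancelled", "expired"}

def poll_with_backoff(fetch_status, base=1.0, cap=8.0):
    """Double the polling delay after each check, up to a ceiling."""
    delay = base
    while True:
        status = fetch_status()
        if status in TERMINAL:
            return status
        time.sleep(delay)
        delay = min(delay * 2, cap)
```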
Function arguments are malformed JSON. The model sometimes generates invalid JSON for complex parameter schemas. Keep your function parameter schemas simple – flat objects with clear descriptions work best. If you need nested objects, provide explicit examples in the function description.
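A defensive parse helps here – return the error as tool output so the model can correct itself on the next step (a sketch):

```python
import json

def parse_tool_arguments(raw):
    """Return (args, None) on success, or (None, error_output) on malformed JSON."""
    try:
        return json.loads(raw), None
    except json.JSONDecodeError as exc:
        # Submitting the error as the tool output lets the assistant retry
        # with valid JSON instead of the run failing outright.
        return None, json.dumps({"error": f"invalid JSON arguments: {exc}"})
```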
code_interpreter runs but produces wrong results. The sandbox environment has limited packages. It includes numpy, pandas, matplotlib, and scipy, but not specialized libraries. If the assistant tries to import sklearn and fails, it’ll usually fall back to manual implementations – which may be wrong. Specify what’s available in your assistant instructions.
When to Use Assistants vs Chat Completions
Use the Assistants API when you need persistent threads (customer support, tutoring), built-in code execution, or file handling. The server manages state, so you don’t need a database for conversation history.
Use Chat Completions with function calling when you want full control over the loop, lower latency (no polling overhead), or you’re building something stateless. Chat Completions is also cheaper per token since there’s no thread storage cost.
Don’t mix them in the same application for the same use case. Pick one pattern and stick with it.
Related Guides
- How to Build a GitHub Issue Triage Agent with LLMs and the GitHub API
- How to Build a Scheduling Agent with Calendar and Email Tools
- How to Build a Data Pipeline Agent with LLMs and Pandas
- How to Build a Planning Agent with Task Decomposition
- How to Build a Data Analysis Agent with Code Execution
- How to Build a Browser Automation Agent with Playwright and LLMs
- How to Build a File Management Agent with Filesystem Tools
- How to Build an MCP Server for AI Agents with Python
- How to Build Agents with LangGraph
- How to Build an API Testing Agent with LLMs and Requests