The OpenAI Agents SDK is OpenAI’s production framework for building multi-agent systems in Python. It replaced Swarm (the experimental repo from late 2024) with a proper package, real documentation, and features you actually need in production: typed tool definitions, agent handoffs, input/output guardrails, and built-in tracing. If you tried Swarm and found it too bare-bones, this is what it grew into.

The SDK is intentionally minimal compared to LangGraph or CrewAI. There’s no graph DSL, no YAML configs, no complex abstractions. You define agents as Python objects, give them tools and instructions, and run them. That’s it.

Install the SDK

pip install openai-agents

You need an OpenAI API key:

export OPENAI_API_KEY="sk-your-key-here"

The SDK requires Python 3.9+. It pulls in openai and pydantic as dependencies – if you already have those, the install is fast.

Your First Agent

An Agent is a wrapper around an LLM with instructions, tools, and optional handoff targets. Here’s the simplest possible agent:

from agents import Agent, Runner

agent = Agent(
    name="assistant",
    instructions="You are a helpful assistant that answers questions concisely.",
)

result = Runner.run_sync(agent, "What is the capital of Japan?")
print(result.final_output)
# "The capital of Japan is Tokyo."

Runner.run_sync() is the synchronous entry point. It runs the agent loop – sending messages to the LLM, executing tool calls, handling handoffs – until the agent produces a final text response. There’s also an async version:

import asyncio
from agents import Agent, Runner

agent = Agent(
    name="assistant",
    instructions="You are a helpful assistant.",
)

async def main():
    result = await Runner.run(agent, "What is the capital of Japan?")
    print(result.final_output)

asyncio.run(main())

Use the async version in production. The sync wrapper is convenient for scripts and notebooks but blocks the event loop.
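Conceptually, the loop the runner drives is simple: call the model, execute any requested tool call, append the result to the history, and repeat until the model returns plain text. Here's a rough, hypothetical sketch of that idea – the stub fake_model and the message format are invented for illustration and are not the SDK's real internals:

```python
# Hypothetical sketch of an agent loop. fake_model stands in for the LLM:
# it "requests" a tool on the first call, then produces a final answer.
def fake_model(messages, tools):
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": ("get_time", {})}
    return {"final": "It is noon."}

def run_loop(messages, tools):
    while True:
        reply = fake_model(messages, tools)
        if "final" in reply:
            return reply["final"]  # plain text ends the loop
        name, args = reply["tool_call"]
        result = tools[name](**args)  # execute the requested tool
        messages.append({"role": "tool", "content": result})

tools = {"get_time": lambda: "12:00"}
print(run_loop([{"role": "user", "content": "What time is it?"}], tools))
# It is noon.
```

The real loop also handles handoffs, guardrails, and structured outputs, but the shape – model call, tool execution, repeat – is the same.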

Define Function Tools

Tools are where agents get useful. The SDK uses a @function_tool decorator that automatically generates the JSON schema from your function’s type hints and docstring.

from agents import Agent, Runner, function_tool

@function_tool
def get_weather(city: str, unit: str = "fahrenheit") -> str:
    """Get the current weather for a city.

    Args:
        city: The city name, e.g. 'San Francisco'
        unit: Temperature unit, either 'fahrenheit' or 'celsius'
    """
    # Replace with a real API call
    return f"72 degrees {unit} and sunny in {city}"

@function_tool
def search_docs(query: str) -> str:
    """Search the internal documentation for relevant articles.

    Args:
        query: The search query string
    """
    return f"Found 3 results for '{query}': [doc1, doc2, doc3]"

agent = Agent(
    name="support_agent",
    instructions="You help users with weather and documentation questions. Use the tools available to you.",
    tools=[get_weather, search_docs],
)

result = Runner.run_sync(agent, "What's the weather in Portland?")
print(result.final_output)

The type hints matter. The SDK uses them to build the parameters schema that gets sent to the model. If you skip type hints, the model won’t know what arguments your tool expects and will hallucinate them.
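To see why, here's a toy illustration of how a parameters schema can be derived from a signature. This is not the SDK's actual implementation – build_schema is a made-up helper that only handles a few primitive types – but it shows what the type hints feed into:

```python
# Toy schema builder (illustrative only): map type hints to JSON schema types,
# and treat parameters without defaults as required.
import inspect

PY_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}

def build_schema(func):
    sig = inspect.signature(func)
    props, required = {}, []
    for name, param in sig.parameters.items():
        props[name] = {"type": PY_TO_JSON.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {"type": "object", "properties": props, "required": required}

def get_weather(city: str, unit: str = "fahrenheit") -> str:
    """Get the current weather for a city."""
    return f"72 degrees {unit} and sunny in {city}"

schema = build_schema(get_weather)
print(schema)
# {'type': 'object', 'properties': {'city': {'type': 'string'},
#  'unit': {'type': 'string'}}, 'required': ['city']}
```

Strip the annotations and every parameter falls through to a guess – which is exactly the situation where the model starts hallucinating arguments.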

Tools with Pydantic Models

For complex inputs, use Pydantic models directly:

from pydantic import BaseModel, Field
from agents import Agent, Runner, function_tool

class FlightSearch(BaseModel):
    origin: str = Field(description="Departure airport code (IATA)")
    destination: str = Field(description="Arrival airport code (IATA)")
    date: str = Field(description="Travel date in YYYY-MM-DD format")

@function_tool
def search_flights(params: FlightSearch) -> str:
    """Search for available flights between two airports."""
    return f"Found 5 flights from {params.origin} to {params.destination} on {params.date}"

This gives you automatic validation. If the model produces a bad date format or missing field, Pydantic catches it before your function runs.
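For example, constructing the model with missing fields raises a ValidationError before your tool body ever runs. A quick standalone demonstration using the same FlightSearch model:

```python
from pydantic import BaseModel, Field, ValidationError

class FlightSearch(BaseModel):
    origin: str = Field(description="Departure airport code (IATA)")
    destination: str = Field(description="Arrival airport code (IATA)")
    date: str = Field(description="Travel date in YYYY-MM-DD format")

try:
    # Simulate the model "forgetting" two required fields
    FlightSearch(origin="PDX")
except ValidationError as e:
    missing = [err["loc"][0] for err in e.errors()]
    print(missing)  # ['destination', 'date']
```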

Agent Handoffs

Handoffs are the SDK’s best feature. One agent can transfer control to another agent mid-conversation. The receiving agent picks up the full message history and continues.

from agents import Agent, Runner

billing_agent = Agent(
    name="billing_agent",
    instructions=(
        "You handle billing questions: invoices, payments, refunds, "
        "and subscription changes. Be direct and helpful."
    ),
)

technical_agent = Agent(
    name="technical_agent",
    instructions=(
        "You handle technical support: API errors, integration issues, "
        "and debugging. Ask for error messages and code snippets."
    ),
)

triage_agent = Agent(
    name="triage_agent",
    instructions=(
        "You are the front-line support agent. Determine what the user needs "
        "and hand off to the appropriate specialist. "
        "For billing questions, hand off to billing_agent. "
        "For technical issues, hand off to technical_agent. "
        "If unclear, ask one clarifying question before handing off."
    ),
    handoffs=[billing_agent, technical_agent],
)

result = Runner.run_sync(triage_agent, "I got a 429 error when calling the embeddings endpoint")
print(result.final_output)
# technical_agent handles this -- explains rate limiting and retry strategies

The triage agent reads the user’s message, decides it’s a technical issue, and hands off to technical_agent. The user never sees the handoff – they just get an answer from the right specialist. The result object tracks which agent produced the final output.

This pattern scales to any routing problem: language-based routing, tier-based support, domain-specific experts. You can chain handoffs too – technical_agent could hand off to an escalation_agent if it can’t solve the problem.
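Stripped of the LLM, the routing idea reduces to a dispatch like this toy sketch – the keyword check stands in for the model's judgment, and every name here is illustrative:

```python
# Toy sketch of the triage pattern: a router picks a specialist, and the
# specialist sees the same conversation history. Purely illustrative.
def triage(message: str) -> str:
    billing_keywords = ("invoice", "refund", "payment", "subscription")
    if any(word in message.lower() for word in billing_keywords):
        return "billing_agent"
    return "technical_agent"

specialists = {
    "billing_agent": lambda history: f"[billing] replying to: {history[-1]}",
    "technical_agent": lambda history: f"[technical] replying to: {history[-1]}",
}

history = ["I got a 429 error when calling the embeddings endpoint"]
target = triage(history[-1])
print(target)  # technical_agent
print(specialists[target](history))
```

What the SDK adds on top is the part you can't hand-roll easily: the routing decision is made by the model itself, and the handoff carries the full message history.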

Guardrails

Guardrails run validation on inputs before the agent processes them and on outputs after it finishes. They’re separate from the agent’s instructions – think of them as programmable safety checks.

Input Guardrails

Input guardrails validate user messages before the agent processes them:

from agents import Agent, Runner, InputGuardrail, GuardrailFunctionOutput
import re

async def check_for_pii(ctx, agent, input_data) -> GuardrailFunctionOutput:
    """Block messages that contain email addresses or phone numbers."""
    text = input_data if isinstance(input_data, str) else str(input_data)
    has_email = bool(re.search(r'[\w.-]+@[\w.-]+\.\w+', text))
    has_phone = bool(re.search(r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b', text))

    return GuardrailFunctionOutput(
        output_info={"has_pii": has_email or has_phone},
        tripwire_triggered=has_email or has_phone,
    )

agent = Agent(
    name="safe_agent",
    instructions="You are a helpful assistant.",
    input_guardrails=[
        InputGuardrail(guardrail_function=check_for_pii),
    ],
)

# This will raise an InputGuardrailTripwireTriggered exception
from agents.exceptions import InputGuardrailTripwireTriggered

try:
    result = Runner.run_sync(agent, "My email is user@example.com, help me reset my password")
except InputGuardrailTripwireTriggered as e:
    print(f"Blocked: {e}")

When tripwire_triggered is True, the SDK raises an InputGuardrailTripwireTriggered exception and the agent never processes the message. You can also use output guardrails with OutputGuardrail to validate agent responses before they reach the user.
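An output check has the same shape as the input guardrail above. Here's just the check function itself, standalone so it runs without the SDK – the internal-URL policy is a made-up example, and in practice you'd wrap a function like this in an OutputGuardrail and return a GuardrailFunctionOutput:

```python
import re

def leaks_internal_url(text: str) -> bool:
    """Hypothetical output check: flag responses mentioning *.internal hosts."""
    return bool(re.search(r'https?://[\w.-]*\.internal\b', text))

print(leaks_internal_url("See http://wiki.internal/runbook"))  # True
print(leaks_internal_url("See https://docs.example.com"))      # False
```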

Tracing

Every agent run is automatically traced. The SDK captures each LLM call, tool execution, handoff, and guardrail check in a structured trace you can inspect in the OpenAI dashboard.

Traces are enabled by default. You’ll see them in your OpenAI dashboard under the Traces tab. For custom trace names:

from agents import Agent, Runner, trace

agent = Agent(name="my_agent", instructions="You are helpful.")

with trace("customer-support-flow"):
    result = Runner.run_sync(agent, "Help me with my order")
    print(result.final_output)

The trace() context manager groups related runs under a single trace. This is useful when you have multiple agent runs in a single request – a triage agent handing off to a specialist, for example. Both runs appear under one trace.

To disable tracing (for local development or testing):

from agents import set_tracing_disabled

set_tracing_disabled(True)

Common Errors and Fixes

openai.AuthenticationError: Incorrect API key

openai.AuthenticationError: Error code: 401 - {'error': {'message': 'Incorrect API key provided: sk-your...here.', 'type': 'invalid_request_error'}}

Your OPENAI_API_KEY is missing or wrong. The SDK uses the openai client under the hood, so it reads from the same environment variable. Double-check with echo $OPENAI_API_KEY in the same shell where you run your script.

ModuleNotFoundError: No module named 'agents'

ModuleNotFoundError: No module named 'agents'

You installed the wrong package. The correct install is pip install openai-agents, which gives you the agents module. A common mistake is running pip install agents – that’s a completely different package.

InputGuardrailTripwireTriggered

agents.exceptions.InputGuardrailTripwireTriggered: Guardrail InputGuardrail triggered tripwire

An input guardrail blocked the message. This is working as intended – catch the exception and return a user-friendly error message instead of letting it bubble up.

ModelBehaviorError on handoffs

If your agent tries to hand off to a target that isn’t in its handoffs list, you’ll get a ModelBehaviorError. Make sure every agent that should be a handoff target is included in the handoffs parameter of the calling agent. Also check that your instructions mention the handoff targets by name – the model needs to know they exist.

Tool schema validation errors

pydantic.ValidationError: 1 validation error for get_weather
city
  Field required [type=missing, input_value={}, input_type=dict]

The model called your tool without required arguments. This usually means your tool’s docstring is unclear. Be explicit about what each parameter does and when the tool should be used.
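A quick contrast, using a hypothetical tool, of the kind of docstring that causes this versus one that doesn't:

```python
# Vague docstring: the model has almost nothing to go on.
def lookup(q: str) -> str:
    """Look stuff up."""
    return q

# Explicit docstring: states the purpose, the parameter format,
# and when the tool should be used.
def lookup_order(order_id: str) -> str:
    """Look up an order's shipping status by its order ID.

    Use this whenever the user asks where their package is.

    Args:
        order_id: The alphanumeric order ID, e.g. 'ORD-12345'
    """
    return f"Order {order_id}: shipped"
```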

How It Compares to LangGraph and CrewAI

vs. LangGraph: LangGraph gives you a graph-based execution model with conditional edges, checkpointing, and human-in-the-loop patterns. It’s more flexible but more complex. The Agents SDK is simpler – you don’t define a graph, you define agents and let the framework handle routing via handoffs. Pick LangGraph if you need fine-grained control over execution flow. Pick the Agents SDK if you want to get a multi-agent system running fast with less boilerplate.

vs. CrewAI: CrewAI focuses on “crews” of agents with roles and goals, using a sequential or hierarchical process. It’s higher-level and more opinionated. The Agents SDK is lower-level – you control the tool definitions, handoff logic, and guardrails yourself. CrewAI is better for rapid prototyping of role-based workflows. The Agents SDK is better when you need precise control over what each agent can and cannot do.

The Agents SDK’s main advantage is that it’s made by OpenAI. Tracing integrates directly with the OpenAI dashboard, the tool schema matches OpenAI’s function calling format natively, and you don’t need adapter layers between the SDK and the API.