Why Parallel Tool Calling Matters

Most LLM tool calling tutorials show you the basics: define a tool, call it, return the result. That works fine when your model needs one piece of data. But real-world agents rarely need just one thing.

Say your user asks “What’s the weather in Tokyo, London, and New York?” A naive implementation makes three sequential API calls to your weather service. Each takes 200-500ms, so your user waits up to 1.5 seconds for no good reason.

Parallel tool calling lets the model request multiple tool executions in a single response. You get back an array of tool calls, run them all concurrently, and send every result back in one message. The user gets their answer in ~500ms instead of 1500ms.

OpenAI’s GPT-4o and Claude 3.5+ both support this natively. The model decides when to request multiple tools at once – you just need to handle the response correctly.

Defining Tools for the OpenAI API

First, set up your tool definitions. These go in the tools parameter of your chat completion request. Each tool definition names the function, describes its purpose, and specifies its parameters as a JSON Schema.

from openai import OpenAI

client = OpenAI()

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "City name, e.g. 'Tokyo'"
                    },
                    "units": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit"
                    }
                },
                "required": ["city"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "get_population",
            "description": "Get the population of a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "City name"
                    }
                },
                "required": ["city"]
            }
        }
    }
]

When you send a message like “Compare the weather and population of Tokyo and Paris,” the model can emit up to four tool calls in a single response – get_weather and get_population for each city.
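To make that concrete, here is a sketch of the shape those parallel tool calls take. The ids and argument strings are made up for illustration, and the real SDK returns objects with attribute access rather than plain dicts, but the structure mirrors what comes back in the response:

```python
import json

# Illustrative shape of a parallel tool-call response (ids and values
# are invented for the example; the SDK returns objects, not dicts).
tool_calls = [
    {"id": "call_1", "function": {"name": "get_weather",    "arguments": '{"city": "Tokyo"}'}},
    {"id": "call_2", "function": {"name": "get_weather",    "arguments": '{"city": "Paris"}'}},
    {"id": "call_3", "function": {"name": "get_population", "arguments": '{"city": "Tokyo"}'}},
    {"id": "call_4", "function": {"name": "get_population", "arguments": '{"city": "Paris"}'}},
]

for tc in tool_calls:
    # arguments arrive as a JSON string, not a dict
    args = json.loads(tc["function"]["arguments"])
    print(tc["id"], tc["function"]["name"], args["city"])
```

Note that the arguments field is a JSON string you must parse, a detail that trips people up later.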

Handling Parallel Tool Call Responses

Here’s the full flow: send the request, detect tool calls, execute them, and return results. The key detail is that each tool call has a unique id. You must reference that id when sending results back.

import json

def get_weather(city: str, units: str = "celsius") -> dict:
    """Simulated weather lookup. Replace with a real API call."""
    weather_data = {
        "Tokyo": {"temp": 18, "condition": "cloudy"},
        "Paris": {"temp": 12, "condition": "rainy"},
        "New York": {"temp": 8, "condition": "sunny"},
        "London": {"temp": 10, "condition": "foggy"},
    }
    data = dict(weather_data.get(city, {"temp": 20, "condition": "unknown"}))  # copy so the Fahrenheit conversion doesn't mutate the table
    if units == "fahrenheit":
        data["temp"] = round(data["temp"] * 9 / 5 + 32)
    data["units"] = units
    data["city"] = city
    return data

def get_population(city: str) -> dict:
    """Simulated population lookup."""
    populations = {
        "Tokyo": 13_960_000,
        "Paris": 2_161_000,
        "New York": 8_336_000,
        "London": 8_982_000,
    }
    return {"city": city, "population": populations.get(city, 0)}

# Map function names to actual callables
tool_functions = {
    "get_weather": get_weather,
    "get_population": get_population,
}

messages = [
    {"role": "user", "content": "Compare the weather and population of Tokyo and Paris."}
]

# Step 1: Send the request
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools,
)

assistant_message = response.choices[0].message

# Step 2: Check if the model wants to call tools
if assistant_message.tool_calls:
    # Append the assistant message (with tool_calls) to the conversation
    messages.append(assistant_message)

    # Step 3: Execute each tool call and collect results
    for tool_call in assistant_message.tool_calls:
        func_name = tool_call.function.name
        func_args = json.loads(tool_call.function.arguments)

        # Call the actual function
        result = tool_functions[func_name](**func_args)

        # Step 4: Append each result with the matching tool_call id
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": json.dumps(result),
        })

    # Step 5: Send everything back for the final answer
    final_response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=tools,
    )

    print(final_response.choices[0].message.content)

The model receives all tool results at once and synthesizes them into a single answer. This is already faster than sequential calling because the model batches its requests. But the tool executions themselves still run in a loop. Let’s fix that.
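If your tool functions are synchronous and you don't want to rewrite them as coroutines, a thread pool gets you the same concurrency. This is a minimal sketch, with stub tools (simulated 100ms calls) standing in for real ones, and dicts mirroring the SDK's tool-call attributes:

```python
import json
import time
from concurrent.futures import ThreadPoolExecutor

# Stub tools standing in for real API calls; each simulates ~100ms of latency.
def get_weather(city, units="celsius"):
    time.sleep(0.1)
    return {"city": city, "temp": 18}

def get_population(city):
    time.sleep(0.1)
    return {"city": city, "population": 0}

tool_functions = {"get_weather": get_weather, "get_population": get_population}

def run_tools_threaded(tool_calls):
    """Run sync tool functions concurrently; return OpenAI-style tool messages."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = [
            (tc["id"], pool.submit(
                tool_functions[tc["function"]["name"]],
                **json.loads(tc["function"]["arguments"]),
            ))
            for tc in tool_calls
        ]
        return [
            {"role": "tool", "tool_call_id": tc_id, "content": json.dumps(f.result())}
            for tc_id, f in futures
        ]

calls = [
    {"id": "call_1", "function": {"name": "get_weather", "arguments": '{"city": "Tokyo"}'}},
    {"id": "call_2", "function": {"name": "get_population", "arguments": '{"city": "Paris"}'}},
]
start = time.perf_counter()
results = run_tools_threaded(calls)
elapsed = time.perf_counter() - start  # ~0.1s for both calls, not ~0.2s
```

Threads work fine for I/O-bound tools, but if you're already in an async codebase, asyncio is the more natural fit.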

Running Tool Calls Concurrently with asyncio

When your tools make real network calls (HTTP requests to APIs, database queries), you want them running at the same time. Use asyncio.gather to execute all tool calls concurrently.

import asyncio
import json
import httpx
from openai import AsyncOpenAI

aclient = AsyncOpenAI()

async def get_weather_async(city: str, units: str = "celsius") -> dict:
    """Fetch weather from an API. Using httpx for async HTTP."""
    async with httpx.AsyncClient() as http:
        resp = await http.get(
            f"https://wttr.in/{city}",  # wttr.in takes the city in the URL path
            params={"format": "j1"},
            timeout=10.0,
        )
        data = resp.json()
        current = data["current_condition"][0]
        temp_key = "temp_C" if units == "celsius" else "temp_F"
        return {
            "city": city,
            "temp": int(current[temp_key]),
            "condition": current["weatherDesc"][0]["value"],
            "units": units,
        }

async def get_population_async(city: str) -> dict:
    """Simulated async population lookup."""
    await asyncio.sleep(0.1)  # Simulate network delay
    populations = {
        "Tokyo": 13_960_000,
        "Paris": 2_161_000,
        "New York": 8_336_000,
        "London": 8_982_000,
    }
    return {"city": city, "population": populations.get(city, 0)}

# Async function registry
async_tool_functions = {
    "get_weather": get_weather_async,
    "get_population": get_population_async,
}

async def run_parallel_tools(tool_calls):
    """Execute all tool calls concurrently and return results."""
    tasks = []
    for tc in tool_calls:
        func_name = tc.function.name
        func_args = json.loads(tc.function.arguments)
        coro = async_tool_functions[func_name](**func_args)
        tasks.append((tc.id, coro))

    # Run all coroutines concurrently
    results = await asyncio.gather(*(coro for _, coro in tasks))

    # Pair each result with its tool_call_id
    return [
        {
            "role": "tool",
            "tool_call_id": tc_id,
            "content": json.dumps(result),
        }
        for (tc_id, _), result in zip(tasks, results)
    ]

async def chat_with_parallel_tools(user_message: str) -> str:
    messages = [{"role": "user", "content": user_message}]

    response = await aclient.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=tools,  # same tool definitions from above
    )

    assistant_message = response.choices[0].message

    if not assistant_message.tool_calls:
        return assistant_message.content

    messages.append(assistant_message)

    # Execute all tools concurrently
    tool_results = await run_parallel_tools(assistant_message.tool_calls)
    messages.extend(tool_results)

    # Get final response
    final = await aclient.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=tools,
    )
    return final.choices[0].message.content

# Run it
answer = asyncio.run(
    chat_with_parallel_tools("What's the weather in Tokyo, London, and New York?")
)
print(answer)

With three cities, this runs all three weather lookups simultaneously. If each takes 300ms, total wall time is ~300ms instead of ~900ms.
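You can verify the wall-time claim without any API at all. This self-contained sketch uses asyncio.sleep as a stand-in for a ~300ms network call and times sequential awaits against asyncio.gather:

```python
import asyncio
import time

async def fake_lookup(city: str) -> str:
    await asyncio.sleep(0.3)  # stand-in for a ~300ms API call
    return f"weather for {city}"

async def main():
    cities = ["Tokyo", "London", "New York"]

    # Sequential: each await finishes before the next starts
    start = time.perf_counter()
    for c in cities:
        await fake_lookup(c)
    seq_time = time.perf_counter() - start

    # Concurrent: all three sleeps overlap
    start = time.perf_counter()
    await asyncio.gather(*(fake_lookup(c) for c in cities))
    par_time = time.perf_counter() - start

    return seq_time, par_time

seq_time, par_time = asyncio.run(main())
print(f"sequential: {seq_time:.2f}s, parallel: {par_time:.2f}s")
```

The parallel run's wall time is roughly the slowest single call, while the sequential run is the sum of all of them.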

Controlling Parallel Behavior with tool_choice

You can influence whether the model calls tools at all, and OpenAI gives you a parallel_tool_calls parameter to explicitly control batching.

# Force the model to call a specific tool
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools,
    tool_choice={"type": "function", "function": {"name": "get_weather"}},
    parallel_tool_calls=True,  # default is True for supported models
)

# Disable parallel tool calls (force sequential)
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools,
    parallel_tool_calls=False,
)

Set parallel_tool_calls=False when tool ordering matters – for example, when the second tool depends on the first tool’s output. The model will then emit one tool call at a time.

Multi-Round Tool Calling Loops

Sometimes the model needs multiple rounds of tool calls. The first round gathers data, and based on those results, it requests more tools. Here’s a pattern that handles arbitrary rounds.

async def multi_round_tool_loop(user_message: str, max_rounds: int = 5) -> str:
    messages = [{"role": "user", "content": user_message}]

    for _ in range(max_rounds):
        response = await aclient.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=tools,
        )

        assistant_message = response.choices[0].message
        messages.append(assistant_message)

        # If no tool calls, we're done
        if not assistant_message.tool_calls:
            return assistant_message.content

        # Execute this round's tool calls in parallel
        tool_results = await run_parallel_tools(assistant_message.tool_calls)
        messages.extend(tool_results)

    # If we hit max_rounds, get a final answer without tools
    response = await aclient.chat.completions.create(
        model="gpt-4o",
        messages=messages,
    )
    return response.choices[0].message.content

Cap max_rounds to prevent runaway loops. Five rounds is generous for most use cases.

Parallel Tool Calling with Anthropic’s Claude

Claude supports parallel tool calls through the same pattern. The API shape is different, but the concept is identical. Claude returns a list of tool_use content blocks when it wants multiple tools.

import anthropic
import json

ac = anthropic.Anthropic()

claude_tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
                "units": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                }
            },
            "required": ["city"]
        }
    },
    {
        "name": "get_population",
        "description": "Get the population of a city",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"}
            },
            "required": ["city"]
        }
    }
]

response = ac.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=claude_tools,
    messages=[
        {"role": "user", "content": "Weather and population for Tokyo and Paris?"}
    ],
)

# Collect all tool_use blocks from the response
tool_use_blocks = [
    block for block in response.content if block.type == "tool_use"
]

# Execute each tool and build result blocks
tool_results = []
for block in tool_use_blocks:
    result = tool_functions[block.name](**block.input)
    tool_results.append({
        "type": "tool_result",
        "tool_use_id": block.id,
        "content": json.dumps(result),
    })

# Send results back
final_response = ac.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=claude_tools,
    messages=[
        {"role": "user", "content": "Weather and population for Tokyo and Paris?"},
        {"role": "assistant", "content": response.content},
        {"role": "user", "content": tool_results},
    ],
)

# Extract text from the response
text_blocks = [b.text for b in final_response.content if b.type == "text"]
print("\n".join(text_blocks))

Notice the structural difference: Claude nests tool results inside a user message with tool_result type blocks, while OpenAI uses a top-level tool role. Both APIs return parallel calls as arrays you can execute concurrently.
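Since the two schemas carry the same information, a small helper can translate one into the other if you support both providers. The helper name is ours, not part of either SDK; it's a sketch assuming the OpenAI function-tool shape shown earlier:

```python
def openai_tool_to_claude(tool: dict) -> dict:
    """Translate an OpenAI-style function tool into Claude's flat shape.
    (Hypothetical helper; neither SDK provides this.)"""
    fn = tool["function"]
    return {
        "name": fn["name"],
        "description": fn["description"],
        "input_schema": fn["parameters"],  # Claude calls the JSON Schema "input_schema"
    }

openai_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

claude_tool = openai_tool_to_claude(openai_tool)
```

This keeps a single source of truth for your tool definitions instead of maintaining two hand-written copies.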

Common Errors and Fixes

tool_call_id mismatch: Every tool result must reference the exact id from the corresponding tool_call. If you mix these up, the API returns a 400 error. Double-check your ID mapping.
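A cheap way to catch mismatches before the follow-up request is a sanity check over the ids. This is a hypothetical helper (not part of the OpenAI SDK), sketched over dict-shaped tool calls:

```python
def check_tool_results(tool_calls, tool_messages):
    """Verify every tool call id has a matching tool result, and vice versa.
    (Hypothetical sanity check, not part of the OpenAI SDK.)"""
    call_ids = {tc["id"] for tc in tool_calls}
    result_ids = {m["tool_call_id"] for m in tool_messages if m.get("role") == "tool"}
    missing = call_ids - result_ids
    extra = result_ids - call_ids
    if missing or extra:
        raise ValueError(f"tool id mismatch: missing={missing}, extra={extra}")

calls = [{"id": "call_a"}, {"id": "call_b"}]
good = [
    {"role": "tool", "tool_call_id": "call_a", "content": "{}"},
    {"role": "tool", "tool_call_id": "call_b", "content": "{}"},
]
check_tool_results(calls, good)  # passes silently
```

Running this before the second API call turns a cryptic 400 into an immediate, descriptive error.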

Forgetting to append the assistant message: Before sending tool results, you must include the assistant’s message (the one containing tool_calls) in the conversation. Skipping it breaks the message flow.

# Wrong -- missing assistant message
messages.append({"role": "tool", "tool_call_id": "call_abc", "content": "..."})

# Correct
messages.append(assistant_message)  # includes tool_calls
messages.append({"role": "tool", "tool_call_id": "call_abc", "content": "..."})

JSON parsing failures: tool_call.function.arguments is a JSON string, not a dict. Always parse it with json.loads(). The model occasionally produces malformed JSON – wrap it in a try/except.

try:
    func_args = json.loads(tool_call.function.arguments)
except json.JSONDecodeError:
    func_args = {}  # fallback or re-prompt the model

asyncio.gather aborting on the first exception: By default, asyncio.gather propagates the first exception raised; the remaining tasks keep running, but their results are lost. Use return_exceptions=True if you want partial results.

results = await asyncio.gather(*coroutines, return_exceptions=True)
for r in results:
    if isinstance(r, Exception):
        # Handle the failed tool call gracefully
        pass

Rate limits on concurrent calls: Firing 10 parallel HTTP requests to the same API can trigger rate limits. Add a semaphore to cap concurrency.

sem = asyncio.Semaphore(5)

async def rate_limited_call(func, **kwargs):
    async with sem:
        return await func(**kwargs)

Claude tool results in wrong message format: Claude expects tool results inside the user role as a list of tool_result objects. Sending them as top-level tool role messages (like OpenAI) returns a validation error.
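Side by side, the wrong and correct shapes look like this (the tool_use_id is illustrative; it must match the id of the tool_use block Claude emitted):

```python
# Wrong -- OpenAI-style top-level "tool" role; Claude rejects this:
wrong = {"role": "tool", "tool_call_id": "toolu_01", "content": "{}"}

# Correct -- tool_result blocks nested inside a user message:
correct = {
    "role": "user",
    "content": [
        {
            "type": "tool_result",
            "tool_use_id": "toolu_01",  # must match the tool_use block's id
            "content": '{"temp": 18}',
        }
    ],
}
```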