Most prompt chains pass raw text between steps. That works until a tool returns data your next step can’t parse. The fix: combine tool calling with structured outputs so every step in your chain both executes real functions and returns validated, typed data.

Here’s the pattern. Step 1 calls a tool, gets results. Step 2 takes those results, parsed into a Pydantic model, and feeds them into the next LLM call with its own structured output schema. Each handoff is typed JSON – no string munging, no regex, no guessing.

import json
from openai import OpenAI
from pydantic import BaseModel, Field

client = OpenAI()

# Define the tool
tools = [
    {
        "type": "function",
        "name": "search_products",
        "description": "Search a product catalog by query",
        "parameters": {
            "type": "object",
            "required": ["query", "max_results"],
            "properties": {
                "query": {"type": "string", "description": "Search query"},
                "max_results": {"type": "integer", "description": "Number of results to return"}
            },
            "additionalProperties": False,
        },
        "strict": True,
    }
]

# Structured output schema for the final analysis
class ProductRecommendation(BaseModel):
    product_name: str = Field(description="Name of the recommended product")
    reason: str = Field(description="Why this product fits the user's needs")
    price_range: str = Field(description="Estimated price range like '$50-$100'")

class RecommendationReport(BaseModel):
    top_picks: list[ProductRecommendation] = Field(description="Top 3 product recommendations")
    budget_summary: str = Field(description="One-line budget guidance")

That’s the foundation. Tools handle the external data fetching, Pydantic models enforce what comes back. Now let’s wire the chain together.

Step 1: Tool Call with the Responses API

The first step triggers a tool call. You send a user message, the model decides it needs product data, and you execute the function locally.

# Step 1: Let the model call the search tool
response = client.responses.create(
    model="gpt-4.1",
    input=[
        {"role": "system", "content": "You are a product recommendation assistant. Use the search tool to find relevant products before making recommendations."},
        {"role": "user", "content": "I need a good mechanical keyboard for programming under $150"},
    ],
    tools=tools,
    tool_choice="required",
)

# Extract the tool call
tool_call = next(
    item for item in response.output if item.type == "function_call"
)
args = json.loads(tool_call.arguments)
print(f"Tool: {tool_call.name}, Args: {args}")
# Tool: search_products, Args: {'query': 'mechanical keyboard programming', 'max_results': 5}

Setting tool_choice="required" forces the model to call a tool. If you use "auto", it might skip the tool call and hallucinate product data. For chains where tool data is mandatory, always force it.

Now execute the tool and send results back:

# Your actual function -- replace with real catalog search
def search_products(query: str, max_results: int) -> list[dict]:
    results = [
        {"name": "Keychron Q1 Pro", "price": 129, "switches": "Gateron Brown", "features": ["hot-swap", "wireless", "aluminum frame"]},
        {"name": "GMMK Pro", "price": 109, "switches": "Fox Linear", "features": ["hot-swap", "rotary knob", "gasket mount"]},
        {"name": "Leopold FC660M", "price": 119, "switches": "Cherry MX Brown", "features": ["compact", "PBT keycaps", "wired"]},
    ]
    return results[:max_results]

# Execute the tool
product_data = search_products(**args)

# Send tool results back to the model
followup_input = response.output + [
    {
        "type": "function_call_output",
        "call_id": tool_call.call_id,
        "output": json.dumps(product_data),
    }
]

step1_result = client.responses.create(
    model="gpt-4.1",
    input=[
        {"role": "system", "content": "You are a product recommendation assistant."},
        {"role": "user", "content": "I need a good mechanical keyboard for programming under $150"},
    ] + followup_input,
    tools=tools,
)
print(step1_result.output_text)

At this point the model has real product data. But the output is unstructured text. That’s where the second step comes in.

Step 2: Parse Tool Results into Structured Output

Take the tool results and the model’s text response, then feed them into a structured output call. This is the key handoff – raw tool data goes in, validated Pydantic objects come out.

# Step 2: Parse the tool results into a structured recommendation
chain_input = json.dumps({
    "user_request": "mechanical keyboard for programming under $150",
    "search_results": product_data,
    "assistant_analysis": step1_result.output_text,
})

step2_response = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {
            "role": "system",
            "content": (
                "You are a product analyst. Given search results and an initial analysis, "
                "produce a structured recommendation report. Pick the top 3 products and "
                "explain why each fits the user's needs."
            ),
        },
        {"role": "user", "content": chain_input},
    ],
    response_format=RecommendationReport,
)

report = step2_response.choices[0].message.parsed
print(f"Budget: {report.budget_summary}")
for pick in report.top_picks:
    print(f"  {pick.product_name}: {pick.reason} ({pick.price_range})")

The client.beta.chat.completions.parse() method automatically validates the response against your Pydantic model. With a strict schema the API constrains generation to match it, and if parsing still fails the SDK raises an exception instead of handing back malformed data. You always get a RecommendationReport instance or an error – never half-parsed garbage.
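If your SDK version lacks the parse helper, you can enforce the same contract yourself with Pydantic's model_validate_json. A minimal sketch (the raw JSON string here is illustrative, standing in for the API response text):

```python
from pydantic import BaseModel, ValidationError

class ProductRecommendation(BaseModel):
    product_name: str
    reason: str
    price_range: str

# Illustrative raw model output -- in practice this comes from the API response
raw = '{"product_name": "Keychron Q1 Pro", "reason": "Hot-swap and wireless under budget", "price_range": "$100-$150"}'

try:
    rec = ProductRecommendation.model_validate_json(raw)
except ValidationError:
    # Re-prompt or fall back -- never pass unvalidated text downstream
    raise

print(rec.product_name)
```

Either way, the invariant holds: downstream code only ever sees a validated model instance.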

Chaining Multiple Tools with Structured Handoffs

Real pipelines often need multiple tool calls. Here’s a two-tool chain: first search for products, then fetch reviews, then produce a structured verdict.

# Additional tool for fetching reviews
review_tool = {
    "type": "function",
    "name": "get_reviews",
    "description": "Get customer reviews for a specific product",
    "parameters": {
        "type": "object",
        "required": ["product_name"],
        "properties": {
            "product_name": {"type": "string", "description": "Exact product name"},
        },
        "additionalProperties": False,
    },
    "strict": True,
}

all_tools = tools + [review_tool]

# Intermediate structured model for the review step
class ReviewSummary(BaseModel):
    product_name: str
    avg_rating: float = Field(description="Average rating out of 5")
    pros: list[str] = Field(description="Top 3 pros from reviews")
    cons: list[str] = Field(description="Top 3 cons from reviews")

class ReviewReport(BaseModel):
    reviews: list[ReviewSummary]


def get_reviews(product_name: str) -> list[dict]:
    """Simulated review lookup."""
    return [
        {"rating": 4.5, "text": f"Great build quality on the {product_name}"},
        {"rating": 4.0, "text": "Typing feel is excellent, software could be better"},
        {"rating": 5.0, "text": "Best keyboard I've owned for coding"},
    ]


def run_full_chain(user_query: str) -> ReviewReport:
    # Step 1: Search products via tool call
    resp1 = client.responses.create(
        model="gpt-4.1",
        input=[
            {"role": "system", "content": "Search for products matching the user's request."},
            {"role": "user", "content": user_query},
        ],
        tools=all_tools,
        tool_choice={"type": "function", "name": "search_products"},
    )

    tool_call = next(item for item in resp1.output if item.type == "function_call")
    products = search_products(**json.loads(tool_call.arguments))

    # Step 2: Fetch reviews for the top products (called directly here;
    # the get_reviews tool is also available if you want the model to drive it)
    review_data = {}
    for product in products[:3]:
        reviews = get_reviews(product["name"])
        review_data[product["name"]] = reviews

    # Step 3: Structured output -- combine everything into a typed report
    combined = json.dumps({
        "products": products[:3],
        "reviews": review_data,
    })

    step3 = client.beta.chat.completions.parse(
        model="gpt-4o-2024-08-06",
        messages=[
            {
                "role": "system",
                "content": "Analyze the product data and reviews. Summarize each product's review sentiment.",
            },
            {"role": "user", "content": combined},
        ],
        response_format=ReviewReport,
    )
    return step3.choices[0].message.parsed


report = run_full_chain("wireless mechanical keyboard for coding")
for review in report.reviews:
    print(f"{review.product_name}: {review.avg_rating}/5")
    print(f"  Pros: {', '.join(review.pros)}")
    print(f"  Cons: {', '.join(review.cons)}")

Notice the pattern: tool calls handle data fetching, structured outputs handle data shaping. Each step is independently testable. You can swap out the real search_products for a mock during testing and the chain still works because the contract is the Pydantic model, not the tool output format.
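A sketch of that testability claim: because the handoff contract is the Pydantic model, a fixture that satisfies the model proves the step boundary without any API call. The mock function and fixture values below are hypothetical:

```python
from pydantic import BaseModel, Field

class ReviewSummary(BaseModel):
    product_name: str
    avg_rating: float = Field(description="Average rating out of 5")
    pros: list[str]
    cons: list[str]

def mock_search_products(query: str, max_results: int) -> list[dict]:
    # Deterministic stand-in for the real catalog search
    return [
        {"name": "Test Board", "price": 99, "switches": "Test Red", "features": ["hot-swap"]},
    ][:max_results]

# Validate a fixture against the same model the chain uses -- if this
# passes, any step consuming ReviewSummary accepts the mocked data too
fixture = {"product_name": "Test Board", "avg_rating": 4.2, "pros": ["solid"], "cons": ["pricey"]}
summary = ReviewSummary.model_validate(fixture)
print(summary.product_name)
```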

Handling Tool Call Failures in Chains

Tools fail. APIs time out. Your chain needs to handle this without crashing the whole pipeline.

from pydantic import BaseModel, Field

class ChainError(BaseModel):
    step: str = Field(description="Which step failed")
    error: str = Field(description="What went wrong")
    fallback_used: bool = Field(description="Whether a fallback was applied")

def safe_tool_call(func, **kwargs):
    """Wrap tool execution with error handling."""
    try:
        return {"success": True, "data": func(**kwargs)}
    except Exception as e:
        return {"success": False, "error": str(e)}

# Use in your chain
result = safe_tool_call(search_products, query="ergonomic keyboard", max_results=5)
if not result["success"]:
    # Feed the error into the structured output step
    # so the model can explain the failure clearly
    error_input = json.dumps({
        "user_query": "ergonomic keyboard",
        "tool_error": result["error"],
    })
    error_response = client.beta.chat.completions.parse(
        model="gpt-4o-2024-08-06",
        messages=[
            {"role": "system", "content": "The product search failed. Explain the error and suggest the user try again."},
            {"role": "user", "content": error_input},
        ],
        response_format=ChainError,
    )
    error_report = error_response.choices[0].message.parsed
    print(f"Step '{error_report.step}' failed: {error_report.error}")

Wrapping each tool call means a failure in step 2 doesn’t blow up step 3. You can route errors into the structured output path so even failure responses are typed and parseable by downstream code.

Common Errors and Fixes

BadRequestError: response_format must be a supported type – You passed a Pydantic model to the Responses API. Use client.beta.chat.completions.parse() for Pydantic-based structured outputs. The Responses API uses text.format with a JSON schema dict instead.
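A hedged sketch of the Responses API equivalent, assuming the text.format parameter shape in recent SDK versions (check your SDK's docs, as exact field names have shifted between releases). BudgetNote is a hypothetical model; the schema is derived from it rather than passed directly:

```python
from pydantic import BaseModel, Field

class BudgetNote(BaseModel):
    budget_summary: str = Field(description="One-line budget guidance")

# Strict mode requires additionalProperties: false on every object
schema = BudgetNote.model_json_schema()
schema["additionalProperties"] = False

text_format = {
    "format": {
        "type": "json_schema",
        "name": "budget_note",
        "schema": schema,
        "strict": True,
    }
}

# response = client.responses.create(model="gpt-4.1", input=[...], text=text_format)
# note = BudgetNote.model_validate_json(response.output_text)
```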

call_id mismatch when returning tool results – Every function_call_output must include the exact call_id from the tool call. Copy it directly from tool_call.call_id. Don’t generate your own.

Tool call returns but model ignores it – You forgot to include the tool call output in the followup input. The followup must contain both the original response.output (which has the function call) and your function_call_output item.

Structured output has None for .parsed – Check response.choices[0].message.refusal. If the model refused the request (content policy, etc.), .parsed will be None and .refusal will explain why.
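A small guard for that case (the helper name is mine, not from the SDK):

```python
def extract_parsed(completion):
    """Return the parsed Pydantic object, or raise with the refusal text."""
    message = completion.choices[0].message
    if message.refusal:
        # Model declined (content policy, etc.); .parsed is None here
        raise RuntimeError(f"Model refused: {message.refusal}")
    return message.parsed

# report = extract_parsed(step2_response)
```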

ValidationError from Pydantic – Your model has required fields without defaults. Make sure every field in your BaseModel either has a default value or is something the model will always generate. Use Field(description=...) to guide the model.

Chain is slow due to sequential calls – If your tools are independent, run them concurrently with asyncio.gather or Python threads. The structured output step at the end still runs sequentially since it depends on all tool results.
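A sketch of that concurrency pattern with asyncio.to_thread, using a stand-in for the blocking get_reviews call from earlier:

```python
import asyncio

def get_reviews(product_name: str) -> list[dict]:
    # Stand-in for the blocking review lookup shown earlier
    return [{"rating": 4.5, "text": f"Review of {product_name}"}]

async def fetch_all_reviews(names: list[str]) -> dict[str, list[dict]]:
    # to_thread pushes each blocking call onto a worker thread so
    # independent lookups overlap instead of running back-to-back
    results = await asyncio.gather(
        *(asyncio.to_thread(get_reviews, n) for n in names)
    )
    return dict(zip(names, results))

review_data = asyncio.run(fetch_all_reviews(["Keychron Q1 Pro", "GMMK Pro"]))
```

The gathered results preserve input order, so zipping them back to the product names is safe.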