Prompt chains break when you pass unstructured text between steps. One LLM call returns slightly different formatting, the next call misinterprets it, and your whole pipeline falls apart. Structured outputs fix this by guaranteeing each step returns validated JSON that matches an exact schema.

The approach is straightforward: define a Pydantic model for each step’s output, use OpenAI’s structured output API to enforce that schema, then feed the validated data directly into the next step. No regex parsing. No “please return JSON” prayers. Just typed, validated data flowing through your pipeline.

Define Your Step Models with Pydantic

Each step in your chain needs a clear contract. Pydantic models serve as that contract. Here’s a three-step pipeline that extracts entities from raw text, classifies them, then generates a structured summary.

from pydantic import BaseModel, Field

# Step 1: Entity extraction
class Entity(BaseModel):
    name: str = Field(description="The entity name as it appears in the text")
    entity_type: str = Field(description="One of: person, organization, location, product, event")
    context: str = Field(description="The sentence where this entity appears")

class ExtractedEntities(BaseModel):
    entities: list[Entity] = Field(description="All named entities found in the text")

# Step 2: Classification
class ClassifiedEntity(BaseModel):
    name: str
    entity_type: str
    relevance: str = Field(description="One of: primary, secondary, mentioned")
    sentiment: str = Field(description="One of: positive, negative, neutral")

class ClassificationResult(BaseModel):
    classified_entities: list[ClassifiedEntity]
    dominant_topic: str = Field(description="The main topic of the text in 3-5 words")

# Step 3: Summary generation
class StructuredSummary(BaseModel):
    one_line_summary: str = Field(description="A single sentence summary")
    key_findings: list[str] = Field(description="3-5 bullet point findings")
    primary_entities: list[str] = Field(description="Names of primary-relevance entities only")
    sentiment_overview: str = Field(description="Overall sentiment: positive, negative, or mixed")

These models aren’t just documentation. OpenAI will reject any response that doesn’t conform to the schema. That’s the entire point.
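The "One of: ..." field descriptions are suggestions the model usually follows, but nothing enforces them. As a sketch, you can tighten those fields with `typing.Literal`, which Pydantic encodes as a JSON Schema enum, so both the API and the parser reject out-of-vocabulary values:

```python
from typing import Literal
from pydantic import BaseModel

class ClassifiedEntityStrict(BaseModel):
    name: str
    # Literal fields become JSON Schema enums, so the allowed values
    # are enforced by the schema instead of relied on via descriptions.
    entity_type: Literal["person", "organization", "location", "product", "event"]
    relevance: Literal["primary", "secondary", "mentioned"]
    sentiment: Literal["positive", "negative", "neutral"]
```

Pydantic now raises a `ValidationError` for any value outside the enum, which turns silent drift into a loud failure.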

Chain the Calls Together

Now wire the steps into a pipeline. Each step takes the previous step’s output and builds on it.

import json
from openai import OpenAI

client = OpenAI()

def run_chain(text: str) -> StructuredSummary:
    # Step 1: Extract entities
    step1_response = client.beta.chat.completions.parse(
        model="gpt-4o-2024-08-06",
        messages=[
            {
                "role": "system",
                "content": "Extract all named entities from the provided text. Be thorough."
            },
            {"role": "user", "content": text}
        ],
        response_format=ExtractedEntities,
    )
    extracted = step1_response.choices[0].message.parsed

    # Step 2: Classify entities with original text for context
    step2_input = json.dumps({
        "original_text": text,
        "entities": [e.model_dump() for e in extracted.entities]
    })

    step2_response = client.beta.chat.completions.parse(
        model="gpt-4o-2024-08-06",
        messages=[
            {
                "role": "system",
                "content": (
                    "Classify each entity by relevance and sentiment. "
                    "Determine the dominant topic of the text."
                )
            },
            {"role": "user", "content": step2_input}
        ],
        response_format=ClassificationResult,
    )
    classified = step2_response.choices[0].message.parsed

    # Step 3: Generate structured summary from classified data
    step3_input = json.dumps({
        "original_text": text,
        "dominant_topic": classified.dominant_topic,
        "classified_entities": [e.model_dump() for e in classified.classified_entities]
    })

    step3_response = client.beta.chat.completions.parse(
        model="gpt-4o-2024-08-06",
        messages=[
            {
                "role": "system",
                "content": (
                    "Generate a structured summary based on the classified entities "
                    "and original text. Focus on primary entities."
                )
            },
            {"role": "user", "content": step3_input}
        ],
        response_format=StructuredSummary,
    )
    return step3_response.choices[0].message.parsed


# Run it
article = """
OpenAI announced GPT-5 at their San Francisco headquarters yesterday.
CEO Sam Altman described it as a major leap in reasoning capabilities.
Google DeepMind responded by accelerating their Gemini Ultra release timeline.
Meanwhile, Anthropic's Claude team in San Francisco has been quietly shipping
improvements to their constitutional AI approach, which researchers at
Stanford praised for its safety properties.
"""

result = run_chain(article)
print(f"Summary: {result.one_line_summary}")
print(f"Sentiment: {result.sentiment_overview}")
for finding in result.key_findings:
    print(f"  - {finding}")
print(f"Primary entities: {', '.join(result.primary_entities)}")

Notice how each step serializes the previous output to JSON before passing it along. This keeps the intermediate data unambiguous and makes it much harder for the model to invent fields that don’t exist.

Add Error Handling Between Steps

Production chains need retry logic. Structured outputs eliminate most parsing failures, but you still need to handle API errors, rate limits, and content filter refusals.

import time
from openai import APIError, RateLimitError

def call_with_retry(func, max_retries: int = 3, base_delay: float = 1.0):
    """Retry wrapper with exponential backoff for chain steps."""
    for attempt in range(max_retries):
        try:
            return func()
        except RateLimitError:
            delay = base_delay * (2 ** attempt)
            print(f"Rate limited. Retrying in {delay}s (attempt {attempt + 1}/{max_retries})")
            time.sleep(delay)
        except APIError as e:
            if attempt == max_retries - 1:
                raise
            print(f"API error: {e}. Retrying (attempt {attempt + 1}/{max_retries})")
            time.sleep(base_delay)
    raise RuntimeError("Max retries exceeded")


def run_chain_with_retries(text: str) -> StructuredSummary:
    extracted = call_with_retry(lambda: client.beta.chat.completions.parse(
        model="gpt-4o-2024-08-06",
        messages=[
            {"role": "system", "content": "Extract all named entities from the provided text."},
            {"role": "user", "content": text}
        ],
        response_format=ExtractedEntities,
    )).choices[0].message.parsed

    if not extracted.entities:
        raise ValueError("Step 1 returned no entities. Input text may be too short or ambiguous.")

    classified = call_with_retry(lambda: client.beta.chat.completions.parse(
        model="gpt-4o-2024-08-06",
        messages=[
            {
                "role": "system",
                "content": "Classify each entity by relevance and sentiment."
            },
            {
                "role": "user",
                "content": json.dumps({
                    "original_text": text,
                    "entities": [e.model_dump() for e in extracted.entities]
                })
            }
        ],
        response_format=ClassificationResult,
    )).choices[0].message.parsed

    summary = call_with_retry(lambda: client.beta.chat.completions.parse(
        model="gpt-4o-2024-08-06",
        messages=[
            {
                "role": "system",
                "content": "Generate a structured summary from the classified entities."
            },
            {
                "role": "user",
                "content": json.dumps({
                    "original_text": text,
                    "dominant_topic": classified.dominant_topic,
                    "classified_entities": [e.model_dump() for e in classified.classified_entities]
                })
            }
        ],
        response_format=StructuredSummary,
    )).choices[0].message.parsed

    return summary

The key detail: validate the output of each step before feeding it to the next. An empty entity list from step 1 will produce garbage in step 2. Catch it early.

Using the Raw JSON Schema Approach

If you don’t want the Pydantic integration, you can use the raw json_schema response format directly. This is useful when you’re building schemas dynamically or working in a language without Pydantic.

response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "Extract named entities from the text."},
        {"role": "user", "content": "OpenAI launched GPT-5 in San Francisco."}
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "entity_extraction",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "entities": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "name": {"type": "string"},
                                "entity_type": {"type": "string"}
                            },
                            "required": ["name", "entity_type"],
                            "additionalProperties": False
                        }
                    }
                },
                "required": ["entities"],
                "additionalProperties": False
            }
        }
    }
)

data = json.loads(response.choices[0].message.content)
print(data["entities"])

Set "strict": True once at the json_schema level and "additionalProperties": False on every object in the schema. Without these, the model can still return extra fields or skip required ones.
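If you already have Pydantic models, you don’t have to hand-write the raw schema. A sketch, assuming Pydantic v2: `model_json_schema()` generates it, and setting `extra="forbid"` on each model emits the `"additionalProperties": false` that strict mode needs:

```python
from pydantic import BaseModel, ConfigDict

class EntitySchema(BaseModel):
    model_config = ConfigDict(extra="forbid")  # emits "additionalProperties": false
    name: str
    entity_type: str

class EntityList(BaseModel):
    model_config = ConfigDict(extra="forbid")
    entities: list[EntitySchema]

schema = EntityList.model_json_schema()

# Drop the generated schema into the raw response_format shape.
response_format = {
    "type": "json_schema",
    "json_schema": {"name": "entity_extraction", "strict": True, "schema": schema},
}
```

Note that nested models land under `$defs` with `$ref` pointers, which structured outputs accepts; inspect the generated schema once before shipping it.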

When to Split vs. Combine Steps

Not every task needs three separate API calls. Here’s how I think about it:

Split into separate steps when:

  • Each step requires different system prompts or personas
  • You need to validate intermediate results before continuing
  • The combined prompt would exceed the model’s effective reasoning capacity
  • You want to cache or reuse intermediate results

Combine into a single call when:

  • The total output fits one schema naturally
  • Steps are tightly coupled with no meaningful intermediate validation
  • Latency matters more than reliability

Three API calls cost three times the latency. If you can get the same quality from one call with a well-designed schema, do that instead.
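For the combined case, the three schemas above can collapse into one model. A sketch, with `ClassifiedEntity` redefined here to mirror the step-2 model so the block stands alone:

```python
from pydantic import BaseModel

class ClassifiedEntity(BaseModel):  # same shape as the step-2 model
    name: str
    entity_type: str
    relevance: str
    sentiment: str

class OneShotAnalysis(BaseModel):
    """Single schema covering extraction, classification, and summary."""
    entities: list[ClassifiedEntity]
    dominant_topic: str
    one_line_summary: str
    key_findings: list[str]
    primary_entities: list[str]
    sentiment_overview: str

# One parse call then replaces the whole chain:
# response = client.beta.chat.completions.parse(
#     model="gpt-4o-2024-08-06",
#     messages=[{"role": "system", "content": "Extract, classify, and summarize."},
#               {"role": "user", "content": text}],
#     response_format=OneShotAnalysis,
# )
```

You trade the ability to validate between steps for one round trip; that is exactly the trade-off the lists above describe.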

Common Errors and Fixes

BadRequestError: Invalid schema – Your JSON schema violates the strict-mode rules rather than JSON syntax. The most common cause is a missing "additionalProperties": false on a nested object. Every object in a strict schema needs it, and every property must appear in its object’s required list.

message.parsed is None – The model’s response was filtered by the content policy. Check response.choices[0].message.refusal. If it’s not None, the model refused to respond. Rephrase your prompt or input.

ValidationError from Pydantic – Your Pydantic model uses field types that don’t map cleanly to the supported JSON Schema subset. Stick to str, int, float, bool, Literal/Enum, list, and nested BaseModel types. Strict mode requires every field to be present and rejects default values, so don’t give fields defaults; to make a field effectively optional, type it as a union with None so the model can return null for it.

Steps producing inconsistent entity names – Step 1 extracts “OpenAI” but step 2 refers to “Open AI”. Pass the exact entity list from step 1 into step 2’s prompt as structured data, not as part of a natural language paragraph. The JSON serialization approach shown above prevents this.

Rate limiting on multi-step chains – Three sequential calls per request add up fast. Use the retry wrapper above, and consider gpt-4o-mini for classification steps that don’t need the full model’s reasoning capacity. It’s cheaper and faster for simple categorization tasks.

Chain fails silently with empty results – Always validate between steps. Check that lists aren’t empty, required fields aren’t blank strings, and classifications match your expected enum values before passing data to the next step.
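Those checks can live in one guard function that runs between steps. A minimal sketch over plain dicts; the allowed-value sets mirror the field descriptions in the step-2 model:

```python
ALLOWED_RELEVANCE = {"primary", "secondary", "mentioned"}
ALLOWED_SENTIMENT = {"positive", "negative", "neutral"}

def validate_classification(entities: list[dict], dominant_topic: str) -> None:
    """Fail fast between step 2 and step 3 instead of passing garbage along."""
    if not entities:
        raise ValueError("Classification returned no entities")
    if not dominant_topic.strip():
        raise ValueError("dominant_topic is blank")
    for e in entities:
        if e["relevance"] not in ALLOWED_RELEVANCE:
            raise ValueError(f"Unexpected relevance: {e['relevance']!r}")
        if e["sentiment"] not in ALLOWED_SENTIMENT:
            raise ValueError(f"Unexpected sentiment: {e['sentiment']!r}")
```

Call it on `[e.model_dump() for e in classified.classified_entities]` before building the step-3 input; a ValueError here is far easier to debug than a nonsense summary downstream.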