GPT-5.2’s structured outputs solve the oldest headache in LLM application development: getting the model to return valid, schema-conforming JSON every single time. Not most of the time. Every time.

Unlike the older json_object mode, which only guaranteed syntactically valid JSON, structured outputs enforce your exact schema during token generation. The model cannot produce a response that violates your schema: OpenAI applies constrained decoding, masking out any token that would break schema conformance at each sampling step. Here is how to set it up, what breaks, and how to fix it.

The Responses API Approach

GPT-5.2 uses the Responses API (the successor to Chat Completions). Structured outputs live under the text.format parameter with type: "json_schema". You must set strict: True and include additionalProperties: false in your schema.

from openai import OpenAI
import json

client = OpenAI()

response = client.responses.create(
    model="gpt-5.2",
    input="Extract the product info: 'The Razer Viper V3 Pro weighs 54g and costs $159.99'",
    text={
        "format": {
            "type": "json_schema",
            "name": "product_info",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "product_name": {
                        "type": "string",
                        "description": "Full product name"
                    },
                    "weight_grams": {
                        "type": "number",
                        "description": "Weight in grams"
                    },
                    "price_usd": {
                        "type": "number",
                        "description": "Price in US dollars"
                    },
                    "category": {
                        "type": "string",
                        "enum": ["mouse", "keyboard", "headset", "monitor", "other"]
                    }
                },
                "required": ["product_name", "weight_grams", "price_usd", "category"],
                "additionalProperties": False
            }
        }
    }
)

data = json.loads(response.output_text)
print(data)
# {"product_name": "Razer Viper V3 Pro", "weight_grams": 54, "price_usd": 159.99, "category": "mouse"}

The key things that trip people up: additionalProperties must be False (not omitted), every field must appear in required, and the schema name must be a valid identifier (no spaces or special characters).
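
The naming rule can be checked up front. As a sketch, assuming the pattern matches the one OpenAI documents for function/tool names (letters, digits, underscores, and dashes, up to 64 characters) -- verify against the current API reference before relying on it:

```python
import re

# Assumed naming rule: letters, digits, underscores, dashes, max 64 chars.
# This mirrors the documented pattern for function/tool names.
def valid_schema_name(name: str) -> bool:
    return bool(re.fullmatch(r"[A-Za-z0-9_-]{1,64}", name))
```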

Using Pydantic for Cleaner Schemas

Writing raw JSON schemas gets tedious fast. The OpenAI Python SDK integrates directly with Pydantic, so you can define your schema as a Python class and let the SDK handle the conversion.

from openai import OpenAI
from pydantic import BaseModel
from typing import Literal, Optional

client = OpenAI()

class ReviewAnalysis(BaseModel):
    sentiment: Literal["positive", "negative", "neutral"]  # becomes an enum constraint
    confidence: float
    key_topics: list[str]
    recommended: bool
    summary: str
    # Required but nullable: strict mode demands every field appear in required
    issues_mentioned: Optional[list[str]]

response = client.responses.parse(
    model="gpt-5.2",
    input="Analyze this review: 'Battery life is incredible, easily lasts 2 days. "
          "Camera is decent but struggles in low light. Screen is gorgeous. "
          "Worth every penny at this price point.'",
    text_format=ReviewAnalysis,
)

review = response.output_parsed
print(f"Sentiment: {review.sentiment} ({review.confidence})")
print(f"Topics: {', '.join(review.key_topics)}")
print(f"Recommended: {review.recommended}")

The .parse() method returns a typed Python object, not a raw string. You get autocomplete, type checking, and no manual JSON parsing. If the model’s output gets truncated mid-stream (hitting the output token limit), output_parsed will be None and response.output[0].status will be "incomplete" – always check for this.

Common Errors and How to Fix Them

Invalid Schema: additionalProperties

BadRequestError: Invalid schema: 'additionalProperties' must be set to false
for all objects in the schema when 'strict' is true.

This is the most common mistake. Every object type in your schema, including nested ones, needs "additionalProperties": false. If you have a nested object inside a property, it needs the flag too.
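
A small pre-flight check can catch this before the API does. This is a sketch, not an official validator -- it walks the schema and verifies that every object (including nested ones and array items) carries the flag, though it does not handle anyOf or $defs:

```python
def strict_ready(schema: dict) -> bool:
    """Return True if every object in the schema sets additionalProperties to False."""
    if schema.get("type") == "object":
        if schema.get("additionalProperties") is not False:
            return False
        return all(strict_ready(p) for p in schema.get("properties", {}).values())
    if schema.get("type") == "array":
        return strict_ready(schema.get("items", {}))
    return True  # scalar types have nothing to check
```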

Schema Too Complex

BadRequestError: Invalid schema: schema is too complex for structured output.
Consider simplifying your schema.

OpenAI limits schema complexity. You cannot have more than 5 levels of nesting or more than 100 total properties across the schema. The fix is usually to flatten your schema or split the extraction into multiple calls.
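
Flattening usually means replacing nested objects with prefixed top-level fields (address.city becomes address_city). If you already have nested data and want the flat shape, a helper like this sketch does the conversion:

```python
def flatten(d: dict, parent: str = "") -> dict:
    """Collapse nested dicts into a single level with underscore-joined keys."""
    out = {}
    for key, value in d.items():
        name = f"{parent}_{key}" if parent else key
        if isinstance(value, dict):
            out.update(flatten(value, name))
        else:
            out[name] = value
    return out
```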

Refusal Responses

Sometimes the model refuses to fill your schema (for safety reasons). When this happens, output_parsed is None and you will find a refusal message in the response. Always handle this case:

response = client.responses.parse(
    model="gpt-5.2",
    input=user_input,
    text_format=MySchema,
)

if response.output_parsed is None:
    # Check if it was a refusal or truncation
    output_msg = response.output[0]
    if output_msg.status == "incomplete":
        print("Response was truncated -- increase max_output_tokens")
    else:
        print("Model refused: check the response for a refusal message")
else:
    result = response.output_parsed
    # proceed normally

JSON String Wrapping Bug

A known issue with the GPT-5 family: the model sometimes returns the JSON wrapped in an extra string layer, so you get "{\"key\": \"value\"}" instead of {"key": "value"}. This happens more frequently without strict: True. If you are using the raw json_schema format (not Pydantic .parse()), double-check whether you need to call json.loads() twice.
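
A defensive parser handles both shapes. This is a workaround sketch for the wrapping behavior described above, not an official fix:

```python
import json

def parse_model_json(text: str):
    """Decode model output, unwrapping the occasional extra string layer."""
    data = json.loads(text)
    if isinstance(data, str):
        # Hit the wrapping bug: the payload was JSON-encoded twice.
        data = json.loads(data)
    return data
```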

Logprobs Not Returned

If you need logprobs for confidence scoring, be aware that GPT-5.2 currently returns empty logprobs when structured outputs are enabled. This is a confirmed bug affecting both GPT-5.1 and GPT-5.2. GPT-4.1 does return logprobs with structured outputs, so if you need both features simultaneously, use gpt-4.1 for now.

Choosing the Right GPT-5.2 Variant

GPT-5.2 comes in three variants, all supporting structured outputs:

Variant    Model ID                 Best For                             Pricing (per 1M tokens)
Instant    gpt-5.2                  General extraction, fast responses   $1.75 in / $14.00 out
Thinking   gpt-5.2 with reasoning   Complex multi-step extraction        Same base + reasoning tokens
Pro        gpt-5.2-pro              Maximum accuracy, critical data      Higher tier

For most structured output tasks, the base gpt-5.2 model is the right choice. The thinking variant adds reasoning tokens before the structured response, which helps when the extraction requires inference (like calculating a value from context), but it costs more due to the extra reasoning tokens. Cached input tokens get a 90% discount, so if you are making repeated calls with the same system prompt, your effective input cost drops to $0.175 per million tokens.
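
The cached-input arithmetic is easy to sanity-check. A sketch using the $1.75/M input rate and the 90% cache discount quoted above:

```python
def effective_input_cost(total_tokens: int, cached_fraction: float,
                         rate_per_m: float = 1.75) -> float:
    """Blended input cost in dollars, with cached tokens billed at 10% of the rate."""
    cached = total_tokens * cached_fraction
    fresh = total_tokens - cached
    return (fresh * rate_per_m + cached * rate_per_m * 0.10) / 1_000_000
```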

Performance Tips

Keep schemas focused. Do not try to extract 30 fields in one call. Split large extraction tasks into 2-3 focused schemas. You will get better accuracy and faster responses.
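
When you split one extraction into several schemas, you need to recombine the results. A minimal sketch, assuming the schemas use disjoint field names:

```python
def merge_extractions(*parts: dict) -> dict:
    """Combine results from separate extraction calls, refusing silent overwrites."""
    merged: dict = {}
    for part in parts:
        overlap = merged.keys() & part.keys()
        if overlap:
            raise ValueError(f"Conflicting fields across schemas: {sorted(overlap)}")
        merged.update(part)
    return merged
```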

Use enums for categorical fields. Instead of letting the model generate free-text for fields like status or category, define an enum constraint. This eliminates an entire class of downstream parsing bugs.
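
Even with the enum enforced at generation time, a defensive fallback on your side is cheap. A sketch using the category values from the earlier product example:

```python
CATEGORY_ENUM = ["mouse", "keyboard", "headset", "monitor", "other"]

def normalize_category(value: str) -> str:
    # With strict mode the model can only emit enum members, but guarding
    # here also covers data arriving from other, unconstrained sources.
    return value if value in CATEGORY_ENUM else "other"
```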

Set reasoning.effort appropriately. For straightforward extraction tasks, "low" or "medium" reasoning effort reduces latency without hurting accuracy. Save "high" and "xhigh" for cases where the model needs to infer or calculate values.

response = client.responses.create(
    model="gpt-5.2",
    input="Extract the date and amount: 'Invoice #4521, dated March 3rd 2026, total $1,247.50'",
    reasoning={"effort": "low"},
    text={
        "format": {
            "type": "json_schema",
            "name": "invoice_extract",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "invoice_number": {"type": "string"},
                    "date": {"type": "string", "description": "ISO 8601 format"},
                    "total_usd": {"type": "number"}
                },
                "required": ["invoice_number", "date", "total_usd"],
                "additionalProperties": False
            }
        }
    }
)

Validate on your end anyway. Structured outputs guarantee schema conformance, not semantic correctness. The model will always return a valid number for price_usd, but it might hallucinate the wrong number. Add application-level validation for business-critical fields.
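
For the invoice schema above, application-level checks might look like this sketch (the plausibility bounds are assumptions you would tune to your own data):

```python
from datetime import date

def validate_invoice(data: dict) -> list[str]:
    """Semantic checks that schema conformance alone cannot provide."""
    errors = []
    try:
        date.fromisoformat(data["date"])
    except (ValueError, KeyError):
        errors.append("date is missing or not ISO 8601")
    total = data.get("total_usd")
    if not isinstance(total, (int, float)) or not 0 < total < 1_000_000:
        errors.append("total_usd missing or outside plausible range")
    return errors
```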

Structured outputs turn GPT-5.2 from a text generator into a reliable data extraction engine. The strict schema enforcement happens at the token level, so you are not relying on prompt engineering to get valid JSON – you are guaranteed it by the architecture. Start with the Pydantic .parse() approach for clean code, drop to raw JSON schemas when you need fine-grained control, and always handle the refusal and truncation edge cases.