The Core Idea
LLMs mess up. They return broken JSON, hallucinate fields, ignore schema constraints, and go off-topic. Instead of crashing or returning garbage to users, you can build chains that catch these failures and retry with the error fed back into the prompt. The LLM gets a second (or third) shot at producing correct output, and your system stays reliable.
The pattern is simple: call the LLM, validate the output, and if validation fails, re-prompt with the error message attached. Most issues resolve within 1-2 retries.
Structured Outputs with Pydantic Validation
Start by defining what “correct” looks like. Pydantic models are the best tool for this because they give you typed fields, custom validators, and clear error messages you can feed back to the LLM.
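As a sketch (the field names, the 20-character minimum, and the custom check are illustrative choices), a Pydantic v2 model for sentiment analysis might look like:

```python
from typing import Literal

from pydantic import BaseModel, Field, field_validator


class SentimentAnalysis(BaseModel):
    sentiment: Literal["positive", "negative", "neutral"]
    confidence: float = Field(ge=0.0, le=1.0)
    reasoning: str = Field(min_length=20)

    @field_validator("reasoning")
    @classmethod
    def reasoning_about_the_text(cls, v: str) -> str:
        # Custom validators enforce rules that Field constraints can't express.
        if "as an ai" in v.lower():
            raise ValueError("reasoning must describe the input text, not the model")
        return v
```

The `Literal` type handles the three allowed sentiments, the `Field` constraints handle the numeric range and minimum length, and the custom validator shows where to put anything fancier.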
This model rejects confidence scores outside 0-1, enforces a minimum reasoning length, and restricts sentiment to three allowed values. When validation fails, Pydantic gives you a structured error message that tells the LLM exactly what went wrong.
The Retry Decorator
Here’s a reusable decorator that handles retries with exponential backoff. It catches both Pydantic validation errors and JSON parsing failures.
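A sketch of such a decorator (the `_correction_hint` keyword is this pattern’s convention, not a library API):

```python
import functools
import json
import time

from pydantic import ValidationError


def retry_on_validation_error(max_retries: int = 3, base_delay: float = 1.0):
    """Retry the wrapped call, feeding each failure back via _correction_hint."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            hint = None
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, _correction_hint=hint, **kwargs)
                except (ValidationError, json.JSONDecodeError) as exc:
                    if attempt == max_retries:
                        raise  # out of retries; surface the last error
                    hint = str(exc)  # becomes the corrective prompt next time
                    time.sleep(base_delay * (2 ** attempt))  # exponential backoff
        return wrapper
    return decorator
```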
The key detail: each retry passes the error message to the function via _correction_hint. The function uses this to build a corrective prompt.
Self-Healing Prompt Chain
Now wire the pieces together. The LLM call function receives the correction hint and appends it to the conversation, giving the model explicit feedback about what went wrong.
On the first call, the LLM just gets the normal prompt. If it returns broken JSON or fails validation, the retry loop catches the error, waits, and calls again with the error message appended. The model now knows exactly what went wrong – “confidence must be between 0.0 and 1.0, got 1.5” is a clear instruction to fix.
Validator Functions for Complex Checks
Sometimes you need validation beyond what Pydantic handles. Build standalone validator functions that check semantic correctness.
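As one sketch of a semantic check, here is a validator that flags results whose reasoning contradicts the predicted sentiment; the cue-word lists and the confidence threshold are illustrative, not a proven heuristic:

```python
NEGATIVE_CUES = {"terrible", "awful", "hate", "broken", "worst"}
POSITIVE_CUES = {"great", "excellent", "love", "fantastic", "best"}


def validate_semantics(result) -> list[str]:
    """Return human-readable error strings; an empty list means the result passed."""
    errors = []
    words = set(result.reasoning.lower().split())
    if result.sentiment == "positive" and words & NEGATIVE_CUES and not words & POSITIVE_CUES:
        errors.append("sentiment is 'positive' but the reasoning cites only negative language")
    if result.sentiment == "negative" and words & POSITIVE_CUES and not words & NEGATIVE_CUES:
        errors.append("sentiment is 'negative' but the reasoning cites only positive language")
    if result.confidence > 0.9 and len(result.reasoning) < 40:
        errors.append("confidence above 0.9 requires more detailed reasoning")
    return errors
```

Returning a list of messages (rather than raising immediately) lets the chain join them into a single correction hint.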
Plug this into the chain by calling it after Pydantic validation succeeds:
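One way to sketch the wiring is a small wrapper that runs the semantic validator only after parsing succeeds and raises `ValueError` with the joined messages:

```python
def with_semantic_check(parse_fn, semantic_validator):
    """Wrap a parsing call so semantic errors also trigger the retry loop."""
    def checked(*args, _correction_hint=None, **kwargs):
        result = parse_fn(*args, _correction_hint=_correction_hint, **kwargs)
        errors = semantic_validator(result)  # runs only after Pydantic validation passed
        if errors:
            raise ValueError("Semantic validation failed: " + "; ".join(errors))
        return result
    return checked
```

For the retry decorator to catch these, include `ValueError` in its `except` clause; since `pydantic.ValidationError` and `json.JSONDecodeError` are both subclasses of `ValueError`, catching `ValueError` alone covers all three cases. Composed with the names used earlier, that looks like `analyze = retry_on_validation_error()(with_semantic_check(call_llm, validate_semantics))`.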
Now the chain catches both structural errors (bad JSON, wrong types) and semantic errors (contradictory reasoning).
Running the Full Chain
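A sketch of a driver loop; `analyze_fn` is whatever wired-up chain you built above, and the batch simply records failures instead of crashing:

```python
def run_chain(texts: list[str], analyze_fn) -> dict:
    """Run the chain over each text; items that exhaust all retries map to None."""
    results = {}
    for text in texts:
        try:
            results[text] = analyze_fn(text)
        except Exception as exc:
            # All retries exhausted: record the failure and keep going.
            print(f"FAILED after retries: {text[:40]!r} -> {exc}")
            results[text] = None
    return results
```

For example: `run_chain(["Love it!", "Arrived broken."], lambda t: analyze(t, SentimentAnalysis))`.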
This processes each text, retrying failures automatically. Most malformed outputs get fixed on the first retry because the LLM can see what it did wrong.
Common Errors and Fixes
json.JSONDecodeError on every attempt: The model keeps wrapping output in markdown code fences. The code fence stripping logic above handles this, but if you’re using a different model, check what prefix it adds. Some models emit a bare `json` prefix before the object instead of a full ```` ```json ```` fence.
Pydantic ValidationError for missing fields: The model sometimes omits optional-looking fields. Make your system prompt explicit about every required field. List them out. Don’t rely on the model inferring from examples alone.
RateLimitError masking validation retries: Your retry decorator catches ValidationError and JSONDecodeError, but rate limit errors from the API need separate handling. Add openai.RateLimitError to your except clause, or better, use the tenacity library for API-level retries and keep your decorator focused on validation retries.
Infinite correction loops: If the model can’t fix a particular error after 3 tries, it probably won’t fix it on attempt 4. Three retries is a good default. Beyond that, fall back to a different prompt template or a more capable model.
Temperature too high on retries: Set temperature to 0.0 or 0.1 on correction attempts. High temperature introduces randomness that works against targeted fixes. You want the model to focus on the specific error, not generate a completely new creative response.
When to Use This Pattern
Self-correcting chains work best when the output has a clear schema or validation rules. JSON extraction, classification, entity extraction, and structured data generation are all good fits. They work less well for open-ended creative tasks where “correct” is subjective.
The overhead is minimal – you only pay for extra API calls when the first attempt fails. With gpt-4o and clear prompts, first-attempt success rates typically run above 95% for well-defined schemas. The retry logic is insurance for the other 5%.
Related Guides
- How to Build Prompt Chains with Tool Results and Structured Outputs
- How to Build Multi-Step Prompt Chains with Structured Outputs
- How to Build Token-Efficient Prompt Batching with LLM APIs
- How to Build Agentic RAG with Query Routing and Self-Reflection
- How to Build Prompt Chains with Async LLM Calls and Batching
- How to Build a Knowledge Graph from Text with LLMs
- How to Build Prompt Fallback Chains with Automatic Model Switching
- How to Build Prompt Caching Strategies for Multi-Turn LLM Sessions
- How to Build Dynamic Prompt Routers with LLM Cascading
- How to Build Prompt Guardrails with Structured Output Schemas