Your users want to ask questions in English and get answers from a database. “What were our top 10 customers last quarter?” should not require someone who knows SQL. Text-to-SQL bridges that gap – an LLM reads the question, understands your schema, and writes the query.
The approach that works best right now: extract your database schema automatically, inject it into a structured prompt, generate SQL with an LLM, validate the output, and execute it safely with read-only permissions. Here is the full pipeline.
Extract Your Database Schema
The LLM needs to know what tables and columns exist. Do not paste your schema manually – pull it programmatically so it stays in sync.
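Here is a minimal sketch of that extraction. It uses SQLite so the example is self-contained; on PostgreSQL you would query information_schema (or shell out to pg_dump --schema-only) instead, but the shape of the output is the same:

```python
import sqlite3

def extract_schema_ddl(conn: sqlite3.Connection) -> str:
    """Pull CREATE TABLE statements straight from the catalog so the
    schema in the prompt never drifts out of sync with the database."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    return ";\n\n".join(sql for (sql,) in rows) + ";"

# Demo with a toy schema, including the foreign key the model needs for joins.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    total REAL,
    created_at TEXT
);
""")
schema_ddl = extract_schema_ddl(conn)
print(schema_ddl)
```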
This produces clean DDL that the LLM can parse. Including primary and foreign keys is critical – without them, the model guesses at join conditions and gets them wrong half the time.
For large databases with dozens of tables, filter down to the relevant subset. Sending 200 tables in the prompt confuses the model and wastes tokens. Keep it under 20 tables, or better yet, build a table selector that picks the right tables based on the question.
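The table selector can itself be a small LLM call. A sketch of the prompt side, where the helper name and the comma-separated output format are my assumptions:

```python
def build_table_selector_prompt(table_names: list[str], question: str) -> list[dict]:
    """Ask the model to narrow a large table list down to the relevant few."""
    system = (
        "From the table list below, pick only the tables needed to answer "
        "the user's question. Return a comma-separated list of table names "
        "and nothing else.\n"
        "Tables: " + ", ".join(table_names)
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]
```

Parse the comma-separated reply, intersect it with your real table list to discard hallucinated names, and extract DDL only for the survivors.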
Prompt Engineering for SQL Generation
The prompt structure matters more than the model choice. A well-crafted prompt with gpt-4o-mini outperforms a lazy prompt with gpt-4o on SQL generation benchmarks.
Here is the system prompt pattern that consistently works:
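A sketch of that pattern, assuming the official openai Python client; the exact rule list is mine, so adapt it to your domain:

```python
def build_sql_prompt(schema_ddl: str, question: str) -> list[dict]:
    """Structured system prompt: role, hard rules, then the schema."""
    system = (
        "You are an expert PostgreSQL analyst. Write a single SQL query "
        "that answers the user's question.\n"
        "Rules:\n"
        "- Output ONLY the SQL query, with no explanation or markdown.\n"
        "- Use only tables and columns that appear in the schema below.\n"
        "- Join tables using the declared foreign keys.\n"
        "- Never write INSERT, UPDATE, DELETE, or DDL statements.\n\n"
        "Schema:\n" + schema_ddl
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]

# The call itself (requires an OpenAI client and API key):
# response = client.chat.completions.create(
#     model="gpt-4o-mini",
#     temperature=0,  # deterministic SQL output
#     messages=build_sql_prompt(schema_ddl, question),
# )
# sql = response.choices[0].message.content.strip()
```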
Setting temperature=0 is non-negotiable for SQL generation. Any randomness introduces subtle bugs – wrong column names, flipped join conditions, off-by-one date ranges. You want deterministic output every time.
I recommend gpt-4o-mini over gpt-4o for straightforward queries. It handles joins, aggregations, and subqueries just fine, and it costs 10x less. Escalate to gpt-4o only when you see failures on complex multi-step queries with CTEs or window functions.
Adding Few-Shot Examples
For domain-specific databases, add 2-3 question/SQL pairs to the system prompt. This teaches the model your naming conventions and common query patterns:
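A sketch of what those pairs can look like, assuming a hypothetical customers/orders schema and PostgreSQL date functions:

```python
FEW_SHOT_EXAMPLES = [
    (
        "What were our top 10 customers last quarter?",
        "SELECT c.name, SUM(o.total) AS revenue\n"
        "FROM customers c\n"
        "JOIN orders o ON o.customer_id = c.id\n"
        "WHERE o.created_at >= date_trunc('quarter', now()) - interval '3 months'\n"
        "  AND o.created_at < date_trunc('quarter', now())\n"
        "GROUP BY c.name\nORDER BY revenue DESC\nLIMIT 10;",
    ),
    (
        "How many orders did we get each month this year?",
        "SELECT date_trunc('month', created_at) AS month, COUNT(*) AS orders\n"
        "FROM orders\n"
        "WHERE created_at >= date_trunc('year', now())\n"
        "GROUP BY month\nORDER BY month;",
    ),
]

def format_few_shot(examples) -> str:
    """Render question/SQL pairs as a block for the system prompt."""
    parts = ["Examples:"]
    for question, sql in examples:
        parts.append(f"Q: {question}\nSQL:\n{sql}")
    return "\n\n".join(parts)
```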
Append this to your system prompt. Two or three examples dramatically reduce hallucinated column names and wrong join paths.
Execute Generated Queries Safely
Never run LLM-generated SQL with write permissions. This is the single most important rule. An LLM will occasionally produce DROP TABLE or UPDATE statements, especially if the user prompt is adversarial.
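A sketch of a safe execution wrapper, assuming a PostgreSQL connection (the statement_timeout setting is PostgreSQL-specific) and a blocklist you should extend for your own database:

```python
import re

# Keywords that mutate state; extend for your database.
BLOCKED_KEYWORDS = {
    "INSERT", "UPDATE", "DELETE", "DROP", "ALTER",
    "TRUNCATE", "CREATE", "GRANT", "REVOKE", "COPY",
}

def validate_readonly(sql: str) -> None:
    """Raise ValueError unless sql looks like a read-only query."""
    # Normalize punctuation/whitespace to single spaces, then pad with
    # spaces, so the blocklist matches whole words only (a column like
    # status_updated_at keeps its underscores and is never a match).
    normalized = " " + re.sub(r"\W+", " ", sql.upper()).strip() + " "
    for keyword in BLOCKED_KEYWORDS:
        if f" {keyword} " in normalized:
            raise ValueError(f"Blocked keyword in generated SQL: {keyword}")
    if not normalized.lstrip().startswith(("SELECT ", "WITH ")):
        raise ValueError("Only SELECT/WITH queries are allowed")

def execute_safely(conn, sql: str, timeout_ms: int = 5000):
    """Validate, cap runtime, then execute (PostgreSQL connection assumed)."""
    validate_readonly(sql)
    cur = conn.cursor()
    cur.execute(f"SET statement_timeout = {timeout_ms}")  # PostgreSQL-specific
    cur.execute(sql)
    return cur.fetchall()
```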
Three layers of defense here. First, a keyword blocklist catches the obvious mutations. Second, requiring queries to start with SELECT or WITH blocks anything that slipped through. Third, the statement timeout prevents expensive full-table scans from locking your database for minutes.
But the real protection is at the database level. Create a dedicated read-only user for your text-to-SQL pipeline:
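For PostgreSQL, that setup looks roughly like this; the role and database names are placeholders:

```sql
-- A dedicated role that can read but never write.
CREATE ROLE textsql_reader LOGIN PASSWORD 'change-me';
GRANT CONNECT ON DATABASE mydb TO textsql_reader;
GRANT USAGE ON SCHEMA public TO textsql_reader;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO textsql_reader;
-- Cover tables created in the future too:
ALTER DEFAULT PRIVILEGES IN SCHEMA public
    GRANT SELECT ON TABLES TO textsql_reader;
```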
Even if the keyword filter misses something, the database itself will reject any write operation. Defense in depth.
Validate and Explain the SQL Output
Generated SQL can be syntactically valid but semantically wrong. The model might join on the wrong column, filter the wrong date range, or misinterpret “last quarter.” Always give the user a way to verify.
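One way to do that is a second, cheap LLM call that translates the generated SQL back into plain English. A sketch, with the helper name and prompt wording being my assumptions:

```python
def build_explanation_prompt(sql: str, question: str) -> list[dict]:
    """Ask the model to describe, in plain English, what the query does."""
    system = (
        "Explain what this SQL query does in 2-3 plain-English sentences. "
        "Name the tables used, every filter (especially date ranges), and "
        "how results are grouped or sorted. Do not restate the SQL."
    )
    user = f"Original question: {question}\n\nSQL:\n{sql}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]

# explanation = client.chat.completions.create(
#     model="gpt-4o-mini", temperature=0,
#     messages=build_explanation_prompt(sql, question),
# ).choices[0].message.content
```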
Show the explanation to the user before executing. This catches semantic errors that automated validation cannot – the user knows if “this year” means fiscal year or calendar year, and the explanation reveals which interpretation the model chose.
Handling Errors and Retries
SQL generation fails. The model hallucinates a column that does not exist, writes invalid syntax, or produces a query that times out. Build retry logic that feeds errors back to the model:
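A sketch of that loop. The LLM call and the query executor are passed in as callables (my design choice, which also makes the loop easy to test):

```python
def generate_sql_with_retries(ask_llm, run_query, question, max_retries=3):
    """ask_llm(messages) -> SQL string; run_query(sql) -> rows.
    On failure, feed the database error back to the model and retry."""
    messages = [{"role": "user", "content": question}]
    last_error = None
    for _ in range(max_retries + 1):
        sql = ask_llm(messages)
        try:
            return sql, run_query(sql)
        except Exception as exc:
            last_error = exc
            # Show the model its own attempt plus the database error.
            messages.append({"role": "assistant", "content": sql})
            messages.append({
                "role": "user",
                "content": (f"That query failed with this error:\n{exc}\n"
                            "Fix the query. Return only the corrected SQL."),
            })
    raise RuntimeError(f"Gave up after {max_retries} retries: {last_error}")
```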
This self-correcting loop handles the most common failure mode: the model references a column that does not exist. PostgreSQL returns column "xyz" does not exist, the model sees the error, and fixes the column name on the next attempt. In practice, 80% of recoverable failures resolve on the first retry.
Common Errors and Fixes
column "xyz" does not exist
The model hallucinated a column name. This is the most frequent error. The self-correcting retry loop above handles it automatically, but you can reduce the frequency by including sample data in your prompt – not just the schema, but 2-3 example rows per table. The model hallucinates less when it sees real column names in context.
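A sketch of pulling those sample rows, again using SQLite to keep the example self-contained; only pass table names that came from your own schema extraction, never from user input:

```python
import sqlite3

def sample_rows_block(conn, table: str, limit: int = 3) -> str:
    """Format a few real rows so the model sees actual column values."""
    # `table` must come from our own extracted schema list, not user input.
    cur = conn.execute(f"SELECT * FROM {table} LIMIT {limit}")
    columns = [desc[0] for desc in cur.description]
    lines = [f"Sample rows from {table}:", " | ".join(columns)]
    lines += [" | ".join(str(v) for v in row) for row in cur.fetchall()]
    return "\n".join(lines)

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
""")
print(sample_rows_block(conn, "customers"))
```

Append one such block per table, after the DDL, in the system prompt.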
relation "xyz" does not exist
The model assumed a table name. Maybe your table is called accounts, not users. Make sure your schema extraction includes every relevant table, and consider adding table-level comments in the schema DDL: -- This table stores customer accounts (often called "users" in business context).
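On PostgreSQL you can also store that hint in the catalog itself, so any schema extraction that reads catalog comments picks it up automatically; the table name here is a placeholder:

```sql
COMMENT ON TABLE accounts IS
    'Customer accounts; often called "users" in business questions.';
```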
statement timeout
The generated query is too expensive. Common causes: missing LIMIT, full cross joins, or aggregating millions of rows without an index. Add a post-generation check that injects LIMIT 100 if the query does not already have one:
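A sketch of that check; ensure_limit is a hypothetical helper, and note it is deliberately naive: a LIMIT inside a subquery will also satisfy it, which is usually acceptable as a first pass:

```python
import re

def ensure_limit(sql: str, default_limit: int = 100) -> str:
    """Append a LIMIT clause if the query does not already have one."""
    stripped = sql.rstrip().rstrip(";")
    if re.search(r"\bLIMIT\s+\d+\b", stripped, re.IGNORECASE):
        return stripped + ";"
    return f"{stripped} LIMIT {default_limit};"
```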
permission denied for table xyz
Your read-only user does not have access to that table. This is actually good – it means your access controls are working. Either grant SELECT on the missing table, or remove it from the schema you pass to the LLM so it never tries to query it.
Blocked by keyword filter but the query is legitimate
A legitimate query like SELECT id, status_updated_at FROM orders can get blocked if your keyword filter does naive substring matching, because UPDATE appears inside the column name status_updated_at. The fix is to match whole words only, which the execute_safely function above already does by padding the normalized query and each keyword with spaces.
When Text-to-SQL Falls Short
This approach works well for analytical queries – aggregations, filters, joins across a few tables. It struggles with complex business logic: multi-step calculations, recursive queries, or queries that require domain knowledge the schema alone does not capture.
For those cases, build a library of pre-written query templates and let the LLM pick the right template and fill in parameters, rather than generating SQL from scratch. Template selection is a much easier problem for the model than open-ended SQL generation, and you get predictable, tested queries every time.
Related Guides
- How to Build a Text Correction and Grammar Checking Pipeline
- How to Classify Text with Zero-Shot and Few-Shot LLMs
- How to Build a Text Style Transfer Pipeline with Transformers
- How to Extract Structured Data from PDFs with LLMs
- How to Summarize Long Documents with LLMs and Map-Reduce
- How to Build a Named Entity Recognition Pipeline with spaCy and Transformers
- How to Build a Spell Checking and Autocorrect Pipeline with Python
- How to Build a Text Classification Pipeline with SetFit
- How to Build a Text Similarity API with Cross-Encoders
- How to Build a Text Embedding Pipeline with Sentence Transformers and FAISS