Your users want to ask questions in English and get answers from a database. “What were our top 10 customers last quarter?” should not require someone who knows SQL. Text-to-SQL bridges that gap – an LLM reads the question, understands your schema, and writes the query.

The approach that works best right now: extract your database schema automatically, inject it into a structured prompt, generate SQL with an LLM, validate the output, and execute it safely with read-only permissions. Here is the full pipeline.

Extract Your Database Schema

The LLM needs to know what tables and columns exist. Do not paste your schema manually – pull it programmatically so it stays in sync.

pip install openai sqlalchemy
from sqlalchemy import create_engine, inspect

def extract_schema(database_url: str) -> str:
    """Extract CREATE TABLE statements from a database."""
    engine = create_engine(database_url)
    inspector = inspect(engine)
    schema_parts = []

    for table_name in inspector.get_table_names():
        columns = inspector.get_columns(table_name)
        pk = inspector.get_pk_constraint(table_name)
        fks = inspector.get_foreign_keys(table_name)

        col_defs = []
        for col in columns:
            nullable = "" if col["nullable"] else " NOT NULL"
            col_defs.append(f"  {col['name']} {col['type']}{nullable}")

        if pk and pk["constrained_columns"]:
            col_defs.append(
                f"  PRIMARY KEY ({', '.join(pk['constrained_columns'])})"
            )

        for fk in fks:
            local_cols = ", ".join(fk["constrained_columns"])
            remote = f"{fk['referred_table']}({', '.join(fk['referred_columns'])})"
            col_defs.append(f"  FOREIGN KEY ({local_cols}) REFERENCES {remote}")

        ddl = f"CREATE TABLE {table_name} (\n" + ",\n".join(col_defs) + "\n);"
        schema_parts.append(ddl)

    return "\n\n".join(schema_parts)


# Usage
schema = extract_schema("postgresql://user:pass@localhost/mydb")
print(schema)

This produces clean DDL that the LLM can parse. Including primary and foreign keys is critical – without them, the model guesses at join conditions and gets them wrong half the time.

For large databases with dozens of tables, filter down to the relevant subset. Sending 200 tables in the prompt confuses the model and wastes tokens. Keep it under 20 tables, or better yet, build a table selector that picks the right tables based on the question.
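As a sketch of that selector idea, here is a minimal keyword-overlap scorer (the function name and scoring heuristic are my own; production systems often use embeddings or a cheap LLM call instead):

```python
import re

def select_relevant_tables(
    question: str,
    table_ddls: dict[str, str],
    max_tables: int = 20,
) -> str:
    """Keep only the tables whose DDL mentions words from the question.

    table_ddls maps table name -> CREATE TABLE statement, e.g. built by
    splitting the output of extract_schema() on blank lines.
    """
    # Ignore short filler words like "by" or "the"
    question_words = set(re.findall(r"[a-z_]{4,}", question.lower()))
    scored = []
    for name, ddl in table_ddls.items():
        ddl_text = ddl.lower()
        score = sum(1 for word in question_words if word in ddl_text)
        scored.append((score, name))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    keep = [name for score, name in scored[:max_tables] if score > 0]
    return "\n\n".join(table_ddls[name] for name in keep)
```

Substring matching is deliberately loose so that "customer" in the question still hits a customers table; an embedding-based selector handles synonyms that this cannot.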

Prompt Engineering for SQL Generation

The prompt structure matters more than the model choice. A well-crafted prompt with gpt-4o-mini outperforms a lazy prompt with gpt-4o on SQL generation benchmarks.

Here is the system prompt pattern that consistently works:

from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = """You are a SQL expert. Given a database schema and a natural language question, generate a single SQL query that answers the question.

Rules:
- Output ONLY the SQL query, no explanation, no markdown fences
- Use table aliases for readability
- Always qualify column names with table aliases to avoid ambiguity
- Use LIMIT 100 unless the user asks for a specific count
- Never use SELECT * -- list specific columns
- For date filtering, use standard SQL date functions
- If the question is ambiguous, make a reasonable assumption and note it in a SQL comment

Database schema:
{schema}"""

def generate_sql(question: str, schema: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT.format(schema=schema)},
            {"role": "user", "content": question},
        ],
    )
    sql = response.choices[0].message.content.strip()
    # Strip markdown fences if the model adds them despite instructions
    if sql.startswith("```"):
        sql = sql.split("\n", 1)[1].rsplit("```", 1)[0].strip()
    return sql


sql = generate_sql(
    "What are the top 10 customers by total order value this year?",
    schema
)
print(sql)
# SELECT c.name, SUM(o.total) AS total_value
# FROM customers c
# JOIN orders o ON c.id = o.customer_id
# WHERE o.created_at >= '2026-01-01'
# GROUP BY c.name
# ORDER BY total_value DESC
# LIMIT 10

Setting temperature=0 is non-negotiable for SQL generation. Any randomness introduces subtle bugs – wrong column names, flipped join conditions, off-by-one date ranges. You want deterministic output every time.

I recommend gpt-4o-mini over gpt-4o for straightforward queries. It handles joins, aggregations, and subqueries just fine, and it costs 10x less. Escalate to gpt-4o only when you see failures on complex multi-step queries with CTEs or window functions.
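That escalation can be a thin loop over models, cheapest first (a sketch; generate_with_escalation is my own wrapper, and the generator/validator callables stand in for generate_sql and a dry-run check such as EXPLAIN):

```python
from typing import Callable

def generate_with_escalation(
    question: str,
    schema: str,
    generators: list[Callable[[str, str], str]],
    validate: Callable[[str], None],
) -> str:
    """Try each SQL generator in order until one passes validation.

    Each generator takes (question, schema) and returns SQL; validate
    raises on a bad query (e.g. by running EXPLAIN against the database).
    """
    errors = []
    for generate in generators:
        sql = generate(question, schema)
        try:
            validate(sql)
            return sql
        except Exception as exc:
            errors.append(str(exc))
    raise RuntimeError(f"All models failed: {errors}")
```

The expensive model only runs when the cheap one's output fails validation, so the average cost stays close to gpt-4o-mini's.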

Adding Few-Shot Examples

For domain-specific databases, add 2-3 question/SQL pairs to the system prompt. This teaches the model your naming conventions and common query patterns:

FEW_SHOT_EXAMPLES = """
Examples:

Question: How many orders were placed last month?
SQL: SELECT COUNT(*) AS order_count FROM orders o WHERE o.created_at >= DATE_TRUNC('month', CURRENT_DATE - INTERVAL '1 month') AND o.created_at < DATE_TRUNC('month', CURRENT_DATE);

Question: Which products have never been ordered?
SQL: SELECT p.id, p.name FROM products p LEFT JOIN order_items oi ON p.id = oi.product_id WHERE oi.id IS NULL;
"""

Append this to your system prompt. Two or three examples dramatically reduce hallucinated column names and wrong join paths.

Execute Generated Queries Safely

Never run LLM-generated SQL with write permissions. This is the single most important rule. An LLM will occasionally produce DROP TABLE or UPDATE statements, especially if the user prompt is adversarial.

import re

from sqlalchemy import create_engine, text

def execute_safely(database_url: str, sql: str) -> list[dict]:
    """Execute a SQL query with read-only protections."""
    # Use a read-only connection string or database user
    engine = create_engine(database_url)

    # Block dangerous statements before they hit the database
    sql_upper = sql.upper().strip()
    blocked_keywords = [
        "DROP", "DELETE", "UPDATE", "INSERT", "ALTER", "TRUNCATE",
        "CREATE", "GRANT", "REVOKE", "EXEC", "EXECUTE",
    ]
    for keyword in blocked_keywords:
        # Match the keyword only as a whole word, so column names like
        # status_updated_at pass but "x; DROP TABLE y" is caught even
        # without surrounding spaces
        if re.search(rf"\b{keyword}\b", sql_upper):
            raise ValueError(f"Blocked: query contains '{keyword}' statement")

    if not sql_upper.startswith("SELECT") and not sql_upper.startswith("WITH"):
        raise ValueError("Only SELECT and WITH (CTE) queries are allowed")

    with engine.connect() as conn:
        # Set statement timeout to prevent runaway queries
        conn.execute(text("SET statement_timeout = '30s'"))
        result = conn.execute(text(sql))
        columns = list(result.keys())
        rows = [dict(zip(columns, row)) for row in result.fetchall()]

    return rows

Three layers of defense here. First, a keyword blocklist catches the obvious mutations. Second, requiring queries to start with SELECT or WITH blocks anything that slipped through. Third, the statement timeout prevents expensive full-table scans from locking your database for minutes.

But the real protection is at the database level. Create a dedicated read-only user for your text-to-SQL pipeline:

CREATE USER texttosql_reader WITH PASSWORD 'secure_password';
GRANT CONNECT ON DATABASE mydb TO texttosql_reader;
GRANT USAGE ON SCHEMA public TO texttosql_reader;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO texttosql_reader;
ALTER DEFAULT PRIVILEGES IN SCHEMA public
  GRANT SELECT ON TABLES TO texttosql_reader;

Even if the keyword filter misses something, the database itself will reject any write operation. Defense in depth.

Validate and Explain the SQL Output

Generated SQL can be syntactically valid but semantically wrong. The model might join on the wrong column, filter the wrong date range, or misinterpret “last quarter.” Always give the user a way to verify.

def generate_sql_with_explanation(question: str, schema: str) -> dict:
    """Generate SQL and a plain-English explanation of what it does."""
    sql = generate_sql(question, schema)

    # Ask the model to explain the generated query
    explanation_response = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,
        messages=[
            {
                "role": "system",
                "content": (
                    "Explain what this SQL query does in plain English. "
                    "Be specific about which tables are queried, what filters "
                    "are applied, and what the output represents. "
                    "Keep it to 2-3 sentences."
                ),
            },
            {"role": "user", "content": f"Question: {question}\n\nSQL:\n{sql}"},
        ],
    )

    return {
        "sql": sql,
        "explanation": explanation_response.choices[0].message.content.strip(),
    }


result = generate_sql_with_explanation(
    "What are the top 10 customers by total order value this year?",
    schema
)
print(f"SQL: {result['sql']}")
print(f"Explanation: {result['explanation']}")

Show the explanation to the user before executing. This catches semantic errors that automated validation cannot – the user knows if “this year” means fiscal year or calendar year, and the explanation reveals which interpretation the model chose.

Handling Errors and Retries

SQL generation fails. The model hallucinates a column that does not exist, writes invalid syntax, or produces a query that times out. Build retry logic that feeds errors back to the model:

def generate_and_execute(
    question: str,
    schema: str,
    database_url: str,
    max_retries: int = 3,
) -> dict:
    """Generate SQL, execute it, and retry on failure."""
    errors = []

    for attempt in range(max_retries):
        # Build the prompt with any previous errors
        messages = [
            {"role": "system", "content": SYSTEM_PROMPT.format(schema=schema)},
            {"role": "user", "content": question},
        ]

        # If previous attempts failed, include the error context
        for err in errors:
            messages.append({"role": "assistant", "content": err["sql"]})
            messages.append({
                "role": "user",
                "content": (
                    f"That query failed with this error:\n{err['error']}\n\n"
                    "Please fix the query and try again."
                ),
            })

        response = client.chat.completions.create(
            model="gpt-4o-mini",
            temperature=0,
            messages=messages,
        )
        sql = response.choices[0].message.content.strip()
        if sql.startswith("```"):
            sql = sql.split("\n", 1)[1].rsplit("```", 1)[0].strip()

        try:
            rows = execute_safely(database_url, sql)
            return {"sql": sql, "rows": rows, "attempts": attempt + 1}
        except Exception as e:
            errors.append({"sql": sql, "error": str(e)})

    raise RuntimeError(
        f"Failed after {max_retries} attempts. Errors: {errors}"
    )

This self-correcting loop handles the most common failure mode: the model references a column that does not exist. PostgreSQL returns column "xyz" does not exist, the model sees the error, and fixes the column name on the next attempt. In practice, 80% of recoverable failures resolve on the first retry.

Common Errors and Fixes

column "xyz" does not exist

sqlalchemy.exc.ProgrammingError: (psycopg2.errors.UndefinedColumn)
column "order_date" does not exist

The model hallucinated a column name. This is the most frequent error. The self-correcting retry loop above handles it automatically, but you can reduce the frequency by including sample data in your prompt – not just the schema, but 2-3 example rows per table. The model hallucinates less when it sees real column names in context.
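Pulling those sample rows is a small extension of the schema extractor (a sketch; schema_with_samples is my own name, and values are truncated so a long text column cannot blow up the prompt):

```python
from sqlalchemy import create_engine, inspect, text

def schema_with_samples(database_url: str, rows_per_table: int = 3) -> str:
    """Render a few real rows per table as SQL comments for the prompt."""
    engine = create_engine(database_url)
    inspector = inspect(engine)
    parts = []
    with engine.connect() as conn:
        for table in inspector.get_table_names():
            # Table names come from the inspector, not user input,
            # so interpolating them here is safe
            result = conn.execute(
                text(f"SELECT * FROM {table} LIMIT {rows_per_table}")
            )
            columns = ", ".join(result.keys())
            lines = [f"-- Sample rows from {table} ({columns}):"]
            for row in result:
                values = ", ".join(str(value)[:40] for value in row)
                lines.append(f"--   {values}")
            parts.append("\n".join(lines))
    return "\n".join(parts)
```

Append the output below the DDL in your system prompt. Skip tables that hold sensitive values — sample rows go to the LLM provider verbatim.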

relation "xyz" does not exist

sqlalchemy.exc.ProgrammingError: (psycopg2.errors.UndefinedTable)
relation "users" does not exist

The model assumed a table name. Maybe your table is called accounts, not users. Make sure your schema extraction includes every relevant table, and consider adding table-level comments in the schema DDL: -- This table stores customer accounts (often called "users" in business context).
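A lightweight way to maintain those comments is a mapping applied after extraction (a sketch; annotate_schema and the example description are my own):

```python
def annotate_schema(schema: str, table_notes: dict[str, str]) -> str:
    """Prepend a descriptive SQL comment to each CREATE TABLE statement.

    Expects the blank-line-separated DDL produced by extract_schema().
    """
    annotated = []
    for ddl in schema.split("\n\n"):
        for table, note in table_notes.items():
            # Match "CREATE TABLE accounts (" exactly so that a table
            # named accounts_audit is not annotated by mistake
            if ddl.startswith(f"CREATE TABLE {table} ("):
                ddl = f"-- {note}\n{ddl}"
                break
        annotated.append(ddl)
    return "\n\n".join(annotated)
```

For example, annotate_schema(schema, {"accounts": 'Customer accounts (often called "users" in business context)'}) puts that comment directly above the accounts DDL in the prompt.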

statement timeout

sqlalchemy.exc.OperationalError: canceling statement due to statement timeout

The generated query is too expensive. Common causes: missing LIMIT, full cross joins, or aggregating millions of rows without an index. Add a post-generation check that injects LIMIT 100 if the query does not already have one:

import re

def add_safety_limit(sql: str, limit: int = 100) -> str:
    # \bLIMIT\b avoids false positives on identifiers like limit_price
    if re.search(r"\bLIMIT\b", sql, re.IGNORECASE):
        return sql
    return f"{sql.rstrip().rstrip(';')}\nLIMIT {limit};"

permission denied for table xyz

sqlalchemy.exc.ProgrammingError: (psycopg2.errors.InsufficientPrivilege)
permission denied for table internal_metrics

Your read-only user does not have access to that table. This is actually good – it means your access controls are working. Either grant SELECT on the missing table, or remove it from the schema you pass to the LLM so it never tries to query it.

Blocked by keyword filter but the query is legitimate

A query like SELECT status, COUNT(*) FROM orders GROUP BY status might get blocked if your keyword filter matches substrings and catches UPDATE inside column names like status_updated_at. The fix is to match keywords only as whole words, which the execute_safely function above already does.

When Text-to-SQL Falls Short

This approach works well for analytical queries – aggregations, filters, joins across a few tables. It struggles with complex business logic: multi-step calculations, recursive queries, or queries that require domain knowledge the schema alone does not capture.

For those cases, build a library of pre-written query templates and let the LLM pick the right template and fill in parameters, rather than generating SQL from scratch. Template selection is a much easier problem for the model than open-ended SQL generation, and you get predictable, tested queries every time.
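A sketch of that template approach (the registry and helper below are illustrative; in practice the LLM returns a template name plus parameter values, e.g. via JSON mode, and the parameters bind safely instead of being interpolated into the SQL):

```python
import re

# Vetted, pre-written queries with named bind parameters
QUERY_TEMPLATES = {
    "top_customers_by_revenue": (
        "SELECT c.name, SUM(o.total) AS revenue "
        "FROM customers c JOIN orders o ON c.id = o.customer_id "
        "WHERE o.created_at >= :start_date "
        "GROUP BY c.name ORDER BY revenue DESC LIMIT :max_rows"
    ),
    "order_count_by_status": (
        "SELECT o.status, COUNT(*) AS order_count "
        "FROM orders o GROUP BY o.status"
    ),
}

def fill_template(name: str, params: dict) -> tuple[str, dict]:
    """Look up a template and check that all bind parameters are supplied."""
    sql = QUERY_TEMPLATES[name]  # KeyError surfaces an invalid template pick
    required = set(re.findall(r":(\w+)", sql))
    missing = required - params.keys()
    if missing:
        raise ValueError(f"Missing parameters: {sorted(missing)}")
    # Pass only the parameters the template uses, as bind values
    return sql, {key: params[key] for key in required}
```

The returned (sql, params) pair drops straight into conn.execute(text(sql), params), so every query that reaches the database is one you wrote and tested yourself.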