A SQL query agent takes plain English questions, writes SQL, runs it against a real database, and explains the results. No dashboards, no query builders – just a conversation. The trick is giving the LLM the right tools and guardrails so it can explore your schema, execute queries safely, and self-correct when something goes wrong.

Here’s what the full agent looks like. We’ll build it piece by piece.

pip install openai
import sqlite3
import json
from openai import OpenAI

client = OpenAI()

# Create a sample database
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
cursor = conn.cursor()

cursor.executescript("""
CREATE TABLE customers (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    email TEXT UNIQUE,
    plan TEXT CHECK(plan IN ('free', 'pro', 'enterprise')),
    created_at TEXT DEFAULT (date('now'))
);

CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    amount REAL NOT NULL,
    status TEXT CHECK(status IN ('pending', 'completed', 'refunded')),
    created_at TEXT DEFAULT (date('now'))
);

CREATE TABLE products (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    category TEXT,
    price REAL NOT NULL
);

CREATE TABLE order_items (
    id INTEGER PRIMARY KEY,
    order_id INTEGER REFERENCES orders(id),
    product_id INTEGER REFERENCES products(id),
    quantity INTEGER NOT NULL,
    unit_price REAL NOT NULL
);

INSERT INTO customers (name, email, plan) VALUES
    ('Alice Chen', '[email protected]', 'enterprise'),
    ('Bob Martinez', '[email protected]', 'pro'),
    ('Carol White', '[email protected]', 'free'),
    ('Dave Kim', '[email protected]', 'pro'),
    ('Eve Johnson', '[email protected]', 'enterprise');

INSERT INTO products (name, category, price) VALUES
    ('Widget Pro', 'hardware', 49.99),
    ('Data Sync', 'software', 19.99),
    ('Cloud Pack', 'software', 99.99),
    ('Sensor Kit', 'hardware', 149.99);

INSERT INTO orders (customer_id, amount, status, created_at) VALUES
    (1, 249.95, 'completed', '2026-01-15'),
    (1, 99.99, 'completed', '2026-02-01'),
    (2, 49.99, 'completed', '2026-01-20'),
    (3, 19.99, 'pending', '2026-02-10'),
    (4, 289.95, 'completed', '2026-02-05'),
    (5, 349.97, 'completed', '2026-01-25'),
    (2, 99.99, 'refunded', '2026-02-12');

INSERT INTO order_items (order_id, product_id, quantity, unit_price) VALUES
    (1, 1, 5, 49.99),
    (2, 3, 1, 99.99),
    (3, 1, 1, 49.99),
    (4, 2, 1, 19.99),
    (5, 2, 2, 19.99),
    (5, 1, 2, 49.99),
    (5, 4, 1, 149.99),
    (6, 3, 2, 99.99),
    (6, 4, 1, 149.99),
    (7, 3, 1, 99.99);
""")
conn.commit()

That gives us a small but realistic e-commerce database with customers, orders, products, and line items. Enough to ask real questions.

Building the Agent Tools

The agent needs three tools: one to inspect the database schema, one to execute queries, and one to format results. Each tool is a plain function with strict guardrails.

DANGEROUS_KEYWORDS = {
    "INSERT", "UPDATE", "DELETE", "DROP", "ALTER",
    "CREATE", "TRUNCATE", "REPLACE", "GRANT", "REVOKE",
}

def get_schema() -> str:
    """Return the database schema as CREATE TABLE statements."""
    cursor.execute("SELECT sql FROM sqlite_master WHERE type='table' AND sql IS NOT NULL")
    tables = cursor.fetchall()
    schema = "\n\n".join(row[0] for row in tables)
    return schema


def execute_query(sql: str) -> str:
    """Execute a read-only SQL query with safety checks."""
    sql_stripped = sql.strip().rstrip(";")

    # Block anything that isn't a SELECT
    tokens = sql_stripped.upper().split()
    if not tokens:
        return json.dumps({"error": "Empty query"})

    if tokens[0] != "SELECT":
        return json.dumps({"error": "Only SELECT queries are allowed"})

    # Check for dangerous keywords anywhere in the query,
    # matching them as standalone whitespace-separated words
    upper_tokens = set(sql_stripped.upper().split())
    forbidden = DANGEROUS_KEYWORDS & upper_tokens
    if forbidden:
        return json.dumps({"error": f"Forbidden keyword: {sorted(forbidden)[0]}"})

    try:
        cursor.execute(sql_stripped)
        columns = [desc[0] for desc in cursor.description] if cursor.description else []
        rows = cursor.fetchmany(100)
        results = [dict(zip(columns, row)) for row in rows]
        return json.dumps({
            "columns": columns,
            "rows": results,
            "row_count": len(results),
            "truncated": len(results) == 100
        }, default=str)
    except sqlite3.Error as e:
        return json.dumps({"error": str(e)})


def format_results(data: str, format_type: str = "table") -> str:
    """Format query results as a readable table or summary."""
    parsed = json.loads(data)
    if "error" in parsed:
        return f"Error: {parsed['error']}"

    columns = parsed["columns"]
    rows = parsed["rows"]

    if not rows:
        return "No results found."

    if format_type == "summary":
        return f"Query returned {len(rows)} row(s) with columns: {', '.join(columns)}"

    # Simple text table
    col_widths = {col: max(len(str(col)), max(len(str(row.get(col, ""))) for row in rows)) for col in columns}
    header = " | ".join(col.ljust(col_widths[col]) for col in columns)
    separator = "-+-".join("-" * col_widths[col] for col in columns)
    body = "\n".join(
        " | ".join(str(row.get(col, "")).ljust(col_widths[col]) for col in columns)
        for row in rows
    )
    return f"{header}\n{separator}\n{body}"

The execute_query function does two critical things: it rejects any statement whose first token isn't SELECT, and it scans the remaining whitespace-separated tokens for dangerous keywords that could be embedded in subqueries or CTEs. Whitespace tokenization is a blunt instrument, though – a keyword pressed against punctuation can slip past it, which is why the stricter validator later in this post matches on word boundaries instead. The function also caps results at 100 rows to keep token usage under control.

Wiring Up the Agent Loop

Now we define the tool schemas and the agent loop. The LLM calls get_schema first to understand the database, then writes and executes queries.

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_schema",
            "description": "Get the database schema (all CREATE TABLE statements). Call this first to understand available tables and columns.",
            "parameters": {
                "type": "object",
                "properties": {},
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "execute_query",
            "description": "Execute a read-only SQL SELECT query against the database. Returns JSON with columns, rows, and row count.",
            "parameters": {
                "type": "object",
                "properties": {
                    "sql": {
                        "type": "string",
                        "description": "A SQL SELECT query to execute. Only SELECT is allowed.",
                    },
                },
                "required": ["sql"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "format_results",
            "description": "Format raw query results into a readable table or summary.",
            "parameters": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "string",
                        "description": "JSON string of query results from execute_query.",
                    },
                    "format_type": {
                        "type": "string",
                        "enum": ["table", "summary"],
                        "description": "Output format: 'table' for full results, 'summary' for row count overview.",
                    },
                },
                "required": ["data"],
            },
        },
    },
]

SYSTEM_PROMPT = """You are a SQL query agent. You help users answer questions about data by writing and executing SQL queries.

Your workflow:
1. Call get_schema to see what tables and columns exist
2. Write a SQL SELECT query based on the user's question
3. Call execute_query to run it
4. Interpret the results and explain them clearly

Rules:
- Always inspect the schema before writing queries
- Only use SELECT statements
- Limit results to 50 rows unless the user wants more
- If a query fails, read the error and try a corrected version
- Explain your findings in plain language after getting results"""

# Tool dispatch map
tool_dispatch = {
    "get_schema": lambda **kwargs: get_schema(),
    "execute_query": lambda **kwargs: execute_query(kwargs["sql"]),
    "format_results": lambda **kwargs: format_results(kwargs["data"], kwargs.get("format_type", "table")),
}


def run_sql_agent(question: str, conversation: list | None = None) -> tuple[str, list]:
    """Run the SQL agent. Returns the response and updated conversation history."""
    if conversation is None:
        conversation = [{"role": "system", "content": SYSTEM_PROMPT}]

    conversation.append({"role": "user", "content": question})

    for _ in range(10):  # max 10 tool calls per turn
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=conversation,
            tools=tools,
        )

        msg = response.choices[0].message
        conversation.append(msg)

        # If no tool calls, we have the final answer
        if not msg.tool_calls:
            return msg.content, conversation

        # Execute each tool call
        for tc in msg.tool_calls:
            func_name = tc.function.name
            args = json.loads(tc.function.arguments)

            print(f"  [{func_name}] {json.dumps(args)[:200]}")

            if func_name in tool_dispatch:
                result = tool_dispatch[func_name](**args)
            else:
                result = json.dumps({"error": f"Unknown tool: {func_name}"})

            conversation.append({
                "role": "tool",
                "tool_call_id": tc.id,
                "content": result,
            })

    return "Agent reached the tool call limit.", conversation

The run_sql_agent function returns the conversation history so you can pass it back for follow-up questions. That’s what makes multi-turn conversations work.

Multi-Turn Conversations

The real power shows up when the agent can refine queries across turns. The user asks a vague question, the agent runs a query, and the user drills deeper based on the results.

# First question
answer, convo = run_sql_agent("Who are our biggest customers?")
print(answer)
# The agent will call get_schema, then run something like:
# SELECT c.name, SUM(o.amount) as total_spent
# FROM customers c JOIN orders o ON c.id = o.customer_id
# WHERE o.status = 'completed'
# GROUP BY c.id ORDER BY total_spent DESC

# Follow-up using the same conversation
answer, convo = run_sql_agent("What did Alice order specifically?", convo)
print(answer)
# It already knows the schema and can immediately query:
# SELECT p.name, oi.quantity, oi.unit_price
# FROM order_items oi
# JOIN orders o ON oi.order_id = o.id
# JOIN products p ON oi.product_id = p.id
# JOIN customers c ON o.customer_id = c.id
# WHERE c.name = 'Alice Chen'

# Push further
answer, convo = run_sql_agent("Compare that to the average order value across all customers", convo)
print(answer)

Each follow-up reuses the conversation history, so the agent remembers context. It knows “that” in the third question refers to Alice’s spending. This is the difference between a query tool and an agent – it maintains state and reasons across turns.

Safety: Preventing Bad Queries

The keyword-blocking approach above catches most attacks, but production systems need more layers. Here’s a stricter validator that also handles sneaky injection patterns.

import re

def validate_query(sql: str) -> tuple[bool, str]:
    """Validate a SQL query for safety. Returns (is_safe, reason)."""
    normalized = sql.strip().upper()

    # Must start with SELECT or WITH (for CTEs)
    if not (normalized.startswith("SELECT") or normalized.startswith("WITH")):
        return False, "Query must start with SELECT or WITH"

    # Block write operations even in subqueries
    write_patterns = [
        r'\bINSERT\b', r'\bUPDATE\b', r'\bDELETE\b', r'\bDROP\b',
        r'\bALTER\b', r'\bCREATE\b', r'\bTRUNCATE\b', r'\bREPLACE\b',
        r'\bGRANT\b', r'\bREVOKE\b', r'\bATTACH\b', r'\bDETACH\b',
    ]
    for pattern in write_patterns:
        if re.search(pattern, normalized):
            keyword = pattern.replace(r'\b', '').strip()
            return False, f"Forbidden keyword: {keyword}"

    # Block multiple statements (semicolons inside the query)
    # Remove string literals first so we don't flag semicolons inside quotes
    no_strings = re.sub(r"'[^']*'", "", sql)
    if ";" in no_strings:
        return False, "Multiple statements not allowed"

    # Block PRAGMA and other SQLite-specific admin commands
    if "PRAGMA" in normalized:
        return False, "PRAGMA not allowed"

    return True, "OK"


# Upgrade execute_query to use the validator
def execute_query_safe(sql: str) -> str:
    """Execute a validated read-only SQL query."""
    is_safe, reason = validate_query(sql)
    if not is_safe:
        return json.dumps({"error": reason})

    try:
        cursor.execute(sql.strip().rstrip(";"))
        columns = [desc[0] for desc in cursor.description] if cursor.description else []
        rows = cursor.fetchmany(100)
        results = [dict(zip(columns, row)) for row in rows]
        return json.dumps({
            "columns": columns,
            "rows": results,
            "row_count": len(results),
            "truncated": len(results) == 100
        }, default=str)
    except sqlite3.Error as e:
        return json.dumps({"error": str(e)})

Key safety measures in this design:

  • Read-only enforcement: Both token-level and regex-based checks block write operations
  • Multi-statement blocking: Strips string literals first, then checks for semicolons – this defeats '; DROP TABLE customers; -- style injections
  • CTE support: Allows WITH clauses since they’re common in analytical queries
  • PRAGMA blocking: Prevents schema modifications and database attach attacks specific to SQLite
  • Result limiting: fetchmany(100) caps output so a SELECT * on a million-row table doesn’t blow up your context window
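The literal-stripping trick in the multi-statement check is easy to verify in isolation. This standalone helper (has_extra_statement is my name, not the article's) shows a quoted semicolon passing through while statement chaining is caught:

```python
import re

def has_extra_statement(sql: str) -> bool:
    """True if a semicolon survives after single-quoted literals are removed."""
    no_strings = re.sub(r"'[^']*'", "", sql)
    return ";" in no_strings

print(has_extra_statement("SELECT * FROM notes WHERE body = 'a; b'"))  # False
print(has_extra_statement("SELECT 1; DROP TABLE customers"))           # True
```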

For production, also consider running queries on a read-only database replica or using SQLite’s SQLITE_OPEN_READONLY flag:

# Open database in read-only mode at the connection level
conn = sqlite3.connect("file:mydb.db?mode=ro", uri=True)

This gives you defense in depth. Even if a crafted query somehow bypasses the text-level checks, the database connection itself will refuse writes.

Common Errors and Fixes

“Only SELECT queries are allowed” on valid queries. If your query starts with a comment or whitespace, the keyword check fails. Strip and normalize before checking:

sql = sql.strip()
if sql.startswith("--"):
    sql = "\n".join(line for line in sql.split("\n") if not line.strip().startswith("--"))

LLM generates queries with wrong column names. This happens when the agent skips the schema inspection step. Make the system prompt explicit: “Always call get_schema before writing any query.” You can also enforce this in code by checking if get_schema was called before allowing execute_query.
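One way to sketch that code-level enforcement: track whether the schema has been inspected and refuse queries until it has. SchemaGate and its stand-in return values are illustrative, not part of the agent above – the real versions would delegate to get_schema() and execute_query().

```python
import json

class SchemaGate:
    """Refuse to execute queries until the schema has been inspected."""

    def __init__(self):
        self.schema_seen = False

    def get_schema(self) -> str:
        self.schema_seen = True
        return "CREATE TABLE customers (...)"  # stand-in schema text

    def execute_query(self, sql: str) -> str:
        if not self.schema_seen:
            return json.dumps({"error": "Call get_schema before executing queries"})
        return json.dumps({"rows": []})  # stand-in for real execution

gate = SchemaGate()
print(gate.execute_query("SELECT 1"))  # error: schema not inspected yet
gate.get_schema()
print(gate.execute_query("SELECT 1"))  # now allowed
```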

Token limits hit on large result sets. Cap rows in your execute_query function and tell the agent to use LIMIT clauses. If results are still too large, return only the first few rows with a count: “Showing 10 of 5,432 rows.”
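You can also rewrite the SQL before execution rather than trusting the model to remember. This is a naive sketch (ensure_limit is a name I'm introducing, not from the agent above) – it just looks for the LIMIT keyword anywhere as a whole word:

```python
import re

def ensure_limit(sql: str, max_rows: int = 50) -> str:
    """Append a LIMIT clause when the query lacks one."""
    stripped = sql.strip().rstrip(";")
    if re.search(r"\bLIMIT\b", stripped, re.IGNORECASE):
        return stripped
    return f"{stripped} LIMIT {max_rows}"

print(ensure_limit("SELECT * FROM orders"))          # SELECT * FROM orders LIMIT 50
print(ensure_limit("SELECT * FROM orders LIMIT 5"))  # unchanged
```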

Agent loops endlessly on syntax errors. Set a hard limit on tool calls per turn (we used 10 above). If the agent can’t fix its SQL after 3 attempts, it’s usually better to ask the user to rephrase.

SQLite type mismatches. SQLite is loosely typed, so WHERE created_at > '2026-01-01' works on TEXT dates but produces wrong results with mixed formats. Standardize your date columns or tell the agent the expected format in the system prompt.
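The pitfall is easy to reproduce. Both of these timestamps fall in 2026, but the text comparison only catches the ISO-formatted one:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (created_at TEXT)")
# One ISO date, one US-format date; both are in 2026
conn.executemany("INSERT INTO events VALUES (?)",
                 [("2026-01-15",), ("01/20/2026",)])

count = conn.execute(
    "SELECT COUNT(*) FROM events WHERE created_at > '2026-01-01'"
).fetchone()[0]
print(count)  # 1, because '01/20/2026' sorts before '2026-01-01' as text
```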

Agent tries to join tables that don’t have a relationship. Include foreign key information in the schema tool output. The sql column from sqlite_master includes REFERENCES clauses, which is why we return the raw CREATE TABLE statements.