The Quick Version

AI agents become useful when they can access real data. The pattern is straightforward: define tools that wrap your database queries and API calls, let the LLM decide which tools to call, and execute them with proper safety checks.

1
pip install openai sqlalchemy httpx
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
import json
from openai import OpenAI
from sqlalchemy import create_engine, text

client = OpenAI()
engine = create_engine("sqlite:///ecommerce.db")

def query_database(sql: str) -> str:
    """Execute a read-only SQL query and return results."""
    # Safety: only allow SELECT statements
    normalized = sql.strip().upper()
    if not normalized.startswith("SELECT"):
        return "Error: Only SELECT queries are allowed."

    with engine.connect() as conn:
        result = conn.execute(text(sql))
        rows = [dict(row._mapping) for row in result.fetchmany(50)]
        return json.dumps(rows, default=str)

# Test it
print(query_database("SELECT COUNT(*) as total FROM orders"))

That gives your agent read access to a database with a hard guardrail against writes. Now wire it into an agent.

Building the Agent with Database Tools

Define the tool schema so the LLM knows what queries it can run. Include your database schema in the system prompt so the model writes correct SQL.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
tools = [
    {
        "type": "function",
        "function": {
            "name": "query_database",
            "description": "Execute a read-only SQL query against the ecommerce database. Only SELECT statements allowed.",
            "parameters": {
                "type": "object",
                "properties": {
                    "sql": {
                        "type": "string",
                        "description": "SQL SELECT query to execute",
                    },
                },
                "required": ["sql"],
            },
        },
    },
]

SYSTEM_PROMPT = """You are a data analyst agent with access to an ecommerce database.

Database schema:
- customers (id, name, email, created_at)
- orders (id, customer_id, total, status, created_at)
- order_items (id, order_id, product_id, quantity, price)
- products (id, name, category, price, stock)

Rules:
- Only use SELECT queries. Never modify data.
- Limit results to 50 rows unless the user asks for more.
- Always explain your findings after querying."""

def agent_chat(question: str) -> str:
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": question},
    ]

    for _ in range(5):  # max 5 tool calls per question
        response = client.chat.completions.create(
            model="gpt-4o", messages=messages, tools=tools,
        )
        msg = response.choices[0].message
        messages.append(msg)

        if not msg.tool_calls:
            return msg.content

        for tc in msg.tool_calls:
            args = json.loads(tc.function.arguments)
            result = query_database(args["sql"])
            messages.append({
                "role": "tool",
                "tool_call_id": tc.id,
                "content": result,
            })

    return "Agent hit tool call limit."

# Ask questions in natural language
print(agent_chat("What are the top 5 products by revenue this month?"))
print(agent_chat("Which customers haven't ordered in the last 90 days?"))

The agent translates natural language into SQL, runs it, reads the results, and explains the findings. It can chain multiple queries to answer complex questions.

Adding REST API Access

Give the agent the ability to call external APIs — weather data, payment systems, CRM tools, anything with an HTTP endpoint.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
import httpx

API_CONFIGS = {
    "get_customer_crm": {
        "base_url": "https://api.crm.example.com",
        "headers": {"Authorization": "Bearer crm_token_here"},
    },
    "get_shipping_status": {
        "base_url": "https://api.shipping.example.com",
        "headers": {"X-API-Key": "ship_key_here"},
    },
}

def call_api(endpoint: str, method: str = "GET", params: dict = None) -> str:
    """Call a pre-configured API endpoint."""
    # Allowlist of permitted endpoints
    allowed_prefixes = [
        "https://api.crm.example.com/",
        "https://api.shipping.example.com/",
    ]

    if not any(endpoint.startswith(prefix) for prefix in allowed_prefixes):
        return json.dumps({"error": "Endpoint not in allowlist"})

    try:
        response = httpx.request(
            method=method,
            url=endpoint,
            params=params,
            headers={"User-Agent": "AgentBot/1.0"},
            timeout=10,
        )
        return response.text[:5000]  # limit response size
    except Exception as e:
        return json.dumps({"error": str(e)})

api_tool = {
    "type": "function",
    "function": {
        "name": "call_api",
        "description": "Call an external REST API. Only pre-approved endpoints are allowed.",
        "parameters": {
            "type": "object",
            "properties": {
                "endpoint": {"type": "string", "description": "Full URL to call"},
                "method": {"type": "string", "enum": ["GET", "POST"], "description": "HTTP method"},
                "params": {
                    "type": "object",
                    "description": "Query parameters or request body",
                    "additionalProperties": {"type": "string"},
                },
            },
            "required": ["endpoint"],
        },
    },
}

The URL allowlist is critical. Without it, the LLM could call arbitrary endpoints — internal services, cloud metadata APIs, or anything else it decides to try.

Multi-Tool Agent with Database + API

Combine database and API tools so the agent can cross-reference data sources:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
def handle_tool_call(tool_call) -> str:
    name = tool_call.function.name
    args = json.loads(tool_call.function.arguments)

    if name == "query_database":
        return query_database(args["sql"])
    elif name == "call_api":
        return call_api(
            args["endpoint"],
            args.get("method", "GET"),
            args.get("params"),
        )
    return json.dumps({"error": f"Unknown tool: {name}"})

all_tools = [tools[0], api_tool]

def multi_tool_agent(question: str) -> str:
    messages = [
        {
            "role": "system",
            "content": SYSTEM_PROMPT + """

You also have access to REST APIs:
- CRM API: https://api.crm.example.com/customers/{id} — get customer details
- Shipping API: https://api.shipping.example.com/track/{tracking_id} — track shipments

Use the database for order/product data and APIs for customer details and shipping status.""",
        },
        {"role": "user", "content": question},
    ]

    for _ in range(8):
        response = client.chat.completions.create(
            model="gpt-4o", messages=messages, tools=all_tools,
        )
        msg = response.choices[0].message
        messages.append(msg)

        if not msg.tool_calls:
            return msg.content

        for tc in msg.tool_calls:
            result = handle_tool_call(tc)
            messages.append({"role": "tool", "tool_call_id": tc.id, "content": result})

    return "Agent hit tool call limit."

Security: Preventing SQL Injection and Data Leaks

LLM-generated SQL is inherently risky. Layer multiple defenses:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
import re

BLOCKED_KEYWORDS = {"DROP", "DELETE", "UPDATE", "INSERT", "ALTER", "TRUNCATE", "EXEC", "EXECUTE"}
BLOCKED_PATTERNS = [
    r";\s*(DROP|DELETE|UPDATE|INSERT|ALTER)",  # chained destructive statements
    r"--",                                      # SQL comments (bypass attempts)
    r"/\*",                                     # block comments
    r"UNION\s+SELECT",                          # UNION injection
]

def safe_query(sql: str) -> str:
    """Execute SQL with multiple safety layers."""
    normalized = sql.strip().upper()

    # Layer 1: Must start with SELECT
    if not normalized.startswith("SELECT"):
        return json.dumps({"error": "Only SELECT queries allowed"})

    # Layer 2: Block dangerous keywords anywhere in the query
    tokens = set(re.findall(r'\b\w+\b', normalized))
    blocked = tokens & BLOCKED_KEYWORDS
    if blocked:
        return json.dumps({"error": f"Blocked keywords: {blocked}"})

    # Layer 3: Block injection patterns
    for pattern in BLOCKED_PATTERNS:
        if re.search(pattern, normalized):
            return json.dumps({"error": "Query contains blocked pattern"})

    # Layer 4: Use a read-only database connection
    with engine.connect() as conn:
        # Set statement timeout to prevent long-running queries
        result = conn.execute(text(sql))
        rows = [dict(row._mapping) for row in result.fetchmany(50)]

    # Layer 5: Redact sensitive columns
    sensitive_fields = {"password", "ssn", "credit_card", "token", "secret"}
    for row in rows:
        for key in list(row.keys()):
            if key.lower() in sensitive_fields:
                row[key] = "***REDACTED***"

    return json.dumps(rows, default=str)

For production, also use a dedicated read-only database user with SELECT-only permissions. Defense in depth — don’t rely on any single layer.

Rate Limiting and Cost Control

Agents can go wild with tool calls. Add limits:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
from collections import defaultdict
import time

class RateLimitedAgent:
    def __init__(self, max_queries_per_minute: int = 20, max_api_calls_per_minute: int = 10):
        self.query_timestamps: list[float] = []
        self.api_timestamps: list[float] = []
        self.max_queries = max_queries_per_minute
        self.max_api_calls = max_api_calls_per_minute

    def check_rate_limit(self, call_type: str) -> bool:
        now = time.time()
        if call_type == "query":
            self.query_timestamps = [t for t in self.query_timestamps if now - t < 60]
            if len(self.query_timestamps) >= self.max_queries:
                return False
            self.query_timestamps.append(now)
        elif call_type == "api":
            self.api_timestamps = [t for t in self.api_timestamps if now - t < 60]
            if len(self.api_timestamps) >= self.max_api_calls:
                return False
            self.api_timestamps.append(now)
        return True

limiter = RateLimitedAgent()

Common Errors and Fixes

Agent writes invalid SQL for your schema

The model doesn’t know your exact column names. Include the full schema with column types in the system prompt. For large schemas (50+ tables), only include tables relevant to the user’s likely questions.

API responses are too large for the context window

Truncate API responses to 3000-5000 characters. Or add a “summarize_response” step that uses a cheap model to extract relevant data from large API payloads before feeding them back to the agent.

Agent calls the same query repeatedly

Add deduplication: track recent queries and return cached results if the same SQL runs within 60 seconds. This also saves database load.

Agent generates destructive SQL despite guardrails

Your keyword-based filter missed a creative bypass. Use parameterized queries where possible, and always run the database connection with a read-only user. The application-level filter is a safety net, not the primary defense.

Timeouts on slow queries

Set a query timeout at both the application level (conn.execute(text(sql).execution_options(timeout=5))) and the database level. Kill queries that run longer than 5 seconds — the agent can reformulate with a simpler query.