The best way to build a data analysis agent is to let it write and execute Python code. You give it a natural language question like “show me the top 10 customers by revenue” and it generates pandas code, runs it in a sandbox, and returns the result. This pattern works better than trying to teach an LLM all of SQL or forcing it to use rigid query builders.

Here’s a working agent that takes a CSV file and answers questions about it with code:

import pandas as pd
from openai import OpenAI
import matplotlib.pyplot as plt
import io
import base64

class DataAnalysisAgent:
    def __init__(self, df, api_key):
        self.df = df
        self.client = OpenAI(api_key=api_key)
        self.conversation_history = []

    def analyze(self, question):
        # Build context about the dataframe
        df_info = f"""
You have access to a pandas DataFrame called 'df' with:
- {len(self.df)} rows
- Columns: {list(self.df.columns)}
- Data types: {self.df.dtypes.to_dict()}
- First few rows:
{self.df.head(3).to_string()}

Write Python code to answer this question: {question}

Rules:
- Use the 'df' variable (already loaded)
- Print your final answer
- For charts, use plt.savefig('/tmp/chart.png') and print "CHART_SAVED"
- Keep it simple and executable
"""

        messages = [
            {"role": "system", "content": "You are a data analyst who writes Python code to answer questions. Return only executable code, no explanations."},
            {"role": "user", "content": df_info}
        ]

        response = self.client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            temperature=0
        )

        code = response.choices[0].message.content.strip()
        # Remove markdown code fences if present
        if code.startswith("```python"):
            code = code.split("```python")[1].split("```")[0].strip()
        elif code.startswith("```"):
            code = code.split("```")[1].split("```")[0].strip()

        return self.execute_code(code)

    def execute_code(self, code):
        # Restricted namespace. Note: this limits which variables the
        # generated code sees, but it is NOT a security boundary --
        # exec still exposes builtins like open() and __import__.
        # Use the Docker sandbox for untrusted code.
        import numpy as np
        namespace = {
            'df': self.df,
            'pd': pd,
            'plt': plt,
            'np': np
        }

        # Capture stdout
        from io import StringIO
        import sys
        old_stdout = sys.stdout
        sys.stdout = captured_output = StringIO()

        try:
            exec(code, namespace, namespace)
            output = captured_output.getvalue()

            # Check if a chart was created
            chart_data = None
            if "CHART_SAVED" in output:
                with open('/tmp/chart.png', 'rb') as f:
                    chart_data = base64.b64encode(f.read()).decode()

            return {
                "success": True,
                "output": output.replace("CHART_SAVED", "").strip(),
                "chart": chart_data,
                "code": code
            }
        except Exception as e:
            return {
                "success": False,
                "error": str(e),
                "code": code
            }
        finally:
            sys.stdout = old_stdout

# Usage
df = pd.read_csv('sales_data.csv')
agent = DataAnalysisAgent(df, api_key="your-openai-key")

result = agent.analyze("What are the top 5 products by total revenue?")
print(result['output'])

This gives you a working agent in under 100 lines. The key insight is that we pass the dataframe schema to the LLM, let it generate code, then execute it in a controlled namespace. One caveat: the namespace only controls which variables the generated code sees. Because exec still exposes Python's builtins, it does not actually block filesystem access or network calls. Treat it as a convenience for trusted use, and reach for the Docker sandbox described below when you need real isolation.

Why Code Execution Beats Query Generation

I’ve tried both approaches. Query generation (making the LLM write SQL) fails when you have complex joins or need pandas-specific operations like pivots. Code execution handles everything: data cleaning, statistical analysis, machine learning, and visualization.

The LLM can use any library you provide in the namespace. Want to do sentiment analysis? Add TextBlob to the allowed imports. Need time series forecasting? Give it statsmodels. This flexibility is impossible with SQL or query builders.
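As a sketch of how that extension works, here the stdlib statistics module stands in for a heavier library like statsmodels, and a plain dict of lists stands in for a DataFrame, so the example stays self-contained:

```python
import io
import statistics
import sys
from contextlib import redirect_stdout

def build_namespace(df, extra_libs=None):
    # Every key here is a name the LLM-generated code may reference.
    namespace = {"df": df, "statistics": statistics}
    namespace.update(extra_libs or {})
    return namespace

# Hypothetical LLM-generated code that uses the extra library:
code = "print(statistics.mean(df['revenue']))"

ns = build_namespace({"revenue": [100, 200, 300]})
buf = io.StringIO()
with redirect_stdout(buf):
    exec(code, ns, ns)  # same restricted-namespace pattern as above
result_line = buf.getvalue().strip()  # "200"
```

Whatever you add to the namespace, mention it in the prompt too, or the LLM won't know the library is available.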

Building a Production-Grade Sandbox

The example above uses a basic sandbox with restricted globals. For production, use Docker or a process isolation library:

import docker
import tempfile
import json

class DockerSandbox:
    def __init__(self, df):
        self.client = docker.from_env()
        self.df = df

    def execute_code(self, code):
        # Write dataframe and code to temp directory
        with tempfile.TemporaryDirectory() as tmpdir:
            # Save dataframe
            df_path = f"{tmpdir}/data.parquet"
            self.df.to_parquet(df_path)

            # Create execution script. textwrap.indent keeps multi-line
            # generated code aligned under the try block (an f-string
            # alone would only indent the first line).
            import textwrap
            script = f"""
import pandas as pd
import matplotlib.pyplot as plt
import sys
import json

df = pd.read_parquet('/workspace/data.parquet')

try:
{textwrap.indent(code, '    ')}
    result = {{"success": True, "output": "Code executed successfully"}}
except Exception as e:
    result = {{"success": False, "error": str(e)}}

print(json.dumps(result))
"""

            script_path = f"{tmpdir}/run.py"
            with open(script_path, 'w') as f:
                f.write(script)

            # Run in container with resource limits
            container = self.client.containers.run(
                "python:3.11-slim",
                "python /workspace/run.py",
                volumes={tmpdir: {'bind': '/workspace', 'mode': 'rw'}},
                network_mode='none',  # No network access
                mem_limit='512m',     # 512MB RAM limit
                cpu_period=100000,
                cpu_quota=50000,      # 50% CPU
                remove=True,
                detach=False
            )

            # With detach=False, run() returns the container's logs as
            # bytes. The JSON status is the last line; anything the
            # generated code printed comes before it.
            lines = container.decode().strip().splitlines()
            result = json.loads(lines[-1])
            if result.get("success"):
                result["output"] = "\n".join(lines[:-1])
            return result

# Usage
sandbox = DockerSandbox(df)
result = sandbox.execute_code("print(df.groupby('category')['revenue'].sum())")

This approach gives you real isolation. The container has no network access and a hard memory cap, so if the LLM generates malicious code, it can't reach your network or consume unlimited resources. Note that python:3.11-slim runs as root by default; pass user='nobody' to containers.run() (or build an image with a dedicated non-root user) if you also want to drop root privileges inside the container.

The downside is performance. Spinning up a Docker container adds 1-2 seconds per query. For interactive analysis, that’s too slow. Use the simple namespace sandbox for development and Docker for untrusted code.
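If container startup is too slow but in-process exec feels too fragile, a middle ground is to run the generated code in a separate Python process with a wall-clock timeout. This is a sketch, not a security boundary: it contains crashes and runaway loops, but a child process can still touch the filesystem and network.

```python
import os
import subprocess
import sys
import tempfile

def run_with_timeout(code, timeout=10):
    """Run generated code in a child Python process with a timeout.
    Contains infinite loops and hard crashes; NOT real sandboxing."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run(
            [sys.executable, path],
            capture_output=True, text=True, timeout=timeout,
        )
        return {"success": proc.returncode == 0,
                "output": proc.stdout, "error": proc.stderr}
    except subprocess.TimeoutExpired:
        return {"success": False, "output": "",
                "error": f"timed out after {timeout}s"}
    finally:
        os.unlink(path)

ok = run_with_timeout("print(2 + 2)")
stuck = run_with_timeout("while True: pass", timeout=1)  # killed after 1s
```

The child process would need the dataframe handed over explicitly (e.g. via a parquet file, as in the Docker version), since it no longer shares memory with the agent.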

Connecting to Databases Instead of DataFrames

For large datasets, loading everything into a dataframe kills performance. Connect the agent directly to your database:

import sqlalchemy as sa
from langchain_community.agent_toolkits import SQLDatabaseToolkit, create_sql_agent
from langchain_community.utilities import SQLDatabase
from langchain_openai import ChatOpenAI

# Connect to database
engine = sa.create_engine('postgresql://user:pass@localhost/sales')
db = SQLDatabase(engine)

# Create agent with SQL toolkit
llm = ChatOpenAI(model="gpt-4o", temperature=0)
toolkit = SQLDatabaseToolkit(db=db, llm=llm)

agent = create_sql_agent(
    llm=llm,
    toolkit=toolkit,
    verbose=True,
    agent_type="openai-tools"
)

# Ask questions in natural language
response = agent.invoke({"input": "Show me monthly revenue trends for 2025"})
print(response["output"])

LangChain’s SQL agent writes queries, executes them, and interprets results. It can handle follow-up questions and learns from errors. If a query fails, it reads the error message and tries again with a corrected query.
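You can give the code-execution agent from the first example the same self-correction behavior with a small retry loop. This sketch keeps it LLM-agnostic by injecting `generate` and `execute` callables (both hypothetical names); the stubs below stand in for the LLM and the sandbox:

```python
def analyze_with_retry(generate, execute, question, max_attempts=3):
    """If generated code fails, feed the error message back and ask
    for a corrected version -- mirroring what the SQL agent does."""
    prompt = question
    for _ in range(max_attempts):
        code = generate(prompt)
        result = execute(code)
        if result["success"]:
            return result
        prompt = (f"{question}\nYour previous code failed with:\n"
                  f"{result['error']}\nFix it and try again.")
    return result

# Stub generator: fails once, then succeeds -- stands in for the LLM.
attempts = []
def fake_generate(prompt):
    attempts.append(prompt)
    return "1/0" if len(attempts) == 1 else "print('ok')"

def fake_execute(code):
    # Minimal stand-in for execute_code() from the first example.
    try:
        exec(code, {})
        return {"success": True, "output": "ok"}
    except Exception as e:
        return {"success": False, "error": str(e)}

result = analyze_with_retry(fake_generate, fake_execute, "test question")
```

In the real agent, `generate` would be the chat-completion call and `execute` would be `execute_code`; cap `max_attempts` so a hopeless question doesn't burn tokens forever.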

The agent has built-in tools for listing tables, describing schemas, and running queries with row limits. This prevents the LLM from writing SELECT * FROM giant_table and crashing your database.
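The idea behind those row limits is simple enough to enforce yourself if you build your own database tools. A sketch using stdlib sqlite3 and a made-up demo table as stand-ins for your real database:

```python
import sqlite3

def run_query_limited(conn, sql, max_rows=100):
    """Execute a query but cap how many rows come back, so a generated
    'SELECT * FROM giant_table' can't flood memory."""
    cur = conn.execute(sql)
    rows = cur.fetchmany(max_rows)
    truncated = cur.fetchone() is not None  # anything left beyond the cap?
    return rows, truncated

# Demo table (hypothetical): 500 rows of product/revenue pairs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, revenue REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [(f"p{i}", i * 10.0) for i in range(500)])

rows, truncated = run_query_limited(conn, "SELECT * FROM sales", max_rows=100)
```

When `truncated` is true, tell the LLM the result was cut off so it can aggregate or filter instead of retrying the same broad query.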

Handling Charts and Visualizations

When the agent generates a chart, you need to return it to the user. The simplest approach is to save the chart to a file and encode it as base64:

import os  # needed here, alongside base64 from the first example

def analyze_with_chart(self, question):
    # ... (same as before until code execution; 'output' holds the
    # stdout captured by execute_code)

    # After executing code
    chart_path = '/tmp/chart.png'
    if os.path.exists(chart_path):
        with open(chart_path, 'rb') as f:
            img_data = base64.b64encode(f.read()).decode()
            return {
                "output": output,
                "chart": f"data:image/png;base64,{img_data}"
            }
    return {"output": output}

On the frontend, display the chart with an <img> tag:

<img src="{result.chart}" alt="Generated chart" />

For interactive charts, use Plotly instead of matplotlib. The agent can generate HTML with embedded JavaScript:

import plotly.express as px

# In your namespace
namespace['px'] = px

# LLM generates:
fig = px.bar(df, x='product', y='revenue')
fig.write_html('/tmp/chart.html')
print("CHART_SAVED")

Then return the HTML as a string and inject it into an iframe. This gives you hover tooltips, zooming, and legends without extra work.
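A minimal sketch of that injection step, using only the stdlib (`chart_html_to_iframe` is a hypothetical helper; in production you would read the HTML that Plotly wrote to /tmp/chart.html):

```python
import html

def chart_html_to_iframe(chart_html, height=450):
    """Wrap chart HTML in an iframe via the srcdoc attribute.
    sandbox="allow-scripts" lets Plotly's JS run while keeping it
    from touching the parent page."""
    escaped = html.escape(chart_html, quote=True)
    return (f'<iframe srcdoc="{escaped}" height="{height}" '
            f'width="100%" sandbox="allow-scripts"></iframe>')

# Stand-in for the file Plotly would have written:
snippet = chart_html_to_iframe("<html><body>chart</body></html>")
```

srcdoc keeps everything in one response payload, so you don't need to serve /tmp/chart.html as a separate static file.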

Common Errors and Fixes

Error: “NameError: name ‘df’ is not defined”

The LLM sometimes forgets to use the df variable you provided. Fix this by being explicit in your prompt:

df_info = f"""
IMPORTANT: Use the variable 'df' which contains:
{self.df.head(3).to_string()}

Do not load data from files or URLs. The df variable is already loaded.
"""

Error: “MemoryError” or agent times out

The LLM generated code that loads too much data into memory. Add a row limit to your dataframe before passing it:

self.df = df.head(10000)  # Limit to 10k rows for analysis

For larger datasets, use database queries instead of loading into pandas.

Error: Chart doesn’t appear or is corrupted

The agent printed output before saving the chart, which breaks the CHART_SAVED detection. Fix by checking if the file exists rather than looking for a string:

chart_path = '/tmp/chart.png'
if os.path.exists(chart_path) and os.path.getsize(chart_path) > 0:
    with open(chart_path, 'rb') as f:
        chart_data = base64.b64encode(f.read()).decode()

Error: “ModuleNotFoundError” for pandas/numpy

Your sandbox doesn’t include required libraries. Install them in the Docker image or add to the namespace:

# Dockerfile
RUN pip install pandas numpy matplotlib seaborn plotly scikit-learn

When to Use OpenAI’s Code Interpreter vs Building Your Own

OpenAI’s Assistants API has a built-in code interpreter that handles sandboxing and file uploads. Use it when:

  • You want something that works out of the box
  • You don’t need custom libraries beyond pandas/numpy/matplotlib
  • You’re okay with OpenAI seeing your data

Build your own when:

  • You need custom libraries (database connectors, ML models, proprietary tools)
  • You want to keep data on-premises
  • You need finer control over execution timeouts and resource limits
  • You want to inspect or log the generated code before running it

I recommend starting with the simple namespace sandbox above, then upgrading to Docker if you need isolation. OpenAI’s code interpreter is convenient but less flexible.

The code execution pattern unlocks powerful data analysis capabilities. Your agent can answer questions that would take hours to manually code, and it learns from each interaction. Once you add memory and conversation history, it becomes a true data science copilot.
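The conversation_history field in the first example is a hook for exactly that. One way to wire it up, as a sketch (`build_messages` is a hypothetical helper; each history entry is assumed to be a (question, code, output) tuple appended after every analyze() call):

```python
def build_messages(system_prompt, history, question, max_turns=5):
    """Assemble chat messages for a follow-up question: the system
    prompt, the last few turns, then the new question."""
    messages = [{"role": "system", "content": system_prompt}]
    for q, code, output in history[-max_turns:]:  # bound the context size
        messages.append({"role": "user", "content": q})
        messages.append({"role": "assistant",
                         "content": f"{code}\n# output:\n{output}"})
    messages.append({"role": "user", "content": question})
    return messages

msgs = build_messages(
    "You are a data analyst who writes Python code.",
    [("top product?", "print(df['product'].mode())", "Widget")],
    "And what is its total revenue?",
)
```

Replaying prior code and output lets the model resolve references like "its" in the follow-up without re-deriving the earlier answer.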