The best way to build a data analysis agent is to let it write and execute Python code. You give it a natural language question like “show me the top 10 customers by revenue” and it generates pandas code, runs it in a sandbox, and returns the result. This pattern works better than trying to teach an LLM all of SQL or forcing it to use rigid query builders.
Here’s a working agent that takes a CSV file and answers questions about it with code:
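A minimal sketch of such an agent, assuming the official `openai` Python SDK and an `OPENAI_API_KEY` in the environment. The model name, prompt wording, and `customers.csv` filename are illustrative:

```python
import contextlib
import io
import pandas as pd

SYSTEM_PROMPT = """You are a data analyst. A pandas DataFrame named `df` is already loaded.
Answer the user's question by writing Python code that prints its result.
Respond with ONLY the code -- no explanations, no markdown fences."""

def schema_summary(df: pd.DataFrame) -> str:
    """Describe the dataframe so the LLM knows which columns exist."""
    cols = "\n".join(f"  {name}: {dtype}" for name, dtype in df.dtypes.items())
    return f"{len(df)} rows, columns:\n{cols}"

def run_in_sandbox(code: str, df: pd.DataFrame) -> str:
    """Execute generated code with restricted globals and capture stdout.
    No __import__ in the builtins, so the code can't pull in os/socket/etc."""
    safe_builtins = {
        "print": print, "len": len, "range": range, "enumerate": enumerate,
        "min": min, "max": max, "sum": sum, "sorted": sorted, "round": round,
        "abs": abs, "str": str, "int": int, "float": float, "list": list,
        "dict": dict, "zip": zip,
    }
    namespace = {"__builtins__": safe_builtins, "pd": pd, "df": df}
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, namespace)
    return buffer.getvalue()

def ask(df: pd.DataFrame, question: str, model: str = "gpt-4o-mini") -> str:
    from openai import OpenAI  # deferred so the sandbox works without the SDK
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user",
             "content": f"Schema:\n{schema_summary(df)}\n\nQuestion: {question}"},
        ],
    )
    code = response.choices[0].message.content.strip()
    code = code.removeprefix("```python").removesuffix("```").strip()
    return run_in_sandbox(code, df)

if __name__ == "__main__":
    df = pd.read_csv("customers.csv")  # hypothetical input file
    print(ask(df, "show me the top 10 customers by revenue"))
```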
This gives you a working agent in under 100 lines. The key insight is that we pass the dataframe schema to the LLM, let it generate code, then execute it with restricted globals to prevent filesystem access or network calls.
Why Code Execution Beats Query Generation
I’ve tried both approaches. Query generation (making the LLM write SQL) fails when you have complex joins or need pandas-specific operations like pivots. Code execution handles everything: data cleaning, statistical analysis, machine learning, and visualization.
The LLM can use any library you provide in the namespace. Want to do sentiment analysis? Add TextBlob to the allowed imports. Need time series forecasting? Give it statsmodels. This flexibility is impossible with SQL or query builders.
Building a Production-Grade Sandbox
The example above uses a basic sandbox with restricted globals. For production, use Docker or a process isolation library:
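One way to sketch this is to shell out to the `docker` CLI. The resource limits here are illustrative, and `agent-sandbox:latest` is a hypothetical image you'd build yourself with pandas preinstalled:

```python
import os
import subprocess
import tempfile

def build_docker_cmd(script_path: str) -> list[str]:
    """Assemble the docker run flags: no network, capped memory and CPU,
    non-root user, read-only filesystem, auto-removed container."""
    return [
        "docker", "run", "--rm",
        "--network", "none",       # no outbound network access
        "--memory", "512m",        # hard memory cap
        "--cpus", "1.0",           # limit CPU usage
        "--user", "nobody",        # don't run as root
        "--read-only",             # read-only root filesystem
        "--tmpfs", "/tmp",         # writable scratch space only
        "-v", f"{script_path}:/job/run.py:ro",
        "agent-sandbox:latest",    # hypothetical image with pandas preinstalled
        "python", "/job/run.py",
    ]

def run_untrusted(code: str, timeout: int = 30) -> str:
    """Write the generated code to a temp file and run it in the container."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        result = subprocess.run(build_docker_cmd(path), capture_output=True,
                                text=True, timeout=timeout)
        return result.stdout if result.returncode == 0 else result.stderr
    finally:
        os.unlink(path)
```

The `timeout` on `subprocess.run` is the last line of defense: if the generated code loops forever, the call raises `TimeoutExpired` instead of hanging your server.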
This approach gives you real isolation. The container has no network access, limited memory, and runs as a non-root user. If the LLM generates malicious code, it can’t escape the container or consume unlimited resources.
The downside is performance. Spinning up a Docker container adds 1-2 seconds per query. For interactive analysis, that’s too slow. Use the simple namespace sandbox for development and Docker for untrusted code.
Connecting to Databases Instead of DataFrames
For large datasets, loading everything into a dataframe kills performance. Connect the agent directly to your database:
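A sketch using LangChain's SQL agent, assuming the `langchain-community` and `langchain-openai` packages are installed. The `orders` schema and model name are made up for illustration:

```python
import sqlite3

def make_demo_db(path: str = "sales.db") -> None:
    """Create a small demo database (hypothetical schema) to point the agent at."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS orders (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)",
                     [("acme", 120.0), ("globex", 75.5)])
    conn.commit()
    conn.close()

def build_sql_agent(db_uri: str):
    # Deferred imports so the demo-DB helper works without LangChain installed.
    from langchain_community.utilities import SQLDatabase
    from langchain_community.agent_toolkits import create_sql_agent
    from langchain_openai import ChatOpenAI

    db = SQLDatabase.from_uri(db_uri)
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
    # agent_type="openai-tools" selects tools via function calling
    return create_sql_agent(llm, db=db, agent_type="openai-tools", verbose=True)

if __name__ == "__main__":
    make_demo_db()
    agent = build_sql_agent("sqlite:///sales.db")
    print(agent.invoke({"input": "Which customer has the highest total order amount?"}))
```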
LangChain’s SQL agent writes queries, executes them, and interprets results. It can handle follow-up questions and recover from errors: if a query fails, it reads the error message and retries with a corrected query.
The agent has built-in tools for listing tables, describing schemas, and running queries with row limits. This prevents the LLM from writing SELECT * FROM giant_table and crashing your database.
Handling Charts and Visualizations
When the agent generates a chart, you need to return it to the user. The simplest approach is to save the chart to a file and encode it as base64:
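A minimal sketch with matplotlib; the bar-chart data is dummy, and the `Agg` backend keeps it working on a headless server:

```python
import base64
import io

import matplotlib
matplotlib.use("Agg")  # headless backend: no display needed
import matplotlib.pyplot as plt

def chart_to_base64(fig) -> str:
    """Render a matplotlib figure to PNG and return a base64 string."""
    buf = io.BytesIO()
    fig.savefig(buf, format="png", bbox_inches="tight")
    plt.close(fig)  # free the figure so long-running agents don't leak memory
    return base64.b64encode(buf.getvalue()).decode("ascii")

# Example: bar chart of revenue by customer (dummy data)
fig, ax = plt.subplots()
ax.bar(["acme", "globex"], [120.0, 75.5])
ax.set_ylabel("revenue")
encoded = chart_to_base64(fig)
```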
On the frontend, display the chart with an <img> tag:
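A minimal fragment; the `{{ chart_b64 }}` placeholder stands in for however your templating layer injects the base64 string returned by the backend:

```html
<!-- chart_b64 is the base64 PNG string from the backend (name is illustrative) -->
<img src="data:image/png;base64,{{ chart_b64 }}" alt="Generated chart" />
```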
For interactive charts, use Plotly instead of matplotlib. The agent can generate HTML with embedded JavaScript:
Then return the HTML as a string and inject it into an iframe. This gives you hover tooltips, zooming, and legends without extra work.
Common Errors and Fixes
Error: “NameError: name ‘df’ is not defined”
The LLM sometimes forgets to use the df variable you provided. Fix this by being explicit in your prompt:
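One way to word the system prompt (the exact phrasing is illustrative; the point is to state loudly that `df` already exists and that the model must not load data itself):

```python
SYSTEM_PROMPT = """You are a data analyst.
A pandas DataFrame is ALREADY loaded into a variable named `df`.
Do NOT read any files and do NOT create your own DataFrame --
operate on `df` directly. Print the final answer."""
```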
Error: “MemoryError” or agent times out
The LLM generated code that loads too much data into memory. Add a row limit to your dataframe before passing it:
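A simple guard; the 50,000-row cap is illustrative and should be tuned to your memory budget. Sampling (rather than `head()`) keeps the subset representative:

```python
import pandas as pd

MAX_ROWS = 50_000  # illustrative cap; tune to your memory budget

def cap_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Sample oversized dataframes down before handing them to the agent."""
    if len(df) > MAX_ROWS:
        return df.sample(n=MAX_ROWS, random_state=0)
    return df
```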
For larger datasets, use database queries instead of loading into pandas.
Error: Chart doesn’t appear or is corrupted
The agent printed output before saving the chart, which breaks the CHART_SAVED detection. Fix by checking if the file exists rather than looking for a string:
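A sketch of the file-based check. `CHART_PATH` is a path your prompt tells the LLM to save to (the location is illustrative); the detection no longer depends on what the code printed:

```python
import os

CHART_PATH = "/tmp/agent_chart.png"  # path the prompt instructs the LLM to use

def extract_chart(stdout: str):
    """Return (text_output, chart_path or None), trusting the file, not stdout."""
    if os.path.exists(CHART_PATH) and os.path.getsize(CHART_PATH) > 0:
        return stdout, CHART_PATH
    return stdout, None
```

Remember to delete (or rename) the file after each query, or a stale chart from a previous question will be picked up.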
Error: “ModuleNotFoundError” for pandas/numpy
Your sandbox doesn’t include required libraries. Install them in the Docker image or add to the namespace:
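For the Docker route, add a `RUN pip install --no-cache-dir pandas numpy matplotlib` line to your image's Dockerfile. For the namespace route, a sketch of exposing the libraries to `exec()` (the builtin whitelist here is abbreviated for illustration):

```python
def build_namespace(df):
    """Build the globals dict handed to exec(), with analysis libraries exposed."""
    import numpy as np
    import pandas as pd
    import matplotlib
    matplotlib.use("Agg")  # headless backend for server-side chart rendering
    import matplotlib.pyplot as plt

    return {
        "pd": pd, "np": np, "plt": plt, "df": df,
        # abbreviated builtin whitelist -- extend as needed
        "__builtins__": {"print": print, "len": len, "range": range},
    }
```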
When to Use OpenAI’s Code Interpreter vs Building Your Own
OpenAI’s Assistants API has a built-in code interpreter that handles sandboxing and file uploads. Use it when:
- You want something that works out of the box
- You don’t need custom libraries beyond pandas/numpy/matplotlib
- You’re okay with OpenAI seeing your data
Build your own when:
- You need custom libraries (database connectors, ML models, proprietary tools)
- You want to keep data on-premises
- You need finer control over execution timeouts and resource limits
- You want to inspect or log the generated code before running it
I recommend starting with the simple namespace sandbox above, then upgrading to Docker if you need isolation. OpenAI’s code interpreter is convenient but less flexible.
The code execution pattern unlocks powerful data analysis capabilities. Your agent can answer in seconds questions that would take hours to code by hand, and once you add memory and conversation history, it becomes a true data science copilot.
Related Guides
- How to Build a GitHub Issue Triage Agent with LLMs and the GitHub API
- How to Build a Data Pipeline Agent with LLMs and Pandas
- How to Build a Planning Agent with Task Decomposition
- How to Build Agent Workflows with Microsoft AutoGen
- How to Build Autonomous Agents with the OpenAI Assistants API
- How to Build Agents with LangGraph
- How to Build a Retrieval Agent with Tool Calling and Reranking
- How to Build a Debugging Agent with Stack Trace Analysis
- How to Build a Multi-Agent Pipeline Using Anthropic’s Agent SDK and MCP
- How to Build a Research Agent with LangGraph and Tavily