A research agent takes a question, searches the web for relevant sources, analyzes what it finds, and produces a structured report. You can build one in under 100 lines of Python using LangGraph for the workflow graph and Tavily for web search.
Here’s the quick setup:
```shell
pip install langgraph tavily-python langchain-openai langchain-core
```
You’ll need two API keys as environment variables:
```shell
export OPENAI_API_KEY="sk-..."
export TAVILY_API_KEY="tvly-..."
```
## Define the Agent State
LangGraph agents revolve around a shared state object that flows through every node in the graph. Each node reads from the state, does its work, and returns updates. Define the state with a TypedDict and use Annotated with operator.add for fields that should accumulate values rather than overwrite them.
```python
from typing import TypedDict, Annotated
import operator


class ResearchState(TypedDict):
    query: str
    search_results: Annotated[list[dict], operator.add]
    analysis: str
    report: str
    search_count: int
```
search_results uses the operator.add reducer so that each search node call appends its results to the list instead of replacing previous ones. The other fields (analysis, report, search_count) get overwritten on each update, which is the default behavior.
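To see the difference between the two behaviors, here is a small standalone sketch (no LangGraph required) that mimics how a reducer-annotated field accumulates while plain fields are overwritten; the `apply_update` helper is a simplified stand-in for LangGraph's internal channel logic, not its actual implementation:

```python
import operator


def apply_update(state, update, reducers):
    """Apply a node's partial update: reducer fields combine old + new,
    plain fields are replaced by the latest value."""
    new_state = dict(state)
    for key, value in update.items():
        if key in reducers:
            new_state[key] = reducers[key](new_state[key], value)
        else:
            new_state[key] = value
    return new_state


reducers = {"search_results": operator.add}
state = {"search_results": [], "search_count": 0}

# Two successive "search node" updates
state = apply_update(state, {"search_results": [{"url": "a"}], "search_count": 1}, reducers)
state = apply_update(state, {"search_results": [{"url": "b"}], "search_count": 2}, reducers)

print(len(state["search_results"]))  # 2 -- both batches kept
print(state["search_count"])         # 2 -- overwritten each time
```

With `operator.add`, each update's list is concatenated onto the existing one; without a reducer, the second `search_count` simply replaces the first.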
## Build the Node Functions
Each node in the graph is a plain Python function that takes the current state and returns a partial state update. You need three nodes: one to search the web, one to analyze the results, and one to write the final report.
```python
from tavily import TavilyClient
from langchain_openai import ChatOpenAI
from langchain_core.messages import SystemMessage, HumanMessage

tavily_client = TavilyClient()
llm = ChatOpenAI(model="gpt-4o", temperature=0)


def search(state: ResearchState) -> dict:
    """Search the web for information about the query."""
    response = tavily_client.search(
        query=state["query"],
        search_depth="advanced",
        max_results=5,
        include_answer=True,
    )
    results = [
        {
            "title": r["title"],
            "url": r["url"],
            "content": r["content"],
            "score": r["score"],
        }
        for r in response["results"]
    ]
    return {
        "search_results": results,
        "search_count": state.get("search_count", 0) + 1,
    }


def analyze(state: ResearchState) -> dict:
    """Analyze search results and identify key findings."""
    sources_text = "\n\n".join(
        f"**{r['title']}** ({r['url']})\n{r['content']}"
        for r in state["search_results"]
    )
    response = llm.invoke([
        SystemMessage(content=(
            "You are a research analyst. Analyze the following search results "
            "and extract the key findings, facts, and insights. Be specific and "
            "cite which source each finding comes from."
        )),
        HumanMessage(content=(
            f"Research query: {state['query']}\n\n"
            f"Search results:\n{sources_text}"
        )),
    ])
    return {"analysis": response.content}


def write_report(state: ResearchState) -> dict:
    """Write a structured research report from the analysis."""
    source_urls = list({r["url"] for r in state["search_results"]})
    sources_list = "\n".join(f"- {url}" for url in source_urls)
    response = llm.invoke([
        SystemMessage(content=(
            "You are a research report writer. Write a clear, structured report "
            "based on the analysis provided. Use markdown formatting with sections. "
            "Include a Sources section at the end listing all URLs."
        )),
        HumanMessage(content=(
            f"Query: {state['query']}\n\n"
            f"Analysis:\n{state['analysis']}\n\n"
            f"Source URLs:\n{sources_list}"
        )),
    ])
    return {"report": response.content}
```
Each function returns only the fields it wants to update. LangGraph merges those partial updates into the full state automatically.
## Wire Up the Graph with Conditional Edges
Now connect the nodes into a directed graph. The agent searches first, then decides whether it has enough results or needs another round of searching before moving on to analysis and report writing.
```python
from langgraph.graph import StateGraph, START, END


def should_continue_searching(state: ResearchState) -> str:
    """Decide whether to search again or move to analysis."""
    if state.get("search_count", 0) < 2 and len(state["search_results"]) < 8:
        return "search"
    return "analyze"


builder = StateGraph(ResearchState)

# Add nodes
builder.add_node("search", search)
builder.add_node("analyze", analyze)
builder.add_node("write_report", write_report)

# Add edges
builder.add_edge(START, "search")
builder.add_conditional_edges(
    "search",
    should_continue_searching,
    {"search": "search", "analyze": "analyze"},
)
builder.add_edge("analyze", "write_report")
builder.add_edge("write_report", END)

# Compile
research_agent = builder.compile()
```
The conditional edge after the search node checks two things: whether the agent has searched fewer than 2 times, and whether it has fewer than 8 results. If both are true, it loops back for another search pass. Otherwise, it moves to analysis. This gives the agent a chance to gather more data without running forever.
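You can trace those loop bounds by hand. Assuming each Tavily call returns the full `max_results=5` (a best-case assumption; real calls may return fewer), this standalone sketch replays the routing decisions:

```python
def should_continue_searching(search_count, num_results):
    # Mirrors the conditional edge: loop only while under both limits.
    if search_count < 2 and num_results < 8:
        return "search"
    return "analyze"


decisions = []
count, results = 0, 0
while True:
    # One search pass: assume 5 results per call, then route.
    results += 5
    count += 1
    decision = should_continue_searching(count, results)
    decisions.append(decision)
    if decision == "analyze":
        break

print(decisions)  # ['search', 'analyze'] -- two passes, then analysis
```

After the first pass (count 1, 5 results) both conditions hold, so the agent loops; after the second pass the count limit trips and it routes to `analyze`.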
## Run the Agent
Invoke the compiled graph with an initial state containing your research query:
```python
result = research_agent.invoke({
    "query": "What are the latest advances in small language models for edge deployment in 2025?",
    "search_results": [],
    "search_count": 0,
    "analysis": "",
    "report": "",
})

print(result["report"])
```
The agent will query Tavily twice (the conditional edge loops back once because the first pass returns at most 5 results), analyze everything it collected, and produce a markdown-formatted report with source citations.
You can also stream the execution to see each node as it runs:
```python
for event in research_agent.stream({
    "query": "How do mixture-of-experts models reduce inference costs?",
    "search_results": [],
    "search_count": 0,
    "analysis": "",
    "report": "",
}):
    for node_name, output in event.items():
        print(f"--- Node: {node_name} ---")
        if "report" in output and output["report"]:
            print(output["report"][:200] + "...")
        elif "search_results" in output:
            print(f"Found {len(output['search_results'])} results")
        elif "analysis" in output:
            print(f"Analysis length: {len(output['analysis'])} chars")
```
## Common Errors and Fixes
tavily.errors.InvalidAPIKeyError: Invalid API key
Your TAVILY_API_KEY environment variable is missing or wrong. Tavily keys start with tvly-. Double-check by printing it:
```python
import os

print(os.environ.get("TAVILY_API_KEY", "NOT SET"))
```
KeyError: 'search_results' when accessing state
You forgot to initialize search_results as an empty list in the input state. Because it uses operator.add, it needs a base list to append to:
```python
# Wrong - missing initial values
result = research_agent.invoke({"query": "my question"})

# Right - provide all required fields
result = research_agent.invoke({
    "query": "my question",
    "search_results": [],
    "search_count": 0,
    "analysis": "",
    "report": "",
})
```
InvalidRequestError: maximum context length exceeded
Your search results are too large for the model’s context window. Fix this by limiting max_results in the Tavily search call or truncating each result’s content:
```python
results = [
    {
        "title": r["title"],
        "url": r["url"],
        "content": r["content"][:500],  # Truncate long content
        "score": r["score"],
    }
    for r in response["results"]
]
```
langgraph.errors.InvalidUpdateError on state updates
A node is returning a key that isn’t defined in your ResearchState. Every key in the returned dict must exist in the TypedDict. Check for typos in your return statements.
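One way to catch such typos before wiring up the graph is to check a node's return keys against the TypedDict's annotations. The `check_update_keys` helper below is a hypothetical debugging aid, not part of LangGraph:

```python
from typing import TypedDict, Annotated
import operator


class ResearchState(TypedDict):
    query: str
    search_results: Annotated[list[dict], operator.add]
    analysis: str
    report: str
    search_count: int


def check_update_keys(update: dict, state_cls=ResearchState) -> list[str]:
    """Return any keys in a node's update that the state doesn't define."""
    return [k for k in update if k not in state_cls.__annotations__]


# A typo'd key ("anaylsis") is what triggers InvalidUpdateError at runtime:
print(check_update_keys({"anaylsis": "..."}))  # ['anaylsis']
print(check_update_keys({"analysis": "..."}))  # []
```

Running each node once on a dummy state and passing its return value through a check like this surfaces the typo immediately, with a clearer message than the runtime error.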
Graph compiles but hangs indefinitely
Your conditional edge function is always returning the same path, creating an infinite loop. Add a hard limit on search_count and make sure the condition eventually evaluates to the exit path. Print the state values inside should_continue_searching to debug:
```python
def should_continue_searching(state: ResearchState) -> str:
    count = state.get("search_count", 0)
    num_results = len(state["search_results"])
    print(f"Search count: {count}, Results: {num_results}")
    if count < 2 and num_results < 8:
        return "search"
    return "analyze"
```