A research agent takes a question, searches the web for relevant sources, analyzes what it finds, and produces a structured report. You can build one in under 100 lines of Python using LangGraph for the workflow graph and Tavily for web search.

Here’s the quick setup:

pip install langgraph tavily-python langchain-openai langchain-core

You’ll need two API keys as environment variables:

export OPENAI_API_KEY="sk-..."
export TAVILY_API_KEY="tvly-..."

Define the Agent State

LangGraph agents revolve around a shared state object that flows through every node in the graph. Each node reads from the state, does its work, and returns updates. Define the state with a TypedDict and use Annotated with operator.add for fields that should accumulate values rather than overwrite them.

from typing import TypedDict, Annotated
import operator


class ResearchState(TypedDict):
    query: str
    search_results: Annotated[list[dict], operator.add]
    analysis: str
    report: str
    search_count: int

search_results uses the operator.add reducer so that each search node call appends its results to the list instead of replacing previous ones. The other fields (analysis, report, search_count) get overwritten on each update, which is the default behavior.
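The merge a reducer performs is nothing exotic: for list fields, operator.add is plain list concatenation. A minimal sketch, using hypothetical stand-ins for Tavily results:

```python
import operator

# What the reducer does under the hood: plain list concatenation.
# These dicts are hypothetical stand-ins for real search results.
first_pass = [{"title": "Source A"}]
second_pass = [{"title": "Source B"}]

merged = operator.add(first_pass, second_pass)
# merged == [{"title": "Source A"}, {"title": "Source B"}]
```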

Build the Node Functions

Each node in the graph is a plain Python function that takes the current state and returns a partial state update. You need three nodes: one to search the web, one to analyze the results, and one to write the final report.

from tavily import TavilyClient
from langchain_openai import ChatOpenAI
from langchain_core.messages import SystemMessage, HumanMessage

tavily_client = TavilyClient()
llm = ChatOpenAI(model="gpt-4o", temperature=0)


def search(state: ResearchState) -> dict:
    """Search the web for information about the query."""
    response = tavily_client.search(
        query=state["query"],
        search_depth="advanced",
        max_results=5,
        include_answer=True,
    )
    results = [
        {
            "title": r["title"],
            "url": r["url"],
            "content": r["content"],
            "score": r["score"],
        }
        for r in response["results"]
    ]
    return {
        "search_results": results,
        "search_count": state.get("search_count", 0) + 1,
    }


def analyze(state: ResearchState) -> dict:
    """Analyze search results and identify key findings."""
    sources_text = "\n\n".join(
        f"**{r['title']}** ({r['url']})\n{r['content']}"
        for r in state["search_results"]
    )
    response = llm.invoke([
        SystemMessage(content=(
            "You are a research analyst. Analyze the following search results "
            "and extract the key findings, facts, and insights. Be specific and "
            "cite which source each finding comes from."
        )),
        HumanMessage(content=(
            f"Research query: {state['query']}\n\n"
            f"Search results:\n{sources_text}"
        )),
    ])
    return {"analysis": response.content}


def write_report(state: ResearchState) -> dict:
    """Write a structured research report from the analysis."""
    source_urls = list({r["url"] for r in state["search_results"]})
    sources_list = "\n".join(f"- {url}" for url in source_urls)

    response = llm.invoke([
        SystemMessage(content=(
            "You are a research report writer. Write a clear, structured report "
            "based on the analysis provided. Use markdown formatting with sections. "
            "Include a Sources section at the end listing all URLs."
        )),
        HumanMessage(content=(
            f"Query: {state['query']}\n\n"
            f"Analysis:\n{state['analysis']}\n\n"
            f"Source URLs:\n{sources_list}"
        )),
    ])
    return {"report": response.content}

Each function returns only the fields it wants to update. LangGraph merges those partial updates into the full state automatically.
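You can picture that merge with a rough, stdlib-only model (an illustration of the behavior, not LangGraph's actual implementation): fields with a reducer are combined with the existing value, everything else is overwritten, and keys the node omits are left alone.

```python
import operator

# Illustrative only: a rough model of how partial updates merge into state.
REDUCERS = {"search_results": operator.add}


def apply_update(state: dict, update: dict) -> dict:
    for key, value in update.items():
        reducer = REDUCERS.get(key)
        state[key] = reducer(state[key], value) if reducer else value
    return state


state = {"query": "q", "search_results": [{"title": "A"}], "analysis": ""}
apply_update(state, {"search_results": [{"title": "B"}], "analysis": "done"})
# search_results now holds both entries; query is untouched
```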

Wire Up the Graph with Conditional Edges

Now connect the nodes into a directed graph. The agent searches first, then decides whether it has enough results or needs another round of searching before moving on to analysis and report writing.

from langgraph.graph import StateGraph, START, END


def should_continue_searching(state: ResearchState) -> str:
    """Decide whether to search again or move to analysis."""
    if state.get("search_count", 0) < 2 and len(state["search_results"]) < 8:
        return "search"
    return "analyze"


builder = StateGraph(ResearchState)

# Add nodes
builder.add_node("search", search)
builder.add_node("analyze", analyze)
builder.add_node("write_report", write_report)

# Add edges
builder.add_edge(START, "search")
builder.add_conditional_edges(
    "search",
    should_continue_searching,
    {"search": "search", "analyze": "analyze"},
)
builder.add_edge("analyze", "write_report")
builder.add_edge("write_report", END)

# Compile
research_agent = builder.compile()

The conditional edge after the search node checks two things: whether the agent has searched fewer than 2 times, and whether it has fewer than 8 results. If both are true, it loops back for another search pass. Otherwise, it moves to analysis. This gives the agent a chance to gather more data without running forever.
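You can trace the routing logic on its own, assuming Tavily returns its full max_results of 5 on each pass (this is a standalone copy of the condition for illustration, not the graph itself):

```python
def should_continue_searching(count: int, num_results: int) -> str:
    # Standalone copy of the routing condition, for illustration
    if count < 2 and num_results < 8:
        return "search"
    return "analyze"


# Simulate the loop: each pass adds 5 results and bumps the counter
count, num_results, path = 0, 0, []
while True:
    count += 1
    num_results += 5
    decision = should_continue_searching(count, num_results)
    path.append(decision)
    if decision == "analyze":
        break
# path == ["search", "analyze"] -> exactly two search passes
```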

Run the Agent

Invoke the compiled graph with an initial state containing your research query:

result = research_agent.invoke({
    "query": "What are the latest advances in small language models for edge deployment in 2025?",
    "search_results": [],
    "search_count": 0,
    "analysis": "",
    "report": "",
})

print(result["report"])

The agent will search Tavily twice: the first pass returns at most 5 results, below the 8-result threshold, so the conditional edge loops back once; the second pass brings search_count to 2, which exits the loop. The agent then analyzes all collected results and produces a markdown-formatted report with source citations.

You can also stream the execution to see each node as it runs:

for event in research_agent.stream({
    "query": "How do mixture-of-experts models reduce inference costs?",
    "search_results": [],
    "search_count": 0,
    "analysis": "",
    "report": "",
}):
    for node_name, output in event.items():
        print(f"--- Node: {node_name} ---")
        if "report" in output and output["report"]:
            print(output["report"][:200] + "...")
        elif "search_results" in output:
            print(f"Found {len(output['search_results'])} results")
        elif "analysis" in output:
            print(f"Analysis length: {len(output['analysis'])} chars")

Common Errors and Fixes

tavily.errors.InvalidAPIKeyError: Invalid API key

Your TAVILY_API_KEY environment variable is missing or wrong. Tavily keys start with tvly-. Double-check by printing it:

import os
print(os.environ.get("TAVILY_API_KEY", "NOT SET"))

KeyError: 'search_results' when accessing state

You forgot to initialize search_results as an empty list in the input state. Because it uses operator.add, it needs a base list to append to:

# Wrong - missing initial value
result = research_agent.invoke({"query": "my question"})

# Right - provide all required fields
result = research_agent.invoke({
    "query": "my question",
    "search_results": [],
    "search_count": 0,
    "analysis": "",
    "report": "",
})

InvalidRequestError: maximum context length exceeded

Your search results are too large for the model’s context window. Fix this by limiting max_results in the Tavily search call or truncating each result’s content:

results = [
    {
        "title": r["title"],
        "url": r["url"],
        "content": r["content"][:500],  # Truncate long content
        "score": r["score"],
    }
    for r in response["results"]
]

langgraph.errors.InvalidUpdateError on state updates

A node is returning a key that isn’t defined in your ResearchState. Every key in the returned dict must exist in the TypedDict. Check for typos in your return statements.
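One quick way to catch such typos during development is to compare a node's return keys against the TypedDict's declared fields. The sketch below uses a simplified copy of ResearchState, and check_update_keys is a hypothetical helper, not part of LangGraph:

```python
from typing import TypedDict


class ResearchState(TypedDict):
    query: str
    search_results: list
    analysis: str
    report: str
    search_count: int


def check_update_keys(update: dict) -> set:
    """Return any keys in a node's update that the state doesn't declare."""
    return set(update) - set(ResearchState.__annotations__)


# A typo like "anaylsis" shows up immediately:
check_update_keys({"anaylsis": "..."})  # -> {"anaylsis"}
```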

Graph compiles but hangs indefinitely

Your conditional edge function is always returning the same path, creating an infinite loop. Add a hard limit on search_count and make sure the condition eventually evaluates to the exit path. Print the state values inside should_continue_searching to debug:

def should_continue_searching(state: ResearchState) -> str:
    count = state.get("search_count", 0)
    num_results = len(state["search_results"])
    print(f"Search count: {count}, Results: {num_results}")
    if count < 2 and num_results < 8:
        return "search"
    return "analyze"