The ReAct Pattern in 30 Seconds

ReAct (Reasoning + Acting), introduced in Yao et al.'s 2022 paper of the same name, is an agent pattern where the LLM explicitly writes out its reasoning before taking an action. Instead of jumping straight to a tool call, the model produces a Thought, then an Action, then receives an Observation from the environment. This loop repeats until the model has enough information to produce a final answer.

Here’s the skeleton you’ll have running by the end of this post:

import re
import json
import openai

client = openai.OpenAI()

REACT_SYSTEM_PROMPT = """You are a ReAct agent. You solve tasks by repeating this cycle:

Thought: <your reasoning about what to do next>
Action: <tool_name>({"arg": "value"})
Observation: <you will receive the tool result here>

When you have enough information, respond with:
Thought: I now have the answer.
Answer: <your final answer>

Available tools:
- search({"query": "search terms"}) — Search the web for information.
- calculate({"expression": "math expression"}) — Evaluate a math expression.

Rules:
- Always start with a Thought.
- Only call one Action per turn.
- Never make up Observations. Wait for real tool results.
"""

That system prompt is doing the heavy lifting. You’re telling the model the exact format to follow, which tools exist, and when to stop. The rest is parsing and plumbing.

Define Your Tools

Keep tools dead simple. Each one takes a dict of arguments, does one thing, and returns a string.

def search(args: dict) -> str:
    """Simulated web search. Swap in SerpAPI, Tavily, or Brave Search for production."""
    query = args["query"]
    fake_results = {
        "population of France": "As of 2025, France has approximately 68.4 million people.",
        "height of Mount Everest": "Mount Everest is 8,849 meters (29,032 feet) tall.",
        "speed of light": "The speed of light in vacuum is 299,792,458 meters per second.",
    }
    for key, result in fake_results.items():
        if key in query.lower():
            return result
    return f"No results found for: {query}"


def calculate(args: dict) -> str:
    """Evaluate a math expression safely."""
    expr = args["expression"]
    allowed = set("0123456789+-*/().e ")
    if not all(c in allowed for c in expr):
        return f"Error: invalid characters in expression '{expr}'"
    try:
        return str(eval(expr, {"__builtins__": {}}, {}))  # no builtins, no names
    except Exception as e:
        return f"Error: {e}"


TOOLS = {
    "search": search,
    "calculate": calculate,
}

You could wire up any tool here – a database client, an HTTP request, a file reader. The agent doesn’t care what the tool does internally. It only sees the string you return as the Observation.
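For instance, here's a hypothetical file-reading tool that would slot straight into the registry. The name read_file and the 2,000-character cap are my choices for illustration, not part of the original:

```python
from pathlib import Path


def read_file(args: dict) -> str:
    """Return the first 2000 characters of a text file, or an error string."""
    path = Path(args["path"])
    if not path.is_file():
        return f"Error: no such file '{path}'"
    try:
        return path.read_text(encoding="utf-8")[:2000]
    except OSError as e:
        return f"Error: {e}"


# Register it exactly like the built-ins:
# TOOLS["read_file"] = read_file
```

Note that errors come back as ordinary Observation strings, so the agent gets a chance to recover instead of the loop crashing.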

Parse the LLM Output

The model returns free text with Thought:, Action:, and Answer: markers. You need a parser that extracts these reliably.

def parse_action(text: str) -> tuple[str, dict] | None:
    """Extract the tool name and arguments from an Action line."""
    match = re.search(r'Action:\s*(\w+)\((\{.*?\})\)', text, re.DOTALL)
    if not match:
        return None
    tool_name = match.group(1)
    try:
        tool_args = json.loads(match.group(2))
    except json.JSONDecodeError:
        return None
    return tool_name, tool_args


def parse_answer(text: str) -> str | None:
    """Extract the final answer if the agent is done."""
    match = re.search(r'Answer:\s*(.+)', text, re.DOTALL)
    if match:
        return match.group(1).strip()
    return None

Regex-based parsing is the right call for a from-scratch implementation. It’s explicit, easy to debug, and you can see exactly what broke when the model drifts off format. Frameworks hide this behind abstractions, which is fine until something goes wrong and you’re staring at a 200-line stack trace from LangChain’s output parser.

The ReAct Loop

This is the core. Send the conversation to the LLM, check if it produced an Action or an Answer, execute the tool if needed, append the Observation, and repeat.

def run_react_agent(question: str, max_steps: int = 8) -> str:
    """Run the full ReAct loop until the agent produces a final answer."""
    messages = [
        {"role": "system", "content": REACT_SYSTEM_PROMPT},
        {"role": "user", "content": question},
    ]

    for step in range(max_steps):
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            temperature=0,
        )
        assistant_text = response.choices[0].message.content
        messages.append({"role": "assistant", "content": assistant_text})

        print(f"\n--- Step {step + 1} ---")
        print(assistant_text)

        # Check for a final answer
        answer = parse_answer(assistant_text)
        if answer:
            return answer

        # Check for a tool action
        action = parse_action(assistant_text)
        if action is None:
            # Model didn't produce a valid action or answer -- nudge it
            messages.append({
                "role": "user",
                "content": "You must respond with either an Action or an Answer. Try again.",
            })
            continue

        tool_name, tool_args = action

        if tool_name not in TOOLS:
            observation = f"Error: Unknown tool '{tool_name}'. Available: {list(TOOLS.keys())}"
        else:
            observation = TOOLS[tool_name](tool_args)

        print(f"Observation: {observation}")

        # Feed the observation back as a user message
        messages.append({
            "role": "user",
            "content": f"Observation: {observation}",
        })

    return "Agent failed to reach an answer within the step limit."

A few design choices worth calling out:

  • temperature=0 makes the output as close to deterministic as the API allows (greedy decoding can still vary slightly between calls). You want the agent to be predictable, not creative.
  • Observations go as user messages. Some people use system messages for this, but user messages keep the conversation structure clean and work across all LLM APIs.
  • The nudge message handles the case where the model produces something unparseable. Instead of crashing, you give it another chance.

Running It

result = run_react_agent("What is the population of France divided by 1000?")
print(f"\nFinal answer: {result}")

Output looks like this:

--- Step 1 ---
Thought: I need to find the population of France first, then divide it by 1000.
Action: search({"query": "population of France"})
Observation: As of 2025, France has approximately 68.4 million people.

--- Step 2 ---
Thought: France has 68.4 million people, which is 68,400,000. I need to divide by 1000.
Action: calculate({"expression": "68400000 / 1000"})
Observation: 68400.0

--- Step 3 ---
Thought: I now have the answer.
Answer: The population of France divided by 1000 is 68,400.

Final answer: The population of France divided by 1000 is 68,400.

Three steps. The agent searched, computed, and answered. No framework needed.

Why Not Just Use LangChain or LangGraph?

Frameworks are great when you need batteries-included features – memory, streaming, complex graph topologies. But for understanding what’s actually happening, building from scratch wins every time.

The ReAct loop above is about 80 lines of code. A LangChain equivalent pulls in dozens of transitive dependencies and wraps the same logic in layers of abstraction. If your agent breaks at 2 AM, you want to understand the 80 lines, not reverse-engineer a framework’s internal state machine.

My recommendation: build your first agent from scratch. Once you’ve internalized the pattern, graduate to a framework when you genuinely need features like persistent memory, human-in-the-loop gates, or multi-agent orchestration.

Common Errors

json.JSONDecodeError when parsing Action arguments. The model sometimes outputs single quotes instead of double quotes in JSON. A quick fix is normalizing quotes before parsing (blunt, and it will mangle values that legitimately contain an apostrophe, but it works for simple tool arguments):

def parse_action(text: str) -> tuple[str, dict] | None:
    match = re.search(r'Action:\s*(\w+)\((\{.*?\})\)', text, re.DOTALL)
    if not match:
        return None
    tool_name = match.group(1)
    raw_json = match.group(2).replace("'", '"')  # naive: breaks values containing apostrophes
    try:
        tool_args = json.loads(raw_json)
    except json.JSONDecodeError:
        return None
    return tool_name, tool_args

Agent loops forever without producing an Answer. This happens when the Observation is ambiguous or the model keeps requesting the same tool. Two fixes: lower max_steps to fail fast, and add a check that detects repeated identical actions:

# Inside the loop, right after appending the assistant message:
if len(messages) >= 4:
    prev = messages[-3].get("content", "")
    curr = assistant_text
    # Guard against both parses being None, which would also compare equal
    if parse_action(curr) is not None and parse_action(prev) == parse_action(curr):
        messages.append({
            "role": "user",
            "content": "You already tried this action. Use the observation you received or provide an Answer.",
        })
        continue

KeyError on tool dispatch. The model invents tool names that don’t exist. Always validate tool_name in TOOLS before calling, and return a helpful error Observation so the model can self-correct. The loop implementation above already handles this.

Model ignores the ReAct format entirely. Some smaller models struggle with the Thought/Action/Observation structure. If you’re using a weaker model, add a one-shot example in the system prompt showing a complete cycle. This dramatically improves format compliance.
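A sketch of what that can look like. The worked example is illustrative (my numbers, not from the original); append it to whatever system prompt you're using:

```python
# Illustrative one-shot demonstration of a complete Thought/Action/Observation
# cycle. Appending something like this to the system prompt helps smaller
# models lock onto the format.
ONE_SHOT_EXAMPLE = """
Example:
Question: What is 15% of 240?
Thought: I need to compute 0.15 * 240.
Action: calculate({"expression": "0.15 * 240"})
Observation: 36.0
Thought: I now have the answer.
Answer: 15% of 240 is 36.
"""

# Append it to the prompt defined earlier:
# REACT_SYSTEM_PROMPT = REACT_SYSTEM_PROMPT + ONE_SHOT_EXAMPLE
```

Since the example will be imitated verbatim, make sure it parses with your own regex before shipping it.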