What You’re Building

An AI agent that doesn’t just generate text — it decides which tools to call, calls them, reads the results, and keeps going until it has the answer. This is the core pattern behind ChatGPT plugins, Claude’s computer use, and every production AI agent.

We’ll use Claude’s native tool use API to build an agent that can check the weather and do math, but the pattern scales to any tool: databases, APIs, file systems, web scrapers.

The Agent Loop

Every tool-using agent follows the same loop:

  1. Send a message to the LLM with available tools
  2. The LLM either responds with text OR requests a tool call
  3. Execute the tool and send the result back
  4. Repeat until the LLM responds with a final text answer
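The four steps above can be sketched as pseudocode (not runnable; the article builds the real version piece by piece below):

```
def agent_loop(user_message):
    messages = [user_message]
    loop (bounded by max_iterations):
        response = llm(messages, tools)      # 1. send message + tool definitions
        if response is plain text:
            return response.text             # 4. final answer, loop ends
        for call in response.tool_calls:     # 2. model requested tool calls
            result = execute(call)           # 3. run the tool
            append(messages, response, result)  # feed results back, repeat
```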
import anthropic
import json

client = anthropic.Anthropic()

# Define tools the agent can use
tools = [
    {
        "name": "get_weather",
        "description": "Get current weather for a city. Use this when the user asks about weather conditions.",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "City name, e.g. 'San Francisco'"
                }
            },
            "required": ["city"]
        }
    },
    {
        "name": "calculate",
        "description": "Evaluate a mathematical expression. Use for any arithmetic or math.",
        "input_schema": {
            "type": "object",
            "properties": {
                "expression": {
                    "type": "string",
                    "description": "Math expression to evaluate, e.g. '(45 * 12) + 89'"
                }
            },
            "required": ["expression"]
        }
    }
]
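The `input_schema` blocks are standard JSON Schema. Before executing a tool, it's cheap insurance to check the model's arguments against the schema. Here is a minimal hand-rolled check (a sketch; `check_input` is a name invented here, and in production a library such as `jsonschema` does this properly):

```python
def check_input(schema: dict, tool_input: dict) -> list:
    """Return a list of validation errors (empty list means the input is valid)."""
    errors = []
    # Required keys must be present
    for key in schema.get("required", []):
        if key not in tool_input:
            errors.append(f"missing required field: {key}")
    # Declared types must match
    type_map = {"string": str, "number": (int, float), "integer": int, "boolean": bool}
    for key, spec in schema.get("properties", {}).items():
        expected = type_map.get(spec.get("type"))
        if key in tool_input and expected and not isinstance(tool_input[key], expected):
            errors.append(f"field '{key}' should be {spec['type']}")
    return errors

weather_schema = {
    "type": "object",
    "properties": {"city": {"type": "string"}},
    "required": ["city"],
}
print(check_input(weather_schema, {"city": "London"}))  # []
print(check_input(weather_schema, {}))                  # ['missing required field: city']
```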

Implement the Tools

Each tool is a plain Python function. Keep them simple and focused on one thing.

def get_weather(city: str) -> str:
    """Simulated weather lookup. Replace with a real API call."""
    weather_data = {
        "San Francisco": {"temp": 62, "condition": "foggy", "humidity": 78},
        "New York": {"temp": 45, "condition": "cloudy", "humidity": 55},
        "London": {"temp": 48, "condition": "rainy", "humidity": 82},
    }
    data = weather_data.get(city, {"temp": 70, "condition": "unknown", "humidity": 50})
    return json.dumps({"city": city, **data})


def calculate(expression: str) -> str:
    """Safely evaluate a math expression."""
    allowed_chars = set("0123456789+-*/().% ")
    if not all(c in allowed_chars for c in expression):
        return json.dumps({"error": "Invalid characters in expression"})
    try:
        result = eval(expression)  # whitelist blocks names/attributes, but eval is still risky for untrusted input
        return json.dumps({"expression": expression, "result": result})
    except Exception as e:
        return json.dumps({"error": str(e)})


# Map tool names to functions
tool_functions = {
    "get_weather": get_weather,
    "calculate": calculate,
}
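Before involving the API, smoke-test the dispatch map directly; this is exactly the call the agent loop will make with the model's `tool_use` block. (Trimmed-down stand-in tools here so the sketch is self-contained; the `eval` is for the demo only.)

```python
import json

# Stand-in tools mirroring the dispatch map above
def get_weather(city: str) -> str:
    return json.dumps({"city": city, "temp": 48})

def calculate(expression: str) -> str:
    return json.dumps({"expression": expression, "result": eval(expression)})  # demo only

tool_functions = {"get_weather": get_weather, "calculate": calculate}

# Simulate a tool_use block from the model: look up by name, splat the input
call = {"name": "calculate", "input": {"expression": "(45 * 12) + 89"}}
print(tool_functions[call["name"]](**call["input"]))
# {"expression": "(45 * 12) + 89", "result": 629}
```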

Implement the Agent Loop

This is the core of any tool-using agent. It calls the LLM, checks if it wants to use a tool, executes the tool, and feeds the result back.

def run_agent(user_message: str, max_iterations: int = 10) -> str:
    """Run the agent loop until we get a final text response."""
    messages = [{"role": "user", "content": user_message}]

    for i in range(max_iterations):
        response = client.messages.create(
            model="claude-sonnet-4-5-20250514",
            max_tokens=1024,
            tools=tools,
            messages=messages,
        )

        # Check if the model wants to use tools
        if response.stop_reason == "tool_use":
            # Extract tool calls from the response
            tool_results = []
            assistant_content = response.content

            for block in assistant_content:
                if block.type == "tool_use":
                    tool_name = block.name
                    tool_input = block.input
                    tool_id = block.id

                    # Execute the tool
                    print(f"  [Tool Call] {tool_name}({json.dumps(tool_input)})")
                    result = tool_functions[tool_name](**tool_input)
                    print(f"  [Tool Result] {result}")

                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": tool_id,
                        "content": result,
                    })

            # Add the assistant's response and tool results to messages
            messages.append({"role": "assistant", "content": assistant_content})
            messages.append({"role": "user", "content": tool_results})

        else:
            # Model responded with text — we're done
            final_text = ""
            for block in response.content:
                if hasattr(block, "text"):
                    final_text += block.text
            return final_text

    return "Agent reached maximum iterations without a final answer."

Running the Agent

# Simple tool call
answer = run_agent("What's the weather in San Francisco?")
print(answer)
#   [Tool Call] get_weather({"city": "San Francisco"})
#   [Tool Result] {"city": "San Francisco", "temp": 62, "condition": "foggy", "humidity": 78}
# "It's currently 62°F and foggy in San Francisco, with 78% humidity."

# Chained reasoning — the agent calls multiple tools
answer = run_agent(
    "What's the temperature difference between NYC and London? "
    "Give me the answer in Celsius."
)
print(answer)
#   [Tool Call] get_weather({"city": "New York"})
#   [Tool Call] get_weather({"city": "London"})
#   [Tool Call] calculate({"expression": "(45 - 48) * 5 / 9"})
# "New York is 45°F and London is 48°F. The difference is about -1.67°C..."

The agent figured out on its own that it needed to call both weather tools, then use the calculator to convert to Celsius. That’s the power of the agent loop — the LLM decides what tools to call and in what order.

Common Issues

Tool not being called. If the model ignores your tool, the description is probably too vague. Be explicit: “Use this when the user asks about weather conditions” beats “Weather tool.”

Infinite loops. Always set a max_iterations limit. If the agent keeps calling tools without converging, your tool responses might be too ambiguous — return clear, structured data.

Input validation. Never trust tool inputs blindly. The calculate function above validates characters before evaluating. For production agents, use Pydantic models to validate inputs.

from pydantic import BaseModel, field_validator

class CalculateInput(BaseModel):
    expression: str

    @field_validator("expression")
    @classmethod
    def validate_expression(cls, v: str) -> str:
        allowed = set("0123456789+-*/().% ")
        if not all(c in allowed for c in v):
            raise ValueError("Expression contains invalid characters")
        return v
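A stricter alternative to character whitelisting is to parse the expression with Python's `ast` module and evaluate only arithmetic nodes, so `eval` never runs at all. A sketch (`safe_eval` is a name invented here):

```python
import ast
import operator

# Map AST operator node types to the corresponding arithmetic functions
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
    ast.USub: operator.neg,
    ast.UAdd: operator.pos,
}

def safe_eval(expression: str):
    """Evaluate arithmetic by walking the AST; no eval(), no name lookup."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"Disallowed syntax: {ast.dump(node)}")
    return walk(ast.parse(expression, mode="eval").body)

print(safe_eval("(45 * 12) + 89"))  # 629
```

Note that exponentiation is deliberately left out of `_OPS`: an expression like `9 ** 9 ** 9` passes a character whitelist but can hang the process.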

Tips for Production Agents

  • Log every tool call — you need an audit trail for debugging and compliance
  • Set timeouts on tool execution so a slow API doesn’t hang your agent
  • Use structured outputs (JSON) from tools so the model can parse them reliably
  • Add a “done” tool if your agent needs to explicitly signal completion
  • Test with adversarial inputs — users will try to make your agent call tools it shouldn’t
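On the timeout tip: one simple approach is to run each tool call in a worker thread and give up if it doesn't finish in time. A sketch (`run_tool_with_timeout` is a name invented here; note the abandoned thread keeps running in the background, which is a known limitation of thread-based timeouts in Python):

```python
import json
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FuturesTimeout

def run_tool_with_timeout(fn, tool_input: dict, timeout: float = 10.0) -> str:
    """Run a tool function, returning a structured error if it exceeds the timeout."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn, **tool_input)
    try:
        return future.result(timeout=timeout)
    except FuturesTimeout:
        return json.dumps({"error": f"tool timed out after {timeout}s"})
    finally:
        # Don't block on the worker; it is abandoned if still running
        pool.shutdown(wait=False)

def slow_tool(city: str) -> str:
    time.sleep(2)  # simulate a hung API
    return json.dumps({"city": city})

print(run_tool_with_timeout(slow_tool, {"city": "London"}, timeout=0.5))
# {"error": "tool timed out after 0.5s"}
```

Feeding the error string back to the model as the tool result (rather than raising) lets the agent recover, for example by apologizing or trying a different tool.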