How to Build a Financial Analysis Agent with LLMs and Market Data

A financial analysis agent pulls live market data, crunches the numbers, and writes up the analysis for you. Instead of tabbing between Yahoo Finance, a spreadsheet, and ChatGPT, you build one system that does all three.

The agent uses yfinance for free market data (no API key), OpenAI for reasoning, and tool calling to glue them together. You ask “How does AAPL look compared to MSFT?” and the agent fetches prices, computes moving averages and RSI, then writes an actual analysis.

Fetching Market Data with yfinance

Start with the tools. Each tool is a plain Python function that the LLM can call by name.

import yfinance as yf
import json
from datetime import datetime, timedelta

def get_stock_price(ticker: str) -> dict:
    """Fetch current price and basic info for a stock."""
    stock = yf.Ticker(ticker)
    info = stock.info
    return {
        "ticker": ticker.upper(),
        "current_price": info.get("currentPrice", info.get("regularMarketPrice")),
        "previous_close": info.get("previousClose"),
        "market_cap": info.get("marketCap"),
        "pe_ratio": info.get("trailingPE"),
        "forward_pe": info.get("forwardPE"),
        "dividend_yield": info.get("dividendYield"),
        "fifty_two_week_high": info.get("fiftyTwoWeekHigh"),
        "fifty_two_week_low": info.get("fiftyTwoWeekLow"),
        "sector": info.get("sector"),
        "currency": info.get("currency", "USD"),
    }

def get_historical_prices(ticker: str, period: str = "6mo") -> dict:
    """Fetch historical closing prices. Period: 1mo, 3mo, 6mo, 1y, 2y, 5y."""
    stock = yf.Ticker(ticker)
    hist = stock.history(period=period)
    prices = [
        {"date": d.strftime("%Y-%m-%d"), "close": round(float(row["Close"]), 2)}
        for d, row in hist.iterrows()
    ]
    return {
        "ticker": ticker.upper(),
        "period": period,
        "data_points": len(prices),
        "prices": prices[-30:],  # last 30 entries to keep token count sane
        "start_price": prices[0]["close"] if prices else None,
        "end_price": prices[-1]["close"] if prices else None,
    }

def get_financials(ticker: str) -> dict:
    """Fetch key financial statements data."""
    stock = yf.Ticker(ticker)
    income = stock.income_stmt
    balance = stock.balance_sheet
    result = {"ticker": ticker.upper()}
    if not income.empty:
        latest = income.iloc[:, 0]
        result["revenue"] = float(latest.get("Total Revenue", 0))
        result["net_income"] = float(latest.get("Net Income", 0))
        result["gross_profit"] = float(latest.get("Gross Profit", 0))
        if result["revenue"] > 0:
            result["profit_margin"] = round(result["net_income"] / result["revenue"] * 100, 2)
    if not balance.empty:
        latest_b = balance.iloc[:, 0]
        result["total_assets"] = float(latest_b.get("Total Assets", 0))
        result["total_debt"] = float(latest_b.get("Total Debt", 0))
    return result

These functions return plain dicts. No classes, no abstractions. The LLM sees the output as JSON and reasons over it.

One thing to watch: yfinance returns numpy types (numpy.float64, numpy.int64) which json.dumps chokes on. We handle that shortly.

Building the Agent with Tool Calling

Wire the tools into OpenAI’s function calling using the tools parameter. Define the schemas, then build the agent loop.

import json
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from env

# Tool definitions for the API
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_stock_price",
            "description": "Get current price, PE ratio, market cap, and basic info for a stock ticker",
            "parameters": {
                "type": "object",
                "properties": {
                    "ticker": {
                        "type": "string",
                        "description": "Stock ticker symbol, e.g. AAPL, MSFT, GOOGL"
                    }
                },
                "required": ["ticker"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "get_historical_prices",
            "description": "Get historical closing prices for a stock over a time period",
            "parameters": {
                "type": "object",
                "properties": {
                    "ticker": {
                        "type": "string",
                        "description": "Stock ticker symbol"
                    },
                    "period": {
                        "type": "string",
                        "enum": ["1mo", "3mo", "6mo", "1y", "2y", "5y"],
                        "description": "Time period for historical data"
                    }
                },
                "required": ["ticker"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "get_financials",
            "description": "Get revenue, net income, profit margin, and balance sheet data",
            "parameters": {
                "type": "object",
                "properties": {
                    "ticker": {
                        "type": "string",
                        "description": "Stock ticker symbol"
                    }
                },
                "required": ["ticker"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "calculate_technical_indicators",
            "description": "Calculate moving averages and RSI for a stock",
            "parameters": {
                "type": "object",
                "properties": {
                    "ticker": {
                        "type": "string",
                        "description": "Stock ticker symbol"
                    },
                    "period": {
                        "type": "string",
                        "enum": ["3mo", "6mo", "1y"],
                        "description": "Lookback period for calculations"
                    }
                },
                "required": ["ticker"]
            }
        }
    }
]

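# Map function names to actual functions. calculate_technical_indicators is
# defined in the next section and must exist before this dict is built.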
tool_functions = {
    "get_stock_price": get_stock_price,
    "get_historical_prices": get_historical_prices,
    "get_financials": get_financials,
    "calculate_technical_indicators": calculate_technical_indicators,
}


def serialize_result(obj):
    """Handle numpy types that json.dumps can't serialize."""
    if isinstance(obj, (np.integer,)):
        return int(obj)
    if isinstance(obj, (np.floating,)):
        return float(obj)
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    raise TypeError(f"Object of type {type(obj)} is not JSON serializable")


def run_agent(user_query: str) -> str:
    """Run the financial analysis agent with a tool-calling loop."""
    messages = [
        {
            "role": "system",
            "content": (
                "You are a financial analyst agent. Use the available tools to fetch "
                "real market data before making any claims. Always base your analysis "
                "on actual numbers. Compare metrics to sector averages when possible. "
                "Be direct about risks and bullish/bearish signals."
            ),
        },
        {"role": "user", "content": user_query},
    ]

    while True:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=tools,
            tool_choice="auto",
        )

        msg = response.choices[0].message
        messages.append(msg)

        # If no tool calls, the agent is done
        if not msg.tool_calls:
            return msg.content

        # Execute each tool call and feed results back
        for tool_call in msg.tool_calls:
            fn_name = tool_call.function.name
            fn_args = json.loads(tool_call.function.arguments)

            print(f"  -> Calling {fn_name}({fn_args})")
            result = tool_functions[fn_name](**fn_args)
            result_json = json.dumps(result, default=serialize_result)

            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": result_json,
            })


# Run it
analysis = run_agent("Compare AAPL and MSFT. Which looks better right now?")
print(analysis)

The loop is the core pattern. The LLM decides which tools to call (often multiple in one turn via parallel tool calling), you execute them, and feed results back as tool messages. The loop continues until the model responds with plain text instead of tool calls.
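
The "execute them" step is where most runtime failures show up. The sketch below is one way to harden it, not part of the original agent: a hypothetical helper, execute_tool_call, that always returns a JSON string, so an unknown tool name, malformed arguments, or an exception inside a tool is reported back to the model instead of crashing the loop. The stand-in tool fake_price and the default=str fallback are assumptions made to keep the example offline.

```python
import json
from typing import Any, Callable, Dict


def execute_tool_call(
    name: str,
    arguments: str,
    registry: Dict[str, Callable[..., Any]],
) -> str:
    """Run one tool call and always return a JSON string for the tool message."""
    if name not in registry:
        return json.dumps({"error": f"Unknown tool: {name}"})
    try:
        kwargs = json.loads(arguments or "{}")
    except json.JSONDecodeError:
        return json.dumps({"error": "Invalid arguments received"})
    if not isinstance(kwargs, dict):
        return json.dumps({"error": "Arguments must be a JSON object"})
    try:
        result = registry[name](**kwargs)
    except Exception as exc:  # report the failure to the model instead of crashing
        return json.dumps({"error": f"{type(exc).__name__}: {exc}"})
    # default=str is a blunt fallback for values json cannot encode;
    # a targeted hook such as serialize_result is the more precise option.
    return json.dumps(result, default=str)


# Stand-in tool so the example runs without network access (hypothetical)
def fake_price(ticker: str) -> dict:
    return {"ticker": ticker.upper(), "current_price": 123.45}


registry = {"get_stock_price": fake_price}
ok = execute_tool_call("get_stock_price", '{"ticker": "aapl"}', registry)
unknown = execute_tool_call("get_news", "{}", registry)
bad_json = execute_tool_call("get_stock_price", "{ticker: aapl", registry)
bad_args = execute_tool_call("get_stock_price", '{"symbol": "aapl"}', registry)
```

Inside run_agent, the body of the for loop could then shrink to a single call to this helper, with its return value used as the content of the tool message.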

Notice tool_choice="auto" – this lets the model decide when it has enough data to stop fetching and start writing. For financial analysis, it typically makes 4-6 tool calls before producing the final report.
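
Because the model decides when to stop, a while True loop has no upper bound on turns or cost. A minimal sketch of a bounded variant follows. It assumes simplified message dicts rather than the OpenAI SDK's response objects, and the names run_bounded_loop, call_model, and run_tool are hypothetical. The model call is injected as a function so the control flow can be exercised with a scripted stand-in.

```python
from typing import Any, Callable, Dict, List, Optional

Message = Dict[str, Any]


def run_bounded_loop(
    call_model: Callable[[List[Message]], Message],
    run_tool: Callable[[str, str], str],
    messages: List[Message],
    max_turns: int = 8,
) -> Optional[str]:
    """Alternate model turns and tool execution, stopping after max_turns."""
    for _ in range(max_turns):
        reply = call_model(messages)
        messages.append(reply)
        tool_calls = reply.get("tool_calls") or []
        if not tool_calls:
            return reply.get("content")  # plain text means the model is done
        for call in tool_calls:
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": run_tool(call["name"], call["arguments"]),
            })
    return None  # turn budget exhausted without a final answer


# Scripted stand-in for the model: one tool call, then a final answer
script = iter([
    {"role": "assistant", "content": None,
     "tool_calls": [{"id": "call_1", "name": "get_stock_price",
                     "arguments": '{"ticker": "AAPL"}'}]},
    {"role": "assistant", "content": "AAPL trades at 123.45."},
])
history = [{"role": "user", "content": "Price of AAPL?"}]
answer = run_bounded_loop(
    lambda msgs: next(script),
    lambda name, args: '{"current_price": 123.45}',
    history,
)


# A model that never stops calling tools hits the cap and yields None
def always_calls(msgs: List[Message]) -> Message:
    return {"role": "assistant", "content": None,
            "tool_calls": [{"id": "x", "name": "get_stock_price", "arguments": "{}"}]}


capped = run_bounded_loop(always_calls, lambda name, args: "{}", [], max_turns=3)
```

Returning None on exhaustion leaves the caller free to raise, retry with a larger budget, or ask the model for a summary of what it has so far.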

Adding Financial Calculations

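The agent is more reliable when it computes technical indicators in code rather than asking the LLM to do the math. Here’s the calculate_technical_indicators function referenced above. Define it before the tool_functions dictionary is built; otherwise Python raises a NameError.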

import pandas as pd

def calculate_technical_indicators(ticker: str, period: str = "6mo") -> dict:
    """Calculate SMA, EMA, and RSI for a stock."""
    stock = yf.Ticker(ticker)
    hist = stock.history(period=period)

    if hist.empty:
        return {"error": f"No data found for {ticker}"}

    close = hist["Close"]

    # Simple Moving Averages
    sma_20 = float(close.rolling(window=20).mean().iloc[-1])
    sma_50 = float(close.rolling(window=50).mean().iloc[-1])

    # Exponential Moving Average
    ema_20 = float(close.ewm(span=20, adjust=False).mean().iloc[-1])

    # RSI (14-period)
    delta = close.diff()
    gain = delta.where(delta > 0, 0.0)
    loss = -delta.where(delta < 0, 0.0)
    avg_gain = gain.rolling(window=14).mean().iloc[-1]
    avg_loss = loss.rolling(window=14).mean().iloc[-1]
    # No losses in the window means RS is undefined; RSI is 100 by convention
    if avg_loss != 0:
        rs = avg_gain / avg_loss
        rsi = 100 - (100 / (1 + rs))
    else:
        rsi = 100.0

    current_price = float(close.iloc[-1])

    # Price relative to moving averages
    above_sma_20 = current_price > sma_20
    above_sma_50 = current_price > sma_50

    # Volatility (annualized standard deviation of daily returns)
    daily_returns = close.pct_change().dropna()
    volatility = float(daily_returns.std() * (252 ** 0.5) * 100)

    return {
        "ticker": ticker.upper(),
        "current_price": round(current_price, 2),
        "sma_20": round(sma_20, 2),
        "sma_50": round(sma_50, 2),
        "ema_20": round(ema_20, 2),
        "rsi_14": round(float(rsi), 2),
        "above_sma_20": above_sma_20,
        "above_sma_50": above_sma_50,
        "annualized_volatility_pct": round(volatility, 2),
        "signal": "overbought" if rsi > 70 else "oversold" if rsi < 30 else "neutral",
    }

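The RSI calculation uses the standard 14-period lookback, but it averages gains and losses with a simple rolling mean instead of Wilder's exponential smoothing, so the value can differ somewhat from what charting platforms show. Values above 70 signal overbought, below 30 oversold. The agent sees these labels in the tool output and incorporates them into its analysis.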
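
The function above averages gains and losses with a simple rolling mean. Wilder's original RSI smooths them exponentially with a factor of 1/period. The sketch below shows that variant using pandas' ewm. The name wilder_rsi is hypothetical, and the ewm form only approximates Wilder's method because it seeds the average from the first value rather than from a 14-bar simple mean, so early values differ slightly. It needs at least two prices.

```python
import pandas as pd


def wilder_rsi(close: pd.Series, period: int = 14) -> float:
    """RSI of the last bar using Wilder-style smoothing (alpha = 1/period)."""
    delta = close.diff().dropna()
    gain = delta.clip(lower=0.0)    # keep up-moves, zero out down-moves
    loss = -delta.clip(upper=0.0)   # keep down-moves as positive numbers
    avg_gain = gain.ewm(alpha=1 / period, adjust=False).mean().iloc[-1]
    avg_loss = loss.ewm(alpha=1 / period, adjust=False).mean().iloc[-1]
    if avg_loss == 0:
        return 100.0  # no losses at all (a perfectly flat series also lands here)
    rs = avg_gain / avg_loss
    return float(100 - 100 / (1 + rs))


# Synthetic series so the example runs offline
rising = pd.Series([float(p) for p in range(1, 31)])
falling = pd.Series([float(p) for p in range(30, 0, -1)])
choppy = pd.Series([100.0, 102.0, 101.0, 103.0, 102.0, 104.0, 103.0, 105.0] * 4)

rsi_rising = wilder_rsi(rising)
rsi_falling = wilder_rsi(falling)
rsi_choppy = wilder_rsi(choppy)
```

A series that only rises saturates at 100 and one that only falls bottoms out at 0, which makes a quick sanity check for any RSI implementation.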

You can extend this pattern easily. Want Bollinger Bands? Add a tool. Want to compare PE ratios across a sector? Add a tool. The LLM figures out when to call what – you just need to give it good descriptions in the tool schema.
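
As an illustration of "add a tool", here is a rough sketch of a Bollinger Bands calculation with a matching schema entry. The names bollinger_bands, calculate_bollinger_bands, and bollinger_schema are all hypothetical. The function takes a Series of closes rather than a ticker so it can run offline; a thin wrapper would fetch closes with yf.Ticker(ticker).history(period=period)["Close"], as the other tools do, and pass them in.

```python
import pandas as pd


def bollinger_bands(close: pd.Series, window: int = 20, num_std: float = 2.0) -> dict:
    """Latest Bollinger Bands: rolling mean plus/minus num_std rolling std devs."""
    if len(close) < window:
        return {"error": f"Need at least {window} closes, got {len(close)}"}
    middle = float(close.rolling(window=window).mean().iloc[-1])
    # pandas uses the sample standard deviation (ddof=1) by default;
    # pass ddof=0 to std() for the population version.
    std = float(close.rolling(window=window).std().iloc[-1])
    upper = middle + num_std * std
    lower = middle - num_std * std
    last = float(close.iloc[-1])
    if last > upper:
        position = "above_upper_band"
    elif last < lower:
        position = "below_lower_band"
    else:
        position = "inside_bands"
    return {
        "middle": round(middle, 2),
        "upper": round(upper, 2),
        "lower": round(lower, 2),
        "last_close": round(last, 2),
        "position": position,
    }


# Hypothetical schema entry, following the shape of the existing tools list
bollinger_schema = {
    "type": "function",
    "function": {
        "name": "calculate_bollinger_bands",
        "description": "Calculate 20-day Bollinger Bands and where the last close sits relative to them",
        "parameters": {
            "type": "object",
            "properties": {
                "ticker": {"type": "string", "description": "Stock ticker symbol"},
            },
            "required": ["ticker"],
        },
    },
}

# Synthetic closes (1.0 through 30.0) so the example runs offline
bands = bollinger_bands(pd.Series([float(p) for p in range(1, 31)]))
too_short = bollinger_bands(pd.Series([1.0, 2.0]))
```

Returning a ready-made position label, like the signal field in calculate_technical_indicators, saves the model from comparing numbers itself.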

A few things to keep in mind for the calculations:

SMA vs EMA: The 20-day SMA lags more than the EMA. The agent gets both so it can discuss crossover signals.
Volatility: We annualize by multiplying the daily standard deviation by the square root of 252 (trading days per year). This is standard but assumes daily returns are independent with constant variance, which real stocks only approximately satisfy.
RSI edge case: If avg_loss is zero (stock went up every day in the window), RSI is 100. The if avg_loss != 0 guard handles this.
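
To make the annualization step concrete: a daily standard deviation of 1% scales to roughly 15.9% per year, since 0.01 × √252 ≈ 0.1587. The sketch below reproduces the same arithmetic with only the standard library. The function name and the price list are made up for illustration, and at least three closes are needed so that there are two returns to take a standard deviation of.

```python
import math
import statistics
from typing import Sequence


def annualized_volatility_pct(closes: Sequence[float], trading_days: int = 252) -> float:
    """Annualized volatility in percent from a sequence of daily closing prices."""
    # Simple daily returns: (today / yesterday) - 1
    returns = [later / earlier - 1 for earlier, later in zip(closes, closes[1:])]
    daily_std = statistics.stdev(returns)  # sample std, like pandas' default ddof=1
    return daily_std * math.sqrt(trading_days) * 100


closes = [100.0, 101.0, 100.0, 102.0, 101.0]  # made-up prices
vol = annualized_volatility_pct(closes)
flat = annualized_volatility_pct([50.0, 50.0, 50.0])
```

Because the scaling factor is a square root, quadrupling the horizon only doubles the volatility figure.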

Common Errors and Fixes

yfinance returns empty data for a valid ticker. This happens when the ticker got delisted or when Yahoo Finance is having issues. Always check hist.empty before running calculations.

hist = stock.history(period="6mo")
if hist.empty:
    return {"error": f"No historical data available for {ticker}"}

TypeError: Object of type float64 is not JSON serializable. This is the numpy serialization issue. The serialize_result function we wrote handles it, but you can also cast inline:

# Instead of this:
result["price"] = close.iloc[-1]  # numpy.float64

# Do this:
result["price"] = float(close.iloc[-1])  # plain Python float

Rate limiting from Yahoo Finance. If you hit yfinance hard with many tickers in quick succession, you’ll get HTTPError 429. Add a small delay between calls:

import time

for ticker in ["AAPL", "MSFT", "GOOGL", "AMZN", "META"]:
    result = get_stock_price(ticker)
    print(f"{ticker}: ${result['current_price']}")
    time.sleep(0.5)  # half-second delay between requests

Tool call arguments are invalid JSON. Rarely, the LLM generates malformed JSON in tool_call.function.arguments. Wrap the parse in a try/except:

try:
    fn_args = json.loads(tool_call.function.arguments)
except json.JSONDecodeError:
    messages.append({
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": json.dumps({"error": "Invalid arguments received"}),
    })
    continue

KeyError on financial statement fields. Not every company reports the same line items. yfinance returns whatever Yahoo Finance has, and the field names can vary. Always use .get() with a default:

result["revenue"] = float(latest.get("Total Revenue", 0))
result["net_income"] = float(latest.get("Net Income", 0))
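
A default of 0 avoids the KeyError, but it also makes a missing line item look the same as a reported zero, and the model may then reason over a value that was never there. One alternative, sketched here with a hypothetical helper named safe_float, is to return None for absent or NaN fields so the gap stays visible in the JSON as null. The helper only relies on a .get method, which both plain dicts and pandas Series provide.

```python
import math
from typing import Any, Optional


def safe_float(row: Any, key: str) -> Optional[float]:
    """Return row[key] as a float, or None if it is absent, NaN, or non-numeric.

    `row` can be anything with a dict-style .get method.
    """
    value = row.get(key)
    if value is None:
        return None
    try:
        number = float(value)
    except (TypeError, ValueError):
        return None
    return None if math.isnan(number) else number


# A plain dict stands in for one column of a financial statement (made-up values)
latest = {"Total Revenue": 1000, "Net Income": float("nan")}
revenue = safe_float(latest, "Total Revenue")
net_income = safe_float(latest, "Net Income")
gross_profit = safe_float(latest, "Gross Profit")
```

With this approach, derived figures such as profit margin should only be computed when both inputs are not None.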

A production agent should also handle OpenAI API errors (rate limits, timeouts) with retries. The openai library raises openai.RateLimitError and openai.APITimeoutError which you can catch and retry with exponential backoff.
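
A minimal sketch of such a retry wrapper follows. The helper name with_retries and its parameters are hypothetical, and the exception types are passed in so the example runs without the openai package. For real calls you would pass retry_on=(openai.RateLimitError, openai.APITimeoutError) and wrap the client.chat.completions.create call in a lambda.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on the given exceptions with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt (1s, 2s, 4s, ...) plus a little jitter
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            sleep(delay)
    raise ValueError("max_attempts must be at least 1")


# Simulated flaky call: fails twice with a timeout, then succeeds
attempts = {"count": 0}


def flaky() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TimeoutError("simulated timeout")
    return "ok"


delays = []  # record requested delays instead of actually sleeping
result = with_retries(flaky, retry_on=(TimeoutError,), sleep=delays.append)
```

Injecting sleep keeps the example fast to run. The jitter spreads retries out so that several workers hitting the same limit do not all retry at the same instant.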
