The Core Idea
A browser automation agent combines two things: Playwright controls the browser, and an LLM decides what to do next. The agent loop is simple — observe the page, ask the LLM what action to take, execute it, repeat. No hardcoded selectors, no brittle scraping scripts. The LLM reads the page and figures out where to click.
Here’s the full working agent in one shot. We’ll break it down after.
| |
That’s the whole agent. Let’s walk through what makes it work.
Setting Up Playwright
Install the dependencies first:
| |
Playwright’s async API is the right choice here because we’re already in an async context waiting on LLM responses. The async_playwright context manager handles browser lifecycle cleanly — launch a browser, create a page, and the context manager tears everything down when you’re done.
The key Playwright calls you’ll use most:
page.goto(url)— navigate to a URLpage.click(selector)— click an elementpage.fill(selector, text)— type into an inputpage.evaluate(js)— run JavaScript in the browser contextpage.wait_for_load_state()— wait for page to settle after navigation
Extracting Page Content for the LLM
The get_page_content function is where the magic happens. You can’t just dump page.content() (raw HTML) into the LLM — it’s too noisy and eats your token budget. Instead, we extract a structured text representation that tells the LLM what’s on the page and how to interact with it.
The JavaScript in page.evaluate() walks through visible elements and formats them as tagged lines:
[link] "Sign In" (href=/login)— the LLM knows it can click this[input:text] name="email" placeholder="Enter email"— the LLM knows it can type here[button] "Submit"— clickable action element[h1] Product Catalog— structural context
This format gives the LLM enough information to make decisions without overwhelming it. We also cap each element’s text at 200 characters and the total page content at 8,000 characters to stay within reasonable token limits.
The Agent Loop
The agent follows a classic observe-act cycle:
- Observe: Extract the current page content and URL
- Decide: Send the observation to the LLM with available tools
- Act: Execute whatever tool call the LLM returns
- Repeat: Feed the action result back and observe the new page state
OpenAI’s function calling (the tools parameter) is what makes this structured. Instead of parsing free-text responses like “click the login button,” the LLM returns a structured JSON tool call: {"name": "click", "arguments": {"selector": "a:has-text(\"Login\")"}}. No regex parsing, no prompt injection risk from the page content leaking into action parsing.
The tool_choice="auto" setting lets the model decide when to call tools and when to respond with text. When the task is done, the agent calls extract_data to return structured results, then done to signal completion.
Practical Example: Scraping Product Data
Here’s how you’d use the agent to extract structured data from a website:
| |
The agent will load the page, read the book listings, and call extract_data with something like:
| |
No CSS selectors to maintain. No XPath expressions to update when the site changes layout. The LLM reads the page and figures out where the data lives.
Common Errors and Fixes
TimeoutError: page.click: Timeout 5000ms exceeded
The selector doesn’t match any element. Playwright selectors are CSS-based — make sure you’re using the right format. For text-based clicks, use a:has-text("Login") or button:has-text("Submit"). The fallback in execute_action tries page.get_by_text() if the CSS selector fails.
Error: browser.newContext: Browser has been closed
You’re trying to use the browser after the async with block exits. Make sure all your agent logic runs inside the async_playwright() context manager. Don’t return a page object and try to use it later.
openai.RateLimitError: Rate limit reached
Each step in the agent loop makes an API call. With 15 max steps, that’s 15 calls minimum. Add a await asyncio.sleep(1) between steps if you’re hitting rate limits, or use a cheaper model like gpt-4o-mini for simple navigation tasks.
playwright._impl._errors.TargetClosedError
The page navigated away or a popup opened. Use page.wait_for_load_state("domcontentloaded") after clicks that trigger navigation. The execute_action function already does this.
Agent gets stuck in a loop visiting the same pages
Cap your max_steps and add the conversation history to the LLM context so it can see what it already tried. The agent above does this by appending every observation and action to the messages list. If it’s still looping, add a system prompt instruction like “Do not revisit URLs you have already visited.”
page.evaluate returns None for dynamic content
Single-page apps load content asynchronously. Add await page.wait_for_selector("selector-for-expected-content", timeout=10000) before extracting content, or use await page.wait_for_load_state("networkidle") to wait for all network requests to finish (though this can be slow).
Related Guides
- How to Build a Scheduling Agent with Calendar and Email Tools
- How to Build a Debugging Agent with Stack Trace Analysis
- How to Build an Email Triage Agent with LLMs and IMAP
- How to Build a File Management Agent with Filesystem Tools
- How to Build Autonomous Agents with the OpenAI Assistants API
- How to Build a Memory-Augmented Agent with Vector Search
- How to Build a GitHub Issue Triage Agent with LLMs and the GitHub API
- How to Build a Meeting Summarization Agent with Whisper and LLMs
- How to Build a Research Agent with LangGraph and Tavily
- How to Build a Data Pipeline Agent with LLMs and Pandas