The Browser Agent Landscape

By: Sean Cai, Rayan Garg, Tanmay Sharma, Gurvir Singh

Overview

Browser agents refer to AI systems that operate within or on top of web browsers to complete tasks (e.g. filling forms, clicking buttons, scraping data, navigating sites). They can be considered a sub-category within computer use agents, which can more generally interact with software interfaces within and beyond a browser. Many agents also have browser use as one tool in a broader array of capabilities. Browser agents tend to focus specifically on web-based workflows—an important distinction as we consider specialized infrastructure versus general-purpose AI assistants.

In this market map, we treat browser agents and computer-use agents as a unified category: AI systems that perform actions in software interfaces (web or desktop) in response to high-level user goals. Founders in this space are building on a wave of technical tailwinds to deliver AI co-workers that “orchestrate existing software” rather than replace it. We’re already beginning to see large productivity gains on white-collar tasks that span multiple applications and disparate data sources with little API exposure.

Trends in Browser Agent Infrastructure

1. Rise of Enterprise Agents

As agents become more capable, they’re increasingly expected to access real-time information and take meaningful actions across enterprise systems. Deploying agents in production enterprise environments often requires much higher reliability than consumer use cases do, and generalist agents tend to struggle to meet that bar.

A vertical focus often improves reliability, since the agent can be fine-tuned on a narrower set of actions and scenarios. This specialization is a natural response to the “jack of all trades, master of none” problem that plagues more general agents. We expect to see many more domain-specific agents emerging—think AI agents for healthcare admin, legal contract review, e-commerce operations, and so on—especially as the foundation models advance and the vertical RL training stack becomes democratized. Over time, we expect these vertical companies to invest heavily in building their own domain-specific fine-tuned models.

Marketing, sales, recruiting, and QA (Astral, Unify, SonicJobs, Spur) are the first areas where vertical browser agents have started to emerge, since web navigation in these workflows is relatively simple and extensive domain knowledge isn't required. Spur, for example, offers an AI QA agent that automates end-to-end testing of web applications, replacing traditional test scripts. Browser agents are especially well-suited for QA workflows, where test cases are well-defined, interfaces are easy to navigate, and tasks typically involve interacting with just one piece of software. A number of other startups leveraging browser agents for QA have started to crop up as well, including Momentic, Meticulous, and Ranger.

2. The Space Is Still Early

Computer/browser use is a relatively new frontier of model capabilities, but we’ve already seen massive leaps in performance on existing benchmarks alongside new unlocks for real-world workflows. The first major release from a lab, Anthropic’s Claude Computer Use, was only 7 months ago, and we’ve since seen releases like OpenAI’s Operator, ByteDance’s UI-TARS, and Google’s Project Mariner. How fast these models improve is largely a function of how much the labs continue to invest in this frontier, and the current trajectory is only accelerating.

As browser agents execute increasingly long and complex tasks in real-world environments, speed and accuracy have become the key metrics, and agents trade one against the other along a Pareto frontier. There’s inherent tension: an agent that uses a larger VLM or more inference-time compute might solve a task more accurately, but at the price of higher latency and cost. A faster agent that cuts corners might feel snappier, but could be more prone to errors on complex websites or long tasks. This tradeoff between speed and accuracy has led to two diverging paradigms: prosumer agents that require frequent user interaction are used for simpler, day-to-day tasks, while enterprise agents place a greater emphasis on reliability and autonomy.
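To make the speed versus accuracy tradeoff concrete, here is a minimal sketch of how one might pick out the Pareto-optimal agent configurations from a set of benchmark runs. The `AgentRun` type, the configuration names, and all of the numbers are hypothetical and made up for illustration; they are not measurements of any real agent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentRun:
    name: str          # hypothetical agent configuration label
    accuracy: float    # task success rate in [0, 1]; higher is better
    latency_s: float   # mean seconds per task; lower is better


def dominates(a: AgentRun, b: AgentRun) -> bool:
    """True if a is no worse than b on both metrics and strictly better on one."""
    no_worse = a.accuracy >= b.accuracy and a.latency_s <= b.latency_s
    strictly_better = a.accuracy > b.accuracy or a.latency_s < b.latency_s
    return no_worse and strictly_better


def pareto_frontier(runs: list[AgentRun]) -> list[AgentRun]:
    """Keep only runs that no other run dominates, sorted by latency."""
    frontier = [r for r in runs if not any(dominates(o, r) for o in runs)]
    return sorted(frontier, key=lambda r: r.latency_s)


# Made-up numbers for illustration only (an assumption, not real results).
runs = [
    AgentRun("small-model", accuracy=0.40, latency_s=20.0),
    AgentRun("small-model-retry", accuracy=0.38, latency_s=35.0),
    AgentRun("large-model", accuracy=0.55, latency_s=60.0),
    AgentRun("large-model-extra-compute", accuracy=0.62, latency_s=150.0),
]

frontier_names = [r.name for r in pareto_frontier(runs)]
print(frontier_names)
```

In this toy data, the retry configuration is both slower and less accurate than the plain small model, so it drops off the frontier. The remaining three are all defensible choices: which one is right depends on whether the use case looks more like the prosumer or the enterprise paradigm.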

The space is still early, and agents still struggle to complete economically valuable tasks. On CUB, Theta's Computer Use Benchmark, which measures performance on end-to-end workflows across consumer, finance, healthcare, and other domains, no agent achieved a score above 9.23%. Not only do agents struggle to reliably use software like spreadsheets or EHR platforms, they also struggle to apply domain-specific knowledge when required. Memory and instruction following, particularly over long sequences of actions, are key weaknesses. Agents fail to reliably perform repetitive tasks and maintain coherence, and are generally inefficient in their trajectories. These issues are only amplified when tasks require coordinating across different software applications and interacting with complex GUIs.

Hi! My name is Sean, and I'm based in SF. I’m a General Catalyst Fellow who is currently investing at Hummingbird Ventures and Weekend Fund, and I previously built in 360° live videoconferencing and across different vertical AI areas. Refer here to read about the companies above. Feel free to reach out at @SeanZCai.
