browser-mcp

Web browsing for AI agents, the way screen readers do it.

AI agents can't browse the web reliably. HTML parsing is brittle because sites change their markup constantly. Screenshots need vision models, which burn context and add latency. Neither approach holds up well.

browser-mcp takes a different path: accessibility semantics.

The Accessibility Angle

Screen readers have solved this problem for decades. They don't parse raw HTML or analyze pixels. They use the accessibility tree: the structured layer of landmarks, labels, headings, and roles that browsers build for assistive technology.

browser-mcp gives your agents the same view. A page becomes:

[main landmark]
  [heading level 1] "Product Dashboard"
  [navigation]
    [link] "Settings"
    [link] "Profile"
  [region "Recent Activity"]
    [list] 3 items
      [link] "Order #4521 shipped"
      ...

Structured. Predictable. No CSS selectors that break when someone redesigns the header.
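
To make the outline format concrete, here is a minimal sketch of how a tree of roles and names could be rendered into indented text like the example above. The Node class and render function are hypothetical names used for illustration, not part of browser-mcp, and the sketch always writes the name after the brackets, which simplifies the region line shown above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One accessibility node: a role, an optional accessible name, and children."""
    role: str
    name: str = ""
    children: list[Node] = field(default_factory=list)


def render(node: Node, depth: int = 0) -> list[str]:
    """Return the outline lines for a node and its descendants, two spaces per level."""
    label = f"[{node.role}]"
    if node.name:
        label += f' "{node.name}"'
    lines = ["  " * depth + label]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


# Hypothetical page mirroring the first lines of the example above.
page = Node("main landmark", children=[
    Node("heading level 1", "Product Dashboard"),
    Node("navigation", children=[
        Node("link", "Settings"),
        Node("link", "Profile"),
    ]),
])

outline = "\n".join(render(page))
print(outline)
```

Because nesting is carried by indentation alone, an agent can locate "the links inside the navigation landmark" without knowing anything about the page's markup or class names.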

How It Works

browser-mcp is an MCP server. Add it to your agent's configuration, and the agent gets tools for:

  • Navigate — Go to URLs, click links, fill forms
  • Read — Get the accessibility tree for any page
  • Interact — Click buttons, select options, submit forms
  • Wait — Handle dynamic content that loads after the initial page

The agent sees the page the way a screen reader user would. Landmarks tell it where it is. Labels tell it what things do. Headings give it structure.
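
As a rough illustration of what happens on the wire, MCP clients invoke server tools with JSON-RPC 2.0 "tools/call" requests. The sketch below builds three such requests. The tool names (navigate, read, click) and their arguments are assumptions made for illustration; the names the server actually exposes can be discovered with a "tools/list" request.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids, incremented per call


def tool_call(name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 tools/call request, the MCP message for invoking a tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Hypothetical tool names and arguments; check the server's tools/list
# response for the real ones.
steps = [
    tool_call("navigate", {"url": "https://example.com/dashboard"}),
    tool_call("read", {}),
    tool_call("click", {"role": "link", "name": "Settings"}),
]

for step in steps:
    print(json.dumps(step))
```

In practice the agent framework builds and sends these messages for you. The point is that each step names an element by role and label rather than by selector.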

Why It's Better

More reliable. Accessibility semantics tend to be stable. Sites that change their visual design rarely change their accessibility structure, because doing so would break things for screen reader users.

Less context. A page's accessibility tree is typically much smaller than its HTML, because it leaves out styling, scripts, and purely presentational markup. Your agent gets the structure without the noise.

No vision models. You don't need to burn context on screenshots or add latency for image analysis. The structure is already there.

Works with dynamic sites. SPAs, React apps, sites that load content after the initial render — the accessibility tree updates as the page changes.
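
Handling content that arrives after the initial render usually comes down to re-reading the tree until the expected element shows up. The following is a minimal polling sketch, not browser-mcp's own wait implementation. The read_tree callable is a hypothetical stand-in for whatever returns the current outline text.

```python
import time
from typing import Callable


def wait_for(read_tree: Callable[[], str], needle: str,
             timeout: float = 5.0, interval: float = 0.1) -> str:
    """Poll read_tree until its output contains needle; raise TimeoutError otherwise."""
    deadline = time.monotonic() + timeout
    while True:
        tree = read_tree()
        if needle in tree:
            return tree
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{needle!r} did not appear within {timeout}s")
        time.sleep(interval)


# Stand-in for a page whose list is filled in by the third read.
snapshots = iter([
    "[main landmark]",
    "[main landmark]\n  [list] 0 items",
    '[main landmark]\n  [list] 3 items\n    [link] "Order #4521 shipped"',
])

result = wait_for(lambda: next(snapshots), "Order #4521 shipped", interval=0.01)
print(result)
```

Matching on a label such as "Order #4521 shipped" keeps the wait condition tied to what the page says rather than to how it is built.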

Try It

Add to your MCP configuration:

{
  "mcpServers": {
    "browser": {
      "command": "npx",
      "args": ["github:sanity-io/browser-mcp"]
    }
  }
}

Want to see what the agent sees? Add --no-headless to the server's arguments to watch the browser in action.
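
If you generate your MCP configuration from a script, the sketch below shows one way to toggle the flag. It assumes --no-headless is passed as an extra entry in the args list after the package spec, which is how npx forwards arguments to the command it runs. The mcp_config helper is a hypothetical name.

```python
import json


def mcp_config(headless: bool = True) -> dict:
    """Build the MCP configuration shown above, optionally with a visible browser."""
    args = ["github:sanity-io/browser-mcp"]
    if not headless:
        # Assumption: the flag goes after the package spec so npx forwards it
        # to the server.
        args.append("--no-headless")
    return {"mcpServers": {"browser": {"command": "npx", "args": args}}}


config = mcp_config(headless=False)
print(json.dumps(config, indent=2))
```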

Your agent can now browse the web.

Built with Miriad.