AI coding agents like Claude Code, Cursor, and the rest have gotten remarkably good at reading and writing code. But the moment they need to look at something on the web, they hit a wall. They can't see your staging site. They can't read the error in your analytics dashboard. They can't check whether the form they just built actually submits.

The usual fix is to hand the agent a headless browser: Puppeteer or Playwright driving a fresh Chromium instance. That works, sort of. But a headless Chromium starts every session as a stranger: no logins, no cookies, no sessions. It launches a second browser engine that loads your CPU and sets your fan spinning. And a growing number of sites simply block it on sight.

There's another option, and on a Mac it's a good one: let the agent drive the Safari you already use — the one that's already logged into GitHub, your analytics, your staging environment. That's what Safari MCP does. It's an open-source MCP server that exposes Safari to any MCP-capable agent through around 80 tools, with no Chromium, no WebDriver, and no separate browser to babysit.

In this tutorial you'll connect Safari MCP to an AI agent, run your first automation, and then build something a headless browser fundamentally cannot do: an automation that works inside a page you're logged into. By the end you'll understand not just how to wire this up, but when native browser automation is the right call — and when it isn't.

Here's what you'll need:

  • A Mac (Safari MCP is macOS-only — more on that trade-off later)

  • Node.js 18 or newer

  • An MCP-capable AI agent — this tutorial uses Claude Code and Cursor, but any MCP client works

What is MCP, and Why Does Browser Automation Need It?

Before wiring anything up, it helps to know what the "MCP" in Safari MCP stands for.

MCP is the Model Context Protocol — an open standard for connecting AI agents to external tools and data. Think of it the way you'd think of a USB port. Before USB, every device needed its own connector. MCP is the equivalent of agreeing on one connector: an agent that speaks MCP can use any tool that speaks MCP, with no custom integration code on either side.

An MCP server exposes a set of tools. An MCP client — your AI agent — discovers those tools and calls them. The server describes each tool (its name, what it does, what arguments it takes) and the agent decides when to call it. When Claude Code decides it needs to read a web page, it doesn't run browser code itself. It calls a tool that some MCP server provides.
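
On the wire, MCP messages are JSON-RPC 2.0. As a rough sketch of the discover-then-call flow described above, the two requests involved are `tools/list` and `tools/call`. The helper names `listToolsRequest` and `callToolRequest` are made up for illustration and are not part of any SDK; in practice your agent builds these messages for you.

```javascript
// Build the JSON-RPC 2.0 request an MCP client sends to discover tools.
function listToolsRequest(id) {
  return { jsonrpc: "2.0", id, method: "tools/list" };
}

// Build the request that invokes one tool by name with its arguments.
function callToolRequest(id, name, args) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

const discover = listToolsRequest(1);
const navigate = callToolRequest(2, "safari_navigate", { url: "https://example.com" });
console.log(JSON.stringify(discover));
console.log(JSON.stringify(navigate, null, 2));
```

The shorter `{ "tool": ..., "arguments": ... }` snippets used in this tutorial are a simplified notation that roughly corresponds to the `params` of a `tools/call` request.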

Browser automation is a natural fit for this model. The agent's job is reasoning — "I need to see what's on the staging site, then check the console for errors." The actual mechanics — open a tab, wait for load, read the DOM, capture console output — are well-defined operations that belong behind a stable interface. That interface is exactly what an MCP server provides.

Safari MCP is one such server. It runs as a local process, exposes around 80 browser tools (navigate, click, fill, read, screenshot, extract, and more), and any MCP client can drive it. The agent never touches AppleScript or WebKit internals. It just calls safari_navigate and gets a result.

The "USB port" framing matters for a practical reason: nothing in this tutorial is Claude-specific. Wire Safari MCP into Cursor, Cline, Windsurf, or your own MCP client and the tools are identical.

Why Safari Instead of Chrome or Playwright?

If you've automated a browser before, you've almost certainly used Chrome through Puppeteer, Playwright, or Selenium. So why reach for Safari?

It comes down to three differences that matter once an AI agent, not a test script, is the thing driving the browser.

1. It's your real browser, with your real sessions. A headless Chromium launched by Playwright is a clean room. It has never logged into anything. If you want your agent to read your analytics dashboard, you first have to solve authentication — store credentials somewhere, script the login, handle two-factor prompts, refresh tokens. Safari MCP skips all of that. It drives the Safari instance you use every day, which is already logged into your dashboards, your GitHub, your email. The agent inherits those sessions for free.

2. It doesn't melt your laptop. A headless Chromium is a second, full browser engine running alongside the browser you already have open. On a laptop that's real CPU, real memory, and a fan you can hear. Safari MCP uses the WebKit engine that's already running on every Mac — there's no second engine to start. The project measures this at roughly 60% less CPU for the browsing work, and the automation runs with Safari in the background, so it doesn't steal your screen.

3. Sites don't treat it as a bot. Headless browsers leak. They expose navigator.webdriver, they ship with telltale automation fingerprints, and bot-detection services — Cloudflare's challenge pages, reCAPTCHA, the WAFs in front of a lot of B2B sites — have gotten very good at spotting them. Your real Safari, driven through the operating system, looks like exactly what it is: a person's browser. (To be clear: this is for automating your own accounts and sites — not for evading access controls you don't own.)

The cost of all this is the obvious one: Safari MCP is macOS-only. It's built on WebKit and AppleScript, so there's no Windows or Linux story. If your agent runs on a Linux CI box, this isn't your tool. If it runs on your Mac — which, for a coding agent, it very often does — the trade is a good one. We'll come back to limitations honestly at the end.

Installing Safari MCP

Installation is genuinely one command, but there are two Safari settings to flip first. Let's do it in order.

Step 1 — Enable Safari's developer features

Safari MCP reads and controls pages by running JavaScript inside Safari. Two settings have to be on:

  1. Open Safari → Settings → Advanced and check "Show features for web developers." This reveals the Develop menu.

  2. Open the new Develop menu and check "Allow JavaScript from Apple Events."

That second one is the important one. It's what lets an outside process — the MCP server — ask Safari to run JavaScript on a page. Without it, every tool call fails.

Step 2 — Run the server

npx safari-mcp

That's the whole install. npx fetches the package and runs it; there's nothing to build. The first time an agent calls a tool, macOS will pop up a permission prompt — something like "Terminal wants to control Safari." Click OK. That's the standard Automation permission, and you can review it later under System Settings → Privacy & Security → Automation.

If you'd rather have it installed permanently:

npm install -g safari-mcp

Step 3 — Tell your agent about it

Your AI agent needs to know the server exists. For Claude Code, one command does it:

claude mcp add safari -- npx safari-mcp

For Cursor, create .cursor/mcp.json in your project:

{
  "mcpServers": {
    "safari": {
      "command": "npx",
      "args": ["safari-mcp"]
    }
  }
}

The process is the same for every client — Claude Desktop, Cline, Windsurf, Continue, VS Code. You're telling the agent: "there's an MCP server named safari; start it by running npx safari-mcp."
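
If a client's config file already lists other servers, the new entry should be merged in rather than pasted over them. Here is a small sketch of that merge, assuming the `mcpServers` shape shown above. The function name `addMcpServer` and the existing `filesystem` entry are invented for illustration, and some clients use a different file location or top-level key, so check your client's documentation before relying on it.

```javascript
// Merge one server entry into existing config text without dropping others.
function addMcpServer(configText, name, command, args) {
  const config = configText.trim() ? JSON.parse(configText) : {};
  config.mcpServers = { ...(config.mcpServers || {}), [name]: { command, args } };
  return JSON.stringify(config, null, 2);
}

// Hypothetical existing config with one unrelated server already present.
const existing = '{"mcpServers":{"filesystem":{"command":"npx","args":["some-other-server"]}}}';
const merged = addMcpServer(existing, "safari", "npx", ["safari-mcp"]);
console.log(merged);
```

Passing an empty string as `configText` produces a fresh config containing only the `safari` entry.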

Restart your agent (or reload its MCP servers) and it will connect. In Claude Code you can confirm with the /mcp command, which lists connected servers and their tools. You should see safari with around 80 tools available.

That's it. Your agent now has a browser.

Your First Automation: Reading a Page

Let's prove the wiring works with the simplest possible task: have the agent read a web page.

In your agent, just ask in plain language:

"Use the safari tools to open example.com and tell me what the page says."

Behind that request, the agent makes two tool calls. First it navigates:

{ "tool": "safari_navigate", "arguments": { "url": "https://example.com" } }

Then it reads the content:

{ "tool": "safari_read_page", "arguments": {} }

safari_read_page returns the page's title, URL, and text content with the HTML stripped out — exactly the form an LLM wants. The agent gets back something like this:

Example Domain
https://example.com/
This domain is for use in illustrative examples in documents. You may
use this domain in literature without prior coordination or asking for
permission.

And it relays that to you. You just watched your agent browse.

A quick note on how the agent should look at a page, because it changes everything downstream. safari_read_page is great for "what does this say." But when the agent needs to act — click a button, fill a field — text isn't enough. It needs to know what's actually there and how to target it. For that, the better first move is safari_snapshot:

{ "tool": "safari_snapshot", "arguments": {} }

This returns an accessibility-tree view of the page, where every interactive element has a stable ref ID:

[textbox ref=0_8] "Full Name" value=""
[combobox ref=0_10] "Subject"
[button ref=0_15] "Submit"

Those ref IDs are the agent's reliable handles. CSS selectors break when a page re-renders. A snapshot ref stays valid for the life of the page. Keep that in mind — it's the difference between an automation that works once and one that works every time.
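
To make the idea concrete, here is a minimal sketch of turning snapshot text into a label-to-ref lookup. It assumes the line format seen in the three sample lines above; the real output may carry more fields or a different layout, and `parseSnapshot` and `findRef` are illustrative names, not Safari MCP functions.

```javascript
// Parse snapshot lines of the form: [role ref=ID] "Label" ...
// The format is assumed from the sample output in this tutorial.
function parseSnapshot(text) {
  const pattern = /^\[(\w+) ref=([\w.-]+)\] "([^"]*)"/;
  return text
    .split("\n")
    .map((line) => pattern.exec(line.trim()))
    .filter(Boolean)
    .map(([, role, ref, label]) => ({ role, ref, label }));
}

// Look up the ref for an element by role and visible label.
function findRef(elements, role, label) {
  const match = elements.find((el) => el.role === role && el.label === label);
  return match ? match.ref : null;
}

const snapshot = [
  '[textbox ref=0_8] "Full Name" value=""',
  '[combobox ref=0_10] "Subject"',
  '[button ref=0_15] "Submit"',
].join("\n");

const elements = parseSnapshot(snapshot);
console.log(findRef(elements, "button", "Submit")); // prints 0_15
```

In practice the agent does this matching itself when it reads the snapshot; the sketch only shows why a role plus a label is enough to pick a target.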

The Payoff: Automating a Logged-in Workflow

Reading example.com is a wiring test. Here's the thing a headless browser genuinely cannot do.

Pick a site you're logged into in Safari right now — your analytics, your project board, your CI dashboard. We'll use GitHub, because every developer has an account and the notifications page is a real, mildly annoying chore. The task: have the agent open your GitHub notifications and summarize what actually needs your attention.

Ask the agent:

"Open my GitHub notifications, read them, and group them into 'needs a reply' versus 'just FYI'."

The agent navigates:

{ "tool": "safari_navigate", "arguments": { "url": "https://github.com/notifications" } }

Stop and notice what didn't happen. No login screen. No OAuth dance. No personal access token in an environment variable. Safari is already authenticated as you, so the agent lands directly on your real notifications. A headless Chromium would have hit a login wall here and stopped.

Notification lists load incrementally, so the agent should wait for content before reading. safari_wait_for polls the page until a selector or piece of text appears, or a timeout elapses:

{ "tool": "safari_wait_for", "arguments": { "text": "Inbox", "timeout": 10000 } }

Then it reads. safari_read_page scoped to the notifications region returns the list as clean text:

{ "tool": "safari_read_page", "arguments": { "selector": "main" } }

The agent reasons over that text and hands you the grouped summary. The whole loop — navigate, wait, read, summarize — is a handful of tool calls.
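
If you were scripting the same loop yourself instead of letting the agent plan it, the sequence might be sketched like this. The `callTool(name, args)` function is hypothetical and stands in for whatever transport your MCP client provides; the tool names and arguments are the ones used in the steps above. A stub transport is included so the sketch runs without Safari.

```javascript
// Run the navigate -> wait -> read sequence through a caller-supplied
// callTool(name, args) function (hypothetical; the real transport comes
// from your MCP client).
async function readLoggedInPage(callTool, url, waitText) {
  await callTool("safari_navigate", { url });
  await callTool("safari_wait_for", { text: waitText, timeout: 10000 });
  return callTool("safari_read_page", { selector: "main" });
}

// Stub transport that records calls instead of talking to a server.
const calls = [];
async function stubCallTool(name, args) {
  calls.push({ name, args });
  return name === "safari_read_page" ? "(page text would appear here)" : "ok";
}

readLoggedInPage(stubCallTool, "https://github.com/notifications", "Inbox")
  .then((text) => console.log(calls.map((c) => c.name).join(" -> "), "|", text));
```

Each step awaits the previous one, so the read never runs before the wait has succeeded.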

When you need data in a precise shape rather than prose — to feed another step, or to write to a file — the agent can reach for safari_evaluate, which runs custom JavaScript on the page and returns whatever you build:

{
  "tool": "safari_evaluate",
  "arguments": {
    "expression": "JSON.stringify([...document.querySelectorAll('li')].map(li => li.innerText.trim()))"
  }
}

The agent writes that expression itself, against the page structure it just inspected, so you don't hand-author selectors.
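
One-line expressions get hard to read as they grow. One possible pattern, sketched here under the assumption that `safari_evaluate` evaluates the string and returns its value as described above, is to write the extraction as an ordinary function and serialize it into an expression. The names `extractListItems` and `toExpression` are illustrative.

```javascript
// Write the extraction as a normal function so it is easy to read and test.
function extractListItems(doc) {
  return JSON.stringify(
    [...doc.querySelectorAll("li")].map((li) => li.innerText.trim())
  );
}

// Wrap the function source in an immediately invoked expression that
// receives the page's global `document`.
function toExpression(fn) {
  return `(${fn.toString()})(document)`;
}

const expression = toExpression(extractListItems);
console.log(JSON.stringify({ tool: "safari_evaluate", arguments: { expression } }, null, 2));
```

Because the function takes `doc` as a parameter, you can exercise it against a stand-in object before it ever runs in a real page.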

You might be thinking: GitHub has an API, why scrape the page? Fair. For GitHub specifically, the API is excellent. But the point generalizes. Most of the dashboards you stare at every day — your billing portal, your error tracker's specific filtered view, a client's analytics, the admin panel of some tool your company pays for — either have no usable API or would cost you an afternoon of OAuth setup to reach. With Safari MCP, "the page I'm already looking at" is the API. The agent reads what you can see, because it's using the browser you're seeing it in.

That's the capability headless automation can't match. Not speed, not features — access.

Handling the Tricky Parts

A first automation always looks easy. Three things tend to bite on the second one.

Tab Safety: The Agent Must Not Hijack Your Tabs

This is the scariest failure mode: you're typing in a tab, the agent navigates that tab, and your work is gone. Safari MCP guards against it by stamping each automation tab with an identity marker — it uses window.name, which survives page navigations — and resolving "the agent's tab" through that marker on every call. If it can't positively identify its own tab, it refuses to act and raises a re-anchor error rather than guessing.

The practical rule for you: let the agent open its own tab with safari_new_tab, and it will stay in its lane. Don't point it at "the current tab" and assume.
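
The "identify or refuse" behavior can be illustrated with a short sketch. This is not Safari MCP's actual code: the tab listing shape, the marker string, and the names `resolveAgentTab` and `ReanchorError` are all assumptions made for the example. The point is only that an ambiguous or missing match raises an error instead of falling back to some other tab.

```javascript
// Error raised when the agent's tab cannot be positively identified.
class ReanchorError extends Error {}

// tabs: [{ id, windowName }], a hypothetical listing of open tabs.
// Returns the single tab carrying the marker, or throws instead of guessing.
function resolveAgentTab(tabs, marker) {
  const matches = tabs.filter((tab) => tab.windowName === marker);
  if (matches.length !== 1) {
    throw new ReanchorError(`expected 1 tab with marker, found ${matches.length}`);
  }
  return matches[0];
}

const tabs = [
  { id: 1, windowName: "" },               // your own tab, unmarked
  { id: 2, windowName: "agent-tab-7f3a" }, // tab the agent opened
];
console.log(resolveAgentTab(tabs, "agent-tab-7f3a").id); // prints 2
```

Matching on a marker rather than on "the frontmost tab" is what keeps the tab you are typing in out of reach.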

Waiting for Dynamic Content

Modern pages render after load. If the agent reads too early, it reads an empty shell. Don't have it guess with fixed sleeps — use safari_wait_for, which polls for a selector or text until it appears or the timeout elapses:

{ "tool": "safari_wait_for", "arguments": { "selector": ".results-list", "timeout": 8000 } }

This is the single most common fix for "the automation works when I step through it slowly but fails when it runs."
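
The poll-until-ready idea behind a tool like this is simple enough to sketch. The following is a generic polling loop, not Safari MCP's implementation; `waitFor` and its option names are assumptions for the example.

```javascript
// Poll check() until it returns a truthy value or timeoutMs elapses.
async function waitFor(check, { timeoutMs = 8000, intervalMs = 100 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const result = await check();
    if (result) return result;
    if (Date.now() >= deadline) {
      throw new Error(`condition not met within ${timeoutMs} ms`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Example: a condition that becomes true on the third poll.
let polls = 0;
waitFor(() => ++polls >= 3, { timeoutMs: 1000, intervalMs: 10 })
  .then(() => console.log(`ready after ${polls} polls`));
```

Compared with a fixed sleep, the loop returns as soon as the condition holds and fails loudly when it never does.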

Framework Forms

Set a React or Vue input's .value directly and the framework never notices — its internal state stays empty, and your "filled" form submits blank. Safari MCP's safari_fill and safari_fill_form use the native value setters and dispatch the input and change events the framework listens for, so React, Vue, Angular, and Svelte state all stay in sync:

{
  "tool": "safari_fill_form",
  "arguments": {
    "fields": [
      { "selector": "#email", "value": "jane@example.com" },
      { "selector": "#message", "value": "Looks great." }
    ]
  }
}
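
For the curious, the native-setter technique mentioned above generally looks something like the sketch below. This is the widely used general pattern, not Safari MCP's source, and `setNativeValue` is an illustrative name. Frameworks such as React can install their own `value` property on the element instance, so the sketch calls the setter from the element's prototype and then fires the events the framework listens for.

```javascript
// Set an input's value through the prototype's setter, then fire the events
// frameworks listen for. Calling the prototype setter bypasses any
// instance-level `value` property a framework may have installed.
function setNativeValue(el, value) {
  const proto = Object.getPrototypeOf(el);
  const descriptor = Object.getOwnPropertyDescriptor(proto, "value");
  descriptor.set.call(el, value);
  el.dispatchEvent(new Event("input", { bubbles: true }));
  el.dispatchEvent(new Event("change", { bubbles: true }));
}

// In a browser console (not in Node, which has no DOM):
//   setNativeValue(document.querySelector("#email"), "jane@example.com");
```

With `safari_fill` and `safari_fill_form` you never write this yourself; the sketch only explains why a plain `.value = ...` assignment is not enough.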

For framework-heavy pages where CSS selectors are fragile, go back to the snapshot refs introduced in "Your First Automation": pass { "ref": "0_9" } instead of { "selector": "#email" }. Refs survive re-renders; selectors don't.

None of these are exotic. They're just the difference between a demo and an automation you'd actually leave running.

Limitations: When Not to Use This

A tool tutorial that only lists strengths isn't worth much. Here's where Safari MCP is the wrong choice.

It's macOS-only, and that's structural. Safari MCP is built on WebKit and AppleScript. There's no Windows or Linux port coming, because the foundation doesn't exist on those platforms. If your agent runs in Linux CI, use Playwright.

It drives one Safari, on one Mac. This is browser automation for your machine — a coding agent working alongside you. It is not a fleet. If you need 50 parallel browsers scraping in a data center, that's a headless-Chromium-in-containers job, and Safari MCP is the wrong shape for it.

Cross-browser test suites should stay on Playwright. If you're writing end-to-end tests that must pass on Chrome, Firefox, and Safari, use the tool built for that. Safari MCP drives exactly one engine: WebKit.

It shares a browser with you. Because it uses your real Safari, the agent and you are in the same browser. That's the entire point — but it means you should let the agent work in its own tabs and not fight it for the same window.

The honest summary: Safari MCP is built for one specific situation — an AI agent doing real browser work on the Mac you're sitting at, against sites you're already logged into. In that situation it's hard to beat. Outside it, reach for the headless tools. Knowing which situation you're in is the actual skill.

Wrapping Up

You've gone from an AI agent that could only see code to one that can see the web — the real web, behind your real logins.

To recap what you did: you learned what MCP is and why browser automation belongs behind that interface. You saw why a native Safari engine beats a headless Chromium for an agent working on your Mac, and you installed Safari MCP with one command and two settings. You ran a first read, and then you did the thing that actually matters: an automation inside a logged-in page, with no auth code at all. Finally, you saw the edges: tab safety, waiting for dynamic content, framework forms, and the cases where you should pick a different tool.

The bigger idea is worth holding onto. An AI agent is only as capable as the tools you connect to it. Giving it a browser — a real one — turns "write me code" into "go look at the staging site, find the bug, and tell me what's wrong." That's a different kind of collaborator.

Safari MCP is open source under the MIT license. Its roughly 80 tools go well beyond the handful you used here, covering screenshots, network inspection, storage, accessibility audits, and multi-tab workflows. The repository and full tool reference are at github.com/achiya-automation/safari-mcp. Point your agent at it and see what it does when it can finally look around.