
Cua Driver
Background computer-use MCP server that drives native macOS, Windows, and Linux desktop apps without stealing focus.
Add to your client
Copy the config for your MCP client and paste it into its config file.
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/cua-driver/scripts/install.sh)"
Paste into ~/Library/Application Support/Claude/claude_desktop_config.json
{
  "mcpServers": {
    "cua-driver": {
      "command": "cua-driver",
      "args": [
        "mcp"
      ]
    }
  }
}
Step-by-step guides: Add to Claude Desktop · Add to Cursor · Add to Windsurf
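If the client config file already lists other MCP servers, pasting the snippet over it would drop them. The sketch below merges the cua-driver entry into an existing config instead. The helper names (`add_cua_driver`, `update_config_file`) and the `other-server` entry are illustrative and not part of Cua Driver; only the `cua-driver` entry itself comes from the config above.

```python
import json
from pathlib import Path


def add_cua_driver(config: dict) -> dict:
    """Return a copy of `config` with the cua-driver server registered."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    # Entry copied from the config snippet above.
    servers["cua-driver"] = {"command": "cua-driver", "args": ["mcp"]}
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path) -> None:
    """Read, merge, and rewrite a client config file (created if missing)."""
    existing = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(add_cua_driver(existing), indent=2) + "\n")


# Example: an existing config with one unrelated (made-up) server is preserved.
example = {"mcpServers": {"other-server": {"command": "other", "args": []}}}
print(json.dumps(add_cua_driver(example), indent=2))
```

To apply it to Claude Desktop on macOS, one could call `update_config_file(Path.home() / "Library/Application Support/Claude/claude_desktop_config.json")` and then restart the client.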
Before you start
- macOS 14 (Sonoma) or newer for the macOS backend; Windows and pre-release Linux backends are also supported
- cua-driver CLI + CuaDriver.app installed via the install.sh / install.ps1 one-liner
- macOS: Accessibility and Screen Recording permissions granted to CuaDriver.app (verify with `cua-driver check_permissions`)
- An MCP client such as Claude Code, Cursor, Codex, or a custom client
About Cua Driver
Cua Driver exposes native-desktop control as MCP tools so coding agents can operate real GUI apps in the background. Register it with Claude Code via `claude mcp add --transport stdio cua-driver -- cua-driver mcp`. The core workflow has four steps: launch an app or list its windows, call get_window_state to snapshot the accessibility tree (and, optionally, a screenshot), act by element_index or pixel coordinates (click, type, scroll, hotkey), then re-snapshot to verify the result. The backgrounded-first design means synthetic input lands without raising the window, moving the user's cursor, or switching Spaces.
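The snapshot, act, re-snapshot loop can be sketched as follows. This is a minimal illustration, not the driver's API: `call_tool` stands in for whatever MCP client session you use, and the argument names (`pid`, `window_id`, `element_index`) are assumptions based on the tool descriptions on this page. The fake caller exists only so the sketch runs without a desktop.

```python
from typing import Any, Callable

# Hypothetical transport: sends one MCP tool call and returns its result.
ToolCaller = Callable[[str, dict[str, Any]], Any]


def click_and_verify(call_tool: ToolCaller, pid: int, window_id: int,
                     element_index: int) -> tuple[Any, Any]:
    """Snapshot the window, click one element by index, then snapshot again."""
    target = {"pid": pid, "window_id": window_id}  # argument names are assumed
    before = call_tool("get_window_state", target)
    call_tool("click", {**target, "element_index": element_index})
    after = call_tool("get_window_state", target)
    return before, after


# Stand-in for a real MCP client session; records the order of tool calls.
calls: list[str] = []


def fake_call_tool(name: str, arguments: dict[str, Any]) -> Any:
    calls.append(name)
    return f"snapshot {len(calls)}" if name == "get_window_state" else None


before, after = click_and_verify(fake_call_tool, pid=1234, window_id=1, element_index=7)
print(calls)
```

Comparing `before` and `after` is how an agent would confirm that the click changed the window; element indices from the first snapshot should be treated as stale once the second one is taken.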
Tools & capabilities (33)
- `get_window_state`: Snapshot a window's accessibility tree (Screen-of-Marks markdown with each actionable element tagged [N]) plus an optional screenshot; required before acting by element_index.
- `get_desktop_state`: Capture the overall desktop / multi-window state.
- `list_apps`: App-level discovery: list installed / running applications.
- `list_windows`: List the windows for a given pid.
- `launch_app`: Launch (or focus) an app by bundle id; idempotent and returns the pid plus a windows array.
- `kill_app`: Terminate a running application by pid.
- `bring_to_front`: Bring an app/window to the foreground when foreground interaction is needed.
- `click`: Click a target by element_index (AX/UIA) or by pixel x/y within a window, without raising it.
- `double_click`: Double-click a target by element_index or pixel coordinates.
- `right_click`: Right-click (context-menu) a target by element_index or pixel coordinates.
- `drag`: Perform a drag gesture between coordinates / elements.
- `type_text`: Type a string into the focused element / window.
- `press_key`: Press a single key (e.g. Enter/Go to commit a field).
- `hotkey`: Send a keyboard shortcut / modifier chord.
- `set_value`: Set an element's value directly (workaround for apps where keyboard commit is a no-op, e.g. minimized Chrome).
- `scroll`: Scroll within a window or element.
- `get_accessibility_tree`: Return the raw accessibility tree for a window.
- `get_screen_size`: Return the screen dimensions.
- `get_cursor_position`: Return the current cursor position.
- `move_cursor`: Move the cursor to a position.
- `zoom`: Zoom into a region of a window capture for finer inspection.
- `page`: Cross-platform paging helper for navigating large captures/content.
- `screenshot`: Capture a screenshot; in Claude Code compat mode it requires pid and window_id and captures only that window.
- `check_permissions`: Report whether Accessibility and Screen Recording TCC grants are enabled for CuaDriver.
- `get_config`: Read driver configuration (e.g. capture_mode som/vision).
- `set_config`: Set driver configuration such as capture_mode or capture_scope.
- `set_agent_cursor_enabled`: Toggle the on-screen agent cursor overlay.
- `set_agent_cursor_motion`: Configure agent cursor motion/animation.
- `set_agent_cursor_style`: Configure the agent cursor style.
- `get_agent_cursor_state`: Read the current agent cursor state.
- `health_report`: Return a driver health/diagnostics report.
- `check_for_update`: Read-only probe for whether a newer Cua Driver release is available.
- `replay_trajectory`: Replay a previously recorded interaction trajectory (recording/replay tools) for demos and regressions.
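Because get_window_state tags each actionable element with [N], a client can build a lookup from index to label before choosing an element_index. The sketch below assumes one element per line with the tag followed by a description; the exact snapshot layout is not specified here, and the sample text is made up for illustration.

```python
import re

# Matches a "[N]" tag and captures the rest of the line as the label.
ELEMENT_TAG = re.compile(r"\[(\d+)\]\s*(.*)")


def index_elements(snapshot_markdown: str) -> dict[int, str]:
    """Map each [N] tag found in a snapshot to the text that follows it."""
    elements: dict[int, str] = {}
    for line in snapshot_markdown.splitlines():
        match = ELEMENT_TAG.search(line)
        if match:
            elements[int(match.group(1))] = match.group(2).strip()
    return elements


# Made-up snapshot fragment; real output may be laid out differently.
sample = """\
- [1] AXButton "Back"
- [2] AXTextField "Search"
  - [3] AXButton "Go"
- AXStaticText "Results"
"""

elements = index_elements(sample)
print(elements)
```

Lines without a tag, such as the static text in the sample, are skipped because they are not actionable by element_index.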
When to use it
- Give a coding agent (Claude Code, Cursor, Codex) the ability to operate native desktop apps while you keep using your computer.
- Automate GUI tasks in apps without an API — Finder, Numbers, browsers, Electron/Tauri apps — by clicking and typing on the accessibility tree.
- Run background desktop automation that doesn't steal focus or move the cursor across Spaces.
- Record and replay interaction trajectories for demos or regression testing.
Security notes
Runs locally and drives the user's real desktop. On macOS it ships as an .app bundle (com.trycua.driver) and requires Accessibility and Screen Recording TCC grants in System Settings -> Privacy & Security; verify with `cua-driver check_permissions`. Requires macOS 14 (Sonoma) or newer. It can control any granted application, so only run it on machines where you trust the connected agent. No API key or remote auth is used.
Cua Driver FAQ
How do I add it to Claude Code?
Install the driver with the install.sh one-liner, then run `claude mcp add --transport stdio cua-driver -- cua-driver mcp`. For Claude Code's vision/computer-use grounding, register the compat server instead: `claude mcp add --transport stdio cua-computer-use -- cua-driver mcp --claude-code-computer-use-compat`.
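For setup scripts, the two registration commands above differ only in the server name and the compat flag. The sketch below builds either one as an argument list suitable for `subprocess.run`; the function name `claude_mcp_add_command` is illustrative, while every argument is taken from the commands quoted in this answer.

```python
import shlex


def claude_mcp_add_command(compat: bool = False) -> list[str]:
    """Build the `claude mcp add` argv for the standard or compat server."""
    name = "cua-computer-use" if compat else "cua-driver"
    server = ["cua-driver", "mcp"]
    if compat:
        # Flag quoted in the FAQ answer for Claude Code's computer-use grounding.
        server.append("--claude-code-computer-use-compat")
    return ["claude", "mcp", "add", "--transport", "stdio", name, "--", *server]


# Print both variants as shell-ready strings.
print(shlex.join(claude_mcp_add_command()))
print(shlex.join(claude_mcp_add_command(compat=True)))
```

Passing the list to `subprocess.run(..., check=True)` avoids shell quoting issues; this assumes the `claude` and `cua-driver` executables are already on PATH.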
Which platforms are supported?
macOS and Windows are supported with the same CLI/MCP server; Linux is available as a pre-release backend while platform testing continues. The macOS backend requires macOS 14 (Sonoma) or newer.
What permissions does it need?
On macOS, CuaDriver.app needs Accessibility and Screen Recording grants in System Settings -> Privacy & Security. Verify both are true with `cua-driver check_permissions`. No API key is required.
Does it steal my mouse or focus?
No. Cua Driver is backgrounded-first: synthetic clicks and keystrokes land on the target window without raising it, warping the cursor, or pulling you across Spaces.
Alternatives to Cua Driver
AI-powered task-management system for AI-driven development that drops into Cursor, Windsurf, Claude Code, and more.
Self-hosted MCP server for Jira and Confluence Cloud and Server/Data Center.
Create, read, and modify Excel workbooks with your AI agent — no Microsoft Excel required.