
Touchpoint
Give AI agents eyes and hands on any desktop via native accessibility APIs — no vision model required.
Add to your client
Copy the config for your MCP client and paste it into its config file.
pip install touchpoint-py
Paste into ~/Library/Application Support/Claude/claude_desktop_config.json (macOS path):
{
  "mcpServers": {
    "touchpoint": {
      "command": "touchpoint-mcp",
      "args": []
    }
  }
}
Step-by-step guides: Add to Claude Desktop · Add to Cursor · Add to Windsurf
Before you start
- Python 3.10+
- pip install touchpoint-py
- macOS: grant Accessibility permission (System Settings → Privacy & Security → Accessibility)
- Linux: install xdotool (for input and minimize_window) and wmctrl (for window management); also install python3-gi and gir1.2-atspi-2.0 if they are missing
- Windows: none — uses built-in COM/UIA APIs
- Optional: launch Chrome/Electron apps with --remote-debugging-port for CDP support
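A quick pre-flight check can catch missing prerequisites before the server is first launched. The sketch below is a hypothetical helper (`preflight` is not part of Touchpoint) that only checks what the standard library can see: the Python version, whether the touchpoint-mcp command is on PATH, and, on Linux, whether xdotool, wmctrl, and the `gi` module from python3-gi are available. It cannot verify the macOS Accessibility permission, and the built-in diagnostics tool remains the authoritative health report.

```python
from __future__ import annotations

import importlib.util
import platform
import shutil
import sys


def preflight() -> list[str]:
    """Hypothetical helper: list missing prerequisites (empty list = none detected)."""
    problems = []

    if sys.version_info < (3, 10):
        problems.append(f"Python 3.10+ required, found {platform.python_version()}")

    # Installed by `pip install touchpoint-py`.
    if shutil.which("touchpoint-mcp") is None:
        problems.append("touchpoint-mcp not found on PATH")

    if platform.system() == "Linux":
        # xdotool: input and minimize_window; wmctrl: window management.
        for tool in ("xdotool", "wmctrl"):
            if shutil.which(tool) is None:
                problems.append(f"{tool} not found on PATH")
        # python3-gi provides the `gi` module used for the AT-SPI2 bindings.
        if importlib.util.find_spec("gi") is None:
            problems.append("python3-gi is missing (module `gi` not importable)")

    # macOS: the Accessibility permission cannot be checked from the stdlib.
    # Windows: nothing extra is required.
    return problems


if __name__ == "__main__":
    issues = preflight()
    print("\n".join(issues) if issues else "No missing prerequisites detected.")
```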
About Touchpoint
Touchpoint ships an MCP server (touchpoint-mcp) that lets MCP-compatible agents read and control any desktop application through native accessibility APIs — AT-SPI2 (Linux), UI Automation (Windows), and AX (macOS) — plus Chromium/Electron apps via the Chrome DevTools Protocol. It runs over stdio, requires no authentication or API key, and supports both a vision mode (screenshots + element IDs/coordinates) and a no-vision mode (compact structured text snapshots) so even local non-vision models can drive a desktop. Install with pip install touchpoint-py, which bundles the platform backend, CDP support, the MCP server, and screenshot capabilities.
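Because the server speaks MCP over stdio, a client talks to it by writing newline-delimited JSON-RPC 2.0 messages to the touchpoint-mcp process's stdin. The sketch below only builds those messages; it follows the MCP specification's stdio framing and opening handshake (initialize, then notifications/initialized, then tools/list) and is not Touchpoint-specific. The protocol version string and the client name are assumptions: a real client should send the version it actually supports and read each tool's argument schema from the tools/list response rather than guessing.

```python
import json
from typing import Any, Optional

# Assumption: one published MCP spec revision; use whatever your client supports.
PROTOCOL_VERSION = "2024-11-05"


def frame(method: str, params: Optional[dict] = None, msg_id: Optional[int] = None) -> bytes:
    """Encode one JSON-RPC 2.0 message as a single newline-terminated line.

    MCP's stdio transport is newline-delimited, so a message must not contain
    raw newlines; json.dumps without indent escapes them inside strings.
    Leaving msg_id as None produces a notification (no response expected).
    """
    msg: dict[str, Any] = {"jsonrpc": "2.0", "method": method}
    if msg_id is not None:
        msg["id"] = msg_id
    if params is not None:
        msg["params"] = params
    return (json.dumps(msg) + "\n").encode("utf-8")


def opening_messages() -> list[bytes]:
    """Lines a client writes to touchpoint-mcp's stdin, in order.

    A real client reads the server's reply to `initialize` before sending
    `notifications/initialized`; this sketch only builds the payloads.
    """
    return [
        frame(
            "initialize",
            {
                "protocolVersion": PROTOCOL_VERSION,
                "capabilities": {},
                "clientInfo": {"name": "example-client", "version": "0.1"},  # hypothetical
            },
            msg_id=1,
        ),
        frame("notifications/initialized"),
        frame("tools/list", msg_id=2),
    ]
```

After the initialize exchange, every later request (for example a tools/call naming one of the tools listed below) uses the same one-line framing, and the server's replies arrive on its stdout in the same format.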
Tools & capabilities (28)
- screenshot: Capture the full desktop or crop to an app/window/element/monitor (vision mode).
- snapshot: Get a compact structured text tree of the active window for orientation.
- diff_snapshot: Show what changed between snapshots (no-vision mode orientation).
- apps: List application names present in the accessibility tree.
- windows: List all windows with id, title, app, position, size, and active state.
- find: Search elements by name using 4-stage matching: exact → contains → word → fuzzy.
- get_element: Fetch a fresh snapshot of a single element by ID (vision mode).
- read_text: Read the text content of an element.
- click: Click an element (or coordinates in vision mode) via accessibility action with coordinate fallback.
- set_value: Set text content of an element (replace=True to clear first).
- set_numeric_value: Set a slider or spinbox value.
- select_text: Select a substring within an element's text content.
- focus: Move keyboard focus to an element.
- action: Execute a raw accessibility action on an element by name.
- type_text: Type text into the currently focused element.
- press_key: Press and release a key (e.g. enter, tab, escape).
- mouse_move: Move the cursor to coordinates (vision mode).
- scroll: Scroll at the current cursor position.
- activate_window: Bring a window to the foreground (restores from minimized).
- minimize_window: Minimize a window.
- fullscreen_window: Enter or exit fullscreen for a window.
- close_window: Politely close a window.
- move_window: Move a window to a new screen position (vision mode).
- resize_window: Resize a window to a given width and height (vision mode).
- wait_for: Poll until matching elements appear (or disappear).
- wait_for_app: Poll until an app appears or disappears.
- wait_for_window: Poll until a window appears or disappears.
- diagnostics: Report backend, input, CDP, timeout, and dependency health.
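The four-stage search used by find can be illustrated with plain strings. This is a minimal sketch under stated assumptions, not Touchpoint's implementation: it assumes case-insensitive comparison, that each stage runs only when the stricter stages found nothing, that the "word" stage means every query word appears as a whole word in the name (in any order), and that "fuzzy" means a difflib similarity ratio at or above a cutoff (the 0.6 default is an arbitrary choice). The function name find_names is hypothetical.

```python
import difflib
import re


def find_names(query: str, names: list[str], cutoff: float = 0.6) -> list[str]:
    """Hypothetical 4-stage matcher: exact, contains, word, fuzzy.

    The first stage that yields any match wins; later stages are skipped.
    """
    q = query.casefold()
    folded = [(name, name.casefold()) for name in names]

    # 1. Exact match, ignoring case.
    hits = [n for n, f in folded if f == q]
    if hits:
        return hits

    # 2. The query appears as a substring of the name.
    hits = [n for n, f in folded if q in f]
    if hits:
        return hits

    # 3. Every query word appears as a whole word in the name, any order.
    q_words = set(re.findall(r"\w+", q))
    hits = [n for n, f in folded if q_words and q_words <= set(re.findall(r"\w+", f))]
    if hits:
        return hits

    # 4. Fuzzy: keep names whose similarity ratio reaches the cutoff, best first.
    scored = [(difflib.SequenceMatcher(None, q, f).ratio(), n) for n, f in folded]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [n for score, n in scored if score >= cutoff]
```

With the names ["Save", "Save As...", "File Save", "Close"], the query "save file" fails the exact and contains stages but matches "File Save" at the word stage, while the typo "clse" only matches "Close" at the fuzzy stage.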
When to use it
- Let an LLM agent drive native desktop apps (Notes, Finder, System Settings, Excel) without a vision model.
- Automate cross-app workflows — e.g. research data in Chrome, then build a formatted Excel table.
- Control Electron apps such as Slack, Discord, and VS Code by merging their native accessibility tree with CDP web content.
- Run desktop automation with local/non-vision models using compact structured snapshots.
- Build reliable, fast UI automation that reads the real accessibility tree instead of scraping pixels.
Security notes
The server gives an AI agent full control over the local desktop: it reads the accessibility tree and simulates clicks, typing, and window management across all running apps. Grant access only to trusted clients. On macOS you must explicitly grant Accessibility permission (System Settings → Privacy & Security → Accessibility). Browser/Electron control requires launching the app with a remote-debugging port. Wayland (without XWayland) cannot simulate input; the accessibility tree and native actions still work.
Touchpoint FAQ
Does Touchpoint require a vision model?
No. It reads the native accessibility tree for structured element data. It offers a no-vision mode using compact text snapshots, plus an optional vision mode with screenshots for frontier models.
Which platforms are supported?
Linux (AT-SPI2 + xdotool), Windows (UIA + SendInput), and macOS (AX + CGEvent), all tested. Chromium and Electron apps are supported cross-platform via CDP.
How do I add it to Claude Desktop?
Add an entry under mcpServers with command "touchpoint-mcp" in claude_desktop_config.json. If you installed into a virtualenv, use the full path to the touchpoint-mcp binary.
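When touchpoint-mcp lives in a virtualenv, resolving its absolute path and writing that into the config avoids depending on whatever PATH Claude Desktop inherits. The sketch below uses only the standard library; touchpoint_entry and merge_into are hypothetical helpers, not part of Touchpoint. Run it with the virtualenv activated so shutil.which finds the right binary; it keeps any other servers already listed under mcpServers.

```python
import json
import shutil
from pathlib import Path


def touchpoint_entry() -> dict:
    """Build the mcpServers entry, preferring an absolute path to touchpoint-mcp.

    shutil.which searches the current PATH (e.g. an activated virtualenv);
    if the binary is not found, fall back to the bare command name.
    """
    command = shutil.which("touchpoint-mcp") or "touchpoint-mcp"
    return {"command": command, "args": []}


def merge_into(config_path: Path) -> dict:
    """Add or replace the 'touchpoint' server while keeping other servers intact."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    config.setdefault("mcpServers", {})["touchpoint"] = touchpoint_entry()
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

For example, merge_into(Path.home() / "Library/Application Support/Claude/claude_desktop_config.json") updates the macOS config file; restart Claude Desktop afterwards so it rereads the file.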
How do I control browsers and Electron apps?
Launch the app with a remote debugging port, e.g. --remote-debugging-port=9222. CDP auto-discovery is enabled by default, so Touchpoint automatically merges the native UI chrome with the full web content.
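To confirm that a browser or Electron app is actually exposing CDP, you can query the DevTools HTTP endpoint that Chromium serves on the debugging port. This is a manual check built on Chromium's /json/version endpoint, not a description of how Touchpoint's auto-discovery works; the helper name cdp_version and the 1-second timeout are assumptions.

```python
import http.client
import json
import urllib.request
from typing import Optional


def cdp_version(port: int = 9222, host: str = "127.0.0.1", timeout: float = 1.0) -> Optional[dict]:
    """Return the DevTools /json/version payload, or None if nothing usable answers."""
    url = f"http://{host}:{port}/json/version"
    # The endpoint is local, so bypass any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return json.load(resp)
    except (OSError, http.client.HTTPException, ValueError):
        # OSError covers URLError, refused connections, and timeouts;
        # ValueError covers a reply that is not valid JSON.
        return None
```

After launching Chrome with --remote-debugging-port=9222, cdp_version(9222) should return a dict whose "Browser" field names the browser build; None means nothing answered on that port.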
Is it stable?
It is in Alpha — fully functional and tested on all three platforms, but the API may change before 1.0 based on user feedback.