Touchpoint

Give AI agents eyes and hands on any desktop via native accessibility APIs — no vision model required.

Unverified

stdio (local)

No auth

Python

View repo 39

Add to your client

Copy the config for your MCP client and paste it into its config file.

Install / run

pip install touchpoint-py

Paste into ~/Library/Application Support/Claude/claude_desktop_config.json (the Claude Desktop config file on macOS)

{
  "mcpServers": {
    "touchpoint": {
      "command": "touchpoint-mcp",
      "args": []
    }
  }
}
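If the config file already lists other servers, pasting the block wholesale would overwrite them. A minimal sketch of merging the entry instead; `add_touchpoint` and `update_config_file` are hypothetical helper names for illustration, not part of Touchpoint.

```python
import json
from pathlib import Path


def add_touchpoint(config: dict, command: str = "touchpoint-mcp") -> dict:
    """Return a copy of config with a touchpoint entry under mcpServers."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    # Same entry as the snippet above; existing servers are left untouched.
    servers["touchpoint"] = {"command": command, "args": []}
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path) -> None:
    """Read the client config at path (if present), merge, and write it back."""
    existing = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(add_touchpoint(existing), indent=2) + "\n")


if __name__ == "__main__":
    # Demo on an in-memory config; "other" is a placeholder server entry.
    demo = {"mcpServers": {"other": {"command": "other-mcp", "args": []}}}
    print(json.dumps(add_touchpoint(demo), indent=2))
```

Back up the config file before calling `update_config_file` on it, and restart the client afterwards so it picks up the change.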

Before you start

Python 3.10+
pip install touchpoint-py
macOS: grant Accessibility permission (System Settings → Privacy & Security → Accessibility)
Linux: install xdotool (used for input and minimize_window) and wmctrl (window management); also install python3-gi and gir1.2-atspi-2.0 if they are missing
Windows: no extra dependencies; uses the built-in COM/UIA APIs
Optional: launch Chrome/Electron apps with --remote-debugging-port for CDP support
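
The checklist above can be partly verified before installing. A rough sketch, assuming only the Python version and the two Linux command-line tools are checked; `missing_prerequisites` is a hypothetical helper, and the macOS Accessibility permission and the GObject/AT-SPI packages are not covered here.

```python
import shutil
import sys


def missing_prerequisites(platform: str = sys.platform,
                          version: tuple = sys.version_info[:2],
                          which=shutil.which) -> list:
    """Return human-readable descriptions of unmet prerequisites."""
    problems = []
    if version < (3, 10):
        problems.append("Python 3.10+ required")
    if platform.startswith("linux"):
        # xdotool and wmctrl are the Linux tools named in the list above.
        for tool in ("xdotool", "wmctrl"):
            if which(tool) is None:
                problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```

Once the server is running, its own `diagnostics` tool is the more complete source for dependency health.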

About Touchpoint

Touchpoint ships an MCP server (touchpoint-mcp) that lets MCP-compatible agents read and control any desktop application through native accessibility APIs — AT-SPI2 (Linux), UI Automation (Windows), and AX (macOS) — plus Chromium/Electron apps via the Chrome DevTools Protocol. It runs over stdio, requires no authentication or API key, and supports both a vision mode (screenshots + element IDs/coordinates) and a no-vision mode (compact structured text snapshots) so even local non-vision models can drive a desktop. Install with pip install touchpoint-py, which bundles the platform backend, CDP support, the MCP server, and screenshot capabilities.
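
To make "runs over stdio" concrete: an MCP client talks to the server with newline-delimited JSON-RPC 2.0 messages, and tool invocations use the `tools/call` method. The sketch below only builds such a message. `tools_call` is a hypothetical helper, and calling `apps` with empty arguments is an assumption, since the listing does not document each tool's parameters.

```python
import json
from typing import Optional


def tools_call(request_id: int, tool: str, arguments: Optional[dict] = None) -> str:
    """Serialize an MCP tools/call request as a single JSON line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments or {}},
    }
    # json.dumps escapes newlines inside strings, so the result is one line,
    # which is what the newline-delimited stdio transport expects.
    return json.dumps(message)


if __name__ == "__main__":
    # Assumption: the apps tool can be called without arguments.
    print(tools_call(1, "apps"))
```

A real session must complete the MCP `initialize` handshake before sending `tools/call`; in practice an MCP client library or the host application handles that.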

Tools & capabilities (28)

screenshot

Capture the full desktop or crop to an app/window/element/monitor (vision mode).

snapshot

Get a compact structured text tree of the active window for orientation.

diff_snapshot

Show what changed between snapshots (no-vision mode orientation).

apps

List application names present in the accessibility tree.

windows

List all windows with id, title, app, position, size, and active state.

find

Search elements by name using 4-stage matching: exact → contains → word → fuzzy.

get_element

Fetch a fresh snapshot of a single element by ID (vision mode).

read_text

Read the text content of an element.

click

Click an element (or coordinates in vision mode) via accessibility action with coordinate fallback.

set_value

Set text content of an element (replace=True to clear first).

set_numeric_value

Set a slider or spinbox value.

select_text

Select a substring within an element's text content.

focus

Move keyboard focus to an element.

action

Execute a raw accessibility action on an element by name.

type_text

Type text into the currently focused element.

press_key

Press and release a key (e.g. enter, tab, escape).

mouse_move

Move the cursor to coordinates (vision mode).

scroll

Scroll at the current cursor position.

activate_window

Bring a window to the foreground (restores from minimized).

minimize_window

Minimize a window.

fullscreen_window

Enter or exit fullscreen for a window.

close_window

Politely close a window.

move_window

Move a window to a new screen position (vision mode).

resize_window

Resize a window to a given width and height (vision mode).

wait_for

Poll until matching elements appear (or disappear).

wait_for_app

Poll until an app appears or disappears.

wait_for_window

Poll until a window appears or disappears.

diagnostics

Report backend, input, CDP, timeout, and dependency health.
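
The `find` entry above describes 4-stage matching (exact → contains → word → fuzzy). The listing does not say how each stage is defined, so the following is one plausible reading rather than Touchpoint's implementation: stages are tried in order and the first stage with any hit wins. The `staged_find` name, the case-insensitive comparison, the word-subset rule, and the `difflib` cutoff are all assumptions.

```python
import difflib
import re


def staged_find(query: str, names: list, cutoff: float = 0.6) -> tuple:
    """Return (stage, matches) from the first stage that yields any match."""
    q = query.casefold()
    folded = [(n, n.casefold()) for n in names]

    # Stage 1: the whole name equals the query.
    exact = [n for n, f in folded if f == q]
    if exact:
        return "exact", exact

    # Stage 2: the query appears as a substring of the name.
    contains = [n for n, f in folded if q in f]
    if contains:
        return "contains", contains

    # Stage 3: every query word appears in the name, in any order.
    q_words = set(re.findall(r"\w+", q))
    word = [n for n, f in folded
            if q_words and q_words <= set(re.findall(r"\w+", f))]
    if word:
        return "word", word

    # Stage 4: similarity ratio at or above the cutoff (tolerates typos).
    fuzzy = [n for n, f in folded
             if difflib.SequenceMatcher(None, q, f).ratio() >= cutoff]
    return "fuzzy", fuzzy


if __name__ == "__main__":
    names = ["Save", "Save As...", "Send message", "Settings"]
    for query in ("save", "mess", "message send", "setings"):
        print(query, staged_find(query, names))
```

Ordering the stages from strict to loose keeps a precise query such as "Save" from also returning "Save As...", while a misspelled query still finds something.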

When to use it

Let an LLM agent drive native desktop apps (Notes, Finder, System Settings, Excel) without a vision model.
Automate cross-app workflows — e.g. research data in Chrome, then build a formatted Excel table.
Control Electron apps like Slack, Discord, and VS Code via merged native + CDP web content.
Run desktop automation with local/non-vision models using compact structured snapshots.
Build reliable, fast UI automation that reads the real accessibility tree instead of scraping pixels.

Security notes

The server gives an AI agent full control over the local desktop: it can read the accessibility tree and simulate clicks, typing, and window management across all running apps. Connect it only to clients you trust. On macOS you must explicitly grant Accessibility permission (System Settings → Privacy & Security → Accessibility). Browser/Electron control requires launching the app with a remote-debugging port. On Wayland sessions without XWayland, input simulation is unavailable; the accessibility tree and native actions still work.
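
The Wayland limitation can be anticipated from the session environment. This is a heuristic sketch, not how Touchpoint detects it: it assumes a Wayland session is indicated by `XDG_SESSION_TYPE` or `WAYLAND_DISPLAY`, and that a set `DISPLAY` means an X server (such as XWayland) is reachable. `input_simulation_expected` is a hypothetical name.

```python
import os


def input_simulation_expected(env=None) -> bool:
    """Heuristic: False on a Wayland session with no X display available."""
    env = os.environ if env is None else env
    on_wayland = (env.get("XDG_SESSION_TYPE") == "wayland"
                  or bool(env.get("WAYLAND_DISPLAY")))
    has_x_display = bool(env.get("DISPLAY"))
    # Plain X11 and Wayland-with-XWayland both expose DISPLAY.
    return has_x_display or not on_wayland


if __name__ == "__main__":
    if not input_simulation_expected():
        print("Wayland without XWayland: expect input simulation to be unavailable")
```

The server's `diagnostics` tool reports input health directly and should be preferred when the server is already running.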

Touchpoint FAQ

Does Touchpoint require a vision model?

No. It reads the native accessibility tree for structured element data. It offers a no-vision mode using compact text snapshots, plus an optional vision mode with screenshots for frontier models.

Which platforms are supported?

Linux (AT-SPI2 + xdotool), Windows (UIA + SendInput), and macOS (AX + CGEvent), all tested. Chromium and Electron apps are supported cross-platform via CDP.

How do I add it to Claude Desktop?

Add an entry under mcpServers with command "touchpoint-mcp" in claude_desktop_config.json. If you installed into a virtualenv, use the full path to the touchpoint-mcp binary.
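
For the virtualenv case, the full path can be looked up rather than typed by hand. A small sketch, assuming the standard venv layout (`bin/` on Linux and macOS, `Scripts\` on Windows); `resolve_touchpoint_command` is a hypothetical helper.

```python
import shutil
import sys
from pathlib import Path
from typing import Optional


def resolve_touchpoint_command(venv: Optional[Path] = None) -> Optional[str]:
    """Return an absolute path to touchpoint-mcp, or None if not found."""
    if venv is not None:
        # Console scripts live in Scripts\ on Windows and bin/ elsewhere.
        bin_dir = venv / ("Scripts" if sys.platform == "win32" else "bin")
        found = shutil.which("touchpoint-mcp", path=str(bin_dir))
    else:
        # Fall back to whatever is on the current PATH.
        found = shutil.which("touchpoint-mcp")
    return str(Path(found).absolute()) if found else None


if __name__ == "__main__":
    # sys.prefix points at the active virtualenv when one is activated.
    print(resolve_touchpoint_command(Path(sys.prefix)))
```

Use the printed path as the `command` value in claude_desktop_config.json; GUI clients often start with a different PATH than your shell, which is why the bare command name can fail.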

How do I control browsers and Electron apps?

Launch the app with --remote-debugging-port (e.g. 9222). CDP auto-discovery is enabled by default, so Touchpoint automatically merges the app's native window chrome with its full web content.
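
A sketch of the launch step and a quick check that the debugging endpoint is up. It relies on the standard Chromium behaviour of serving `/json/version` over HTTP on the debugging port; `launch_args` and `cdp_version` are hypothetical helpers, and the executable name in the demo is a placeholder that varies by platform and app.

```python
import json
import urllib.error
import urllib.request
from typing import Optional


def launch_args(executable: str, port: int = 9222) -> list:
    """Build an argv list that starts a Chromium-based app with CDP enabled."""
    return [executable, f"--remote-debugging-port={port}"]


def cdp_version(port: int = 9222, timeout: float = 1.0) -> Optional[dict]:
    """Return the /json/version payload if a CDP endpoint answers, else None."""
    url = f"http://127.0.0.1:{port}/json/version"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return json.load(response)
    except (urllib.error.URLError, OSError, ValueError):
        # Nothing listening, connection refused, or a non-JSON reply.
        return None


if __name__ == "__main__":
    # Placeholder executable name; pass the result to subprocess.Popen to launch.
    print(launch_args("google-chrome"))
    print("CDP reachable:", cdp_version() is not None)
```

If `cdp_version()` returns None after launch, the app was probably already running without the flag; quit it fully and start it again with the argument.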

Is it stable?

It is in alpha: fully functional and tested on all three platforms, but the API may change before 1.0 based on user feedback.

#desktop-automation #accessibility #computer-use #ui-automation #cross-platform #cdp #no-vision #python

Alternatives to Touchpoint

Browser MCP

Browser Automation

6.7k

Automate your existing browser with AI using your real profile, logged-in sessions, and stealth fingerprint.

Unverified

stdio (local)

No auth

Stale

TypeScript

13 tools

Updated 1 year ago

Bright Data MCP

Browser Automation

5.0k

All-in-one web access MCP — Web Unlocker, SERP, Scraper API, and a cloud Scraping Browser.

Verified

stdio (local)

API key

JavaScript

12 tools

Updated 26 days ago

Peekaboo

Browser Automation

4.8k

macOS screen capture, AI visual analysis, and full GUI automation for AI agents.

Unverified

stdio (local)

API key

Swift

28 tools

Updated 3 days ago
