
How to add EvalView to Claude Desktop

EvalView is a behavior regression gate for AI agents: it snapshots agent behavior, diffs tool calls, and catches silent regressions, and it exposes these checks over MCP to clients such as Claude Desktop and Claude Code. Paste the config below into ~/Library/Application Support/Claude/claude_desktop_config.json and restart Claude Desktop.

Last updated June 14, 2026 · 117 · stdio · no auth

Claude Desktop config for EvalView

Install the package:

pip install evalview

Then add this entry to claude_desktop_config.json:

{
  "mcpServers": {
    "evalview": {
      "command": "evalview",
      "args": [
        "mcp",
        "serve"
      ]
    }
  }
}
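
If claude_desktop_config.json already lists other MCP servers, pasting the whole snippet over the file would drop them. The sketch below shows one way to merge the EvalView entry into an existing config instead. The helper names `merge_server` and `update_config_file` are hypothetical and not part of EvalView or Claude Desktop; only the entry itself is taken from the config above.

```python
import json
from pathlib import Path

# Server entry copied from the config above.
EVALVIEW_ENTRY = {"command": "evalview", "args": ["mcp", "serve"]}


def merge_server(config: dict, name: str, entry: dict) -> dict:
    """Return a copy of config with entry stored under mcpServers[name].

    Other servers and unrelated top-level keys are preserved, and the
    input dict is not modified.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = entry
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path) -> None:
    """Merge the EvalView entry into the JSON file at path (hypothetical helper)."""
    text = path.read_text() if path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    merged = merge_server(config, "evalview", EVALVIEW_ENTRY)
    path.write_text(json.dumps(merged, indent=2) + "\n")


# In-memory demonstration with a made-up existing server called "other".
existing = {"mcpServers": {"other": {"command": "other-server"}}}
print(json.dumps(merge_server(existing, "evalview", EVALVIEW_ENTRY), indent=2))
```

To apply it to the real file, you would call `update_config_file` with the config path named in this guide, ideally after making a backup copy.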

Setup steps

  1. Open Claude Desktop → Settings → Developer → Edit Config (this opens ~/Library/Application Support/Claude/claude_desktop_config.json).
  2. Paste the EvalView config above under the top-level "mcpServers" key.
  3. Fill in any placeholder secrets (API keys, paths) in the snippet; the EvalView config above has none.
  4. Save the file, then fully quit and reopen Claude Desktop.
  5. Open a chat and confirm EvalView's tools appear under the 🔌 tools menu.
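
If the tools do not appear after the restart, two common causes are a JSON syntax error in the config file and a `command` that cannot be found on PATH. The following is a minimal sketch of a preflight check for both. `check_config` is a hypothetical helper, not an EvalView command, and the `which` parameter exists only so the lookup can be swapped out.

```python
import json
import shutil
from typing import Callable, List, Optional


def check_config(
    text: str,
    server: str = "evalview",
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return a list of problems found in the config text (empty means OK)."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"config is not valid JSON: {exc}"]

    servers = config.get("mcpServers") if isinstance(config, dict) else None
    entry = servers.get(server) if isinstance(servers, dict) else None
    if not isinstance(entry, dict):
        return [f'no "{server}" entry under "mcpServers"']

    command = entry.get("command")
    if not command:
        return [f'"{server}" entry has no "command"']
    if which(command) is None:
        return [f"command {command!r} was not found on PATH"]
    return []


# Example: the config from this guide, checked against the current PATH.
sample = '{"mcpServers": {"evalview": {"command": "evalview", "args": ["mcp", "serve"]}}}'
for problem in check_config(sample) or ["no problems found"]:
    print(problem)
```

A desktop app may see a different PATH than your terminal, so a clean result here does not guarantee that Claude Desktop can find the command. If it cannot, using the absolute path to the `evalview` executable as the `command` value is one possible workaround.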

Before you start

  • Python with `pip` (install the `evalview` package: `pip install evalview`)
  • An AI agent / test suite for EvalView to snapshot and check (run `evalview init` to auto-detect and scaffold one)
  • Optional: an LLM provider API key (e.g. OPENAI_API_KEY or ANTHROPIC_API_KEY) only if you enable the semantic-similarity or LLM-as-judge scoring layers

What EvalView can do in Claude Desktop

create_test

Create an EvalView test case for an agent behavior.

run_snapshot

Run tests and save the resulting traces as golden baselines.

run_check

Replay tests, diff the results against the baselines, and report regressions and other changes along with a ship verdict.

list_tests

List the EvalView tests defined in the workspace.

validate_skill

Validate a skill (e.g. for Claude Code / Codex / OpenClaw) against EvalView's expectations.

generate_skill_tests

Generate EvalView tests for a skill.

run_skill_test

Run an EvalView test against a skill.

generate_visual_report

Generate a visual (HTML) report of EvalView results.
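
For readers curious how a client reaches these tools: MCP is built on JSON-RPC 2.0, and a client discovers tools with a `tools/list` request and invokes one with `tools/call`. The sketch below only builds those two messages. The helper names are hypothetical, the empty `arguments` object is an assumption (each tool's real input schema comes back in the `tools/list` response), and a real session also begins with an `initialize` handshake that is omitted here.

```python
import json
from itertools import count
from typing import Optional

# JSON-RPC requests carry an id so responses can be matched to them.
_request_ids = count(1)


def jsonrpc_request(method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request object for the given method."""
    request = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        request["params"] = params
    return request


def call_tool(name: str, arguments: Optional[dict] = None) -> dict:
    """Build an MCP tools/call request for the named tool."""
    return jsonrpc_request(
        "tools/call",
        {"name": name, "arguments": arguments or {}},  # arguments: assumed empty
    )


# Messages a client might send over the server's stdin, one JSON object each.
print(json.dumps(jsonrpc_request("tools/list")))
print(json.dumps(call_tool("run_check")))
```

In practice Claude Desktop sends these messages for you; the sketch is only meant to show what "exposed over MCP" amounts to on the wire.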

Security

Your data stays local by default — nothing leaves your machine unless you opt in to cloud sync via `evalview login`. The deterministic tool + sequence diff runs without any API key; semantic similarity and LLM-as-judge layers are optional and require an OpenAI/Anthropic (or other provider) API key when enabled.

EvalView + Claude Desktop FAQ

Where is the Claude Desktop config file?

Claude Desktop reads MCP servers from ~/Library/Application Support/Claude/claude_desktop_config.json. Paste the EvalView config there under the "mcpServers" key and restart the client.

Is EvalView safe to use with Claude Desktop?

Your data stays local by default — nothing leaves your machine unless you opt in to cloud sync via `evalview login`. The deterministic tool + sequence diff runs without any API key; semantic similarity and LLM-as-judge layers are optional and require an OpenAI/Anthropic (or other provider) API key when enabled.

How do I connect EvalView to Claude Code?

Install the package with `pip install evalview`, then run `claude mcp add --transport stdio evalview -- evalview mcp serve`. Optionally copy `CLAUDE.md.example` to `CLAUDE.md` to make Claude Code proactively run checks.

What tools does the MCP server expose?

Eight tools: create_test, run_snapshot, run_check, list_tests, validate_skill, generate_skill_tests, run_skill_test, and generate_visual_report.

Does it require an API key?

Not for the core regression gate: the deterministic tool + sequence diff and code-based checks run offline with no API key, and your data stays local by default. An LLM provider API key is only needed if you opt into the semantic-similarity or LLM-as-judge scoring layers.
