
How to add EvalView to Windsurf
Behavior regression gate for AI agents: snapshot behavior, diff tool calls, and catch silent regressions, exposed over MCP to clients such as Claude Code and Windsurf. Paste the config into ~/.codeium/windsurf/mcp_config.json and restart Windsurf.
Last updated June 14, 2026 · 117★ · stdio · no auth
Windsurf config for EvalView
pip install evalview

{
  "mcpServers": {
    "evalview": {
      "command": "evalview",
      "args": [
        "mcp",
        "serve"
      ]
    }
  }
}

Setup steps
1. Open Windsurf → Cascade → the hammer/MCP icon → Configure (or edit ~/.codeium/windsurf/mcp_config.json).
2. Paste the EvalView config above.
3. Fill in any placeholder secrets (this config has none), then save.
4. Click Refresh in the MCP panel.
5. EvalView's tools become available to Cascade.
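If the config file already lists other MCP servers, pasting by hand risks overwriting them. The following is a minimal Python sketch of merging the EvalView entry into an existing file. The helper name `add_evalview` is hypothetical (it is not part of EvalView or Windsurf), and the demo writes to a temporary file rather than the real config.

```python
import json
import tempfile
from pathlib import Path

# The server entry from the config on this page.
EVALVIEW_ENTRY = {"command": "evalview", "args": ["mcp", "serve"]}


def add_evalview(config_path: Path) -> dict:
    """Merge the EvalView entry into an MCP config file, keeping other servers."""
    config = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8").strip()
        if text:
            config = json.loads(text)
    # setdefault keeps any servers that are already configured.
    config.setdefault("mcpServers", {})["evalview"] = EVALVIEW_ENTRY
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Demo against a throwaway file so the real config is untouched.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "mcp_config.json"
    demo_path.write_text(
        json.dumps({"mcpServers": {"other": {"command": "other-server"}}}),
        encoding="utf-8",
    )
    merged = add_evalview(demo_path)
    print(sorted(merged["mcpServers"]))
```

To apply it for real, one would call `add_evalview(Path.home() / ".codeium" / "windsurf" / "mcp_config.json")` and then click Refresh in the MCP panel.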
Before you start
- Python with `pip` (install the `evalview` package: `pip install evalview`)
- An AI agent / test suite for EvalView to snapshot and check (run `evalview init` to auto-detect and scaffold one)
- Optional: an LLM provider API key (e.g. OPENAI_API_KEY or ANTHROPIC_API_KEY) only if you enable the semantic-similarity or LLM-as-judge scoring layers
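The prerequisites above can be checked with a short script before touching the Windsurf config. This is a sketch only: the function name `preflight` is hypothetical, and it assumes the `evalview` command must be resolvable on `PATH`, since the config launches it by name.

```python
import os
import shutil


def preflight(env=None, which=shutil.which):
    """Report which prerequisites from the list above look satisfied."""
    env = os.environ if env is None else env
    return {
        # The MCP config starts the server with the bare command "evalview".
        "evalview_on_path": which("evalview") is not None,
        # Only needed for the optional semantic-similarity / LLM-as-judge layers.
        "llm_key_set": any(
            env.get(key) for key in ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")
        ),
    }


report = preflight()
print(report)
```

A `False` for `llm_key_set` is not an error: the deterministic tool and sequence diff runs without any API key.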
What EvalView can do in Windsurf
- create_test: Create an EvalView test case for an agent behavior.
- run_snapshot: Run tests and save the resulting traces as golden baselines.
- run_check: Replay tests, diff against baselines, and report regressions/changes with the ship verdict.
- list_tests: List the EvalView tests defined in the workspace.
- validate_skill: Validate a skill (e.g. for Claude Code / Codex / OpenClaw) against EvalView's expectations.
- generate_skill_tests: Generate EvalView tests for a skill.
- run_skill_test: Run an EvalView test against a skill.
- generate_visual_report: Generate a visual (HTML) report of EvalView results.
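Cascade invokes these tools for you, but it can help to see roughly what travels over stdio. MCP uses JSON-RPC 2.0, with `tools/list` to discover tools and `tools/call` to invoke one. The sketch below only builds the request messages. The helper names are hypothetical, the empty `arguments` object for `run_check` is an assumption (this page does not state the tool's input schema, which the server reports in its `tools/list` response), and the `initialize` handshake that a real session starts with is omitted.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids, unique per session


def jsonrpc_request(method, params=None):
    """Build a JSON-RPC 2.0 request object."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return msg


def call_tool(name, arguments=None):
    """Build an MCP tools/call request for the named tool."""
    return jsonrpc_request("tools/call", {"name": name, "arguments": arguments or {}})


list_req = jsonrpc_request("tools/list")
# Assumption: run_check accepts an empty arguments object.
check_req = call_tool("run_check")
print(json.dumps(check_req))
```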
Security
Your data stays local by default — nothing leaves your machine unless you opt in to cloud sync via `evalview login`. The deterministic tool + sequence diff runs without any API key; semantic similarity and LLM-as-judge layers are optional and require an OpenAI/Anthropic (or other provider) API key when enabled.
EvalView + Windsurf FAQ
Where is the Windsurf config file?
Windsurf reads MCP servers from ~/.codeium/windsurf/mcp_config.json. Paste the EvalView config there under the "mcpServers" key and restart the client.
Is EvalView safe to use with Windsurf?
Your data stays local by default — nothing leaves your machine unless you opt in to cloud sync via `evalview login`. The deterministic tool + sequence diff runs without any API key; semantic similarity and LLM-as-judge layers are optional and require an OpenAI/Anthropic (or other provider) API key when enabled.
How do I connect EvalView to Claude Code?
Install the package with `pip install evalview`, then run `claude mcp add --transport stdio evalview -- evalview mcp serve`. Optionally copy `CLAUDE.md.example` to `CLAUDE.md` to make Claude Code proactively run checks.
What tools does the MCP server expose?
Eight tools: create_test, run_snapshot, run_check, list_tests, validate_skill, generate_skill_tests, run_skill_test, and generate_visual_report.
Does it require an API key?
No for the core regression gate — the deterministic tool + sequence diff and code-based checks run offline with no API key, and your data stays local by default. An LLM provider API key is only needed if you opt into the semantic-similarity or LLM-as-judge scoring layers.