MCP servers for an on-call / SRE agent
An MCP server bundle for building an AI agent that helps triage incidents from observability data.
3 servers · ~29 tools · Last updated June 17, 2026
TL;DR: When something breaks, the slow part is correlating signals across tools. This bundle gives an agent access to logs, metrics, and error tracking so it can summarize what changed and where, which speeds up triage while remediation stays in human hands.
Bottom line: start with Datadog MCP Server (Official Remote) and add the rest as your needs grow. All 3 install together via the merged config below (~29 tools total).
Tool budget: this stack exposes about 29 tools. That's within Cursor's practical ~40-tool ceiling, so all servers can stay enabled together.
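The budget arithmetic above can be sketched as a small helper for checking your own setup. This is only an illustration: the function name `tool_headroom` is hypothetical, the ceiling of 40 is the approximate figure cited above rather than a documented hard limit, and per-server counts are something you would fill in from your own client.

```python
# Approximate practical ceiling cited above; an assumption, not a hard API limit.
CURSOR_PRACTICAL_CEILING = 40


def tool_headroom(tool_counts: dict[str, int],
                  ceiling: int = CURSOR_PRACTICAL_CEILING) -> int:
    """Return how many more tools fit under the ceiling (negative if over)."""
    return ceiling - sum(tool_counts.values())


# The page states only the ~29 total for this stack, so it is entered as one figure.
print(tool_headroom({"this stack (3 servers)": 29}))  # 11
```

A negative result would suggest disabling a server or trimming tools before enabling everything together.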
What's in the stack
One-click config
All 3 servers merged into a single block — pick your client and paste.
{
  "mcpServers": {
    "datadog-mcp-server-official-remote": {
      "command": "npx",
      "args": [
        "-y",
        "mcp-remote",
        "https://mcp.datadoghq.com/api/unstable/mcp-server/mcp"
      ]
    },
    "prometheus-mcp-server": {
      "command": "docker",
      "args": [
        "run",
        "-i",
        "--rm",
        "-e",
        "PROMETHEUS_URL",
        "ghcr.io/pab1it0/prometheus-mcp-server:latest"
      ],
      "env": {
        "PROMETHEUS_URL": "<your-prometheus-url>"
      }
    },
    "sentry-mcp-server-remote": {
      "command": "npx",
      "args": [
        "-y",
        "mcp-remote",
        "https://mcp.sentry.dev/mcp"
      ]
    }
  }
}
Paste into ~/Library/Application Support/Claude/claude_desktop_config.json (the Claude Desktop config path on macOS) and fully restart Claude Desktop. Replace placeholders such as <your-prometheus-url> with your own values.
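If the config file already contains other servers, pasting over it would drop them. A minimal sketch of merging the bundle into an existing file instead, assuming the file is plain JSON with a top-level "mcpServers" object as shown above. The helper names `merge_mcp_servers` and `update_config_file` are hypothetical and not part of any client tooling.

```python
import json
from pathlib import Path


def merge_mcp_servers(existing: dict, bundle: dict) -> dict:
    """Return a copy of `existing` with the bundle's servers added.

    Servers already configured under the same name are left untouched,
    and all other top-level settings are preserved.
    """
    merged = dict(existing)
    servers = dict(existing.get("mcpServers", {}))
    for name, spec in bundle.get("mcpServers", {}).items():
        servers.setdefault(name, spec)
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path, bundle: dict) -> None:
    """Merge `bundle` into the JSON config at `path`, creating it if absent."""
    existing = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(merge_mcp_servers(existing, bundle), indent=2))
```

Keeping the existing entry on a name clash is a deliberate choice here, so a hand-tuned server definition is never silently overwritten. Back up the file before running anything like this against a real config.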
Capabilities this stack covers
FAQ
Can an AI resolve incidents on its own?
It's best at triage — gathering and correlating signals and summarizing. Keep humans in the loop for any remediation action.
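One way to keep humans in the loop is an allowlist gate in front of tool calls: read-only triage tools pass, and anything else needs explicit approval. The sketch below is illustrative only. The tool names, `Decision`, and `gate_tool_call` are hypothetical, and the real tool names depend on what each server exposes.

```python
from dataclasses import dataclass

# Hypothetical read-only tool names for illustration; check each server's
# actual tool list before building an allowlist like this.
READ_ONLY_TOOLS = {"query_logs", "query_metrics", "list_issues"}


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str


def gate_tool_call(tool_name: str, approved_by_human: bool = False) -> Decision:
    """Allow read-only tools; require human approval for everything else."""
    if tool_name in READ_ONLY_TOOLS:
        return Decision(True, "read-only triage tool")
    if approved_by_human:
        return Decision(True, "explicitly approved by a human")
    return Decision(False, "not on the read-only allowlist; needs human approval")
```

Defaulting to deny for unknown tools means a newly added server cannot take a remediation action until someone reviews it.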
What if I use Grafana or New Relic?
Swap in the matching server — the query-logs-metrics capability lists Grafana, New Relic, Dynatrace, Honeycomb and more.