llm-router

Local-first LLM router for AI coding tools — routes prompts to the cheapest capable model with automatic provider fallback.

Unverified

stdio (local)

No auth

Python


Add to your client

Copy the config for your MCP client and paste it into its config file.

Install / run

pip install llm-routing && llm-router install

Paste into ~/Library/Application Support/Claude/claude_desktop_config.json

{
  "mcpServers": {
    "llm-router": {
      "command": "llm-router",
      "args": [
        "install"
      ]
    }
  }
}
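
If the config file already lists other servers, the entry above can be merged in rather than pasted over the whole file. The following is a minimal Python sketch, assuming the file is plain JSON. The helper name `add_mcp_server` is hypothetical and not part of llm-router, and the sketch makes no backup of the existing file.

```python
import json
from pathlib import Path


def add_mcp_server(config_path: Path, name: str, entry: dict) -> dict:
    """Merge one server entry into an MCP client config, keeping existing servers."""
    config = {}
    if config_path.exists() and config_path.stat().st_size > 0:
        config = json.loads(config_path.read_text(encoding="utf-8"))
    # setdefault keeps any servers that are already configured.
    config.setdefault("mcpServers", {})[name] = entry
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Example call (hypothetical helper, same entry as the JSON above):
# add_mcp_server(
#     Path.home() / "Library/Application Support/Claude/claude_desktop_config.json",
#     "llm-router",
#     {"command": "llm-router", "args": ["install"]},
# )
```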


Before you start

Python 3.10+
An MCP host (Claude Code, Codex CLI, Gemini CLI, VS Code, or Cursor)
Optional provider API keys (e.g. OPENAI_API_KEY, GEMINI_API_KEY, OPENROUTER_API_KEY) or a local Ollama instance (OLLAMA_BASE_URL); works with zero keys on Claude Code Pro/Max

About llm-router

A local-first universal LLM router that intercepts prompts from AI coding tools and routes each one to the cheapest model capable of handling it, with automatic cross-provider fallback. It is installed into MCP hosts (Claude Code, Cursor, VS Code, Codex CLI, Gemini CLI) via its bundled installer, exposes 60 MCP tools (routing, text/media generation, orchestration, filesystem ops, usage/budget admin), and logs cost and estimated savings to local SQLite. Routing aggressiveness is controlled by policies (aggressive / balanced / conservative / cost_aggressive) and enforcement modes (smart / hard / soft / off). Works with zero API keys on Claude Code Pro/Max; add provider keys (OpenAI, Gemini, Ollama, OpenRouter, DeepSeek, Groq, etc.) to widen the routing chain.
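
The "cheapest capable model with fallback" idea can be illustrated with a short Python sketch. This is not llm-router's implementation: the `Model` fields, the integer capability scale, and the `call` callback are all assumptions made for illustration, and the prices are placeholders.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Model:
    name: str
    cost_per_1k_tokens: float  # illustrative price, not real pricing
    capability: int            # higher means it can handle harder prompts


def route(prompt: str, required_capability: int, models: list[Model],
          call: Callable[[Model, str], str]) -> tuple[str, str]:
    """Try capable models from cheapest to most expensive; return (model name, reply)."""
    candidates = sorted(
        (m for m in models if m.capability >= required_capability),
        key=lambda m: m.cost_per_1k_tokens,
    )
    last_error: Optional[Exception] = None
    for model in candidates:
        try:
            return model.name, call(model, prompt)
        except Exception as exc:  # provider down or rate-limited: try the next one
            last_error = exc
    raise RuntimeError("no capable model succeeded") from last_error
```

In this sketch a policy such as conservative or aggressive would presumably map to a higher or lower `required_capability` for the same prompt; that mapping is left out here.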

Tools & capabilities (21)

llm_route

Route a prompt to the most cost-effective capable model based on the active policy and provider chain.

llm_classify

Classify prompt complexity (heuristic or model-assisted) to drive routing decisions.

llm_auto

Automatic routing entry point used by hook-driven auto-routing.

llm_stream

Stream a routed model response.

llm_query

General-purpose text query routed through the free-first chain (primary manual tool for MCP clients like VS Code/Cursor).

llm_code

Code-oriented generation routed to an appropriate coding model.

llm_analyze

Analysis-oriented text task routed to a capable model.

llm_research

Research-oriented query, optionally using web-grounded providers.

llm_image

Generate images via configured media providers (e.g. fal/Flux, Stability).

llm_video

Generate video via configured media providers (e.g. Kling, Runway Gen-3).

llm_audio

Generate audio / TTS via configured providers (e.g. ElevenLabs).

llm_orchestrate

Run multi-step research/generation pipelines using templates.

llm_pipeline_templates

List available orchestration pipeline templates.

llm_usage

Report token usage and per-model spend from the local usage database.

llm_budget

Inspect and manage budget pressure / spend limits.

llm_health

Check provider connectivity and routing health.

llm_savings

Report estimated savings vs. an all-premium baseline.

llm_fs_find

Find files on the filesystem for routed bulk operations.

llm_fs_edit_many

Bulk-edit multiple files using cheap models.

llm_check_usage

Check subscription usage tracking.

llm_refresh_claude_usage

Refresh tracked Claude subscription usage data.

When to use it

Cut cost on routine AI-coding prompts by routing simple questions to free local (Ollama) or cheap models while reserving premium models for hard tasks
Protect Claude Code 5-hour subscription quota with budget-pressure auto-downgrade or strict zero_claude enforcement
Keep working when a provider is rate-limited or down via automatic cross-provider fallback with circuit breakers
Track token spend and estimated savings locally with usage/savings tools and session-end reports
Generate images, video, and audio through routed media providers from inside your coding tool
Run multi-step research/generation pipelines and bulk file edits with cost-appropriate models
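
The circuit-breaker behaviour mentioned in the list above can be sketched in Python as follows. The class, its thresholds, and the injectable `clock` are hypothetical; how llm-router actually tracks provider failures is not documented on this page.

```python
import time


class CircuitBreaker:
    """Skip a provider for `cooldown` seconds after `max_failures` consecutive errors."""

    def __init__(self, max_failures: int = 3, cooldown: float = 30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.cooldown = cooldown
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # time the breaker tripped, or None while closed

    def allow(self) -> bool:
        """Return True if the provider may be tried now."""
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.cooldown:
            # Cooldown elapsed: reset and allow attempts again
            # (a simplification of the usual half-open state).
            self.opened_at = None
            self.failures = 0
            return True
        return False

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()
```

A router would keep one breaker per provider, skip providers whose `allow()` returns False, and move on to the next entry in the fallback chain.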

Security notes

Runs locally with no hosted proxy, telemetry, or account. API keys live in local files (.env or ~/.llm-router/config.yaml) and are never transmitted by the router; keys are scrubbed from structured logs. Usage logs are stored as unencrypted SQLite at ~/.llm-router/usage.db, so use full-disk encryption if needed and restrict file permissions (chmod 700 on ~/.llm-router and chmod 600 on config.yaml are recommended). Prompts are sent to whichever provider the router selects, so review each provider's privacy policy. The router cannot prevent provider-level jailbreaks or prompt injection. Hook scripts are local, inspectable shell scripts in ~/.claude/hooks/.
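
The recommended permissions can be applied and checked from Python on POSIX systems. This is a small sketch using only the standard library; the function names `harden` and `is_private` are made up for this example.

```python
import stat
from pathlib import Path


def harden(router_dir: Path) -> None:
    """Apply the recommended modes: 700 on the directory, 600 on config.yaml."""
    router_dir.chmod(stat.S_IRWXU)  # 0o700: owner-only access
    config = router_dir / "config.yaml"
    if config.exists():
        config.chmod(stat.S_IRUSR | stat.S_IWUSR)  # 0o600: owner read/write


def is_private(path: Path) -> bool:
    """Return True if neither group nor others have any permission bits set."""
    return (path.stat().st_mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```

Calling `harden(Path.home() / ".llm-router")` would apply both modes. Permissions limit access by other local users only and do not replace disk encryption.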

llm-router FAQ

Do I need API keys to use it?

No. It works with zero API keys on Claude Code Pro/Max subscriptions, where routing uses MCP tools that only call external models when beneficial. Adding provider keys (e.g. OPENROUTER_API_KEY) widens the routing chain and unlocks policies like cost_aggressive.

What is the PyPI package name?

The package is `llm-routing` (`pip install llm-routing`); the CLI command and GitHub repo are named `llm-router`. The older `claude-code-llm-router` package is deprecated and redirects to `llm-routing`.

How do I add it to a host other than Claude Code?

Run `llm-router install` (defaults to Claude Code) or pass a host flag: `--host codex`, `--host gemini-cli`, `--host vscode`, or `--host cursor`. Any MCP client can also use the manual tools such as `llm_query`.

Is my data sent anywhere?

It runs entirely locally with no hosted proxy, telemetry, or account. Your prompts go only to the providers the router selects (exactly as if you used them directly), and API keys stay in local files. Usage logs are stored as unencrypted SQLite under ~/.llm-router/.

How are savings calculated?

Each routed task logs the model, token counts, and estimated cost. Savings are computed as (baseline - actual) / baseline, where the baseline assumes the same tokens had gone to the most expensive model (Opus/Sonnet). Token counts use a len(text)/4 approximation and pricing comes from LiteLLM tables, so the figures are estimates (observed range 35–80%).
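
As a rough illustration of that formula, here is a Python sketch. It assumes a single per-1k-token price per model and rounds the token estimate down; both are simplifications, and the prices in the usage note are placeholders rather than real LiteLLM values.

```python
def estimate_tokens(text: str) -> int:
    """Approximate token count as len(text) / 4, rounded down (rounding is an assumption)."""
    return len(text) // 4


def savings_fraction(tokens: int, actual_price_per_1k: float,
                     baseline_price_per_1k: float) -> float:
    """Return (baseline - actual) / baseline for the same token count."""
    baseline = tokens / 1000 * baseline_price_per_1k
    actual = tokens / 1000 * actual_price_per_1k
    if baseline == 0:
        return 0.0  # nothing to save against a free baseline
    return (baseline - actual) / baseline
```

For a 4,000-character prompt (about 1,000 tokens) routed to a model at a placeholder 0.003 per 1k tokens against a 0.015 baseline, `savings_fraction` comes out at about 0.8, i.e. 80%. Because the token count cancels out when one price is used for both sides, the result depends only on the price ratio.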

#llm-router #cost-optimization #model-routing #claude-code #ollama #openrouter #litellm #fallback #local-first
