Step 4 economics — Token Max

Cut inference 60-80% with the same agents

Once your monthly Claude API spend crosses ~$1,000, the math flips: routing Claude Code through DeepSeek saves more than the engineering hours to set it up. Same prompts, same agents, fraction of the cost.

Real client savings

Before

$2,400

/ month — Anthropic API

After

$520

/ month — DeepSeek via claude-code-deepseek-backend

Saved

78%

$22.5k/year

Quality drop on internal benchmarks: less than 5%. Quality drop on customer-facing tasks where we kept Claude: 0% (we route per task type).

Routing strategy

High-stakes / customer-facing

Keep on Claude

Models: Claude Sonnet 4.6 / Opus 4.7

Customer emails, contracts, clinical drafts, board summaries

High-volume / internal

Route to DeepSeek

Models: DeepSeek (via claude-code-deepseek-backend)

Code refactors, log triage, audit-log queries, scoring checks, draft outlines

Embeddings + small classification

Route to DeepSeek

Models: Self-hosted bge-large-en-v1.5 + DeepSeek

Chunking the vault, FAQ matching, ticket categorisation

3-step migration

Clone the open-source backend

claude-code-deepseek-backend is a drop-in shim that speaks the Claude API on the inbound and DeepSeek on the outbound. Your existing agent code doesn't change.

Clone + install

git clone https://github.com/chatgptnotes/claude-code-deepseek-backend
cd claude-code-deepseek-backend
npm install

Wire your DeepSeek credentials and a routing rule

Get a DeepSeek API key from platform.deepseek.com. Add it + the routing rule to your env.

.env (the shim)

DEEPSEEK_API_KEY=sk-...
ANTHROPIC_FALLBACK_API_KEY=sk-ant-...   # used for high-stakes paths
ROUTE_RULES=high-stakes->anthropic,internal->deepseek,default->deepseek

Point your agents at the shim instead of api.anthropic.com

In each Edge Function, change the Anthropic SDK base URL from https://api.anthropic.com to your shim's URL. That's it. The shim looks at request metadata, picks the model, returns Claude-shaped responses.

Switch the base URL (one-line change per agent)

const claude = new Anthropic({
  apiKey: Deno.env.get("ANTHROPIC_API_KEY")!,
  baseURL: Deno.env.get("CLAUDE_CODE_DEEPSEEK_URL") ?? "https://api.anthropic.com",
});

Validation gate before flipping anything

Run 100 historical agent inputs through both backends in shadow mode (capture both outputs, ship Claude's).
Diff the outputs. Score against your existing test harness.
Only route to DeepSeek for any task type where the difference is < 5% on the harness.
Keep Claude as the always-on fallback for the routes you flipped — the shim handles this automatically when DeepSeek errors or returns empty.

When to do this

✅ Yes if monthly Claude bill exceeds $1,000 AND you have agents that don't touch customer-facing output.
✅ Yes if you're on the Scale Retainer — the migration is included.
⚠️ Wait if you're still in DRY_RUN on your first agent. Get to live first; optimise the bill later.
❌ No if all your agents are customer-facing AND quality drops > 5% on the harness.

Math it

See your specific savings

ROI calculator

Source

Open-source repo

github.com/chatgptnotes/claude-code-deepseek-backend