Cloudflare Code Mode MCP: stop paying 1M tokens to describe your tools
A 49-tool MCP server burns about 29,000 input tokens before your user types a single character. A 2,500-tool server, which is roughly what Cloudflare ships internally, burns 1.17 million. That is the full input window of Claude Opus spent describing tools, not solving the user's problem. Every turn pays the bill again. Every retry pays it again. At scale, the line item for "tool definitions" outruns the line item for "actual work."
In April 2026 Cloudflare shipped Code Mode MCP, a pattern that collapses that 1.17 million token footprint to around 1,000 tokens, a 99.9% cut. The trick is simple: stop describing tools to the model. Give the model a typed API and a sandbox, and let it write the code that calls the tools. Here is why the classic pattern leaks tokens, how Code Mode fixes it, and when you should bother switching.
The 1.17M-token problem
Classic MCP sends tool definitions as part of the system context on every request. Each tool carries a name, a description, an input schema, and often an output schema. A compact example for a weather lookup tool looks like this:
{
"name": "weather.lookup",
"description": "Return the current weather for a city. Use this when the user asks about temperature, conditions, humidity, or wind.",
"input_schema": {
"type": "object",
"properties": {
"city": {
"type": "string",
"description": "City name, for example San Francisco"
},
"units": {
"type": "string",
"enum": ["metric", "imperial"],
"description": "Unit system for the response"
}
},
"required": ["city"]
},
"output_schema": {
"type": "object",
"properties": {
"temp_c": { "type": "number" },
"temp_f": { "type": "number" },
"conditions": { "type": "string" },
"humidity": { "type": "number" },
"wind_kph": { "type": "number" }
}
}
}
That one schema runs about 600 tokens once you count the JSON structural overhead, the descriptions the model needs to pick the tool, and the enum values. Multiply by 49 curated tools on botoi's MCP server and you land at roughly 29,400 tokens per turn. A 10-turn conversation pays that 10 times, because the model has no memory between turns and the orchestrator ships the whole bundle every time. Scale the tool count to Cloudflare's full internal API surface (about 2,500 endpoints) and the per-turn cost hits 1.17 million tokens, which overflows even the 1M-token Opus window.
Count your own footprint in a single call. Botoi's token counter accepts any string; feed it a tool schema and you get the exact Anthropic token count:
curl -X POST https://api.botoi.com/v1/token/count \
-H "Content-Type: application/json" \
-d '{
"text": "{\"name\":\"weather.lookup\",\"description\":\"Return the current weather for a city. Use this when the user asks about temperature, conditions, humidity, or wind.\",\"input_schema\":{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\",\"description\":\"City name, for example San Francisco\"},\"units\":{\"type\":\"string\",\"enum\":[\"metric\",\"imperial\"]}},\"required\":[\"city\"]}}",
"model": "claude"
}'
How Code Mode MCP flips the pattern
Humans do not read API schemas before every call. You read the docs once, open an editor, and write code that imports functions. The runtime handles dispatch. Code Mode gives the model the same setup.
The agent runs inside a V8 isolate (Cloudflare's Workers sandbox). MCP tools show up as typed functions on an imported object. The model sees a TypeScript type declaration, not a JSON schema broadcast. When the user asks "what's the air quality where I live," the model writes a short program:
// The agent writes this. The runtime compiles and executes it.
// Only the two functions it calls ever hit the wire.
import { botoi } from "@botoi/mcp";
export async function run(input: { city: string }) {
const weather = await botoi.weather.current({ city: input.city });
const air = await botoi.airQuality.check({
lat: weather.lat,
lon: weather.lon,
});
return {
city: input.city,
temp: weather.temp_c,
aqi: air.aqi,
advice: air.aqi > 100 ? "stay inside" : "go for a walk",
};
}
The runtime compiles the snippet, runs it inside the isolate, and only the two functions it actually calls (botoi.weather.current and botoi.airQuality.check) touch the network. The model never saw the schema for the other 47 tools, because it never needed to. The type file sits on disk once and informs the compiler, not the context window.
Code Mode is closer to how you'd write a script against an SDK than how you'd drive a form. The model's output is code, the runtime's job is to execute code safely, and the network cost maps to real calls instead of hypothetical ones.
The math on botoi's 49-tool server
Botoi's MCP server exposes 49 curated tools across lookup, text, developer, image, and security categories. The table below compares classic MCP against Code Mode for a typical workload: 10-turn conversations, 10,000 conversations per month, Opus input pricing.
| Metric | Classic MCP | Code Mode MCP |
|---|---|---|
| Tokens per turn (tool descriptions) | 29,400 | 0 (type file loaded once) |
| Cold-start type-surface load | 0 | ~1,000 tokens |
| 10-turn conversation cost in descriptions | 294,000 tokens | 1,000 tokens |
| Primary failure mode | Model picks wrong tool | Generated code throws at runtime |
| Debuggability | Tool-call trace | Stack trace plus tool-call trace |
| Best-fit use case | <10 tools, desktop clients | 50+ tools, multi-step workflows |
| Added latency | None | 10-50ms compile + isolate startup |
At Opus input rates (roughly $15 per million input tokens), the classic pattern costs about $0.44 per turn, or roughly $4.40 per 10-turn conversation, in tool-description tokens alone. Code Mode drops that to fractions of a cent. Across 10,000 conversations a month, you save around $44,000 and reclaim 2.9 billion tokens of context budget for the work that matters.
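The arithmetic is easy to check in a few lines (token counts from this post; Opus input rate assumed at $15 per million tokens):

```typescript
// Back-of-envelope cost model for tool-description tokens.
const OPUS_INPUT_PER_MTOK = 15;          // USD per million input tokens (assumed)
const descriptionTokensPerTurn = 29_400; // 49 tools x ~600 tokens each

const costPerTurn = (descriptionTokensPerTurn / 1_000_000) * OPUS_INPUT_PER_MTOK;
const costPerConversation = costPerTurn * 10; // 10-turn conversation
const monthly = costPerConversation * 10_000; // 10,000 conversations per month

console.log(costPerTurn.toFixed(2));         // "0.44"
console.log(costPerConversation.toFixed(2)); // "4.41"
console.log(Math.round(monthly));            // 44100
```

Swap in your own tool count and average schema size; the shape of the calculation stays the same.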
Measure your own server today before you commit to either pattern:
# One-shot: fetch botoi's MCP manifest, pipe it into the token counter
curl -s https://api.botoi.com/v1/mcp/tools.json \
| jq -c . \
| jq -Rs '{text: ., model: "claude"}' \
| curl -s -X POST https://api.botoi.com/v1/token/count \
-H "Content-Type: application/json" \
-d @- \
| jq '.data.tokens'
When Code Mode is worth it, when it isn't
Code Mode is not free. The sandbox adds 10 to 50 milliseconds of compile and isolate startup per turn. Generated code can throw, which means you need retry logic and a fallback path. Debugging shifts from "the model picked the wrong tool" to "the model wrote code that referenced an undefined symbol." Your observability stack needs to capture both the source code and the tool calls it triggered.
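The retry-and-fallback path can be a thin wrapper. Here is a sketch, with `runInSandbox` and `classicToolCall` as hypothetical stand-ins for your isolate executor and your classic MCP fallback:

```typescript
// Pretend executor: fails on code that references an undefined symbol,
// mimicking the "model wrote bad code" failure mode.
async function runInSandbox(code: string): Promise<any> {
  if (code.includes("undefinedSymbol")) throw new Error("ReferenceError");
  return { ok: true };
}

// Fallback: drive the same tool through a classic MCP tool call.
async function classicToolCall(): Promise<any> {
  return { ok: true, via: "classic-mcp" };
}

async function executeWithFallback(code: string, maxRetries = 2) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await runInSandbox(code);
    } catch (err) {
      // Log the source alongside the error so observability captures
      // "model wrote bad code", not just "a tool call failed".
      console.error(`attempt ${attempt} failed:`, (err as Error).message);
    }
  }
  return classicToolCall(); // final fallback: classic tool-call path
}
```

In practice you would also feed the error back to the model for a regenerate attempt before falling back; the wrapper above shows only the control flow.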
Stick with classic MCP when:
- You expose fewer than 10 tools and the schema footprint is under 6,000 tokens.
- Your target client is Claude Desktop, Cursor, or VS Code (they only speak classic MCP).
- The agent loop is single-shot: one user message, one tool call, one response.
- Latency budgets are tight and you cannot spend the 10-50ms compile overhead.
Switch to Code Mode when:
- You expose 50 or more tools, or your schema footprint crosses 15,000 tokens.
- Workflows chain 3+ tool calls, because Code Mode avoids re-describing tools on every hop.
- You own the runtime (Cloudflare Agents, Mastra, LangGraph) and can compile agent output.
- The Anthropic bill's largest line item reads "system input tokens."
A migration path without rewriting your server
You don't have to pick one pattern. Most teams should run both and route clients by capability. Here is a three-step path that avoids rewriting your MCP server:
Step 1: measure. Fetch your MCP tool manifest and run it through the token counter. If you cross 15,000 tokens, Code Mode will pay off. If you are under 6,000, skip the rest of this post.
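Step 1 can be scripted. The sketch below uses the endpoints quoted in this post; the response shape (`data.tokens`) matches the jq pipeline earlier, but treat both as assumptions to verify against your own server:

```typescript
// Pure decision logic, using the thresholds from this post.
function recommend(tokens: number): string {
  if (tokens >= 15_000) return "code-mode";  // Code Mode will pay off
  if (tokens < 6_000) return "classic-mcp";  // schema footprint is cheap
  return "either";                           // gray zone: measure latency too
}

// Fetch the manifest, count its tokens, and classify.
async function measureManifest(manifestUrl: string): Promise<string> {
  const manifest = await fetch(manifestUrl).then((r) => r.text());

  const res = await fetch("https://api.botoi.com/v1/token/count", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text: manifest, model: "claude" }),
  }).then((r) => r.json());

  return recommend(res.data.tokens);
}
```

For a 49-tool server at ~600 tokens per schema, `recommend(29_400)` lands squarely in Code Mode territory.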
Step 2: expose a typed surface alongside MCP. You already have an OpenAPI spec if you run an HTTP API. Generate TypeScript types from it (botoi's SDK does this; see packages/sdk-typescript) and host the resulting .d.ts file at a stable URL. Code Mode runtimes fetch this file once per session and use it as the import target. Your MCP endpoint keeps serving classic clients unchanged.
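For concreteness, here is a hypothetical excerpt of what such a generated type surface might look like. The names mirror the weather example in this post; they are assumptions, not botoi's actual SDK:

```typescript
// Hypothetical excerpt of a generated .d.ts hosted at a stable URL.
// This whole file is what a Code Mode runtime loads once per session,
// in place of per-turn JSON schema broadcasts.
export interface BotoiSurface {
  weather: {
    /** Current weather for a city. */
    current(args: { city: string; units?: "metric" | "imperial" }): Promise<{
      temp_c: number;
      temp_f: number;
      conditions: string;
      humidity: number;
      wind_kph: number;
      lat: number;
      lon: number;
    }>;
  };
  airQuality: {
    /** Air quality index at a coordinate. */
    check(args: { lat: number; lon: number }): Promise<{ aqi: number }>;
  };
}

export declare const botoi: BotoiSurface;
```

The whole surface for 49 tools compresses to roughly this shape repeated, which is where the ~1,000-token cold-start figure comes from.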
Step 3: route by client. Claude Desktop, Cursor, and VS Code continue to hit /mcp and receive classic tool schemas. Agent frameworks (Cloudflare Agents, Mastra, LangGraph) hit a new /code-mode route that returns the type definition and a runtime handle. Same server, same business logic, two protocols.
Botoi ships both shapes today. The classic MCP endpoint at api.botoi.com/mcp serves 49 curated tools with full JSON schemas for desktop clients. The typed SDK at api.botoi.com/docs gives agent frameworks a single-file import surface. Free tier (5 req/min, no key) covers exploration; developer tier (1,000 req/day with a free key) covers production agent loops. If the Anthropic bill's biggest line item is tool descriptions, switching pays for itself in the first week.
Frequently asked questions
- Why does injecting tool schemas waste tokens when the model might only call one tool?
- Classic MCP ships every tool's JSON schema into the system context on every turn, so the model pays the full cost whether it calls one tool or none. The model cannot know which tools exist unless you tell it, and you tell it with schemas. Code Mode replaces that broadcast with a single type definition the runtime consults only when the generated code actually imports a function.
- Does Code Mode work with Claude Desktop or Cursor today?
- Not yet. Claude Desktop, Cursor, and VS Code's MCP integration all speak the classic MCP protocol, so they still receive inline tool schemas. Cloudflare's Code Mode targets agent frameworks (Cloudflare Agents, Mastra, LangGraph) where you control the runtime and can compile the agent's output before running it.
- What about security, isn't letting the model write code risky?
- It is, which is why Code Mode runs the generated code inside a V8 isolate with no filesystem access, no network access outside the typed API surface, and a CPU budget. The sandbox is the same shape Cloudflare uses for Workers. The model cannot escape the isolate any more than a user can escape a browser tab.
- Can I use both Classic MCP and Code Mode from the same server?
- Yes, and you should. Keep the classic MCP endpoint for desktop clients and editors that need zero-config tool discovery. Add a typed surface (OpenAPI or TypeScript types) for agent frameworks that run Code Mode. Botoi does this today: the MCP endpoint serves Claude Desktop, and the OpenAPI spec powers the SDK that agent frameworks import as a type definition.
- How much does this actually save on the Anthropic bill?
- For a 49-tool server at Anthropic's Opus input rate, 29,400 tokens per turn costs about $0.44 per turn, roughly $4.40 per 10-turn conversation, in tool-description tokens alone. Code Mode collapses that to a one-time 1K-token type load, cutting per-conversation description cost to a fraction of a cent. At 10,000 conversations a month the difference is roughly $44,000.
Try this API
Token Count API — interactive playground and code examples
Start building with botoi
150+ API endpoints for lookup, text processing, image generation, and developer utilities. Free tier, no credit card.