POST /v1/token/count

Token Count API - Estimate LLM Tokens for GPT, Claude, Llama

Approximates token count using family-specific characters-per-token ratios (GPT, Claude, Llama, Mistral, Gemini). Returns token count, the model used, the estimation method, character count, and word count. Accurate enough for budgeting prompts and splitting long inputs.
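The estimation method described above can be sketched client-side. This is a minimal illustration, not the endpoint's actual implementation: the per-family ratios below are assumptions chosen from the 3.5-3.8 chars/token range mentioned in the FAQ, and the family-matching rule is hypothetical.

```python
# Assumed chars-per-token ratios per model family (illustrative values only).
CHARS_PER_TOKEN = {
    "gpt": 3.8,
    "claude": 3.7,
    "llama": 3.6,
    "mistral": 3.6,
    "gemini": 3.8,
}

def estimate_tokens(text: str, model: str = "gpt-4o") -> dict:
    """Approximate tokens from character length, mirroring the response shape
    (token count, model, method, character count, word count)."""
    family = next((f for f in CHARS_PER_TOKEN if model.startswith(f)), "gpt")
    ratio = CHARS_PER_TOKEN[family]
    return {
        "tokens": max(1, round(len(text) / ratio)),
        "model": model,
        "method": "estimated",
        "characters": len(text),
        "words": len(text.split()),
    }
```

Because the estimate depends only on character count, it is fast and deterministic, which is why it is suitable for budgeting rather than exact billing.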

Parameters

text string, required

Input text to tokenize.

model string, optional

Target model for ratio selection. Defaults to gpt-4o.

Code examples

curl -X POST https://api.botoi.com/v1/token/count \
  -H "Content-Type: application/json" \
  -d '{"text":"Summarize this article in 3 bullet points.","model":"claude-3.5-sonnet"}'

When to use this API

Budget LLM costs before the request

Estimate tokens on user input, multiply by the model's per-token price, and display expected cost in your UI. Prevents surprise bills on long prompts.
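The multiplication above is simple enough to sketch. The prices here are hypothetical placeholders, not real provider rates, and `token_count` stands in for the `tokens` field of this endpoint's response.

```python
# Hypothetical per-1K-input-token prices in USD; substitute real provider rates.
PRICE_PER_1K_INPUT = {"gpt-4o": 0.0025, "claude-3.5-sonnet": 0.003}

def estimated_cost_usd(token_count: int, model: str) -> float:
    """Estimated token count times the model's per-token price."""
    return token_count / 1000 * PRICE_PER_1K_INPUT[model]
```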

Fit prompts in the context window

Combined with /v1/token/truncate, check whether your assembled prompt fits and chop intelligently if it overshoots. Especially useful for RAG pipelines with variable retrieved-context lengths.
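A fit check for the RAG case might look like the following sketch. The function names, the output reserve, and the window sizes are assumptions for illustration; the token counts would come from calls to this endpoint.

```python
def fits_context(prompt_tokens: int, context_window: int,
                 reserve_for_output: int = 1024) -> bool:
    """True if the assembled prompt leaves room for the model's reply."""
    return prompt_tokens + reserve_for_output <= context_window

def budget_for_retrieval(system_tokens: int, question_tokens: int,
                         context_window: int, reserve_for_output: int = 1024) -> int:
    """How many tokens of retrieved context can still fit (RAG pipelines)."""
    return max(0, context_window - reserve_for_output
               - system_tokens - question_tokens)
```

If the retrieved context exceeds the returned budget, that is the point to call /v1/token/truncate on it before assembling the final prompt.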

Rate-limit by token spend, not request count

Enforce per-user LLM spending by summing estimated tokens per request. Stops a single user with long prompts from exhausting your monthly quota.

Frequently asked questions

How accurate is the estimate?
Within 5-10% of the real BPE tokenizer for English text. Non-English scripts (Chinese, Arabic, Hindi) tokenize at roughly 1 token per character, so this ratio-based approach significantly under-counts them. For production cost calculations near hard limits, use the model's official tokenizer.
Why is method "estimated" and not "exact"?
This endpoint uses a characters-per-token ratio (3.6-3.8) rather than running the actual BPE tokenizer. Running tiktoken or Claude's tokenizer in Workers requires WASM and adds size and latency.
Which model should I pass?
Pick the model closest to the one you'll actually call. Ratios are similar across families for English (3.5-3.8 chars/token), so cross-model variance is small. Default is gpt-4o.
Does this count system prompts and tool schemas?
It counts only the text you pass. For total request tokens, separately count your system prompt, user message, tool definitions, and assistant output, then sum.
Can I pass multiple strings?
Not in one call. For batch counts (e.g., token-per-message in a conversation), loop client-side. Requests are cheap.
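The client-side loop is straightforward. In this sketch, `count_tokens` is a local stub standing in for a POST to /v1/token/count (using an assumed ~3.7 chars/token ratio) so the example is self-contained; in real use it would send `{"text": text}` and read `tokens` from the JSON response.

```python
def count_tokens(text: str) -> int:
    """Stub for POST /v1/token/count; assumed ~3.7 chars/token for illustration."""
    return max(1, round(len(text) / 3.7))

def per_message_counts(messages: list[str]) -> list[int]:
    """One call per message, since the API takes a single string per request."""
    return [count_tokens(m) for m in messages]
```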

Get your API key

Free tier includes 5 requests per minute with no credit card required. Upgrade for higher limits.