POST /v1/token/count

Token Count API - Estimate LLM Tokens for GPT, Claude, Llama

Approximates token count using family-specific characters-per-token ratios (GPT, Claude, Llama, Mistral, Gemini). Returns token count, the model used, the estimation method, character count, and word count. Accurate enough for budgeting prompts and splitting long inputs.
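The estimation method described above can be sketched client-side. This is a minimal illustration, not the endpoint's actual implementation: the per-family ratios below are assumptions chosen from the 3.5-3.8 chars/token range mentioned in the FAQ, and the family-matching rule is hypothetical.

```python
# Assumed chars-per-token ratios per model family (illustrative values only).
CHARS_PER_TOKEN = {
    "gpt": 3.8,
    "claude": 3.7,
    "llama": 3.6,
    "mistral": 3.6,
    "gemini": 3.8,
}

def estimate_tokens(text: str, model: str = "gpt-4o") -> dict:
    """Approximate tokens from character length, mirroring the response shape
    (token count, model, method, character count, word count)."""
    family = next((f for f in CHARS_PER_TOKEN if model.startswith(f)), "gpt")
    ratio = CHARS_PER_TOKEN[family]
    return {
        "tokens": max(1, round(len(text) / ratio)),
        "model": model,
        "method": "estimated",
        "characters": len(text),
        "words": len(text.split()),
    }
```

Because the estimate depends only on character count, it is fast and deterministic, which is why it is suitable for budgeting rather than exact billing.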

Parameters

text string, required

Input text to tokenize.

model string, optional

Target model for ratio selection. Defaults to gpt-4o.

Code examples

curl -X POST https://api.botoi.com/v1/token/count \
  -H "Content-Type: application/json" \
  -d '{"text":"Summarize this article in 3 bullet points.","model":"claude-3.5-sonnet"}'

When to use this API

Budget LLM costs before the request

Estimate tokens on user input, multiply by the model's per-token price, and display expected cost in your UI. Prevents surprise bills on long prompts.
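The multiplication above is simple enough to sketch. The prices here are hypothetical placeholders, not real provider rates, and `token_count` stands in for the `tokens` field of this endpoint's response.

```python
# Hypothetical per-1K-input-token prices in USD; substitute real provider rates.
PRICE_PER_1K_INPUT = {"gpt-4o": 0.0025, "claude-3.5-sonnet": 0.003}

def estimated_cost_usd(token_count: int, model: str) -> float:
    """Estimated token count times the model's per-token price."""
    return token_count / 1000 * PRICE_PER_1K_INPUT[model]
```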

Fit prompts in the context window

Combined with /v1/token/truncate, check whether your assembled prompt fits and chop intelligently if it overshoots. Especially useful for RAG pipelines with variable retrieved-context lengths.
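A fit check for the RAG case might look like the following sketch. The function names, the output reserve, and the window sizes are assumptions for illustration; the token counts would come from calls to this endpoint.

```python
def fits_context(prompt_tokens: int, context_window: int,
                 reserve_for_output: int = 1024) -> bool:
    """True if the assembled prompt leaves room for the model's reply."""
    return prompt_tokens + reserve_for_output <= context_window

def budget_for_retrieval(system_tokens: int, question_tokens: int,
                         context_window: int, reserve_for_output: int = 1024) -> int:
    """How many tokens of retrieved context can still fit (RAG pipelines)."""
    return max(0, context_window - reserve_for_output
               - system_tokens - question_tokens)
```

If the retrieved context exceeds the returned budget, that is the point to call /v1/token/truncate on it before assembling the final prompt.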

Rate-limit by token spend, not request count

Enforce per-user LLM spending by summing estimated tokens per request. Stops a single user with long prompts from exhausting your monthly quota.

Frequently asked questions

How accurate is the estimate?
Within 5-10% of the real BPE tokenizer for English text. Non-English scripts (Chinese, Arabic, Hindi) tokenize at roughly 1 token per character, so this ratio-based approach significantly under-counts them. For production cost calculations near hard limits, use the model's official tokenizer.
Why is method "estimated" and not "exact"?
This endpoint uses a characters-per-token ratio (3.6-3.8) rather than running the actual BPE tokenizer. Running tiktoken or Claude's tokenizer in Workers requires WASM and adds size and latency.
Which model should I pass?
Pick the model closest to the one you'll actually call. Ratios are similar across families for English (3.5-3.8 chars/token), so cross-model variance is small. Default is gpt-4o.
Does this count system prompts and tool schemas?
It counts only the text you pass. For total request tokens, separately count your system prompt, user message, tool definitions, and assistant output, then sum.
Can I pass multiple strings?
Not in one call. For batch counts (e.g., token-per-message in a conversation), loop client-side. Requests are cheap.
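The client-side loop is straightforward. In this sketch, `count_tokens` is a local stub standing in for a POST to /v1/token/count (using an assumed ~3.7 chars/token ratio) so the example is self-contained; in real use it would send `{"text": text}` and read `tokens` from the JSON response.

```python
def count_tokens(text: str) -> int:
    """Stub for POST /v1/token/count; assumed ~3.7 chars/token for illustration."""
    return max(1, round(len(text) / 3.7))

def per_message_counts(messages: list[str]) -> list[int]:
    """One call per message, since the API takes a single string per request."""
    return [count_tokens(m) for m in messages]
```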

Get your API key

Free tier includes 5 requests per minute with no credit card required. Upgrade for higher limits.