Token counting for GPT, Claude, and Llama in one API
You send a prompt to GPT-4o and the response cuts off mid-sentence. You check your bill and find a batch job burned through $40 because the input was 3x larger than you expected. You paste a long document into Claude and get an error: context window exceeded. Every one of these problems traces back to the same root cause: you didn't know how many tokens your text contained before you sent it.
Token counting is the pre-flight check every LLM integration needs. Character count won't help you. Word count gets you in the ballpark, but tokenizers split text differently depending on the model. You need the exact count for the model you're calling.
Why character count is not token count
LLMs don't process raw characters. They break text into tokens using a tokenizer, which is a vocabulary of subword pieces trained on a large corpus. The mapping from text to tokens is non-obvious and model-specific.
Some examples that show why counting characters misleads you:
- "I can't" splits into 3 tokens in GPT-4:
I,can,'t. That's 7 characters but 3 tokens. - "antidisestablishmentarianism" is one word but 6-8 tokens depending on the model. The tokenizer breaks it into subword pieces it recognizes.
- "Hello" is 1 token. " Hello" (with leading spaces) might be 2 tokens because the whitespace gets its own token.
- Code snippets tokenize differently from prose. Curly braces, semicolons, and indentation each consume tokens. A 500-character function can easily cost 200+ tokens.
GPT models use BPE (byte-pair encoding) with the cl100k_base or o200k_base vocabulary. Claude uses a similar but distinct BPE tokenizer. Llama uses SentencePiece. The same paragraph produces different token counts across all three.
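If you only need a rough ballpark before reaching for an exact count, the common rule of thumb is about 4 characters per token, or roughly 1.3 tokens per English word. The sketch below is a minimal illustration of that heuristic, nothing more: it drifts badly for code, non-English text, and structured data, which is exactly why per-model counts matter.
// Rough heuristic only: ~4 characters per token, ~1.3 tokens per word for English prose.
// Use a real tokenizer or the count endpoint when you need the number the model will actually see.
function roughTokenEstimate(text) {
  const byChars = Math.ceil(text.length / 4);
  const byWords = Math.ceil(text.trim().split(/\s+/).length * 1.3);
  // Take the larger estimate so the guess errs on the safe side
  return Math.max(byChars, byWords);
}
console.log(roughTokenEstimate("The quick brown fox jumps over the lazy dog."));
// ~12 — a ballpark, not an exact count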
Count tokens with one API call
Send your text to the botoi /v1/token/count endpoint with the target model. The API
returns the estimated token count along with character and word counts.
curl -X POST https://api.botoi.com/v1/token/count \
-H "Content-Type: application/json" \
-d '{
"text": "The quick brown fox jumps over the lazy dog. This sentence is used to test tokenizers across different language models.",
"model": "gpt-4o"
}'
Response:
{
"success": true,
"data": {
"tokens": 24,
"model": "gpt-4o",
"method": "estimated",
"characters": 116,
"words": 20
}
}
The response tells you this 20-word sentence costs 24 tokens in GPT-4o. You also get
characters and words for quick reference. The method field
indicates the counting approach used.
Token counts by model
The same text produces different token counts depending on which model you target. The
model parameter accepts 15 models across the major families. Here's how they compare
for the same input:
| Model | Tokenizer | Context window | Tokens (same text) |
|---|---|---|---|
| gpt-4o | o200k_base (BPE) | 128K | 24 |
| gpt-3.5-turbo | cl100k_base (BPE) | 16K | 24 |
| claude-3.5-sonnet | Claude BPE | 200K | 25 |
| claude-4-opus | Claude BPE | 200K | 25 |
| llama-3.2 | SentencePiece | 128K | 24 |
| gemini-2.0-flash | SentencePiece | 1M | 24 |
| mistral | SentencePiece (BPE) | 32K | 24 |
The differences are small for short English sentences but grow as input length increases. Non-English text, code, and structured data (JSON, XML) can show larger variation. Always count tokens with the specific model you plan to call.
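To see this variation for your own input, you can loop the count endpoint over several models and collect the results. A minimal sketch, assuming you only care about a few of the supported model names; swap in whichever models you actually call:
// Count the same text against several models to compare tokenizers.
// The model list below is a sample; the endpoint accepts 15 models in total.
async function compareModels(text, models = ["gpt-4o", "claude-3.5-sonnet", "llama-3.2"]) {
  const results = {};
  for (const model of models) {
    const res = await fetch("https://api.botoi.com/v1/token/count", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ text, model }),
    });
    const { data } = await res.json();
    results[model] = data.tokens;
  }
  return results;
}
// Example output for the sentence above: { "gpt-4o": 24, "claude-3.5-sonnet": 25, "llama-3.2": 24 }
console.log(await compareModels("The quick brown fox jumps over the lazy dog. This sentence is used to test tokenizers across different language models."));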
Truncate text to a token limit
When your prompt exceeds the context window, you need to trim it without breaking mid-word.
The /v1/token/truncate endpoint cuts text to a target token count at a word boundary.
curl -X POST https://api.botoi.com/v1/token/truncate \
-H "Content-Type: application/json" \
-d '{
"text": "You are a helpful assistant. Summarize the following document in three bullet points. The document discusses the impact of renewable energy adoption on global carbon emissions over the past decade, with specific focus on solar and wind installations in Europe and Southeast Asia.",
"max_tokens": 20,
"model": "claude-3.5-sonnet"
}'
Response:
{
"success": true,
"data": {
"truncated": "You are a helpful assistant. Summarize the following document in three bullet points. The",
"tokens": 18,
"was_truncated": true,
"model": "claude-3.5-sonnet",
"max_tokens": 20,
"original_tokens": 48
}
}
The original prompt was 48 tokens. The API truncated it to 18 tokens (within the 20-token budget)
at a clean word boundary. The was_truncated flag tells you whether the text was modified.
The original_tokens field shows how many tokens the full text contained.
This is useful for fitting system prompts into tight token budgets, trimming chat history to stay within the context window, and chunking documents before sending them to an embeddings API.
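For the chunking case, one possible approach (a sketch, not the only way to do it) is to call the truncate endpoint repeatedly: take the returned prefix as a chunk, slice it off the remaining text, and loop until nothing is left. This assumes the truncated text is always a leading prefix of the input, which matches the response shown above.
// Split a long document into chunks that each fit within a token budget.
// Built on /v1/token/truncate; assumes data.truncated is a prefix of the input text.
async function chunkByTokens(text, maxTokens, model = "gpt-4o") {
  const chunks = [];
  let remaining = text;
  while (remaining.length > 0) {
    const res = await fetch("https://api.botoi.com/v1/token/truncate", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ text: remaining, max_tokens: maxTokens, model }),
    });
    const { data } = await res.json();
    if (!data.was_truncated) {
      // The rest of the text already fits in one chunk
      chunks.push(remaining);
      break;
    }
    chunks.push(data.truncated);
    remaining = remaining.slice(data.truncated.length).trimStart();
  }
  return chunks;
}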
Build a pre-flight check for LLM calls
The highest-value integration: a function that counts tokens, compares against the model's context window, and truncates if the prompt is too long. This prevents both silent truncation and API errors.
const MODEL_LIMITS = {
"gpt-4o": 128000,
"gpt-4o-mini": 128000,
"claude-3.5-sonnet": 200000,
"claude-4-sonnet": 200000,
"llama-3.2": 128000,
};
async function countTokens(text, model = "gpt-4o") {
const res = await fetch("https://api.botoi.com/v1/token/count", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ text, model }),
});
const { data } = await res.json();
return data.tokens;
}
async function truncateText(text, maxTokens, model = "gpt-4o") {
const res = await fetch("https://api.botoi.com/v1/token/truncate", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ text, max_tokens: maxTokens, model }),
});
const { data } = await res.json();
return data;
}
async function preflightCheck(prompt, model = "gpt-4o") {
const limit = MODEL_LIMITS[model];
if (!limit) throw new Error("Unknown model: " + model);
const tokens = await countTokens(prompt, model);
// Reserve 20% of the context window for the model's response
const inputBudget = Math.floor(limit * 0.8);
if (tokens <= inputBudget) {
return { safe: true, tokens, limit, model };
}
// Truncate to fit within the input budget
const result = await truncateText(prompt, inputBudget, model);
return {
safe: false,
original_tokens: tokens,
truncated_tokens: result.tokens,
truncated_text: result.truncated,
limit,
model,
};
}
// Usage
let prompt = buildPromptFromChatHistory(messages);
const check = await preflightCheck(prompt, "claude-3.5-sonnet");
if (!check.safe) {
console.log(
"Prompt truncated from " +
check.original_tokens + " to " +
check.truncated_tokens + " tokens"
);
prompt = check.truncated_text;
}
const response = await callLLM(prompt, "claude-3.5-sonnet");
The function reserves 20% of the context window for the model's response. If the input fits, it passes through unchanged. If it's too large, it gets truncated to the input budget. You always know exactly how many tokens you're sending.
Wrap this around every LLM call in your application. It adds one HTTP request (two if truncation is needed) and eliminates an entire class of production failures.
Real-world use cases
- Cost estimation before API calls. Count tokens in a batch of prompts, multiply by the model's per-token price, and know the total cost before you commit. This Node.js function does it in a few lines:
async function estimateCost(text, model = "gpt-4o") {
const tokens = await countTokens(text, model);
// Price per 1M input tokens (March 2026 pricing)
const rates = {
"gpt-4o": 2.50,
"gpt-4o-mini": 0.15,
"claude-3.5-sonnet": 3.00,
"claude-4-sonnet": 4.00,
"llama-3.2": 0.00, // self-hosted
};
const rate = rates[model] || 0;
const cost = (tokens / 1_000_000) * rate;
return {
tokens,
model,
estimated_cost_usd: cost.toFixed(6),
};
}
// Check cost before sending a large document
const estimate = await estimateCost(longDocument, "gpt-4o");
console.log(estimate);
// { tokens: 14320, model: "gpt-4o", estimated_cost_usd: "0.035800" }
- Prompt size validation. Reject or trim user-submitted prompts that exceed your application's token budget. Prevent a single long input from consuming your entire rate limit.
- Chunking documents for embeddings. Split long documents into chunks that fit within your embedding model's token limit (typically 512 or 8,192 tokens). Count tokens per chunk to ensure none exceed the limit.
- Chat history management. As conversations grow, older messages push the total token count past the context window. Count the cumulative token total after each message and drop the oldest messages when you approach the limit. A sketch of this pattern follows the list.
- CI/CD pipeline guards. Add a token count step to your deployment pipeline. If a prompt template exceeds a defined threshold, fail the build before it reaches production.
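The chat history case is worth a concrete example. A minimal sketch, assuming messages is an array of { role, content } objects with the system prompt first, reusing the countTokens helper from the pre-flight section, and using a simple "role: content" serialization (the serialization format is an assumption, not part of the API):
// Trim chat history to a token budget by dropping the oldest non-system messages first.
async function trimChatHistory(messages, maxTokens, model = "gpt-4o") {
  const serialize = (msgs) => msgs.map((m) => m.role + ": " + m.content).join("\n");
  const kept = [...messages];
  let total = await countTokens(serialize(kept), model);
  while (total > maxTokens && kept.length > 1) {
    // Keep the system prompt at index 0; drop the oldest user/assistant message
    kept.splice(1, 1);
    total = await countTokens(serialize(kept), model);
  }
  return { messages: kept, tokens: total };
}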
Key points
- Token count varies by model. GPT, Claude, and Llama tokenize the same text differently. Always specify the target model when counting.
- Two endpoints cover the full workflow. /v1/token/count tells you the size. /v1/token/truncate trims to fit. Both support 15 models.
- Pre-flight checks prevent production failures. Count tokens before every LLM call to avoid truncated responses, context window errors, and surprise costs.
- No account required. The free tier allows 5 requests per minute with no signup. Get an API key for higher volume at botoi.com/api.
The full API docs cover the complete list of supported models and additional developer utility endpoints.
Frequently asked questions
- How many tokens are in a word?
- On average, one English word equals about 1.3 tokens. Short common words like "the" or "is" are one token. Longer or uncommon words like "authentication" split into 2-4 subword tokens. The exact count depends on the model's tokenizer.
- What is a token in GPT?
- A token is a chunk of text that the model processes as a single unit. GPT models use a byte-pair encoding (BPE) tokenizer that splits text into subword pieces. Common words stay whole, while rare or long words split into smaller fragments. Punctuation and whitespace are also tokenized.
- How do I count tokens before an API call?
- Send your text to POST https://api.botoi.com/v1/token/count with an optional model parameter (gpt-4o, claude-3.5-sonnet, llama-3, etc.). The API returns the estimated token count, word count, and character count in a single response.
- Do different LLMs tokenize text the same way?
- No. GPT models use cl100k_base or o200k_base encoding. Claude uses a similar but distinct BPE tokenizer. Llama uses SentencePiece. The same sentence produces different token counts across models. Always count tokens with the specific model you plan to call.
- What happens when you exceed a model's context window?
- Most LLM APIs return an error when the input exceeds the context window. Some silently truncate the input, which can cut off critical instructions or context. Pre-checking token count and truncating to fit prevents both failure modes.
Try this API
Text Stats API — interactive playground and code examples
Start building with botoi
150+ API endpoints for lookup, text processing, image generation, and developer utilities. Free tier, no credit card.