POST /v1/token/truncate

Token Truncate API - Cut Text to LLM Token Budget

Estimates the original token count; if it exceeds max_tokens, truncates the text at a word boundary to fit the budget. Returns the truncated string, final token count, was_truncated flag, original_tokens for comparison, and the model used.

Parameters

text (string, required)

Source text to truncate.

max_tokens (number, required)

Target token budget. Must be positive.

model (string)

Model whose tokenizer ratio to use.

Code examples

curl -X POST https://api.botoi.com/v1/token/truncate \
  -H "Content-Type: application/json" \
  -d '{"text":"Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.","max_tokens":10,"model":"gpt-4o"}'
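The same request can be issued from any HTTP client. A minimal Python sketch, with field names taken from the curl example above (the actual send is left to your client of choice, e.g. requests; error handling omitted):

```python
import json

API_URL = "https://api.botoi.com/v1/token/truncate"

def build_truncate_request(text, max_tokens, model="gpt-4o"):
    """Build the JSON body for POST /v1/token/truncate."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    return json.dumps({"text": text, "max_tokens": max_tokens, "model": model})

body = build_truncate_request("Lorem ipsum dolor sit amet", 10)
# send with e.g. requests.post(API_URL, data=body,
#                              headers={"Content-Type": "application/json"})
```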

When to use this API

Fit retrieved context under the LLM limit

In RAG, the retrieved documents may exceed the remaining context window. Truncate them to the available budget (model limit minus prompt + expected output) before appending.
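The budget arithmetic above can be sketched directly (the numbers below are illustrative, not from the API):

```python
def context_budget(model_limit, prompt_tokens, expected_output_tokens):
    """Token budget left for retrieved context: the model's context
    limit minus the prompt and the tokens reserved for the output."""
    budget = model_limit - prompt_tokens - expected_output_tokens
    return max(budget, 0)  # never pass a negative budget to max_tokens

# e.g. a 128k-token model, a 2,000-token prompt, 1,000 tokens reserved for output
print(context_budget(128_000, 2_000, 1_000))  # 125000
```

Pass the result as max_tokens when truncating the retrieved documents.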

Cap user inputs

Before sending a long user prompt to an LLM, truncate it to a reasonable budget. This protects against adversarial padding, where an attacker inflates the input to blow the context window.

Shrink long assistant turns in chat history

When replaying a conversation, selectively truncate older assistant messages to preserve recent context while still honoring the context-window limit.

Frequently asked questions

Does it cut mid-word?
No. It cuts at the last space within the character budget, provided that space falls past the halfway mark. If no suitable boundary exists, the cut lands exactly at the budget.
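A rough local sketch of the boundary rule described here, assuming a plain character budget (the service's actual implementation may differ):

```python
def cut_at_word_boundary(text, char_budget):
    """Cut text to char_budget characters; prefer the last space,
    but only if it lies past the halfway mark of the budget."""
    if len(text) <= char_budget:
        return text
    hard_cut = text[:char_budget]
    last_space = hard_cut.rfind(" ")
    if last_space > char_budget // 2:
        return hard_cut[:last_space]
    return hard_cut  # no suitable boundary: cut exactly at the budget
```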
What if max_tokens exceeds the text length?
The original text is returned unchanged with was_truncated:false. The tokens field still reports the estimated count.
Is it safe for multi-turn prompts?
Yes, with care. If you pass a single concatenated conversation, the cut may land inside a speaker turn. For multi-turn truncation, loop per message and drop or truncate the oldest messages until you fit.
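The per-message loop suggested here might look like the sketch below, which uses a crude chars/4 heuristic in place of the API's estimator (swap in real calls to /v1/token/truncate in production):

```python
def estimate_tokens(text):
    # crude stand-in for the API's estimator: roughly 4 characters per token
    return max(1, len(text) // 4)

def fit_history(messages, budget):
    """Drop the oldest messages until the conversation fits the budget.
    messages: list of (role, text) tuples, oldest first."""
    kept = list(messages)
    while kept and sum(estimate_tokens(t) for _, t in kept) > budget:
        kept.pop(0)  # drop the oldest turn first
    return kept
```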
Does this handle emoji and CJK correctly?
Character-based truncation counts codepoints, not grapheme clusters. A multi-codepoint emoji (e.g. a ZWJ sequence) at the cut boundary may be split, leaving a broken partial glyph. For human-readable cuts use /v1/text/truncate which operates on characters.
Why does tokens differ slightly from original_tokens * ratio?
Each count runs through the estimation separately on different text lengths. Minor rounding differences are expected and always within a token or two.

Get your API key

Free tier includes 5 requests per minute with no credit card required. Upgrade for higher limits.