Token Truncate API - Cut Text to LLM Token Budget
Estimates the original token count; if it exceeds max_tokens, truncates the text at a word boundary to fit the budget. Returns the truncated string, final token count, was_truncated flag, original_tokens for comparison, and the model used.
Code examples
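The same request can be built from Python with only the standard library. The endpoint, field names, and header mirror the curl example below; nothing else about the client is assumed. A minimal sketch:

```python
import json
import urllib.request

# Build the POST request exactly as in the curl example: JSON body with
# text, max_tokens, and model.
payload = {
    "text": "Lorem ipsum dolor sit amet, consectetur adipiscing elit.",
    "max_tokens": 10,
    "model": "gpt-4o",
}
req = urllib.request.Request(
    "https://api.botoi.com/v1/token/truncate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# resp = urllib.request.urlopen(req)   # uncomment to actually send it
# body = json.loads(resp.read())

print(req.get_method(), req.full_url)
```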
curl -X POST https://api.botoi.com/v1/token/truncate \
-H "Content-Type: application/json" \
-d '{"text":"Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.","max_tokens":10,"model":"gpt-4o"}'

When to use this API
Fit retrieved context under the LLM limit
In RAG, the retrieved documents may exceed the remaining context window. Truncate them to the available budget (model limit minus prompt tokens and expected output tokens) before appending them.
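The budget arithmetic above can be sketched as follows. The model limit, prompt size, and output reserve are illustrative assumptions, and the 4-characters-per-token ratio is only a rough client-side heuristic, not the API's actual tokenizer:

```python
def estimate_tokens(text: str) -> int:
    """Crude client-side estimate: roughly 4 characters per token."""
    return max(1, len(text) // 4)

model_limit = 128_000     # assumed context window, e.g. gpt-4o
prompt_tokens = 1_200     # assumed size of system + user prompt
expected_output = 1_000   # room reserved for the model's reply
budget = model_limit - prompt_tokens - expected_output

retrieved = " ".join(["chunk of retrieved context"] * 100)
if estimate_tokens(retrieved) > budget:
    # here you would POST {"text": retrieved, "max_tokens": budget, ...}
    # to /v1/token/truncate before appending the result to the prompt
    pass

print(budget)  # → 125800
```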
Cap user inputs
Before sending a long user prompt to an LLM, truncate it to a reasonable budget. This also guards against padded-input abuse, where an attacker stuffs the prompt to exhaust the context window.
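A local approximation of the word-boundary cut is useful as a client-side pre-check before calling the endpoint. The chars-per-token ratio is an assumption, not the API's real tokenizer:

```python
def truncate_to_budget(text: str, max_tokens: int, chars_per_token: int = 4):
    """Clip text to roughly max_tokens, backing up to a word boundary."""
    limit = max_tokens * chars_per_token
    if len(text) <= limit:
        return text, False                    # fits: nothing to do
    cut = text[:limit].rsplit(" ", 1)[0]      # drop the trailing partial word
    return cut, True

clipped, was_truncated = truncate_to_budget(
    "one two three four five six seven eight nine ten", 5
)
print(clipped, was_truncated)  # → one two three four True
```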
Shrink long assistant turns in chat history
When replaying a conversation, selectively truncate older assistant messages to preserve recent context while still honoring the context-window limit.
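One way to sketch this replay step, assuming a simple role/content message shape; the `keep_recent` and `cap_chars` values are illustrative, and in practice the clipping line would be a call to the truncate endpoint:

```python
def shrink_history(messages, keep_recent=2, cap_chars=30):
    """Truncate assistant turns that fall outside the most recent window."""
    cutoff = len(messages) - keep_recent
    out = []
    for i, msg in enumerate(messages):
        if i < cutoff and msg["role"] == "assistant" and len(msg["content"]) > cap_chars:
            # stand-in for POST /v1/token/truncate on this message
            clipped = msg["content"][:cap_chars].rsplit(" ", 1)[0]
            out.append({"role": "assistant", "content": clipped})
        else:
            out.append(msg)   # recent turns pass through untouched
    return out

history = [
    {"role": "user", "content": "first question"},
    {"role": "assistant", "content": "a very long answer " * 5},
    {"role": "user", "content": "follow-up question"},
    {"role": "assistant", "content": "recent answer kept as-is"},
]
print(shrink_history(history)[1]["content"])
```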
Frequently asked questions
Does it cut mid-word?
What if max_tokens exceeds the text length?
Is it safe for multi-turn prompts?
Does this handle emoji and CJK correctly?
Why does tokens differ slightly from original_tokens * ratio?
Get your API key
Free tier includes 5 requests per minute with no credit card required. Upgrade for higher limits.