POST /v1/token/truncate

Token Truncate API - Cut Text to LLM Token Budget

Estimates the original token count; if it exceeds max_tokens, truncates the text at a word boundary to fit the budget. Returns the truncated string, final token count, was_truncated flag, original_tokens for comparison, and the model used.

Parameters

text (string, required)

Source text to truncate.

max_tokens (number, required)

Target token budget. Must be positive.

model (string)

Model whose tokenizer ratio to use.

Code examples

curl -X POST https://api.botoi.com/v1/token/truncate \
  -H "Content-Type: application/json" \
  -d '{"text":"Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.","max_tokens":10,"model":"gpt-4o"}'
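The same request can be issued from any HTTP client. A minimal Python sketch, with field names taken from the curl example above (the actual send is left to your client of choice, e.g. requests; error handling omitted):

```python
import json

API_URL = "https://api.botoi.com/v1/token/truncate"

def build_truncate_request(text, max_tokens, model="gpt-4o"):
    """Build the JSON body for POST /v1/token/truncate."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    return json.dumps({"text": text, "max_tokens": max_tokens, "model": model})

body = build_truncate_request("Lorem ipsum dolor sit amet", 10)
# send with e.g. requests.post(API_URL, data=body,
#                              headers={"Content-Type": "application/json"})
```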

When to use this API

Fit retrieved context under the LLM limit

In RAG, the retrieved documents may exceed the remaining context window. Truncate them to the available budget (model limit minus prompt + expected output) before appending.
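The budget arithmetic above can be sketched directly (the numbers below are illustrative, not from the API):

```python
def context_budget(model_limit, prompt_tokens, expected_output_tokens):
    """Token budget left for retrieved context: the model's context
    limit minus the prompt and the tokens reserved for the output."""
    budget = model_limit - prompt_tokens - expected_output_tokens
    return max(budget, 0)  # never pass a negative budget to max_tokens

# e.g. a 128k-token model, a 2,000-token prompt, 1,000 tokens reserved for output
print(context_budget(128_000, 2_000, 1_000))  # 125000
```

Pass the result as max_tokens when truncating the retrieved documents.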

Cap user inputs

Before sending a long user prompt to an LLM, truncate it to a reasonable budget. This protects against adversarial padding, where an attacker inflates the input to blow the context window.

Shrink long assistant turns in chat history

When replaying a conversation, selectively truncate older assistant messages to preserve recent context while still honoring the context-window limit.

Frequently asked questions

Does it cut mid-word?
No. It cuts at the last space within the character budget, provided that space falls past the halfway mark. If no suitable boundary exists, the cut lands exactly at the budget.
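A rough local sketch of the boundary rule described here, assuming a plain character budget (the service's actual implementation may differ):

```python
def cut_at_word_boundary(text, char_budget):
    """Cut text to char_budget characters; prefer the last space,
    but only if it lies past the halfway mark of the budget."""
    if len(text) <= char_budget:
        return text
    hard_cut = text[:char_budget]
    last_space = hard_cut.rfind(" ")
    if last_space > char_budget // 2:
        return hard_cut[:last_space]
    return hard_cut  # no suitable boundary: cut exactly at the budget
```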
What if max_tokens exceeds the text length?
The original text is returned unchanged with was_truncated:false. The tokens field still reports the estimated count.
Is it safe for multi-turn prompts?
Yes, with care. If you pass a single concatenated conversation, the cut may land inside a speaker turn. For multi-turn truncation, loop per message and drop or truncate the oldest messages until you fit.
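The per-message loop suggested here might look like the sketch below, which uses a crude chars/4 heuristic in place of the API's estimator (swap in real calls to /v1/token/truncate in production):

```python
def estimate_tokens(text):
    # crude stand-in for the API's estimator: roughly 4 characters per token
    return max(1, len(text) // 4)

def fit_history(messages, budget):
    """Drop the oldest messages until the conversation fits the budget.
    messages: list of (role, text) tuples, oldest first."""
    kept = list(messages)
    while kept and sum(estimate_tokens(t) for _, t in kept) > budget:
        kept.pop(0)  # drop the oldest turn first
    return kept
```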
Does this handle emoji and CJK correctly?
Character-based truncation counts codepoints, not grapheme clusters. A multi-codepoint emoji (e.g. a ZWJ sequence) at the cut boundary may be split, leaving a broken partial glyph. For human-readable cuts use /v1/text/truncate which operates on characters.
Why does tokens differ slightly from original_tokens * ratio?
Each count runs through the estimation separately on different text lengths. Minor rounding differences are expected and always within a token or two.

Get your API key

Free tier includes 5 requests per minute with no credit card required. Upgrade for higher limits.