Question 1

What is the difference between cosine and Jaccard?

Accepted Answer

Cosine similarity compares term-frequency vectors, so it rewards texts that share words in similar proportions. Jaccard compares sets, so it only counts whether a word appears in both, ignoring how often. Use cosine for longer passages, Jaccard for short tags or keywords.
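
To make the difference concrete, here is a minimal Python sketch of both measures using their standard textbook definitions. The function names and the choice to score empty input as 0 are assumptions for illustration, not the API's documented behavior.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine of the angle between two term-frequency vectors."""
    freq_a, freq_b = Counter(tokens_a), Counter(tokens_b)
    # Only words present in both texts contribute to the dot product.
    dot = sum(freq_a[t] * freq_b[t] for t in freq_a.keys() & freq_b.keys())
    norm_a = math.sqrt(sum(c * c for c in freq_a.values()))
    norm_b = math.sqrt(sum(c * c for c in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: empty input scores 0
    return dot / (norm_a * norm_b)


def jaccard_similarity(tokens_a, tokens_b):
    """Size of the shared vocabulary divided by the combined vocabulary."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    union = set_a | set_b
    if not union:
        return 0.0  # assumption: two empty inputs score 0
    return len(set_a & set_b) / len(union)


a = "red red red apple".split()
b = "red apple apple apple".split()

print(round(cosine_similarity(a, b), 4))   # 0.6
print(round(jaccard_similarity(a, b), 4))  # 1.0
```

Both texts use exactly the same two words, so Jaccard reports a perfect match, while cosine drops to 0.6 because the words occur in different proportions.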

Question 2

How is the score scaled?

Accepted Answer

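Both methods return a value between 0 and 1. For cosine, a score of 1 means both texts use the same words in the same proportions; for Jaccard, it means both texts contain the same set of distinct words. A score of 0 means the texts share no words. The response rounds scores to four decimal places.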
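
For reference, the standard definitions behind that range can be written as follows. The symbols are introduced here for illustration: a_w and b_w are the counts of word w in each text, and S_A and S_B are the sets of distinct words. The API's exact implementation is not documented here, so treat this as the textbook form.

```latex
\[
\operatorname{cosine}(A, B) =
  \frac{\sum_{w} a_w\, b_w}
       {\sqrt{\sum_{w} a_w^{2}}\;\sqrt{\sum_{w} b_w^{2}}},
\qquad
\operatorname{Jaccard}(A, B) =
  \frac{\lvert S_A \cap S_B \rvert}{\lvert S_A \cup S_B \rvert}
\]
```

Because word counts are never negative, the cosine numerator cannot fall below 0, and the Cauchy-Schwarz inequality keeps it from exceeding the denominator. The Jaccard intersection can never be larger than the union, so both scores stay between 0 and 1.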

Question 3

Is this semantic similarity?

Accepted Answer

No. The scoring is lexical. "Dog" and "canine" score 0 even though they mean the same thing. For semantic matching, embed the texts with an embedding model (for example, one built on an LLM) and compute cosine similarity on the resulting vectors.
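
As a rough illustration of the semantic alternative, the sketch below applies cosine similarity to dense vectors instead of word counts. The three-dimensional vectors are made up for the example; real embeddings come from a model and have hundreds of dimensions or more.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


# Hypothetical embeddings, invented for illustration only.
dog = [0.9, 0.1, 0.3]
canine = [0.8, 0.2, 0.4]
invoice = [0.1, 0.9, 0.0]

# Lexically, "dog" and "canine" share no tokens, so their overlap is empty.
print({"dog"} & {"canine"})  # set()

# In embedding space, related words can still point in similar directions.
print(cosine(dog, canine) > cosine(dog, invoice))  # True
```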

Question 4

How is text tokenized?

Accepted Answer

The API lowercases input and splits on word boundaries using a Unicode-aware regex. Punctuation, emoji, and whitespace are dropped. Numbers and accented characters are preserved.
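
A tokenizer along these lines can be approximated in a few lines of Python. The regex, the NFC normalization step, and the function name are assumptions; the API's actual pattern is not published here and may differ, for example in how it treats underscores or apostrophes.

```python
import re
import unicodedata

# Assumption: \w+ stands in for the API's Unicode-aware pattern.
# In Python 3, \w matches letters and digits in any script, plus underscore.
TOKEN_RE = re.compile(r"\w+")


def tokenize(text):
    """Lowercase the text and return its word tokens."""
    # NFC keeps accented letters as single code points so \w matches them.
    normalized = unicodedata.normalize("NFC", text).lower()
    return TOKEN_RE.findall(normalized)


print(tokenize("Café #42: GREAT coffee!! 🎉"))
# ['café', '42', 'great', 'coffee']
```

Note that this simple pattern keeps underscores and splits contractions such as "don't" into two tokens.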

Question 5

Does word order matter?

Accepted Answer

No. Both algorithms treat each text as a bag of words. Sentences with the same tokens in different orders receive the same score.
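
A quick way to see this is to compare the bag-of-words representations directly. The sketch below uses plain whitespace splitting as a stand-in for the API's tokenizer.

```python
from collections import Counter

a = "the dog bit the man".split()
b = "the man bit the dog".split()

# Cosine works on word counts; Jaccard works on the set of distinct words.
# Neither representation records where a word appeared.
print(Counter(a) == Counter(b))  # True
print(set(a) == set(b))          # True
```

Identical counts and identical sets mean both methods score this pair as a perfect match, even though the two sentences describe opposite events.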

Text Similarity API - Cosine & Jaccard Scoring

When to use this API

Detect duplicate support tickets

Deduplicate user-generated content

Route FAQ answers in a chatbot

Grade short-answer quiz responses
