Curiosity

Diagram showing a seed ticket with arrows pointing to similar tickets and their similarity scores.

What is similarity search?

Similarity search finds items that are semantically close to a seed: a query, a document, or another node. It powers "more like this" buttons, related-item panels, and candidate generation for AI features such as retrieval-augmented generation (RAG).
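Here is a minimal sketch of the core idea. It assumes each node already has an embedding vector and uses cosine similarity as the closeness measure, which is an assumption rather than something this page specifies. The code scores every other node against the seed and keeps the top k. The function names, ticket ids, and toy 3-dimensional vectors are hypothetical. Real embeddings have hundreds of dimensions, and large collections usually rely on an approximate nearest-neighbour index instead of this brute-force scan.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def similar_to(seed_id: str, embeddings: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Brute-force top-k: score every other node against the seed, highest first."""
    seed = embeddings[seed_id]
    scored = [
        (node_id, cosine(seed, vec))
        for node_id, vec in embeddings.items()
        if node_id != seed_id  # exclude the seed itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy 3-dimensional vectors standing in for real sentence embeddings (hypothetical data).
tickets = {
    "T1": [0.9, 0.1, 0.0],
    "T2": [0.8, 0.2, 0.1],
    "T3": [0.0, 0.1, 0.9],
    "T4": [0.7, 0.3, 0.0],
}

for node_id, score in similar_to("T1", tickets, k=2):
    print(node_id, round(score, 3))
```

T2 and T4 point in roughly the same direction as T1, so they are returned. T3 points elsewhere and is left out.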

Two building blocks:

Sentence embeddings: a dense vector representation of a text field, stored per node. Nodes whose vectors lie close together (for example, with high cosine similarity) are semantically similar.

The similarity engine: a composable ranking system that combines multiple signals (embeddings, graph traversals, external lookups) into a single ranked list.
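A composable ranking system can be illustrated as a weighted sum of normalized scores. This is an illustrative assumption, not the API of the Curiosity similarity engine. Each hypothetical signal function returns `{candidate_id: score}`. Each signal is divided by its maximum so that signals on different scales become comparable. The weights 0.7 and 0.3 are arbitrary.

```python
from typing import Callable

# Hypothetical signal shape: seed id -> {candidate id: raw score}, scores assumed non-negative.
Signal = Callable[[str], dict[str, float]]

def scale_to_max(scores: dict[str, float]) -> dict[str, float]:
    """Divide by the largest score so every signal lies in [0, 1]."""
    top = max(scores.values(), default=0.0)
    if top <= 0:
        return {cand: 0.0 for cand in scores}
    return {cand: score / top for cand, score in scores.items()}

def combine(seed: str, signals: list[tuple[Signal, float]], k: int = 5) -> list[tuple[str, float]]:
    """Weighted sum of scaled signal scores; a candidate missing from a signal gets 0 for it."""
    totals: dict[str, float] = {}
    for signal, weight in signals:
        for cand, score in scale_to_max(signal(seed)).items():
            totals[cand] = totals.get(cand, 0.0) + weight * score
    totals.pop(seed, None)  # never recommend the seed itself
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:k]

def embedding_signal(seed: str) -> dict[str, float]:
    # Stand-in for cosine similarity between the seed's embedding and other nodes.
    return {"T2": 0.98, "T4": 0.96, "T3": 0.01}

def graph_signal(seed: str) -> dict[str, float]:
    # Stand-in for a graph traversal, e.g. number of neighbours shared with the seed.
    return {"T3": 2.0, "T4": 1.0}

ranked = combine("T1", [(embedding_signal, 0.7), (graph_signal, 0.3)])
for cand, score in ranked:
    print(cand, round(score, 3))
```

T4 overtakes T2 because it scores well on both signals, while T2 appears only in the embedding signal. Adding another signal, such as an external lookup, means adding one more `(function, weight)` pair.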

When to use it:

Use case	What you build
"More like this" button	Seed node → similar nodes by embedding
Related items panel	Seed node → neighbours by graph, then re-rank by embedding
Duplicate detection	Pairs with similarity > threshold
RAG candidate generation	Query text → top-k similar chunks as LLM context
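Two rows of the table can be sketched in a few lines. As before, this assumes cosine similarity over toy vectors, and every name and value is hypothetical. Duplicate detection compares every pair of nodes against a threshold. The related-items panel first restricts candidates to the seed's graph neighbours and then re-ranks them by embedding similarity.

```python
import math
from itertools import combinations

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def duplicate_pairs(embeddings: dict[str, list[float]], threshold: float) -> list[tuple[str, str, float]]:
    """Duplicate detection: every unordered pair whose similarity exceeds the threshold."""
    pairs = []
    for a, b in combinations(sorted(embeddings), 2):
        score = cosine(embeddings[a], embeddings[b])
        if score > threshold:
            pairs.append((a, b, score))
    return pairs

def related_items(seed: str, neighbours: dict[str, list[str]],
                  embeddings: dict[str, list[float]], k: int = 3) -> list[str]:
    """Related items: take the seed's graph neighbours, then re-rank them by embedding similarity."""
    candidates = neighbours.get(seed, [])
    seed_vec = embeddings[seed]
    ranked = sorted(candidates, key=lambda n: cosine(seed_vec, embeddings[n]), reverse=True)
    return ranked[:k]

tickets = {
    "T1": [0.9, 0.1, 0.0],
    "T2": [0.8, 0.2, 0.1],
    "T3": [0.0, 0.1, 0.9],
    "T4": [0.7, 0.3, 0.0],
}
# Hypothetical graph: T1 is linked to T3 and T4 (e.g. same customer), but not to T2.
graph = {"T1": ["T3", "T4"]}

print(duplicate_pairs(tickets, threshold=0.97))
print(related_items("T1", graph, tickets))
```

T2 is very similar to T1 but does not appear in the related-items result, because the graph step never proposes it. Comparing every pair is quadratic in the number of nodes. That is acceptable for a sketch, but a large collection would need an index.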

Similarity search complements keyword/hybrid search rather than replacing it. Use them together.
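Reciprocal rank fusion (RRF) is one common way to use keyword results and similarity results together. It merges ranked lists using only rank positions, so keyword scores and similarity scores never need to share a scale. This is a generic technique offered as an illustration, not necessarily how Curiosity combines the two. The constant `k=60` is the value typically used with RRF, and the hit lists are made up.

```python
def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists: each item scores sum(1 / (k + rank)) over the lists it appears in (rank starts at 1)."""
    scores: dict[str, float] = {}
    for results in result_lists:
        for rank, item in enumerate(results, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda item: scores[item], reverse=True)

keyword_hits = ["T7", "T2", "T9"]   # hypothetical keyword search results, best first
similar_hits = ["T2", "T4", "T7"]   # hypothetical similarity search results, best first

merged = reciprocal_rank_fusion([keyword_hits, similar_hits])
print(merged)
```

Items found by both searches (T2, T7) rise to the top. Items found by only one search still appear, ordered by how high they ranked in that search.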

→ Similarity engine

Next step 02-sentence-embeddings
