Curiosity - Reindexing and re-embedding

Indexes in a Curiosity Workspace stay in sync with the graph automatically: when you commit a node, its indexed fields are updated. You only need to rebuild when the recipe changes — which fields are indexed, which analyzer is used, which embedding model produces vectors. This page tells you when that's necessary and how to do it without downtime.

Mental model

Three derived artifacts can fall out of step with your configuration, and each is rebuilt separately:

Text indexes — built from the fields you list under Settings → Search → Indexes.
Vector indexes (embeddings) — built from the fields you list under Settings → AI Settings → Embeddings.
NLP enrichments — built by the entity-extraction pipelines under Settings → NLP.

You don't need to rebuild everything when one drifts. Pick the narrowest operation that fixes the issue.

When you need a rebuild

Change	What to rebuild
Added a field to Settings → Search → Indexes	Text index
Changed an analyzer or field boost	Text index
Removed an indexed field	Text index (the field disappears on the next rebuild)
Added an embedding field, or changed chunk size	Embeddings for the affected fields
Switched embedding provider or model	Embeddings for all embedded fields
Changed an NLP pipeline (added/removed an extractor)	Re-parse the affected fields
Restored from a backup taken on the same workspace version	Nothing — text and vector indexes are part of the snapshot
Restored from a backup on a different patch version	A safety rebuild is recommended
Schema migration that renamed a property	Text + vector index for that node type
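
The table can be read as a lookup from change to rebuild, where several pending changes collapse into the smallest set of operations. The sketch below illustrates that idea only: the change labels and the `Plan` helper are hypothetical names, not part of the product, and the "different patch version" row is left out because the source does not say which rebuild the safety pass should be.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical change labels mapped to the rebuilds listed in the table.
var rebuildFor = new Dictionary<string, string[]>
{
    ["indexed-field-added"]           = new[] { "text" },
    ["analyzer-or-boost-changed"]     = new[] { "text" },
    ["indexed-field-removed"]         = new[] { "text" },
    ["embedding-field-or-chunk-size"] = new[] { "embeddings" },
    ["embedding-model-switched"]      = new[] { "embeddings" },
    ["nlp-pipeline-changed"]          = new[] { "reparse" },
    ["restore-same-version"]          = Array.Empty<string>(),
    ["property-renamed"]              = new[] { "text", "embeddings" },
};

// Union of the rebuilds needed for a set of changes, without duplicates.
string[] Plan(params string[] changes) =>
    changes.SelectMany(c => rebuildFor[c]).Distinct().OrderBy(r => r).ToArray();

Console.WriteLine(string.Join(", ", Plan("analyzer-or-boost-changed", "property-renamed")));
// prints: embeddings, text
```

Two changes that both require a text rebuild still produce a single text rebuild, which is the "narrowest operation" rule in practice.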

Triggering a rebuild

From the UI, Settings → Maintenance exposes three actions:

Rebuild text indexes
Rebuild embeddings
Re-parse content

Each runs as a background task you can watch under Settings → Tasks. Rebuilds are non-blocking: existing search and AI queries continue to use the previous index until the new one is ready, then the engine swaps atomically.
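
To make the "build alongside, then swap" behavior concrete, here is a conceptual sketch of the pattern. It is not the engine's implementation: the dictionary standing in for an index and the `Search` helper are assumptions made for illustration. The point is that readers only ever see a complete index, either the old one or the new one.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Stand-in for the index currently serving queries: term -> matching node IDs.
var live = new Dictionary<string, string[]>
{
    ["printer"] = new[] { "T-1" },
};

// Queries read whichever index is live at the moment they run.
string[] Search(string term) =>
    Volatile.Read(ref live).TryGetValue(term, out var hits) ? hits : Array.Empty<string>();

// The rebuild produces a complete new index off to the side.
var rebuilt = new Dictionary<string, string[]>
{
    ["printer"] = new[] { "T-1" },
    ["vpn"]     = new[] { "T-2" },
};

Console.WriteLine(Search("vpn").Length); // 0: still served by the old index

// One atomic reference swap publishes the new index; the old one is returned.
var previous = Interlocked.Exchange(ref live, rebuilt);

Console.WriteLine(Search("vpn").Length); // 1: served by the new index
```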

For automation, schedule the same operations from Settings → Scheduled Tasks with task types Reindex, RebuildEmbeddings, and Reparse. See Scheduled Tasks.

Capacity planning

Rebuilds are CPU- and disk-bound on the workspace side, and each type has its own dominant cost:

A text rebuild is dominated by parsing — typically minutes for hundreds of thousands of nodes, hours for tens of millions.
An embedding rebuild is dominated by the embedding provider's throughput and your rate limit. Budget the token cost ahead of time — re-embedding a large corpus is the most expensive operation you can trigger.
A re-parse is dominated by your NLP pipeline complexity. Dictionary spotters are fast; LLM-driven extraction is slow and expensive.

To estimate cost ahead of a model switch:

// Total characters that will be embedded under the new model.
// The cast to long keeps Sum from overflowing int on a large corpus.
return Q().StartAt(nameof(Ticket))
          .AsEnumerable()
          .Sum(n => (long)(n.GetString(nameof(Ticket.Body)) ?? "").Length);

Divide the total by the number of characters per token (roughly 4 for English) to get a token count, then multiply by the model's per-token price.
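
As a worked example of that arithmetic, the sketch below turns a character count into a token count, a price, and a lower bound on wall-clock time. Every figure is a placeholder: the price and the rate limit are hypothetical, and the price is expressed per million tokens only because that is how providers commonly quote it.

```csharp
using System;

// Placeholder inputs; substitute your own measurements and provider terms.
long totalChars = 1_200_000_000;        // result of the character-count query
double charsPerToken = 4.0;             // rough ratio for English text
decimal pricePerMillionTokens = 0.10m;  // hypothetical price
double tokensPerMinute = 1_000_000;     // hypothetical provider rate limit

double tokens = totalChars / charsPerToken;
decimal cost = (decimal)tokens / 1_000_000m * pricePerMillionTokens;
double hours = tokens / tokensPerMinute / 60.0;

Console.WriteLine($"{tokens:F0} tokens, cost {cost:F2}, at least {hours:F1} hours");
```

With these placeholder numbers the estimate is 300 million tokens, a cost of 30.00 in the provider's currency, and at least 5 hours if the rate limit is the only constraint.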

Avoiding downtime

The atomic swap behavior means most rebuilds are invisible to users. To make absolutely sure:

Schedule the rebuild during off-hours.
Watch Settings → Monitoring while it runs; query latency should stay flat.
Confirm the result with a golden query set after the rebuild (see Search Optimization). If precision regresses, revert the configuration change and trigger another rebuild.
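
One way to turn the golden-set check into a pass/fail signal is to compare mean precision@k before and after the rebuild. The sketch below assumes you have already captured the top results for each golden query by whatever means you normally query search. The queries, node IDs, and tolerance are made-up placeholders.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical golden set: query text -> node IDs a reviewer marked relevant.
var golden = new Dictionary<string, HashSet<string>>
{
    ["reset password"] = new HashSet<string> { "T-1", "T-7" },
    ["invoice export"] = new HashSet<string> { "T-3" },
};

// Top-3 results captured before and after the rebuild (placeholder data).
var before = new Dictionary<string, string[]>
{
    ["reset password"] = new[] { "T-1", "T-7", "T-9" },
    ["invoice export"] = new[] { "T-3", "T-4", "T-5" },
};
var after = new Dictionary<string, string[]>
{
    ["reset password"] = new[] { "T-1", "T-9", "T-2" },
    ["invoice export"] = new[] { "T-3", "T-4", "T-5" },
};

// Mean precision@k: share of the top k results that appear in the golden set.
double PrecisionAtK(Dictionary<string, string[]> results, int k) =>
    golden.Average(g => results[g.Key].Take(k).Count(id => g.Value.Contains(id)) / (double)k);

const double tolerance = 0.01; // ignore differences smaller than this
bool regressed = PrecisionAtK(after, 3) < PrecisionAtK(before, 3) - tolerance;
Console.WriteLine(regressed); // True
```

Here the first query lost one relevant hit from its top three, so mean precision@3 drops from 0.5 to about 0.33 and the check flags a regression.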

Partial rebuilds during a schema migration

When you rename or retype a property, only nodes of the affected node type need their text and vector entries refreshed, so a full reindex is unnecessary.

Apply the schema change in your connector and re-run it for the affected type — the upserts touch the changed nodes only.
Trigger Rebuild text index for <Type> (drop-down available on the Maintenance page).
Trigger Rebuild embeddings for <Type> if the renamed field was embedded.

If a property's type changed (e.g., string → number), the old values may be unsearchable until you backfill them in the new shape. Use a one-off scheduled task to normalize them.
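
The normalization step for a string → number change could look like the sketch below. `NormalizeNumber` is a hypothetical helper, not a workspace API, and reading and writing the property on each node is left out because it depends on your schema. Values that cannot be parsed come back as null so the task can log them instead of writing a bad number.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: coerce a legacy value into the new numeric shape.
static double? NormalizeNumber(object raw)
{
    switch (raw)
    {
        case double d: return d;
        case int i: return i;
        case long l: return l;
        // Parse with the invariant culture so "3.5" means the same on every server.
        case string s when double.TryParse(
                 s.Trim(),
                 NumberStyles.Float | NumberStyles.AllowThousands,
                 CultureInfo.InvariantCulture,
                 out var v) && double.IsFinite(v):
            return v;
        default: return null; // unparseable: report it rather than guess
    }
}

Console.WriteLine(NormalizeNumber("42").HasValue);  // True
Console.WriteLine(NormalizeNumber("n/a").HasValue); // False
```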

Rollback

A rebuild starts a new index alongside the old one. If you change your mind mid-rebuild, Settings → Tasks → Cancel stops it and discards the partial new index, and the previous index keeps serving traffic. After completion, you can revert the configuration change and trigger another rebuild. The previous index is not retained after a successful swap, so reverting costs a full rebuild; do it promptly if you need to.

Connector implications

Re-running a connector after a rebuild is safe: idempotent upserts touch only changed nodes. There is no need to "warm" the new index by replaying ingestion.
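
The idempotency claim can be illustrated with a toy model. In the sketch below, the in-memory dictionary stands in for the graph and `Upsert` is a hypothetical function, not the connector API. A write happens only when the incoming value differs from what is stored, so replaying an unchanged batch touches nothing.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for the graph: node key -> body text.
var store = new Dictionary<string, string>();

// Returns true only when the node was created or its content changed.
bool Upsert(string key, string body)
{
    if (store.TryGetValue(key, out var existing) && existing == body) return false;
    store[key] = body;
    return true;
}

// Runs a batch and counts how many nodes were actually touched.
int Run(IEnumerable<(string Key, string Body)> batch)
{
    int touched = 0;
    foreach (var (key, body) in batch)
        if (Upsert(key, body)) touched++;
    return touched;
}

var batch = new[] { (Key: "T-1", Body: "Printer offline"), (Key: "T-2", Body: "VPN drops") };

int firstRun = Run(batch);   // both nodes are new
int secondRun = Run(batch);  // identical replay
int thirdRun = Run(new[] { (Key: "T-1", Body: "Printer offline"), (Key: "T-2", Body: "VPN fixed") });

Console.WriteLine($"{firstRun} {secondRun} {thirdRun}"); // 2 0 1
```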