Reindexing and re-embedding
Indexes in a Curiosity Workspace stay in sync with the graph automatically: when you commit a node, its indexed fields are updated. You only need to rebuild when the recipe changes — which fields are indexed, which analyzer is used, which embedding model produces vectors. This page tells you when that's necessary and how to do it without downtime.
Mental model
Three things can get out of sync with the graph, and each is rebuilt separately:
- Text indexes — built from the fields you list under Settings → Search → Indexes.
- Vector indexes (embeddings) — built from the fields you list under Settings → AI Settings → Embeddings.
- NLP enrichments — built by the entity-extraction pipelines under Settings → NLP.
You don't need to rebuild everything when one drifts. Pick the narrowest operation that fixes the issue.
When you need a rebuild
| Change | What to rebuild |
|---|---|
| Added a field to Settings → Search → Indexes | Text index |
| Changed an analyzer or field boost | Text index |
| Removed an indexed field | Text index (the field disappears on the next rebuild) |
| Added an embedding field, or changed chunk size | Embeddings for the affected fields |
| Switched embedding provider or model | Embeddings for all embedded fields |
| Changed an NLP pipeline (added/removed an extractor) | Re-parse the affected fields |
| Restored from a backup taken on the same workspace version | Nothing — text and vector indexes are part of the snapshot |
| Restored from a backup on a different patch version | A safety rebuild is recommended |
| Schema migration that renamed a property | Text + vector index for that node type |
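For automation scripts, the table above can be encoded as a small lookup. This is only a sketch: the change labels are hypothetical names invented for illustration, and the values assume that the scheduled task types Reindex, RebuildEmbeddings, and Reparse correspond to the text, embedding, and re-parse rebuilds. The "different patch version" restore row is left out because the table does not say which rebuild to run.

```csharp
using System;
using System.Collections.Generic;

// Keys are hypothetical labels for the rows of the table above.
// Values are scheduled task type names, assumed to map onto the three rebuilds.
var operationsByChange = new Dictionary<string, string[]>
{
    ["indexed-field-added"]        = new[] { "Reindex" },
    ["analyzer-or-boost-changed"]  = new[] { "Reindex" },
    ["indexed-field-removed"]      = new[] { "Reindex" },
    ["embedding-field-or-chunking"] = new[] { "RebuildEmbeddings" },
    ["embedding-model-switched"]   = new[] { "RebuildEmbeddings" },
    ["nlp-pipeline-changed"]       = new[] { "Reparse" },
    ["restore-same-version"]       = Array.Empty<string>(),
    ["property-renamed"]           = new[] { "Reindex", "RebuildEmbeddings" },
};

// Returns the narrowest set of rebuilds for a change, or throws for an unknown label.
string[] OperationsFor(string change) =>
    operationsByChange.TryGetValue(change, out var ops)
        ? ops
        : throw new ArgumentException($"Unknown change: {change}", nameof(change));

Console.WriteLine(string.Join(", ", OperationsFor("property-renamed")));
```

Keeping the mapping in one place makes it harder to fall back on a full rebuild when a narrower one would do.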
Triggering a rebuild
From the UI, Settings → Maintenance exposes three actions:
- Rebuild text indexes
- Rebuild embeddings
- Re-parse content
Each runs as a background task you can watch under Settings → Tasks. Rebuilds are non-blocking: existing search and AI queries continue to use the previous index until the new one is ready, then the engine swaps atomically.
For automation, schedule the same operations from Settings → Scheduled Tasks with task types Reindex, RebuildEmbeddings, and Reparse. See Scheduled Tasks.
Capacity planning
Rebuilds are CPU- and disk-bound on the workspace itself, and each type has its own dominant cost:
- A text rebuild is dominated by parsing — typically minutes for hundreds of thousands of nodes, hours for tens of millions.
- An embedding rebuild is dominated by the embedding provider's throughput and your rate limit. Budget the token cost ahead of time — re-embedding a large corpus is the most expensive operation you can trigger.
- A re-parse is dominated by your NLP pipeline complexity. Dictionary spotters are fast; LLM-driven extraction is slow and expensive.
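Because an embedding rebuild is bounded by the provider's rate limit, you can derive a rough lower bound on its duration from the token count. The sketch below assumes a tokens-per-minute limit; the helper name and the numbers are illustrative and not part of any Curiosity API.

```csharp
using System;

// Hypothetical helper: best-case duration if the rate limit is fully used the whole time.
static TimeSpan EstimateEmbeddingDuration(long totalTokens, long tokensPerMinuteLimit)
{
    if (tokensPerMinuteLimit <= 0)
        throw new ArgumentOutOfRangeException(nameof(tokensPerMinuteLimit));
    return TimeSpan.FromMinutes((double)totalTokens / tokensPerMinuteLimit);
}

// Example: 500 million tokens at an assumed limit of 1 million tokens per minute.
TimeSpan lowerBound = EstimateEmbeddingDuration(500_000_000L, 1_000_000L);
Console.WriteLine($"At least {lowerBound.TotalHours:F1} hours");
```

Real runs take longer than this bound, since retries, batching overhead, and other traffic sharing the same limit all reduce effective throughput.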
To estimate cost ahead of a model switch:
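// Total characters that will be embedded under the new model.
// The cast to long matters: summing int lengths throws OverflowException past ~2.1 billion characters.
return Q().StartAt(nameof(Ticket))
    .AsEnumerable()
    .Sum(n => (long)(n.GetString(nameof(Ticket.Body)) ?? "").Length);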
Divide the character count by the characters-per-token ratio (~4 for English text) to approximate the token count, then multiply by the model's per-token price.
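As a worked version of that arithmetic, here is a minimal sketch. The helper name, the 4-characters-per-token ratio, and the price are all placeholders; substitute the figures your provider publishes.

```csharp
using System;

// Hypothetical helper: converts a character count into an estimated embedding cost.
// charsPerToken and pricePerMillionTokens are assumptions supplied by the caller.
static decimal EstimateEmbeddingCost(long totalChars, decimal charsPerToken, decimal pricePerMillionTokens)
{
    if (charsPerToken <= 0)
        throw new ArgumentOutOfRangeException(nameof(charsPerToken));
    decimal estimatedTokens = totalChars / charsPerToken;
    return estimatedTokens / 1_000_000m * pricePerMillionTokens;
}

// Example: 2 billion characters is about 500 million tokens at ~4 characters per token.
// At a placeholder price of 0.10 per million tokens, that comes to 50.
decimal cost = EstimateEmbeddingCost(2_000_000_000L, 4m, 0.10m);
Console.WriteLine($"Estimated cost: {cost:F2}");
```

Treat the result as an order-of-magnitude figure: tokenizers vary by model, and non-English or code-heavy text usually needs more tokens per character.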
Avoiding downtime
The atomic swap behavior means most rebuilds are invisible to users. To make absolutely sure:
- Schedule the rebuild during off-hours.
- Watch Settings → Monitoring while it runs; query latency should stay flat.
- Confirm post-rebuild with a golden query set (see Search Optimization) — if precision regresses, roll back the configuration change before the next rebuild.
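One way to make the golden-query check concrete is to compare precision at k before and after the rebuild. The sketch below is self-contained and does not call any search API: the ranked result IDs and the relevant set are made-up sample data that you would replace with the output of your own queries.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Fraction of the top-k results that appear in the known-relevant set.
static double PrecisionAtK(IReadOnlyList<string> rankedIds, ISet<string> relevantIds, int k)
{
    if (k <= 0) throw new ArgumentOutOfRangeException(nameof(k));
    int hits = rankedIds.Take(k).Count(id => relevantIds.Contains(id));
    return (double)hits / k;
}

// Sample data for one golden query (hypothetical IDs).
var relevant = new HashSet<string> { "T-1", "T-2", "T-3" };
var before = new[] { "T-1", "T-2", "T-9", "T-3", "T-8" };
var after  = new[] { "T-9", "T-1", "T-8", "T-7", "T-2" };

double precisionBefore = PrecisionAtK(before, relevant, 3); // 2 of the top 3 are relevant
double precisionAfter  = PrecisionAtK(after, relevant, 3);  // 1 of the top 3 is relevant

Console.WriteLine(precisionAfter < precisionBefore
    ? "Precision regressed: consider rolling back the configuration change."
    : "Precision held.");
```

Averaging this over the whole golden set, rather than judging a single query, gives a steadier signal.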
Partial rebuilds during a schema migration
When you rename or retype a property, only nodes of that type need their text and vector entries refreshed. You don't need a full reindex.
- Apply the schema change in your connector and re-run it for the affected type — the upserts touch the changed nodes only.
- Trigger Rebuild text index for <Type> (drop-down available on the Maintenance page).
- Trigger Rebuild embeddings for <Type> if the renamed field was embedded.
If a property's type changed (e.g., string → number), values stored under the old type may be unsearchable until you backfill them in the new type. Use a one-off scheduled task to normalize them.
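The core of such a normalization task is a tolerant conversion from the old string values to the new numeric type. The sketch below shows only that conversion, assuming invariant-culture number formatting. The helper name is hypothetical, and reading and writing the node values is left out because it depends on your connector and schema.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: returns the parsed number, or null when the old value
// cannot be converted and needs manual review.
static double? NormalizeToNumber(string raw)
{
    if (string.IsNullOrWhiteSpace(raw)) return null;
    return double.TryParse(raw.Trim(), NumberStyles.Float, CultureInfo.InvariantCulture, out var value)
        ? value
        : (double?)null;
}

foreach (var raw in new[] { "42", " 3.5 ", "n/a" })
{
    double? parsed = NormalizeToNumber(raw);
    Console.WriteLine(parsed.HasValue ? $"'{raw}' converts" : $"'{raw}' needs review");
}
```

Logging the values that fail to convert, instead of dropping them silently, keeps the backfill auditable.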
Rollback
A rebuild builds a new index alongside the old one. If you change your mind mid-rebuild, Settings → Tasks → Cancel stops it and discards the partial new index; the previous index keeps serving traffic. After completion, you can still revert the configuration change and trigger another rebuild, but the previous index data is not retained after a successful swap, so a rollback costs a full rebuild. Run your post-rebuild checks promptly so you can decide early.
Connector implications
Re-running a connector after a rebuild is safe: idempotent upserts touch only changed nodes. There is no need to "warm" the new index by replaying ingestion.
See also
- Search Model — what gets indexed and why.
- Vector Search — embedding-specific design.
- Upgrades and migrations — when a Workspace version bump forces a rebuild.
- Backup and restore — the case where the snapshot already includes indexes.