Curiosity

Migration: legacy enterprise search

If you're moving from a traditional enterprise-search platform (Coveo, Algolia for enterprise, Endeca/Oracle, Sinequa, Mindbreeze, GSA, FAST/SharePoint Search, …), this page maps the patterns onto Curiosity Workspace.

What maps onto what

Legacy enterprise search	Curiosity Workspace
Crawler / source connector	Connector using `Curiosity.Library`
Out-of-the-box source plugins (Confluence, SharePoint, Box, etc.)	Built-in integrations under Settings → Integrations, plus custom connectors for what isn't covered
Document-level ACL ingestion ("crawl-time security")	`RestrictAccessToTeam`, `RestrictAccessToUser`, ReBAC
Late-binding security (filter at query time using user's groups)	Built-in — `CreateSearchAsUserAsync` applies the user's ACL filter at query time
Tuning UI for relevance	Search Optimization, boosts per indexed field, hybrid retrieval
Synonyms, dictionaries, stop words	NLP pipelines, language-specific analyzers
Faceted search	Property facets and graph-relationship facets — the latter is the differentiator
Connectors marketplace	Smaller built-in set + custom connectors via the SDK
Query-rewriting layer	A custom endpoint in front of search
Personalized re-ranking	Custom endpoint that re-ranks results using user-context signals from the graph
Multilingual analyzer packs	Built-in internationalization
Search UI widgets	Tesserae components in a custom front-end
Analytics / query log dashboards	Workspace monitoring + your own custom analytics endpoint

What you gain

Knowledge graph alongside search. Most legacy platforms have no concept of entities and relationships. Curiosity gives you both, with consistent permissions across them.
Built-in AI. No need to bolt on a separate LLM gateway, vector DB, or RAG framework. Embeddings, chat, and citations are part of the platform.
One license, one image, one deployment. Legacy enterprise search often comes with separately licensed crawlers, indexes, analyzers, and admin UIs.
Modern auth. Native OIDC / Entra ID / Okta / Auth0 / SAML. No proprietary identity adapters.

What changes

Crawler vs connector mindset. Legacy crawlers are configured to "discover" content; Curiosity connectors are explicit programs that map source records into typed nodes and edges. This is more code but produces a cleaner data model.
ACL model. Most legacy platforms store flat ACLs on each document. Curiosity uses a graph-based ReBAC model — user → team → owns → resource. Migration usually means mapping source-system groups onto Workspace teams.
Configuration as code. Legacy platforms expose configuration through admin UIs that are hard to version. Curiosity's schema, endpoints, and tools are C# you commit to git.
No proprietary query language. No XQuery, no Coveo query syntax, no Endeca rules. Curiosity exposes a small set of built-in operators and lets you implement complex behaviors as endpoints.
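
To make the connector mindset concrete, here is a minimal, SDK-independent sketch in Python of mapping one source record into typed nodes and an edge. It does not use `Curiosity.Library`; the record fields, the node types (`Document`, `Author`), and the edge name (`AuthoredBy`) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_type: str   # hypothetical type name, e.g. "Document"
    key: str         # stable key taken from the source system
    props: dict = field(default_factory=dict)


@dataclass
class Edge:
    edge_type: str   # hypothetical edge name, e.g. "AuthoredBy"
    source_key: str
    target_key: str


def map_record(record: dict) -> tuple[list[Node], list[Edge]]:
    """Turn one flat source record into typed nodes and the edge between them."""
    doc = Node("Document", record["id"],
               {"title": record["title"], "body": record["body"]})
    # Normalize the email so the same author always maps to the same key.
    author = Node("Author", record["author_email"].lower())
    edge = Edge("AuthoredBy", doc.key, author.key)
    return [doc, author], [edge]


nodes, edges = map_record({
    "id": "wiki-42",
    "title": "Onboarding guide",
    "body": "Welcome to the team.",
    "author_email": "Ana@Example.com",
})
```

The point of the explicit mapping is that every node type, key, and relationship is a decision written down in code, rather than something a crawler inferred.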

A practical migration plan

Catalog your current sources. Which connectors are in use? Which are critical, which are optional?
Stand up Workspace in parallel (see Installation).
Map identities first. Configure SSO and the group → team mapping. Permissions must work before content goes in.
Migrate one source end-to-end. Pick the highest-value source (usually the one with the most user queries). Build the connector, configure search and ACLs, validate with real users.
Run both stacks in parallel. A/B for at least a week on a real query log; compare precision, latency, and user behaviour.
Add the AI layer. This is what was missing from the legacy platform. Build the first chat endpoint and watch user behaviour shift toward it.
Migrate the remaining sources one at a time. Retire the legacy platform when the last source is cut over.
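
For the parallel-run step, one possible way to compare the two stacks is to replay the same query log against both and compute precision and latency per stack. The sketch below assumes a hypothetical data layout: each run is a dict with the ranked result IDs, a set of IDs judged relevant, and the measured latency. How you collect those runs from either platform is out of scope here.

```python
from statistics import median


def precision_at_k(results: list[str], relevant: set[str], k: int = 5) -> float:
    """Fraction of the top-k slots filled with a relevant result."""
    top = results[:k]
    return sum(1 for doc_id in top if doc_id in relevant) / k


def summarize(runs: list[dict], k: int = 5) -> dict:
    """Mean precision@k and median latency for one stack's runs."""
    precisions = [precision_at_k(r["results"], r["relevant"], k) for r in runs]
    return {
        "precision_at_k": sum(precisions) / len(precisions),
        "median_latency_ms": median(r["latency_ms"] for r in runs),
    }


# Same two queries replayed against each stack (made-up illustration data).
legacy_runs = [
    {"results": ["a", "b", "c"], "relevant": {"a", "c"}, "latency_ms": 120},
    {"results": ["d", "e", "f"], "relevant": {"f"}, "latency_ms": 200},
]
workspace_runs = [
    {"results": ["a", "c", "b"], "relevant": {"a", "c"}, "latency_ms": 90},
    {"results": ["f", "d", "e"], "relevant": {"f"}, "latency_ms": 110},
]

legacy_summary = summarize(legacy_runs, k=2)
workspace_summary = summarize(workspace_runs, k=2)
```

Median latency is used rather than the mean so that a handful of slow outlier queries does not dominate the comparison.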

Migrating ACLs specifically

Document-level ACLs are the most error-prone part of a search migration. Two patterns work well:

Group → team mapping. For most platforms, map each source-system group to a Workspace `_AccessGroup`. The connector mirrors group membership and calls `RestrictAccessToTeam(doc, group)`.
Per-document override. When a document has a unique ACL (a one-off share), model it with a per-user restriction. Use sparingly; teams scale better.
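
The two patterns can be sketched as a small planning function. This is an illustration in Python, not SDK code: it only decides which restrictions a document needs, and a real connector would translate each entry into the corresponding `RestrictAccessToTeam` or `RestrictAccessToUser` call. The ACL input format and the group-to-team name mapping are hypothetical.

```python
def plan_restrictions(acl: dict, group_to_team: dict[str, str]) -> list[tuple[str, str]]:
    """Return ("team", name) and ("user", name) restrictions for one document.

    Raises KeyError for a source group with no mapped team, so an unmapped
    group fails loudly instead of silently dropping a restriction.
    """
    plan = [("team", group_to_team[g]) for g in sorted(acl.get("groups", []))]
    # Per-user entries are the one-off overrides; keep them after the teams.
    plan += [("user", u) for u in sorted(acl.get("users", []))]
    return plan


# Hypothetical mapping from source-system group names to Workspace team names.
group_to_team = {"CN=Finance": "Finance", "CN=Legal": "Legal"}

plan = plan_restrictions(
    {"groups": ["CN=Legal", "CN=Finance"], "users": ["ana@example.com"]},
    group_to_team,
)
```

Separating the plan from the calls that apply it makes the mapping easy to unit-test against exported source ACLs before any content is ingested.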

Test with a non-admin account before declaring the migration done. The most common bug is the connector running as system context and accidentally making everything visible to everyone.
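
The non-admin test can be partly automated by comparing what a test user actually sees with what the source ACLs say they should see. In this sketch both inputs are plain sets of document IDs; how you obtain `visible` (for example, from a search run as that user) is left out, and the names are hypothetical.

```python
def audit_visibility(expected: set[str], visible: set[str]) -> dict[str, set[str]]:
    """Compare a test user's expected and actual visible document IDs."""
    return {
        "leaked": visible - expected,    # user sees documents they should not
        "missing": expected - visible,   # user is denied documents they should see
    }


report = audit_visibility(
    expected={"doc-1", "doc-2"},
    visible={"doc-1", "doc-3"},
)
```

A non-empty `leaked` set is the symptom of the system-context bug described above; a non-empty `missing` set usually points at an incomplete group mapping.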

Common surprises

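Search "loses" results. Almost always a permission bug or an analyzer mismatch. Sign in as admin to bisect: if the result appears for the admin, the cause is permissions; if it is missing for the admin too, look at the analyzer.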
Result counts differ. Legacy platforms often counted duplicates from multiple sources. Curiosity dedupes by stable key.
Boost values don't transfer. Different scoring math. Re-tune against your evaluation set.
Crawl-time vs query-time security drift. Legacy stacks sometimes index ACLs and forget to refresh them. Curiosity ReBAC is graph-based, so a membership change is reflected on the next user request.
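
The drift in the last point can be illustrated with a toy model: an ACL snapshot flattened at index time goes stale when membership changes, while a check that walks user → team → resource on each request does not. This is an in-memory illustration with hypothetical names, not how Curiosity implements ReBAC.

```python
# Toy graph: which teams a user belongs to, and which teams own a resource.
memberships = {"ana": {"finance"}}
owners = {"q3-report": {"finance"}}


def can_read(user: str, resource: str) -> bool:
    """Query-time check: walk user -> team -> resource on every request."""
    return bool(memberships.get(user, set()) & owners.get(resource, set()))


# Crawl-time style: flatten the allowed users once, when the document is indexed.
snapshot = {
    resource: {user for user, teams in memberships.items() if teams & owning}
    for resource, owning in owners.items()
}

# Ana leaves the finance team after indexing.
memberships["ana"].discard("finance")

stale_allows = "ana" in snapshot["q3-report"]   # snapshot still lists her
live_allows = can_read("ana", "q3-report")      # graph walk sees the change
```

Until the snapshot is rebuilt, the flattened ACL keeps granting access that the live membership data no longer supports.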

See also

Architecture overview.
Access Control Model.
From Elasticsearch + vector DB + LangChain — the sibling migration page for teams coming from a stitched modern stack.
