Curiosity

Schema Design

Schema design is the most important step in building a Curiosity Workspace application. It determines:

  • how users navigate and explore data
  • how search is scoped and filtered
  • how AI features can ground and enrich results
  • how connectors keep data consistent over time

This page covers the design rules, then walks through three concrete enterprise schemas you can adapt: customer support, manufacturing/engineering, and a compliance knowledge base.

The three building blocks

A graph schema in Curiosity has three layers:

  • Nodes (entities) — the "things" in your domain: Customer, Ticket, Document, Device, Policy. Each node type is a C# class with a [Key] and zero or more [Property] fields.
  • Edges (relationships) — typed links between nodes: HasTicket, Mentions, ReportsTo. Edges are first-class — you traverse, facet, and search by them.
  • Properties (attributes) — scalar fields for display, filtering, sorting, and search/embedding input.

Start from user journeys

Before writing your first schema, answer these five questions:

  1. What are the top 5 questions users ask?
  2. What workflows do they execute?
  3. Which objects do they search for first?
  4. What do they click on next?
  5. Where do they need to navigate from there?

Those answers map directly to:

  • the primary node types,
  • the edges between them,
  • the filters and facets you must support.

If a question can't be answered with the schema you're drafting, the schema is wrong.

Keys: pick stable identity early

For each node type, define a stable key:

  • Prefer source IDs. They're stable, unique, and recognized by your users.
  • Synthetic keys are OK if you control allocation and store the mapping.
  • Deterministic hashes (canonicalize → hash) are acceptable when source IDs don't exist, but they're brittle under schema change — any tweak to the canonicalization recreates everything.
  • Avoid random IDs unless you never need to re-run ingestion against the same source.

The single biggest ingestion bug is unstable keys. If your connector produces duplicates, the key is the first thing to inspect.

Node vs property: a decision rule

Use a property when:

  • the value is only displayed or filtered on the current node;
  • you do not need to navigate to it as an entity;
  • the value has no metadata of its own.

Use a node + edge when:

  • you need cross-cutting filters (e.g., status across multiple types);
  • you need navigation ("show all tickets for this customer");
  • the value should have its own metadata (description, owner, lifecycle);
  • the value is shared across many records (avoid duplicating spelling variants).

A quick smell test: if you find yourself writing Where(n => n.GetString("Status") == "Open") everywhere, Status probably wants to be a node — and then you can do .In("HasStatus") against the open-status node and get a free facet.

Edges should read like sentences

Name edges so traversals read naturally:

  • Customer ─HasTicket─▶ Ticket ─ForProduct─▶ Product — reads like English.
  • Maintain bidirectional names when readability improves; the graph engine treats them symmetrically.
public static class Edges
{
    public const string HasTicket  = nameof(HasTicket);
    public const string TicketOf   = nameof(TicketOf);
    public const string ForProduct = nameof(ForProduct);
    public const string HasStatus  = nameof(HasStatus);
}

graph.Link(customer, ticket, Edges.HasTicket, Edges.TicketOf);

Worked example 1 — customer support

flowchart LR Customer -->|HasTicket| Ticket Ticket -->|ForProduct| Product Ticket -->|HasStatus| Status Ticket -->|HasMessage| Message Message -->|MentionsEntity| Entity Product -->|MadeBy| Manufacturer
Node Key Properties Indexed for search?
Customer Id Name, Tier, Region name only
Product Sku Name, Category name
Manufacturer Name Country name
Status Code Label, IsOpen label
Ticket Id Subject, Body, CreatedAt, Priority subject + body (text + vector)
Message Id Author, Text, SentAt text (vector if long)
Entity Name Kind name

Why this shape:

  • Hub entities (Customer, Product, Ticket) are what users search for; everything else is reachable from them.
  • Status as a node gives a stable facet across types and metadata about each status (e.g., "open" vs "resolved").
  • Messages are separate so they can be embedded individually (chunked) for better semantic retrieval.
  • Entity is filled in by NLP extraction; it lets users facet by mentioned product/component names.

Worked example 2 — manufacturing / engineering docs

flowchart LR Document -->|DescribesPart| Part Document -->|RevisedFrom| Document Part -->|PartOf| Assembly Assembly -->|InProduct| Product Document -->|AuthoredBy| Engineer Engineer -->|MemberOf| Team
Node Key Notes
Document DocNumber Versioned via RevisedFrom edge to predecessor.
Part PartNumber Canonical part.
Assembly AssemblyNumber Sub-assemblies modeled the same way (recursive).
Product Sku The shippable product.
Engineer Email Mapped from SSO.
Team Name SSO-mapped team for ACL.

The RevisedFrom edge models history without duplicating the schema. To answer "show me revisions of doc D-12345", traverse Document {DocNumber: D-12345} → In(RevisedFrom) until no more predecessors.

Worked example 3 — compliance knowledge base

flowchart LR Regulation -->|Cites| Regulation Policy -->|ImplementsRegulation| Regulation Control -->|UnderPolicy| Policy Control -->|TestedBy| Audit Audit -->|EvidenceFor| Document Document -->|OwnedBy| Team
Node Key Notes
Regulation Code E.g., SOC2-CC6.1, GDPR-Art32.
Policy Id Internal policy ID.
Control Id The implementation of a policy.
Audit Id A scheduled/completed audit cycle.
Document Id Evidence: PDFs, screenshots, signed attestations.
Team Name Who owns evidence; drives ReBAC.

This schema is small but powerful: from a regulation node, traverse outward to find every policy → control → audit → evidence in three hops.

Indexing decisions follow the schema

Once the shape is right, the search configuration writes itself:

  • Index titles, identifiers, and names as text.
  • Index long descriptive fields as embeddings with chunking.
  • Index status, priority, category, region as property facets.
  • Add related facets for the hub entities users will filter by (Customer, Product, Team).

See Search Optimization and Vector Search.

Schema evolution

Real schemas evolve. Plan for it:

  • Add a property — safe; existing nodes get null; backfill in a scheduled task if needed.
  • Add a new node or edge type — safe; register schema, ingest going forward.
  • Rename a property — non-trivial; add the new property, dual-write during a deprecation window, migrate readers, remove old property.
  • Change a key's type or domain — treat as a new node type; migrate, retire the old one.
  • Remove a property — drop it from the schema and rebuild the search index.

See Reindexing and re-embedding for the rebuild side, and Upgrades and migrations for the operational shape.

Common anti-patterns

  • Properties pretending to be nodes — repeating spelling variants in a Tag property when the user navigates by tag. Make Tag a node.
  • Edges pretending to be properties — storing a comma-separated list of related items in a property. Make them edges.
  • Over-normalizing — turning every value (every word, every author name) into a node. Keep it to things users navigate or facet.
  • Missing time — no [Timestamp] on event-like nodes means no recency sort and broken time facets.
  • Composite keys — keys that combine source ID + version + region. Pick one identity; model the others as edges or properties.

Where to go next

© 2026 Curiosity. All rights reserved.
Powered by Neko