RAG and agent architecture

How Curiosity Workspace structures Retrieval-Augmented Generation, AI tools, and agent-style workflows so they're grounded, permission-aware, citable, and auditable.

The shape we recommend

flowchart LR
    U[User] --> Chat[Chat view]
    Chat --> GW[Gateway]
    GW --> Orchestrator["Chat orchestrator"]
    Orchestrator -->|prompt + tool defs| LLM["LLM<br/>(OpenAI / Anthropic /<br/>Azure / local)"]
    LLM -->|tool calls| Tools["AI tools (C#)"]
    Tools -->|CreateSearchAsUserAsync| Search["Search engine<br/>(ACL-filtered)"]
    Tools -->|Q().StartAt(...)| Graph["Graph engine<br/>(ACL-filtered)"]
    Tools -->|AddSnippet| Snippets["Citation registry"]
    Tools --> LLM
    LLM --> Orchestrator
    Orchestrator -->|answer + citations| Chat
    Orchestrator -->|audit| AuditLog["Audit log"]

Three things happen on every turn:

  1. The orchestrator gives the LLM only the tools the user is allowed to call; the model gets no raw graph or search access.
  2. Tools that retrieve data do it through the user's security context, so the LLM never sees content the user can't see.
  3. Every retrieved chunk is registered as a snippet, and the LLM is instructed to cite it with a bracketed ID — [1], [2] — that becomes a clickable link in the UI.

The hard constraint: the LLM never decides what the user is allowed to read. The graph and search engines decide; the LLM gets a filtered view.
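
Concretely, every retrieval a tool performs goes through the user-scoped search API. A minimal sketch, reusing the SearchRequest and ToolScope calls from the tool example below (userQuestion is a placeholder):

// Retrieval inside a tool runs as the chat user, never as the system;
// the search engine applies ACL filtering before any text reaches the model.
var search = SearchRequest.For(userQuestion);
var q = await scope.Graph.CreateSearchAsUserAsync(search, scope.CurrentUser, scope.CancellationToken);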

Building blocks

Custom endpoints

Server-side C# that wraps a specific retrieval or action. Endpoints are the building blocks for everything else — both AI tools and external integrations typically call into endpoint code (or share helpers with it). See Custom Endpoints.

AI tools

Annotated C# classes the LLM can invoke. Each method decorated with [Tool] becomes a callable action; each [Parameter] becomes a typed argument the LLM fills in.

public class TicketTools
{
    [Tool("Search the support-ticket knowledge base for tickets similar to the user's question.")]
    public static async Task<string> FindSimilarTickets(ToolScope scope,
        [Parameter("The symptom or question", required: true)] string query,
        [Parameter("Optional product SKU to scope the search", required: false)] string productSku)
    {
        // permission-aware retrieval
        var search = SearchRequest.For(query);
        search.BeforeTypesFacet = new(new[] { "Ticket" }); // restrict results to Ticket nodes
        var q = await scope.Graph.CreateSearchAsUserAsync(search, scope.CurrentUser, scope.CancellationToken);

        var results = q.Take(10).AsEnumerable().Select(n => {
            var text = scope.ChatAI.GetTextFromNode(n.UID, limit: 4_000);
            var id   = scope.AddSnippet(uid: n.UID, text: text);
            return new { snippetId = id, subject = n.GetString("Subject"), body = text };
        }).ToArray();

        scope.SetToolCallDisplayName($"Looked for tickets like '{query}'");
        return results.ToJson();
    }
}
return new TicketTools(); // the script returns an instance so the workspace can register its [Tool] methods

The ToolScope parameter is the workspace's contract with the tool:

  • scope.Graph — graph access.
  • scope.CurrentUser — the user the chat is running as.
  • scope.ChatAI.GetTextFromNode(uid, limit) — fetch indexed text for grounding.
  • scope.AddSnippet(uid, text) — register a citation; the integer returned becomes the bracket reference.
  • scope.SetToolCallDisplayName(...) — a human-readable label shown in the trace.
  • scope.CancellationToken — propagate the user's cancel.

See AI Tools.

The chat orchestrator

The orchestrator is the workspace's own runtime. It:

  • Knows which tools are visible to the user (admin tools only show up for admins).
  • Streams the LLM's response back to the chat UI.
  • Stops a tool call that exceeds its time or cost budget.
  • Logs the whole turn (prompt, tool calls, results, answer, citations) into the audit log.

You don't write the orchestrator; you parameterize it via LLM Configuration and Prompting Patterns.

Common patterns

Grounded Q&A (the default RAG shape)

  • LLM receives the user's question + a small set of retrieval tools.
  • LLM calls FindSimilar* / SearchDocs tools.
  • Tools return content + snippet IDs.
  • LLM synthesizes an answer with [1] [2] references.
  • UI hydrates each reference into a clickable source card.
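
For example (illustrative, with made-up tickets): a retrieval tool registers two tickets as snippets 1 and 2, and the model answers "Resetting the hub clears the stale pairing cache [1]; the reconnect loop was fixed in firmware 2.3 [2]." The UI resolves each bracket to its source card.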

Tool-using agent (multi-step work)

  • LLM is given retrieval and action tools (e.g., UpdateTicketStatus, AssignToTeam).
  • LLM calls retrieval tools, then action tools.
  • Action tools must be idempotent, bounded, and permission-aware (see the sketch after this list).
  • The audit log captures every tool invocation for post-hoc review.
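
A minimal sketch of an action tool that follows those rules, using the [Tool]/[Parameter]/ToolScope conventions above. TicketStore and its methods are hypothetical stand-ins for your workspace's user-scoped write API:

public class TicketActionTools
{
    [Tool("Set the status of a support ticket. Safe to repeat: re-applying the same status is a no-op.")]
    public static async Task<string> UpdateTicketStatus(ToolScope scope,
        [Parameter("UID of the ticket to update", required: true)] string ticketUid,
        [Parameter("New status, e.g. 'Resolved'", required: true)] string newStatus)
    {
        // Hypothetical user-scoped write API; permissions are checked as scope.CurrentUser.
        var current = await TicketStore.GetStatusAsUserAsync(ticketUid, scope.CurrentUser, scope.CancellationToken);
        if (current == newStatus)
            return $"Ticket already has status '{newStatus}'."; // idempotent no-op

        await TicketStore.SetStatusAsUserAsync(ticketUid, newStatus, scope.CurrentUser, scope.CancellationToken);
        scope.SetToolCallDisplayName($"Set ticket status to '{newStatus}'");
        return $"Status changed from '{current}' to '{newStatus}'.";
    }
}
return new TicketActionTools();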

Permission-aware retrieval with graph scope

This is the pattern that gives Workspace its edge over raw vector search: bind the LLM's "where to look" to the graph.

// "Find tickets like this question, but only for tickets the user owns
//  on the product they're currently looking at"
search.TargetUIDs = scope.Graph.Q()
    .StartAt("Product", productSku)
    .In("ForProduct")
    .AsUIDEnumerable()
    .ToArray();

The vector search and BM25 both operate within that target set. Permissions are still applied on top.
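
Wired back into a tool like FindSimilarTickets, the scoped request still goes through the same user-scoped call (sketch):

// TargetUIDs restricts where retrieval looks;
// CreateSearchAsUserAsync still enforces what the user may see.
var q = await scope.Graph.CreateSearchAsUserAsync(search, scope.CurrentUser, scope.CancellationToken);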

Safety properties

  • The LLM never sees content the user can't see: retrieval runs through CreateSearchAsUserAsync(..., scope.CurrentUser, ...).
  • The LLM can only call tools that have been exposed: the tool registry is server-controlled, and the prompt contains tool definitions only for the calling user.
  • Tool calls are auditable: every invocation is logged with caller, args, result size, and trace ID.
  • Hallucinations are reduced: tools register snippet IDs, and the prompt instructs the LLM to cite only registered IDs.
  • Long-running tool calls are bounded: scope.CancellationToken carries the user's cancel, and the orchestrator enforces a per-call timeout.

Anti-patterns

  • Calling Graph.CreateSearchAsync(...) (without AsUser) from a tool. The system context bypasses ACLs.
  • Embedding business rules in the prompt. Put them in tool code where they can be tested and audited.
  • Passing whole documents to the LLM. Use snippets and short summaries; vector retrieval already selected the relevant parts.
  • Exposing destructive actions as tools without confirmation. Mark them with conservative descriptions and prefer staged actions (propose → user confirms → execute), as sketched below.
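
One way to stage a destructive action is to split it into a propose tool and an execute tool, with the user's confirmation in between. A sketch; the DeleteTicket naming, ConfirmationTokens, and TicketStore are all hypothetical:

public class DeletionTools
{
    [Tool("PROPOSE deleting a ticket. Deletes nothing; returns a confirmation token to show the user.")]
    public static string ProposeDeleteTicket(ToolScope scope,
        [Parameter("UID of the ticket", required: true)] string ticketUid)
    {
        // Hypothetical helper: mint a short-lived token bound to this user and target.
        var token = ConfirmationTokens.Issue(scope.CurrentUser, ticketUid);
        scope.SetToolCallDisplayName("Proposed ticket deletion (awaiting confirmation)");
        return $"Ask the user to confirm deletion of {ticketUid}. Confirmation token: {token}";
    }

    [Tool("Execute a previously proposed ticket deletion. Requires the token the user approved.")]
    public static async Task<string> ExecuteDeleteTicket(ToolScope scope,
        [Parameter("UID of the ticket", required: true)] string ticketUid,
        [Parameter("Confirmation token from the propose step", required: true)] string token)
    {
        if (!ConfirmationTokens.Validate(scope.CurrentUser, ticketUid, token))
            return "Confirmation token invalid or expired; propose the deletion again.";

        // Hypothetical user-scoped delete; substitute your workspace's write API.
        await TicketStore.DeleteAsUserAsync(ticketUid, scope.CurrentUser, scope.CancellationToken);
        return $"Deleted {ticketUid}.";
    }
}
return new DeletionTools();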

Operational considerations

  • Cost: every chat turn calls the LLM at least once, plus an embedding call per tool retrieval. Budget accordingly (a rough worked example follows this list); see Performance Tuning.
  • Latency: tool calls add round trips. Aim for ≤ 3 tools per turn for interactive chat; more is fine for back-of-house workflows.
  • Provider failure: configure a fallback provider, and let tools degrade gracefully when no LLM is available.
  • Per-tool metrics: GET /api/chatai/tools/metrics exposes counts, latencies, and error rates. See Monitoring.
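
As a rough worked example of the cost bullet above, assuming one LLM continuation per tool round (the usual tool-calling loop): a turn with three retrieval tool calls costs at least four LLM requests (the initial one plus one per round, fewer if the model batches calls into a single round) and three embedding calls.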
