LLM Agents and Integration
An LLM agent in Curiosity Workspace is the tool-use loop wrapped around the workspace's graph and search APIs. The LLM reasons about which tool to call next; deterministic endpoints do the work; the agent loops until it can answer.
This page is about the architecture. For the prompt templates that drive each step, see Prompting patterns → Tool use.
Architecture
Each component is a separate concern with a well-defined contract:
| Component | Responsibility | Stateless? |
|---|---|---|
| User turn | One question or instruction from the user. | n/a |
| LLM | Picks the next tool call or composes the final answer. | yes |
| Tool router | Validates the model's tool request, calls the right endpoint, returns JSON. | yes |
| Tool endpoints | Deterministic graph/search/fetch operations. Permission-aware. | mostly |
| Final answer | LLM-written summary with citation UIDs and links back to the source nodes. | n/a |
The edge that matters most is the loop: tool result → LLM → next tool call. Cap it (8 calls is a reasonable default) so a confused model can't loop forever.
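The loop can be sketched in a few lines. This is a language-agnostic illustration, not Curiosity code; `call_model` and `call_tool` are hypothetical stand-ins for the LLM provider and the tool router.

```python
# Minimal agent loop with an iteration cap. A confused model that keeps
# requesting tools runs out of budget instead of looping forever.
MAX_CALLS = 8

def run_agent(user_turn, call_model, call_tool):
    messages = [{"role": "user", "content": user_turn}]
    for _ in range(MAX_CALLS):
        step = call_model(messages)            # model picks the next action
        if step["type"] == "final_answer":     # loop exit: answer composed
            return step["content"]
        result = call_tool(step["tool"], step["args"])  # deterministic work
        messages.append({"role": "tool", "name": step["tool"], "content": result})
    return "Tool-call budget exhausted; escalating to a human."
```

The cap is the only piece of control flow the model can't override, which is exactly the point.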
Tool registry
Each tool is a custom endpoint registered with the chat AI. The registration gives the model a name, a description, and a JSON-schema input. Keep tools:
- Small. One tool = one verb. Not "search and summarize"; that's two tools.
- Composable. A tool that returns UIDs is more useful than one that returns prose.
- Idempotent for reads. Calling the same read tool twice with the same args should return the same thing.
- Authenticated. Pass the user's identity through; never grant the agent more permissions than the user has.
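A registration following these rules might look like the sketch below. The exact registration API is product-specific (see AI tools); only the shape — name, description, JSON-schema input — follows the contract described above.

```python
# Hypothetical registration payload for a `search` tool.
# One verb, UID-returning, schema-validated inputs.
search_tool = {
    "name": "search",
    "description": "Full-text search over the graph. Returns UIDs, not prose.",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "type":  {"type": "string", "description": "Optional node type filter."},
            "k":     {"type": "integer", "default": 5},
        },
        "required": ["query"],
    },
}
```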
Typical tool set for a support assistant:
| Tool | Inputs | Returns |
|---|---|---|
| `search(query, type, k)` | string, optional type, optional k | list of `{uid, title, snippet, score}` |
| `get_node(uid)` | uid | full properties of the node |
| `get_neighbors(uid, edge, k)` | uid, edge type, k | list of neighbor nodes |
| `find_similar(uid, k)` | uid, k | list of nodes similar by embedding |
| `ask_human(question)` | string | terminates the loop, surfaces the question to the UI |
The chat AI ships with these built-in; add domain-specific ones (open_ticket, lookup_invoice, …) as custom endpoints. See AI tools.
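Because tools return UIDs rather than prose, calls chain naturally. A sketch with stub data (the stubs stand in for the real endpoints, which these are not):

```python
# UID-based composition: search returns UIDs, and those UIDs feed
# get_neighbors with no parsing in between.
def search(query, type=None, k=5):
    return [{"uid": "case_42", "title": "Overnight drain", "score": 0.91}]

def get_neighbors(uid, edge, k=1):
    return [{"uid": f"res_{uid}", "type": edge}]

hits = search("battery drain", type="SupportCase", k=5)
resolutions = [n for hit in hits
                 for n in get_neighbors(hit["uid"], edge="Resolution", k=1)]
```

A prose-returning search would force the model to re-extract identifiers from text before the second call, which is exactly the failure mode the "composable" rule avoids.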
Permission handling
The user's identity threads through every tool call. The graph layer's ACL enforcement runs at retrieval time — CreateSearchAsUserAsync and Q().AsUser(uid) apply the user's permission view before results come back. The model never sees data the user can't see, which means it can't accidentally leak it.
If you build a custom tool, mirror this:
return await Graph.CreateSearchAsUserAsync(request, User.Id);
Never use the unscoped variant inside an AI tool.
Error handling
The agent loop must distinguish three error classes:
| Class | Example | Loop response |
|---|---|---|
| Tool error | Endpoint 5xx, validation fail. | Return error JSON; let the model try another approach. |
| Empty result | Search returned nothing. | Return {count: 0}; the model decides whether to retry differently. |
| Hard guardrail | User asked for restricted action. | Short-circuit the loop and return the refusal message directly. |
Don't catch and silently fix tool errors in the router — that hides genuine issues. Surface them to the model as structured errors and let it decide.
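The three classes map to three router behaviors. A minimal sketch, assuming hypothetical `call_endpoint` and `is_restricted` helpers (not Curiosity APIs):

```python
# The router never raises into the loop: tool errors and empty results
# become structured JSON the model can react to, and only the hard
# guardrail short-circuits.
def route(tool_name, args, call_endpoint, is_restricted):
    if is_restricted(tool_name, args):               # hard guardrail
        return {"halt": True, "message": "That action isn't permitted."}
    try:
        result = call_endpoint(tool_name, args)      # deterministic work
    except Exception as exc:                         # tool error: surface it
        return {"error": type(exc).__name__, "detail": str(exc)}
    if not result:                                   # empty result
        return {"count": 0}
    return result
```

The `halt` flag is what distinguishes the guardrail case: the loop returns the message directly instead of handing it back to the model.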
Cost and latency
Every loop iteration is a model call plus a tool call. A 5-step agent is 5 model calls + 5 endpoint calls. Two consequences:
- Pick the model carefully. A haiku-class model for routing and a sonnet-class model for the final answer is a common split.
- Cache aggressively. If the same `search("battery drain", "SupportCase")` runs repeatedly, cache the read; key the cache by the caller's permission scope so filtered results can't leak across users.
- Stream the final answer. The user shouldn't watch a spinner while the model generates the summary.
See LLM configuration for provider setup and Metrics reference for the tool-call observability surface.
Worked example: support assistant
Trigger. A user opens a ticket in the support UI and asks "Why is my MBA-2024 battery draining overnight?"
Step 1. LLM calls search("battery drain MBA-2024", type="SupportCase", k=5).
Step 2. The tool returns 5 cases. LLM picks the most relevant 2 by reading snippets.
Step 3. LLM calls get_neighbors(uid=case_42, edge="Resolution", k=1) for each.
Step 4. The tool returns the resolutions: "firmware update v3.2 fixes overnight drain."
Step 5. LLM composes the answer: "Two recent cases for the MacBook Air 2024 ([case_42], [case_77]) report the same overnight drain. Both were resolved by updating to firmware v3.2."
Step 6. Loop ends. The router renders citations and posts the answer.
Total: 5 model calls, 4 tool calls, ~3 seconds. Well within the budget for a chat experience.
Anti-patterns
- Tools that return prose. Always return data. Prose belongs to the final answer step.
- Tools with side effects in the read path. A `search` that also logs to a CRM is two tools fused; split them.
- Tools that bypass permissions. No exceptions — even admin tools take the user's identity and enforce admin-role checks against it.
- Unbounded loops. Always cap iterations.
- Hidden state. Each tool call should carry everything it needs. If you need state, persist it as graph nodes the model can fetch.
Where to go next
- Prompting patterns — templates for the LLM steps.
- AI tools — registering custom tools.
- LLM configuration — picking and configuring providers.
- Custom endpoints — building the deterministic side.
- Grounded answer evaluation — measuring quality.