
Cost and operations

A 5-step agent makes 5 model calls, each consuming its own input and output tokens, so cost scales with the number of steps. Model choice and tool design are your main cost levers.
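As a rough worked example, the cost of one turn is the sum, over every model call, of input tokens times the input price plus output tokens times the output price. The prices and token counts below are made-up placeholders, not any provider's real rates. The rising input counts reflect the typical pattern where each step re-sends the growing conversation.

```python
from dataclasses import dataclass


@dataclass
class ModelCall:
    """One model call in an agent turn. Prices are USD per 1M tokens (placeholders)."""
    input_tokens: int
    output_tokens: int
    price_in_per_m: float
    price_out_per_m: float

    def cost(self) -> float:
        return (self.input_tokens * self.price_in_per_m
                + self.output_tokens * self.price_out_per_m) / 1_000_000


def turn_cost(calls: list[ModelCall]) -> float:
    """Total cost of one agent turn: every step is a separately metered call."""
    return sum(c.cost() for c in calls)


# Hypothetical (input, output) prices for a "small" and a "large" model.
SMALL = (0.10, 0.40)
LARGE = (2.00, 8.00)

# Hypothetical input sizes: context typically grows with each step.
input_per_step = [2_000, 3_000, 4_000, 5_000, 6_000]

all_large = [ModelCall(t, 300, *LARGE) for t in input_per_step]
# Small model for the first four (tool-selection) steps, large only for the final answer.
mixed = [ModelCall(t, 300, *SMALL) for t in input_per_step[:-1]] + [
    ModelCall(input_per_step[-1], 300, *LARGE)
]

print(f"all large: ${turn_cost(all_large):.5f}")
print(f"mixed:     ${turn_cost(mixed):.5f}")
```

With these placeholder numbers the mixed plan costs under a third of the all-large plan, because four of the five calls move to the cheaper model.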


Model selection by role:

Role                             Model
Tool routing / classification    gpt-4o-mini, claude-haiku-4-5
Final answer synthesis           gpt-4o, claude-sonnet-4-6
Air-gapped / no egress           Local 70B on GPU

Consider using a smaller model for tool selection and a larger one only for the final answer.
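One way to apply the table is a small lookup keyed by role, with an override for air-gapped deployments. The `Role` enum, the `air_gapped` flag, and the `local-70b` identifier are hypothetical; the hosted IDs come from the table (the Claude equivalents would slot in the same way).

```python
from enum import Enum


class Role(Enum):
    ROUTING = "routing"      # tool routing / classification
    SYNTHESIS = "synthesis"  # final answer synthesis


# Defaults from the table: small model for routing, larger one for the final answer.
HOSTED_MODELS = {
    Role.ROUTING: "gpt-4o-mini",
    Role.SYNTHESIS: "gpt-4o",
}
# Hypothetical identifier for a locally hosted 70B model (no egress).
LOCAL_MODEL = "local-70b"


def pick_model(role: Role, air_gapped: bool = False) -> str:
    """Return the model to call for one step of the agent loop."""
    if air_gapped:
        return LOCAL_MODEL
    return HOSTED_MODELS[role]


def plan_turn(num_steps: int, air_gapped: bool = False) -> list[str]:
    """Intermediate steps select tools; only the last step synthesizes the answer."""
    if num_steps < 1:
        raise ValueError("a turn needs at least one step")
    roles = [Role.ROUTING] * (num_steps - 1) + [Role.SYNTHESIS]
    return [pick_model(r, air_gapped) for r in roles]


print(plan_turn(5))
```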


Cost guardrails:

  • Set a daily token ceiling: Settings → AI Settings → Quotas
  • Set max_tokens per call (start at 1024)
  • Cap tool calls per turn (5 is a reasonable default)
  • Cache aggressively: identical queries within a session reuse earlier results
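The per-turn cap and the session cache can be enforced in the agent loop itself. This sketch wraps a hypothetical `run_tool(name, args)` executor; the class and exception names are also illustrative. The daily ceiling and `max_tokens` are set in Settings and on the model request respectively, so they are not modeled here.

```python
import json
from typing import Any, Callable

MAX_TOOL_CALLS_PER_TURN = 5  # the default suggested above


class ToolCallLimitExceeded(Exception):
    pass


class SessionToolRunner:
    """Wraps a tool executor with a per-turn call cap and a per-session result cache."""

    def __init__(self, run_tool: Callable[[str, dict], Any]):
        self._run_tool = run_tool          # hypothetical: executes a tool, returns its result
        self._cache: dict[str, Any] = {}   # lives for the whole session
        self._calls_this_turn = 0

    def start_turn(self) -> None:
        self._calls_this_turn = 0

    def call(self, tool: str, args: dict) -> Any:
        # Canonical JSON (sorted keys) so argument order does not defeat the cache.
        key = tool + ":" + json.dumps(args, sort_keys=True)
        if key in self._cache:
            return self._cache[key]        # cache hits do not count toward the cap
        if self._calls_this_turn >= MAX_TOOL_CALLS_PER_TURN:
            raise ToolCallLimitExceeded(
                f"limit of {MAX_TOOL_CALLS_PER_TURN} tool calls per turn reached")
        self._calls_this_turn += 1
        result = self._run_tool(tool, args)
        self._cache[key] = result
        return result
```

Not counting cache hits toward the cap is a design choice: they cost no tool execution. Route only read-only tools through a cache like this; a mutation should never be answered from a cached result.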

Monitor per-tool metrics:

GET /api/chatai/tools/metrics

Returns per-tool call counts, latency (including p95), and error rate. Wire it into your monitoring stack.
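A minimal polling sketch follows. The slide does not document the response shape or authentication, so the bearer-token header, the field names (`tool`, `p95_ms`, `error_rate`), and the thresholds are all assumptions; check them against what the endpoint actually returns before alerting on them.

```python
import json
import urllib.request


def fetch_tool_metrics(base_url: str, token: str) -> list[dict]:
    """GET /api/chatai/tools/metrics and parse the JSON body (auth scheme assumed)."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/api/chatai/tools/metrics",
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def unhealthy_tools(metrics: list[dict],
                    max_p95_ms: float = 2000,
                    max_error_rate: float = 0.05) -> list[str]:
    """Names of tools whose p95 latency or error rate exceeds the thresholds."""
    return [
        m["tool"]
        for m in metrics
        if m["p95_ms"] > max_p95_ms or m["error_rate"] > max_error_rate
    ]


# Offline example with a hypothetical payload.
sample = [
    {"tool": "search", "p95_ms": 850, "error_rate": 0.01},
    {"tool": "create_ticket", "p95_ms": 3100, "error_rate": 0.0},
]
print(unhealthy_tools(sample))  # ['create_ticket']
```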


Security checklist for tools:

  • Always use scope.CurrentUser for retrieval, never the system context
  • Validate all inputs (treat LLM-supplied parameters as untrusted)
  • Log caller and target IDs for any mutation
  • Destructive actions: propose → confirm, never auto-execute
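The last three items combine into one pattern for a mutating tool: validate the model-supplied arguments, return a proposal instead of acting, execute only after the user explicitly confirms, and log caller and target IDs at each step. Everything here (the delete tool, the ID format, the in-memory pending store) is hypothetical; retrieval through `scope.CurrentUser` is platform-specific and not shown.

```python
import logging
import re
import secrets
from dataclasses import dataclass
from typing import Callable

log = logging.getLogger("tools.audit")
DOC_ID = re.compile(r"[A-Za-z0-9_-]{1,64}")  # hypothetical ID format


@dataclass
class Proposal:
    token: str
    caller_id: str
    target_id: str
    summary: str


_pending: dict[str, Proposal] = {}  # in-memory only, for illustration


def propose_delete(caller_id: str, args: dict) -> Proposal:
    """Tool entry point the LLM may call. It never deletes; it only proposes."""
    # Treat LLM-supplied parameters as untrusted: check type and format.
    target = args.get("document_id") if isinstance(args, dict) else None
    if not isinstance(target, str) or not DOC_ID.fullmatch(target):
        raise ValueError("invalid document_id")
    p = Proposal(secrets.token_urlsafe(16), caller_id, target, f"Delete document {target}?")
    _pending[p.token] = p
    log.info("proposed delete caller=%s target=%s", caller_id, target)
    return p


def confirm_delete(caller_id: str, token: str, delete_fn: Callable[[str], None]) -> bool:
    """Called from the UI when the user clicks Confirm, never by the model."""
    p = _pending.get(token)
    if p is None or p.caller_id != caller_id:
        log.warning("rejected confirm caller=%s", caller_id)
        return False
    del _pending[token]  # single use
    delete_fn(p.target_id)
    log.info("deleted caller=%s target=%s", caller_id, p.target_id)
    return True
```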

LLM configuration