Curiosity

Metrics reference

The workspace exposes per-endpoint and per-tool metrics over HTTP so you can wire them into your monitoring stack (Prometheus, Datadog, Grafana, the workspace's built-in admin views, etc.).

This page lists the routes and the exact response shapes. For configuring dashboards, see Administration → Monitoring.

Endpoint metrics

Every custom endpoint produces request-rate, latency, error, and query-tracker counters. Both routes return the same response shape (EndpointMetricsResult); the difference is scope.

GET /api/endpoints/metrics?uid=<uid>

Metrics for a single endpoint, identified by its UID.

GET /api/endpoints/metrics/all

Aggregated metrics across all endpoints. Use this for top-level dashboards; use the per-endpoint route to drill in.
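A minimal client for both routes might look like the sketch below. The base URL and token handling are placeholders for your deployment, and the helper names (`metrics_url`, `fetch_metrics`) are illustrative, not part of the workspace API:

```python
import json
import urllib.request
from urllib.parse import quote

def metrics_url(base, uid=None):
    """Build the aggregate metrics URL, or the per-endpoint URL when a UID is given."""
    if uid is None:
        return f"{base}/api/endpoints/metrics/all"
    return f"{base}/api/endpoints/metrics?uid={quote(uid)}"

def fetch_metrics(base, token, uid=None):
    """GET an EndpointMetricsResult and return it as a dict."""
    req = urllib.request.Request(
        metrics_url(base, uid),
        headers={"Authorization": f"Bearer {token}"},  # admin-scoped token
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```

A dashboard would typically call `fetch_metrics(base, token)` for the overview and `fetch_metrics(base, token, uid)` when drilling into a single endpoint.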

Response shape

{
  "RPS":                     [ 1.2, 0.9, 1.4, … ],
  "LatencyP95":              [ 142, 138, 151, … ],
  "ErrorRates":              [ 0.0, 0.0, 0.01, … ],
  "UniqueUsers":             [ 12, 11, 13, … ],
  "TotalCallsLastHour":      4321,
  "TotalErrorsLastHour":     7,
  "AverageLatencyLastHour":  124.5,
  "AggregatedQueryTracker": {
    "TouchedNodes":  192384,
    "TouchedEdges":  478211,
    "SimilarNodes":  12044,
    "Queries":       4321
  }
}
| Field | Type | Units / meaning |
| --- | --- | --- |
| RPS | float[] | Requests per second per bucket. Default bucket width is 1 minute, window 1 hour. |
| LatencyP95 | float[] | 95th-percentile latency per bucket, in milliseconds. |
| ErrorRates | float[] | Fraction [0, 1] of requests that errored per bucket. |
| UniqueUsers | int[] | Distinct authenticated users per bucket. |
| TotalCallsLastHour | float | Total call count over the window. |
| TotalErrorsLastHour | float | Total errored calls over the window. |
| AverageLatencyLastHour | float | Mean latency (ms) over the window. |
| AggregatedQueryTracker | object | Graph-query workload summary (see below). |

The four arrays are aligned — index i of RPS, LatencyP95, ErrorRates, and UniqueUsers describes the same bucket.
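Because the arrays are index-aligned, they zip cleanly into per-bucket records. A small sketch (the record keys are illustrative, not part of the response shape):

```python
def bucket_records(metrics):
    """Zip the four aligned arrays into per-bucket records.

    Index i of every array describes the same bucket, so zip() is safe here.
    """
    return [
        {"rps": r, "p95_ms": p, "error_rate": e, "users": u}
        for r, p, e, u in zip(metrics["RPS"], metrics["LatencyP95"],
                              metrics["ErrorRates"], metrics["UniqueUsers"])
    ]
```

This is the natural intermediate shape before emitting points to a monitoring backend or rendering a table.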

EndpointQueryTrackerStats

Workload counters for graph and similarity queries that the endpoint executed during the window.

| Field | Type | Meaning |
| --- | --- | --- |
| TouchedNodes | long | Nodes visited by graph traversals. |
| TouchedEdges | long | Edges visited. |
| SimilarNodes | long | Nodes returned from vector retrieval. |
| Queries | long | Total graph queries executed (one endpoint call can run many). |

These four counters are the early-warning signal for endpoints that scan too much of the graph. A high TouchedNodes-to-Queries ratio usually means a missing Take(...) or a StartAt(type) that should be StartAt(type, key).
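The ratio check is simple to automate. In this sketch, the `per_endpoint` mapping (UID to its EndpointMetricsResult, built by polling the per-endpoint route) and the threshold value are assumptions for illustration:

```python
def nodes_per_query(tracker):
    """Average nodes touched per graph query; high values suggest unbounded scans."""
    queries = tracker.get("Queries", 0)
    return tracker["TouchedNodes"] / queries if queries else 0.0

def flag_heavy_scanners(per_endpoint, threshold=1000):
    """Return UIDs whose TouchedNodes/Queries ratio exceeds an illustrative threshold."""
    return [
        uid for uid, m in per_endpoint.items()
        if nodes_per_query(m["AggregatedQueryTracker"]) > threshold
    ]
```

Tune the threshold to your graph's size; the useful signal is usually the ratio's trend rather than its absolute value.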

Chat AI tool metrics

Tools registered with the chat AI surface get the same metric shape, exposed under their own routes.

GET /api/chatai/tools/metrics?toolUID=<toolUID>

Metrics for a single chat-AI tool.

GET /api/chatai/tools/metrics/all

Aggregated metrics across every tool the chat AI can call.

Response shape is identical to EndpointMetricsResult above — slow or flaky tools degrade the entire chat experience, so plot these alongside endpoint metrics.

Authentication and scope

All metrics routes require an admin-scoped token. See Token scopes. External callers should rotate the token through a secret manager and not hard-code it in dashboards.

Sampling and retention

  • Metrics buckets are produced live in memory; the workspace retains the last rolling window (default 1 hour) at full resolution.
  • For longer retention, scrape the routes on your own interval and store the snapshots in your monitoring backend.
  • The /all routes are computed on demand by summing the underlying per-endpoint counters — they're safe to poll, but not free.
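A minimal scraper for longer retention could append timestamped snapshots to a JSONL file, as sketched below. The file path and scheduling are illustrative; schedule the scrape more often than the rolling window (default 1 hour) so no buckets age out between runs:

```python
import json
import time
import urllib.request

def make_record(metrics, ts):
    """Serialize one snapshot with its scrape timestamp as a JSONL line."""
    return json.dumps({"ts": ts, "metrics": metrics})

def snapshot(url, token, path="metrics-snapshots.jsonl"):
    """Scrape the aggregate route once and append the snapshot to a JSONL file."""
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        record = make_record(json.load(resp), time.time())
    with open(path, "a") as f:
        f.write(record + "\n")
```

JSONL keeps every snapshot append-only and replayable into whichever backend you adopt later.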

Wiring into monitoring

Prometheus

There is no built-in Prometheus exporter — write a small adapter that polls /api/endpoints/metrics/all once per minute and translates the response into gauge/counter metrics.

import os, time

import requests
from prometheus_client import Gauge, start_http_server

TOKEN = os.environ["CURIOSITY_TOKEN"]  # admin-scoped token; never hard-code it

calls   = Gauge("curiosity_endpoint_calls_last_hour", "Endpoint calls over the last hour")
errors  = Gauge("curiosity_endpoint_errors_last_hour", "Errored endpoint calls over the last hour")
latency = Gauge("curiosity_endpoint_avg_latency_ms_last_hour", "Mean endpoint latency (ms) over the last hour")

start_http_server(9100)

while True:
    resp = requests.get("https://workspace/api/endpoints/metrics/all",
                        headers={"Authorization": f"Bearer {TOKEN}"},
                        timeout=10)
    resp.raise_for_status()
    data = resp.json()
    calls.set(data["TotalCallsLastHour"])
    errors.set(data["TotalErrorsLastHour"])
    latency.set(data["AverageLatencyLastHour"])
    time.sleep(60)

Datadog / Grafana / OpenTelemetry

Any HTTP-pull integration works the same way. Expand the bucket arrays (RPS, LatencyP95, …) into time-series points by emitting (now - i * 60s, value) pairs.
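That expansion is a one-liner. The sketch below assumes index 0 is the most recent bucket and 60-second buckets (the documented defaults); verify the ordering against your workspace before shipping dashboards:

```python
import time

def buckets_to_points(values, bucket_seconds=60, now=None):
    """Turn one bucket array into (timestamp, value) pairs.

    Assumes index 0 is the most recent bucket, so index i maps to
    now - i * bucket_seconds.
    """
    now = time.time() if now is None else now
    return [(now - i * bucket_seconds, v) for i, v in enumerate(values)]
```

Feed the resulting pairs to whichever point-submission API your backend exposes.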

Suggested alerts

  • Error budget burn. ErrorRates mean over the last 15 minutes > 1% on any endpoint.
  • Latency regression. LatencyP95 mean over the last 15 minutes exceeds a per-endpoint SLO (set during rollout).
  • Runaway query. AggregatedQueryTracker.TouchedNodes per call ratio increases by >2× week-over-week — usually a sign someone removed a Take(...) or widened a StartAt.
  • Tool failure rate. TotalErrorsLastHour / TotalCallsLastHour on chatai/tools/metrics/all > 5%.
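Two of the conditions above can be evaluated directly from one response. This sketch uses the illustrative thresholds from the list and assumes index 0 of ErrorRates is the newest one-minute bucket; the alert names are placeholders:

```python
def check_alerts(metrics, error_budget=0.01, failure_threshold=0.05):
    """Evaluate the error-budget and failure-rate conditions for one result."""
    alerts = []
    # 15-minute lookback = the 15 newest one-minute buckets (index 0 = newest).
    recent = metrics["ErrorRates"][:15]
    if recent and sum(recent) / len(recent) > error_budget:
        alerts.append("error-budget-burn")
    calls, errs = metrics["TotalCallsLastHour"], metrics["TotalErrorsLastHour"]
    if calls and errs / calls > failure_threshold:
        alerts.append("high-failure-rate")
    return alerts
```

The latency and runaway-query alerts need per-endpoint SLOs and week-over-week history, so they belong in your monitoring backend rather than a stateless check.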
© 2026 Curiosity. All rights reserved.