Scaling

This page is a practical scaling guide: how to size a workspace by tier, how to grow each of the three axes (data volume, ingestion throughput, query load), and how backup behaves as the corpus grows.

For per-query tuning see Performance tuning. For operational deployment see Deployment.

Scale tiers

Use these as starting points — every workspace's mix of nodes/edges/queries is different, but the tiers below are calibrated against real production deployments.

| Tier | Target users | Graph size (nodes) | Queries / day | Single-VM CPU / RAM | Notes |
|---|---|---|---|---|---|
| Pilot | up to 50 | up to 1M | up to 50k | 8 vCPU / 32 GB | Single VM. SSD storage. Suitable for early adopters. |
| Small | up to 500 | up to 10M | up to 500k | 16 vCPU / 64 GB | Single VM with a read-only replica for failover. |
| Medium | up to 5 000 | up to 100M | up to 5M | 32 vCPU / 128 GB | Add a dedicated GPU worker if running OCR/STT or large embedding models. |
| Large | up to 25 000 | up to 1B | up to 50M | 64+ vCPU / 256+ GB | Multi-node cluster. NVMe storage. Dedicated workers for ingestion and embedding. |
| XL | 25 000+ | 1B+ | 50M+ | engagement-specific | Talk to Curiosity for an architecture review. |

These tiers size the workspace server itself. External LLM providers (OpenAI, Anthropic, etc.) have their own cost and throughput characteristics; see LLM configuration.

Vertical sizing

For single-VM deployments at Pilot through Medium tiers, the bottleneck moves from CPU to RAM to disk I/O as the graph grows. Rules of thumb:

| Component | Sizing heuristic |
|---|---|
| RAM | Keep ≥ 50% of the hot indexes in memory. Indexes typically grow to 1.5–3× the raw text size. |
| CPU | Search latency is CPU-bound. By Little's law, concurrent queries ≈ peak QPS × P95 latency (in seconds); provision that many cores with 1.5× headroom. |
| Disk | SSD or NVMe is mandatory. Spinning disks turn 100 ms searches into 5 s ones. |
| Network | 1 Gbps is plenty for most workspaces. 10 Gbps only matters if you're pushing > 100 MB/s of ingestion traffic. |
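
The arithmetic behind these heuristics is simple enough to script. A minimal C# sketch with illustrative inputs (none of these numbers come from a real workspace):

```csharp
using System;

// Back-of-the-envelope sizing from the heuristics above.
// All inputs are illustrative; substitute your own measurements.
double rawTextGb = 20;            // raw text held in the workspace
double indexGrowthFactor = 2.0;   // indexes grow to 1.5-3x raw text

// RAM: keep at least half of the hot indexes resident.
double indexGb = rawTextGb * indexGrowthFactor;
double minIndexRamGb = 0.5 * indexGb;                  // = 20 GB

// CPU: Little's law says concurrent queries ~ QPS x latency,
// so that is the core count to provision, plus headroom.
double peakQps = 100;
double p95LatencySec = 0.100;                          // 100 ms P95
double cores = peakQps * p95LatencySec * 1.5;          // = 15 cores

Console.WriteLine($"RAM >= {minIndexRamGb} GB for hot indexes, ~{cores} cores");
```

Both outputs land near the Small tier in the table above (16 vCPU / 64 GB), which is a useful cross-check.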

Storage estimate

Rough storage = raw text size × (1 + index_factor + embedding_factor) where:

  • index_factor ≈ 1.5–3 (text indexes, NLP annotations, graph structure)
  • embedding_factor ≈ vector_dim × 4 bytes / avg_chunk_chars (e.g. 1536-dim float32 over 500-char chunks ≈ 12× raw text)

For a 100 GB raw text corpus with embeddings: expect 1.5–2 TB of storage.
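
As a sketch, the same formula in code, using the 100 GB example:

```csharp
using System;

// Storage estimate from the formula above; inputs mirror the 100 GB example.
double rawTextGb = 100;
double indexFactor = 2.0;                 // middle of the 1.5-3 range

// embedding_factor = vector_dim x 4 bytes / avg_chunk_chars
int vectorDim = 1536;                     // float32 embedding model
int avgChunkChars = 500;                  // ~1 byte per character of raw text
double embeddingFactor = vectorDim * 4.0 / avgChunkChars;   // ~12.3

double totalGb = rawTextGb * (1 + indexFactor + embeddingFactor);
Console.WriteLine($"Estimated storage: {totalGb:F0} GB");   // ~1530 GB, i.e. ~1.5 TB
```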

Ingestion scaling

The graph is single-writer per workspace; the ingestion pipeline is what you scale.

| Bottleneck | Symptom | Fix |
|---|---|---|
| Single connector slow | One source's sync window grows each run. | Parallelize: partition the source (by date, region, account) and run N connector processes. See the sketch after this table. |
| Connector → workspace network | Commit calls take > 1 s. | Increase SetAutoCommitCost so commits are fewer and carry more nodes each. |
| Workspace queue backed up | Latency between commit and visibility keeps climbing. | Add ingestion-worker capacity. Pause indexing during the load (PauseIndexing). |
| NLP / embedding pipeline | Re-process queue takes hours or days after ingest spikes. | Add GPU workers for embeddings. Increase batch sizes in the pipeline config. |
| File extraction (OCR/STT) | PDF/audio backlog keeps growing. | Add an OCR/STT worker pool. GPU is 10–30× faster for STT. |
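
A minimal sketch of the partitioning fix from the first row: split one slow source into N date windows and sync them concurrently. SyncPartitionAsync and the windowing scheme are hypothetical stand-ins, not the actual connector SDK.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// Partition a slow source by date and sync the slices in parallel.
// SyncPartitionAsync is a hypothetical stand-in for one connector process.
var start = new DateTime(2020, 1, 1);
var end = DateTime.UtcNow;
int partitions = 8;

var windows = Enumerable.Range(0, partitions)
    .Select(i => (From: start + (end - start) * i / partitions,
                  To:   start + (end - start) * (i + 1) / partitions));

// Each worker owns a disjoint slice, so no two write the same records.
await Task.WhenAll(windows.Select(w => SyncPartitionAsync(w.From, w.To)));

static Task SyncPartitionAsync(DateTime from, DateTime to)
{
    Console.WriteLine($"Syncing {from:d} -> {to:d}");
    return Task.CompletedTask;   // fetch + batched commits for this slice go here
}
```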

Per-worker throughput targets:

  • Ingestion writes: 5 000–20 000 nodes/sec per worker (graph + simple indexes).
  • Embeddings (CPU, local model): 50–200 docs/sec.
  • Embeddings (external API): provider-bound. Batch aggressively.
  • OCR: 5–15 pages/sec CPU, 50–150 pages/sec GPU.
  • STT (Whisper small, GPU): 10–15× real-time.
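
To turn those rates into a plan, a quick backfill estimate helps; the corpus size and worker counts below are illustrative assumptions:

```csharp
using System;

// Rough backfill timing from the per-worker rates above.
double nodes = 100_000_000;       // 100M-node backfill (Medium tier)
double writeRate = 10_000;        // nodes/sec per ingestion worker (mid-range)
int ingestWorkers = 4;

double docs = 5_000_000;          // chunks to embed (CPU, local model)
double embedRate = 100;           // docs/sec per embedding worker (mid-range)
int embedWorkers = 4;

double writeHours = nodes / (writeRate * ingestWorkers) / 3600;   // ~0.7 h
double embedHours = docs / (embedRate * embedWorkers) / 3600;     // ~3.5 h

Console.WriteLine($"Writes ~{writeHours:F1} h, embeddings ~{embedHours:F1} h");
```

The embedding queue, not the graph writer, is usually what dominates after a large load.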

Query scaling

Search is read-heavy. Five levers:

| Lever | Effect |
|---|---|
| Read replicas | Distribute search load. Eventually consistent. See Read-only replicas. |
| Index sharding | Smaller per-shard indexes → faster lookups. Adds operational cost. |
| Query-side caching (endpoints) | Cache popular aggregates. Most "expensive" endpoints have a handful of common variants. |
| Hybrid blend tuning | The vector branch is the more expensive one; tune α to keep latency in budget. |
| Graph-constrained search | Pushing constraints into TargetUIDs collapses the candidate set the search engine touches. |

The single biggest query-scaling win is almost always narrower start sets. A graph-scoped TargetUIDs cuts both candidate retrieval and ACL filtering. See Search ranking tuning.
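
A sketch of what that scoping looks like in calling code. Only TargetUIDs and CreateSearchAsUserAsync are names this guide actually references; the interfaces and helper below are illustrative assumptions, not the real API.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical shapes; only TargetUIDs and CreateSearchAsUserAsync are
// names referenced by this guide, everything else is an illustrative stub.
interface ISearch
{
    IReadOnlyList<string> TargetUIDs { get; set; }   // candidate start set
    Task<IReadOnlyList<string>> QueryAsync(string text, int take);
}

interface IGraph
{
    Task<ISearch> CreateSearchAsUserAsync(string userUid);   // permission-aware
    Task<IReadOnlyList<string>> GetLinkedUidsAsync(string uid, string edgeType);
}

static class ScopedSearch
{
    public static async Task<IReadOnlyList<string>> SearchProjectAsync(
        IGraph graph, string userUid, string projectUid, string query)
    {
        // 1. Cheap graph step: resolve the start set up front.
        var docs = await graph.GetLinkedUidsAsync(projectUid, "HasDocument");

        // 2. Scope the search to that set: the engine only scores candidates
        //    inside it, and ACL filtering shrinks by the same factor.
        var search = await graph.CreateSearchAsUserAsync(userUid);
        search.TargetUIDs = docs;
        return await search.QueryAsync(query, take: 20);
    }
}
```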

Multi-tenant patterns

Two viable shapes:

  1. Tenant = workspace. Each tenant gets a dedicated workspace; full isolation, simple ACLs, trivial DR. Best when tenants don't share data and procurement is per-tenant.
  2. Tenant = graph subset. One workspace, every node and edge tagged by tenant, ACLs enforce isolation. Best when tenants share lookup data (catalogs, knowledge bases).

For (2), ensure the following (a tenant-guard sketch follows the list):

  • A Tenant node per tenant; every user/team/data node links to it.
  • All user-facing search calls use CreateSearchAsUserAsync (permission-aware).
  • Tenant ID present in every endpoint's input and validated server-side; never trust a client-supplied tenant.
  • Quotas per tenant (rate limits on endpoint calls, ingestion volume).
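
A minimal sketch of the tenant-validation rule from the list above; the helper shape is an assumption, only the rule itself comes from this guide.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical tenant guard for pattern (2): the tenant is derived from the
// authenticated caller, and a client-supplied value is only a cross-check.
static class TenantGuard
{
    public static async Task<string> ResolveTenantAsync(
        Func<string, Task<string>> getUserTenantUid,   // user UID -> Tenant node UID
        string callerUid,
        string requestedTenantUid)
    {
        // Server-side source of truth: the caller's own Tenant node link.
        var actualTenant = await getUserTenantUid(callerUid);

        if (requestedTenantUid != actualTenant)
            throw new UnauthorizedAccessException("Tenant mismatch for caller.");

        return actualTenant;   // scope every graph read/write to this tenant
    }
}
```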

Backup at scale

| Tier | Backup strategy |
|---|---|
| Pilot/Small | Daily full snapshot. Restore tested quarterly. |
| Medium | Daily full + hourly incremental. Restore tested monthly. Cross-region copy. |
| Large/XL | Continuous snapshots + WAL-style streaming to a standby region. Target RPO < 5 min. |

Snapshot size grows with the corpus. At 1B+ nodes a full snapshot can take many hours — incremental + change-data-capture is the only path to short RPOs.

The backup configuration knobs live in the operations guide — see Backup and restore.

Disaster recovery

Define and test:

  • RPO (recovery point objective): how much data can you afford to lose?
  • RTO (recovery time objective): how quickly must you be back up?
  • Failover path: cold standby? warm replica? active-active?
  • Restore drill: at least quarterly. The first time you restore should not be during an outage.

Practical scaling checklist

  • Schema and indexes audited annually; drop fields that aren't queried.
  • Ingestion batched and idempotent; no full re-syncs in production.
  • Search calls graph-scoped wherever possible.
  • Hybrid search blend tuned with a golden set.
  • Embedding model and chunking pinned; re-embed jobs scheduled, not ad-hoc.
  • Endpoint metrics in your monitoring stack; alerts on latency, error rate, and query-tracker drift.
  • Backup snapshots cross-region; restore drilled.
  • Capacity reviewed quarterly against the tier table.
