# Deployment
This page is the production-readiness checklist for a Curiosity Workspace deployment. It assumes you've already chosen a platform — for the platform-specific manifests, see Installation, Docker, Kubernetes, and the cloud guides.
## Deployment goals
- Reliability: predictable uptime, fast recovery.
- Security: TLS, secrets discipline, scoped tokens, ReBAC enforced everywhere.
- Scalability: handle data growth and query load.
- Reproducibility: dev → staging → prod promotion is mechanical, not heroic.
## Environments
Maintain three named environments with parity in shape but different scale and access:
| Environment | Purpose | Notes |
|---|---|---|
| Dev | Engineer-owned, may be on a laptop | Local Docker run, default ports, generated admin password. |
| Staging | Pre-production validation | Prod-shaped manifest, smaller capacity, isolated secrets, restore drills allowed. |
| Production | Real users | Restricted access, change control, no shell access by default. |
Promotion path: code/config changes land in source → tested in dev → deployed to staging → validated → promoted to prod.
## What to version and promote
Treat these as deployable artifacts and version-control them:
- Connector code (your data ingestion programs).
- Custom endpoint and AI tool code (export from the workspace UI, store in git, redeploy on promotion).
- Custom interface bundles (Tesserae / H5).
- Schema migrations and ingestion pipeline definitions.
- Search index configuration (indexed fields, boosts, facets).
- NLP pipeline configuration (entity capture, embeddings field selection).
- The deployment manifest itself (Docker Compose / Kubernetes / Helm / Terraform).
The Workspace stores all UI-managed configuration inside the graph, so a configuration export + import lets you snapshot and promote workspaces. See Backup and restore.
## Production checklist
Before flipping a workspace to "production":
### Image and runtime
- Versioned image tag (`curiosityai/curiosity:vX.Y.Z`), not `:latest`.
- Container memory and CPU sized for embeddings (start at 16 GB / 8 vCPU; bigger for large corpora).
- Healthcheck on `/api/login/check`.
- `terminationGracePeriodSeconds` ≥ 60 so the workspace can flush before being killed.
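On Kubernetes, the runtime items above translate into a pod spec along these lines — a sketch only: the container name, port, and probe timings are illustrative assumptions, and the resource numbers are the starting point from the checklist, not requirements:

```yaml
# Sketch of a pod spec covering the runtime checklist; names and numbers are illustrative.
spec:
  terminationGracePeriodSeconds: 60        # let the workspace flush before SIGKILL
  containers:
    - name: workspace
      image: curiosityai/curiosity:vX.Y.Z  # pinned version tag, never :latest
      resources:
        requests:
          memory: "16Gi"
          cpu: "8"
        limits:
          memory: "16Gi"
      livenessProbe:
        httpGet:
          path: /api/login/check           # the healthcheck endpoint from the checklist
          port: 8080
        initialDelaySeconds: 60
        periodSeconds: 15
```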
### Storage
- Persistent volume on SSD-backed block storage attached to `MSK_GRAPH_STORAGE`.
- Separate volume (or directory) for `MSK_GRAPH_BACKUP_FOLDER`.
- Backups scheduled, off-host, and tested by restoring to a sandbox.
- Volume expansion enabled so you can grow without downtime.
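On Kubernetes, the storage items above map to PersistentVolumeClaims on an SSD-backed StorageClass with expansion enabled. The claim name, class name, and size below are assumptions — substitute whatever your cluster provides:

```yaml
# The StorageClass referenced here must have allowVolumeExpansion: true
# so the volume can grow without downtime. A second, similar claim should
# back MSK_GRAPH_BACKUP_FOLDER so backups live on separate storage.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: workspace-graph   # assumed name; mounted at the path MSK_GRAPH_STORAGE points to
spec:
  storageClassName: ssd   # assumption: an SSD-backed, expandable class in your cluster
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 200Gi      # illustrative; size for your corpus
```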
### Networking and TLS
- TLS terminated at the proxy or inside the container; HSTS enabled.
- `MSK_PUBLIC_ADDRESS` set to the user-facing URL.
- No `0.0.0.0` exposure without an authenticating front-end.
- Egress allowlist documented (Docker registry, NuGet, your LLM provider).
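With Docker Compose, avoiding a bare `0.0.0.0` exposure is a one-line change: bind the published port to loopback so only a TLS-terminating proxy on the same host can reach the workspace. The service name and port are illustrative assumptions:

```yaml
# Sketch: publish the workspace port on loopback only.
services:
  workspace:
    image: curiosityai/curiosity:vX.Y.Z
    ports:
      - "127.0.0.1:8080:8080"   # loopback only; the reverse proxy terminates TLS and forwards here
```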
### Identity and secrets
- `MSK_ADMIN_PASSWORD` set explicitly (the default `admin`/`admin` never used).
- `MSK_JWT_KEY` set explicitly so tokens survive restarts.
- `MSK_GRAPH_MASTER_KEY` set explicitly and backed up — losing it means losing encrypted content.
- All secrets injected from a secret manager (AWS Secrets Manager, Azure Key Vault, GCP Secret Manager, Vault).
- At least one SSO provider configured.
- Admin sign-in via SSO only; the local `admin` account disabled after onboarding.
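On Kubernetes, "injected from a secret manager" usually means the manager syncs into a native Secret (for example via External Secrets Operator or a secrets-store CSI driver) and the pod references it, so no secret value ever lands in the manifest. The Secret and key names below are assumptions:

```yaml
# Sketch: pull the three critical MSK_* values from a synced Secret.
env:
  - name: MSK_ADMIN_PASSWORD
    valueFrom:
      secretKeyRef:
        name: workspace-secrets   # assumed Secret, synced from your secret manager
        key: admin-password
  - name: MSK_JWT_KEY
    valueFrom:
      secretKeyRef:
        name: workspace-secrets
        key: jwt-key
  - name: MSK_GRAPH_MASTER_KEY    # back up the source value — losing it loses encrypted content
    valueFrom:
      secretKeyRef:
        name: workspace-secrets
        key: graph-master-key
```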
### Permissions and tokens
- Connectors run on dedicated tokens with `ingestion` scope only.
- External integrations use endpoint tokens scoped to specific endpoints.
- Token rotation documented and scheduled.
### Observability
- Stdout logs routed to your aggregator; audit log forwarded to your SIEM.
- Alerts on liveness, latency regressions, ingestion failures, container restart rate.
- Per-endpoint and per-tool metrics scraped into your monitoring system.
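If your monitoring stack is Prometheus-based, the container-restart-rate alert from the list above might look like the rule below. The metric comes from kube-state-metrics; the pod label pattern, threshold, and severity are illustrative assumptions:

```yaml
# Sketch of a Prometheus alerting rule for restart-looping workspace pods.
groups:
  - name: workspace
    rules:
      - alert: WorkspaceRestarting
        # Fires on more than 3 container restarts within the last hour.
        expr: increase(kube_pod_container_status_restarts_total{pod=~"workspace.*"}[1h]) > 3
        for: 5m
        labels:
          severity: page
        annotations:
          summary: "Curiosity Workspace container is restart-looping"
```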
### Disaster recovery
- Documented RPO and RTO targets.
- Restore drill completed within the past quarter.
- Secrets manager backups verified.
See the per-page details: Security, Backup and restore, Monitoring, Upgrades and migrations.
## Reverse proxy patterns
Most production deployments terminate TLS at a proxy and forward HTTP to the workspace. A minimal NGINX server block:
```nginx
server {
    listen 443 ssl http2;
    server_name workspace.example.com;

    ssl_certificate     /etc/letsencrypt/live/workspace.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/workspace.example.com/privkey.pem;

    add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
    client_max_body_size 100m;
    proxy_read_timeout   300s;

    location / {
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host              $host;
        proxy_set_header X-Real-IP         $remote_addr;
        proxy_set_header X-Forwarded-For   $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```
Set `MSK_PUBLIC_ADDRESS=https://workspace.example.com` on the workspace so generated links use the proxy's hostname.
## Rolling out a change
Recommended sequence for a non-trivial production change (image upgrade, schema migration, new SSO provider, …):
1. Take a backup of the graph volume.
2. Apply the change in staging; walk the post-restore validation checklist in Backup and restore.
3. Promote to production during a low-traffic window.
4. Watch Monitoring for 30 minutes after the rollout.
5. Be prepared to roll back: revert the image tag (and configuration), restart, and restore the backup if data shape changed.
## Next steps
- Observe health and performance: Monitoring
- Set permission boundaries early: Permissions
- Plan for failure: Backup and restore, Troubleshooting
- Run the right manifest for your platform: Installation