Backup and restore
A Curiosity Workspace deployment has three things you need to be able to restore from cold storage:
- The graph: the typed nodes, edges, and indexes that live under MSK_GRAPH_STORAGE. This is the database.
- The workspace configuration: search indexes, NLP pipelines, embedding/LLM provider settings, SSO config, scheduled tasks, custom endpoints, AI tools. Stored inside the graph, so it's covered by the graph backup.
- The secrets: MSK_ADMIN_PASSWORD, MSK_JWT_KEY, MSK_LICENSE, MSK_GRAPH_MASTER_KEY, model-provider API keys, certificate material. Stored outside the graph in your secret manager.
You need a working copy of all three to restore a Workspace from scratch.
Strategy at a glance
| Tier | Target RPO | Target RTO | Approach |
|---|---|---|---|
| Local dev | 24h | best-effort | Periodic tar of MSK_GRAPH_STORAGE. |
| Staging | 24h | 1h | Daily volume snapshot + secrets in a secret manager. |
| Production | 1h | 15 min | Hourly snapshot + 15-min journal sync + redundant secrets manager + tested restore drill. |
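One way to keep an RPO target honest is to alert when the newest backup is older than the target. The sketch below is a minimal check, not part of the product: the function name `backup_within_rpo` and the idea of scanning a local backup directory are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if at least one file under the backup
# directory was modified within the last <rpo_minutes> minutes.
backup_within_rpo() {
  local dir="$1" rpo_minutes="$2"
  if [ ! -d "$dir" ]; then
    echo "backup directory not found: $dir" >&2
    return 2
  fi
  # find prints every regular file newer than the RPO window;
  # grep -q . turns "printed at least one line" into the exit status.
  find "$dir" -type f -mmin "-$rpo_minutes" | grep -q .
}
```

For the production tier in the table this could run from cron as `backup_within_rpo /srv/backups/curiosity 60 || <raise an alert>`, with the path and the alerting command adapted to your environment.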
What to back up
The graph (MSK_GRAPH_STORAGE)
Snapshot the directory MSK_GRAPH_STORAGE points at. The graph supports lock-free reads, so a snapshot taken with a CSI volume snapshot, an EBS snapshot, or a filesystem-level snapshot (LVM, ZFS, Btrfs) is consistent.
If your platform doesn't support snapshots, the workspace can write a consistent point-in-time backup to MSK_GRAPH_BACKUP_FOLDER:
- Set MSK_GRAPH_BACKUP_FOLDER to a path that's mounted to durable storage (a separate volume, an S3 bucket via a filesystem driver, etc.).
- Schedule a backup task under Settings → Tasks with type Backup. Recommended frequency: hourly for production.
- Copy the resulting backup files off-host on a schedule.
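The off-host copy in the last step can be sketched as follows. This assumes the backup task writes ordinary files directly into MSK_GRAPH_BACKUP_FOLDER and that the off-host target is reachable as a mounted directory; both are assumptions, and the function name `copy_new_backups` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy backup files that are not yet present in the
# destination. Files that already exist there are never overwritten.
copy_new_backups() {
  local src="$1" dest="$2" f name
  mkdir -p "$dest" || return 1
  # -mmin +1 skips files modified very recently (roughly the last minute or
  # two), since those may still be in the middle of being written.
  find "$src" -maxdepth 1 -type f -mmin +1 -print | while IFS= read -r f; do
    name=$(basename "$f")
    [ -e "$dest/$name" ] && continue
    # Copy under a temporary name, then rename, so a reader of the
    # destination never sees a half-copied file under its final name.
    cp "$f" "$dest/.$name.partial" && mv "$dest/.$name.partial" "$dest/$name"
  done
}
```

If the target is object storage rather than a mounted path, `aws s3 sync` against the backup folder serves the same purpose.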
The journal (MSK_GRAPH_JOURNAL_FOLDER)
If set, the journal contains the write log used to recover from crashes. Including it in your backup tightens your RPO between snapshots — restore can replay journal entries created after the last snapshot.
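A minimal sketch of including the journal in a backup run is shown below. The function name `archive_journal` is hypothetical. Whether a journal copied while the workspace is actively writing can be replayed safely is not stated on this page, so treat that as something to confirm in a restore drill.

```shell
#!/usr/bin/env bash
# Hypothetical helper: archive the journal folder into the backup directory
# as journal.tar.gz, keeping the folder name as the top-level entry.
archive_journal() {
  local journal_dir="$1" dest_dir="$2"
  if [ ! -d "$journal_dir" ]; then
    echo "journal folder not found: $journal_dir" >&2
    return 1
  fi
  mkdir -p "$dest_dir" || return 1
  tar -C "$(dirname "$journal_dir")" -czf "$dest_dir/journal.tar.gz" \
    "$(basename "$journal_dir")"
}
```

Running it after the graph snapshot, for example `archive_journal "$MSK_GRAPH_JOURNAL_FOLDER" "$DEST"`, means the archived journal covers writes made since that snapshot.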
Secrets and configuration
The graph backup includes most workspace configuration. Outside of the graph, back up:
- All MSK_* secrets: MSK_ADMIN_PASSWORD, MSK_JWT_KEY, MSK_LICENSE, MSK_GRAPH_MASTER_KEY, and any provider-side secrets you reference via *_FILE variables.
- TLS material if certificates are mounted into the container (MSK_CERT_FILE, MSK_CERT_FILE_PRIVATE_KEY).
- The Docker/Helm/Compose manifest that runs the workspace, so the restored environment looks the same.
- Custom interface and connector source code — these live in your own git repositories; make sure those are mirrored.
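Before relying on a backup run or a restore, it can help to confirm that every secret named above is actually available. The sketch below checks exported environment variables only; the function name `check_secrets` is hypothetical, and it deliberately prints variable names, never values.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which of the core MSK_* secrets are missing
# or empty in the current environment. Returns non-zero if any is missing.
check_secrets() {
  local name missing=0
  for name in MSK_ADMIN_PASSWORD MSK_JWT_KEY MSK_LICENSE MSK_GRAPH_MASTER_KEY; do
    # printenv prints nothing for an unset variable, so an empty result
    # covers both "unset" and "set to an empty string".
    if [ -z "$(printenv "$name")" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Secrets supplied through *_FILE variables or TLS files would need a separate file-existence check, which this sketch does not attempt.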
Daily backup procedure (containerized)
#!/usr/bin/env bash
set -euo pipefail
TS=$(date -u +%Y-%m-%dT%H%M%SZ)
DEST=/srv/backups/curiosity/$TS
mkdir -p "$DEST"
# 1) Quiesce nothing — graph snapshots are lock-free, but if you have heavy
# write activity you may want to pause ingestion for the duration.
docker exec curiosity sync || true
# 2) Snapshot
tar -C /srv/curiosity -czf "$DEST/graph.tar.gz" curiosity
# 3) Capture secrets from the secret manager
vault kv get -format=json secret/curiosity > "$DEST/secrets.json"
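# 4) Verify integrity. Each check is its own statement: under `set -e`, a
# failure on the left side of `&&` would not abort the script.
gzip -t "$DEST/graph.tar.gz"
test -s "$DEST/secrets.json"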
# 5) Off-host
aws s3 cp --recursive "$DEST" "s3://my-curiosity-backups/$TS/"
For Kubernetes deployments, the equivalent is a VolumeSnapshot plus a Secret snapshot. For cloud-managed disks (EBS, Azure Disk, Persistent Disk), use the platform's snapshot API instead of tar.
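For the Kubernetes case, a VolumeSnapshot can be rendered and applied from a script. The sketch below only prints the manifest; the PVC name and VolumeSnapshotClass name passed to it are assumptions that depend on your cluster and CSI driver, and the function name `render_volume_snapshot` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a VolumeSnapshot manifest for the PVC that
# backs MSK_GRAPH_STORAGE, with a UTC timestamp in the snapshot name.
render_volume_snapshot() {
  local pvc="$1" snapshot_class="$2" ts
  ts=$(date -u +%Y%m%d-%H%M%S)
  cat <<EOF
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: ${pvc}-${ts}
spec:
  volumeSnapshotClassName: ${snapshot_class}
  source:
    persistentVolumeClaimName: ${pvc}
EOF
}
```

A possible invocation is `render_volume_snapshot curiosity-graph csi-snapclass | kubectl apply -n curiosity -f -`, where the PVC, class, and namespace names are placeholders.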
Restore procedure
You're restoring three things in this order: secrets, graph storage, then the running workspace.
- Provision secrets in the destination secret manager, matching the names referenced by your manifest.
- Restore the graph volume:
    mkdir -p /srv/curiosity
    tar -C /srv/curiosity -xzf /srv/backups/curiosity/<timestamp>/graph.tar.gz
- Start the workspace with the same MSK_GRAPH_STORAGE path and the same secrets:
    docker run --name curiosity \
      -p 127.0.0.1:8080:8080 \
      -v /srv/curiosity:/data \
      -e MSK_GRAPH_STORAGE=/data/curiosity \
      -e MSK_GRAPH_MASTER_KEY="$(vault kv get -field=master_key secret/curiosity)" \
      -e MSK_JWT_KEY="$(vault kv get -field=jwt_key secret/curiosity)" \
      -e MSK_ADMIN_PASSWORD="$(vault kv get -field=admin_password secret/curiosity)" \
      curiosityai/curiosity:<same-version-as-source>
- Wait for startup: the workspace replays journal entries before accepting traffic. Watch docker logs -f curiosity.
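Before extracting an archive over the graph volume, it is worth checking that the file is intact and has the layout the restore expects. The sketch below assumes the archive was produced with the graph directory as its single top-level entry (`curiosity` in the paths used on this page); the function name `verify_graph_archive` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that a graph archive is a valid gzip stream and
# that every entry sits under the expected top-level directory.
verify_graph_archive() {
  local archive="$1" top="${2:-curiosity}" roots
  if ! gzip -t "$archive" 2>/dev/null; then
    echo "corrupt or missing archive: $archive" >&2
    return 1
  fi
  # Reduce the listing to the distinct first path components.
  roots=$(tar -tzf "$archive" | awk -F/ '{print $1}' | sort -u)
  if [ "$roots" != "$top" ]; then
    echo "unexpected top-level entries in $archive: $roots" >&2
    return 1
  fi
}
```

Running `verify_graph_archive /srv/backups/curiosity/<timestamp>/graph.tar.gz` before the `tar -xzf` step avoids unpacking a damaged or mislabeled archive into the data path.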
Restore on the same Workspace version
Always restore onto the same container image version the backup was taken on, then upgrade afterward. Restoring across major versions is not supported.
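The same-version rule is easier to follow if each backup records the image it was taken on. The sketch below stores the image reference next to the backup and compares it at restore time; the file name `image.txt` and both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: record the source image at backup time and refuse a
# restore onto a different image.
record_image() {
  local image="$1" backup_dir="$2"
  printf '%s\n' "$image" > "$backup_dir/image.txt"
}

check_image() {
  local backup_dir="$1" target_image="$2" recorded
  if ! recorded=$(cat "$backup_dir/image.txt" 2>/dev/null); then
    echo "no image.txt in $backup_dir" >&2
    return 2
  fi
  if [ "$recorded" != "$target_image" ]; then
    echo "backup was taken on $recorded, not $target_image" >&2
    return 1
  fi
}
```

At backup time the image reference can be read from the running container, for example `record_image "$(docker inspect --format '{{.Config.Image}}' curiosity)" "$DEST"`.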
Validating a restore
After restore, walk this checklist before declaring the environment ready:
- Sign in works with the admin account from the backup.
- Users and teams are present under Settings → Accounts.
- Node counts match the source for each major type:
    return Q().StartAt("Ticket").Count();
- Search returns results for a smoke query you know should match.
- Vector search returns results — embeddings survived the snapshot.
- SSO works — sign in via each configured provider.
- Scheduled tasks are present and enabled at the expected cadence.
- Custom endpoints compile and respond on POST /api/endpoints/run/<name>.
- AI tools respond inside the chat view.
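The node-count item on this checklist is simple to automate once the counts are captured as text. The sketch below assumes you have saved one `Type count` pair per line for the source and for the restored workspace, for example by running the count query for each major type. That file format and the function name `compare_counts` are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare per-type node counts captured before backup
# with counts captured after restore. Line order does not matter.
compare_counts() {
  local source_counts="$1" restored_counts="$2"
  if [ ! -s "$source_counts" ]; then
    echo "no source counts in $source_counts" >&2
    return 2
  fi
  awk '
    NR == FNR { want[$1] = $2; next }   # first file: counts from the source
    { got[$1] = $2 }                    # second file: counts after restore
    END {
      bad = 0
      for (t in want) {
        if (!(t in got)) {
          printf "%s: missing after restore\n", t; bad = 1
        } else if (got[t] != want[t]) {
          printf "%s: source=%s restored=%s\n", t, want[t], got[t]; bad = 1
        }
      }
      exit bad
    }' "$source_counts" "$restored_counts"
}
```

The function prints one line per mismatching type and returns non-zero if any type differs, so it can gate the rest of a drill script.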
Restore drills
A restore you've never tested is not a restore. We recommend a quarterly drill for production deployments:
- Spin up an isolated staging cluster.
- Restore the latest backup into it.
- Walk the validation checklist.
- Tear it down.
A documented, dated drill — even one page — is what auditors will ask for.
Cross-environment migration
The same procedure works for promoting data from staging to production, or for cloning production into a sandbox. Two caveats:
- Re-encrypt secrets for the destination environment. Don't share MSK_JWT_KEY between environments, because it would let a session token from one work in the other.
- Strip personally identifiable information before cloning production into shared dev environments. The graph has no built-in PII redaction; you can run a one-off cleanup endpoint after the restore.
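Giving each environment its own MSK_JWT_KEY can be as simple as generating a fresh random value during the migration. The format the workspace requires for this key is not stated on this page, so the sketch below assumes a base64-encoded 32-byte random string is acceptable. The function name `new_jwt_key` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a fresh random key, 32 bytes from the kernel
# random source, base64-encoded on a single line.
new_jwt_key() {
  head -c 32 /dev/urandom | base64 | tr -d '\n'
}
```

The value would then be stored under the destination environment's own secret path rather than copied from the source. Expect existing sessions in the destination to be invalidated when the key changes.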
See also
- Upgrades and migrations — the procedure for moving between Workspace versions.
- Reindexing and re-embedding — when you need to rebuild indexes after a restore.
- Configuration reference — every variable referenced on this page.