# Security
This page sets the operational security baseline for a Curiosity Workspace deployment. Treat it as the minimum bar; layer your organization's specific policies on top.
## Security domains
| Domain | What it covers | Where to configure |
|---|---|---|
| Identity & auth | How users sign in, session lifetime, MFA | SSO providers |
| Authorization | What users can see and do once signed in | Permissions, Access Control Model |
| Data protection | Encryption in transit and at rest, secrets handling | This page + Configuration reference |
| Operational security | Patching, image hygiene, change control | Deployment, Upgrades |
| AI data handling | What leaves the workspace and reaches a model provider | This page + LLM Configuration |
| Audit & monitoring | Who did what, when | Monitoring |
## Baseline practices
- TLS everywhere. Terminate TLS at a proxy or in-container; enable HSTS (`MSK_USE_HSTS=true`) and HTTP→HTTPS redirect (`MSK_REDIRECT_TO_HTTPS=true`).
- No default credentials. Always set `MSK_ADMIN_PASSWORD` on first boot. After SSO is wired up, disable the local `admin` account.
- Stable JWT key. Set `MSK_JWT_KEY` explicitly so tokens survive container restarts; rotate it deliberately, not accidentally.
- At-rest encryption. Set `MSK_GRAPH_MASTER_KEY` and back it up. Without it, encrypted properties cannot be decrypted after a restore.
- Secret manager for every `MSK_*` secret and every model-provider key (AWS Secrets Manager, Azure Key Vault, GCP Secret Manager, Vault).
- Least privilege on tokens. Use scoped API tokens for connectors and endpoint tokens for external systems. Never share admin tokens.
- Separate environments. Dev / staging / prod with distinct secrets and distinct JWT keys.
- Stronger auth for admins. Require SSO with MFA for any account that has admin permissions.
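Taken together, the practices above reduce to a handful of environment variables. A sketch of the resulting configuration as an env-file fragment (values are placeholders; real secrets belong in your secret manager and should be injected at deploy time, never committed to a file like this):

```dotenv
# TLS hardening behind the proxy
MSK_USE_HSTS=true
MSK_REDIRECT_TO_HTTPS=true

# Set explicitly on first boot; disable the local admin once SSO works
MSK_ADMIN_PASSWORD=<from-secret-manager>

# Stable signing key so tokens survive restarts; rotate deliberately
MSK_JWT_KEY=<from-secret-manager>

# At-rest encryption key; back it up or restores are unreadable
MSK_GRAPH_MASTER_KEY=<from-secret-manager>
```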
## Network posture
- The workspace listens on a single HTTP port (`MSK_PORT`, default `8080`). Terminate TLS in front of it.
- Bind to `127.0.0.1` in local development and to the proxy network in production. Never bind directly to a public interface without an authenticating proxy.
- Set `MSK_PUBLIC_ADDRESS` to the user-facing URL so generated links (SSO callbacks, email links) are correct.
- Egress reaches your LLM/embedding provider. If your network policy requires it, route through `MSK_HTTP_PROXY` and document the allowlist.
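One common way to terminate TLS in front of the workspace is an nginx reverse proxy. A minimal sketch, assuming nginx with placeholder hostname and certificate paths (adapt to whatever proxy your platform standardizes on):

```nginx
server {
    listen 443 ssl;
    server_name workspace.example.com;  # should match MSK_PUBLIC_ADDRESS

    ssl_certificate     /etc/nginx/tls/workspace.crt;  # placeholder paths
    ssl_certificate_key /etc/nginx/tls/workspace.key;

    location / {
        # Workspace bound to the proxy network only, on MSK_PORT (default 8080)
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-Proto https;
    }
}
```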
## Data protection
- In transit: TLS to the workspace and to model providers (the workspace uses HTTPS when calling external APIs).
- At rest: graph storage is encrypted with `MSK_GRAPH_MASTER_KEY`. Disk-level encryption (LUKS, EBS encryption, Azure Disk encryption, GCP CMEK) is recommended on top.
- Backups: encrypt at the storage layer; replicate to a separate failure domain. See Backup and restore.
- Sensitive properties: mark properties that must be encrypted on disk with the SDK's encryption attribute (where supported by your schema).
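Because a restore is unreadable without the original `MSK_GRAPH_MASTER_KEY`, it can be worth storing a fingerprint of the key alongside each backup so the wrong key is caught before a restore attempt, not after. A minimal sketch with hypothetical helpers (not part of the product):

```python
import hashlib
import hmac

def key_fingerprint(master_key: str) -> str:
    """SHA-256 fingerprint of the key; safe to store next to a backup."""
    return hashlib.sha256(master_key.encode()).hexdigest()[:16]

def check_key(master_key: str, stored_fingerprint: str) -> bool:
    """Constant-time comparison before attempting a restore."""
    return hmac.compare_digest(key_fingerprint(master_key), stored_fingerprint)

# Recorded when the backup was taken:
fp = key_fingerprint("correct-key")

assert check_key("correct-key", fp)    # right key: proceed with restore
assert not check_key("wrong-key", fp)  # wrong key: stop before restoring
```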
## Token discipline
The three token types and their use cases are documented in Token scopes. Two rules that fail closed:
- Connectors get `ingestion` tokens. Never grant `admin` to a connector token.
- External systems get endpoint tokens. Path-scoped tokens have a much smaller blast radius than API tokens.

Rotate tokens on a schedule. Rotating `MSK_JWT_KEY` invalidates every outstanding token at once; use this only when responding to a key-material compromise.
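The scoping rule and the rotation behavior follow from how signed tokens work in general. A hedged sketch of the mechanism (illustrative HMAC token, not the workspace's actual token format): the scope is baked into the signed payload, so a connector token can never be upgraded to admin, and changing the signing key invalidates everything at once.

```python
import base64, hashlib, hmac, json

JWT_KEY = b"example-signing-key"  # stands in for MSK_JWT_KEY

def issue_token(subject: str, scope: str) -> str:
    payload = base64.urlsafe_b64encode(
        json.dumps({"sub": subject, "scope": scope}).encode())
    sig = hmac.new(JWT_KEY, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + sig

def verify(token: str, required_scope: str) -> bool:
    payload, sig = token.rsplit(".", 1)
    expected = hmac.new(JWT_KEY, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return False  # tampered, or signed with a since-rotated key
    claims = json.loads(base64.urlsafe_b64decode(payload))
    return claims["scope"] == required_scope

connector = issue_token("crm-connector", "ingestion")
assert verify(connector, "ingestion")  # can ingest
assert not verify(connector, "admin")  # cannot administer
```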
## AI / model-provider data handling
Every call to an embedding or chat provider sends a payload to that provider. The choices that matter:
- Which fields are embedded. Configured under Settings → AI Settings → Embeddings. Don't embed fields that shouldn't leave the workspace.
- Which provider. Hosted (OpenAI, Anthropic), regional (Azure OpenAI), or self-hosted (a local OpenAI-compatible server). Self-hosted gives you the strongest data-residency guarantees.
- Provider data policies. Confirm in writing whether your provider trains on the data you send. Most enterprise tiers commit to no training.
The LLM Configuration page documents per-provider setup. For deeper context on what gets sent, see Multimodal Search (for OCR/STT payloads).
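The "which fields are embedded" decision can be enforced in code as well as in settings: strip everything not explicitly allowlisted before any payload leaves the workspace. A sketch under assumed field names (the record shape and allowlist are hypothetical):

```python
# Only these fields may leave the workspace for embedding.
EMBED_ALLOWLIST = {"title", "summary", "tags"}

def embedding_payload(record: dict) -> dict:
    """Drop every field not explicitly allowlisted before calling the provider."""
    return {k: v for k, v in record.items() if k in EMBED_ALLOWLIST}

doc = {
    "title": "Q3 report",
    "summary": "Revenue up 4%",
    "ssn": "000-00-0000",       # must never reach a model provider
    "internal_notes": "draft",
}
assert embedding_payload(doc) == {"title": "Q3 report", "summary": "Revenue up 4%"}
```

An allowlist fails closed: a newly added sensitive field is excluded by default, whereas a blocklist would silently leak it.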
## Endpoint security
Custom endpoints accept user input and run inside the workspace process. Defensive checklist:
- Validate every input at the top of the endpoint (`Body.FromJson<T>()` only gets you typing; you still need to enforce limits, allowlists, and required fields).
- Use `CreateSearchAsUserAsync(..., CurrentUser, ...)` instead of the system-scoped variant unless you're knowingly exposing data the caller shouldn't normally see.
- Avoid SSRF: if an endpoint makes outbound HTTP calls, allowlist destinations.
- Set request and response size limits. Don't return whole documents when a summary will do.
- Return safe errors. Surface a `traceId` and a generic message to the caller; log the full exception on the server side.
See Custom Endpoints.
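The checklist compresses into one pattern: validate at the top, do the scoped work, and on failure return a `traceId` with a generic message while logging the detail server-side. A language-agnostic sketch in Python (the handler, limits, and field names are illustrative, not the workspace SDK):

```python
import json, logging, uuid

MAX_BODY = 64 * 1024                    # request size limit (illustrative)
ALLOWED_KINDS = {"invoice", "contract"} # input allowlist (illustrative)

def handle(body: str) -> dict:
    trace_id = str(uuid.uuid4())
    try:
        if len(body) > MAX_BODY:
            raise ValueError("body too large")
        req = json.loads(body)                    # parsing gives typing only...
        if req.get("kind") not in ALLOWED_KINDS:  # ...limits are enforced here
            raise ValueError("unsupported kind")
        return {"ok": True, "kind": req["kind"]}
    except Exception:
        # Full exception stays server-side; caller sees a generic message.
        logging.exception("endpoint failed traceId=%s", trace_id)
        return {"ok": False, "error": "request failed", "traceId": trace_id}

assert handle('{"kind": "invoice"}')["ok"]
assert "traceId" in handle('{"kind": "salary"}')  # rejected, safe error out
```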
## AI tool security
Tools run within the user's security context and can call out to the graph and search engines. Two specific rules:
- Scope the tool to a single, clear intent. Vague tools are picked at the wrong times and end up exposing more than intended.
- Use `scope.CurrentUser` for retrieval. Permission-aware retrieval is the only thing standing between a tool call and a data leak.
See AI Tools and LLM Agents.
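To make the second rule concrete, here is a toy model of permission-aware retrieval (not the workspace SDK; `DOCS`, the ACL shape, and `search_as_user` are all hypothetical). The point is that the user's identity constrains the result set before anything reaches the model:

```python
DOCS = [
    {"id": 1, "text": "public roadmap", "acl": {"alice", "bob"}},
    {"id": 2, "text": "board minutes",  "acl": {"alice"}},
]

def search_as_user(query: str, user: str) -> list:
    """Permission-aware retrieval: filter by ACL before matching, not after."""
    return [d for d in DOCS if user in d["acl"] and query in d["text"]]

# A tool running in bob's context never sees documents bob cannot read.
assert [d["id"] for d in search_as_user("minutes", "bob")] == []
assert [d["id"] for d in search_as_user("minutes", "alice")] == [2]
```

A system-scoped search with post-hoc filtering would be one forgotten filter away from a leak; scoping the query itself fails closed.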
## Audit and incident response
- Audit log forwarded to your SIEM (see Monitoring).
- Documented response plan for: token compromise, key compromise (`MSK_JWT_KEY`, `MSK_GRAPH_MASTER_KEY`), unauthorized admin access, model-provider data leak.
- Quarterly restore drill (see Backup and restore): a recovery you've never tested is not a recovery.
## Next steps
- Configure authorization boundaries: Permissions and Access Control Model (deep dive).
- Set up operational controls: Deployment.
- Pick the right token type for each integration: Token scopes.