# Deploying on Kubernetes
Kubernetes is the recommended platform for production Curiosity Workspace deployments. The image is the same one used in Docker; what changes is how persistence, secrets, and traffic ingress are wired up.
This page walks through a complete StatefulSet-based deployment with TLS at an Ingress, plus options for Helm and Kustomize.
## Why a StatefulSet
Workspace is a stateful, single-writer application. It holds the graph in memory and persists it to disk under `MSK_GRAPH_STORAGE`. A StatefulSet gives you:

- a stable pod name (`curiosity-0`) you can reference in backups and runbooks;
- a `PersistentVolumeClaim` that follows the pod across reschedules;
- ordered, controlled restarts on updates.

A Deployment with a `ReadWriteOnce` PVC also works for a single replica, but a StatefulSet is the convention.
## Prerequisites
- A working Kubernetes cluster (`kubectl` context configured).
- A storage class that supports `ReadWriteOnce` block storage. AWS EBS, Azure Disk, and GCP Persistent Disk all work.
- An Ingress controller (NGINX, Traefik, ALB, AGIC, etc.) and a way to terminate TLS in front of the workspace.
- A secret store (sealed secrets, External Secrets Operator, Vault Agent, AWS Secrets Manager integration, …).
- 16+ GB RAM available on the target node, ideally with anti-affinity rules to keep the pod off noisy neighbors.
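The anti-affinity hint above can be sketched as a pod-spec fragment. This is a sketch, not part of the manifest below; the `tier: batch` label is a hypothetical marker for noisy workloads, so adjust the selector to match how your own workloads are labeled.

```yaml
# Sketch: nudge the scheduler to keep the workspace pod off nodes that
# already run pods labeled tier=batch (hypothetical label for noisy jobs).
# Goes under .spec.template.spec of the StatefulSet.
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
              - key: tier
                operator: In
                values: [batch]
          topologyKey: kubernetes.io/hostname
```

`preferred…` keeps scheduling soft: if no other node fits, the pod still runs rather than staying Pending.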
## Complete manifest
This manifest deploys one Workspace pod with persistent storage, a Service, an Ingress with TLS, secrets, healthchecks, and resource requests. Save it as `curiosity.yaml` and apply with `kubectl apply -f`.
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: curiosity
---
apiVersion: v1
kind: Secret
metadata:
  name: curiosity-secrets
  namespace: curiosity
type: Opaque
stringData:
  MSK_ADMIN_PASSWORD: <generate-with-secret-store>
  MSK_JWT_KEY: <32+-bytes-from-secret-store>
  MSK_GRAPH_MASTER_KEY: <32+-bytes-from-secret-store>
  MSK_LICENSE: <your-license-token>
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: curiosity-config
  namespace: curiosity
data:
  MSK_GRAPH_STORAGE: "/data/curiosity"
  MSK_GRAPH_BACKUP_FOLDER: "/data/backups"
  MSK_GRAPH_JOURNAL_FOLDER: "/data/journal"
  MSK_PUBLIC_ADDRESS: "https://workspace.example.com"
  MSK_USE_HSTS: "true"
  MSK_REDIRECT_TO_HTTPS: "false" # TLS is terminated at the Ingress
---
apiVersion: v1
kind: Service
metadata:
  name: curiosity
  namespace: curiosity
spec:
  selector:
    app: curiosity
  ports:
    - name: http
      port: 8080
      targetPort: 8080
  clusterIP: None # headless: gives the pod a stable DNS name
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: curiosity
  namespace: curiosity
spec:
  serviceName: curiosity
  replicas: 1
  podManagementPolicy: OrderedReady
  updateStrategy:
    type: OnDelete # require an explicit delete to roll the pod
  selector:
    matchLabels:
      app: curiosity
  template:
    metadata:
      labels:
        app: curiosity
    spec:
      terminationGracePeriodSeconds: 120
      securityContext:
        runAsUser: 10000
        runAsGroup: 10000
        fsGroup: 10000
        runAsNonRoot: true
      containers:
        - name: curiosity
          image: curiosityai/curiosity:v1.42.0
          imagePullPolicy: IfNotPresent
          ports:
            - name: http
              containerPort: 8080
          envFrom:
            - configMapRef:
                name: curiosity-config
            - secretRef:
                name: curiosity-secrets
          volumeMounts:
            - name: data
              mountPath: /data
          resources:
            requests:
              cpu: "4"
              memory: 16Gi
            limits:
              cpu: "8"
              memory: 32Gi
          readinessProbe:
            httpGet:
              path: /api/login/check
              port: http
            initialDelaySeconds: 30
            periodSeconds: 10
            failureThreshold: 6
          livenessProbe:
            httpGet:
              path: /api/login/check
              port: http
            initialDelaySeconds: 120
            periodSeconds: 30
            failureThreshold: 5
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: [ReadWriteOnce]
        storageClassName: gp3 # change per platform
        resources:
          requests:
            storage: 200Gi
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: curiosity
  namespace: curiosity
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/proxy-body-size: "100m"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
spec:
  ingressClassName: nginx
  tls:
    - hosts: [workspace.example.com]
      secretName: curiosity-tls
  rules:
    - host: workspace.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: curiosity
                port:
                  number: 8080
```
Apply, then watch readiness:
```bash
kubectl apply -f curiosity.yaml
kubectl -n curiosity rollout status statefulset/curiosity
kubectl -n curiosity get pods,svc,ingress
```
## Tuning per platform
| Platform | Recommended `storageClassName` | Notes |
|---|---|---|
| AWS EKS (EC2 nodes) | `gp3` (EBS CSI) | Provision the EBS CSI driver. See AWS. |
| AWS EKS (Fargate) | — | Fargate has no EBS — fall back to EFS (`ReadWriteMany`) and a Deployment. |
| Azure AKS | `managed-csi` | Default storage class works. See Azure. |
| Google GKE | `premium-rwo` or `standard-rwo` | Use the regional disk classes for HA. See GCP. |
| OpenShift | `ocs-storagecluster-ceph-rbd` (or platform default) | Add an appropriate SecurityContextConstraints object. See OpenShift. |
## Helm and Kustomize
Curiosity does not yet ship an official Helm chart. Two community-friendly approaches:
- **Kustomize** — keep the manifest above as a base, and overlay per environment with `kustomization.yaml` patches that change `image:`, `replicas:`, secrets, and resource limits.
- **Helm** — wrap the manifest in a chart of your own. The variables that change between environments are: image tag, storage class and size, ingress hostname, secret references, and resource requests/limits.
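A minimal Kustomize overlay along these lines might look as follows. The directory layout and the `v1.43.0` tag are assumptions for illustration, not files shipped by Curiosity:

```yaml
# overlays/prod/kustomization.yaml (hypothetical layout)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base            # the manifest above, saved under base/
images:
  - name: curiosityai/curiosity
    newTag: v1.43.0       # pin this environment's image tag here
patches:
  - target:
      kind: StatefulSet
      name: curiosity
    patch: |-
      - op: replace
        path: /spec/template/spec/containers/0/resources/limits/memory
        value: 64Gi
```

Render with `kubectl kustomize overlays/prod` to inspect, or `kubectl apply -k overlays/prod` to deploy.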
## Operations
- **Backups:** snapshot the PVC via your CSI's `VolumeSnapshot`, or rely on `MSK_GRAPH_BACKUP_FOLDER` written to a separate PVC that you can replicate off-cluster. See Backup and restore.
- **Logs:** the container writes to stdout/stderr; collect via your platform's aggregator (CloudWatch, Stackdriver, ELK).
- **Metrics:** per-endpoint and per-tool metrics are available via `/api/endpoints/metrics` and `/api/chatai/tools/metrics` (admin-scoped). See Monitoring.
- **Scaling:** the Workspace runs as a single writer per graph. Scale up the pod (more CPU/RAM) before scaling out.
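As a sketch of the snapshot option, assuming your CSI driver exposes a `VolumeSnapshotClass` named `csi-snapclass` (a placeholder; list yours with `kubectl get volumesnapshotclass`):

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: curiosity-manual-snapshot
  namespace: curiosity
spec:
  volumeSnapshotClassName: csi-snapclass   # placeholder: use your class
  source:
    # PVC created by the volumeClaimTemplate: <template name>-<pod name>
    persistentVolumeClaimName: data-curiosity-0
```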
## Upgrades
Roll the pod by setting a new image tag and deleting the existing pod (because `updateStrategy` is `OnDelete`):
```bash
kubectl -n curiosity set image statefulset/curiosity \
  curiosity=curiosityai/curiosity:v1.43.0
kubectl -n curiosity delete pod curiosity-0
```
The new pod starts, attaches the existing PVC, and replays its journal before becoming ready. Always take a snapshot first and follow Upgrades and migrations.
## Common pitfalls
- **`ReadWriteMany` storage with this manifest** — the workspace is a single writer. Pin to `ReadWriteOnce` block storage unless you're explicitly running on a shared filesystem like EFS, and even then use a `Deployment` with 1 replica.
- **PVC that's too small** — graph storage grows with data + indexes + embeddings. Resize the PVC ahead of capacity events; expanding a PVC online requires the storage class to allow it (`allowVolumeExpansion: true`).
- **Missing `MSK_PUBLIC_ADDRESS`** — when behind an Ingress, generated links (SSO callbacks, email links) will be wrong without this set.
- **Aggressive liveness probe** — boot can take a minute or two on cold caches. Keep `initialDelaySeconds` generous (120s) to avoid restart loops.
- **`RollingUpdate` with one replica** — the new pod can't attach to the PVC while the old one holds it. Use `OnDelete` or set `maxUnavailable: 1`.
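For the PVC-resize pitfall, a storage class that permits online expansion looks roughly like this sketch (shown for the AWS EBS CSI driver; the provisioner and parameters vary per platform):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
allowVolumeExpansion: true   # without this, PVC resize requests are rejected
```

With expansion allowed, growing the volume is just an edit to the PVC's `spec.resources.requests.storage`.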
## See also
- Configuration reference — every `MSK_*` variable in the manifest.
- Deployment — production readiness across all platforms.
- Backup and restore — VolumeSnapshot patterns.
- Troubleshooting — when something doesn't come up.