Curiosity

AWS S3

Ingests objects from one or more S3 buckets and prefixes. Each object becomes a _FileEntry with a _Blob; bucket prefixes are mapped onto the workspace folder tree.

Cloud storage · API key

What gets ingested

| Element | Mapped to |
| --- | --- |
| S3 object | _FileEntry + _Blob |
| Object key prefix (a/b/c/file.pdf) | Folder hierarchy of _Folder nodes linked by _HasChild |
| Object metadata (size, ETag, last-modified) | Properties on _FileEntry |
| Object body | _Blob, extracted for text, OCR, and tables in the background |
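
The prefix-to-folder mapping can be illustrated with a short sketch. It is not the connector's code: the function names are hypothetical, plain strings stand in for _Folder and _FileEntry nodes, and it assumes a non-empty key that does not end in "/".

```python
from typing import List, Tuple


def folder_chain(key: str) -> Tuple[List[str], str]:
    """Split an S3 object key into cumulative folder paths and a file name."""
    # Drop empty segments produced by a leading "/" or a doubled "//".
    parts = [p for p in key.split("/") if p]
    *folders, filename = parts
    paths = ["/".join(folders[: i + 1]) for i in range(len(folders))]
    return paths, filename


def has_child_edges(key: str) -> List[Tuple[str, str]]:
    """Return (parent, child) pairs standing in for _HasChild links."""
    paths, filename = folder_chain(key)
    nodes = [""] + paths  # "" stands for the root of the folder tree
    edges = list(zip(nodes, nodes[1:]))
    leaf = f"{nodes[-1]}/{filename}" if nodes[-1] else filename
    edges.append((nodes[-1], leaf))
    return edges


for parent, child in has_child_edges("a/b/c/file.pdf"):
    print(repr(parent), "->", repr(child))
```

For the key a/b/c/file.pdf this yields three nested folders (a, a/b, a/b/c) and one file entry under the deepest folder.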

The connector skips re-ingesting an object whose size has not changed since the last sync.

Authentication

  • Type: AWS Access Key ID + Secret Access Key (IAM user with s3:ListBucket and s3:GetObject).
  • The workspace stores the credentials encrypted; on each run the connector creates one S3 client per bucket.

flowchart LR
    User([Admin]) -->|paste credentials| WS[Workspace]
    WS -->|stores encrypted| Vault[(Secrets)]
    Vault -->|sign requests| C[S3 client]
    C -->|ListObjectsV2| Bucket[(S3 bucket)]
    Bucket --> C
    C --> Graph[(Workspace graph)]
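
The two permissions named above can be granted with a narrow IAM policy. The sketch below builds one as a Python dictionary and prints it as JSON. The bucket name and prefix are placeholders, the function name is hypothetical, and the prefix condition is an optional tightening that the connector may not require.

```python
import json


def read_only_policy(bucket: str, prefix: str = "") -> dict:
    """Build a least-privilege IAM policy for listing and reading one bucket."""
    # s3:ListBucket is granted on the bucket ARN itself.
    list_stmt = {
        "Effect": "Allow",
        "Action": "s3:ListBucket",
        "Resource": f"arn:aws:s3:::{bucket}",
    }
    if prefix:
        # Optionally restrict listing to keys under the given prefix.
        list_stmt["Condition"] = {"StringLike": {"s3:prefix": [f"{prefix}*"]}}
    # s3:GetObject is granted on object ARNs (bucket/key), not on the bucket.
    get_stmt = {
        "Effect": "Allow",
        "Action": "s3:GetObject",
        "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
    }
    return {"Version": "2012-10-17", "Statement": [list_stmt, get_stmt]}


# "example-bucket" and "reports/" are placeholder values.
print(json.dumps(read_only_policy("example-bucket", "reports/"), indent=2))
```

Attach the resulting policy to the IAM user whose access key is pasted into the workspace.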

Access control mapping

| Source | Carried into the graph? |
| --- | --- |
| Per-object S3 ACLs | No. Not read by the connector. |
| Per-bucket policies | No. Connectors run as the configured IAM user. |
| Configured workspace users / access groups | Yes. The PermissionsSetting configured by the admin (users + access groups) is applied to every ingested file. |

If you need per-object permissions, model them outside S3 (e.g. via prefix-based access groups configured in Curiosity).
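
One way to plan a prefix-based model before configuring anything is to write the intended prefix-to-group rules down and check which group a given key would fall under. The sketch below is a planning aid only, not a Curiosity API. The rule table, group names and function name are all hypothetical, and it assumes the most specific (longest) matching prefix should win.

```python
from typing import Dict, List, Optional

# Hypothetical planning table: key prefix -> access groups that should see it.
PREFIX_GROUPS: Dict[str, List[str]] = {
    "finance/": ["Finance"],
    "finance/payroll/": ["Payroll"],
    "engineering/": ["Engineering"],
}


def groups_for_key(
    key: str, rules: Dict[str, List[str]] = PREFIX_GROUPS
) -> Optional[List[str]]:
    """Return the groups of the longest matching prefix, or None if none match."""
    matches = [prefix for prefix in rules if key.startswith(prefix)]
    if not matches:
        return None
    return rules[max(matches, key=len)]


print(groups_for_key("finance/payroll/2024-01.pdf"))
print(groups_for_key("finance/q1-report.pdf"))
print(groups_for_key("misc/notes.txt"))
```

Keys that match no rule return None, which flags prefixes that still need an owner before they are ingested.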

Sync cadence

  • Default cron: 0 23 * * * (daily at 23:00 UTC).
  • Incremental sync: size-based. Objects whose size matches the previously ingested version are skipped without re-downloading the body, so an edit that leaves the size unchanged is not picked up.
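
The size-based check can be sketched as a comparison between the current bucket listing and the sizes recorded at the last sync. This is a minimal illustration, not the connector's implementation. The function name and the in-memory dictionary of known sizes are assumptions.

```python
from typing import Dict, Iterable, List, Tuple


def objects_to_ingest(
    listing: Iterable[Tuple[str, int]], known_sizes: Dict[str, int]
) -> List[str]:
    """Return keys that are new or whose size differs from the last sync."""
    changed = []
    for key, size in listing:
        if known_sizes.get(key) == size:
            continue  # same size as last time: skip without downloading the body
        changed.append(key)
    return changed


# Sizes recorded by a previous run (illustrative values).
previous = {"a.pdf": 10, "b.pdf": 18}
# Current listing as (key, size) pairs, as a ListObjectsV2 call would report them.
current = [("a.pdf", 10), ("b.pdf", 20), ("c.pdf", 5)]
print(objects_to_ingest(current, previous))
```

Only the listing is needed to make the decision, which is why unchanged objects cost no download.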

Large buckets

For large buckets with frequent churn, partition by prefix and configure one S3 integration per prefix; each integration runs and commits independently.
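
To decide where to split, it can help to count objects per prefix from a key listing. The helper below is a hypothetical planning aid, not part of the connector. It assumes "/" as the delimiter and groups keys by their first `depth` folder segments.

```python
from collections import Counter
from typing import Dict, Iterable


def objects_per_prefix(keys: Iterable[str], depth: int = 1) -> Dict[str, int]:
    """Count objects under each prefix made of the first `depth` folder segments."""
    counts: Counter = Counter()
    for key in keys:
        folders = key.split("/")[:-1][:depth]  # drop the file name, keep `depth` folders
        prefix = "/".join(folders) + "/" if folders else ""  # "" means bucket root
        counts[prefix] += 1
    return dict(counts)


sample_keys = [
    "logs/2024/a.json",
    "logs/2025/b.json",
    "docs/x.pdf",
    "readme.txt",
]
print(objects_per_prefix(sample_keys))
print(objects_per_prefix(sample_keys, depth=2))
```

Prefixes with the highest counts or the most churn are the natural candidates for their own integration.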