Curiosity - AWS S3

AWS S3

Ingests objects from one or more S3 buckets and prefixes. Each object becomes a `_FileEntry` with a `_Blob`; bucket prefixes are mapped onto the workspace folder tree.

Cloud storage · API key

What gets ingested

Element	Mapped to
S3 object	`_FileEntry` + `_Blob`
Object key prefix (`a/b/c/file.pdf`)	Folder hierarchy of `_Folder` nodes linked by `_HasChild`
Object metadata (size, ETag, last-modified)	Properties on `_FileEntry`
Object body	`_Blob`; text extraction, OCR, and table extraction run in the background
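
The prefix-to-folder mapping in the table can be pictured with a small sketch. It is illustrative only: `folder_chain` and `child_edges` are hypothetical helper names, not part of the connector, and the sketch treats every key as a file (keys ending in `/` get no special handling).

```python
def folder_chain(key: str) -> tuple[list[str], str]:
    """Split an S3 object key into its ancestor folder paths and its file name."""
    # Drop empty segments produced by a leading "/" or by "//" in the key.
    parts = [p for p in key.split("/") if p]
    if not parts:
        raise ValueError("key has no path segments")
    *folders, name = parts
    # One cumulative path per folder level: "a", "a/b", "a/b/c", ...
    chain = ["/".join(folders[: i + 1]) for i in range(len(folders))]
    return chain, name


def child_edges(key: str) -> list[tuple[str, str]]:
    """Return (parent, child) pairs, one per parent-child link implied by the key."""
    chain, name = folder_chain(key)
    full_path = f"{chain[-1]}/{name}" if chain else name
    nodes = chain + [full_path]
    return list(zip(nodes, nodes[1:]))


if __name__ == "__main__":
    print(folder_chain("a/b/c/file.pdf"))
    print(child_edges("a/b/c/file.pdf"))
```

For the key `a/b/c/file.pdf` this yields the folders `a`, `a/b`, and `a/b/c`, each linked to the next, with the file attached under `a/b/c`.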

The connector skips re-ingesting an object whose size has not changed since the last sync.

Authentication

Type: AWS Access Key ID + Secret Access Key (an IAM user with the `s3:ListBucket` and `s3:GetObject` permissions).
The workspace stores the credentials encrypted; the connector creates one S3 client per bucket on each run.
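
As a starting point for the IAM user, the sketch below builds a read-only policy document limited to the two permissions named above. The helper name `read_only_policy`, the bucket name `example-bucket`, and the `reports/` prefix are placeholders for illustration. In IAM, `s3:ListBucket` is granted on the bucket ARN and `s3:GetObject` on object ARNs, which is why the policy has two statements.

```python
import json


def read_only_policy(bucket: str, prefix: str = "") -> dict:
    """Build an IAM policy allowing list + read on one bucket, optionally one prefix."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    # s3:ListBucket is a bucket-level action, so its resource is the bucket itself.
    list_stmt = {
        "Effect": "Allow",
        "Action": "s3:ListBucket",
        "Resource": bucket_arn,
    }
    if prefix:
        # Restrict listing to keys under the prefix via the s3:prefix condition key.
        list_stmt["Condition"] = {"StringLike": {"s3:prefix": [f"{prefix}*"]}}
    # s3:GetObject is an object-level action, so its resource is an object ARN pattern.
    get_stmt = {
        "Effect": "Allow",
        "Action": "s3:GetObject",
        "Resource": f"{bucket_arn}/{prefix}*",
    }
    return {"Version": "2012-10-17", "Statement": [list_stmt, get_stmt]}


if __name__ == "__main__":
    # Placeholder bucket and prefix; substitute your own before attaching the policy.
    print(json.dumps(read_only_policy("example-bucket", "reports/"), indent=2))
```

Scoping the policy to a prefix is optional; omit the second argument to cover the whole bucket.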

Access control mapping

Source	Carried into the graph?
Per-object S3 ACLs	No. The connector does not read them.
Per-bucket policies	No. The connector runs as the configured IAM user, so bucket policies only determine what it is able to read.
Configured workspace users / access groups	Yes. The `PermissionsSetting` configured by the admin (users + access groups) is applied to every ingested file.

If you need per-object permissions, model them outside S3 (e.g. via prefix-based access groups configured in Curiosity).
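
One way to reason about prefix-based access groups before configuring them is a plain prefix-to-groups table with longest-prefix matching. The sketch below is a planning aid under that assumption, not a description of how Curiosity resolves permissions; `groups_for_key` and the group names are hypothetical.

```python
def groups_for_key(key: str, prefix_groups: dict[str, set[str]]) -> frozenset[str]:
    """Return the access groups of the longest configured prefix that matches the key."""
    matches = [prefix for prefix in prefix_groups if key.startswith(prefix)]
    if not matches:
        # No configured prefix covers this key, so no group is assigned.
        return frozenset()
    # The most specific (longest) prefix wins over its parents.
    return frozenset(prefix_groups[max(matches, key=len)])


if __name__ == "__main__":
    # Hypothetical planning table: S3 key prefix -> access group names.
    plan = {
        "finance/": {"Finance"},
        "finance/payroll/": {"HR"},
        "public/": {"Everyone"},
    }
    for key in ("finance/q1.pdf", "finance/payroll/2024.csv", "misc/notes.txt"):
        print(key, sorted(groups_for_key(key, plan)))
```

A table like this also makes overlapping prefixes visible early, which matters if each prefix is later given its own integration and permission settings.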

Sync cadence

Default cron: `0 23 * * *` (daily at 23:00 UTC).
Incremental sync: size-based — objects whose size matches the previously ingested version are skipped without re-downloading the body.
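
The size-based check can be sketched as a comparison between the current bucket listing and the sizes recorded at the previous sync. S3 list responses include each object's size, so the decision needs no body download. `ObjectSummary` and `plan_sync` are hypothetical names, and the connector's real bookkeeping may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectSummary:
    """The subset of listing metadata this sketch needs."""
    key: str
    size: int


def plan_sync(
    listing: list[ObjectSummary], previous_sizes: dict[str, int]
) -> tuple[list[str], list[str]]:
    """Split listed keys into (to_fetch, skipped) by comparing sizes only."""
    to_fetch: list[str] = []
    skipped: list[str] = []
    for obj in listing:
        if previous_sizes.get(obj.key) == obj.size:
            # Same size as last time: treated as unchanged, body is not downloaded.
            skipped.append(obj.key)
        else:
            # New key, or size differs from the recorded one: ingest again.
            to_fetch.append(obj.key)
    return to_fetch, skipped


if __name__ == "__main__":
    listing = [
        ObjectSummary("a.pdf", 100),
        ObjectSummary("b.pdf", 250),
        ObjectSummary("c.pdf", 10),
    ]
    previous = {"a.pdf": 100, "b.pdf": 200}
    print(plan_sync(listing, previous))
```

A consequence of comparing sizes alone is that an edit leaving the byte count unchanged would be classified as skipped in this sketch.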

Large buckets

For large buckets with frequent churn, partition by prefix and configure one S3 integration per prefix — each runs and commits independently.
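
To choose partition boundaries, it can help to count objects per leading prefix first. The sketch below works on a plain list of keys (for example, taken from a bucket inventory); `count_by_prefix` is a hypothetical helper and the sample keys are made up.

```python
from collections import Counter


def count_by_prefix(keys: list[str], depth: int = 1) -> Counter:
    """Count objects under each prefix made of at most `depth` leading folders."""
    counts: Counter = Counter()
    for key in keys:
        folders = key.split("/")[:-1]  # everything except the file name
        prefix = "/".join(folders[:depth])
        # Keys with no folder part are counted under the empty (bucket root) prefix.
        counts[prefix + "/" if prefix else ""] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "logs/2024/a.gz",
        "logs/2024/b.gz",
        "reports/q1.pdf",
        "readme.txt",
    ]
    for prefix, n in count_by_prefix(sample).most_common():
        print(repr(prefix), n)
```

Each prefix with a large count is a candidate for its own integration. When listing a live bucket, the S3 list API can also return the common prefixes directly if a `/` delimiter is supplied.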