AWS S3
Ingests objects from one or more S3 buckets and prefixes. Each object becomes a _FileEntry with a _Blob; bucket prefixes are mapped onto the workspace folder tree.
Cloud storage · API key
What gets ingested
| Element | Mapped to |
|---|---|
| S3 object | _FileEntry + _Blob |
| Object key prefix (a/b/c/file.pdf) | Folder hierarchy of _Folder nodes linked by _HasChild |
| Object metadata (size, ETag, last-modified) | Properties on _FileEntry |
| Object body | _Blob — extracted for text, OCR, tables in the background |
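
The prefix-to-folder mapping in the table can be pictured with a small sketch. This is not the connector's code: `folder_chain` and `has_child_edges` are hypothetical helpers, and the use of full paths as node identifiers is an assumption made only for illustration.

```python
from typing import List, Tuple


def folder_chain(key: str) -> Tuple[List[str], str]:
    """Split an S3 object key into cumulative folder paths and a file name."""
    # Drop empty segments produced by leading or doubled slashes.
    parts = [p for p in key.split("/") if p]
    if not parts:
        raise ValueError("empty object key")
    *folders, file_name = parts
    # "a/b/c" yields ["a", "a/b", "a/b/c"], one entry per folder level.
    chain = ["/".join(folders[: i + 1]) for i in range(len(folders))]
    return chain, file_name


def has_child_edges(key: str) -> List[Tuple[str, str]]:
    """Return (parent, child) pairs, as _HasChild-style links might connect them."""
    chain, file_name = folder_chain(key)
    leaf = f"{chain[-1]}/{file_name}" if chain else file_name
    nodes = chain + [leaf]
    return list(zip(nodes, nodes[1:]))


if __name__ == "__main__":
    print(folder_chain("a/b/c/file.pdf"))
    print(has_child_edges("a/b/c/file.pdf"))
```

In this sketch a key without any prefix produces no folders and no edges, and keys ending in a slash are not treated specially.
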
The connector skips re-ingesting an object when its size has not changed since the last sync.
Authentication
- Type: AWS Access Key ID + Secret Access Key (IAM user with s3:ListBucket and s3:GetObject).
- The workspace stores credentials encrypted; the connector creates an S3 client per-bucket on each run.
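
As an illustration of the two permissions listed above, the following sketch builds a read-only IAM policy document for a set of buckets. The helper name `read_only_policy` and the bucket name `example-bucket` are placeholders, not part of the product. In IAM, s3:ListBucket is granted on the bucket ARN and s3:GetObject on the object ARNs under it.

```python
import json
from typing import Dict, List


def read_only_policy(buckets: List[str]) -> Dict:
    """Build an IAM policy allowing listing and reading of the given buckets."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListBuckets",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                # ListBucket applies to the bucket itself.
                "Resource": [f"arn:aws:s3:::{b}" for b in buckets],
            },
            {
                "Sid": "ReadObjects",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                # GetObject applies to the objects inside the bucket.
                "Resource": [f"arn:aws:s3:::{b}/*" for b in buckets],
            },
        ],
    }


if __name__ == "__main__":
    # "example-bucket" is a placeholder name.
    print(json.dumps(read_only_policy(["example-bucket"]), indent=2))
```

The printed JSON could be attached to the IAM user whose access key is entered in the integration. Narrowing the object resource to specific prefixes is possible but not shown here.
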
Access control mapping
| Source | Carried into the graph? |
|---|---|
| Per-object S3 ACLs | No — not read by the connector. |
| Per-bucket policies | No — connectors run as the configured IAM user. |
| Configured workspace users / access groups | Yes — the PermissionsSetting configured by the admin (users + access groups) is applied to every ingested file. |
If you need per-object permissions, model them outside S3 (e.g. via prefix-based access groups configured in Curiosity).
Sync cadence
- Default cron: 0 23 * * * (daily at 23:00 UTC).
- Incremental sync: size-based; objects whose size matches the previously ingested version are skipped without re-downloading the body.
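
The size-based rule can be sketched as a comparison between the current bucket listing and the sizes recorded at the previous sync. This is a minimal illustration, not the connector's implementation: `objects_to_ingest` and the shape of `previous_sizes` are assumptions.

```python
from typing import Dict, Iterable, List, Tuple


def objects_to_ingest(
    listing: Iterable[Tuple[str, int]],
    previous_sizes: Dict[str, int],
) -> List[str]:
    """Return keys that are new or whose size differs from the last sync."""
    changed = []
    for key, size in listing:
        # Same size as last time: skip without downloading the body.
        if previous_sizes.get(key) == size:
            continue
        changed.append(key)
    return changed


if __name__ == "__main__":
    previous = {"a/report.pdf": 1024, "a/notes.txt": 200}
    current = [("a/report.pdf", 1024), ("a/notes.txt", 250), ("b/new.csv", 10)]
    print(objects_to_ingest(current, previous))
```

Under a rule like this, an object rewritten with different content but an identical byte size would be treated as unchanged.
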
Large buckets
For large buckets with frequent churn, partition by prefix and configure one S3 integration per prefix — each runs and commits independently.
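
To decide where to split, it can help to count objects per prefix first. The sketch below is a hypothetical helper (`objects_per_prefix`) that works on any iterable of keys, for example one collected from a bucket listing. It is not part of the connector.

```python
from collections import Counter
from typing import Dict, Iterable


def objects_per_prefix(keys: Iterable[str], depth: int = 1) -> Dict[str, int]:
    """Count objects under each prefix of the given depth."""
    counts: Counter = Counter()
    for key in keys:
        parts = key.split("/")
        # Keys shallower than `depth` are grouped under "" (bucket root).
        prefix = "/".join(parts[:depth]) + "/" if len(parts) > depth else ""
        counts[prefix] += 1
    return dict(counts)


if __name__ == "__main__":
    sample = ["logs/2024/a.json", "logs/2024/b.json", "docs/x.pdf", "readme.txt"]
    print(objects_per_prefix(sample))
```

Prefixes with the highest counts or the most frequent changes are natural candidates for their own integration.
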