Curiosity

Connector recipes

The curiosity-ai/connector-recipes repository contains a library of self-contained, runnable examples for ingesting common source formats — CSV, JSON, S3, SQL, REST APIs, MongoDB, Kafka, Parquet/Avro, GitHub GraphQL, PDFs, sitemaps, RSS — into a Curiosity knowledge-graph workspace.

Each recipe is intended as a starting point, not a finished connector: fork the one whose source format matches yours, keep the generic source reader, and rewrite only the schema and ingest method for your data.

Recipes are a developer-onboarding tool

Every recipe targets the same imaginary academic graph — students, universities, subjects, skills — and shows how multiple connectors cumulatively build one merged graph. That's the muscle you'll exercise on your real data.

What a recipe teaches

Every recipe answers the same three questions for one particular source format:

flowchart LR
    Q1[1 - How do I read this format?] --> Source[<Format>Source.cs]
    Q2[2 - How do I shape the graph?] --> Schema[Schema.cs]
    Q3[3 - How do I emit nodes & edges?] --> Ingest[<Domain>Ingest.cs]
    Source --> Glue[Program.cs - ~25 lines]
    Schema --> Glue
    Ingest --> Glue
    Glue --> Run[dotnet run]

The split is deliberate: the source-reader file is reusable across datasets; the schema and ingest files are the only parts you rewrite for your data.

<Sample>/
├── <Sample>.csproj
├── data/...                 # the sample input data
└── src/
    ├── <Format>Source.cs    ← generic: CSV / JSON / S3 / SQL reader   (KEEP)
    ├── Schema.cs            ← dataset-specific: [Node] / [Key] / edge constants  (REWRITE)
    ├── <Domain>Ingest.cs    ← dataset-specific: POCO + RegisterSchemaAsync + Ingest  (REWRITE)
    └── Program.cs           ← ~25-line glue: load → register → ingest → commit

All code lives in a single namespace: Curiosity.Library.Recipes.
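The split between the reusable reader and the dataset-specific files can be sketched without the Curiosity library at all. The example below is a minimal illustration, not the code shipped in the recipes: `DelimitedSource`, `StudentRow`, and the column names `Id`, `University`, and `Skill` are all hypothetical, and the reader is deliberately naive (it splits on a delimiter and does not handle quoted fields).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Generic layer (plays the role of <Format>Source.cs): knows the format, not the dataset.
public static class DelimitedSource
{
    // Naive reader: the first line is the header; fields are split on the delimiter.
    // Quoted fields and escaped delimiters are NOT supported in this sketch.
    public static IEnumerable<T> Read<T>(
        TextReader reader,
        Func<IReadOnlyDictionary<string, string>, T> map,
        char delimiter = ',')
    {
        var headerLine = reader.ReadLine();
        if (headerLine is null) yield break;
        var header = headerLine.Split(delimiter).Select(h => h.Trim()).ToArray();

        string? line;
        while ((line = reader.ReadLine()) is not null)
        {
            if (string.IsNullOrWhiteSpace(line)) continue;
            var fields = line.Split(delimiter);
            var row = new Dictionary<string, string>();
            for (var i = 0; i < header.Length; i++)
                row[header[i]] = i < fields.Length ? fields[i].Trim() : string.Empty;
            yield return map(row);
        }
    }
}

// Dataset-specific layer (plays the role of the POCO in <Domain>Ingest.cs).
// Hypothetical columns; replace them with the columns of your own data.
public sealed record StudentRow(string Id, string University, string Skill)
{
    public static StudentRow FromRow(IReadOnlyDictionary<string, string> row) =>
        new(row["Id"], row["University"], row["Skill"]);
}
```

Calling `DelimitedSource.Read<StudentRow>(reader, StudentRow.FromRow)` yields one typed row per line. Swapping in a different dataset means writing a new record and mapping function while the reader stays untouched, which is the property the folder layout above is meant to preserve.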

Why multiple recipes ingest the same graph

Real-world knowledge rarely lives in one place. The information about a single concept — say, a university — is spread across an admissions spreadsheet, a skill-tagging JSON dump, an S3 bucket of materials, and a relational metadata DB. Each recipe owns part of the picture; they merge automatically on shared keys.

flowchart LR
    CSV[CsvSample\nStudents]
    JSON[JsonSample\nSkill taxonomy]
    S3[S3Sample\nBooks & subjects]
    SQL[SqlSample\nUniversities & faculty]
    REST[RestApiSample\nCourses & terms]
    PG[PostgresSample\nGrants]
    Mongo[MongoSample\nStudent profiles]
    GH[GitHubSample\nRepos & users]
    Map[Sitemap\nUniv. web pages]
    Kafka[Kafka\nLive enrollments]
    Parq[Parquet\nGrades]
    RSS[RSS\nUniv. news]
    CSV --> G[(One merged graph)]
    JSON --> G
    S3 --> G
    SQL --> G
    REST --> G
    PG --> G
    Mongo --> G
    GH --> G
    Map --> G
    Kafka --> G
    Parq --> G
    RSS --> G

When two recipes emit a node with the same [Key] (e.g. Skill.Name = "Python"), the workspace doesn't duplicate — it merges. New [Property] fields stack onto the same node and new edges attach to it.

Shared key	Seeded by	Enriched by
`Skill.Name`	`CsvSample`	`JsonSample`, `MongoSample`, `GitHubSample`
`Subject.Name`	`CsvSample`	`S3Sample`, `RestApiSample`, `ParquetSample`
`University.Name`	`CsvSample`	`SqlSample`, `SitemapSample`, `RssSample`
`Student.Id`	`CsvSample`	`MongoSample`, `KafkaSample`, `ParquetSample`
`Course.Code`	`RestApiSample`	`KafkaSample`, `ParquetSample`
`Faculty.Email`	`CsvSample` (Advisor), `SqlSample` (Faculty)	`RestApiSample`, `PostgresSample`
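The merge-on-key behavior described above can be modeled with a small in-memory sketch. This is not the Curiosity library and makes no claim about how the workspace implements merging; it only illustrates the idea that a node is identified by its type and key, so a second write with the same key enriches the existing node instead of creating a new one. `MergedGraphSketch`, the edge names `HasSkill` and `InCategory`, the `SkillCategory` type, and the sample values are all hypothetical.

```csharp
using System;
using System.Collections.Generic;

// In-memory stand-in for the workspace: one node per (type, key) pair.
public sealed class MergedGraphSketch
{
    private readonly Dictionary<(string Type, string Key), Dictionary<string, string>> _nodes = new();
    private readonly HashSet<(string From, string Edge, string To)> _edges = new();

    public int NodeCount => _nodes.Count;
    public int EdgeCount => _edges.Count;

    // Upsert: an unseen key creates a node; a known key gets the new properties stacked onto it.
    public string AddOrUpdate(string type, string key, params (string Name, string Value)[] properties)
    {
        if (!_nodes.TryGetValue((type, key), out var node))
        {
            node = new Dictionary<string, string>();
            _nodes[(type, key)] = node;
        }
        foreach (var (name, value) in properties) node[name] = value;
        return $"{type}:{key}";
    }

    // Edges are stored in a set, so re-emitting the same edge is harmless.
    public void Link(string from, string edge, string to) => _edges.Add((from, edge, to));

    public IReadOnlyDictionary<string, string> Properties(string type, string key) => _nodes[(type, key)];
}

public static class MergeDemo
{
    public static MergedGraphSketch Run()
    {
        var graph = new MergedGraphSketch();

        // First connector (think CsvSample): seeds the skill while ingesting a student row.
        var student = graph.AddOrUpdate("Student", "S-001");
        var skill = graph.AddOrUpdate("Skill", "Python");
        graph.Link(student, "HasSkill", skill);

        // Second connector (think JsonSample): same key, extra property, new edge.
        var sameSkill = graph.AddOrUpdate("Skill", "Python", ("Category", "Programming"));
        var category = graph.AddOrUpdate("SkillCategory", "Programming");
        graph.Link(sameSkill, "InCategory", category);

        return graph;
    }
}
```

After `MergeDemo.Run()`, the sketch holds three nodes and two edges: the `Skill` node keyed `Python` exists once, carries the `Category` property added by the second pass, and has edges from both passes attached to it.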

Running a recipe

Each recipe is an independent .NET 10 console app.

export CURIOSITY_URL=http://localhost:8080/
export CURIOSITY_API_TOKEN=<token from "Manage → API integrations">

# Run any subset, in any order — they merge automatically on shared keys.
dotnet run --project CsvSample
dotnet run --project JsonSample
dotnet run --project S3Sample

Verify the result in the workspace under Manage → Shell:

return Q().EmitNeighborsSummary();

Every recipe needs two environment variables:

Variable	Purpose
`CURIOSITY_URL`	URL of the workspace (default `http://localhost:8080/`).
`CURIOSITY_API_TOKEN`	API token from Manage → API integrations → Create API Token with `ingestion` scope.

A few recipes need extra environment variables; these are documented on each recipe's dedicated page.
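A connector usually wants to fail fast when its configuration is incomplete. The following is a minimal sketch of reading the two variables from the table above, assuming the documented default for `CURIOSITY_URL` and treating a missing token as an error. `RecipeSettings` and `RecipeEnvironment` are hypothetical names, not types from the recipes repository.

```csharp
using System;

public sealed record RecipeSettings(Uri Url, string ApiToken);

public static class RecipeEnvironment
{
    // Default taken from the environment-variable table above.
    public const string DefaultUrl = "http://localhost:8080/";

    // The lookup is injected so the logic can be exercised without touching the process environment.
    public static RecipeSettings Load(Func<string, string?> getVariable)
    {
        var url = getVariable("CURIOSITY_URL");
        if (string.IsNullOrWhiteSpace(url)) url = DefaultUrl;

        var token = getVariable("CURIOSITY_API_TOKEN");
        if (string.IsNullOrWhiteSpace(token))
            throw new InvalidOperationException(
                "CURIOSITY_API_TOKEN is not set. Create a token under Manage → API integrations.");

        return new RecipeSettings(new Uri(url), token.Trim());
    }

    // Convenience overload that reads the real process environment.
    public static RecipeSettings Load() => Load(Environment.GetEnvironmentVariable);
}
```

In a `Program.cs`, `RecipeEnvironment.Load()` would run before anything connects to the workspace, so a missing token surfaces as a clear message rather than as an authentication failure partway through ingestion.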

Starting your own from a recipe

The five steps to convert any recipe into your own connector

Follow these in order — only the third and fourth steps actually require knowing your data.

1. Copy the recipe folder

Pick the recipe whose source format matches yours; copy the folder under a new name.

2. Keep `<Format>Source.cs` as-is

It's dataset-agnostic.

3. Rewrite `Schema.cs`

Define your node types with [Key] / [Property] and the edge constants you'll use to link them.

4. Rewrite `<Domain>Ingest.cs`

Three pieces: a POCO that mirrors a row/document, RegisterSchemaAsync listing every node type, and an Ingest method that emits nodes (graph.TryAdd / graph.AddOrUpdate) and edges (graph.Link(a, b, fwd, rev)).

5. Tweak `Program.cs`

Update the connector display name, the default data path, and the type passed to the loader.

For composite keys (<University>/<Department>), build the string in the ingest method and assign it to a [Key] property. To reference a node whose key is built elsewhere in the run, use Node.FromKey(nameof(Nodes.YourType), keyValue).
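Because a composite key only merges correctly when every connector builds it the same way, it can help to centralize the string construction. The helper below is one possible sketch: `CompositeKey` is a hypothetical name, and the decision to trim each part and reject parts containing `/` is an assumption of this example, not a rule stated by the recipes.

```csharp
using System;
using System.Linq;

public static class CompositeKey
{
    // Joins the trimmed parts with "/", following the <University>/<Department> shape above.
    public static string Build(params string[] parts)
    {
        if (parts is null || parts.Length == 0)
            throw new ArgumentException("At least one key part is required.", nameof(parts));

        var cleaned = parts.Select(p => p?.Trim() ?? string.Empty).ToArray();

        if (cleaned.Any(p => p.Length == 0))
            throw new ArgumentException("Key parts must be non-empty.", nameof(parts));

        // Assumption of this sketch: forbid the separator inside a part so that
        // two different part lists can never collapse into the same key string.
        if (cleaned.Any(p => p.Contains('/')))
            throw new ArgumentException("Key parts must not contain '/'.", nameof(parts));

        return string.Join("/", cleaned);
    }
}
```

For example, `CompositeKey.Build("MIT", "Physics")` returns `MIT/Physics` (illustrative values). That string is what would be assigned to the [Key] property in the ingest method, and the same call can produce the `keyValue` passed to Node.FromKey when another part of the run needs to reference the node.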

Browse recipes

Each card links to a dedicated page with the schema, an example ingest method, and any source-specific config it needs.

Core samples — academic graph

CSV

Flat rows exploded into typed nodes — students, universities, subjects.

JSON

Nested documents with cross-references — skills, categories, prereqs.

S3

Object-store prefixes as document folders — subjects, books, authors.

SQL / SQLite

Relational joins into the graph — universities, faculty, programs.

REST API

Cursor-paginated SaaS endpoint with bearer-token auth — courses, terms.

PostgreSQL / MySQL

Watermark-based incremental sync on a production SQL server — grants.

MongoDB

Snapshot + change stream — schemaless student profiles into typed nodes.

GitHub GraphQL

GraphQL pagination over a naturally graph-shaped source — repos, PRs.

Sitemap crawl

Polite web crawl with URL canonicalization and content-hash dedup.

Kafka / CDC

Idempotent stream consumer with composite keys — live enrollments.

Parquet / Avro

Columnar lake files with column projection — student grades.

RSS / Atom

Polling feed connector with entry-ID dedup — university news.

Domain sample — industrial

PDF + Office

PDF/DOCX extraction, sidecar metadata, and embedding-friendly chunking.

Anatomy of a recipe (one diagram)

flowchart TB
    subgraph FormatLayer [Reusable: source format]
        Reader[<Format>Source.cs]
    end
    subgraph DomainLayer [Dataset-specific: schema + ingest]
        Schema[Schema.cs]
        Ingest[<Domain>Ingest.cs]
    end
    subgraph Glue [~25-line glue]
        Prog[Program.cs]
    end
    Prog -->|1 - load rows / docs| Reader
    Prog -->|2 - register schemas| Schema
    Prog -->|3 - emit nodes & edges| Ingest
    Prog -->|4 - CommitPendingAsync| Workspace[(Curiosity Workspace)]
    Ingest -.uses.-> Schema