Connector recipes

The curiosity-ai/connector-recipes repository contains a library of self-contained, runnable examples for ingesting common source formats — CSV, JSON, S3, SQL, REST APIs, MongoDB, Kafka, Parquet/Avro, GitHub GraphQL, PDFs, sitemaps, RSS — into a Curiosity knowledge-graph workspace.

Each recipe is intended as a starting point, not a finished connector: fork the one whose source format matches yours, keep the generic source reader, and rewrite only the schema and ingest method for your data.

Recipes are a developer-onboarding tool

Every recipe targets the same imaginary academic graph — students, universities, subjects, skills — and shows how multiple connectors cumulatively build one merged graph. That's the muscle you'll exercise on your real data.


What a recipe teaches

Every recipe answers the same three questions for one particular source format:

flowchart LR
    Q1[1 - How do I read this format?] --> Source[<Format>Source.cs]
    Q2[2 - How do I shape the graph?] --> Schema[Schema.cs]
    Q3[3 - How do I emit nodes & edges?] --> Ingest[<Domain>Ingest.cs]
    Source --> Glue[Program.cs - ~25 lines]
    Schema --> Glue
    Ingest --> Glue
    Glue --> Run[dotnet run]

The split is deliberate: the source-reader file is reusable across datasets; the schema and ingest files are the only parts you rewrite for your data.

<Sample>/
├── <Sample>.csproj
├── data/...                 # the sample input data
└── src/
    ├── <Format>Source.cs    ← generic: CSV / JSON / S3 / SQL reader   (KEEP)
    ├── Schema.cs            ← dataset-specific: [Node] / [Key] / edge constants  (REWRITE)
    ├── <Domain>Ingest.cs    ← dataset-specific: POCO + RegisterSchemaAsync + Ingest  (REWRITE)
    └── Program.cs           ← ~25-line glue: load → register → ingest → commit

All code lives in a single namespace: Curiosity.Library.Recipes.


Why multiple recipes ingest the same graph

Real-world knowledge rarely lives in one place. The information about a single concept — say, a university — is spread across an admissions spreadsheet, a skill-tagging JSON dump, an S3 bucket of materials, and a relational metadata DB. Each recipe owns part of the picture; they merge automatically on shared keys.

flowchart LR
    CSV[CsvSample\nStudents]
    JSON[JsonSample\nSkill taxonomy]
    S3[S3Sample\nBooks & subjects]
    SQL[SqlSample\nUniversities & faculty]
    REST[RestApiSample\nCourses & terms]
    PG[PostgresSample\nGrants]
    Mongo[MongoSample\nStudent profiles]
    GH[GitHubSample\nRepos & users]
    Map[Sitemap\nUniv. web pages]
    Kafka[Kafka\nLive enrollments]
    Parq[Parquet\nGrades]
    RSS[RSS\nUniv. news]
    CSV --> G[(One merged graph)]
    JSON --> G
    S3 --> G
    SQL --> G
    REST --> G
    PG --> G
    Mongo --> G
    GH --> G
    Map --> G
    Kafka --> G
    Parq --> G
    RSS --> G

When two recipes emit a node with the same [Key] (e.g. Skill.Name = "Python"), the workspace doesn't duplicate — it merges. New [Property] fields stack onto the same node and new edges attach to it.
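
The merge behaviour can be pictured with a small in-memory model. This is only a sketch of the semantics described above, not how the workspace implements them: `MergeDemo`, `Upsert`, and the `Category` property are hypothetical names, and no Curiosity SDK types are used.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory model of key-based merging; not the Curiosity SDK.
public static class MergeDemo
{
    // One entry per (node type, key value); each entry holds that node's properties.
    private static readonly Dictionary<(string Type, string Key), Dictionary<string, string>> Nodes = new();

    // Adds the node if the key is new; otherwise stacks the properties onto the existing node.
    public static void Upsert(string type, string key, params (string Name, string Value)[] props)
    {
        if (!Nodes.TryGetValue((type, key), out var node))
        {
            node = new Dictionary<string, string>();
            Nodes[(type, key)] = node;
        }
        foreach (var (name, value) in props)
            node[name] = value;
    }

    public static int Count => Nodes.Count;

    public static IReadOnlyDictionary<string, string> Get(string type, string key) => Nodes[(type, key)];

    public static void Main()
    {
        // A first recipe seeds the node with only its key.
        Upsert("Skill", "Python");
        // A second recipe emits the same key plus an extra property.
        Upsert("Skill", "Python", ("Category", "Programming"));

        Console.WriteLine(Count);                              // 1
        Console.WriteLine(Get("Skill", "Python")["Category"]); // Programming
    }
}
```

Because the second call reuses the key `Python`, the model ends up with one `Skill` node carrying the added property, which mirrors why the order in which recipes run does not matter.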

| Shared key | Seeded by | Enriched by |
|---|---|---|
| Skill.Name | CsvSample | JsonSample, MongoSample, GitHubSample |
| Subject.Name | CsvSample | S3Sample, RestApiSample, ParquetSample |
| University.Name | CsvSample | SqlSample, SitemapSample, RssSample |
| Student.Id | CsvSample | MongoSample, KafkaSample, ParquetSample |
| Course.Code | RestApiSample | KafkaSample, ParquetSample |
| Faculty.Email | CsvSample (Advisor), SqlSample (Faculty) | RestApiSample, PostgresSample |

Running a recipe

Each recipe is an independent .NET 10 console app.

export CURIOSITY_URL=http://localhost:8080/
export CURIOSITY_API_TOKEN="<token from Manage → API integrations>"

# Run any subset, in any order — they merge automatically on shared keys.
dotnet run --project CsvSample
dotnet run --project JsonSample
dotnet run --project S3Sample

Verify the result in the workspace under Manage → Shell:

return Q().EmitNeighborsSummary();

Two environment variables every recipe needs:

| Variable | Purpose |
|---|---|
| CURIOSITY_URL | URL of the workspace (default http://localhost:8080/). |
| CURIOSITY_API_TOKEN | API token from Manage → API integrations → Create API Token with ingestion scope. |
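
As an illustration of how a console app might consume these two variables, here is a minimal sketch. The `RecipeConfig` class and its method names are hypothetical, and the recipes' own configuration code may look different; only the variable names and the default URL come from the table above.

```csharp
#nullable enable
using System;

// Hypothetical helper; the recipes' actual configuration code may differ.
public static class RecipeConfig
{
    public const string DefaultUrl = "http://localhost:8080/";

    // Falls back to the documented default when CURIOSITY_URL is unset or empty.
    public static string ResolveUrl(string? fromEnvironment) =>
        string.IsNullOrWhiteSpace(fromEnvironment) ? DefaultUrl : fromEnvironment;

    // The token has no default, so fail early with a clear message.
    public static string RequireToken(string? fromEnvironment) =>
        string.IsNullOrWhiteSpace(fromEnvironment)
            ? throw new InvalidOperationException("CURIOSITY_API_TOKEN is not set.")
            : fromEnvironment;

    public static void Main()
    {
        var url = ResolveUrl(Environment.GetEnvironmentVariable("CURIOSITY_URL"));
        var token = RequireToken(Environment.GetEnvironmentVariable("CURIOSITY_API_TOKEN"));
        // Print only the token length so the secret never reaches the console.
        Console.WriteLine($"Ingesting into {url} (token length {token.Length})");
    }
}
```

Failing fast on a missing token keeps the error close to its cause instead of surfacing later as a rejected request.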

A few recipes need extra environment variables — they're documented per-recipe below.


Starting your own from a recipe

The five steps to convert any recipe into your own connector

Follow these in order — only the third and fourth steps actually require knowing your data.

1. Copy the recipe folder
   Pick the recipe whose source format matches yours; copy the folder under a new name.

2. Keep <Format>Source.cs as-is
   It's dataset-agnostic.

3. Rewrite Schema.cs
   Define your node types with [Key] / [Property] and the edge constants you'll use to link them.

4. Rewrite <Domain>Ingest.cs
   Three pieces: a POCO that mirrors a row/document, RegisterSchemaAsync listing every node type, and an Ingest method that emits nodes (graph.TryAdd / graph.AddOrUpdate) and edges (graph.Link(a, b, fwd, rev)).

5. Tweak Program.cs
   Connector display name, default data path, and the type passed to the loader.
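
The shape of the ingest file can be sketched without the SDK. In the example below, `InMemoryGraph` is a stand-in written for this page: its `TryAdd` and `Link` methods borrow the names used above, but their signatures are assumptions and the real SDK calls may differ. The `StudentRow` columns and the edge names are also hypothetical, and schema registration (RegisterSchemaAsync) is left out.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for the SDK graph object. Method names follow the text above,
// but these signatures are assumptions made for this sketch.
public sealed class InMemoryGraph
{
    public HashSet<string> Nodes { get; } = new();
    public HashSet<(string From, string To, string Edge)> Edges { get; } = new();

    // Adds the node only if its key is not present yet; returns the node id.
    public string TryAdd(string type, string key)
    {
        var id = $"{type}:{key}";
        Nodes.Add(id);
        return id;
    }

    // Records the forward edge a -> b and the reverse edge b -> a.
    public void Link(string a, string b, string fwd, string rev)
    {
        Edges.Add((a, b, fwd));
        Edges.Add((b, a, rev));
    }
}

// POCO mirroring one input row (hypothetical columns).
public sealed record StudentRow(string Id, string University);

public static class StudentIngest
{
    // Hypothetical edge names; in a recipe these would live in Schema.cs.
    public const string AttendsEdge = "Attends";
    public const string HasStudentEdge = "HasStudent";

    // Emits one Student node and one University node per row, then links them.
    public static void Ingest(InMemoryGraph graph, IEnumerable<StudentRow> rows)
    {
        foreach (var row in rows)
        {
            var student = graph.TryAdd("Student", row.Id);
            var university = graph.TryAdd("University", row.University);
            graph.Link(student, university, AttendsEdge, HasStudentEdge);
        }
    }

    public static void Main()
    {
        var graph = new InMemoryGraph();
        Ingest(graph, new[]
        {
            new StudentRow("s-1", "Example University"),
            new StudentRow("s-2", "Example University"),
        });
        Console.WriteLine($"{graph.Nodes.Count} nodes, {graph.Edges.Count} edges"); // 3 nodes, 4 edges
    }
}
```

Both rows name the same university, so the stand-in keeps a single `University` node and attaches both students to it.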

For composite keys such as <University>/<Department>, build the string in the ingest method and assign it to a [Key] property. To reference a node whose key is built elsewhere in the run, use Node.FromKey(nameof(Nodes.YourType), keyValue).
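
A composite key is just a string assembled the same way everywhere it is needed. The helper below is a minimal sketch of that idea: `CompositeKeyDemo` and `DepartmentKey` are hypothetical names, and trimming the parts is an assumption added here so that stray whitespace does not produce two different keys for the same department.

```csharp
using System;

// Hypothetical helper for building <University>/<Department> keys.
public static class CompositeKeyDemo
{
    // Joins the trimmed parts with '/', matching the <University>/<Department> shape.
    public static string DepartmentKey(string university, string department) =>
        $"{university.Trim()}/{department.Trim()}";

    public static void Main()
    {
        // The same inputs always yield the same key, even with stray whitespace.
        Console.WriteLine(DepartmentKey("Example University ", " Physics")); // Example University/Physics
    }
}
```

Keeping the key construction in one function means every place that builds or looks up the key produces an identical string, which is what the merge on shared keys relies on.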


Browse recipes

Each card links to a dedicated page with the schema, an example ingest method, and any source-specific config it needs.

Core samples — academic graph

CSV

Flat rows exploded into typed nodes — students, universities, subjects.

JSON

Nested documents with cross-references — skills, categories, prereqs.

S3

Object-store prefixes as document folders — subjects, books, authors.

SQL / SQLite

Relational joins into the graph — universities, faculty, programs.

REST API

Cursor-paginated SaaS endpoint with bearer-token auth — courses, terms.

PostgreSQL / MySQL

Watermark-based incremental sync on a production SQL server — grants.

MongoDB

Snapshot + change stream — schemaless student profiles into typed nodes.

GitHub GraphQL

GraphQL pagination over a naturally graph-shaped source — repos, PRs.

Sitemap crawl

Polite web crawl with URL canonicalization and content-hash dedup.

Kafka / CDC

Idempotent stream consumer with composite keys — live enrollments.

Parquet / Avro

Columnar lake files with column projection — student grades.

RSS / Atom

Polling feed connector with entry-ID dedup — university news.

Domain sample — industrial

PDF + Office

PDF/DOCX extraction, sidecar metadata, and embedding-friendly chunking.


Anatomy of a recipe (one diagram)

flowchart TB
    subgraph FormatLayer [Reusable: source format]
        Reader[<Format>Source.cs]
    end
    subgraph DomainLayer [Dataset-specific: schema + ingest]
        Schema[Schema.cs]
        Ingest[<Domain>Ingest.cs]
    end
    subgraph Glue [~25-line glue]
        Prog[Program.cs]
    end
    Prog -->|1 - load rows / docs| Reader
    Prog -->|2 - register schemas| Schema
    Prog -->|3 - emit nodes & edges| Ingest
    Prog -->|4 - CommitPendingAsync| Workspace[(Curiosity Workspace)]
    Ingest -.uses.-> Schema