#
Sample Datasets
Sample datasets are a practical way to evaluate Curiosity Workspace and to teach teams the core patterns:
- ingest data into a graph schema
- configure search and facets
- enable embeddings and AI-assisted workflows
- build endpoints and interfaces
#
What makes a good sample dataset
A good sample dataset has:
- Multiple entity types (at least 3–5 node types)
- Relationships that support navigation and filtering (edges)
- Text fields suitable for search (titles, summaries, descriptions)
- A time dimension (timestamps) to test recency and time filters
- Enough volume to see ranking and performance behavior
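The checklist above can be turned into a quick sanity check before a dataset is ingested. The following is a minimal sketch, not part of Curiosity Workspace: the node and edge shapes, the function name `check_sample_dataset`, and the thresholds `min_types` and `min_nodes` are all assumptions made for illustration.

```python
from collections import Counter


def check_sample_dataset(nodes, edges, min_types=3, min_nodes=100):
    """Return a dict mapping each checklist criterion to True/False.

    Assumed (hypothetical) shapes:
      nodes: list of dicts with "id", "type", and optional "text" / "timestamp"
      edges: list of (source_id, edge_type, target_id) tuples
    """
    type_counts = Counter(n["type"] for n in nodes)
    node_ids = {n["id"] for n in nodes}
    return {
        "multiple_entity_types": len(type_counts) >= min_types,
        # At least one edge must connect two nodes that actually exist.
        "has_relationships": any(s in node_ids and t in node_ids for s, _, t in edges),
        "has_text_fields": any(n.get("text") for n in nodes),
        "has_timestamps": any(n.get("timestamp") for n in nodes),
        # The volume threshold is arbitrary; tune it to your own evaluation.
        "enough_volume": len(nodes) >= min_nodes,
    }


# Tiny illustrative dataset (made-up records).
nodes = [
    {"id": "t1", "type": "Ticket", "text": "Printer offline after update",
     "timestamp": "2024-05-01T09:30:00Z"},
    {"id": "p1", "type": "Product", "text": "Office printer"},
    {"id": "c1", "type": "Customer"},
]
edges = [("t1", "ABOUT", "p1"), ("t1", "OPENED_BY", "c1")]

report = check_sample_dataset(nodes, edges)
for criterion, ok in report.items():
    print(f"{criterion}: {'ok' if ok else 'missing'}")
```

With only three nodes, every criterion passes except `enough_volume`, which is the kind of gap this check is meant to surface early.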
#
Recommended sample dataset categories
- Support and case management: tickets/cases, products, customers, teams, statuses
- Compliance and audit: policies, controls, evidence, owners, exceptions
- Engineering knowledge base: docs, code artifacts, services, incidents, runbooks
- Research: papers, authors, topics, citations, institutions
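As an illustration, the support and case management category could be written down as a small schema description before any ingestion code exists. This is a sketch only: the dictionary layout, the field names, the edge names such as `ABOUT` and `OPENED_BY`, and the helper `undefined_edge_endpoints` are hypothetical and do not reflect the Curiosity Workspace schema API.

```python
# Hypothetical schema description for a support dataset.
SUPPORT_SCHEMA = {
    "node_types": {
        "Ticket": ["title", "description", "created_at"],
        "Product": ["name"],
        "Customer": ["name"],
        "Team": ["name"],
        "Status": ["name"],
    },
    # Each edge is (source node type, edge name, target node type).
    "edge_types": [
        ("Ticket", "ABOUT", "Product"),
        ("Ticket", "OPENED_BY", "Customer"),
        ("Ticket", "ASSIGNED_TO", "Team"),
        ("Ticket", "HAS_STATUS", "Status"),
    ],
}


def undefined_edge_endpoints(schema):
    """Return edge definitions that reference a node type missing from the schema."""
    known = set(schema["node_types"])
    return [e for e in schema["edge_types"] if e[0] not in known or e[2] not in known]


problems = undefined_edge_endpoints(SUPPORT_SCHEMA)
print("schema is consistent" if not problems else f"unknown endpoints: {problems}")
```

Writing the schema down this way makes it easy to see whether the dataset has the multiple entity types, text fields, and timestamp that the evaluation needs.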
#
How to use sample datasets in your docs/testing
Use sample datasets to validate:
- Graph navigation: do the relationships enable the workflows you want?
- Search relevance: can users find the right objects by keywords?
- Semantic recall: can vector search find “similar meaning” content?
- Facet usefulness: do the chosen facets match how users refine results?
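The search relevance and semantic recall checks can be made repeatable with a small set of labeled queries. The sketch below assumes a `search` callable that takes a query string and returns a ranked list of object ids. `toy_keyword_search` is a stand-in written for this example, not a Curiosity Workspace API, and should be replaced with a call to whatever search endpoint you expose.

```python
def hit_rate_at_k(search, labeled_queries, k=5):
    """Fraction of queries whose expected id appears in the top-k results."""
    if not labeled_queries:
        return 0.0
    hits = sum(
        1 for query, expected_id in labeled_queries
        if expected_id in search(query)[:k]
    )
    return hits / len(labeled_queries)


# Made-up documents used only by the stand-in search below.
DOCS = {
    "t1": "printer offline after firmware update",
    "t2": "cannot reset account password",
    "t3": "invoice shows wrong billing address",
}


def toy_keyword_search(query):
    """Stand-in search: rank documents by the number of words shared with the query."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(text.split())), doc_id) for doc_id, text in DOCS.items()]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in ranked if score > 0]


labeled = [
    ("printer offline", "t1"),
    ("password reset", "t2"),
    # Shares no keywords with t2; only a semantic search could find it.
    ("forgot my login credentials", "t2"),
]

score = hit_rate_at_k(toy_keyword_search, labeled, k=1)
print(f"hit rate@1: {score:.2f}")
```

In this toy run, the two keyword queries hit and the paraphrased query misses. Running the same labeled set against keyword search and vector search separately shows which queries depend on semantic recall.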
#
Next steps
- Implement ingestion for a dataset: Connectors
- Make it searchable: Search → Text Search