# Entity Extraction

Entity extraction identifies meaningful spans of text (entities) and turns them into structured outputs. Examples:

  • products and device names
  • people and organizations
  • identifiers (ticket IDs, asset tags)
  • locations and topics (depending on your domain)

In Curiosity Workspace, extraction is typically configured via NLP pipelines and models, then optionally linked into the graph.

# Extraction vs linking

  • Extraction finds entities in text.
  • Linking connects extracted entities to graph nodes (or creates nodes) so you can:
    • navigate from text to entities
    • filter by entities
    • use entities as grounding context for AI workflows
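To make the distinction concrete, here is a minimal sketch of the linking step in plain Python. It is not the Curiosity Workspace API: `link_mentions`, the `node_index` dictionary, and the node ID format are all hypothetical stand-ins for whatever graph lookup your workspace provides.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so surface variants share a key."""
    return " ".join(text.lower().split())


def link_mentions(mentions, node_index, create_missing=False):
    """Attach a node ID to each (text, label) mention.

    node_index is a hypothetical stand-in for a graph lookup: it maps
    (label, normalized text) to a node ID.
    """
    links = []
    for text, label in mentions:
        key = (label, normalize(text))
        node_id = node_index.get(key)
        if node_id is None and create_missing:
            # Assumed ID scheme, for illustration only.
            node_id = f"{label}/{key[1]}"
            node_index[key] = node_id
        links.append((text, label, node_id))
    return links


node_index = {("Product", "router x200"): "node-1"}
mentions = [("Router  X200", "Product"), ("Acme", "Organization")]

linked = link_mentions(mentions, dict(node_index))
created = link_mentions(mentions, dict(node_index), create_missing=True)
```

In this sketch, an unmatched mention keeps a node ID of `None` unless `create_missing` is set, which mirrors the "or creates nodes" option above.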

# Common model types (conceptual)

  • Dictionary/spotter models
    • match known terms (product catalog, customer names)
  • Pattern models
    • capture structured forms (IDs, serial formats, codes)
  • ML models
    • detect entities that cannot be enumerated (optional, domain-dependent)
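The first two model types can be approximated in a few lines of plain Python, which helps when reasoning about what each one will and will not catch. This is a conceptual sketch only, not how Curiosity Workspace implements its models; the product terms, the `TICKET-` ID format, and all function names are assumptions made for illustration.

```python
import re
from typing import NamedTuple


class Entity(NamedTuple):
    text: str
    label: str
    start: int
    end: int


# Hypothetical catalog and ID format, for illustration only.
PRODUCT_TERMS = ["Router X200", "Switch S10"]
TICKET_PATTERN = re.compile(r"\bTICKET-\d{4,6}\b")


def spot_terms(text, terms, label):
    """Dictionary/spotter: case-insensitive match of known terms."""
    found = []
    for term in terms:
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for m in pattern.finditer(text):
            found.append(Entity(m.group(), label, m.start(), m.end()))
    return found


def match_pattern(text, pattern, label):
    """Pattern model: capture spans that follow a structured format."""
    return [Entity(m.group(), label, m.start(), m.end())
            for m in pattern.finditer(text)]


def extract(text):
    """Run both extractors and return entities in reading order."""
    entities = (spot_terms(text, PRODUCT_TERMS, "Product")
                + match_pattern(text, TICKET_PATTERN, "TicketId"))
    return sorted(entities, key=lambda e: e.start)


entities = extract("TICKET-10234: router x200 reboots after update")
```

The dictionary extractor only finds terms it has been given, while the pattern extractor finds any string of the right shape, including ones that are not real tickets. That trade-off is why the two are usually combined and evaluated together.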

# Recommended workflow

  1. Start with extraction on a single high-value field (e.g., Summary).
  2. Run experiments to evaluate:
    • precision (what share of the captured entities is correct?)
    • recall (what share of the true entities is captured?)
  3. Iterate on model coverage and exclusions.
  4. Enable linking into the graph only when extraction is reliable enough.
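The evaluation in step 2 can be sketched as a comparison between extracted entities and a hand-labeled set. The example below assumes each entity is represented as a `(document_id, text)` pair; the sample data and the `precision_recall` helper are made up for illustration.

```python
def precision_recall(predicted: set, gold: set):
    """Return (precision, recall) for predicted vs hand-labeled entities."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Made-up sample: (document_id, entity text) pairs.
predicted = {
    ("doc1", "TICKET-1001"),
    ("doc1", "Router X200"),
    ("doc2", "TICKET-2002"),
    ("doc2", "UTF-8"),          # false positive
}
gold = {
    ("doc1", "TICKET-1001"),
    ("doc1", "Router X200"),
    ("doc2", "TICKET-2002"),
    ("doc2", "Switch S10"),     # missed
    ("doc3", "TICKET-3003"),    # missed
}

precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here three of four captures are correct (precision 0.75) and three of five true entities are found (recall 0.60). Tracking both numbers across iterations gives a concrete threshold for deciding when to enable linking.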

# Common pitfalls

  • High false positives: pattern models can over-capture; add constraints and test broadly.
  • Ambiguous names: dictionary models need aliases and a disambiguation strategy.
  • No evaluation loop: extraction needs iteration with real examples.
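The over-capture pitfall is easy to reproduce with regular expressions. The sketch below contrasts a loose pattern with a constrained one plus an exclusion list; the `INC`/`REQ` prefixes, the five-digit format, and the placeholder ID are assumptions chosen for the example, not a real ticket scheme.

```python
import re

# Loose: any run of capitals, optional hyphen, then digits.
LOOSE = re.compile(r"[A-Z]{2,}-?\d+")

# Constrained: known prefixes, fixed digit count, word boundaries.
# The prefixes and length are hypothetical.
STRICT = re.compile(r"\b(?:INC|REQ)-\d{5}\b")

# Known non-entities that still match the format (assumed placeholder ID).
EXCLUSIONS = {"REQ-00000"}

text = "See INC-12345 and REQ-00000; built with UTF-8 on ISO-9001 hardware."

loose_hits = LOOSE.findall(text)
strict_hits = [m for m in STRICT.findall(text) if m not in EXCLUSIONS]

print(loose_hits)
print(strict_hits)
```

The loose pattern also captures `UTF-8` and `ISO-9001`, which look like identifiers but are not tickets. Testing against a broad sample of real text is what surfaces cases like these before they reach the graph.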

# Next steps