# Custom NLP Rules
Most production NLP setups require domain tuning. “Custom NLP rules” refers to the mechanisms you use to make extraction and linking align with your data and vocabulary.
## Why custom rules are needed
Off-the-shelf NLP is rarely enough for:
- product catalogs (aliases, abbreviations, versions)
- internal identifiers (ticket/asset formats)
- domain jargon (acronyms, shorthand)
- noisy text (chat logs, pasted stack traces)
## Common rule categories
- Aliases and normalization: map spelling variants and formatting differences to canonical forms.
- Dictionaries / spotters: curate controlled vocabularies from your graph (or external lists).
- Patterns: capture IDs and semi-structured entities using shape/pattern rules.
- Exclusions: prevent over-capture in common false-positive contexts.
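The four categories can be combined in a single extraction pass. The sketch below is a minimal, stdlib-only illustration; the alias map, dictionary entries, ticket-ID shape, and path-based exclusion are all hypothetical placeholders you would replace with your own vocabularies and patterns.

```python
import re

# Hypothetical domain vocabularies; adapt these to your data.

# Aliases and normalization: map variants to a canonical form.
ALIASES = {"k8s": "Kubernetes", "postgres": "PostgreSQL", "pg": "PostgreSQL"}

# Dictionary / spotter: a controlled vocabulary of known entities.
DICTIONARY = {"Kubernetes", "PostgreSQL", "Terraform"}

# Pattern: capture ticket IDs shaped like ABC-1234 (assumed format).
TICKET_PATTERN = re.compile(r"\b[A-Z]{2,5}-\d{1,6}\b")

# Exclusion: suppress matches inside a common false-positive context,
# here anything that looks like a file path.
PATH_PATTERN = re.compile(r"\S*/\S*")

def extract(text: str) -> set[str]:
    """Return canonical entity mentions found in `text`."""
    excluded_spans = [m.span() for m in PATH_PATTERN.finditer(text)]

    def in_excluded(start: int, end: int) -> bool:
        return any(s <= start and end <= e for s, e in excluded_spans)

    found = set()
    # Dictionary and alias spotting over whitespace tokens.
    for m in re.finditer(r"\S+", text):
        token = m.group().strip(".,;:()").lower()
        canonical = ALIASES.get(token)
        if canonical is None:
            canonical = next((d for d in DICTIONARY if d.lower() == token), None)
        if canonical and not in_excluded(*m.span()):
            found.add(canonical)
    # Pattern-based IDs, subject to the same exclusions.
    for m in TICKET_PATTERN.finditer(text):
        if not in_excluded(*m.span()):
            found.add(m.group())
    return found
```

Note how the exclusion rule is applied to both the dictionary and the pattern matches: a ticket-shaped string inside a file path is suppressed rather than extracted.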
## Recommended process
1. Collect 50–200 representative text samples.
2. Run extraction experiments.
3. Add:
   - missing synonyms/aliases
   - pattern constraints
   - exclusions for false positives
4. Re-run experiments and measure improvement.
5. Enable linking only when precision is acceptable for your use case.
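The "measure improvement" step needs a concrete metric. One common choice, sketched below under the assumption that you hold hand-labeled gold entity sets for each sample, is micro-averaged precision and recall over your sample set; the function name and input shapes are illustrative, not from any particular library.

```python
def precision_recall(
    predicted: list[set[str]], gold: list[set[str]]
) -> tuple[float, float]:
    """Micro-averaged precision and recall over labeled samples."""
    tp = fp = fn = 0
    for pred, true in zip(predicted, gold):
        tp += len(pred & true)   # correctly extracted entities
        fp += len(pred - true)   # spurious extractions
        fn += len(true - pred)   # missed entities
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

A simple gate for the final step would then be to enable linking only once precision clears a threshold you choose for your use case (e.g. `precision >= 0.9`).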
## Next steps
- Configure pipelines and extraction: Entity Extraction
- Turn extracted concepts into facets for retrieval: Search Optimization