# Custom NLP Rules
Most production NLP setups require domain tuning. “Custom NLP rules” refers to the mechanisms you use to make extraction and linking align with your data and vocabulary.
## Why custom rules are needed
Off-the-shelf NLP is rarely enough for:
- product catalogs (aliases, abbreviations, versions)
- internal identifiers (ticket/asset formats)
- domain jargon (acronyms, shorthand)
- noisy text (chat logs, pasted stack traces)
## Common rule categories
- Aliases and normalization: map spelling variants and formatting differences to canonical forms.
- Dictionaries / spotters: curate controlled vocabularies from your graph (or external lists).
- Patterns: capture IDs and semi-structured entities using shape/pattern rules.
- Exclusions: prevent over-capture in common false-positive contexts.
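The four categories can be combined in a single extraction pass. The sketch below is a minimal, stdlib-only illustration; the alias map, dictionary entries, ticket-ID shape, and path-based exclusion are all hypothetical placeholders you would replace with your own vocabularies and patterns.

```python
import re

# Hypothetical domain vocabularies; adapt these to your data.

# Aliases and normalization: map variants to a canonical form.
ALIASES = {"k8s": "Kubernetes", "postgres": "PostgreSQL", "pg": "PostgreSQL"}

# Dictionary / spotter: a controlled vocabulary of known entities.
DICTIONARY = {"Kubernetes", "PostgreSQL", "Terraform"}

# Pattern: capture ticket IDs shaped like ABC-1234 (assumed format).
TICKET_PATTERN = re.compile(r"\b[A-Z]{2,5}-\d{1,6}\b")

# Exclusion: suppress matches inside a common false-positive context,
# here anything that looks like a file path.
PATH_PATTERN = re.compile(r"\S*/\S*")

def extract(text: str) -> set[str]:
    """Return canonical entity mentions found in `text`."""
    excluded_spans = [m.span() for m in PATH_PATTERN.finditer(text)]

    def in_excluded(start: int, end: int) -> bool:
        return any(s <= start and end <= e for s, e in excluded_spans)

    found = set()
    # Dictionary and alias spotting over whitespace tokens.
    for m in re.finditer(r"\S+", text):
        token = m.group().strip(".,;:()").lower()
        canonical = ALIASES.get(token)
        if canonical is None:
            canonical = next((d for d in DICTIONARY if d.lower() == token), None)
        if canonical and not in_excluded(*m.span()):
            found.add(canonical)
    # Pattern-based IDs, subject to the same exclusions.
    for m in TICKET_PATTERN.finditer(text):
        if not in_excluded(*m.span()):
            found.add(m.group())
    return found
```

Note how the exclusion rule is applied to both the dictionary and the pattern matches: a ticket-shaped string inside a file path is suppressed rather than extracted.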
## Recommended process
1. Collect 50–200 representative text samples.
2. Run extraction experiments.
3. Add:
   - missing synonyms/aliases
   - pattern constraints
   - exclusions for false positives
4. Re-run experiments and measure improvement.
5. Enable linking only when precision is acceptable for your use case.
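The "measure improvement" step needs a concrete metric. One common choice, sketched below under the assumption that you hold hand-labeled gold entity sets for each sample, is micro-averaged precision and recall over your sample set; the function name and input shapes are illustrative, not from any particular library.

```python
def precision_recall(
    predicted: list[set[str]], gold: list[set[str]]
) -> tuple[float, float]:
    """Micro-averaged precision and recall over labeled samples."""
    tp = fp = fn = 0
    for pred, true in zip(predicted, gold):
        tp += len(pred & true)   # correctly extracted entities
        fp += len(pred - true)   # spurious extractions
        fn += len(true - pred)   # missed entities
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

A simple gate for the final step would then be to enable linking only once precision clears a threshold you choose for your use case (e.g. `precision >= 0.9`).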
## Next steps
- Configure pipelines and extraction: Entity Extraction
- Turn extracted concepts into facets for retrieval: Search Optimization