# Custom NLP Rules

Most production NLP setups require domain tuning. “Custom NLP rules” refers to the mechanisms you use to make extraction and linking align with your data and vocabulary.

# Why custom rules are needed

Off-the-shelf NLP is rarely enough for:

  • product catalogs (aliases, abbreviations, versions)
  • internal identifiers (ticket/asset formats)
  • domain jargon (acronyms, shorthand)
  • noisy text (chat logs, pasted stack traces)

# Common rule categories

  • Aliases and normalization
    • map spelling variants and formatting differences to canonical forms
  • Dictionaries / spotters
    • curate controlled vocabularies from your graph (or external lists)
  • Patterns
    • capture IDs and semi-structured entities using shape/pattern rules
  • Exclusions
    • prevent over-capture in common false-positive contexts
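As a rough illustration of how the four categories can fit together, here is a minimal sketch that uses only the Python standard library. It does not show this product's rule syntax: the alias table, the `PRODUCT` and `TICKET` labels, the ticket ID shape, and the "ignore matches inside URLs" exclusion are all hypothetical examples.

```python
import re
from typing import NamedTuple


class Match(NamedTuple):
    start: int
    end: int
    text: str
    label: str
    canonical: str


# Aliases and normalization (hypothetical): surface variant -> canonical form.
ALIASES = {
    "k8s": "Kubernetes",
    "kubernetes": "Kubernetes",
    "postgres": "PostgreSQL",
    "postgresql": "PostgreSQL",
}

# Dictionary / spotter: one case-insensitive alternation built from the
# alias keys, longest first so longer variants are tried before shorter ones.
_terms = sorted(ALIASES, key=len, reverse=True)
DICT_RE = re.compile(r"\b(" + "|".join(map(re.escape, _terms)) + r")\b", re.IGNORECASE)

# Pattern (hypothetical ID shape): ticket keys such as "OPS-1234".
TICKET_RE = re.compile(r"\b[A-Z]{2,5}-\d{1,6}\b")

# Exclusion (hypothetical): ignore anything that appears inside a URL.
URL_RE = re.compile(r"https?://\S+")


def extract(text: str) -> list[Match]:
    """Apply dictionary and pattern rules, then drop excluded matches."""
    excluded = [m.span() for m in URL_RE.finditer(text)]

    def is_excluded(start: int, end: int) -> bool:
        return any(s <= start and end <= e for s, e in excluded)

    matches = []
    for m in DICT_RE.finditer(text):
        if not is_excluded(*m.span()):
            canonical = ALIASES[m.group().lower()]
            matches.append(Match(m.start(), m.end(), m.group(), "PRODUCT", canonical))
    for m in TICKET_RE.finditer(text):
        if not is_excluded(*m.span()):
            matches.append(Match(m.start(), m.end(), m.group(), "TICKET", m.group()))
    return sorted(matches)  # ordered by position in the text


sample = "Upgraded k8s after OPS-1234; see https://example.com/OPS-9999 for postgres notes."
for match in extract(sample):
    print(match.text, match.label, match.canonical)
```

In this sketch the exclusion is checked per match rather than by deleting URLs from the text first, so character offsets stay aligned with the original input.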

# Recommended process

  1. Collect 50–200 representative text samples.
  2. Run extraction experiments on those samples and note the false positives and missed entities.
  3. Add:
    • missing synonyms/aliases
    • pattern constraints
    • exclusions for false positives
  4. Re-run experiments and measure improvement.
  5. Enable linking only when precision is acceptable for your use case.
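One way to make steps 4 and 5 concrete is to score each experiment against a small hand-labeled set and compare runs. The sketch below assumes that extractions and gold annotations are represented as `(sample_id, text, label)` tuples. The data, the `precision_recall` helper, and the 0.9 precision threshold are illustrative assumptions, not values recommended by this documentation.

```python
def precision_recall(predicted: set, gold: set) -> tuple[float, float]:
    """Return (precision, recall) for exact-match extractions."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical hand-labeled annotations for two samples.
gold = {
    ("s1", "OPS-1234", "TICKET"),
    ("s1", "k8s", "PRODUCT"),
    ("s2", "postgres", "PRODUCT"),
}

# Hypothetical output before custom rules: one hit, one false positive.
baseline = {
    ("s1", "k8s", "PRODUCT"),
    ("s2", "OPS-9999", "TICKET"),
}

# Hypothetical output after adding aliases, patterns, and exclusions.
tuned = set(gold)

# Assumed threshold; pick one that matches your tolerance for wrong links.
MIN_PRECISION_FOR_LINKING = 0.9

for name, predicted in [("baseline", baseline), ("tuned", tuned)]:
    precision, recall = precision_recall(predicted, gold)
    enable_linking = precision >= MIN_PRECISION_FOR_LINKING
    print(f"{name}: precision={precision:.2f} recall={recall:.2f} linking={enable_linking}")
```

Keeping the labeled set fixed between runs is what makes the before and after numbers comparable.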

# Next steps