Catalyst

Catalyst Documentation

A powerful and scalable pipeline execution framework.

Introduction to Catalyst

Catalyst is a high-performance Natural Language Processing (NLP) library for C#. Inspired by the design of spaCy, Catalyst provides a fast and modern way to process text in .NET.

Get Started

Start your journey with Catalyst.

Explore Models

Explore the available language models.

View Samples

See real-world examples and code snippets.

API Reference

Browse the detailed API documentation.

Key Features

  • Fast & Modern: Built in pure C#, supporting .NET Standard 2.0 and later.
  • Cross-Platform: Runs on Windows, Linux, macOS, and ARM.
  • Efficient Tokenization: Non-destructive, RegEx-free tokenization capable of processing over 1 million tokens per second.
  • Comprehensive NLP Tasks: Support for Tokenization, Sentence Detection, Part-of-Speech (POS) Tagging, Named Entity Recognition (NER), Language Detection, and more.
  • Pre-trained Models: Easy access to pre-trained models for various languages.

Core Concepts

Understanding Catalyst requires familiarity with its core building blocks:

1. Document

The Document class represents a piece of text to be processed. It holds the original text and all the metadata generated during processing (tokens, spans, entities, etc.). See Document for more details.

var doc = new Document("The quick brown fox jumps over the lazy dog", Language.English);

2. Span

A Span represents a segment of a Document, typically a sentence. A Document can contain multiple spans.

3. Token

A Token is the basic unit of text, such as a word or punctuation mark, within a Span.

4. Pipeline

A Pipeline is a sequence of processing steps (models) that are applied to a Document. A typical pipeline includes a tokenizer, a sentence detector, and a POS tagger.

var nlp = await Pipeline.ForAsync(Language.English);
nlp.ProcessSingle(doc);

5. Language

Catalyst uses the Language enum to specify the language of a document or model. It supports a wide range of languages.

Getting Started

To use Catalyst, you need to install the Catalyst NuGet package.

Language Packages

All language-specific data and models are provided as separate NuGet packages. You can find all available packages here.

Before using a language, you must install its respective NuGet package and register it in your code.

# Example: Adding English language support via dotnet CLI
dotnet add package Catalyst.Models.English
using Catalyst;
using Catalyst.Models;
using Mosaik.Core;

// Register the English language models
Catalyst.Models.English.Register();

// Configure storage for lazy-loading models
Storage.Current = new DiskStorage("catalyst-models");

// Create a pipeline for English
var nlp = await Pipeline.ForAsync(Language.English);

// Create and process a document
var doc = new Document("Hello, world!", Language.English);
nlp.ProcessSingle(doc);

// Access the results
Console.WriteLine(doc.ToJson());

Storage

Catalyst uses a storage mechanism to load and cache models. By default, it can download models from an online repository or load them from a local disk using DiskStorage.

Storage.Current = new DiskStorage("catalyst-models");

Supported Languages

Below is a list of supported languages and their corresponding NuGet packages.

Language NuGet Package
English Catalyst.Models.English
French Catalyst.Models.French
German Catalyst.Models.German
Spanish Catalyst.Models.Spanish
Italian Catalyst.Models.Italian
Dutch Catalyst.Models.Dutch
Portuguese Catalyst.Models.Portuguese
Russian Catalyst.Models.Russian
Chinese Catalyst.Models.Chinese
Japanese Catalyst.Models.Japanese
... and many more See all packages

For a full list of supported languages, check the Languages folder in the repository or search NuGet for Catalyst.Models.

© 2026 Catalyst. All rights reserved.