WAL Replication

RocksDB-Sharp ships a small replication toolkit on top of the engine's GetUpdatesSince primitive. It lets you stream the Write-Ahead Log from a primary to one or more followers in (near) real time — sub-second lag for normal workloads, durable, and ordering-preserving.

This is what powers the ReplicationTest sample in the repository.

Upstream reference: Replication Helpers (GetUpdatesSince).

Building blocks

The library provides three small types in the RocksDbSharp namespace (under Replication/):

Type	Purpose
`ReplicationSource`	Wrap a primary `RocksDb`. Produces an initial checkpoint and a stream of WAL batches.
`ReplicationConsumer`	Wrap a follower `RocksDb`. Ingests WAL batches into the follower.
`AdaptiveCommitDelayController`	Back-pressure helper that throttles the primary based on follower lag.

flowchart LR P[Primary DB] -- "ReplicationSource.GetInitialState (checkpoint)" --> F[Follower DB] P -- "ReplicationSource.GetWalUpdates(seq)" --> F F -- "Reports last applied sequence" --> P P -- "AdaptiveCommitDelayController.DelayIfNeededAsync()" --> P

Configuring the primary

You need to keep WAL files around long enough for followers to catch up. Set SetWalTtlSeconds, SetMaxTotalWalSize, and call DisableFileDeletions() so background compaction doesn't garbage-collect a WAL a follower still needs.

var options = new DbOptions()
    .SetCreateIfMissing(true)
    .SetWalDir(walDir)
    .SetWalTtlSeconds(10)
    .SetMaxTotalWalSize(1024UL * 1024 * 10)
    .SetWalSizeLimitMB(1024UL * 1024 * 1)
    .SetWalCompression(Compression.Zstd);

using var primary = RocksDb.Open(options, dbPath);
primary.DisableFileDeletions();

Always re-enable file deletions on shutdown

DisableFileDeletions() prevents compaction from cleaning up obsolete SSTs. Pair it with EnableFileDeletions() (or just accept that the process owns the DB for its lifetime).

Shipping the initial state

Before streaming WAL, the follower needs to be at the same physical state as the primary at some sequence number. ReplicationSource.GetInitialState produces a checkpoint you can copy file-by-file:

var source = new ReplicationSource(primary);

using var session = source.GetInitialState("/tmp/rep-checkpoint");
foreach (var file in session.Files)        // walk the checkpoint directory
{
    SendOverWire(file);
}

The follower writes those files into its own data directory before opening:

ReplicationConsumer.IngestFile(receivedFile, followerPath);

This is just a normal RocksDB checkpoint (see Checkpoints guide) — hard-linked SSTs plus the manifest and current WAL.

Streaming the WAL

After the follower opens its DB, it asks for everything past its current sequence number:

ulong startSeq = followerDb.GetLatestSequenceNumber() + 1;

foreach (var batch in source.GetWalUpdates(startSeq))
{
    SendOverWire(batch);   // batch.SequenceNumber, batch.Data
}

On the follower:

var consumer = new ReplicationConsumer(followerDb);

while (await stream.MoveNext())
{
    var b = stream.Current;
    consumer.IngestBatch(b.SequenceNumber, b.Data);   // span overload, no alloc
}

IngestBatch deserialises the WAL bytes into a WriteBatch and applies it — preserving sequence numbers, ordering, and atomicity. Internally it's the same path as db.Write(new WriteBatch(bytes)).

Pooled batches (zero-allocation path)

For high throughput, GetPooledWalUpdates rents the byte buffer from ArrayPool<byte>.Shared so each batch costs no managed allocation. Consumers must return the buffer to the pool after the batch is on the wire.

foreach (var batch in source.GetPooledWalUpdates(startSeq))
{
    try
    {
        var span = batch.PooledData.AsSpan(0, batch.Length);
        Send(batch.SequenceNumber, span);
    }
    finally
    {
        ArrayPool<byte>.Shared.Return(batch.PooledData);
    }
}

Back-pressure with `AdaptiveCommitDelayController`

If the primary writes faster than the follower can apply, you want the primary to slow down — not the follower to fall apart. AdaptiveCommitDelayController adds a small delay to the primary based on follower lag.

var ctl = new AdaptiveCommitDelayController(
    replicaCount: 1,
    delayPerLagUnitMs: 5,
    lagUnit: 1000);          // 5 ms per 1000 sequence numbers behind

while (writing)
{
    primary.Put(k, v);

    if (++count % 10_000 == 0)
        await ctl.DelayIfNeededAsync();
}

The follower periodically reports its last-applied sequence number:

await client.ReportLastSyncSequenceNumber(replicaId: 0,
                                          followerDb.GetLatestSequenceNumber());

Putting it all together

A trimmed version of the sample in Tests/ReplicationTest/Program.cs:

// Primary side
var options = new DbOptions()
    .SetCreateIfMissing(true)
    .SetWalTtlSeconds(10)
    .SetMaxTotalWalSize(1024UL * 1024 * 10)
    .SetWalCompression(Compression.Zstd);

using var primary = RocksDb.Open(options, primaryPath);
primary.DisableFileDeletions();

var source = new ReplicationSource(primary);
var ctl    = new AdaptiveCommitDelayController(1, delayPerLagUnitMs: 5, lagUnit: 1000);

// Stream the initial state to the follower
using (var session = source.GetInitialState("/tmp/initial-state"))
{
    foreach (var f in session.Files)  SendFile(f);
}

// Stream live WAL
foreach (var batch in source.GetWalUpdates(startSeq: 0))
{
    SendBatch(batch);
}

// Follower side
using var follower = RocksDb.Open(new DbOptions().SetCreateIfMissing(true), followerPath);
var consumer = new ReplicationConsumer(follower);

while (await stream.MoveNext())
{
    var b = stream.Current;
    consumer.IngestBatch(b.SequenceNumber, b.Data);

    if (++n % 10_000 == 0)
    {
        await client.ReportLastSyncSequenceNumber(0, follower.GetLatestSequenceNumber());
    }
}

Inspecting WAL files

A RocksDbWalInspector (see Replication/RocksDbWalInspector.cs) walks WAL files on disk and emits their records. Useful for debugging out-of-band: replay, diff, or audit what was written without needing the primary to be running.

var inspector = new RocksDbWalInspector("/var/lib/myapp/state");
foreach (var record in inspector.Read())
{
    Console.WriteLine($"seq={record.SequenceNumber} size={record.Data.Length}");
}

Note: WAL compression must be off for the inspector to parse records — the primary in the example above is configured with SetWalCompression(Compression.Zstd), which the inspector can't read.

When to use it

Hot replicas / standby: stream the WAL to a remote node for fast failover.
Read fan-out: ship the WAL to many followers, route reads to whichever is closest.
Change data capture: every WriteBatch you ingest is the authoritative ordered change log — perfect input for CDC pipelines.

For simpler use cases — a single colocated follower over a shared filesystem — the secondary instance is the easier choice.