Block-based table options

The default SST file format in RocksDB is the block-based table: data blocks of a configurable target size plus an index, optionally guarded by a Bloom filter. The BlockBasedTableOptions class tunes that format.

These options live on the column family — attach them via ColumnFamilyOptions.SetBlockBasedTableFactory.

Upstream reference: Block Cache, Bloom Filter, Tuning Guide.

Building a table options object

var blockCache = Cache.CreateLru(capacity: 256UL * 1024 * 1024);
var bloom      = BloomFilterPolicy.Create(bitsPerKey: 10);

var table = new BlockBasedTableOptions()
    .SetBlockSize(16 * 1024)
    .SetBlockCache(blockCache)
    .SetFilterPolicy(bloom)
    .SetWholeKeyFiltering(true)
    .SetCacheIndexAndFilterBlocks(true)
    .SetPinL0FilterAndIndexBlocksInCache(true)
    .SetIndexType(BlockBasedTableIndexType.BinarySearch);

var cfOpts = new ColumnFamilyOptions().SetBlockBasedTableFactory(table);

Setters

Setter	What it does
`SetBlockSize(ulong)`	Target uncompressed size of a data block. Default 4 KiB. Bigger blocks → smaller index, more I/O per read.
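`SetBlockSizeDeviation(int)`	Percentage tolerance around `BlockSize`: if less than this share of the current block is still free and the next entry would overflow it, the block is closed and a new one is started. Upstream default is 10.
`SetBlockRestartInterval(int)`	Number of keys between restart points in a block's delta-encoded key list. Upstream default is 16.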
`SetFilterPolicy(BloomFilterPolicy)`	Bloom filter for the table. Massive speed-up for `Get` of missing keys.
`SetNoBlockCache(bool)`	Disable the block cache entirely.
`SetBlockCache(Cache)`	Attach a shared block cache (typically LRU).
`SetWholeKeyFiltering(bool)`	Bloom the whole key (default) vs. prefix-only when a `prefix_extractor` is configured.
`SetFormatVersion(int)`	SST format version. Newer versions add more efficient encodings, but the files they produce can only be read by RocksDB releases that support that version.
`SetIndexType(BlockBasedTableIndexType)`	`BinarySearch`, `HashSearch`, or `TwoLevelIndexSearch`.
`SetCacheIndexAndFilterBlocks(bool)`	Store index/filter blocks in the block cache (recommended for large DBs).
`SetPinL0FilterAndIndexBlocksInCache(bool)`	Pin L0 index/filter blocks in the block cache so they're never evicted. Only takes effect when `SetCacheIndexAndFilterBlocks(true)` is also set.
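
The less common setters can be chained the same way as the ones in the snippet at the top of the page. The following is a minimal sketch, not a recommended configuration. It assumes these setters return the options object like the ones shown earlier. The values 10 and 16 simply restate the upstream RocksDB defaults, and the format version is an illustrative choice that must be one your RocksDB build supports.

```csharp
// Sketch only: assumes the same namespace imports as the other snippets on this page.
var table = new BlockBasedTableOptions()
    .SetBlockSize(4 * 1024)          // target uncompressed data block size
    .SetBlockSizeDeviation(10)       // percent; upstream default
    .SetBlockRestartInterval(16)     // keys between restart points; upstream default
    .SetFormatVersion(5);            // illustrative; readers must support this version
```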

Block cache

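The block cache is an in-memory LRU that holds uncompressed blocks. It is not global by default: each Cache instance is its own memory budget. Sharing one instance across all CFs caps their combined memory use with a single number and lets the busiest CF claim the space it needs.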

var shared = Cache.CreateLru(capacity: 512UL * 1024 * 1024);

var hot   = new BlockBasedTableOptions().SetBlockCache(shared);
var cold  = new BlockBasedTableOptions().SetBlockCache(shared);

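For huge databases, also set SetCacheIndexAndFilterBlocks(true) so the index/filter blocks are charged to the cache and subject to LRU eviction too. Otherwise they are held in memory outside the cache, where their footprint grows with the DB and is not bounded by the cache capacity.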
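
To make the sharing concrete, the sketch below attaches both table configurations to their own column family options, using only calls that already appear on this page. The names hotCf and coldCf are illustrative, and the choice to pin L0 blocks only for the hot family is an assumption, not a recommendation.

```csharp
// One cache instance is one memory budget shared by both column families.
var shared = Cache.CreateLru(capacity: 512UL * 1024 * 1024);

var hot = new BlockBasedTableOptions()
    .SetBlockCache(shared)
    .SetCacheIndexAndFilterBlocks(true)            // index/filter blocks count against the 512 MiB
    .SetPinL0FilterAndIndexBlocksInCache(true);    // keep L0 index/filter blocks resident
var cold = new BlockBasedTableOptions()
    .SetBlockCache(shared)
    .SetCacheIndexAndFilterBlocks(true);

var hotCf  = new ColumnFamilyOptions().SetBlockBasedTableFactory(hot);
var coldCf = new ColumnFamilyOptions().SetBlockBasedTableFactory(cold);
```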

Upstream reference: Block Cache wiki.

Bloom filters

var bloom = BloomFilterPolicy.Create(bitsPerKey: 10);

var table = new BlockBasedTableOptions()
    .SetFilterPolicy(bloom)
    .SetWholeKeyFiltering(true);

bitsPerKey controls the trade-off:

Bits/key	False positive rate
6	~5%
10 (default)	~1%
16	~0.04%
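
The figures in the table are in line with the textbook estimate for a Bloom filter that uses the optimal number of hash functions, where b is the number of bits per key. Treat this as a rough approximation rather than RocksDB's exact behaviour: the real filter implementation is tuned for cache locality, so measured rates can differ somewhat.

```latex
p \approx \left(\tfrac{1}{2}\right)^{b \ln 2} \approx 0.6185^{\,b}
\qquad\Rightarrow\qquad
p(6) \approx 5.6\%,\quad p(10) \approx 0.82\%,\quad p(16) \approx 0.046\%
```

Each extra bit per key multiplies the false positive rate by roughly 0.62, so the returns diminish quickly once the rate is already below 1%.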

Bloom filters are tiny relative to data and let most missing-key reads skip an SST file after a single filter check, without touching its index or data blocks. For databases with lots of point lookups, always enable Bloom filtering.

Upstream reference: RocksDB Bloom Filter.

Index type

Type	Use it for
`BinarySearch` (default)	Most workloads.
`HashSearch`	Many point lookups on a fixed-prefix keyspace. Requires a `prefix_extractor`.
`TwoLevelIndexSearch`	Very large SST files where the index would otherwise dominate.
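
As a sketch of the last row, a column family with very large SST files might pair the two-level index with cached index and filter blocks, so that parts of the index can be loaded on demand rather than all at once. This assumes the enum member is spelled TwoLevelIndexSearch as in the table. The cache size is illustrative, and the exact caching behaviour depends on the RocksDB version in use.

```csharp
var cache = Cache.CreateLru(capacity: 1024UL * 1024 * 1024);    // 1 GiB, illustrative

var table = new BlockBasedTableOptions()
    .SetIndexType(BlockBasedTableIndexType.TwoLevelIndexSearch) // index split into partitions
    .SetBlockCache(cache)
    .SetCacheIndexAndFilterBlocks(true);                        // let index blocks live in the cache

var cfOpts = new ColumnFamilyOptions().SetBlockBasedTableFactory(table);
```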

Choosing block size

Small blocks (4–8 KiB) reduce the amount of data read per point lookup but bloat the index.
Large blocks (32–64 KiB) shrink the index and improve sequential scans, at the cost of more I/O per random read.

Rule of thumb: leave it at the default (4 KiB) unless profiling tells you otherwise.
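
A back-of-the-envelope calculation shows why block size and index size pull in opposite directions. With the default binary-search index there is roughly one index entry per data block, so an SST file holding S bytes of uncompressed data with block size B needs about S/B entries. The 64 MiB file size below is an illustrative assumption.

```latex
N_{\text{index}} \approx \frac{S}{B}:
\qquad
\frac{64\,\text{MiB}}{4\,\text{KiB}} = 16384,
\qquad
\frac{64\,\text{MiB}}{16\,\text{KiB}} = 4096,
\qquad
\frac{64\,\text{MiB}}{64\,\text{KiB}} = 1024
```

Going from 4 KiB to 64 KiB blocks cuts the index to one sixteenth of its entries, but every uncached point lookup then reads and decompresses a block sixteen times larger.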

Worked example: point-lookup heavy

var cache = Cache.CreateLru(capacity: 1024UL * 1024 * 1024);   // 1 GiB

var table = new BlockBasedTableOptions()
    .SetBlockSize(8 * 1024)
    .SetBlockCache(cache)
    .SetFilterPolicy(BloomFilterPolicy.Create(10))
    .SetCacheIndexAndFilterBlocks(true)
    .SetPinL0FilterAndIndexBlocksInCache(true);

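// Upstream, OptimizeForPointLookup installs its own block-based table factory
// (with its own cache and Bloom filter), so set ours afterwards to make it win.
var cfOpts = new ColumnFamilyOptions()
    .OptimizeForPointLookup(blockCacheSizeMb: 1024)
    .SetBlockBasedTableFactory(table);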