Block-based table options
The default SST file format in RocksDB is the block-based table: data blocks of a configurable target size plus an index, with an optional Bloom filter that lets reads skip files that cannot contain a key. The BlockBasedTableOptions class tunes that format.
These options live on the column family — attach them via ColumnFamilyOptions.SetBlockBasedTableFactory.
Upstream reference: Block Cache, Bloom Filter, Tuning Guide.
Building a table options object
var blockCache = Cache.CreateLru(capacity: 256UL * 1024 * 1024);
var bloom = BloomFilterPolicy.Create(bitsPerKey: 10);
var table = new BlockBasedTableOptions()
    .SetBlockSize(16 * 1024)
    .SetBlockCache(blockCache)
    .SetFilterPolicy(bloom)
    .SetWholeKeyFiltering(true)
    .SetCacheIndexAndFilterBlocks(true)
    .SetPinL0FilterAndIndexBlocksInCache(true)
    .SetIndexType(BlockBasedTableIndexType.BinarySearch);
var cfOpts = new ColumnFamilyOptions().SetBlockBasedTableFactory(table);
Setters
| Setter | What it does |
|---|---|
| SetBlockSize(ulong) | Target uncompressed size of a data block. Default 4 KiB. Bigger blocks → smaller index, more I/O per read. |
| SetBlockSizeDeviation(int) | Percentage of BlockSize used as a tolerance: a block is closed early when its remaining free space falls below this percentage and the next entry would push it past BlockSize. Upstream default is 10. |
| SetBlockRestartInterval(int) | Number of keys between restart points when keys in a block are prefix-encoded. Upstream default is 16. |
| SetFilterPolicy(BloomFilterPolicy) | Bloom filter for the table. Lets Get skip files that cannot contain the key, which greatly speeds up lookups of missing keys. |
| SetNoBlockCache(bool) | Disable the block cache entirely. |
| SetBlockCache(Cache) | Attach a shared block cache (typically LRU). |
| SetWholeKeyFiltering(bool) | Add whole keys to the Bloom filter (default). Set to false to filter on prefixes only when a prefix_extractor is configured. |
| SetFormatVersion(int) | SST format version. Newer versions use more efficient encodings, but the files can only be read by RocksDB releases that support that version. |
| SetIndexType(BlockBasedTableIndexType) | BinarySearch, HashSearch, or TwoLevelIndexSearch. |
| SetCacheIndexAndFilterBlocks(bool) | Store index/filter blocks in the block cache (recommended for large DBs). |
| SetPinL0FilterAndIndexBlocksInCache(bool) | Pin L0 index/filter blocks so they're never evicted. Only takes effect when SetCacheIndexAndFilterBlocks(true) is set. |
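The restart interval is easier to picture with a small model of prefix encoding. The sketch below is conceptual only and is not RocksDB's on-disk layout (which is binary and uses varints); RestartSketch and Encode are hypothetical names. Every restartInterval-th key is stored in full, and each key in between stores only the number of leading characters it shares with the previous key plus the remaining suffix.

```csharp
using System;
using System.Collections.Generic;

// Conceptual model of prefix-encoded keys with restart points.
// Not RocksDB's real block format; names here are illustrative only.
static class RestartSketch
{
    // Each entry is (Shared, Suffix): Shared leading chars are reused from the
    // previous key. At a restart point Shared is 0 and Suffix is the full key.
    public static List<(int Shared, string Suffix)> Encode(
        IReadOnlyList<string> sortedKeys, int restartInterval)
    {
        if (restartInterval < 1)
            throw new ArgumentOutOfRangeException(nameof(restartInterval));

        var entries = new List<(int Shared, string Suffix)>(sortedKeys.Count);
        for (int i = 0; i < sortedKeys.Count; i++)
        {
            int shared = 0;
            if (i % restartInterval != 0)
            {
                // Count the common prefix with the previous key.
                string prev = sortedKeys[i - 1], cur = sortedKeys[i];
                int max = Math.Min(prev.Length, cur.Length);
                while (shared < max && prev[shared] == cur[shared]) shared++;
            }
            entries.Add((shared, sortedKeys[i].Substring(shared)));
        }
        return entries;
    }
}
```

In this model a smaller interval stores more full keys (larger blocks, but more points to jump to when seeking inside a block), while a larger interval compresses better at the cost of decoding more entries per seek.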
Block cache
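The block cache is an in-memory LRU cache that holds uncompressed blocks. Sharing one cache across all CFs puts a single cap on block-cache memory and lets the busiest column families use more of it, instead of splitting the budget into fixed per-CF slices.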
var shared = Cache.CreateLru(capacity: 512UL * 1024 * 1024);
var hot = new BlockBasedTableOptions().SetBlockCache(shared);
var cold = new BlockBasedTableOptions().SetBlockCache(shared);
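For huge databases, also call SetCacheIndexAndFilterBlocks(true) so that index and filter blocks are stored in the block cache and count against its capacity. Otherwise they are held in memory outside the cache, and that memory grows with the DB with no upper bound. The trade-off is that index and filter blocks then compete with data blocks for cache space and can be evicted.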
Upstream reference: Block Cache wiki.
Bloom filters
var bloom = BloomFilterPolicy.Create(bitsPerKey: 10);
var table = new BlockBasedTableOptions()
    .SetFilterPolicy(bloom)
    .SetWholeKeyFiltering(true);
bitsPerKey controls the trade-off:
| Bits/key | False positive rate |
|---|---|
| 6 | ~5% |
| 10 (default) | ~1% |
| 16 | ~0.04% |
Bloom filters are small relative to the data (10 bits per key is 1.25 bytes per key) and let most lookups of missing keys stop at the filter check without reading a data block. For workloads with lots of point lookups, enable Bloom filtering.
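The rates in the table above are close to what the textbook Bloom filter formula predicts. The sketch below evaluates that formula and a rough filter size; it is an approximation for intuition, not RocksDB's actual filter implementation, whose measured rates can differ somewhat. BloomMath and its method names are hypothetical.

```csharp
using System;

// Textbook Bloom filter estimates; not RocksDB's implementation.
static class BloomMath
{
    // False positive rate for b bits per key with k probes:
    // (1 - e^(-k/b))^k, where k = round(b * ln 2), at least 1.
    public static double FalsePositiveRate(int bitsPerKey)
    {
        if (bitsPerKey < 1)
            throw new ArgumentOutOfRangeException(nameof(bitsPerKey));

        int probes = Math.Max(1, (int)Math.Round(bitsPerKey * Math.Log(2)));
        return Math.Pow(1.0 - Math.Exp(-(double)probes / bitsPerKey), probes);
    }

    // Approximate filter size in bytes for keyCount keys, rounded up.
    public static long FilterBytes(long keyCount, int bitsPerKey) =>
        (keyCount * bitsPerKey + 7) / 8;
}
```

Under this model, 6, 10, and 16 bits per key give roughly 5.6%, 0.82%, and 0.046%, and one million keys at 10 bits per key need about 1.25 MB of filter.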
Upstream reference: RocksDB Bloom Filter.
Index type
| Type | Use it for |
|---|---|
| BinarySearch (default) | Most workloads. |
| HashSearch | Many point lookups on a fixed-prefix keyspace. Requires a prefix_extractor. |
| TwoLevelIndexSearch | Very large SST files where the index would otherwise dominate. |
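To illustrate what the default BinarySearch index does conceptually: the index holds one sorted entry per data block, and a lookup binary-searches it for the first block whose upper bound is at or above the target key. The sketch below models that with plain strings. It is a simplification (RocksDB stores shortened separator keys and block handles, and compares bytes with the configured comparator), and IndexSketch and FindBlock are hypothetical names.

```csharp
using System;

// Simplified model of a binary-search block index; illustrative only.
static class IndexSketch
{
    // lastKeys[i] is the largest key in data block i, sorted in ordinal order.
    // Returns the block that may contain key, or -1 if key is past the last block.
    public static int FindBlock(string[] lastKeys, string key)
    {
        int pos = Array.BinarySearch(lastKeys, key, StringComparer.Ordinal);
        if (pos < 0) pos = ~pos; // no exact match: ~pos is the first entry greater than key
        return pos < lastKeys.Length ? pos : -1;
    }
}
```

Each lookup costs O(log n) comparisons over n blocks, which is why the index size (one entry per block) matters for both memory and lookup time.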
Choosing block size
- Small blocks (4–8 KiB) read less unneeded data per point lookup but bloat the index, which holds one entry per block.
- Large blocks (32–64 KiB) shrink the index and improve sequential scans, at the cost of more I/O per random read.
Rule of thumb: leave it at the default (4 KiB) unless profiling tells you otherwise.
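The index side of this trade-off can be estimated with simple arithmetic. The sketch below counts data blocks (and therefore index entries) for a given amount of uncompressed data, assuming every block is filled exactly to the target size. The per-entry byte cost is a hypothetical input you would have to measure for your own keys, and BlockSizeSketch is an illustrative name.

```csharp
using System;

// Back-of-the-envelope index sizing; assumes blocks are filled exactly to blockSize.
static class BlockSizeSketch
{
    // Number of data blocks, which equals the number of index entries.
    public static long BlockCount(long dataBytes, long blockSize)
    {
        if (dataBytes < 0 || blockSize <= 0)
            throw new ArgumentOutOfRangeException();
        return (dataBytes + blockSize - 1) / blockSize; // ceiling division
    }

    // Rough index size. bytesPerIndexEntry is a hypothetical average
    // (separator key plus block handle); measure it for real data.
    public static long IndexBytes(long dataBytes, long blockSize, long bytesPerIndexEntry) =>
        BlockCount(dataBytes, blockSize) * bytesPerIndexEntry;
}
```

For 64 MiB of data, 4 KiB blocks give 16,384 index entries while 64 KiB blocks give 1,024, a 16x difference in index size for the same per-entry cost.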
Worked example: point-lookup heavy
var cache = Cache.CreateLru(capacity: 1024UL * 1024 * 1024); // 1 GiB
var table = new BlockBasedTableOptions()
    .SetBlockSize(8 * 1024)
    .SetBlockCache(cache)
    .SetFilterPolicy(BloomFilterPolicy.Create(10))
    .SetCacheIndexAndFilterBlocks(true)
    .SetPinL0FilterAndIndexBlocksInCache(true);
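// In upstream RocksDB, OptimizeForPointLookup installs its own block-based table
// factory and cache; the SetBlockBasedTableFactory call that follows replaces them.
var cfOpts = new ColumnFamilyOptions()
    .OptimizeForPointLookup(blockCacheSizeMb: 1024)
    .SetBlockBasedTableFactory(table);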