Reading code from Google is an invaluable way to learn how great minds think and write software. LevelDB code is a master class: it’s fast, lightweight, and designed for high-performance storage using log-structured merge-trees (LSM-trees).
LSM-trees only append records and do not update in place. This means multiple instances of a key can exist, but the most recent update represents the current version of that key. LSM maintains a set of files, each file storing a sorted range of keys. To find the most recent version of a key, LevelDB first scans memory and caches, then scans disk files from newest to oldest. The scan ends as soon as the key is found.
However, this can be very expensive if the database is large with many files, especially when the key doesn’t exist: every file may need to be checked. Multiple optimizations help reduce this overhead:
- Metadata Block: Each file has a metadata block listing its key range, allowing quick skips if the target key is out of range.
- Bloom Filters: A probabilistic data structure that answers “might be present” or “definitely not present.” If the filter says “not present,” the file is skipped immediately (no false negatives), though false positives can occur.
The probability of a false positive depends on the configured size of the Bloom filter. Below, we’ll explore how Bloom filters are implemented in LevelDB and how they’re used.
An interface that defines how a key filter operates. Much like GoLang idioms, Google interfaces are typically minimal, elegant, and well-defined, making them easy to implement without extra complexity.
The interface contains exactly three methods:
virtual const char* Name() const = 0;
virtual void CreateFilter(const Slice* keys, int n, std::string* dst) const = 0;
- Creates a filter for
n
keys and appends it todst
. - Note:
Slice
is somewhat likestd::string_view
, storing a pointer (const char*
) and a size, but it does not own the data.
- Creates a filter for
virtual bool KeyMayMatch(const Slice& key, const Slice& filter) const = 0;
- Checks whether the key may be present in the filter.
The file also provides a factory method that returns a concrete FilterPolicy
implementation, which becomes a BloomFilterPolicy
.
Implements the FilterPolicy
.
explicit BloomFilterPolicy(int bits_per_key) : bits_per_key_(bits_per_key) { // We intentionally round down to reduce probing cost a little bit k_ = static_cast<size_t>(bits_per_key * 0.69); // 0.69 ≈ ln(2) if (k_ < 1) k_ = 1; if (k_ > 30) k_ = 30; }
- The
explicit
keyword prevents implicit conversions (e.g., fromint
toBloomFilterPolicy
). bits_per_key
indicates how many bits are allocated for each key in the filter. The more bits allocated, the lower the collision probability (i.e., fewer false positives).- This choice is a trade-off: more bits reduce false positives but increase memory usage. The authors of LevelDB suggest using 10 bits per key, which yields about a 1% false positive rate.
k_
represents how many hash probes each key uses when setting bits in the filter. A higherk_
typically reduces false positives but increases the cost of building and querying the filter.
void CreateFilter(const Slice* keys, int n, std::string* dst) const override { // Compute bloom filter size (in both bits and bytes) size_t bits = n * bits_per_key_;
<span class="token comment">// For small n, we can see a very high false positive rate.</span> <span class="token comment">// Fix it by enforcing a minimum bloom filter length.</span> <span class="token keyword">if</span> <span class="token punctuation">(</span>bits <span class="token operator"><</span> <span class="token number">64</span><span class="token punctuation">)</span> bits <span class="token operator">=</span> <span class="token number">64</span><span class="token punctuation">;</span> size_t bytes <span class="token operator">=</span> <span class="token punctuation">(</span>bits <span class="token operator">+</span> <span class="token number">7</span><span class="token punctuation">)</span> <span class="token operator">/</span> <span class="token number">8</span><span class="token punctuation">;</span> bits <span class="token operator">=</span> bytes <span class="token operator">*</span> <span class="token number">8</span><span class="token punctuation">;</span> <span class="token keyword">const</span> size_t init_size <span class="token operator">=</span> dst<span class="token operator">-</span><span class="token operator">></span><span class="token function">size</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span> dst<span class="token operator">-</span><span class="token operator">></span><span class="token function">resize</span><span class="token punctuation">(</span>init_size <span class="token operator">+</span> bytes<span class="token punctuation">,</span> <span class="token number">0</span><span class="token punctuation">)</span><span class="token punctuation">;</span> dst<span class="token operator">-</span><span class="token operator">></span><span class="token function">push_back</span><span class="token punctuation">(</span><span class="token keyword">static_cast</span><span class="token operator"><</span><span class="token keyword">char</span><span class="token operator">></span><span class="token punctuation">(</span>k_<span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span> <span class="token comment">// Remember # of probes</span> <span class="token keyword">char</span><span class="token operator">*</span> array <span class="token operator">=</span> <span class="token operator">&</span><span class="token punctuation">(</span><span class="token operator">*</span>dst<span class="token punctuation">)</span><span class="token punctuation">[</span>init_size<span class="token punctuation">]</span><span class="token punctuation">;</span> <span class="token keyword">for</span> <span class="token punctuation">(</span><span class="token keyword">int</span> i <span class="token operator">=</span> <span class="token number">0</span><span class="token punctuation">;</span> i <span class="token operator"><</span> n<span class="token punctuation">;</span> i<span class="token operator">++</span><span class="token punctuation">)</span> <span class="token punctuation">{</span> <span class="token comment">// Use double-hashing to generate a sequence of hash values.</span> <span class="token comment">// See analysis in [Kirsch, Mitzenmacher 2006].</span> uint32_t h <span class="token operator">=</span> <span class="token function">BloomHash</span><span class="token punctuation">(</span>keys<span class="token punctuation">[</span>i<span class="token punctuation">]</span><span class="token punctuation">)</span><span class="token punctuation">;</span> <span class="token keyword">const</span> uint32_t delta <span class="token operator">=</span> <span class="token punctuation">(</span>h <span class="token operator">>></span> <span class="token number">17</span><span class="token punctuation">)</span> <span class="token operator">|</span> <span class="token punctuation">(</span>h <span class="token operator"><<</span> <span class="token number">15</span><span class="token punctuation">)</span><span class="token punctuation">;</span> <span class="token comment">// Rotate right 17 bits</span> <span class="token keyword">for</span> <span class="token punctuation">(</span>size_t j <span class="token operator">=</span> <span class="token number">0</span><span class="token punctuation">;</span> j <span class="token operator"><</span> k_<span class="token punctuation">;</span> j<span class="token operator">++</span><span class="token punctuation">)</span> <span class="token punctuation">{</span> <span class="token keyword">const</span> uint32_t bitpos <span class="token operator">=</span> h <span class="token operator">%</span> bits<span class="token punctuation">;</span> array<span class="token punctuation">[</span>bitpos <span class="token operator">/</span> <span class="token number">8</span><span class="token punctuation">]</span> <span class="token operator">|</span><span class="token operator">=</span> <span class="token punctuation">(</span><span class="token number">1</span> <span class="token operator"><<</span> <span class="token punctuation">(</span>bitpos <span class="token operator">%</span> <span class="token number">8</span><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span> h <span class="token operator">+</span><span class="token operator">=</span> delta<span class="token punctuation">;</span> <span class="token punctuation">}</span> <span class="token punctuation">}</span>
}
- Calculate the number of bits required (
n * bits_per_key_
) and ensure a minimum size (64 bits) for smalln
. - Convert that to bytes, then expand
dst
by that many bytes, initializing them to zero. - Append
k_
(casted tochar
), which indicates how many probes will be used. - For each key:
- Calculate a hash (
BloomHash
) to geth
. - Derive
delta
by rotatingh
(17 bits right, 15 bits left) and combining them with bitwise OR. - Double hashing: for each of the
k_
probes, pick a bit position (h % bits
) and set it, then updateh
by addingdelta
.
- Calculate a hash (
- Both numbers are coprime with 32, preserving randomness and preventing repetitive patterns.
- Shifting and OR-ing helps distribute the bits widely, lowering collision.
static uint32_t BloomHash(const Slice& key) { return Hash(key.data(), key.size(), 0xbc9f1d34); }
uint32_t Hash(const char data, size_t n, uint32_t seed) { // Similar to murmur hash const uint32_t m = 0xc6a4a793; const uint32_t r = 24; const char limit = data + n; uint32_t h = seed ^ (n * m);
<span class="token comment">// Pick up four bytes at a time</span> <span class="token keyword">while</span> <span class="token punctuation">(</span>limit <span class="token operator">-</span> data <span class="token operator">>=</span> <span class="token number">4</span><span class="token punctuation">)</span> <span class="token punctuation">{</span> uint32_t w <span class="token operator">=</span> <span class="token function">DecodeFixed32</span><span class="token punctuation">(</span>data<span class="token punctuation">)</span><span class="token punctuation">;</span> data <span class="token operator">+</span><span class="token operator">=</span> <span class="token number">4</span><span class="token punctuation">;</span> h <span class="token operator">+</span><span class="token operator">=</span> w<span class="token punctuation">;</span> h <span class="token operator">*</span><span class="token operator">=</span> m<span class="token punctuation">;</span> h <span class="token operator">^</span><span class="token operator">=</span> <span class="token punctuation">(</span>h <span class="token operator">>></span> <span class="token number">16</span><span class="token punctuation">)</span><span class="token punctuation">;</span> <span class="token punctuation">}</span> <span class="token comment">// Pick up remaining bytes</span> <span class="token keyword">switch</span> <span class="token punctuation">(</span>limit <span class="token operator">-</span> data<span class="token punctuation">)</span> <span class="token punctuation">{</span> <span class="token keyword">case</span> <span class="token number">3</span><span class="token operator">:</span> h <span class="token operator">+</span><span class="token operator">=</span> <span class="token keyword">static_cast</span><span class="token operator"><</span>uint8_t<span class="token operator">></span><span class="token punctuation">(</span>data<span class="token punctuation">[</span><span class="token number">2</span><span class="token punctuation">]</span><span class="token punctuation">)</span> <span class="token operator"><<</span> <span class="token number">16</span><span class="token punctuation">;</span> <span class="token punctuation">[</span><span class="token punctuation">[</span>fallthrough<span class="token punctuation">]</span><span class="token punctuation">]</span><span class="token punctuation">;</span> <span class="token keyword">case</span> <span class="token number">2</span><span class="token operator">:</span> h <span class="token operator">+</span><span class="token operator">=</span> <span class="token keyword">static_cast</span><span class="token operator"><</span>uint8_t<span class="token operator">></span><span class="token punctuation">(</span>data<span class="token punctuation">[</span><span class="token number">1</span><span class="token punctuation">]</span><span class="token punctuation">)</span> <span class="token operator"><<</span> <span class="token number">8</span><span class="token punctuation">;</span> <span class="token punctuation">[</span><span class="token punctuation">[</span>fallthrough<span class="token punctuation">]</span><span class="token punctuation">]</span><span class="token punctuation">;</span> <span class="token keyword">case</span> <span class="token number">1</span><span class="token operator">:</span> h <span class="token operator">+</span><span class="token operator">=</span> <span class="token keyword">static_cast</span><span class="token operator"><</span>uint8_t<span class="token operator">></span><span class="token punctuation">(</span>data<span class="token punctuation">[</span><span class="token number">0</span><span class="token punctuation">]</span><span class="token punctuation">)</span><span class="token punctuation">;</span> h <span class="token operator">*</span><span class="token operator">=</span> m<span class="token punctuation">;</span> h <span class="token operator">^</span><span class="token operator">=</span> <span class="token punctuation">(</span>h <span class="token operator">>></span> r<span class="token punctuation">)</span><span class="token punctuation">;</span> <span class="token keyword">break</span><span class="token punctuation">;</span> <span class="token punctuation">}</span> <span class="token keyword">return</span> h<span class="token punctuation">;</span>
}
This function generates the initial hash for each key. It’s a variant of MurmurHash, which is fast, non-cryptographic, and has a low collision rate—perfect for Bloom filters.
m = 0xc6a4a793
: A multiplicative mixing constant chosen for strong dispersion (good “avalanche” behavior).r = 24
: Determines how many bits to shift for handling leftover bytes.h = seed ^ (n * m)
: The initial hash value, whereseed
is0xbc9f1d34
inBloomHash
.- Processing:
- Read four bytes at a time, decode them as
uint32_t
, add toh
, multiply bym
, then XORh
withh >> 16
. - If the input is not a multiple of 4 bytes, process the remaining 1–3 bytes by shifting and XORing with the constant
r
.
- Read four bytes at a time, decode them as
bool KeyMayMatch(const Slice& key, const Slice& bloom_filter) const override { const size_t len = bloom_filter.size(); if (len < 2) return false; const char* array = bloom_filter.data(); const size_t bits = (len - 1) * 8;
<span class="token comment">// Use the encoded k so that we can read filters generated by</span> <span class="token comment">// bloom filters created using different parameters.</span> <span class="token keyword">const</span> size_t k <span class="token operator">=</span> array<span class="token punctuation">[</span>len <span class="token operator">-</span> <span class="token number">1</span><span class="token punctuation">]</span><span class="token punctuation">;</span> uint32_t h <span class="token operator">=</span> <span class="token function">BloomHash</span><span class="token punctuation">(</span>key<span class="token punctuation">)</span><span class="token punctuation">;</span> <span class="token keyword">const</span> uint32_t delta <span class="token operator">=</span> <span class="token punctuation">(</span>h <span class="token operator">>></span> <span class="token number">17</span><span class="token punctuation">)</span> <span class="token operator">|</span> <span class="token punctuation">(</span>h <span class="token operator"><<</span> <span class="token number">15</span><span class="token punctuation">)</span><span class="token punctuation">;</span> <span class="token comment">// Rotate right 17 bits</span> <span class="token keyword">for</span> <span class="token punctuation">(</span>size_t j <span class="token operator">=</span> <span class="token number">0</span><span class="token punctuation">;</span> j <span class="token operator"><</span> k<span class="token punctuation">;</span> j<span class="token operator">++</span><span class="token punctuation">)</span> <span class="token punctuation">{</span> <span class="token keyword">const</span> uint32_t bitpos <span class="token operator">=</span> h <span class="token operator">%</span> bits<span class="token punctuation">;</span> <span class="token keyword">if</span> <span class="token punctuation">(</span><span class="token punctuation">(</span>array<span class="token punctuation">[</span>bitpos <span class="token operator">/</span> <span class="token number">8</span><span class="token punctuation">]</span> <span class="token operator">&</span> <span class="token punctuation">(</span><span class="token number">1</span> <span class="token operator"><<</span> <span class="token punctuation">(</span>bitpos <span class="token operator">%</span> <span class="token number">8</span><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">)</span> <span class="token operator">==</span> <span class="token number">0</span><span class="token punctuation">)</span> <span class="token keyword">return</span> <span class="token boolean">false</span><span class="token punctuation">;</span> h <span class="token operator">+</span><span class="token operator">=</span> delta<span class="token punctuation">;</span> <span class="token punctuation">}</span> <span class="token keyword">return</span> <span class="token boolean">true</span><span class="token punctuation">;</span>
}
- This is the symmetric operation to
CreateFilter
. - It reads the last byte of
bloom_filter
to obtaink
. - The total bits are
(len - 1) * 8
. - The same double hashing method is used to determine which bits should be set. If all the required bits are set, the key may be present; if any bit is missing, the key is definitely not present.
Each SST file holds a sorted run of key-value pairs, split into blocks. LevelDB generates a *Bloom filter for all the keys in that SST file, storing it in a filter block at the file’s end.
- Check the memtable: If the key is found here, return it immediately.
- Search SST files from newest to oldest:
- LevelDB caches metadata about each SST file, including its Bloom filter block.
- Before scanning the file’s blocks, LevelDB first checks the Bloom filter.
- If the filter says the key is not present, the file is skipped.
- If it says the key may be present, LevelDB proceeds to look inside the file.
By using Bloom filters, LevelDB avoids reading from many files unnecessarily, resulting in better performance.