Path-based VertexID #3251

arnetheduck · 2025-05-01T11:16:34Z

VertexID has 64 bits of values, but the chain state database currently uses only a tiny fraction of that (~27 bits). Here, we split up the number space into a fixed portion statically allocated from the MPT path and a dynamic portion for leaves and storage slots.

The static portion simply allocates a (breadth-first) number based on the first nibbles in the address/path while any "deeper" paths instead get a dynamic VertexID like before.
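
As a rough illustration of the breadth-first numbering described above, here is a minimal Python sketch. It is not the code from this PR: the names `to_nibbles`, `level_offset` and `static_vid` are hypothetical, and it assumes the root gets id 0 and that no ids are reserved, which the real implementation may well do differently.

```python
from typing import List, Sequence

FANOUT = 16  # a hexary trie has 16 children per branch, one per nibble


def to_nibbles(path: bytes) -> List[int]:
    """Split each byte of a path into its high and low nibble."""
    out = []
    for b in path:
        out.append(b >> 4)
        out.append(b & 0x0F)
    return out


def level_offset(depth: int) -> int:
    """Number of ids used by all levels above `depth` (the root is depth 0)."""
    return (FANOUT**depth - 1) // (FANOUT - 1)


def static_vid(nibbles: Sequence[int], depth: int) -> int:
    """Breadth-first id of the node reached by the first `depth` nibbles."""
    if not 0 <= depth <= len(nibbles):
        raise ValueError("depth must be within the path length")
    index = 0
    for n in nibbles[:depth]:
        index = index * FANOUT + n  # position of the node within its level
    return level_offset(depth) + index


# The id depends only on the path prefix, so it can be computed without
# reading any parent node first.
vid = static_vid(to_nibbles(b"\xab\xcd"), 2)  # node for prefix [0xa, 0xb]
```

Because every `(depth, prefix)` pair maps to a distinct id, the ids of all static levels can be derived from the path alone; only nodes below the static levels would need a dynamically allocated id.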

Since the VertexID is path-based, we can more or less guess the VertexID of any node whose path we know based on the "average" depth of the state trie. When we're lucky, a single lookup is sufficient to find the node instead of a one-by-one traversal of each level.

Even in the case that a single lookup is not enough and the actual node is "deeper" than the guess, the starting point helps skip a few levels at least.
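
The guessed-depth lookup can be sketched as follows. This is a toy model under stated assumptions, not the PR's implementation: the database is a plain `dict`, the trie has only branches and leaves (no extension nodes), a leaf stores its full key, and `find_leaf`, `BRANCH` and `LEAF` are hypothetical names. Stepping back towards the root after a miss is also an assumption about how a too-deep guess might be handled.

```python
FANOUT = 16
BRANCH, LEAF = "branch", "leaf"


def static_vid(nibbles, depth):
    """Breadth-first id of the node reached by the first `depth` nibbles."""
    index = 0
    for n in nibbles[:depth]:
        index = index * FANOUT + n
    return (FANOUT**depth - 1) // (FANOUT - 1) + index


def find_leaf(db, nibbles, guess):
    """Return (value, depth, probes); value is None if the key is absent."""
    depth = min(guess, len(nibbles))
    probes = 1
    node = db.get(static_vid(nibbles, depth))
    if node is None:
        # Guess too deep: step back towards the root until a node exists.
        while node is None and depth > 0:
            depth -= 1
            probes += 1
            node = db.get(static_vid(nibbles, depth))
    else:
        # Guess too shallow: follow branches downwards from the guess.
        while node is not None and node[0] == BRANCH and depth < len(nibbles):
            depth += 1
            probes += 1
            node = db.get(static_vid(nibbles, depth))
    if node is not None and node[0] == LEAF and node[1] == tuple(nibbles):
        return node[2], depth, probes
    return None, depth, probes


# A tiny trie: one leaf at depth 3 and one leaf at depth 1.
db = {
    static_vid([], 0): (BRANCH,),
    static_vid([1], 1): (BRANCH,),
    static_vid([1, 2], 2): (BRANCH,),
    static_vid([1, 2, 3], 3): (LEAF, (1, 2, 3, 4), "a"),
    static_vid([5], 1): (LEAF, (5, 0, 0, 0), "b"),
}

lucky = find_leaf(db, [1, 2, 3, 4], guess=3)      # found with a single probe
from_root = find_leaf(db, [1, 2, 3, 4], guess=0)  # classic walk: four probes
```

In this model a correct guess costs one probe, while a guess that is off by a level or two still costs fewer probes than walking from the root.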

Tree depth is estimated by keeping track of hits and misses and occasionally making an adjustment in the direction of the most misses.
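
One way such a hit/miss tracker could look is sketched below. The class name `DepthEstimator`, the window size, the depth cap and the rule "only move when the larger miss counter also exceeds the hits" are all assumptions made for illustration; the description does not specify how often or under which condition the real code adjusts.

```python
from dataclasses import dataclass


@dataclass
class DepthEstimator:
    depth: int = 7        # current guess; 7 is the value quoted in the description
    window: int = 1024    # hypothetical: lookups between adjustments
    max_depth: int = 8    # hypothetical cap, matching 8 static levels
    hits: int = 0
    deeper: int = 0       # misses where the node was below the guess
    shallower: int = 0    # misses where the node was above the guess

    def record(self, actual_depth: int) -> None:
        """Count one lookup and occasionally adjust the guess."""
        if actual_depth == self.depth:
            self.hits += 1
        elif actual_depth > self.depth:
            self.deeper += 1
        else:
            self.shallower += 1
        if self.hits + self.deeper + self.shallower >= self.window:
            self._adjust()

    def _adjust(self) -> None:
        # Move one level in the direction of the most misses, but only if
        # that direction also outnumbers the hits; then start a new window.
        if self.deeper > self.shallower and self.deeper > self.hits:
            self.depth = min(self.depth + 1, self.max_depth)
        elif self.shallower > self.deeper and self.shallower > self.hits:
            self.depth = max(self.depth - 1, 0)
        self.hits = self.deeper = self.shallower = 0


est = DepthEstimator(depth=5, window=4)
for actual in (6, 6, 6, 6):
    est.record(actual)
# After one window of "deeper" misses the guess has moved from 5 to 6.
```

Adjusting by a single level per window keeps the guess stable while the trie slowly deepens as accounts are added.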

On average, this shaves about 25% off the import time for the first 15M blocks, where the lookup depth is guessed to be 7 levels - deepening the trie by one more level (when more accounts are eventually added) would see even better performance.

Using 8 levels of statically assigned ids leaves 2**32 values for dynamic ids / storage slots - this should be by far enough for any foreseeable lifetime of the application, especially because large parts of the "current" usage of the VertexID space remain used by actual nodes.

The resulting lookup structure can be thought of as a hybrid between fully path-based lookups and the current "sparse" id mapping.

blocks: 15721472, baseline: 102h33m7s, contender: 77h4m49s
Time (total): -25h28m18s, -24.84%
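
The totals above can be re-derived with a few lines of Python. The helper `to_seconds` is a hypothetical name introduced here; the durations are copied from the benchmark line.

```python
import re


def to_seconds(text: str) -> int:
    """Parse a duration such as '102h33m7s' into seconds."""
    m = re.fullmatch(r"(\d+)h(\d+)m(\d+)s", text)
    if m is None:
        raise ValueError(f"unexpected duration format: {text!r}")
    hours, minutes, seconds = map(int, m.groups())
    return hours * 3600 + minutes * 60 + seconds


baseline = to_seconds("102h33m7s")
contender = to_seconds("77h4m49s")

delta = contender - baseline          # negative: the contender is faster
percent = 100 * delta / baseline      # change in total import time
speedup = baseline / contender        # same result expressed as throughput

print(f"delta: {delta}s, time: {percent:.2f}%, throughput: {speedup:.2f}x")
```

Note that a reduction of roughly 25% in total time corresponds to roughly 1.33x the import throughput over the same blocks.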

made with coffee sponsored by @0x-r4bbit :)

arnetheduck marked this pull request as draft May 1, 2025 11:17

arnetheduck changed the title ~~Static vid~~ Path-based VertexID May 1, 2025

arnetheduck force-pushed the static-vid branch 2 times, most recently from 0fa39b1 to 3935693 May 3, 2025 06:23

arnetheduck force-pushed the static-vid branch from 3935693 to c1dc14f May 3, 2025 06:40
