VertexID has 64 bits of values, but the chain state database currently uses only a tiny fraction of that (~27 bits). Here, we split up the number space into a fixed portion statically allocated from the MPT path and a dynamic portion for leaves and storage slots.
The static portion simply allocates a (breadth-first) number based on the first nibbles in the address/path while any "deeper" paths instead get a dynamic VertexID like before.
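As a rough illustration of breadth-first numbering, the sketch below gives every level of a 16-ary trie its own contiguous id range and places a node inside that range by its leading nibbles. The names `STATIC_LEVELS`, `level_offset` and `static_vertex_id`, and the choice to number the root as 0, are assumptions made for this example and not necessarily the layout used in the code.

```python
STATIC_LEVELS = 8  # assumed number of statically numbered levels


def level_offset(depth: int) -> int:
    """Count of ids used by all levels above `depth`: sum of 16**i for i < depth."""
    return (16**depth - 1) // 15


def static_vertex_id(nibbles: list[int], depth: int) -> int:
    """Breadth-first id of the node reached by the first `depth` nibbles."""
    if not 0 <= depth <= STATIC_LEVELS:
        raise ValueError("depth is outside the statically numbered range")
    if len(nibbles) < depth:
        raise ValueError("path is shorter than the requested depth")
    index = 0
    for nibble in nibbles[:depth]:
        if not 0 <= nibble < 16:
            raise ValueError("a nibble must be in the range 0..15")
        # Position within the level is the prefix read as a base-16 number.
        index = index * 16 + nibble
    return level_offset(depth) + index
```

With this numbering the id of a node follows from its path alone, so no parent needs to be read to find it.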
Since the VertexID is path-based, we can more or less guess the VertexID of any node whose path we know based on the "average" depth of the state trie. When we're lucky, a single lookup is sufficient to find the node instead of a one-by-one traversal of each level.
Even in the case that a single lookup is not enough and the actual node is "deeper" than the guess, the starting point helps skip a few levels at least.
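The guess-then-walk idea can be sketched with a toy store, under heavy simplifying assumptions: nodes are keyed by their path prefix (standing in for the static VertexID), a node is either a `BRANCH` marker or a `(full_path, value)` leaf, and there are no extension nodes. `lookup`, `BRANCH` and the store layout are hypothetical names for this example only.

```python
BRANCH = object()  # marker for an inner node in the toy store


def lookup(store: dict, path: tuple, guess: int):
    """Return (value or None, number of store reads) for `path`."""
    depth = min(guess, len(path))
    reads = 1
    node = store.get(path[:depth])
    # Guess too deep: nothing is stored there, so back off toward the root.
    while node is None and depth > 0:
        depth -= 1
        reads += 1
        node = store.get(path[:depth])
        if node is BRANCH:
            # The child probed just before is missing, so the path is absent.
            return None, reads
    # Guess too shallow: continue one level at a time from the guessed node.
    while node is BRANCH and depth < len(path):
        depth += 1
        reads += 1
        node = store.get(path[:depth])
    if node is None or node is BRANCH:
        return None, reads
    leaf_path, value = node
    return (value if leaf_path == path else None), reads


store = {
    (): BRANCH,
    (1,): BRANCH,
    (1, 2): BRANCH,
    (1, 2, 3): ((1, 2, 3, 4), "account"),
}
```

In this toy store a correct guess of 3 finds the leaf in one read, while starting from the root (a guess of 0) takes four reads.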
Tree depth is estimated by keeping track of hits and misses and occasionally making an adjustment in the direction of the most misses.
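A minimal sketch of such an estimator follows. The window size, the rule of only adjusting when misses outnumber hits, and the names `DepthEstimator` and `record` are assumptions for illustration; the actual heuristic may differ.

```python
class DepthEstimator:
    """Tracks hits and misses and nudges the guessed depth now and then."""

    def __init__(self, depth: int = 7, window: int = 1000, max_depth: int = 8):
        self.depth = depth
        self.window = window        # assumed number of lookups between adjustments
        self.max_depth = max_depth  # assumed limit of the statically numbered levels
        self.hits = 0
        self.too_shallow = 0        # misses where the node was deeper than guessed
        self.too_deep = 0           # misses where the node was shallower than guessed

    def record(self, actual_depth: int) -> None:
        if actual_depth == self.depth:
            self.hits += 1
        elif actual_depth > self.depth:
            self.too_shallow += 1
        else:
            self.too_deep += 1
        if self.hits + self.too_shallow + self.too_deep >= self.window:
            self._adjust()

    def _adjust(self) -> None:
        # Only move when misses dominate, and then toward the larger miss count.
        if self.too_shallow + self.too_deep > self.hits:
            if self.too_shallow > self.too_deep:
                self.depth = min(self.depth + 1, self.max_depth)
            elif self.too_deep > self.too_shallow:
                self.depth = max(self.depth - 1, 0)
        self.hits = self.too_shallow = self.too_deep = 0
```

Adjusting by at most one level per window keeps the guess stable when the hit rate fluctuates between batches of lookups.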
On average, this shaves 25% off the import time for the first 15M blocks, where the lookup depth is guessed to be 7 levels - deepening the trie by one more level (when more accounts are eventually added) would see even better performance.
Using 8 levels of statically assigned ids results in 2**32 bits left for dynamic ids / storage slots - this should be more than enough for any foreseeable lifetime of the application, especially because large parts of the "current" usage of the VertexID space remain used by actual nodes.
The resulting lookup structure can be thought of as a hybrid between fully path-based lookups and the current "sparse" id mapping.
made with coffee sponsored by @0x-r4bbit :)