Path-based VertexID #3251

Draft: wants to merge 1 commit into base: master

@arnetheduck arnetheduck commented May 1, 2025

VertexID has a 64-bit value space, but the chain state database currently uses only a tiny fraction of it (~27 bits). Here, we split the number space into a fixed portion, statically allocated from the MPT path, and a dynamic portion for leaves and storage slots.

The static portion simply allocates a (breadth-first) number based on the first nibbles in the address/path while any "deeper" paths instead get a dynamic VertexID like before.
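
As a minimal sketch of what a breadth-first number derived from the leading nibbles could look like: the formula below and the names `level_offset` and `static_vid` are assumptions for illustration, not the numbering this PR actually implements. Each level of a 16-way trie is given a contiguous id range, and the nibble prefix selects the slot within that range.

```python
from typing import Sequence

FANOUT = 16  # a hexary MPT branch has 16 children, one per nibble


def level_offset(depth: int) -> int:
    """Number of ids consumed by all levels shallower than `depth` (assumed layout)."""
    return (FANOUT ** depth - 1) // (FANOUT - 1)


def static_vid(nibbles: Sequence[int], depth: int) -> int:
    """Hypothetical breadth-first id of the node reached by the first `depth` nibbles."""
    if depth > len(nibbles):
        raise ValueError("path is shorter than the requested depth")
    index = 0
    for nibble in nibbles[:depth]:
        if not 0 <= nibble < FANOUT:
            raise ValueError("nibbles must be in the range 0..15")
        index = index * FANOUT + nibble  # position within the level
    return level_offset(depth) + index
```

With this layout the root gets id 0, its 16 children get ids 1 to 16, and the next level starts at 17, so an id can be computed from a path prefix without touching the database.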

Since the VertexID is path-based, we can more or less guess the VertexID of any node whose path we know, based on the "average" depth of the state trie. When we're lucky, a single lookup is sufficient to find the node instead of a one-by-one traversal of each level.

Even in the case that a single lookup is not enough and the actual node is "deeper" than the guess, the starting point helps skip a few levels at least.
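
The guess-then-correct lookup can be sketched roughly as follows. This is a toy model under stated assumptions: `db` is a plain dict from a hypothetical static id to the set of child nibbles that are themselves stored branches, and `static_vid` and `locate` are illustrative names, not functions from the codebase.

```python
from typing import Dict, Optional, Sequence, Set, Tuple


def static_vid(nibbles: Sequence[int], depth: int) -> int:
    """Hypothetical breadth-first id for the first `depth` nibbles (assumed layout)."""
    index = 0
    for nibble in nibbles[:depth]:
        index = index * 16 + nibble
    return (16 ** depth - 1) // 15 + index


def locate(
    db: Dict[int, Set[int]], nibbles: Sequence[int], guess: int
) -> Tuple[Optional[int], int]:
    """Return (depth of the deepest stored branch on the path, number of db lookups)."""
    depth = min(guess, len(nibbles))
    lookups = 0
    # Guess too deep: back off toward the root until a stored node is found.
    while True:
        lookups += 1
        node = db.get(static_vid(nibbles, depth))
        if node is not None or depth == 0:
            break
        depth -= 1
    if node is None:
        return None, lookups  # nothing stored on this path, not even a root
    # Guess too shallow: follow the branch one level at a time from here.
    while depth < len(nibbles) and nibbles[depth] in node:
        depth += 1
        lookups += 1
        node = db[static_vid(nibbles, depth)]
    return depth, lookups


# Toy trie: root -> [1] -> [1, 2]; anything deeper would use dynamic ids.
db = {
    static_vid([], 0): {1},
    static_vid([1], 1): {2},
    static_vid([1, 2], 2): set(),
}
path = [1, 2, 3, 4]
print(locate(db, path, guess=2))  # correct guess
print(locate(db, path, guess=1))  # one level too shallow
print(locate(db, path, guess=0))  # plain traversal from the root
```

In this toy setup a correct guess needs one lookup, a guess that is one level too shallow needs two, and starting from the root needs three, which is the level-skipping effect described above.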

Tree depth is estimated by keeping track of hits and misses and occasionally making an adjustment in the direction of the most misses.
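
A minimal sketch of such an estimator, assuming a fixed sampling window and a simple "largest counter wins" rule. The class name `DepthEstimator`, the `window` parameter, and the exact adjustment rule are assumptions made for illustration; only the starting depth of 7 and the limit of 8 static levels come from the description here.

```python
class DepthEstimator:
    """Tracks hits and misses of the depth guess and nudges it occasionally."""

    def __init__(self, depth: int = 7, window: int = 1024, max_depth: int = 8) -> None:
        self.depth = depth          # current guess
        self.window = window        # samples between adjustments (assumed)
        self.max_depth = max_depth  # deepest statically numbered level
        self._reset()

    def _reset(self) -> None:
        self.hits = 0
        self.too_shallow = 0  # the node was deeper than the guess
        self.too_deep = 0     # the node was shallower than the guess

    def record(self, actual_depth: int) -> None:
        """Record the depth at which a lookup actually found its node."""
        if actual_depth == self.depth:
            self.hits += 1
        elif actual_depth > self.depth:
            self.too_shallow += 1
        else:
            self.too_deep += 1
        if self.hits + self.too_shallow + self.too_deep >= self.window:
            self._adjust()

    def _adjust(self) -> None:
        # Move one level in the direction that produced the most misses,
        # but only if those misses outnumber the hits.
        if self.too_shallow > max(self.hits, self.too_deep):
            self.depth = min(self.depth + 1, self.max_depth)
        elif self.too_deep > max(self.hits, self.too_shallow):
            self.depth = max(self.depth - 1, 0)
        self._reset()
```

Adjusting by at most one level per window keeps the guess stable when the misses are roughly balanced in both directions.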

On average, this shaves about 25% off the import time for the first 15M blocks, where the lookup depth is guessed to be 7 levels. Deepening the trie by one more level (as more accounts are eventually added) should yield even better performance.

Using 8 levels of statically assigned ids leaves 2**32 values for dynamic ids / storage slots. This should be more than enough for any foreseeable lifetime of the application, especially because large parts of the "current" usage of the VertexID space are taken up by actual nodes.

The resulting lookup structure can be thought of as a hybrid between fully path-based lookups and the current "sparse" id mapping.

blocks: 15721472, baseline: 102h33m7s, contender: 77h4m49s
Time (total): -25h28m18s, -24.84%
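
For reference, the totals above can be re-derived from the two durations with a few lines of Python. The `parse` helper is an illustrative name, not part of any benchmarking tool used for this PR.

```python
import re
from datetime import timedelta


def parse(duration: str) -> timedelta:
    """Parse a duration written like '102h33m7s'."""
    match = re.fullmatch(r"(\d+)h(\d+)m(\d+)s", duration)
    if match is None:
        raise ValueError(f"unexpected duration format: {duration!r}")
    hours, minutes, seconds = map(int, match.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


baseline = parse("102h33m7s")
contender = parse("77h4m49s")
saved = baseline - contender
percent = 100 * (saved / baseline)  # timedelta / timedelta yields a float

print(saved.total_seconds(), round(percent, 2))
```

The difference comes out to 25h28m18s, or 24.84% of the baseline, matching the reported figures.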

made with coffee sponsored by @0x-r4bbit :)

@arnetheduck arnetheduck marked this pull request as draft May 1, 2025 11:17
@arnetheduck arnetheduck changed the title from "Static vid" to "Path-based VertexID" May 1, 2025
@arnetheduck arnetheduck force-pushed the static-vid branch 2 times, most recently from 0fa39b1 to 3935693 Compare May 3, 2025 06:23