State trie version v0 vs v1 #11607

qdm12 · 2022-06-07T00:52:32Z

Hi there!

I'm Quentin, working on Gossamer, the Polkadot implementation in Go.

I have a few questions regarding the state trie upgrade from v0 to v1.

From reading around the Polkadot specification, substrate and the trie crate pull requests, I have understood so far the only difference between v0 and v1 is that trie proofs can be encoded nodes with their subvalue hash as their 'subvalue field'.

Now it seems the v1 decoding of proofs is retro-compatible with the v0 generated proofs, since new node headers were added and they don't conflict. On the other hand, v1 encoding is not compatible with v0 decoding.

My understanding is that the tip of the chain will have only v1 state trie runtimes, and that we only need to support decoding proof nodes when syncing the chain together with the v0 state trie runtime. So does that mean we can only implement the v1 state trie codec and have it run for both v0 and v1 encoded proof nodes? Or do we still need to support both versions?

Feel free to correct me or clarify any of the points mentioned above! I might have missed some differences between v0 and v1.

Thank you in advance!

cheme · 2022-06-07T14:27:53Z

Collaborator

Hi Quentin,

> From reading around the Polkadot specification, substrate and the trie crate pull requests, I have understood so far the only difference between v0 and v1 is that trie proofs can be encoded nodes with their subvalue hash as their 'subvalue field'.

Right. The point is that, for every value of size >= 33 bytes, the value is its own node and we write its hash in the parent node (this allows leaving the value out of the PoV for operations such as value removal, and for some other more specific ones).
A 'value node' does not have a header or a specific encoding like other trie nodes (it is just the plain value bytes).
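
The rule above can be sketched roughly as follows. This is only an illustration under stated assumptions: `VALUE_NODE_THRESHOLD`, `NodeValue` and `node_value` are hypothetical names, and the hash function is passed in as a parameter so the sketch does not depend on any hashing crate.

```rust
/// Values of at least this many bytes become their own node in V1
/// (the ">= 33" rule described above). Hypothetical constant name.
const VALUE_NODE_THRESHOLD: usize = 33;

#[derive(Clone, Copy, Debug, PartialEq)]
enum StateVersion {
    V0,
    V1,
}

/// What ends up in the 'value part' of an encoded leaf or branch.
#[derive(Debug, PartialEq)]
enum NodeValue {
    /// The value bytes are stored directly in the parent node.
    Inline(Vec<u8>),
    /// Only the hash is stored; the bytes live in a separate value node.
    Hashed([u8; 32]),
}

/// Decide how a value is attached to its parent node.
/// `hash` is an assumption of this sketch: any 32-byte hash function.
fn node_value(
    version: StateVersion,
    value: &[u8],
    hash: impl Fn(&[u8]) -> [u8; 32],
) -> NodeValue {
    match version {
        // Only V1 moves large values out of the parent node.
        StateVersion::V1 if value.len() >= VALUE_NODE_THRESHOLD => {
            NodeValue::Hashed(hash(value))
        }
        // V0, or a small value in V1: keep the bytes inline.
        _ => NodeValue::Inline(value.to_vec()),
    }
}
```

With this shape, a 40-byte value stays inline under V0 and is replaced by its hash under V1, while a 32-byte value stays inline in both versions.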

> Now it seems the v1 decoding of proofs is retro-compatible with the v0 generated proofs, since new node headers were added and they don't conflict. On the other hand, v1 encoding is not compatible with v0 decoding.

Yes, the new implementation accepts both V0 and V1 nodes (both sets of headers are allowed, and V0 becomes a subset of V1).
The new root host function receives this info as a parameter (substrate/primitives/io/src/lib.rs, line 271 in 3ca525d: `fn root(&mut self, version: StateVersion) -> Vec<u8>`).
So the choice is driven by the runtime.
V0 decoding not being compatible with V1 encoding is not really an issue: we use a single decoder that works with both V0 and V1, which means that some input that would historically have been rejected is now accepted, but that is not a problem.
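
A minimal sketch of why one decoder can serve both versions: the node kind is read from the leading bits of the header byte, and the hashed-value variants use prefixes that V0 never used. The 2-bit prefixes (01 leaf, 10 branch without value, 11 branch with value) follow the header layout in the Polkadot specification; 001 and 0001 are the hashed-value leaf and branch prefixes mentioned elsewhere in this discussion. `NodeKind` and `node_kind` are hypothetical names, and the partial key length bits are ignored here.

```rust
#[derive(Debug, PartialEq)]
enum NodeKind {
    Empty,
    Leaf,              // 01xx_xxxx (V0 and V1)
    BranchNoValue,     // 10xx_xxxx (V0 and V1)
    BranchWithValue,   // 11xx_xxxx (V0 and V1)
    HashedValueLeaf,   // 001x_xxxx (V1 only)
    HashedValueBranch, // 0001_xxxx (V1 only)
    Reserved,          // anything else, e.g. reserved for compact encoding
}

/// Classify a node from its first header byte only.
fn node_kind(header: u8) -> NodeKind {
    match header {
        0 => NodeKind::Empty,
        h if h >> 6 == 0b01 => NodeKind::Leaf,
        h if h >> 6 == 0b10 => NodeKind::BranchNoValue,
        h if h >> 6 == 0b11 => NodeKind::BranchWithValue,
        // From here on the top two bits are 00, which V0 never used
        // for a non-empty node.
        h if h >> 5 == 0b001 => NodeKind::HashedValueLeaf,
        h if h >> 4 == 0b0001 => NodeKind::HashedValueBranch,
        _ => NodeKind::Reserved,
    }
}
```

Because the V1-only prefixes do not overlap the V0 ones, a V0-encoded node is classified exactly as before, which is the sense in which V0 is a subset of V1.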

> My understanding is that the tip of the chain will have only v1 state trie runtimes,

After the state migration happens, the tip of the chain will only contain V1. But there will be a period where the state is V0 and new values are written in V1 (let's call it the hybrid state). This hybrid state (containing values of size >= 33 bytes both inline in some trie nodes and in value nodes at the same time) is not a good thing (it breaks warp sync, for instance), so the migration should be done quickly.

> and that we only need to support decoding proof nodes when syncing the chain together with the v0 state trie runtime.

A single codec is used for both V0 and V1 decoding. It may be doable to remove the V0 variant from the V1 codec, but I think it is easier to keep the decoding capability for both versions even after migrating, since a runtime is still allowed to call the root host function in V0 (but really shouldn't).

> So does that mean we can only implement the v1 state trie codec and have it run for both v0 and v1 encoded proof nodes? Or do we still need to support both versions?

Yes, that's it, no need to distinguish at the codec level.

Also, worth mentioning, there is still a subtlety for compact proof: a special header can indicate that the value is not included in the proof (https://github.com/paritytech/trie/blob/aa3168d6de01793e71ebd906d3a82ae4b363db59/trie-db/src/trie_codec.rs#L140).

1 reply

qdm12 Jun 13, 2022
Author

Thanks so much for the detailed answer!

I have a few more questions if you don't mind, and sorry in advance if they don't all make sense:

"Right the point is for all value with size >= 33 the value is its own node and we write its hash in the parent node": when you say the value is its own node, do you mean its own value hash? I'm not sure I fully understand what "own node" refers to.
"we write its hash in the parent node": do we write the subvalue hash in the parent node, or the hash of the encoded node? Does that mean the parent becomes a 'leaf/branch containing hashes' as its value?
I thought the only difference was regarding Merkle proof node encoding and decoding (generate and verify), but it also looks like it's affecting the overall state trie? Should v1 generate nodes containing hashed sub-value for values > 32 bytes in all cases?
Substrate is currently not in the hybrid state, when about would you see the transition start and finish, if you may know?
It seems Merkle root hash did change with the 9.20 runtime, why is this? Is substrate already encoding nodes with the v1 codec?

cheme · 2022-06-13T20:37:14Z

Collaborator

> Right the point is for all value with size >= 33 the value is its own node and we write its hash in the parent node. when you say the value is its own node, do you mean its own value hash? I'm not sure I fully understand what own node refers to.

In the leaf or branch, the 'value part' of the encoded node is the value hash. But from the point of view of our implementation (and of the proof), the value bytes can be seen as a single node (without a header or additional encoding), stored as a key-value pair (the key being the hash of the value bytes).

> Does that mean the parent becomes a 'leaf/branch containing hashes' as its value?

Yes 👍. And the hash points to the actual byte content of the value (which can be seen as a node, just without a header, as we already have all the needed info).
Note that not all implementations need to treat the value as a separate node (my initial implementation didn't and just attached it to the parent node), but in a proof the Vec<Vec<u8>> will contain both regular encoded nodes and value byte contents (this can be a bit problematic if an implementation parses the encoded nodes first; in our case we store everything in a hashmap and then load/parse from the root).
For CompactProof there are the changes I mentioned earlier.
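
The "store everything in a hashmap, then load from the root" approach can be sketched like this. It is a simplified illustration, not the trie crate's code: `index_proof` is a hypothetical helper, and the hash function is a parameter of the sketch.

```rust
use std::collections::HashMap;

type Hash = [u8; 32];

/// Index every proof entry by its hash, without trying to parse it first.
/// An entry may be an encoded trie node or the raw bytes of a value node;
/// at this stage the two are not distinguished.
fn index_proof(
    proof: Vec<Vec<u8>>,
    hash: impl Fn(&[u8]) -> Hash,
) -> HashMap<Hash, Vec<u8>> {
    proof
        .into_iter()
        .map(|entry| (hash(&entry[..]), entry))
        .collect()
}
```

Parsing then starts from the root hash: each lookup result is decoded as a trie node when it was reached through a child reference, and taken as plain value bytes when it was reached through a hashed value field, so a value entry is never fed to the node decoder.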

> like it's affecting the overall state trie?

Yes, hence the need for a migration. Warning: it also means that child trie roots will change after the migration (which can be an issue for some parachains).

But as long as the runtime has not switched to V1, no.

> Should v1 generate nodes containing hashed sub-value for values > 32 bytes in all cases?

Yes 👍, but using v1 is driven by the parameter passed to the root host function.

> Substrate is currently not in the hybrid state, when about would you see the transition start and finish, if you may know?

As soon as possible, but things have been ready for a while already (so I am not sure). The test networks will migrate first (the devops team is in the starting blocks, but we need to write and configure the new runtime; this task was postponed a few times due to other, more urgent runtime issues, but I have good hope we can proceed soon).

But the switch is a chain choice (for instance, our testnet for smart contracts is already using V1, as it was restarted recently).
So the ability to process both V0 and V1 needs to be available to the runtime, and it is the runtime that calls the right one (while syncing, the version is hardcoded in the wasm call; but in the transaction pool, due to the way we calculate the root in our implementation, we needed to read it from the current wasm runtime version (new field)).

> It seems Merkle root hash did change with the 9.20 runtime, why is this?

It should not have (except maybe on the smart contract testnet). I mean, I did not hear of any user-facing state root issue for those who did not upgrade to 9.20, and the trie-db version seems to be 0.23.1 in both 0.9.19 and 0.9.20.

The new format was added in trie-db 0.23.0; Polkadot switched to 0.23.0 in 0.9.16 and to 0.23.1 in 0.9.17.

What kind of root hash difference do you see (is it for a specific chain, a specific block)?

> Is substrate already encoding nodes with the v1 codec?

Yes, since Polkadot 0.9.16.

1 reply

qdm12 Jul 7, 2022
Author

Thank you for your previous answer, it helped quite a bit! 👍

We are aiming at jumping from pre-0.9.16 to 0.9.20, that's why we have root hash changes.

I am back with more questions if that's ok with you; I'm still exploring Substrate (and its various crates here and there) code but I thought I would ask to avoid any misunderstanding on my side!

My approach so far has been to find the new runtime call functions in the spec using the new state version field (usually _version_2). Then I refer to the older _version_1 functions and change every trie function to accept a version (v0 or v1) argument. So far, that means trie insertion, trie root hashing, proof generation, proof verification and child trie insertion are versioned. As I understood, the trie can have both v0 and v1 nodes, so the version applies to the function call and each trie node, instead of the full trie, correct?
I am planning on differentiating v0 and v1 nodes by adding a version byte field for each node, do you think that's ok?
I am planning on detecting when a subvalue is to be removed using a reference counter (in case 2 nodes have the same >32B subvalue), do you foresee any problem with this? Does Substrate do that as well? 🤔
I am planning on storing the hashed subvalue -> subvalue mapping in a separate database, instead of in the trie. Do you see any problem with this approach?
Regarding the ESCAPE_HEADER (aka 'reserved for compact encoding' aka 00010000):
```
/// Escape header byte sequence to indicate next node is a
/// branch or leaf with hash of value, followed by the value node.
const ESCAPE_HEADER: Option<u8> = None;
```
Is the 'value node' just the scale encoding of the node subvalue bytes? Also, why is this header needed if leaf and branch with hashed subvalue already have their own headers 001 and 0001 (or the opposite: why did we need those 2 new headers if we have this compact encoding escape header)?

Many thanks! 👍

cheme · 2022-07-07T20:01:20Z

Collaborator

> My approach so far has been to find the new runtime call functions in the spec using the new state version field (usually _version_2). Then I refer to the older _version_1 functions and change every trie function to accept a version (v0 or v1) argument. So far, that means trie insertion, trie root hashing, proof generation, proof verification and child trie insertion are versioned. As I understood, the trie can have both v0 and v1 nodes, so the version applies to the function call and each trie node, instead of the full trie, correct?

I am not sure I understand your description, but I will try to describe my implementation:

In the trie crate, the state version is global: any write operation uses a single version (all written nodes use the same version).
In substrate, only the call to the storage root host function uses the state version, because changes are only applied to the trie at that point (the trie is only modified when calling the storage root host function).

> I am planning on differentiating v0 and v1 nodes by adding a version byte field for each node, do you think that's ok?

I don't really understand why differentiating is needed at the trie node level (one only needs the trie version when writing changes to the trie, and the state version can be passed as a parameter). In the Rust trie crate implementation, nodes are not versioned, and the version is passed as a parameter of the codec when committing the changes (and then, depending on the version, we use a subvalue node or not).
There is one tricky thing though: writing the same value should update the node if the value size is > 32, the state version is 1 and the existing value is stored inline (produced with version 0).
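
The tricky case can be written down as a small predicate. This is a sketch with hypothetical names (`Stored`, `must_rewrite`), covering only the case described above; how a V0 write over an existing value node should behave is not stated in this discussion, so the sketch assumes a rewrite only happens there when the bytes change.

```rust
const VALUE_NODE_THRESHOLD: usize = 33;

#[derive(Clone, Copy, PartialEq)]
enum StateVersion {
    V0,
    V1,
}

/// How the existing value is currently attached to its node.
enum Stored {
    /// Bytes stored inline in the leaf or branch (always the case for V0 writes).
    Inline(Vec<u8>),
    /// Bytes stored in a separate value node, referenced by hash.
    ValueNode(Vec<u8>),
}

/// Should writing `new_value` modify the node?
fn must_rewrite(stored: &Stored, new_value: &[u8], version: StateVersion) -> bool {
    match stored {
        Stored::Inline(old) => {
            // A changed value always rewrites the node. An unchanged value
            // still rewrites it when V1 would move it to a value node.
            old.as_slice() != new_value
                || (version == StateVersion::V1
                    && new_value.len() >= VALUE_NODE_THRESHOLD)
        }
        // Assumption of this sketch: only a change of bytes matters here.
        Stored::ValueNode(old) => old.as_slice() != new_value,
    }
}
```

An implementation that skips writes when the bytes are unchanged would otherwise never upgrade such a node, which is why the version has to take part in the comparison.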

> I am planning on detecting when a subvalue is to be removed using a reference counter (in case 2 nodes have the same >32B subvalue), do you foresee any problem with this? Does Substrate do that as well? 🤔

Using a reference counter is fine; that is what we do when using paritydb.
When using rocksdb, we prefix the value hash with its key, so that we do not get hash collisions within the same block (we also do this when storing encoded nodes, to avoid node collisions).
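
A reference-counted value store along these lines could look as follows. It is an in-memory illustration with hypothetical names (`ValueStore` and its methods), not how paritydb is implemented.

```rust
use std::collections::HashMap;

type Hash = [u8; 32];

/// Value bytes keyed by their hash, with a count of the nodes referencing them.
#[derive(Default)]
struct ValueStore {
    entries: HashMap<Hash, (Vec<u8>, usize)>,
}

impl ValueStore {
    /// Register one more node referencing this value.
    fn insert(&mut self, hash: Hash, value: Vec<u8>) {
        self.entries.entry(hash).or_insert((value, 0)).1 += 1;
    }

    /// Drop one reference. Returns true when the bytes were actually deleted.
    fn remove(&mut self, hash: &Hash) -> bool {
        let remaining = match self.entries.get_mut(hash) {
            Some((_, count)) => {
                *count -= 1;
                *count
            }
            None => return false,
        };
        if remaining == 0 {
            self.entries.remove(hash);
            true
        } else {
            false
        }
    }

    fn get(&self, hash: &Hash) -> Option<&[u8]> {
        self.entries.get(hash).map(|(value, _)| value.as_slice())
    }
}
```

If two nodes share the same large value, the first removal only decrements the counter and the bytes stay available for the other node; the second removal deletes them.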

> I am planning on storing the hashed subvalue -> subvalue mapping in a separate database, instead of in the trie. Do you see any problem with this approach?

That is how things work in our case (we do the same for every encoded node of the trie).
Using a separate database can be an issue, though: the serialized trie (I don't remember how Gossamer stores the trie) must stay consistent with the values, so both need to share the same db transaction. Using two different columns of the same db should be fine.

> is the 'value node' just the scale encoding of the node subvalue bytes?

The value node is just its bytes. The proof is of type Vec<Vec<u8>>, each inner Vec<u8> being either a value node or an encoded trie node. The proof is then SCALE encoded, so yes, the subvalue will be SCALE encoded.

> why is this header needed if leaf and branch with hashed subvalue already have their own header

For a leaf (or branch) containing a hash pointing to a subvalue node, there are two cases:

The value is not added to the proof (e.g. we only query the hash of a value): the compact proof then writes the encoded node as is.
When reading, we see the header and read the hash.
The value is added to the proof: the value node is then replaced by an inline 0-length value, and the node gets encoded as a leaf with a 0-length value (so no 001 or 0001 header, just the V0 one).
When reading this, we would see a valid node with a 0-length inline value.
Adding ESCAPE_HEADER indicates that a value is attached next, and after reading it we replace the 0-length value by its hash.

We could have kept the 001 or 0001 header and just omitted the hash, but since the hash is encoded directly without a size and is not in terminal position (for branches), this is not doable.
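
The reading side of the second case can be sketched at a high level. Several things here are assumptions made for illustration: the escape byte is taken as a parameter rather than a fixed value, it is assumed to be the first byte of the entry holding the node, the value is assumed to be the next entry, and the node bytes are left undecoded. `ProofItem` and `read_compact_entries` are hypothetical names.

```rust
type Hash = [u8; 32];

#[derive(Debug, PartialEq)]
enum ProofItem {
    /// An encoded node used as is (value inline, or only its hash present).
    Node(Vec<u8>),
    /// A node that was encoded with a 0-length inline value, together with
    /// the value that followed it and the hash that replaces the empty value.
    NodeWithValue {
        node: Vec<u8>,
        value: Vec<u8>,
        value_hash: Hash,
    },
}

/// Walk the proof entries and pair each escaped node with the value after it.
/// Returns None if an escaped node is not followed by a value entry.
fn read_compact_entries(
    entries: Vec<Vec<u8>>,
    escape: u8,
    hash: impl Fn(&[u8]) -> Hash,
) -> Option<Vec<ProofItem>> {
    let mut items = Vec::new();
    let mut iter = entries.into_iter();
    while let Some(entry) = iter.next() {
        if entry.first() == Some(&escape) {
            // The next entry is the attached value, whatever its first byte is.
            let value = iter.next()?;
            let value_hash = hash(value.as_slice());
            items.push(ProofItem::NodeWithValue {
                node: entry[1..].to_vec(), // node bytes without the escape byte
                value,
                value_hash,
            });
        } else {
            items.push(ProofItem::Node(entry));
        }
    }
    Some(items)
}
```

A real decoder would then decode `node`, check that its inline value is empty, and substitute `value_hash` before hashing the node, as described above.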

2 replies

cheme Jul 7, 2022
Collaborator

Actually, I am not sure about Gossamer's trie update strategy (iirc at some point the whole trie was loaded in memory). Depending on how storage works today, it may make sense to attach the version to each trie node (e.g. if all nodes are written independently of the updated state, or written asynchronously).
In that case the version is only needed for nodes that contain a value with len > 32, but it may indeed be easier to put it everywhere.

qdm12 Jul 11, 2022
Author

Thanks for your prompt answers, again 🚀

Indeed, it looks like the trie can be versioned entirely, there is no need to version every trie call / trie node.
However, regarding the 'hybrid state', I'm a bit confused:

> But there will be a time where the state is V0 and new values are written in V1 (let's call it hybrid state). This hybrid state (containing values of size >= 33 bytes in some trie nodes and value nodes at the same time) is not a good thing (break warp sync for instance), and migrating should be done quickly.

How should we prepare code for this? Doesn't this mean we would have a state trie with both v0 and v1 nodes?

> the value is not added to the proof

Ah this makes sense, the value should already be in the receiver database, and it can look it up from the hash given.

> Adding ESCAPE_HEADER indicate there is a value attached next

Since a leaf node without a value is not permitted (AFAIK), why do we need this escape header? Couldn't we just assume it comes next? Also, what is the advantage of having the value bytes after the leaf node, instead of inlined, in that particular case?

> I am not sure of gossamer trie update strategie (iirc at some point there was the whole trie loaded in memory)

Regarding Gossamer, the whole trie is still in memory although I'm aiming at changing that, most likely right after the v1 state trie upgrade. But I don't think having a versioning per node would help much, I'm trying to keep it as simple as possible.

cheme · 2022-07-11T16:29:45Z

Collaborator

About the hybrid state: the way we handle it is that we choose the trie implementation only when calling storage root (that is the only time we do insert operations on the trie).

> Doesn't this mean we would have a state trie with both v0 and v1 nodes?

Yes, that is why both implementations (in our case they share a lot of code) need to be compatible (V1 can read V0 nodes, so it is safe to use it for any read-only operation).

> Ah this makes sense, the value should already be in the receiver database, and it can look it up from the hash given.

Yes, and sometimes a leaf is included in the proof (to read the partial key) without reading its value. Also, with this trie format, the get_hash host function does not access the value anymore (before, it was get_value then hash).

> Since a leaf node without a value is not permitted

A node without a value is not permitted, but a 0-length value can be.
The thing is that a leaf node without a value is not a valid encoding (this could be managed somehow if the branch value were in terminal position, using the fact that the node is encoded as a Vec so we know its size).
Another thing with the compact proof is that it is implemented over a generic codec, so it does not touch the trie node encoding (when it was implemented, the implementer did not have to touch a single line of codec code, and it can apply to any codec).

> Also what is the advantage to have the value bytes after the leaf node, instead of inlined in that particular case?

No added value; it could indeed have been done this way (gaining one byte). The value is a node in this case (there is a hashing cost, and it is a node in the non-compact proof). But yes, it would have saved a byte per value in the proof. Similarly, I am not 100% happy with the decision to put the header in front of nodes with the value present (nodes with the value missing may be less numerous). I do not really remember why (it could be that in this case we still get strictly smaller nodes, whereas in the other case some nodes would be bigger than originally).

> Regarding Gossamer, the whole trie is still in memory although I'm aiming at changing that, most likely right after the v1 state trie upgrade. But I don't think having a versioning per node would help much, I'm trying to keep it as simple as possible.

Yes, you are right; it could only help if there were parasitic writes of values (e.g. some V0 values being loaded and written back without any actual change).

5 replies

qdm12 Jul 11, 2022
Author

Ok thanks for all this! So about that hybrid state again 😅

Some runtime functions (such as ext_trie_blake2_256_root_version_2, ext_trie_blake2_256_ordered_root_version_2, ext_trie_blake2_256_verify_proof_version_2) each use a local trie, so the trie can be versioned entirely (or through an argument to the write functions).

Now other runtime functions such as ext_default_child_storage_root_version_2 and ext_storage_root_version_2 use the state trie, which might have v0 and/or v1 nodes, and so the trie cannot be entirely versioned. In this situation, if a node with a subvalue field of 32 bytes is encountered, how do we know if this is a hash of a subvalue (v1) or an inlined subvalue (v0)? Should we just look it up in the database, and, if not found, assume it's an inlined subvalue?

I think my 🧠 is finally getting there, thank you so much for the help so far.

cheme Jul 11, 2022
Collaborator

> if a node with a subvalue field of 32 bytes is encountered, how do we know if this is a hash of a subvalue (v1) or an inlined subvalue (v0)? Should we just look it up in the database, and, if not found, assume it's an inlined subvalue?

The headers of the nodes are different (if the hash of the subvalue is used, the new header variants are used). See substrate/primitives/trie/src/node_header.rs, line 67 in 8310936: `NodeHeader::HashedValueBranch(nibble_count) => encode_size_and_prefix(`

qdm12 Jul 12, 2022
Author

Ah that works because substrate lazy loads+decodes each trie node using its encoding hash digest to lookup its encoding from database.

In our current situation, Gossamer keeps all nodes in memory and they don't necessarily have their encoding with them. A workaround would be to keep the header byte in each node (or version v0/v1), but I might as well just store the node encoding digest <-> node encoding in the database as substrate does, to lower our memory usage which is too high anyway.

A few additional questions:

Should it be only writing the same key value to the trie which would upgrade a node in the trie? Would reading a node also upgrade it (asking since I read in "Trie version migration pallet #10073" that we can simulate it by reading all keys in storage once)?
Do you know in the end why we gradually migrate our state trie to then run a full migration, instead of running a full migration only?

As usual, thank you for the discussion 🙏

cheme Jul 12, 2022
Collaborator

Yes. Remember that in your case you may only really need to attach the version to the nodes containing a value (I mean as a quick solution that does not touch your model too much; I am not sure what your strategy should be here).

> Should it be only writing the same key value to the trie which would upgrade a node in the trie? Would reading a node also upgrade it (asking since I read we can simulate it by reading all keys in storage once. in Trie version migration pallet #10073)

Only writing. The trie migration pallet actually reads and writes every key-value pair once (the documentation may be incorrect).

> Do you know in the end why we gradually migrate our state trie to then run a full migration, instead of running a full migration only?

For the relay chain, doing the full migration between two blocks is possible (I first implemented that); I don't remember exactly how long it took, but it was below a minute for sure, maybe 20 seconds.
Still, that is not standard. Running it progressively is transparent (it is not something users need to be aware of, and there is no change of root between two blocks).
One could run it all in a single block, but since we have the multi-block implementation for parachains, it is not needed.

But the real reason is to migrate parachains: we are limited by the size of the proof for the PVF, so we cannot migrate in one block and get the whole state transition verified at once.

Maybe I misunderstood the question; to be clear, we do a full migration that runs over multiple blocks (we don't intend to leave the state in the hybrid state and wait for it to change over time, as the hybrid state breaks some functionality and staying in it is generally a bad idea).
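
To illustrate why a size limit leads to a multi-block migration, here is a toy batching loop. It is not the migration pallet's logic: `next_batch`, the cursor and the byte budget are hypothetical, and only value sizes are counted against the budget.

```rust
/// A (key, value) pair of the state, in key order.
type Entry = (Vec<u8>, Vec<u8>);

/// Pick the next run of entries to read and write back in one block, keeping
/// the total size of their values within `max_bytes`. Returns the batch and
/// the cursor to resume from in the next block.
fn next_batch(entries: &[Entry], cursor: usize, max_bytes: usize) -> (&[Entry], usize) {
    let cursor = cursor.min(entries.len());
    let mut used = 0;
    let mut end = cursor;
    while end < entries.len() {
        let size = entries[end].1.len();
        // Always take at least one entry so the migration makes progress.
        if end > cursor && used + size > max_bytes {
            break;
        }
        used += size;
        end += 1;
    }
    (&entries[cursor..end], end)
}
```

Each block rewrites one batch under V1 and stores the returned cursor; the migration is complete once the cursor reaches the end of the key space.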

qdm12 Jul 13, 2022
Author

Awesome, awesome 👍

> you may only really need to attach the version to the nodes containing a value in your case

Yes indeed, I'll write this down, a good optimization! Although for now to see clearer & clean up our (a bit messy historically) trie code, I recently changed all nodes to use the same model so that extra byte will have to apply to all nodes. We should eventually go back to a different model per node variant for optimization purposes later on.

Now, re-quoting myself:

> I might as well just store the node encoding digest <-> node encoding in the database as substrate does, to lower our memory usage which is too high anyway.

Do you have any thoughts on #11824 ? TLDR: I just want to store subvalue digest <-> > 32B subvalue for both v0 and v1 nodes, storing their variant bits + bit indicating if 32B subvalue is a hashed subvalue or inlined

cheme · 2022-07-11T16:36:30Z

Collaborator

Note that changing the compact trie format is doable, but it would require a client upgrade (it is compiled into wasm in the cumulus PoV, and I think it is not yet used in RPC), so there is still some work needed to deploy it.

dimartiro · 2023-08-10T23:53:10Z

@cheme I'm finishing Quentin's implementation of trie V1 for gossamer and I have a question.

Is there any case where we have to encode a trie built on V1 using the V0 encoding? I've been looking through all the substrate code, and I think you always assume that we cannot go back from V1 to V0. Is that correct?

What happens if we apply a runtime change that starts using V1 (for example in a fork), then discard that fork and have to re-apply a previous runtime that uses V0? What is the expected behavior in that scenario?

Thanks!
