ReshardingV3 memtrie
shreyan-gupta committed Nov 14, 2024
1 parent 5368857 commit fd5b42c
Showing 2 changed files with 38 additions and 6 deletions.
Binary file added neps/assets/nep-0568/NEP-SplitState.png
44 changes: 38 additions & 6 deletions neps/nep-0568.md
@@ -63,14 +63,45 @@ post-processing, as long as the chain's view reflects a fully resharded state.
stateless validation mechanisms.
* State Sync: Nodes must be able to sync the states of the child
shards post-resharding.
* Cross-Shard Traffic: Receipts sent to the parent shard may need to be
reassigned to one of the child shards.
* Cross-Shard Traffic: Receipts and buffered receipts sent to the parent shard may
need to be reassigned to one of the child shards.
* Receipt Handling: Delayed, postponed, buffered, and promise-yield receipts
must be correctly distributed between the child shards.
* ShardId Semantics: The shard identifiers will become abstract identifiers
where today they are numbers in the 0..num_shards range.
* Congestion Info: CongestionInfo in the chunk header would be recalculated for the child
shards at the resharding boundary. Proof must be compatible with Stateless Validation.

### State Storage - Mem Trie
### State Storage - MemTrie

MemTrie is the in-memory representation of the trie that the runtime uses for all trie accesses. This is kept in sync with the Trie representation in state.

For the purposes of resharding, we need an efficient way to split the MemTrie into two child tries based on the boundary account. This splitting happens at the epoch boundary when the new epoch is expected to have the two child shards. The requirements for MemTrie splitting are:
* MemTrie splitting needs to be "instant", i.e. it must complete within the span of one block. The child tries need to be available for processing the next block in the new epoch.
* MemTrie splitting needs to be compatible with stateless validation, i.e. we need to generate a proof that the MemTrie split proposed by the chunk producer is correct.
* The proof generated for splitting the MemTrie needs to fit within the size limits of the state witness that we send to all chunk validators. This rules out approaches such as iterating through all trie keys for delayed receipts.

With the ReshardingV3 design, there is no protocol change to the structure of MemTries. However, implementation constraints required us to introduce the concept of a Frozen MemTrie. More details are in the [implementation](#state-storage---memtrie-1) section below.

Based on the requirements above, we came up with an algorithm to efficiently split the parent trie into two child tries. Trie entries can be divided into three categories based on whether the trie keys have an account_id prefix and on the total number of such trie keys. The splitting of each category is handled differently.
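The three categories described in the following subsections can be summarized in a small classification sketch. The `TrieKey` enum here is a simplified stand-in that only lists the variant names mentioned in this section, without their payloads, and `SplitCategory` and `split_category` are hypothetical names used for illustration only.

```rust
/// Simplified stand-in for the trie key kinds named in this section.
/// Assumption: payloads (account ids, indices, ...) are omitted here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TrieKey {
    Account,
    ContractCode,
    PostponedReceipt,
    DelayedReceiptIndices,
    PromiseYieldIndices,
    BufferedReceiptIndices,
    DelayedReceipt,
    PromiseYieldTimeout,
    BufferedReceipt,
}

/// Hypothetical label for how a key kind is treated when splitting.
#[derive(Debug, PartialEq, Eq)]
enum SplitCategory {
    /// Key starts with an account_id: split along the boundary account.
    AccountIdPrefix,
    /// A single entry (or O(num_shard) entries): read and rewrite directly.
    Singleton,
    /// Potentially unbounded number of entries: cannot be iterated.
    Indexed,
}

fn split_category(key: TrieKey) -> SplitCategory {
    match key {
        TrieKey::Account | TrieKey::ContractCode | TrieKey::PostponedReceipt => {
            SplitCategory::AccountIdPrefix
        }
        TrieKey::DelayedReceiptIndices
        | TrieKey::PromiseYieldIndices
        | TrieKey::BufferedReceiptIndices => SplitCategory::Singleton,
        TrieKey::DelayedReceipt
        | TrieKey::PromiseYieldTimeout
        | TrieKey::BufferedReceipt => SplitCategory::Indexed,
    }
}
```

Only the variants named in this section are covered; the section states that most other trie keys fall into the account_id prefix category.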

#### TrieKey with AccountID prefix

This category includes most of the trie keys, such as `TrieKey::Account`, `TrieKey::ContractCode` and `TrieKey::PostponedReceipt`. For these keys, we can efficiently split the trie based on the boundary account trie key. In the example below, "pass" is the split key. Note that we only need to read the intermediate nodes that lie on the path of the split key and nothing more. The accessed nodes form a part of the state witness, which limits the size of the witness to effectively O(depth) of the trie.

![Splitting Trie diagram](assets/nep-0568/NEP-SplitState.png)
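The idea of touching only the nodes along the split key can be illustrated with a toy trie. This is a minimal sketch under simplifying assumptions, not the MemTrie implementation: it uses one byte per edge, owns its subtrees, and has no hashes, whereas the real trie has a different node layout. `Node`, `split` and the `visited` counter are hypothetical names. Subtrees that are entirely on one side of the boundary are moved as a whole without being traversed.

```rust
use std::collections::BTreeMap;

/// Toy trie node: one byte per edge, optional value at the node.
#[derive(Debug, Default)]
struct Node {
    value: Option<u64>,
    children: BTreeMap<u8, Node>,
}

impl Node {
    fn insert(&mut self, key: &[u8], value: u64) {
        match key.split_first() {
            None => self.value = Some(value),
            Some((byte, rest)) => self.children.entry(*byte).or_default().insert(rest, value),
        }
    }

    fn get(&self, key: &[u8]) -> Option<u64> {
        match key.split_first() {
            None => self.value,
            Some((byte, rest)) => self.children.get(byte)?.get(rest),
        }
    }

    fn is_empty(&self) -> bool {
        self.value.is_none() && self.children.is_empty()
    }
}

/// Splits `node` into (keys < boundary, keys >= boundary).
/// `visited` counts the nodes that were actually opened, i.e. the
/// nodes on the boundary path; all other subtrees are moved untouched.
fn split(node: Node, boundary: &[u8], visited: &mut usize) -> (Node, Node) {
    *visited += 1;
    let Some((byte, rest)) = boundary.split_first() else {
        // Boundary exhausted: every key in this subtree is >= boundary.
        return (Node::default(), node);
    };
    let Node { value, mut children } = node;
    // Children with an edge byte >= `byte` leave `children`.
    let mut right_children = children.split_off(byte);
    // The child on the boundary path needs a recursive split.
    let on_path = right_children.remove(byte);
    // This node's own key is a strict prefix of the boundary, so it sorts left.
    let mut left = Node { value, children };
    let mut right = Node { value: None, children: right_children };
    if let Some(child) = on_path {
        let (l, r) = split(child, rest, visited);
        if !l.is_empty() {
            left.children.insert(*byte, l);
        }
        if !r.is_empty() {
            right.children.insert(*byte, r);
        }
    }
    (left, right)
}
```

With the split key "pass", at most `boundary.len() + 1` nodes are opened regardless of how many keys the trie holds, which mirrors the O(depth) witness bound described above.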

#### Singleton TrieKey

This category includes the trie keys `TrieKey::DelayedReceiptIndices`, `TrieKey::PromiseYieldIndices` and `TrieKey::BufferedReceiptIndices`. Notably, each of these is just a single entry (or O(num_shard) entries) in the trie and hence is small enough to read and modify for the child tries efficiently.

#### Indexed TrieKey

This category includes the trie keys `TrieKey::DelayedReceipt`, `TrieKey::PromiseYieldTimeout` and `TrieKey::BufferedReceipt`. The number of entries for these keys can be arbitrarily large, so it is not feasible to iterate through all of them. In the pre-stateless-validation world, where we did not care about state witness size limits, ReshardingV2 could simply iterate over all delayed receipts and split them into the respective child shards.

For ReshardingV3, these are handled by one of two strategies:
- `TrieKey::DelayedReceipt` and `TrieKey::PromiseYieldTimeout` are handled by duplicating entries across both child shards, as each entry could belong to either of the child shards. More details in the [Delayed Receipts](#delayed-receipt-handling) and [Promise Yield](#promiseyield-receipt-handling) sections below.
- `TrieKey::BufferedReceipt` entries are independent of the account_id and therefore can be sent to either of the child shards, but not both. We copy the buffered receipts and the associated metadata to the child shard with the lower index. More details in the [Buffered Receipts](#buffered-receipt-handling) section below.
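The two strategies above can be sketched as follows. This is an illustrative model only: `IndexedEntries` and `split_indexed_entries` are hypothetical names, entries are represented as plain strings, and the vectors are cloned here purely for clarity. A real split cannot iterate or copy entries one by one, as noted above. How each child later deals with duplicated entries that do not belong to it is left to the receipt handling sections and is not modeled.

```rust
/// Hypothetical container for the indexed entries of one shard.
#[derive(Debug, Default, Clone, PartialEq, Eq)]
struct IndexedEntries {
    delayed_receipts: Vec<String>,
    promise_yield_timeouts: Vec<String>,
    buffered_receipts: Vec<String>,
}

/// Returns (child with the lower index, child with the higher index).
fn split_indexed_entries(parent: IndexedEntries) -> (IndexedEntries, IndexedEntries) {
    let lower = IndexedEntries {
        // Strategy 1: duplicated into both children.
        delayed_receipts: parent.delayed_receipts.clone(),
        promise_yield_timeouts: parent.promise_yield_timeouts.clone(),
        // Strategy 2: buffered receipts go to the lower-index child only.
        buffered_receipts: parent.buffered_receipts,
    };
    let higher = IndexedEntries {
        delayed_receipts: parent.delayed_receipts,
        promise_yield_timeouts: parent.promise_yield_timeouts,
        buffered_receipts: Vec::new(),
    };
    (lower, higher)
}
```

After the split, both children see the same delayed and promise-yield entries, while exactly one child holds the buffered receipts, so no buffered receipt can be forwarded twice.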

### State Storage - Flat State

@@ -137,9 +168,9 @@ supporting smooth transitions without altering storage structures directly.

### Cross Shard Traffic

### Receipt Handling - Delayed, Postponed, PromiseYield

### Receipt Handling - Buffered
### Delayed Receipt Handling
### PromiseYield Receipt Handling
### Buffered Receipt Handling

### ShardId Semantics

@@ -160,6 +191,7 @@ In this NEP, we propose updating the ShardId semantics to allow for arbitrary id
The section should return to the examples given in the previous section, and explain more fully how the detailed proposal makes those examples work.]
```
### State Storage - MemTrie

### State Storage - State mapping
