From bfe52dbbcacf7be46d246f0f7c827e20d219c5e5 Mon Sep 17 00:00:00 2001
From: walnut-the-cat <122475853+walnut-the-cat@users.noreply.github.com>
Date: Tue, 14 Nov 2023 11:19:25 -0800
Subject: [PATCH] Update nep-0508.md

minor changes
---
 neps/nep-0508.md | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/neps/nep-0508.md b/neps/nep-0508.md
index 9706f60d6..25266ade1 100644
--- a/neps/nep-0508.md
+++ b/neps/nep-0508.md
@@ -6,8 +6,8 @@ Status: Draft
 DiscussionsTo: https://github.com/near/nearcore/issues/8992
 Type: Protocol
 Version: 1.0.0
-Created: 2022-09-19
-LastUpdated: 2023-09-19
+Created: 2023-09-19
+LastUpdated: 2023-11-14
 ---
 
 ## Summary
@@ -16,7 +16,7 @@ This proposal introduces a new implementation for resharding and a new shard lay
 
 In essence, this NEP is an extension of [NEP-40](https://github.com/near/NEPs/blob/master/specs/Proposals/0040-split-states.md), which was focused on splitting one shard into multiple shards.
 
-We are introducing resharding v2, which supports one shard splitting into two within one epoch at a pre-determined split boundary. The NEP includes performance improvement to make resharding feasible under the current state as well as actual resharding in mainnet and testnet (To be specific, spliting shard 3 into two).
+We are introducing resharding v2, which supports one shard splitting into two within one epoch at a pre-determined split boundary. The NEP includes performance improvement to make resharding feasible under the current state as well as actual resharding in mainnet and testnet (To be specific, spliting the largest shard into two).
 
 While the new approach addresses critical limitations left unsolved in NEP-40 and is expected to remain valid for foreseable future, it does not serve all usecases, such as dynamic resharding.
 
@@ -36,13 +36,11 @@ Currently, NEAR protocol has four shards. With more partners onboarding, we star
 
 ### High level requirements
 
-* Resharding should work even when validators stop tracking all shards.
-* Resharding should work after stateless validation is enabled.
-* Resharding should be fast enough so that both state sync and resharding can happen within one epoch.
+* Resharding must be fast enough so that both state sync and resharding can happen within one epoch.
 * Resharding should work efficiently within the limits of the current hardware requirements for nodes.
 * Potential failures in resharding may require intervention from node operator to recover.
-* No transaction or receipt should be lost during resharding.
-* Resharding should work regardless of number of existing shards.
+* No transaction or receipt must be lost during resharding.
+* Resharding must work regardless of number of existing shards.
 * No apps, tools or code should hardcode the number of shards to 4.
 
 ### Out of scope
@@ -84,7 +82,7 @@ The implementation heavily re-uses the implementation from [NEP-40](https://gith
 
 ### Flat Storage
 
-The old implementaion of resharding relied on iterating over the full trie state of the parent shard in order to build the state for the children shards. This implementation was suitable at the time but since then the state has grown considerably and this implementation is now too slow to fit within a single epoch. The new implementation relies on iterating through the flat storage in order to build the children shards quicker. Based on benchmarks, splitting shard 3 by using flat storage can take around 15 min without throttling and around 3 hours with throttling to maintain block production rate.
+The old implementaion of resharding relied on iterating over the full trie state of the parent shard in order to build the state for the children shards. This implementation was suitable at the time but since then the state has grown considerably and this implementation is now too slow to fit within a single epoch. The new implementation relies on iterating through the flat storage in order to build the children shards quicker. Based on benchmarks, splitting the largest shard by using flat storage can take around 15 min without throttling and around 3 hours with throttling to maintain block production rate.
 
 The new implementation will also propagate the flat storage for the children shards and keep it up to the with the chain until the switch to the new shard layout in the next epoch. The old implementation didn't handle this case because the flat storage didn't exist back then.
 
@@ -148,6 +146,13 @@ The Stateless Validation requires that chunk producers provide proof of correctn
 
 In this NEP we propose that resharding should be rolled out first, before stateless validation. We can then safely roll out the resharding logic and solve the above mentioned issues separately.
 
+## Future fast-followups
+### Resharding should work even when validators stop tracking all shards.
+As mentioned above under 'Integration with State Sync' section, initial release of resharding v2 will happen before the full implementation of state sync and we plan to tackle the integration between resharding and state sync after the next shard split (Won't need a separate NEP as the integration does not require protocol change.)
+### Resharding should work after stateless validation is enabled.
+As mentioned above under 'Integration with Statelss Validation' section, initial release of resharding v2 will happen before the full implementation of stateless validation and we plan to tackle the integration between resharding and stateless validation after the next shard split (May need a separate NEP depending on implemetnation detail.)
+
 ## Future possibilities
 
 ### Dynamic resharding