state sync and stateless validation
wacban authored Oct 31, 2023
1 parent 3f850d2 commit b621aab
Showing 1 changed file with 18 additions and 8 deletions.
26 changes: 18 additions & 8 deletions neps/nep-0508.md
@@ -42,7 +42,7 @@ Currently, NEAR protocol has four shards. With more partners onboarding, we star
 * ~~Resharding should not require additional hardware from nodes.~~
 * This needs to be assessed during test
 * Resharding should be fault tolerant
-* Chain must not stall in case of resharding failure. TODO - this seems impossible under current assumptions because the shard layout for an epoch is committed to the chain before resharding is fininished
+* Chain must not stall in case of resharding failure. TODO - this seems impossible under current assumptions because the shard layout for an epoch is committed to the chain before resharding is finished
 * A validator should be able to recover in case they go offline during resharding.
 * For now, our aim is at least allowing a validator to join back after resharding is finished.
 * No transaction or receipt should be lost during resharding.
@@ -103,15 +103,19 @@ When resharding, extra care should be taken when handling receipts in order to e
 
 ### New shard layout
 
-A new shard layout will be determined and will be scheduled and executed in the production networks. The new shard layout will maintain the same boundaries for shards 0, 1 and 2. The heaviest shard today - Shard 3 will be split by introducing a new boundary account. The new boundary account will be determined by analysis the storage and gas usage within the shard and selecting a point that will divide the shard roughly in half in accordance to the mentioned metrics. Other metrics can also be used.
+A new shard layout will be determined, scheduled and executed in the production networks. The new shard layout will maintain the same boundaries for shards 0, 1 and 2. The heaviest shard today - Shard 3 - will be split by introducing a new boundary account. The new boundary account will be determined by analysing the storage and gas usage within the shard and selecting a point that will divide the shard roughly in half in accordance with the mentioned metrics. Other metrics can also be used.
 
 ### Fixed shards
 
-Fixed shards is a feature of the protocol that allows for assigning specific accounts and all of their recursive sub accounts to a predetermined shard. This feature is only used for testing, it was never used in production and there is no need for it in production. This feature unfortunately breaks the contiguity of shards. A sub account of a fixed shard account can fall in the middle of account range that belongs to a different shard. This property of fixed shards makes it particularly hard to reason about and implement efficient resharding. In order to simplify the code and new resharding implementation the fixed shards feature was removed ahead of this NEP.
+Fixed shards is a feature of the protocol that allows for assigning specific accounts and all of their recursive sub accounts to a predetermined shard. This feature is only used for testing; it was never used in production and there is no need for it in production. This feature unfortunately breaks the contiguity of shards. A sub account of a fixed shard account can fall in the middle of an account range that belongs to a different shard. This property of fixed shards makes resharding particularly hard to reason about and to implement efficiently.
+
+The fixed shards feature was removed ahead of this NEP.
 
 ### Transaction pool
 
-The transaction pool is sharded e.i. it groups transactions by the shard where each should be converted to a receipt. The transaction pool was previously sharded by the ShardId. Unfortunately ShardId is insufficient to correctly identify a shard across a resharding event as ShardIds change domain. The transaction pool was migrated to group transactions by ShardUId instead and a transaction pool resharding was implemented to reassign transaction from parent shard to children shards right before the new shard layout takes effect. This was implemented ahead of this NEP.
+The transaction pool is sharded, i.e. it groups transactions by the shard where each should be converted to a receipt. The transaction pool was previously sharded by the ShardId. Unfortunately, ShardId is insufficient to correctly identify a shard across a resharding event, because the same ShardId can refer to different shards in different shard layouts. The transaction pool was migrated to group transactions by ShardUId instead, and a transaction pool resharding was implemented to reassign transactions from the parent shard to the children shards right before the new shard layout takes effect.
+
+This was implemented ahead of this NEP.
 
 ## Security Implications
 
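The mechanics described in this hunk can be sketched together in Python: boundary accounts defining contiguous shard ranges, a new boundary chosen to split Shard 3 roughly in half, and the transaction pool re-keyed by ShardUId. This is a minimal sketch under stated assumptions, not the nearcore implementation. `ShardLayout`, `ShardUId` (as modelled here), `pick_boundary` and `reshard_pool` are hypothetical names. Account ids are compared as plain strings, and a boundary account is assumed to belong to the shard on its right. A single integer weight stands in for the storage and gas metrics, and a pooled transaction is modelled only by its signer account.

```python
from __future__ import annotations

from bisect import bisect_right, insort
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ShardUId:
    """Hypothetical shard identifier: a ShardId qualified by layout version."""

    version: int
    shard_id: int


@dataclass
class ShardLayout:
    """Hypothetical layout: sorted boundary accounts cut the account space
    into contiguous ranges; shard i owns [boundaries[i-1], boundaries[i])."""

    version: int
    boundaries: list[str]

    def account_to_shard(self, account: str) -> ShardUId:
        # The number of boundaries <= account is the owning shard's index,
        # so a boundary account belongs to the shard on its right.
        return ShardUId(self.version, bisect_right(self.boundaries, account))

    def split(self, new_boundary: str) -> ShardLayout:
        # Keep every existing boundary and add one more: only the shard whose
        # range contains new_boundary is divided, and the version is bumped.
        boundaries = list(self.boundaries)
        insort(boundaries, new_boundary)
        return ShardLayout(self.version + 1, boundaries)


def pick_boundary(accounts: list[tuple[str, int]]) -> str | None:
    """Choose the boundary that splits sorted (account, weight) pairs into
    two contiguous parts whose total weights are as close as possible."""
    total = sum(weight for _, weight in accounts)
    left = 0  # total weight of the accounts before the candidate boundary
    best: tuple[int, str] | None = None
    for i, (account, weight) in enumerate(accounts):
        if i > 0:  # both children must be non-empty
            imbalance = abs(left - (total - left))
            if best is None or imbalance < best[0]:
                best = (imbalance, account)
        left += weight
    return None if best is None else best[1]


def reshard_pool(
    pool: dict[ShardUId, list[str]], new_layout: ShardLayout
) -> dict[ShardUId, list[str]]:
    """Re-key pooled transactions (modelled as signer accounts) by the
    ShardUId that owns each signer under the new layout."""
    resharded: dict[ShardUId, list[str]] = defaultdict(list)
    for signers in pool.values():
        for signer in signers:
            resharded[new_layout.account_to_shard(signer)].append(signer)
    return dict(resharded)


# Usage: four shards; split shard 3 by weight (hypothetical numbers).
layout_v1 = ShardLayout(version=1, boundaries=["b", "f", "m"])
shard3 = [("m.near", 5), ("p.near", 3), ("t.near", 4), ("x.near", 4)]
layout_v2 = layout_v1.split(pick_boundary(shard3))
```

Every pooled transaction lands in exactly one new shard, so none are dropped. Minimizing the imbalance of one combined weight is just one policy; the NEP notes that other metrics can also be used.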
@@ -131,11 +135,17 @@ The transaction pool is sharded e.i. it groups transactions by the shard where e
 
 ## Integration with State Sync
 
-TBD
+There are two known issues in the integration of resharding and state sync:
+* When syncing the state for the first epoch in which the new shard layout is used, the node would need to apply the last block of the previous epoch. This cannot be done on the children shards, because on chain that block was applied to the parent shard and the trie-related gas costs would be different.
+* When generating proofs for incoming receipts, the proof for each of the children shards contains only the receipts of that shard, but it is generated on the parent shard layout and so may not be verifiable.
+
+In this NEP we propose that resharding should be rolled out first, before any real dependency on state sync is added. We can then safely roll out the resharding logic and solve the above-mentioned issues separately.
 
 ## Integration with Stateless Validation
 
-TBD
+Stateless validation requires chunk producers to provide proofs that the state transition from one state root to the next is correct. For the first block after the new shard layout takes effect, that proof will need to show that the entire state split was correct, as well as the state transition itself.
+
+In this NEP we propose that resharding should be rolled out first, before stateless validation. We can then safely roll out the resharding logic and solve the above-mentioned issues separately.
 
 ## Future possibilities
 

GitHub Actions / markdown-lint: check failure on line 139 of neps/nep-0508.md: MD032/blanks-around-lists Lists should be surrounded by blank lines [Context: "* When syncing the state for t..."]
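The stateless-validation paragraph says that the first proof after resharding must show that the entire state split was correct. The invariant such a proof would have to establish can be sketched as follows. This is an illustration only. It assumes state is a flat map keyed by account id and that the boundary account goes to the right-hand child. The real state is stored in a trie, and a real proof would cover trie nodes, which this sketch does not model. `split_state` and `is_valid_split` are hypothetical names.

```python
from __future__ import annotations


def split_state(
    parent: dict[str, bytes], boundary: str
) -> tuple[dict[str, bytes], dict[str, bytes]]:
    """Split a parent shard's state at `boundary` (the boundary goes right)."""
    left = {key: value for key, value in parent.items() if key < boundary}
    right = {key: value for key, value in parent.items() if key >= boundary}
    return left, right


def is_valid_split(
    parent: dict[str, bytes],
    left: dict[str, bytes],
    right: dict[str, bytes],
    boundary: str,
) -> bool:
    """Check the invariant a proof of a correct split would need to establish."""
    if any(key >= boundary for key in left) or any(key < boundary for key in right):
        return False  # an entry landed on the wrong side of the boundary
    if left.keys() & right.keys():
        return False  # an entry was duplicated into both children
    # Together, the children must hold exactly the parent's entries.
    return {**left, **right} == parent


# Usage: split a hypothetical four-account parent shard at "t.near".
parent = {"m.near": b"1", "p.near": b"2", "t.near": b"3", "x.near": b"4"}
left, right = split_state(parent, "t.near")
```

A split that loses an entry, duplicates one, or places one on the wrong side of the boundary fails this check.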
@@ -163,12 +173,12 @@ Other useful features that can be considered as a follow up:
 
 * Number of shards is expected to increase.
 * Underlying trie structure and data structure are not going to change.
-* Resharding will create dependency on flat storage and state sync.
+* Resharding will create a dependency on flat storage, flat state snapshots and state sync. TODO - what dependency on state sync?
 
 ### Negative
 
 * The resharding process is still not fully automated. Analyzing shard data, determining the split boundary, and triggering an actual shard split all need to be manually curated by a person.
-* During resharding, a node is expected to do more work as it will first need to copy a lot of data around the then will have to apply changes twice (for the current shard and future shard).
+* During resharding, a node is expected to do more work, as it will first need to copy a lot of data around and then will have to apply changes twice (for the current shard and the future shard).
 * Increased potential for apps and tools to break without proper shard layout change handling.
 
 ### Backwards Compatibility
