-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[dip-213] decoupled execution #214
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,96 @@ | ||
--- | ||
dip: 213 | ||
title: Decoupled execution | ||
authors: Zekun Li (@zekun000), Yu Xia (@yuxiamit) | ||
status: Draft | ||
type: Standard | ||
created: 08/09/2021 | ||
last updated: 09/09/2021 | ||
issue: https://github.com/diem/dip/issues/213 | ||
--- | ||
|
||
# Summary | ||
|
||
The current [DiemBFTv4](https://developers.diem.com/papers/diem-consensus-state-machine-replication-in-the-diem-blockchain/2021-08-17.pdf) consensus core engine forms agreement on transaction ordering jointly with execution results. | ||
That is, a transaction (block) is **finalized** in the history of transactions at some position 'x' at the same time that validators vote on a ledger state `LedgerInfo` resulting from executing the history-prefix ending with the transaction (block) at position 'x'. The upside of this approach is simplicity. | ||
The down-side is that the coupling execution with ordering stalls each other and limits throughput. | ||
This DIP proposes to separate them to unlock better throughput without compromising security. | ||
|
||
``` | ||
Current paradigm: Order → Execute → Commit → Order → Execute → Commit | ||
|
||
Proposed paradigm: Order → Order → Order | ||
↓ ↓ | ||
Execute → Execute | ||
↓ ↓ | ||
Commit → Commit | ||
``` | ||
|
||
If the time of ordering, execution and commit for a block is O, E, C correspondingly, the throughput should change from 1/(O + E + C) to 1/max(O, E, C) which not only unlocks throughput benefits immediately, | ||
but makes bottlenecks more visible and could guide future optimizations. | ||
|
||
# Description | ||
|
||
In [DiemBFTv4](https://developers.diem.com/papers/diem-consensus-state-machine-replication-in-the-diem-blockchain/2021-08-17.pdf), a block (proposal) goes through different steps before it's finalized. The steps are | ||
``` | ||
Consensus stage | ||
1. Proposed | ||
2. Executed | ||
3. Voted (including execution result) | ||
4. QuorumCertified | ||
5. 2-chain Certified (Finalized) | ||
``` | ||
|
||
The proposed change is to decouple execution from consensus and pipeline the process with different stages. A block would go through | ||
4 stages each with its own steps: | ||
``` | ||
1. Consensus stage | ||
* Proposed | ||
* Voted (no execution result) | ||
* QuorumCertified | ||
* 2-chain Certified (Ordered) | ||
2. Execution stage | ||
3. Voting stage (for execution result) | ||
4. Commit stage (Finalized) | ||
``` | ||
The stages run in parallel, utilizing available compute resources to increase overall throughput. | ||
That is, the system could commit B1, sign B2, execute B3, and order B4, all at the same time. Note, while throughput increases due to parallel execution, latency of an individual transaction (block) finalization does not decrease, as the stages are inherently sequential. In fact, individual transaction latency may slightly increase since commit occurs less "eagerly". | ||
|
||
# Required Changes | ||
|
||
## Consensus | ||
`StateComputer` is the trait that consensus uses for `compute`, `commit` or `sync_to` blocks. | ||
An `ordering_state_computer` is implemented to bypass execution and send blocks to next stage when ordered. | ||
|
||
## BlockStateManager | ||
A `BlockStateManager` is implemented to manage different stages of ordered blocks, it's a queue of blocks with stage markers. | ||
- `Ordered` block is sent to execution and advanced to `Executed` after receiving execution result. | ||
- `Executed` block is sent to safety rules and advanced to `Signed` after receiving a signature. | ||
- Signature on `Signed` block is broadcasted to all validators. At all stages except `Aggregated`, signatures from other validators on the block are collected. | ||
- A block in either `Executed` or `Signed` stage is advanced to `Aggregated` after receiving a quorum of signatures. | ||
- `Aggregated` block is popped from the queue and sent to persistent storage. | ||
|
||
## Sync | ||
When the node receives the SyncInfo carried in Proposal/Vote messages, it checks the highest round numbers compared to local round and may decide to fast-forward via state sync protocol. It's triggered if the gap between | ||
the local highest round of ordered block and the remote highest round of a committed block becomes large (current threshold is 10). Upon state sync, | ||
the node fast-forwards to the same state (both ledger state and pending blocks) as the remote peer. | ||
|
||
## Backpressure | ||
To prevent consensus-ordering going too fast and creating an ever-growing backlog for other stages, back pressure is implemented to stop ordering from making progress if the gap between the highest committed block and highest ordered block is large. | ||
|
||
## Reconfiguration | ||
zekun000 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
Recall that a reconfiguration transaction must be the last transaction of an epoch. Before this proposal, consensus-ordering is aware of | ||
reconfiguration and leader proposes empty suffix. After the proposal, execution on each node recognizes this type of transaction and BlockStateManager stops processing any blocks succeeding the reconfiguration block. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. does this mean that the consensus stage continues moving forward, oblivious of the reconfiguration transaction, until either execution catches up with the reconfig tx or back-pressure prevents progress? if yes, perhaps add this clarification to the text? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. it moves forward until it hits backpressure and then blocks until next epoch |
||
Ordering stage contineus moving forward until backpressure is triggered. Ordered blocks after the reconfiguration block are discarded and transactions inside those blocks would be retried in next epoch. | ||
Imagine a few block batches certified by ledger info, [1, 2] [r, 3], [4], [5, 6], r represents a block that contains reconfiguration transaction. The ledger info is aggregated on block 3 (transactions in 3 are skipped) and 4, 5, 6 are discarded. | ||
After the reconfiguration block gets committed, the old epoch instance is dropped, a new epoch is instantiated via a virtual genesis block derived from the last ledger info in previous epoch. | ||
|
||
## Upgrade | ||
zekun000 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
To enable the feature in an existing blockchain requires coordination via reconfiguration, on-chain consensus config would be updated to a newer version supporting the upgrade protocol. | ||
|
||
## Client | ||
The finality proof is aggregated in the same format as before (LedgerInfoWithSignatures), so this change is client agnostic. | ||
|
||
# Future opportunities | ||
The current suggested implementation implements execution signature aggregation via a simple broadcast, a leader based mechanism could be implemented to reduce the network cost | ||
by compromising one-hop latency. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest "certification of correct execution", as 2f+1 have agreed that this is the correct execution output