docs: fix typo (celestiaorg#4180)

## Overview

Fix a typo in `docs/architecture/adr-018-network-upgrades.md` (`re-exectuing` -> `re-executing`).


Signed-off-by: yangquanshi <[email protected]>
yangquanshi authored Jan 6, 2025
1 parent d58af4a commit e4c9f2d
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion docs/architecture/adr-018-network-upgrades.md
@@ -51,7 +51,7 @@ The height of the v1 -> v2 upgrade will initially be supplied via CLI flag (i.e.
- Given the uncertainty in scheduling, the system must be able to handle changes to the upgrade height that most commonly would come in the form of delays. Embedding the upgrade schedule in the binary is convenient for node operators and avoids the possibility for user errors. However, binaries are static. If the community wished to push back the upgrade by two weeks there is the possibility that some nodes would not rerun the new binary thus we'd get a split between nodes running the old schedule and nodes running the new schedule. To overcome this, proposers will only propose a version change in the first round of each height, thus allowing transactions to still be committed even under circumstances where there is no consensus on upgrading. Secondly, we define a range in which nodes will attempt to upgrade the app version and failing this will continue to run the current version. Lastly, the binary will have the ability to manually specify the app version height mapping and override the built-in values either through a flag or in the `app.toml` config. This is expected to be used in testing and in emergency situations only. Another example to keep in mind is if a quorum outright rejects an upgrade. If some of the validators are for the change they should have some way to continue participating in the network. Therefore we employ a range that nodes will attempt to upgrade and afterwards will continue on normally with the new binary however running the older version.
- The system needs to be tolerant of unexpected faults in the upgrade process. This can be:
- The community/contributors realize there is a bug in the new version after the binary has been released. Node operators will need to downgrade back to the previous version and restart their node.
- - There is a halting bug in the migration or in processing of the first transactions. This most likely would be in the form of an apphash mismatch. This becomes more problematic with delayed execution as the block (with v2 transactions) has already been committed. Immediate execution has the advantage of the apphash mismatch being realised before the data is committed. It's still however feasible to over come this but it involves nodes rolling back the previous state and re-exectuing the transactions using the v1 state machine (which will skip over the v2 transactions). This means node operators should be able to manually override the app version that the proposer will propose with. Lastly, if state migrations occurred between v2 and v1, a reverse migration would need to be performed which would make things especially difficult. If we are unable to fallback to the previous version and continue then the other option is to remain halted until the bug is patched and the network can update and continue
+ - There is a halting bug in the migration or in processing of the first transactions. This most likely would be in the form of an apphash mismatch. This becomes more problematic with delayed execution as the block (with v2 transactions) has already been committed. Immediate execution has the advantage of the apphash mismatch being realised before the data is committed. It's still however feasible to over come this but it involves nodes rolling back the previous state and re-executing the transactions using the v1 state machine (which will skip over the v2 transactions). This means node operators should be able to manually override the app version that the proposer will propose with. Lastly, if state migrations occurred between v2 and v1, a reverse migration would need to be performed which would make things especially difficult. If we are unable to fallback to the previous version and continue then the other option is to remain halted until the bug is patched and the network can update and continue
- There is a bug that is detected that could halt the chain but hasn't yet. There are other things we can develop to combat such scenarios. One thing we can do is develop a circuit breaker similar to the designs proposed in [Cosmos SDK](https://github.com/cosmos/cosmos-sdk/tree/main/x/circuit). This can disable certain message types or modules either in `CheckTx` or `ProcessProposal`. This violates the consistency property between `PrepareProposal` and `ProcessProposal` but so long as a quorum are the same, will still allow the chain to progress (inconsistency here can be interpreted as byzantine).
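
The proposal rules described in the context above (propose a version change only in the first round of a height, only inside a bounded height range, and allow an operator override of the built-in schedule) can be sketched roughly as follows. This is an illustrative Python model, not code from celestia-app: `UpgradeWindow`, `resolve_window`, and `version_to_propose` are hypothetical names, and treating round `0` as the "first round" is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UpgradeWindow:
    """Hypothetical: the height range in which nodes attempt an upgrade."""
    start_height: int
    end_height: int      # inclusive (assumption)
    target_version: int


def resolve_window(built_in: UpgradeWindow,
                   override: Optional[UpgradeWindow] = None) -> UpgradeWindow:
    """An operator-supplied window (e.g. from a flag or config) replaces
    the schedule embedded in the binary."""
    return override if override is not None else built_in


def version_to_propose(height: int, round_: int, current_version: int,
                       window: UpgradeWindow) -> int:
    """Return the app version a proposer would put forward."""
    if current_version >= window.target_version:
        return current_version  # already upgraded
    in_range = window.start_height <= height <= window.end_height
    # Only the first round of a height proposes the new version, so later
    # rounds can still commit transactions without consensus on upgrading.
    if in_range and round_ == 0:
        return window.target_version
    # Outside the range (or in later rounds) keep running the current version.
    return current_version
```

For example, with a window covering heights 100 to 110 and target version 2, a proposer on version 1 would propose version 2 at height 100, round 0, but would stay on version 1 in round 1 of the same height or at height 111.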

### Future Work: Signaled Upgrade Height