-
Notifications
You must be signed in to change notification settings - Fork 292
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Further investigate execution #3
Comments
Is there any difference for this between the Go and Rust implementations of Tendermint? |
The tendermint-rs repository doesn't contain a fullnode implementation yet. But the implementation aims to be compatible to the go one. (So the answer will be no unless we can merge our changes upstream of course). I think there is another implementation in Rust by Codechain. Might be a good idea to look into that too. |
High level (and unfinished) overview on state execution and what places would need changes in tendermint to execute Txs (and update the AppHash/state root) in the same block. ProblemWhy bother and change the current model?In the current execution model, blocks are executed against the app only after they are committed. Full block verification (incl. state) always needs access to transactions of the previous block: state(1) = InitialState For more info on Execute see: https://docs.tendermint.com/master/spec/blockchain/blockchain.html#execution This causes all sorts of confusion when implementing an app module on top of this logic. In the context of LazyLedger particularly but basically in any chain / application where we want to generate fraud proofs the "one off" between block height and state height causes further issues. First, (state) fraud proofs are also generated one-off, meaning their generation is also delayed one block. Second, this means light clients that want to verify a fraud proof need to wait for and download an extra block/header as they will find the updated root only there. In this model a light client that receives a (state) fraud proof for block at height h from a (full) node would need to verify that against the block h-1 as the Tx are included in the previous block. Why changing the current model might still be undesired?The more changes we add to the current tendermint codebase, the more lazyledger-core diverges from tendermint, the more time consuming it will be to keep up with (critical) fixes from upstream. While it is true that we will make changes to Tendermint anyways, the changes regarding the execution model might be less contained and touch not a single but several go packages. ChangesTheoretically changing Tendermint to immediate seems trivial: during the consensus steps of a round (propose, prevote, precommit), we need to update the state root before we proceed further (IMO the earlier we do so, the better). Practically, the changes will certainly be a bit more elaborate. Another weirdness of the current implementation is that Block execution can happen from two different places: either from the Consensus or the Blockchain Reactor. At the same time the Blockchain reactor can modify the state of the consensus reactor (NOTE: This design deserves a proper review and refactoring independent from what we are trying to achieve here...). Block execution modifies the mempool and both the mempool as well as block execution itself have read/write access to the app via ABCI. "Seal Phase"Once all transactions for the current height are executed, we need to update the state root (aka AppHash in tendermint). (For the lack of a better name, let's call this seal phase for now). We need to figure out when it’s reasonable (and safe to do this). My understanding is that we could execute *before *we even propose a block (first step) and there is no reason to wait until the before the precommit. As both @ebuchman and @zmanian suggested to do this before the precommit instead. So I’m probably missing something important here... A high level overview of the tendermint steps: (Image credits: https://docs.google.com/presentation/d/1k0CbJQBHoMJkuOXdAKHFw9dcaXES_3dFYnCp1I1bm2M/edit#slide=id.g3acacab17a_0_528) Note that all steps come with timeouts which are critical for the consensus to make progress. It is critical that these timeouts need to be generous enough that we don’t end up timing out and skipping rounds without making any real progress. Hence, we might have to increase the timeout in between steps where execution happens. IMO the block is proposed directly with all Tx executed and state root up to date. This means increasing timeout_propose. Of course this can mean the nodes might execute blocks which don’t end up being committed to (and waste valuable computing time) but I don’t think this will be a problem in practice. Consensus ReactorThis is where the meat is. We need to teach the reactor to only propose blocks with state roots updated according to the current transactions included in that block. (See the tendermint spec and the code for more details.) Currently, the BlockExecutor runs on Commit and updates the mempool. The executor gets called by the blockchain reactor (which is a mess, e.g. look at this). The current battle tested blockchain reactor (used in the hub) is v0 and I’ve hoped we can avoid touching it. We could peek into the refactored versions (v1 and v2) in case the changes we need to make exceed a few lines. ReplayNeed to update replay logic accordingly. Here and in a few other places where blocks are replayed. Blockchain Reactor
MempoolWhat do we do with CheckTx?
Gas ModelThe current Gas model relies on running CheckTx (via ABCI). Tx are added to the mempool as a mempoolTx with gasWanted set to the corresponding ResponseCheckTx’s GasWanted. My understanding is that we’d rather want users to pay only for the gas that was actually consumed instead of refunding unused gas (see this issue). Validator UpdatesCurrently validator sets are tracked in this way by tendermint: transactions that affect the validator set proposed in block H will be committed in the block H+1, and therefore block H+2 will be committed by the new val set. As state gets executed immediately, validator changes also happen immediately. This has two implications: first, we can get rid of Header.ValidatorHash as it isn’t the app’s output from the previous block anymore (and it simply represents the current validators that signed the block). The current validators (of height H) can be part of the state root. Similarly Header.NextValidatorsHash needs to be changed to point to the validator set that will commit the next block H+1. These changes substantially change the light-client model (see the spec for more details on the current model). ABCIABCI connects the tendermint state machine with the application (this is the actual business logic state machine; e.g. in LazyLedger the app would be the PoS logic). TODO RPCThe light client uses 2 RPC end-points to get the signed headers and validator sets for given heights. Need to update the validator endpoint according to above changes to the validator sets tracked in the header/state: https://docs.tendermint.com/master/rpc/#/Info/validators This should be a very simple change as the RPC only reads Further Note on dependencies between different componentsThis diagram illustrates read/write access of components to different kinds of state: (Thanks to Sean aka @brapse for the diagram) The problem here is that some of the read/write access above happens in go-routines and access is often guarded via mutexes. We have to be careful not to modify this in a way that could deadlock. Voiced concerns
|
I'm also dropping @adlerjohn's summary on how we can do fraud proofs in the current model: Summary for keeping deferred execution:
Since transactions and intermediate state roots aren't interleaved, but rather in separate reserved namespaces in the NMT, it's trivial to just shift which block the intermediate state roots are for. |
Transcluding essence of comment from other discussion thread: Immediate execution is needed for certain fee models, including Ethereum-style unused-gas refunds and in-protocol non-proportional fee burning.
Fee burning is especially important, and seeing a rise in usage in modern blockchain designs because it provides a sink for coins, instead of only the sources that the original Bitcoin design has (new coin issuance with block rewards). Sinks are invaluable as they create intrinsic demand for the coin (much like how taxes cannot be supplanted by printing money). |
I'm 99% sure that Libra does immediate execution and invests a fairly large amount of code speculating execution states if multiple competing proposals exist at any height. If we are executing blocks then ABCI will need the ability to potential abort an execution and revert state. |
Thanks for this write up! Just a couple thoughts/questions:
LOL this is why we rewrote the blockchain reactor. Probably best to focus on v2 ...
I don't think we should get rid of Header.ValidatorHash. We want Tendermint light clients to work independent of any application, so saying "it can be part of the state root" doesn't help because then the pure Tendermint light client needs to be able to read arbitrary application roots.
Can't all of this still be done, just off-by-one? Correct proposers should never include invalid txs in the first place (if CheckTx is written correctly), so apps can just punish proposers that do by slashing. And otherwise can't fee burning and gas refunds just happen as part of execution after the block is committed? |
Not in an incentive-compatible manner. In the case of gas refunds, the block proposer doesn't know how much gas a transaction will actually use (and thus, how much they'll actually get paid) unless they execute it immediately. So checking for transaction validity (does this user have enough balance to pay for the transaction's gas limit) is sufficient for safety but not incentive-compatibility. In the case of fee burning (or invalid transactions in general), block proposers can't include two transactions from the same account in the same block without executing them immediately. So proposers that do immediate execution have access to more transactions to potentially include in their blocks (i.e. more profit), and so all proposers will do immediate execution. Since every validator can be a proposer eventually, every validator will therefore do immediate execution all the time too. Not accounting for this in-protocol means smaller validators that can't afford to do immediate execution will eventually get pushed out by bigger validators. |
When I wrote this issue V2 didn't seem ready (I even had a branch which deleted the other reactors; but back then tendermint didn't seem to make blocks 😬 I was in touch with @erikgrinaker, can't remember what the issue was). We are planning to switch as soon as possible though: #13
I guess in the light of using tendermint (consensus) for LL where (tendermint) application state is not arbitrary in the sense that we optimize for one minimal core application (and devs will write their application logic client side / using some exec environment). But that also means that we can't use the tendermint light client as is (and just add a data availability) but need to re-write some of the core logic there too. |
Does this really matter though? If it only costs them what's used, and they only get paid for that, isn't it fine? As long as the block gas limit is also based on what's actually used, which is maybe the real issue here: cosmos/cosmos-sdk#2150 Eg. for ETH miners, my understanding is they take whats in their mempool, sort by gasprice, and execute until they get to the gas limit. But they wouldn't like execute a transaction, realize the total gas used is too low, and then roll it back and try another one.
Not sure I understand. We currently support multiple txs from the same account in a single block by making sure CheckTx is smart enough for it. Why would fee burning be any different? Can't we ensure in CheckTx that users have enough balance to send two txs with full fee burning? Or is there some other sort of dependency between the txs that could invalidate the second tx only after full execution? |
Also, one possible but maybe hacky approach at least to these fee issues is by doing full execution in CheckTx (?!) |
v2 had a race condition where it tried to switch to the consensus reactor before it had been hooked up to the switch, and then just silently gave up on everything. It also had an issue with spinning on closed channels, using 100% CPU. The former has been fixed on |
It's not, actually! Consider the case where there are three txs:
with a block gas limit of
That's right, and you can't do this unless you immediately execute transactions.
The first transaction could empty the user's account, making the second one invalid.
I mean, if you just make CheckTx do full execution then that does kind of get you close to immediate execution, sure. There's still the issue of the |
Usually CheckTx is designed to check things that could decrement the users balance. So in principle this kind of thing shouldn't happen, but of course in practice if control over account funds are eg given to a contract then all bets are off.
Good to know we at least have a stop gap. Fascinating to explore these boundaries of ABCI. The current design has lasted almost 5 years now, might be time to really rethink things for the next 5. As for the off-by-one, that's more of a convenience issue right now since clients have to download an extra block? Or is there something more fundamental? |
I have to think about this some more, but I don't see any inherent issue other than light clients having to wait for and download an extra block (and, you know, being hacky, which is a big issue 😂). Technically, if you're using state transition fraud proofs, then the last intermediate state root is the state root after applying all the transactions in the block, so light clients can just use that instead of the Tendermint One issue might be timing. The current timeouts between each stage of the Tendermint consensus process assume that transaction execution is pipelined. But I guess that's a system parameter, so not tied to Tendermint itself. |
We decided this is out of scope at least until abci++ lands in tendermint for real. We can reconsider this next year. |
…heck (#1186) This PR addresses the vulnerabilities identified by govulncheck in [PR #1179](https://github.com/celestiaorg/celestia-core/actions/runs/7629759074/job/20786518441?pr=1179). It upgrades the affected modules to the versions recommended by govulncheck. ``` Vulnerability #1: GO-2024-2466 Denial of service in github.com/go-git/go-git/v5 and gopkg.in/src-d/go-git.v4 More info: https://pkg.go.dev/vuln/GO-2024-2466 Module: github.com/go-git/go-git/v5 Found in: github.com/go-git/go-git/[email protected] Fixed in: github.com/go-git/go-git/[email protected] Example traces found: Error: #1: test/e2e/generator/generate.go:402:36: generator.gitRepoLatestReleaseVersion calls git.PlainOpenWithOptions, which calls filesystem.NewStorage Error: #2: test/e2e/generator/generate.go:402:36: generator.gitRepoLatestReleaseVersion calls git.PlainOpenWithOptions Error: #3: test/e2e/generator/generate.go:407:30: generator.gitRepoLatestReleaseVersion calls git.Repository.TagObjects Vulnerability #2: GO-2024-2456 Path traversal and RCE in github.com/go-git/go-git/v5 and gopkg.in/src-d/go-git.v4 More info: https://pkg.go.dev/vuln/GO-2024-2456 Module: github.com/go-git/go-git/v5 Found in: github.com/go-git/go-git/[email protected] Fixed in: github.com/go-git/go-git/[email protected] Example traces found: Error: #1: test/e2e/generator/generate.go:402:36: generator.gitRepoLatestReleaseVersion calls git.PlainOpenWithOptions, which calls filesystem.NewStorage Error: #2: test/e2e/generator/generate.go:402:36: generator.gitRepoLatestReleaseVersion calls git.PlainOpenWithOptions Error: #3: test/e2e/generator/generate.go:407:30: generator.gitRepoLatestReleaseVersion calls git.Repository.TagObjects === Informational === There are 2 vulnerabilities in modules that you require that are neither imported nor called. You may not need to take any action. See https://pkg.go.dev/golang.org/x/vuln/cmd/govulncheck for details. Vulnerability #1: GO-2024-2453 Timing side channel in github.com/cloudflare/circl More info: https://pkg.go.dev/vuln/GO-2024-2453 Module: github.com/cloudflare/circl Found in: github.com/cloudflare/[email protected] Fixed in: github.com/cloudflare/[email protected] Vulnerability #2: GO-2023-[17](https://github.com/celestiaorg/celestia-core/actions/runs/7629759074/job/20786518441?pr=1179#step:5:18)65 Leaked shared secret and weak blinding in github.com/cloudflare/circl More info: https://pkg.go.dev/vuln/GO-[20](https://github.com/celestiaorg/celestia-core/actions/runs/7629759074/job/20786518441?pr=1179#step:5:21)23-1765 Module: github.com/cloudflare/circl Found in: github.com/cloudflare/[email protected] Fixed in: github.com/cloudflare/[email protected] Your code is affected by 2 vulnerabilities from 1 module. Share feedback at https://go.dev/s/govulncheck-feedback. exit status 3 make: *** [Makefile:254: vulncheck] Error 1 Error: Process completed with exit code 2. ```
In order to fix the go vulnerabilities that are fixed in the new patch: ``` Vulnerability #1: GO-2024-2610 Errors returned from JSON marshaling may break template escaping in html/template More info: https://pkg.go.dev/vuln/GO-2024-2610 Standard library Found in: html/[email protected] Fixed in: html/[email protected] Example traces found: #1: test/fuzz/rpc/jsonrpc/server/handler.go:30:15: server.Fuzz calls http.ServeMux.ServeHTTP, which eventually calls template.Template.Execute #2: rpc/jsonrpc/server/http_server.go:256:15: server.maxBytesHandler.ServeHTTP calls http.HandlerFunc.ServeHTTP, which eventually calls template.Template.ExecuteTemplate Vulnerability #2: GO-2024-2600 Incorrect forwarding of sensitive headers and cookies on HTTP redirect in net/http More info: https://pkg.go.dev/vuln/GO-2024-2600 Standard library Found in: net/[email protected] Fixed in: net/[email protected] Example traces found: #1: rpc/jsonrpc/client/http_json_client.go:213:34: client.Client.Call calls http.Client.Do #2: libs/cli/setup.go:89:26: cli.Executor.Execute calls cobra.Command.Execute, which eventually calls http.Client.Get #3: p2p/upnp/upnp.go:205:20: upnp.getServiceURL calls http.Get Vulnerability #3: GO-2024-2599 Memory exhaustion in multipart form parsing in net/textproto and net/http More info: https://pkg.go.dev/vuln/GO-2024-2599 Standard library Found in: net/[email protected] Fixed in: net/[email protected] Example traces found: #1: rpc/jsonrpc/server/http_server.go:62:16: server.Serve calls http.Server.Serve, which eventually calls textproto.Reader.ReadLine #2: rpc/jsonrpc/server/http_server.go:62:16: server.Serve calls http.Server.Serve, which eventually calls textproto.Reader.ReadMIMEHeader Vulnerability #4: GO-2024-2598 Verify panics on certificates with an unknown public key algorithm in crypto/x509 More info: https://pkg.go.dev/vuln/GO-2024-2598 Standard library Found in: crypto/[email protected] Fixed in: crypto/[email protected] Example traces found: #1: libs/autofile/group.go:479:30: autofile.GroupReader.Read calls bufio.Reader.Read, which eventually calls x509.Certificate.Verify Your code is affected by 4 vulnerabilities from the Go standard library. ```
This issue is to track changes needed (and tradeoffs) which will enable to update the AppHash (state root) before consensus on a block.
Roughly:
Related: tendermint/tendermint#2483
Also:
The text was updated successfully, but these errors were encountered: