-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make LogPoller more robust against local finality violations #12605
Conversation
I see you updated files related to
|
eecd637
to
e65effb
Compare
e65effb
to
7a59051
Compare
15cb3bf
to
4e434a9
Compare
4e434a9
to
e0869c3
Compare
I see you added a changeset file but it does not contain a tag. Please edit the text include at least one of the following tags:
|
612c78c
to
72a93a4
Compare
bb8240a
to
ad14f8f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if I follow. How does validating whether blocks are finalized fix the issues with local finality violations? I think I miss the bigger picture here
Sorry, I should have provided more explanation on that in the PR description. The flow of
(omitting some inconsequential steps, these are the main ones) The scenario where a local finality violation can cause problems is when the primary rpc server returns The bottom line is, we only ever save finalized blocks to the db... the rpc failover scenario opened a loophole where we could save an unfinalized block for a block number we were previously told was finalized. This mostly closes that loophole. (The only potential edge case is if the requests sent in a single batch get executed out of order, and one or more of the blocks we're requesting gets re-org'd out and then finalized while the batch in being processed. That seems very unlikely, but we have another fix coming soon at the lower (multinode layer) level that should cover that scenario too.) I was originally thinking mostly about fixing things in step 4a, where it requests the logs. If the failover happens between steps 1 and 4a then it can get back unfinalized logs, which is also a problem. But this fix should take care of both that situation and one where it fails over between steps 4a and 4b, because if |
f561a6b
to
7ba8818
Compare
4060d19
to
9a88e65
Compare
getCurrentBlockMaybeHandleReorg is called just before the for loop over unfinalized blocks begins, and at the end of each iteration. Simplifying by moving them both to the beginning of the for loop
This fixes 2 bugs on develop branch in this test, and removes some unused commented code. First Bug ========= The first bug was causing a false positive PASS on develop branch, which was obscuring a (very minor) bug in BackupPoller that's been fixed in this PR. The comment here about what the test was intended to test is still correct: // Only the 2nd batch + 1 log from a previous batch should be backfilled, because we perform backfill starting // from one block behind the latest finalized block Contrary to the comment, the code was returning 2 logs from the 1st batch (Data=9 & Data=10), plus 9 of 10 logs from the 2nd batch. This was incorrect behavior, but the test was also checking for the same incorrect behavior (looking for 11 logs with first one being Data=9) instead of what's described in the comment. The bug in the production code was that it starts the Backup Poller at Finalized - 1 instead of Finalized. This is a harmless "bug", just unnecessarily starting a block too early, since there's no reason for backup logpoller to re-request the same finalized logs that's already been processed. Now, the code returns the last log from the 1st batch + all but one logs from the 2nd batch, which is correct. (It can't return the last log because that goes beyond the last safe block.) So the test checks that there are 10 logs with first one being Data=10 (last log from the first batch.) Second Bug ========== The second bug was passing firstBatchBlock and secondBatchBlock directly to markBlockAsFinalized() instead of passing firstBatchBlock - 1 and secondBatchBlock - 1. This was only working because of a bug in the version of geth we're currently using: when you request the pending block from simulated geth, it gives you back the current block (1 block prior) instead of the current block. (For example, in the first case, even though we wanted block 11, the latest current block, we request block 12 and get back block 11.) This has been fixed in the latest version of geth... so presumably if we don't fix this here the test would have started failing as soon as we upgrade to the latest version of geth. It doesn't change any behavior of the test for the present version of geth, just makes it more clear that we want block 11 not 12.
…om "latest" batchFetchBlocks() will now fetch the "finalized" block along with the rest of each batch, and validate that all of the block numbers (aside from the special when "lateest" is requested) are <= the finalized block number returned. Also, change backfill() to always save the last block of each batch of logs requested, rather than the last block of the logs returned. This only makes a difference if the last block requested has no logs matching the filter, but this change is essential for being able to safely change lastSafeBlockNumber from latestFinalizedBlock - 1 to latestFinalizedBlock
9a88e65
to
70a17ee
Compare
70a17ee
to
738716d
Compare
Quality Gate passedIssues Measures |
@@ -554,6 +599,77 @@ func Test_latestBlockAndFinalityDepth(t *testing.T) { | |||
}) | |||
} | |||
|
|||
func Test_FetchBlocks(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
The main focus of this PR is to help LogPoller detect, prevent "local finality violations". These are cases where the multinode layer fails over from one rpc several to another and the second one is behind on its view of the chain. This can make it look like there is a finality violation even when there was not any global finality violation on the chain.
Primary change:
batchFetchBlocks() will now fetch the "finalized" block along with the rest of each batch,
and validate that all of the block numbers (aside from the special case when "latest" is requested)
are <= the finalized block number returned.
Also in this PR are two refactors for reducing code complexity:
Change backfill() to always save the last block of each batch of logs requested, rather than the last block of the logs returned.
(This only makes a difference if the last block requested has no logs matching the filter, but this change will allow us to eliminate
lastSafeBlockNumber = latestFinalizedBlock - 1 in an upcoming PR in favor of latestFinalizedBlock which simplifies the overall LogPoller
implementation. It also gets us a step closer to being able to use "finalized" for the upper range of the final batch request for logs, but that
comes with some additional complexities which still need to be worked out.)
Start Backup LogPoller on
lastProcessed.FinalizedBlockNumber
instead oflastProcessed.FinalizedBlockNumber
- 1. This was a harmless "bug" where it was starting one block too early, but entirely separate from the previous change.Minor refactor to remove code duplication (condensing 3 replicated code blocks calling
getCurrentBlockMaybeHandleReorg
down to 2)