Too small MaxGossipDelay results in disconnecting from all peers in testground #1224

Open
evan-forbes opened this issue Feb 13, 2024 · 1 comment
Labels
cat 🐈 T:Bug Type: Bug (confirmed) testground WS: Big Blonks 🔭 Improving consensus critical gossiping protocols

Comments

@evan-forbes
Member

When the v2 mempool is run with its default parameters, testground tests do not finish. After increasing MaxGossipDelay to 20-30 seconds (lower values have not been tried and may also work), the tests finish reliably. More information is needed to debug this properly; however, the logs show that nodes are unable to find peers. When this happens, the network often stalls at a low height and stops reaching consensus.

More data, specifically around peering, should be collected to determine the exact cause of this bug. At the moment, testground doesn't have access to the standard metrics, so we've resorted to using the tracer.

@evan-forbes evan-forbes added T:Bug Type: Bug (confirmed) cat 🐈 WS: Big Blonks 🔭 Improving consensus critical gossiping protocols testground labels Feb 13, 2024
@cmwaters
Contributor

So just from looking at the code, there doesn't seem to be any unexpected path that would cause a node to disconnect from its peers. A peer is currently disconnected in only three places:

  1. If a WantTx has a txhash of the incorrect size
  2. If a SeenTx has a txhash of the incorrect size
  3. If the peer sends a message that is not a SeenTx, WantTx, or Tx

If we don't see any of these errors, then I'm not sure how the CAT pool is disconnecting from peers (unless it's due to some timeout). I also can't explain why extending the timeout solves the problem.

@evan-forbes evan-forbes self-assigned this Feb 21, 2024
@evan-forbes evan-forbes removed their assignment Apr 22, 2024