Single-shot DKG #3776

Merged
merged 2 commits into from
Feb 8, 2024

@lukasz-zimnoch (Member) commented Feb 7, 2024

Refs: #3770
Depends on: #3775

The currently used DKG retry mechanism based on random exclusion turned out to be ineffective for a higher number of participating operators. Such retries have a very small chance of success and produce a lot of unnecessary network traffic that consumes bandwidth and CPU excessively.

Here we aim to improve the situation. First, we are making DKG a single-shot process that fails fast if the result cannot be produced during the first attempt. Second, we are doubling the announcement period to maximize the chance of participation for all selected operators, even those at the edge of the network. Third, we are reducing the submission delay that separates operators attempting to submit the final result on-chain.

Combined, these changes give us shorter DKG iterations that time out sooner. This way, we will be able to repeat DKG more often, with different operator sets.

Finally, we are changing the retransmission strategy for the `resultSigningState`, which was still using `StandardRetransmissionStrategy` with retransmissions occurring on each tick. All other DKG states use `BackoffRetransmissionStrategy`, which spaces retransmissions out sparsely and is considered more lightweight. There is no point in making an exception for the `resultSigningState`. This should reduce network load when one of the participants fails at the end of the protocol.
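To illustrate why a backoff schedule is lighter than retransmitting on each tick, here is a small Go sketch. It does not reproduce the real `StandardRetransmissionStrategy` or `BackoffRetransmissionStrategy`; the types `everyTick` and `doublingBackoff` are hypothetical, and the gap-doubling schedule is an assumption chosen only to show the effect.

```go
package main

import "fmt"

// strategy decides, once per tick, whether a message should be retransmitted.
type strategy interface {
	shouldRetransmit() bool
}

// everyTick models retransmitting on each tick.
type everyTick struct{}

func (everyTick) shouldRetransmit() bool { return true }

// doublingBackoff is a hypothetical backoff schedule that retransmits on
// ticks 1, 2, 4, 8, ... so the gap between retransmissions keeps growing.
type doublingBackoff struct {
	tick uint64 // number of ticks seen so far
	next uint64 // tick number of the next retransmission
}

func newDoublingBackoff() *doublingBackoff {
	return &doublingBackoff{next: 1}
}

func (b *doublingBackoff) shouldRetransmit() bool {
	b.tick++
	if b.tick == b.next {
		b.next *= 2
		return true
	}
	return false
}

// countRetransmissions runs the strategy for the given number of ticks and
// returns how many retransmissions it triggered.
func countRetransmissions(s strategy, ticks int) int {
	count := 0
	for i := 0; i < ticks; i++ {
		if s.shouldRetransmit() {
			count++
		}
	}
	return count
}

func main() {
	fmt.Println("every tick:", countRetransmissions(everyTick{}, 100))
	fmt.Println("backoff:   ", countRetransmissions(newDoublingBackoff(), 100))
}
```

Under these assumptions, 100 ticks produce 100 retransmissions with the per-tick strategy and 7 with the doubling schedule (ticks 1, 2, 4, 8, 16, 32 and 64).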


github-actions bot commented Feb 7, 2024

Solidity API documentation preview available in the artifacts of the https://github.com/keep-network/keep-core/actions/runs/7820364827 check.

All other DKG states use the `BackoffRetransmissionStrategy`.
There is no point in making an exception for the `resultSigningState`.
This should reduce network load in case one of the participants fails
at the end of the protocol.

github-actions bot commented Feb 7, 2024

Solidity API documentation preview available in the artifacts of the https://github.com/keep-network/keep-core/actions/runs/7820549852 check.

@pdyraga (Member) left a comment

Looks good to me, left just one comment.

@@ -28,7 +28,7 @@ const (
// is used to calculate the submission delay period that should be respected
// by the given member to avoid all members submitting the same DKG result
// at the same time.
- dkgResultSubmissionDelayStepBlocks = 15
+ dkgResultSubmissionDelayStepBlocks = 3
Member

I am concerned this is too low. The gas price bump happens after one minute by default, so if the initial estimation was not successful, there will not be enough time for even one price bump.

Member Author

I tested that yesterday on Sepolia, where gas prices were quite high and volatile. I haven't observed any problems, so I think we should be good. It is important to reduce the DKG timeout to the minimum, and this factor contributes strongly to that value. We will monitor the situation on mainnet and increase it if necessary.

Member Author

It is worth noting that the price bump will still actually happen. The only downside is that the next member may front-run with their own transaction. This may sometimes lead to a collision but is not harmful to the protocol.

Base automatically changed from fix-electrum-mem-leak to main February 8, 2024 15:37
@lukasz-zimnoch lukasz-zimnoch marked this pull request as ready for review February 8, 2024 15:37
@tomaszslabon tomaszslabon merged commit 17735c2 into main Feb 8, 2024
29 checks passed
@tomaszslabon tomaszslabon deleted the single-shot-dkg branch February 8, 2024 15:54
lukasz-zimnoch added a commit that referenced this pull request Feb 12, 2024
This pull request backports #3776 to the `releases/mainnet/v2.0.0-m7` branch.
3 participants