
Fix reshare timing issue #1211

Merged: JesseAbram merged 11 commits from fix-reshare-timing-issue into master on Dec 18, 2024
Conversation

JesseAbram (Member) commented on Dec 11, 2024:

This is pretty much a mirror of #1201 into master.

Also tagging #1210; this is the error that was related to that issue.

It adds a few regression tests and a better way to mock reshares by manipulating storage.
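
For context, the storage-manipulation approach looks roughly like the sketch below: encode a ReshareInfo value and write it under the staking extension's reshare_data storage key through a sudo-wrapped system::set_storage call. This is a minimal sketch assuming the subxt-generated entropy module and the call_set_storage helper quoted later in this PR; the exact paths and signatures are assumptions, not the merged code.

use parity_scale_codec::Encode;

// Hedged sketch: trigger a reshare in a test by writing ReshareInfo directly
// into chain storage instead of configuring it through the chainspec.
async fn mock_reshare(
    api: &OnlineClient<EntropyConfig>,
    rpc: &LegacyRpcMethods<EntropyConfig>,
    block_number: u32,
    new_signers: Vec<Vec<u8>>,
) {
    // Storage key for StakingExtension::ReshareData via the subxt-generated accessor.
    let storage_address = entropy::storage().staking_extension().reshare_data();
    let value = ReshareInfo { block_number, new_signers };

    // frame_system's set_storage takes raw (key, value) pairs; call_set_storage
    // (assumed helper) wraps the call in sudo and submits it.
    let call = entropy::runtime_types::frame_system::pallet::Call::set_storage {
        items: vec![(storage_address.to_root_bytes(), value.encode())],
    };
    call_set_storage(api, rpc, call).await;
}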

@@ -295,7 +295,7 @@ pub async fn validate_new_reshare(
         .await?
         .ok_or_else(|| ValidatorErr::ChainFetch("Not Currently in a reshare"))?;

-    if chain_data.block_number != reshare_data.block_number.saturating_sub(1)
+    if chain_data.block_number != reshare_data.block_number
JesseAbram (Member, Author):
The saturating_sub was removed because it now matches the running testnet.

JesseAbram marked this pull request as ready for review on December 17, 2024.
@@ -74,8 +74,5 @@ pub const MAX_SIGNERS: u8 = 15;
 /// Threshold for those signers
 pub const SIGNER_THRESHOLD: u8 = 2;

-/// For testing to line up chain mock data and reshare_test
-pub const TEST_RESHARE_BLOCK_NUMBER: u32 = 10;
ameba23 (Contributor):
Oh, I am happy to say goodbye to this one.

ameba23 (Contributor) left a comment:
I like this way of doing things without using the chainspec, as you can modify things for just one test without meddling with all the other tests. I really hope this means the reshare test will now pass consistently; I am pretty disappointed that it still fails after a lot of fiddling around, as does the jumpstart test sometimes.

@@ -101,14 +132,19 @@ async fn test_reshare_basic() {
         if new_signer_ids != old_signer_ids {
             break Ok(new_signer_ids);
         }
-        if i > 120 {
+        if i > 240 {
ameba23 (Contributor):
Interesting. Was this needed to get the tests to pass? Disappointingly, we are still often getting this timeout error, and I kind of thought it could not be that the reshare takes more than two minutes.

JesseAbram (Member, Author):
Probably not. I changed it when I was doing two reshares, so it took more time, but it doesn't hurt.

            break Err("Timed out waiting for reshare");
        }
        i += 1;
        tokio::time::sleep(std::time::Duration::from_secs(1)).await;
    }
    .unwrap();

    // wait for rotate keyshare
    tokio::time::sleep(std::time::Duration::from_secs(10)).await;
ameba23 (Contributor):
Ah, now I see what you mean about the rotate keyshare happening after the signers have changed. Yeah, it totally makes sense that you have to wait a block here before checking that the keyshare has changed.

    let storage_address_reshare_data = entropy::storage().staking_extension().reshare_data();
    let value_reshare_info =
        ReshareInfo { block_number, new_signers: vec![alice_stash.public().encode()] };
    // Add another reshare
ameba23 (Contributor):
This comment made me think you were doing a third reshare here, but this is part of the second reshare, right? We need to modify next_signers and reshare_data.

JesseAbram (Member, Author):
Yup, I can try to write better comments.
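
For what "modify next_signers and reshare_data" looks like in practice, here is a sketch under the same assumptions as the mock above; the NextSignerInfo type, its fields, and the next_signers storage accessor are guesses based on the pallet's naming, not quoted from this PR.

// Hedged sketch: stage a second reshare by updating both storage entries
// in one sudo-wrapped system::set_storage call.
let next_signers_address = entropy::storage().staking_extension().next_signers();
let reshare_data_address = entropy::storage().staking_extension().reshare_data();

// Field names and types here are assumptions.
let value_next_signers =
    NextSignerInfo { next_signers: vec![alice_stash.public().encode()], confirmations: vec![] };
let value_reshare_info =
    ReshareInfo { block_number, new_signers: vec![alice_stash.public().encode()] };

let call = entropy::runtime_types::frame_system::pallet::Call::set_storage {
    items: vec![
        (next_signers_address.to_root_bytes(), value_next_signers.encode()),
        (reshare_data_address.to_root_bytes(), value_reshare_info.encode()),
    ],
};
call_set_storage(&api, &rpc, call).await;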

    call_set_storage(&api, &rpc, call).await;

    // wait for rotate keyshare
    tokio::time::sleep(std::time::Duration::from_secs(60)).await;
ameba23 (Contributor):
Seems a bit brittle. How come it's one minute here, but above we have 4 minutes plus 10 seconds for the key rotation?

JesseAbram (Member, Author):
Yes, for the second one I am just checking that the rotate keyshare happened; I got to 60 seconds by trial and error. The one above exits if it finishes early, so I didn't try to make it shorter.
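
If the fixed sleep ever turns out to be flaky, one alternative would be bounded polling with early exit — a generic sketch, not something this PR does; key_share_has_rotated is a hypothetical async check.

use std::time::Duration;

// Poll `check` once per second for up to `max_secs` seconds.
async fn wait_for<F, Fut>(mut check: F, max_secs: u64) -> Result<(), &'static str>
where
    F: FnMut() -> Fut,
    Fut: std::future::Future<Output = bool>,
{
    for _ in 0..max_secs {
        if check().await {
            return Ok(());
        }
        tokio::time::sleep(Duration::from_secs(1)).await;
    }
    Err("Timed out waiting for rotate keyshare")
}

// Usage (hypothetical): wait_for(|| key_share_has_rotated(), 60).await.unwrap();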

        return Ok(());
    }

-    let deadline = sp_io::offchain::timestamp().add(Duration::from_millis(2_000));
+    let deadline = sp_io::offchain::timestamp().add(Duration::from_millis(10_000));
ameba23 (Contributor):
No harm in leaving this here, I guess, but in theory it should not be needed outside of the hot fix, because resharing happens in a separate task.

JesseAbram (Member, Author):
True, but I'm going to leave it, because why not.

@@ -876,7 +858,7 @@ pub mod pallet {
     // trigger reshare at next block
     let current_block_number = <frame_system::Pallet<T>>::block_number();
     let reshare_info = ReshareInfo {
-        block_number: current_block_number + sp_runtime::traits::One::one(),
+        block_number: current_block_number - sp_runtime::traits::One::one(),
ameba23 (Contributor):
It seems kinda weird that we subtract one here and then add one again in the propagation pallet. But since it works, definitely don't touch it.
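
To make the arithmetic concrete: the net offset is zero, since the staking pallet records one block in the past and the propagation pallet adds one back, which is what lets the equality check in validate_new_reshare line up. A standalone toy sketch, illustrative only and not the actual pallet code:

// Toy model of the offset bookkeeping; not the actual pallet code.
fn recorded_reshare_block(current_block: u32) -> u32 {
    current_block - 1 // staking pallet stores current_block minus one
}

fn propagated_reshare_block(recorded: u32) -> u32 {
    recorded + 1 // propagation pallet adds one back before messaging the TSS
}

fn main() {
    let current = 100u32;
    // validate_new_reshare compares block numbers for equality, so the offsets must cancel.
    assert_eq!(propagated_reshare_block(recorded_reshare_block(current)), current);
}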

JesseAbram merged commit 8072702 into master on Dec 18, 2024.
8 checks passed
JesseAbram deleted the fix-reshare-timing-issue branch on December 18, 2024.