-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow offchain worker requests to all TSS nodes in entropy-tss
test environment
#1147
Conversation
…e mock OCW requests
I am a bit stuck with this. DKG is working fine but the reshare protocol has problems. To be specific its the aux gen protocol when run from within the 'reshare' protocol: The DKG protocol is composed of the key init, reshare and aux gen protocol - so we know all of these will work with this setup. 'Reshare' is composed of reshare and aux gen. It seems from looking at logs that the reshare runs, but only two of the three parties finalize successfully, then during aux gen a connection gets dropped. I tried switching things around so the aux gen protocol gets run before the reshare, and it seems that aux gen is the problem as a connection gets dropped and it seems only one peer runs the protocol loop. There are also issues with mocking the DKG using pre-generated keyshares which we do in the signing tests, as now when the jump start extrinsic is submitted, it starts a real DKG, and things get messy when the mock DKG confirmations are also submitted. But i think this should be not to hard to fix. |
I had a hunch that the problems i am seeing are related to problems introduced by my recent PR: #1136 I have now tried rolling back to before that commit and re-applying these changes and running the tests, but i am seeing exactly the same behavior as on this branch. So thats not the problem. Here is the relevant part of the test logging output:
|
Had a look at this problem with @HCastano thismorning. He made a suggestion to try running the DKG with an additional reshare immediately after the DKG finishes. To see if it is something to do with the test conditions for the reshare test rather than the reshare protocol itself. I tried doing this, but got into a muddle with new holders and old holders, and was having synedrion's But if i run an extra aux gen protocol at the end of the DKG protocol, it does work. |
* Add pre-jumpstart option to chainspec and helper fn, update tests * Dont include JumpStartDetails in staking pallet config, only ids of the jumpstarted signer set * Update mock config for propagation and registry pallets * Comments, tidy * Add new field to attestation pallet mock * Update client tests * Fix issue with test helper fn * Add pre-generated keyshares directly to the kvdb, rather than using the unsafe API * Update integratiion test chain spec * Fixes for pre-jumpstarted state in tests - mocking jump start no longer works * Update reshare test * Update client tests * Typo * Tidy test * Tidy test * Fix for deadline issue * Rm unused variable * Fix local storage for rotate network key endpoint * Update test * Clippy * Clippy * Fix remaining tests * Fix tests, tidy * Update generated keyshares * Update script to generate keyshares * Update script to generate keyshares readme * Tidy * Undo testnet reshare fix
/// Note that since this function does not reside in entropy-tss, cfg(test) will be false when the | ||
/// TSS nodes are set up, meaning the unsafe API will not be enabled |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not introduce a testing
feature and change the cfg clause to be #[cfg(any(feature = "testing", test))]
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a good idea
let server_info = query_chain(&api, &rpc, query, None).await.unwrap().unwrap(); | ||
signers.push(server_info); | ||
let mut i = 0; | ||
let new_signer_ids = loop { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let new_signer_ids = loop { | |
// Wait up to 2min for reshare to complete: check once every second if we have a new set of signers. | |
let new_signer_ids = loop { |
maybe can be helpful for readers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just out of curiosity, but doesn't subxt
provide facilities to subscribe to changes to some storage item? Setup a callback that gets called when the event happens?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep you are right, i think this is the event we need to wait for:
entropy-core/pallets/staking/src/lib.rs
Line 374 in fddd9e4
SignersRotation(Vec<<T as pallet_session::Config>::ValidatorId>), |
But im not sure how to wait for an event without it being in the context of submitting an extrinsic as we do here for example:
entropy-core/crates/client/src/user.rs
Line 167 in fddd9e4
let result = |
We can get an events client for a specific block or for the latest block with eg: EventsClient::at_latest
. But if the event has not yet happened, i guess we would have to keep calling it each block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't mean to resolve this conversation - github did it when i pressed 'commit suggestion'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was merely an idea sparked by a remote memory! Feel free to pursue (or not) in a separate PR. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to this it doesn't look like it's supported: paritytech/subxt#479
Co-authored-by: David <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All looks fairly reasonable, but I need another look before approving
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember there was a README
or something as part of the script that generates these. Does that need to be updated in any way to account for these changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Vec<TestNodeProcess<EntropyConfig>>, | ||
OnlineClient<EntropyConfig>, | ||
LegacyRpcMethods<EntropyConfig>, | ||
Vec<String>, | ||
Vec<PartyId>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this point we should probably move these into a struct, or at least typedef it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thats a good idea. But could turn into quite a big change, because once having a struct we may as well pass it to test helper functions.
Generally, we could bundle api and rpc together into a struct and use it in all client functions.
The validator_ips
and validator_ids
we actually very rarely use because we know what are going to be. So i think they could be ditched from here and replaced with constants.
I would maybe do that in a followup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The validator_ips and validator_ids we actually very rarely use because we know what are going to be. So i think they could be ditched from here and replaced with constants.
When I was trying to parallelize the tests I was using them because they would change between runs. Ofc if you open a PR with them hardcoded as they are now it won't be a problem, but it would be a bonus if we did provide some flexibility for them.
// This key is associated with a constant key share generation from DETERMINISTIC_KEY_SHARE_EVE | ||
pub const EVE_VERIFYING_KEY: EncodedVerifyingKey = [ | ||
// This key is associated with a constant key share generation from DETERMINISTIC_KEY_SHARE_NETWORK | ||
pub const PREGENERATED_NETWORK_VERIFYING_KEY: EncodedVerifyingKey = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just renamed for clarity
@@ -0,0 +1,103 @@ | |||
// Copyright (C) 2023 Entropy Cryptography Inc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I renamed the test file which makes the diff hard to read. But basically this test no longer includes a reshare.
Disappointingly, the reshare test failed on this branch in thismorning with a timeout waiting for reshare. I can't recreate the fail locally - the test always passes. There is something odd happening with |
I'd just like to give kudos for the excellent PR description here. Allows even people unfamiliar with the code to read and understand. 🥇 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work 💪
Definitely not an easy PR to get through the finish line!
Vec<TestNodeProcess<EntropyConfig>>, | ||
OnlineClient<EntropyConfig>, | ||
LegacyRpcMethods<EntropyConfig>, | ||
Vec<String>, | ||
Vec<PartyId>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The validator_ips and validator_ids we actually very rarely use because we know what are going to be. So i think they could be ditched from here and replaced with constants.
When I was trying to parallelize the tests I was using them because they would change between runs. Ofc if you open a PR with them hardcoded as they are now it won't be a problem, but it would be a bonus if we did provide some flexibility for them.
@@ -130,25 +128,33 @@ pub async fn create_clients( | |||
/// A way to specify which chainspec to use in testing | |||
#[derive(Copy, Clone, PartialEq)] | |||
pub enum ChainSpecType { | |||
/// The development chainspec, which has 3 TSS nodes | |||
Development, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there no more use for the --dev
chainspec is tests anymore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is never used in tests - im not sure whether we might find some use for it in the future. I removed it because i thought from the comment that there wasn't much point in having a 3 node setup available because signing would not be possible. But i realise now that the comment is just outdated and that that is the chainspec we use for docker-compose etc.
Happy to put it back in if you think it would be useful.
let server_info = query_chain(&api, &rpc, query, None).await.unwrap().unwrap(); | ||
signers.push(server_info); | ||
let mut i = 0; | ||
let new_signer_ids = loop { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to this it doesn't look like it's supported: paritytech/subxt#479
), | ||
mock_signer_rotate: (true, mock_signer_rotate_data, vec![get_account_id_from_seed::<sr25519::Public>("Charlie//stash")]), | ||
mock_signer_rotate: (true, mock_signer_rotate_data, vec![get_account_id_from_seed::<sr25519::Public>("Dave//stash")]), | ||
jump_started_signers, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we've added a field to the chainspec we may need a breaking changes entry in the CHANGELOG
.
FYI @entropyxyz/information-technology
* master: Bump thiserror from 2.0.4 to 2.0.6 in the patch-dependencies group (#1206) Downgrade parity-scale-codec as version we currently use has been yanked (#1205) Bump clap from 4.5.22 to 4.5.23 in the patch-dependencies group (#1202) Add oracle data pointer check (#1194) Bump tokio from 1.41.1 to 1.42.0 (#1196) Allow offchain worker requests to all TSS nodes in `entropy-tss` test environment (#1147) Bump the patch-dependencies group with 3 updates (#1195)
This PR sets the TSS node endpoints associated with all chain nodes in our test setup for
entropy-tss
.It means that http requests coming from the propagation pallet will be made to all four TSS nodes. Previously, only Alice got a request from the chain and the other 3 TSS server's had similar requests made by an http client in the test code itself.
This had the advantage that we could test the handling of bad inputs given to these endpoints. But a lot of care had to be taken to make the mock requests appear at the right time, and we have had problems with the tests occasionally failing, eg: #1119
I am hoping that we can make the tests more reliable and more closely emulate what will happen in production by doing this.
This PR also makes a change to the genesis config making it possible to start in a pre-jumpstarted state to test signing. For an explainantion see the (merged) sub-PR which introduces that change: #1162
A couple of related changes:
testing_utils::helpers::spawn_tss_nodes_and_start_chain
. This has the advantage that it sets up both the chain and TSS nodes in one function, making a bit less boilerplate code in the tests. But it has a gotcha, which is that because it lives intesting-utils
and notentropy-tss
,cfg(test)
is false meaning the unsafe api does not get loaded. So it can only be used in the tests which don't need the unsafe API. I could move it intoentropy-tss
, but then we would not be able to use it in integration tests.next_signers
for the reshare to match what signers get chosen, but there is no guarantee that this will continue to work if the initial chain state changes. So now we only test resharing following a mock DKG where we know the initial signers beforehand.