Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow offchain worker requests to all TSS nodes in entropy-tss test environment #1147

Merged
merged 17 commits into from
Dec 5, 2024

Conversation

ameba23
Copy link
Contributor

@ameba23 ameba23 commented Nov 5, 2024

This PR sets the TSS node endpoints associated with all chain nodes in our test setup for entropy-tss.

It means that http requests coming from the propagation pallet will be made to all four TSS nodes. Previously, only Alice got a request from the chain and the other 3 TSS server's had similar requests made by an http client in the test code itself.

This had the advantage that we could test the handling of bad inputs given to these endpoints. But a lot of care had to be taken to make the mock requests appear at the right time, and we have had problems with the tests occasionally failing, eg: #1119

I am hoping that we can make the tests more reliable and more closely emulate what will happen in production by doing this.

This PR also makes a change to the genesis config making it possible to start in a pre-jumpstarted state to test signing. For an explainantion see the (merged) sub-PR which introduces that change: #1162

A couple of related changes:

  • There is now only one set of pre-generated keyshares, as we know beforehand who the initial signer set will be as it is part of the chainspec.
  • I've used a different helper function when setting up a lot of our tests: testing_utils::helpers::spawn_tss_nodes_and_start_chain. This has the advantage that it sets up both the chain and TSS nodes in one function, making a bit less boilerplate code in the tests. But it has a gotcha, which is that because it lives in testing-utils and not entropy-tss, cfg(test) is false meaning the unsafe api does not get loaded. So it can only be used in the tests which don't need the unsafe API. I could move it into entropy-tss, but then we would not be able to use it in integration tests.
  • We had a test which did a DKG, then signs a message, then a reshare, then signs another message. The problem with this is that if doing a real DKG, the signer set is not known at genesis. Before, we had chosen the next_signers for the reshare to match what signers get chosen, but there is no guarantee that this will continue to work if the initial chain state changes. So now we only test resharing following a mock DKG where we know the initial signers beforehand.

@ameba23 ameba23 marked this pull request as draft November 5, 2024 11:46
@ameba23
Copy link
Contributor Author

ameba23 commented Nov 7, 2024

I am a bit stuck with this.

DKG is working fine but the reshare protocol has problems.

To be specific its the aux gen protocol when run from within the 'reshare' protocol:

The DKG protocol is composed of the key init, reshare and aux gen protocol - so we know all of these will work with this setup.

'Reshare' is composed of reshare and aux gen. It seems from looking at logs that the reshare runs, but only two of the three parties finalize successfully, then during aux gen a connection gets dropped.

I tried switching things around so the aux gen protocol gets run before the reshare, and it seems that aux gen is the problem as a connection gets dropped and it seems only one peer runs the protocol loop.

There are also issues with mocking the DKG using pre-generated keyshares which we do in the signing tests, as now when the jump start extrinsic is submitted, it starts a real DKG, and things get messy when the mock DKG confirmations are also submitted. But i think this should be not to hard to fix.

@ameba23
Copy link
Contributor Author

ameba23 commented Nov 11, 2024

I had a hunch that the problems i am seeing are related to problems introduced by my recent PR: #1136

I have now tried rolling back to before that commit and re-applying these changes and running the tests, but i am seeing exactly the same behavior as on this branch. So thats not the problem.

Here is the relevant part of the test logging output:

  2024-11-11T09:45:06.271344Z  INFO entropy_tss::signing_client::protocol_transport: Got ws connection, with message: SubscribeMessage { session_id: Reshare { verifying_key: [2, 24, 213, 93, 160, 174, 33, 208, 3, 29, 227, 227, 175, 53, 23, 229, 3, 220, 55, 68, 234, 118, 51, 129, 176, 175, 185, 68, 247, 93, 250, 222, 233], block_number: 21 }, public_key: 2cbc68e8bf0fbc1c28c282d1263fc9d29267dc12a1044fb730e8b65abc37524c (5D5Mw6Wb...), signature: 8259ea5bbf1308232f884f7eaf85d494c664e318ef21f1ee26b5575c0fa542789c2ca902e78fb74447016c549ff35b4cc67353360041b295da14ab0e00348781 }
    at crates/threshold-signature-server/src/signing_client/protocol_transport.rs:159

  2024-11-11T09:45:06.271898Z  INFO entropy_tss::signing_client::protocol_transport: Got ws connection, with message: SubscribeMessage { session_id: Reshare { verifying_key: [2, 24, 213, 93, 160, 174, 33, 208, 3, 29, 227, 227, 175, 53, 23, 229, 3, 220, 55, 68, 234, 118, 51, 129, 176, 175, 185, 68, 247, 93, 250, 222, 233], block_number: 21 }, public_key: 946140d3d5ddb980c74ffa1bb64353b5523d2d77cdf3dc617fd63de9d3b66338 (5FRFqLyd...), signature: 9aa6c7204d2135cadb94878b4117617a9336c4becd2ceb04903d570a7ad6fa38d223dec1ca25f4279a49191296a8cfeb386117be6b33e912e55fc1e22aaba288 }
    at crates/threshold-signature-server/src/signing_client/protocol_transport.rs:159

  2024-11-11T09:45:06.272088Z  INFO entropy_tss::signing_client::protocol_transport: Got ws connection, with message: SubscribeMessage { session_id: Reshare { verifying_key: [2, 24, 213, 93, 160, 174, 33, 208, 3, 29, 227, 227, 175, 53, 23, 229, 3, 220, 55, 68, 234, 118, 51, 129, 176, 175, 185, 68, 247, 93, 250, 222, 233], block_number: 21 }, public_key: 946140d3d5ddb980c74ffa1bb64353b5523d2d77cdf3dc617fd63de9d3b66338 (5FRFqLyd...), signature: b4a22cbf3746335a7a757824c63cbeb01a31a3fa428d4a860c47ae790257a118bdabe88fe31e4ad8a854d8d49b6770c7fd192a7a88813753dc116a16bd726b82 }
    at crates/threshold-signature-server/src/signing_client/protocol_transport.rs:159

  2024-11-11T09:45:06.272212Z  INFO entropy_protocol::execute_protocol: Executing reshare
    at crates/protocol/src/execute_protocol.rs:343
    in entropy_tss::validator::api::new_reshare
    in entropy_tss::http-request with uuid: d4731f8e-5b02-4f75-8218-6a612164990a, uri: /validator/reshare, method: POST

  2024-11-11T09:45:06.272554Z  INFO entropy_protocol::execute_protocol: Executing reshare
    at crates/protocol/src/execute_protocol.rs:343
    in entropy_tss::validator::api::new_reshare
    in entropy_tss::http-request with uuid: 829e963e-68a0-4c46-b628-a42aa0c1bba4, uri: /validator/reshare, method: POST

  2024-11-11T09:45:06.272964Z  INFO entropy_protocol::execute_protocol: Executing reshare
    at crates/protocol/src/execute_protocol.rs:343
    in entropy_tss::validator::api::new_reshare
    in entropy_tss::http-request with uuid: badb679b-ba0f-4287-bc39-f65952c250f2, uri: /validator/reshare, method: POST

  2024-11-11T09:45:06.274708Z  INFO entropy_protocol::execute_protocol: Finished reshare
    at crates/protocol/src/execute_protocol.rs:361
    in entropy_tss::validator::api::new_reshare
    in entropy_tss::http-request with uuid: 829e963e-68a0-4c46-b628-a42aa0c1bba4, uri: /validator/reshare, method: POST

  2024-11-11T09:45:06.274719Z  INFO entropy_protocol::execute_protocol: Starting aux gen
    at crates/protocol/src/execute_protocol.rs:365
    in entropy_tss::validator::api::new_reshare
    in entropy_tss::http-request with uuid: 829e963e-68a0-4c46-b628-a42aa0c1bba4, uri: /validator/reshare, method: POST

  2024-11-11T09:45:10.401568Z  INFO entropy_protocol::execute_protocol: Finished reshare
    at crates/protocol/src/execute_protocol.rs:361
    in entropy_tss::validator::api::new_reshare
    in entropy_tss::http-request with uuid: d4731f8e-5b02-4f75-8218-6a612164990a, uri: /validator/reshare, method: POST

  2024-11-11T09:45:10.401590Z  INFO entropy_protocol::execute_protocol: Starting aux gen
    at crates/protocol/src/execute_protocol.rs:365
    in entropy_tss::validator::api::new_reshare
    in entropy_tss::http-request with uuid: d4731f8e-5b02-4f75-8218-6a612164990a, uri: /validator/reshare, method: POST

  2024-11-11T09:45:17.100457Z  WARN entropy_tss::signing_client::api: Websocket connection closed unexpectedly MessageAfterProtocolFinish
    at crates/threshold-signature-server/src/signing_client/api.rs:140

@ameba23
Copy link
Contributor Author

ameba23 commented Nov 21, 2024

Had a look at this problem with @HCastano thismorning. He made a suggestion to try running the DKG with an additional reshare immediately after the DKG finishes. To see if it is something to do with the test conditions for the reshare test rather than the reshare protocol itself.

I tried doing this, but got into a muddle with new holders and old holders, and was having synedrion's no entry found for key error.

But if i run an extra aux gen protocol at the end of the DKG protocol, it does work.

@ameba23 ameba23 self-assigned this Nov 26, 2024
ameba23 and others added 5 commits November 26, 2024 13:01
* Add pre-jumpstart option to chainspec and helper fn, update tests

* Dont include JumpStartDetails in staking pallet config, only ids of the jumpstarted signer set

* Update mock config for propagation and registry pallets

* Comments, tidy

* Add new field to attestation pallet mock

* Update client tests

* Fix issue with test helper fn

* Add pre-generated keyshares directly to the kvdb, rather than using the unsafe API

* Update integratiion test chain spec

* Fixes for pre-jumpstarted state in tests - mocking jump start no longer works

* Update reshare test

* Update client tests

* Typo

* Tidy test

* Tidy test

* Fix for deadline issue

* Rm unused variable

* Fix local storage for rotate network key endpoint

* Update test

* Clippy

* Clippy

* Fix remaining tests

* Fix tests, tidy

* Update generated keyshares

* Update script to generate keyshares

* Update script to generate keyshares readme

* Tidy

* Undo testnet reshare fix
@ameba23 ameba23 marked this pull request as ready for review November 26, 2024 16:27
Comment on lines +30 to +31
/// Note that since this function does not reside in entropy-tss, cfg(test) will be false when the
/// TSS nodes are set up, meaning the unsafe API will not be enabled
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not introduce a testing feature and change the cfg clause to be #[cfg(any(feature = "testing", test))]?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a good idea

let server_info = query_chain(&api, &rpc, query, None).await.unwrap().unwrap();
signers.push(server_info);
let mut i = 0;
let new_signer_ids = loop {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let new_signer_ids = loop {
// Wait up to 2min for reshare to complete: check once every second if we have a new set of signers.
let new_signer_ids = loop {

maybe can be helpful for readers?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just out of curiosity, but doesn't subxt provide facilities to subscribe to changes to some storage item? Setup a callback that gets called when the event happens?

Copy link
Contributor Author

@ameba23 ameba23 Dec 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep you are right, i think this is the event we need to wait for:

SignersRotation(Vec<<T as pallet_session::Config>::ValidatorId>),

But im not sure how to wait for an event without it being in the context of submitting an extrinsic as we do here for example:

We can get an events client for a specific block or for the latest block with eg: EventsClient::at_latest. But if the event has not yet happened, i guess we would have to keep calling it each block.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't mean to resolve this conversation - github did it when i pressed 'commit suggestion'

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was merely an idea sparked by a remote memory! Feel free to pursue (or not) in a separate PR. :)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to this it doesn't look like it's supported: paritytech/subxt#479

Copy link
Collaborator

@HCastano HCastano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All looks fairly reasonable, but I need another look before approving

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember there was a README or something as part of the script that generates these. Does that need to be updated in any way to account for these changes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

crates/threshold-signature-server/src/helpers/tests.rs Outdated Show resolved Hide resolved
pallets/staking/src/lib.rs Outdated Show resolved Hide resolved
Comment on lines +35 to +39
Vec<TestNodeProcess<EntropyConfig>>,
OnlineClient<EntropyConfig>,
LegacyRpcMethods<EntropyConfig>,
Vec<String>,
Vec<PartyId>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point we should probably move these into a struct, or at least typedef it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thats a good idea. But could turn into quite a big change, because once having a struct we may as well pass it to test helper functions.

Generally, we could bundle api and rpc together into a struct and use it in all client functions.

The validator_ips and validator_ids we actually very rarely use because we know what are going to be. So i think they could be ditched from here and replaced with constants.

I would maybe do that in a followup.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The validator_ips and validator_ids we actually very rarely use because we know what are going to be. So i think they could be ditched from here and replaced with constants.

When I was trying to parallelize the tests I was using them because they would change between runs. Ofc if you open a PR with them hardcoded as they are now it won't be a problem, but it would be a bonus if we did provide some flexibility for them.

// This key is associated with a constant key share generation from DETERMINISTIC_KEY_SHARE_EVE
pub const EVE_VERIFYING_KEY: EncodedVerifyingKey = [
// This key is associated with a constant key share generation from DETERMINISTIC_KEY_SHARE_NETWORK
pub const PREGENERATED_NETWORK_VERIFYING_KEY: EncodedVerifyingKey = [
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just renamed for clarity

@@ -0,0 +1,103 @@
// Copyright (C) 2023 Entropy Cryptography Inc.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed the test file which makes the diff hard to read. But basically this test no longer includes a reshare.

@ameba23
Copy link
Contributor Author

ameba23 commented Dec 3, 2024

Disappointingly, the reshare test failed on this branch in thismorning with a timeout waiting for reshare. I can't recreate the fail locally - the test always passes. There is something odd happening with rotate_network_keyshare being called an extra time, but that doesn't actually cause anything bad to happen.

@dvdplm
Copy link
Contributor

dvdplm commented Dec 3, 2024

I'd just like to give kudos for the excellent PR description here. Allows even people unfamiliar with the code to read and understand. 🥇

Copy link
Collaborator

@HCastano HCastano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work 💪

Definitely not an easy PR to get through the finish line!

Comment on lines +35 to +39
Vec<TestNodeProcess<EntropyConfig>>,
OnlineClient<EntropyConfig>,
LegacyRpcMethods<EntropyConfig>,
Vec<String>,
Vec<PartyId>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The validator_ips and validator_ids we actually very rarely use because we know what are going to be. So i think they could be ditched from here and replaced with constants.

When I was trying to parallelize the tests I was using them because they would change between runs. Ofc if you open a PR with them hardcoded as they are now it won't be a problem, but it would be a bonus if we did provide some flexibility for them.

@@ -130,25 +128,33 @@ pub async fn create_clients(
/// A way to specify which chainspec to use in testing
#[derive(Copy, Clone, PartialEq)]
pub enum ChainSpecType {
/// The development chainspec, which has 3 TSS nodes
Development,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there no more use for the --dev chainspec is tests anymore?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is never used in tests - im not sure whether we might find some use for it in the future. I removed it because i thought from the comment that there wasn't much point in having a 3 node setup available because signing would not be possible. But i realise now that the comment is just outdated and that that is the chainspec we use for docker-compose etc.

Happy to put it back in if you think it would be useful.

let server_info = query_chain(&api, &rpc, query, None).await.unwrap().unwrap();
signers.push(server_info);
let mut i = 0;
let new_signer_ids = loop {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to this it doesn't look like it's supported: paritytech/subxt#479

),
mock_signer_rotate: (true, mock_signer_rotate_data, vec![get_account_id_from_seed::<sr25519::Public>("Charlie//stash")]),
mock_signer_rotate: (true, mock_signer_rotate_data, vec![get_account_id_from_seed::<sr25519::Public>("Dave//stash")]),
jump_started_signers,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we've added a field to the chainspec we may need a breaking changes entry in the CHANGELOG.

FYI @entropyxyz/information-technology

@ameba23 ameba23 merged commit 9c6e97e into master Dec 5, 2024
8 checks passed
@ameba23 ameba23 deleted the peg/dont-mock-propagation branch December 5, 2024 12:35
ameba23 added a commit that referenced this pull request Dec 9, 2024
* master:
  Bump thiserror from 2.0.4 to 2.0.6 in the patch-dependencies group (#1206)
  Downgrade parity-scale-codec as version we currently use has been yanked (#1205)
  Bump clap from 4.5.22 to 4.5.23 in the patch-dependencies group (#1202)
  Add oracle data pointer check (#1194)
  Bump tokio from 1.41.1 to 1.42.0 (#1196)
  Allow offchain worker requests to all TSS nodes in `entropy-tss` test environment  (#1147)
  Bump the patch-dependencies group with 3 updates (#1195)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

4 participants