-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Report unstable peers from TSS #1228
base: master
Are you sure you want to change the base?
Conversation
This is going to be the `offender` account ID which gets submitted on-chain.
This way we have access to the `api` which we can use to submit an offline report on-chain.
@@ -139,6 +139,8 @@ async fn handle_socket_result(socket: WebSocket, app_state: AppState) { | |||
if let Err(err) = handle_socket(socket, app_state).await { | |||
tracing::warn!("Websocket connection closed unexpectedly {:?}", err); | |||
// TODO here we should inform the chain that signing failed | |||
// | |||
// TODO (Nando): Report error up here? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the answer here is no since this would be bubbled up when we try and make a (failed) ws/
connection anyways
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't totally thought through but seems reasonable. The only errors we could catch here but not elsewhere are relating to incoming connections which do not relate to an existing listener. Eg: connections from random people or with a bad 'subscribe' message. I think we can just ignore these.
As for (2):
|
Probably the most common problem will be that a TSS server is just not available for whatever reason. I think its worth noting that the relayer node will immediately know this, because they will be unable to make the
Really here they should tell the other TSS servers that they can abandon the signing protocol. There is no point in waiting for this one to connect, as they have failed to receive the signature request. Otherwise, if the unresponsive node is due to make an outgoing ws connection, its hard to say how long we should wait for them before reporting unresponsiveness. Are they offline or just being slow? |
This is the WIP follow up to #1215.
In this PR the plan is to have TSS servers use the reporting extrinsic introduced in
#1215 whenever they experience some sort of "offline" or connection failure from a peer
during signing.
There are two main areas of work for this:
For (1), I've just been going through the signing flow for now, and basically annotating
the one function where I think all the important connections are made.
For (2), I'm not really sure how to approach this yet. So far I've just been testing this
manually using hacked Docker images with panics at points where connections should be
made.
@ameba23 if you get a chance to take a peek at this I'd appreciate some ideas on (1) and
(2) from you.