Errors when requesting signatures on testnet #1260

Open
frankiebee opened this issue Jan 23, 2025 · 8 comments

@frankiebee
Contributor

frankiebee commented Jan 23, 2025

We are seeing this happen more often than not in the JS CLI:

[
  {"Err":"Kv error: Recv Error: channel closed"},
  {"Err":"Subscribe message rejected: NoListener(\"no listener\")"}
]

I tried this:

for i in $(seq 1 10);
do
    echo "attempting to sign"
    entropy sign "hello world $i"
done
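
Since the failure is intermittent, it can help to count outcomes rather than eyeball them. The following is a minimal sketch of the same loop in Rust that tallies successes and failures. The names `Tally`, `repeat_sign` and `sign_via_cli` are hypothetical, and `sign_via_cli` assumes the `entropy` CLI is on the PATH and exits with a non-zero status when signing fails, which has not been confirmed in this thread.

```rust
use std::process::Command;

/// Outcome counts for a batch of signing attempts (hypothetical helper type).
#[derive(Debug, Default, PartialEq)]
pub struct Tally {
    pub ok: usize,
    pub failed: usize,
    /// Error text of each failed attempt, in order.
    pub errors: Vec<String>,
}

/// Calls `sign` once per attempt with a 1-based index (like `seq 1 N`)
/// and tallies the results.
pub fn repeat_sign<F>(attempts: usize, mut sign: F) -> Tally
where
    F: FnMut(usize) -> Result<String, String>,
{
    let mut tally = Tally::default();
    for i in 1..=attempts {
        match sign(i) {
            Ok(_) => tally.ok += 1,
            Err(err) => {
                tally.failed += 1;
                tally.errors.push(err);
            }
        }
    }
    tally
}

/// Assumption: `entropy` is on the PATH and exits non-zero when signing fails.
pub fn sign_via_cli(i: usize) -> Result<String, String> {
    let output = Command::new("entropy")
        .arg("sign")
        .arg(format!("hello world {i}"))
        .output()
        .map_err(|e| e.to_string())?;
    if output.status.success() {
        Ok(String::from_utf8_lossy(&output.stdout).into_owned())
    } else {
        Err(String::from_utf8_lossy(&output.stderr).into_owned())
    }
}
```

Calling `repeat_sign(20, sign_via_cli)` would then give a failure rate that can be compared across environments.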

related to this issue in the sdk

@frankiebee
Contributor Author

frankiebee commented Jan 23, 2025

@ameba23 correction: I am finding that it is reproducible in a dev environment with a loop of 20, but not consistently.

Side note: are there docs for this error: Oneshot timeout error?

[{"Err":"Oneshot timeout error: channel closed"},{"Err":"Subscribe message rejected: Decryption(\"Public key does not match any of those expected for this protocol session\")"}]

@ameba23
Contributor

ameba23 commented Jan 23, 2025

For context, this error is coming from the response body of the /relay_tx HTTP route when requesting a signature, which gives us a vector of the results of the relayer hitting /sign_tx for each of the signers.
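
As a rough illustration of that response shape, here is a sketch that separates the per-signer successes from the per-signer errors. The `SignerResult` alias and `split_results` function are hypothetical, and modelling the `Ok` payload as a list of strings is an assumption based only on the JSON pasted in this thread, not on the actual types in the codebase.

```rust
/// One entry per signer. Assumption: the `Ok` payload is a list of strings,
/// as it appears in the JSON pasted in this issue.
pub type SignerResult = Result<Vec<String>, String>;

/// Splits the relayer's vector of results into successes and error messages,
/// preserving the order in which the signers were reported.
pub fn split_results(results: Vec<SignerResult>) -> (Vec<Vec<String>>, Vec<String>) {
    let mut oks = Vec::new();
    let mut errs = Vec::new();
    for result in results {
        match result {
            Ok(payload) => oks.push(payload),
            Err(message) => errs.push(message),
        }
    }
    (oks, errs)
}
```

This only sorts the entries. It makes no claim about how many `Ok` entries are needed for the request to count as a success.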

So one TSS node is responding with Subscribe message rejected: NoListener, which I take to mean that it got an attempted incoming connection from a TSS node which it was not expecting. No idea why that is happening, but it could be related to the error the other node is having.

The other error comes from the key-value db getting a tokio::sync::oneshot::RecvError (channel closed). This means that the sending half of the response channel to the kvdb command handler has been dropped. I had a look at the handler, and to me it looks like the error case is not being handled properly:

Err(err) => tracing::error!("Failed to handle database query with: {:?}", err),

In the error case we should still be sending the result on the channel, not just logging and dropping the sender.
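
To make the difference concrete, here is a minimal sketch of the two behaviours. It uses `std::sync::mpsc` as a stand-in for `tokio::sync::oneshot` so that it runs without extra dependencies, and `DbError`, `run_query`, `handle_dropping` and `handle_forwarding` are hypothetical names rather than the real handler code. With the standard library channel the receive error reads "receiving on a closed channel" instead of "channel closed", but the mechanism is the same: once the sender is dropped without sending, the receiver only learns that the channel closed.

```rust
use std::sync::mpsc;

/// Hypothetical stand-in for the underlying key-value store error.
#[derive(Debug, PartialEq)]
pub struct DbError(pub String);

pub type DbResult = Result<Vec<u8>, DbError>;

/// Hypothetical query that always fails, standing in for a failed lookup.
fn run_query(key: &str) -> DbResult {
    Err(DbError(format!("no value for key {key}")))
}

/// Mirrors the pattern described above: on error, log and drop the sender.
/// The receiver then only sees a "closed channel" error.
pub fn handle_dropping(key: &str, resp: mpsc::Sender<DbResult>) {
    match run_query(key) {
        Ok(value) => {
            let _ = resp.send(Ok(value));
        }
        Err(err) => eprintln!("Failed to handle database query with: {:?}", err),
    }
    // `resp` is dropped here without anything having been sent.
}

/// Suggested alternative: log the error, then still send it on the channel
/// so the caller can see the underlying database error.
pub fn handle_forwarding(key: &str, resp: mpsc::Sender<DbResult>) {
    let result = run_query(key);
    if let Err(err) = &result {
        eprintln!("Failed to handle database query with: {:?}", err);
    }
    let _ = resp.send(result);
}
```

With `handle_dropping` the caller's `recv()` fails with a receive error and no detail, whereas with `handle_forwarding` it yields the `DbError` itself.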

To be clear, this is not the cause of the problem, but it's a reason why we cannot see the underlying db error without reading the logs, and it for sure needs fixing.

I have not looked at the logs from the testnet TSS nodes, and I have not yet tried to reproduce this using the Rust test-cli.

@ameba23 ameba23 added the Bug label Jan 23, 2025
@HCastano HCastano changed the title from "getting an error when trying to sign on test net" to "Errors when requesting signatures on testnet" Jan 23, 2025
@HCastano HCastano moved this from 📋 Backlog to 🌝 Soon in Entropy Core Jan 23, 2025
@frankiebee
Contributor Author

frankiebee commented Jan 24, 2025

I am also seeing this happen quite a bit now:

[{"Err":"User Error: Kv error: Recv Error: channel closed"},
{"Ok":["OM6uWOmEbr2++85oarrq56uepXEzz4WpiQx66aFAjDZ8EgjgwvBfSVZGx/CIGCBn7nJjZnAS+h8Da6GHQaoORwE=","f8c973f7f4fa9287c6d1328439c1f3c539e51d97e9708fd32e2b51b0e5c7a44ca51e0f31bad339608632624823d63f6a54e96d98c784c517dcaf89655780138b"]}]

Should I be concerned?

This is in our 4-node dev network, so that is a valid signature, right? Or am I misunderstanding something?

@ameba23
Contributor

ameba23 commented Jan 24, 2025

I can recreate this error on testnet using entropy-test-cli.

When I attempt to sign three messages one after the other, I get a signature once and a channel closed error the other two times:

turnip ~/r/s/e/e/c/kvdb (master)$ entropy-test-cli -c wss://testnet.entropy.xyz sign 02895d30025bb6a4e301f62c464a25318cf7d40f237768bd346edd26048e833ba6 "sldkjfsdlkjfslkjsdf"
User account for current call: 5GrwvaEF5zXb26Fz9rcQpDWS57CtERHpNehXCPcNoHGKutQY
Success: Message signed: RecoverableSignature { signature: ecdsa::Signature<Secp256k1>(B0A49DFBDB55957A886086DA130BC06886F63DA5125A94FEB3E7837FF08782DB233EA7B7CD677A1F9492F4488B996EBB8856E4CCDCC8AF9A27332542BFA9472E), recovery_id: RecoveryId(0) }
That took 14.283989826s
turnip ~/r/s/e/e/c/kvdb (master)$ entropy-test-cli -c wss://testnet.entropy.xyz sign 02895d30025bb6a4e301f62c464a25318cf7d40f237768bd346edd26048e833ba6 "sldkjfsdlkjfslkjsdf"
User account for current call: 5GrwvaEF5zXb26Fz9rcQpDWS57CtERHpNehXCPcNoHGKutQY
Failed!
Error: Signing failed: Kv error: Recv Error: channel closed
turnip ~/r/s/e/e/c/kvdb (master)[1] $ entropy-test-cli -c wss://testnet.entropy.xyz sign 02895d30025bb6a4e301f62c464a25318cf7d40f237768bd346edd26048e833ba6 "sldkjfsdlkjfslkjsdf"
User account for current call: 5GrwvaEF5zXb26Fz9rcQpDWS57CtERHpNehXCPcNoHGKutQY
Failed!
Error: Signing failed: Kv error: Recv Error: channel closed

Here are some logs (from all TSS nodes) pasted from Grafana. TL;DR: a subscribe message is getting rejected, but I'm not sure why.

I'm pretty sure this was not the case when we first deployed testnet v0.3.0, as I signed a bunch of messages and never got any errors.

@ameba23
Contributor

ameba23 commented Jan 24, 2025

I cannot re-create this bug with the docker-compose setup on head of master. But it's hard to say whether that's because of changes in master, because the network has just been started, or because of something particular to the docker deployment.

Next step would be to try with the docker-compose setup at release/v0.3.0

@frankiebee
Contributor Author

did you try looping? if so how many times @ameba23

@ameba23
Contributor

ameba23 commented Jan 24, 2025

did you try looping? if so how many times @ameba23

Like 10 times.

Having looked at this a bit more, I think this is caused by a mismatch between which nodes the chain thinks the current signers are and which nodes actually hold keyshares.

My reasons for thinking this:

  • It's happening on testnet, where we pushed a hot fix related to resharing, and it was not happening when we first deployed testnet.
  • It happens around two thirds of the time, which would make sense since we choose 2 of 3 signers randomly.
  • We are getting an error from the key-value db. I think this is where it fails to get the keyshare.

The one thing that speaks against this is that @frankiebee has seen a similar problem in a dev environment (comment above: #1260 (comment) )

If this is the problem, it should (hopefully) be fixed in the next deployment, as resharing has been fixed since the last release.
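
The two-thirds figure can be checked by enumerating the possible signer pairs. The sketch below assumes, purely for illustration, that exactly one of the three nodes the chain lists as signers has no keyshare and that each pair is equally likely to be chosen. The function name `failing_pairs` is hypothetical.

```rust
/// Enumerates every unordered pair drawn from `n` signers (indexed 0..n) and
/// returns (pairs that include at least one node from `bad`, total pairs).
pub fn failing_pairs(n: usize, bad: &[usize]) -> (usize, usize) {
    let mut failing = 0;
    let mut total = 0;
    for a in 0..n {
        for b in (a + 1)..n {
            total += 1;
            if bad.contains(&a) || bad.contains(&b) {
                failing += 1;
            }
        }
    }
    (failing, total)
}
```

Under those assumptions, `failing_pairs(3, &[2])` counts 2 failing pairs out of 3, which matches a failure rate of about two thirds.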

@frankiebee
Contributor Author

See this for a reproduction on master: entropyxyz/sdk#461

Side notes:
  • TSS nodes get rate limited
  • the error is still not consistent
  • it is still in a WIP state
