[bug]: lnd (master 15d2ff0) is stalling connections at startup using externalhosts
#7924
Comments
Are you able to repro this without Tor? I just tried to repro and it seems fine. Regarding Tor, this is perhaps related to this issue: #7917 (comment). It might be an intermittent Tor issue. I can try it out on my hybrid Tor node once we get the next rc out (rc2).
What Tor flags are you running with?
Yes, I ran into this with and without Tor before. Just ran new tests today:
externalhosts + no Tor: LND stalls after 15 minutes, it's caught in state
externalhosts + Tor: LND works fine now. I think for this test case, it could have been a Tor glitch yesterday.
So the only problem left seems to be: running clearnet-only with
Next time things stall, do you think you can grab a profile? https://github.com/lightningnetwork/lnd/blob/master/docs/debugging_lnd.md#capturing-pprof-data-with-lnd I ran with Tor hybrid and an external host and wasn't able to repro.
This case is not a problem anymore (thanks to Elle's fix).
@blckbx can you get a goroutine dump from that? https://go.dev/blog/pprof Then we can see what is actually blocked.
There is a goroutine dump here: https://github.com/lightningnetwork/lnd/files/12462101/pprof.txt, though it's in a bit of a different format from what a goroutine dump normally looks like.
Sorry, I'm new to this tool. I copied this from the curl command mentioned in the debugging guide: What exact command do I need to run to get the output you can work with?
No, this is fine, thanks.
In this commit, we attempt to fix a circular waiting scenario introduced inadvertently when [fixing a race condition scenario](lightningnetwork#7856). With that PR, we added a new channel that blocks `Disconnect` and `WaitForDisconnect`, so that those calls can only succeed once the `Start` method has finished.

The issue is that if the server is trying to disconnect a peer due to a concurrent connection while `Start` is blocked on `maybeSendNodeAnn`, which in turn wants to grab the main server mutex, then `Start` can never exit. As a result, `startReady` is never closed, which leaves the server blocked.

This PR attempts to fix the issue by calling `maybeSendNodeAnn` in a goroutine, so it can grab the server mutex without blocking the `Start` method.

Fixes lightningnetwork#7924
Fixes lightningnetwork#7928
Fixes lightningnetwork#7866
We have a candidate fix here: #7928
Pull request #7938 fixed all stalling issues for me. |
Background
Follow-up issue of #7921
Good news first: `externalip` is working now (with and without Tor).
Bad news: using `externalhosts`, LND still stalls connections and lnd becomes unresponsive, with and without Tor. `lncli` commands are getting stuck and RPC calls are running into timeouts, finally getting cancelled.
The startup log shows that lnd is resolving the DNS correctly:
Your environment
`lnd`: v0.17.0-beta.rc1-g15d2ff0c4
`uname -a` (on *Nix): Linux node 5.15.0-79-generic 86-Ubuntu SMP Mon Jul 10 16:07:21 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
`btcd`, `bitcoind`, or other backend: bitcoind v25.0
Steps to reproduce
`lnd.conf`: set a DNS address for externalhosts
Expected behaviour
LND starts up without issues and responds to calls.
Actual behaviour
LND starts up with `externalhosts` set but comes to a halt and stops responding to `lncli` commands.
I can provide a full log for the externalhosts and externalip scenarios privately, if required.