-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unexplainable clock drift in LND causing issues #5340
Comments
Which Go version are you running? Possibly it's a runtime or kernel issue as we don't do any thing custom w/ the runtime's time source (don't think it's even possible). Are you monitoring the CPU/resource utilization of the node? During this "time travel" instance, do you see the node continue to receive new blocks? |
I can provide logs for the other side (in addition to those mentioned in #5324). |
From this, AFAICT, either there's a major network connectivity issue (are either nodes on Tor?) or one of the nodes has simply "frozen". If that happens, then it would make sense that force closes occur: your node is holding onto HTLCs and is no longer live. |
bitcoind's debug.log shows the following
It appears those times are correct, and that lnd didn't process them until after 21:38 (I'm assuming because once it had the time fluctuation, it thought bitcoind's block notifications were coming from the future?) I don't do any sort of utilization monitoring that logs locally. |
Perhaps we should revive this old idea: #2992 If the network link is truly dead, or the node not live, then they won't respond to the ping, so we can filter them out and not forward. In the case above, do you see your node responding to pings from other nodes? It would need to be on the debug logging level. |
I think to the outside world, my node appeared frozen because it seemed to stop doing all of the things it thought it should as soon as the clock changed momentarily. I think there are two issues here: 1) why did the clock change (and seemingly only for lnd) and 2) how does lnd react to clock changes that would cause it to wait an hour to time out all the things that should have timed out as soon as the clock change happened? |
Hard to say without any sort of instrumentation of the process and also other processes running at that time. Don't think there's really much we can do here without something to reproduce the incident, and also more telemetry on the node itself. |
Based on those ping timeouts in the bitcoind log, and all the sends on broken pipes, it looks like it was also affected in some manner. I don't think lnd nor bitcoind are related to the root cause here, instead something happened to the machine which affected those processes. |
The ping timeouts for bitcoind are quite normal, this just means that some remote peer "died" without disconnecting properly. I had 3304 of such entries for May 2021 on my node, without any noticable issue. |
Not that it helps my case much, but those messages are common in bitcoind debug logs all the time both on this node and others I run at other locations. I'd be happy to find out the issue isn't lnd, but nothing I've seen in any other logs on the machine show the clock changing or any odd behavior. |
Closing due to inactivity, not clear if this is a hardware issue, or something in |
Background
On May 20 UTC, my node had multiple channel closures due to what I originally thought (from the logs) was because of an ipv6 connectivity error, but upon poring over more logs, it appears to be a mysterious clock issue that only affected LND. For one log entry at 20:38, the clock skips ahead an hour to 21:38, then goes back an hour like nothing happened and continues going about its business. An hour later, when 21:38 actually occurs, all hell breaks loose and multiple channels have been force-closed remotely.
Related issue: #5324 which was one of the remote nodes that closed a channel with my node.
lnd log (timestamps are in local CDT which I believe is UTC-5
I've heavily cut out/redacted anything I didn't think was applicable. I'm @max on the lnd slack, and can provide unredacted logs there. Please let me know if I can provide any other helpful logs here as well.
The machine runs ntpd, but as far as I can tell the clock was not changed during this time. Every other log on the system looks normal and I don't see any clock changes or suspect behavior, so I am at a total loss as to what happened. bitcoind was running on the same machine and comparing the timestamps for blocks around this height seem to match as well. The line with 2021-05-19 21:38:12.609 in the lnd log is literally the only thing on the system I can find that shows anything weird happened with the clock.
Your environment
Steps to reproduce
Unknown
Expected behaviour
Time travel is impossible
Actual behaviour
We go back to the future
The text was updated successfully, but these errors were encountered: