[bug]: force close for unclear reason #7180
Comments
This states the close reason:
The HTLC was about to time out. If you send an HTLC and the other peer never resolves or times it out, then we need to go on chain to sweep it. If we attempted to fail it off chain, but the peer never responded (or the connection died, or the Tor connection stalled, etc.), then we have no option but to go on chain to resolve the HTLC.
See #1226, which proposes that we start taking the expected gain into account when deciding whether or not to go on chain.
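A minimal Go sketch of the decision described above, combining the deadline check with the expected-gain check proposed in #1226. All types, names, and numbers here are illustrative assumptions, not lnd's actual internals:

```go
// Hypothetical sketch (not lnd's code): decide whether an unresolved outgoing
// HTLC justifies going on chain. Today the decision is deadline-driven; #1226
// proposes also weighing the HTLC's value against the expected chain cost.
package main

import "fmt"

type htlc struct {
	amountSat    int64 // value locked in the HTLC
	expiryHeight int32 // absolute CLTV expiry of the outgoing HTLC
}

// shouldGoOnChain returns true if we should broadcast the commitment to
// resolve the HTLC ourselves.
func shouldGoOnChain(h htlc, bestHeight, broadcastDelta int32, estChainCostSat int64) bool {
	// Deadline check: once we are within broadcastDelta blocks of the HTLC
	// expiry and the peer still hasn't failed or settled it off-chain,
	// going on chain is the only way to resolve it safely.
	deadlineReached := bestHeight >= h.expiryHeight-broadcastDelta
	if !deadlineReached {
		return false
	}

	// Economic check proposed in #1226: skip the force close if sweeping
	// the HTLC is expected to cost more than the HTLC is worth.
	return h.amountSat > estChainCostSat
}

func main() {
	h := htlc{amountSat: 10, expiryHeight: 800_000}
	// A 10 sat HTLC is far below any realistic chain cost, so under the
	// proposed policy we would not force close for it.
	fmt.Println(shouldGoOnChain(h, 799_995, 10, 2_000)) // false
}
```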
Yeah, I just don't understand this flow clearly enough. Who should cancel the HTLC off chain (sender or receiver), and when? What happens if the first attempt fails for some reason, like you said, because the connection stalled or was dropped? Do we retry it? And most importantly, was this caused by high fees that prevented an upstream force close from confirming in time, causing this domino effect? If that's the case, should the node operator watch their channels and manually bump the fee? I think lnd could do it automatically if there's a risk of losing another channel.
What I often see is a dead Tor connection and stuck HTLCs on it. When I see this, restarting lnd usually (90% of the time or more) re-establishes the connection and the HTLCs clear. It seems lnd should be able to re-establish these connections automatically, without my manual intervention, if that's all it takes to save the channel.
Yeah, there's an old idea that was never fully implemented: send a ping over a connection before we send an HTLC. If we get a pong back (we should, immediately), then we'd actually use the channel. If not, we'd treat the channel as if it were offline. This would ensure we never try to use a stale connection (due to Tor, mobile roaming, etc.). The implementation here is pretty simple, so I think we should dust this idea off again.
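A rough Go sketch of that ping-before-forward idea, assuming a hypothetical peer connection interface rather than any existing lnd API:

```go
// Hypothetical sketch: only forward an HTLC over a channel after the peer
// answers a ping, treating a silent connection as offline so we never commit
// an HTLC to a stale Tor or mobile link.
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// peerConn abstracts whatever transport the peer is reachable over.
type peerConn interface {
	Ping(ctx context.Context) error // returns once a pong is received
}

// errPeerStale signals that the link should be treated as offline.
var errPeerStale = errors.New("peer did not answer ping; treating channel as offline")

// forwardIfLive pings the peer with a short timeout before committing the
// HTLC. If no pong arrives in time, we skip the channel instead of adding an
// HTLC that may get stuck and later force the channel closed.
func forwardIfLive(conn peerConn, addHTLC func() error, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	if err := conn.Ping(ctx); err != nil {
		return errPeerStale
	}
	return addHTLC()
}

// Toy connection that answers pings immediately, for demonstration.
type liveConn struct{}

func (liveConn) Ping(ctx context.Context) error { return nil }

func main() {
	err := forwardIfLive(liveConn{}, func() error {
		fmt.Println("peer is live, HTLC added")
		return nil
	}, 3*time.Second)
	if err != nil {
		fmt.Println("skipping channel:", err)
	}
}
```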
If you have an incoming HTLC, then you should be the one that cancels it. However, if it has a corresponding outgoing HTLC (it was a forward), then the remote party on the outgoing channel needs to cancel that one first. If the remote party doesn't cancel it (stale connection that wasn't detected, or peer offline), then you (lnd) need to go on chain to cancel it.

In your case, we went on chain, but things didn't confirm in time (default 40-block CLTV delta, which can be raised on the command line) since the mempool was jam packed. As a result the incoming time lock also expired, so the peer that sent us the HTLC needed to go on chain as well. That peer will then cancel back off chain once it resolves the HTLC. So in summary, everything worked as expected, but things took too long to confirm.

We have some basic deadline awareness, but initially it only targets a higher confirmation target. The missing link here is to dynamically fee bump as the deadline gets closer. We have a lot of research and design for stuff like this, but it hasn't all been implemented yet. One thing that can prevent this in the future is for a user to manually increase their CLTV delta when the mempool gets "full". In the future, we'll also start to do this automatically.
Yes, an operator can do that, and ideally lnd will eventually handle it automatically. Feel free to close this issue if the above answers your lingering questions @rkfg.
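A toy Go sketch of what the deadline-aware fee bumping mentioned above could look like: the target fee rate ramps up as the remaining block budget shrinks, and a real implementation would RBF/CPFP the pending transaction whenever the target exceeds what it currently pays. The function and numbers are illustrative only, not lnd's sweeper logic:

```go
// Hypothetical sketch: scale the sweep fee rate toward a maximum as the
// absolute CLTV deadline approaches, instead of estimating once at broadcast.
package main

import "fmt"

// feeForDeadline returns a sat/vB target given how many blocks remain before
// the HTLC deadline out of the total confirmation budget.
func feeForDeadline(blocksLeft, totalBudget int32, baseFee, maxFee int64) int64 {
	if blocksLeft <= 0 {
		return maxFee // deadline hit: pay up to the cap
	}
	if blocksLeft >= totalBudget {
		return baseFee
	}
	// Fraction of the confirmation budget already consumed.
	used := float64(totalBudget-blocksLeft) / float64(totalBudget)
	return baseFee + int64(used*float64(maxFee-baseFee))
}

func main() {
	// With a 40-block CLTV delta, the sweep starts cheap and ramps up as
	// blocks pass without confirmation.
	for _, left := range []int32{40, 30, 20, 10, 0} {
		fmt.Printf("blocks left %2d -> %d sat/vB\n", left, feeForDeadline(left, 40, 10, 300))
	}
}
```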
Yeah, I guess that explains it all. For now we need to be extra cautious during high-fee times; hopefully these ideas will be implemented soon! Thank you.
Background
A channel with lnmarkets was force closed, and the closure then cascaded to the downstream channel. For the sake of clarity I'll call the nodes A (my peer), B (me), and C (lnmarkets). There was a stuck HTLC of 10 sats from A to C going through B, and it timed out. For a reason I can't understand it wasn't failed off-chain but instead went on-chain; I can confirm the lnmarkets node (C) was online all the time and I don't see it disconnecting before the FC. The message says:

However, due to mempool conditions the fee in that tx (10 sat/vB) wasn't enough for it to be confirmed. I'm not sure if I'm correct, but this caused the channel the HTLC came from (A—B) to also be force closed, by peer A in this case (who was also online; we were tracking this issue in real time). As a result I lost two channels because of one 10 sat HTLC that can't even be represented on chain anyway. What's even weirder is that my peer (A) reported seeing that FC (it was also unconfirmed for the same reason), while to me (B) the channel appeared active though unusable: I tried to rebalance through it and it failed at hop 0. I tried restarting lnd and manually reconnecting the peer, but the channel was still shown as active.

Is it true that until the force close tx is confirmed on chain the corresponding incoming HTLC can't be failed off-chain? If that's the case, it can easily cause a chain of FCs whenever the minimum fee is above 10 sat/vB (the default maximum for anchor-type channels) and operators don't babysit their channels all the time to manually bump the fee through anchors and CPFP. If it's not the case, then there's a bug that prevented lnd from cancelling the incoming HTLC while the outgoing channel's FC was still unconfirmed. I suppose the same might have happened to C: the outgoing channel for that HTLC on their node was offline, the HTLC timed out and the channel was FCed, but the close couldn't confirm within 40 blocks (our hop timeout), so my node B couldn't cancel it off-chain and had to FC as well. My peer A, however, said that he doesn't see that 10-sat HTLC anywhere among his channels, so it must've been cancelled off-chain.

There are no messages in the logs regarding HTLC failure errors, at least not with the default INFO-level settings. Maybe there should be.
Your environment
- lnd: 0.15.4
- operating system (uname -a on *Nix): Raspbian arm64
- btcd, bitcoind, or other backend: bitcoind 23.99