Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bug]: LND Suddenly lots of force closes, Error while broadcasting transaction #7959

Open
callebtc opened this issue Sep 5, 2023 · 18 comments
Labels
bug Unintended code behaviour force closes P1 MUST be fixed or reviewed

Comments

@callebtc
Copy link

callebtc commented Sep 5, 2023

Background

Nothing in particular happened. Updated to 0.17.0-rc1 shortly it came out. In the last few days, my node has seen a massive amount of force-closes.

Your environment

  • version of lnd: LND 0.17.0-rc1
  • which operating system (uname -a on *Nix): Linux 5.15.0-46-SMP x86_64 x86_64 x86_64 GNU/Linux
  • backend: bitcoind

Steps to reproduce

Unknown.

Logs:

During the FC:

[INF] CNCT: ChannelArbitrator(<id>:31): starting state=StateDefault, trigger=chainTrigger, triggerHeight=<block>
[INF] CNCT: ChannelArbitrator(<id>:31): fail dangling htlc=<id> from local/remote commitments diff
[INF] CNCT: ChannelArbitrator(<id>:31): force closing chan
[INF] CNCT: Broadcasting force close transaction <tx_id>, ChannelPoint(<id>:31): (*wire.MsgTx)(0xc004525980)({
Version: (int32) 2,
TxIn: ([]*wire.TxIn) (len=1 cap=1) {
...

Maybe related:

[ERR] WTCL: (anchor) SessionQueue(<id>) failed to ack update: unable to ack seqnum=99: channel not registered
[ERR] WTCL: (anchor) SessionQueue(<id>) unable to send state update: unable to ack seqnum=99: channel not registered
...
[WRN] LTND: Error while broadcasting transaction: mempool fee not met: -26: mempool min fee not met, 550 < 4208
(this error repeats many times)
....
[ERR] BTCN: Unable to rebroadcast transaction <txid>: double spend: -25: bad-txns-inputs-missingorspent

More errors (update: 11.09.2023)

[INF] PEER: Peer(<id1>): unable to read message from peer: read next header: EOF
[INF] PEER: Peer(<id1>): disconnecting 02c4a7390c5c0e28eff5d8034a49d724900b196948f5723fecd887512b2f0020f3@82.164.75.19:5750, reason: read handler closed
[WRN] HSWC: ChannelLink(<id2>:2): error when syncing channel states: first message sent to sync should be ChannelReestablish, instead received: *lnwire.Error
[ERR] HSWC: ChannelLink(<id2>:2): failing link: unable to synchronize channel states: first message sent to sync should be ChannelReestablish, instead received: *lnwire.Error with error: unable to resume channel, recovery required
[INF] HSWC: ChannelLink(<id2>:2): exited
[INF] NTFN: Cancelling epoch notification, epoch_id=75
[ERR] WTCL: (anchor) SessionQueue(<id1>) unable to send state update: unable to ack seqnum=176: channel not registered
[ERR] WTCL: (anchor) SessionQueue(<id2>) failed to ack update: unable to ack seqnum=155: channel not registered
[ERR] WTCL: (anchor) SessionQueue(<id2>) unable to send state update: unable to ack seqnum=155: channel not registered
[ERR] WTCL: (anchor) SessionQueue(<id3>) failed to ack update: unable to ack seqnum=600: channel not registered
[ERR] WTCL: (anchor) SessionQueue(<id3>) unable to send state update: unable to ack seqnum=600: channel not registered
[ERR] BTCN: Unable to rebroadcast transaction <id>: unable to replace transaction: -26: insufficient fee, rejecting replacement <id>; new feerate 0.00001011 BTC/kvB <= old feerate 0.00004684 BTC/kvB
[ERR] BTCN: Unable to rebroadcast transaction <id>: unable to replace transaction: -26: insufficient fee, rejecting replacement <id>; new feerate 0.00001011 BTC/kvB <= old feerate 0.00004508 BTC/kvB
[ERR] BTCN: Unable to rebroadcast transaction <id>: unable to replace transaction: -26: insufficient fee, rejecting replacement <id>; new feerate 0.00001011 BTC/kvB <= old feerate 0.00004684 BTC/kvB
@callebtc callebtc added bug Unintended code behaviour needs triage labels Sep 5, 2023
@ziggie1984
Copy link
Collaborator

ziggie1984 commented Sep 5, 2023

Commenting on a part which is most likely not the reason for the Force Closes !

Analyzing the logs I think I found the improvement we could make to prevent a lot of these lines:

[WRN] LTND: Error while broadcasting transaction: mempool fee not met: -26: mempool min fee not met, 550 < 4208
(this error repeats many times)
....
[ERR] BTCN: Unable to rebroadcast transaction <txid>: double spend:

The anchor sweeps fail to be broadcasted because of the min-mempool fee, and this does not stop because we never run in the exhausted case because we never count up: https://github.com/lightningnetwork/lnd/blob/master/sweep/sweeper.go#L1356 and in the rebroadcaster we only check for different txid, which seems to be always the case sweeps are bundled. I think we should also count up the attempt if we fail to broadcast the transaction for specific errors.
Or get rid of the anchor sweeps all together ?

Regarding the Force-Closes, there might be a problem between the peer connection because the dangling HTLC cannot be resolved hence no stable connection can be reestablished even during 80 blocks (default cltv delta)

@Roasbeef
Copy link
Member

Roasbeef commented Sep 5, 2023

Those logs look unrelated, if you look at the force closes, do they have HTLCs on them? The log areas you want to look at are HSWC, and CNCT.

@daywalker90
Copy link

For me the node just appeared offline and a htlc in our channel expired.

@guggero
Copy link
Collaborator

guggero commented Sep 5, 2023

This could've been the lock problem that we fixed in RC2.

@ziggie1984
Copy link
Collaborator

ziggie1984 commented Sep 5, 2023

if you look at the force closes, do they have HTLCs on them?

yeah there was an HTLC on the remote commitment (fail dangling htlc=<id> from local/remote commitments diff) which forced the channel to close.

@blckbx
Copy link

blckbx commented Sep 6, 2023

For me the node just appeared offline and a htlc in our channel expired.

Same for me. Node was unreachable for 3.5 days:

reason: unable to start peer: peer did not complete handshake within 15s

@callebtc
Copy link
Author

callebtc commented Sep 6, 2023

Those logs look unrelated, if you look at the force closes, do they have HTLCs on them? The log areas you want to look at are HSWC, and CNCT.

It looks like it, see my "CNCT" log entries added above. Could not find any relevant "HSWC" lines.

@Roasbeef
Copy link
Member

Roasbeef commented Sep 7, 2023

Same for me. Node was unreachable for 3.5 days:

Yeah that log line would be indicative of the bug fixed: eventually the peer sub-system would lock up, and though new TCP connections could be created, the handshake wouldn't complete in time.

@callebtc
Copy link
Author

callebtc commented Sep 11, 2023

Updating to rc2 doesn't seem to have fixed my issue, still seeing force-closes. I've added more errors I could find, please find them attached to my OP.

I've also noticed that I stop getting any updates on the node activity with bosbot (but I don't get a disconnect message either). When I restart LND, I get a disconnect message with bosbot.

@Roasbeef
Copy link
Member

Updating to rc2 doesn't seem to have fixed my issue, still seeing force-closes. I've added more errors I could find, please find them attached to my OP.

Are these due to normal HTLC timeouts? Those logs so far are due to transactions not being able to get into the mempool as fees are higher. If you're seeing this in a widespread manner with non anchor channels, then it may be the case that those can't confirm in time (as we can't fee bump them), so then other HTLCs start to timeout.

@Roasbeef
Copy link
Member

If you look at the logs of [INF] CNCT: ChannelArbitrator(<id>:31): force closing chan, and the surrounding areas, then you can see why it went on chain. If it was due to an HTLC timeout, or something else entirely.

@callebtc
Copy link
Author

Privately communicated LND logs to @ziggie1984 for further analysis.

@Roasbeef
Copy link
Member

@callebtc are you still seeing on going issues? Are you able to maintain connectivity with peers?

We have a PR (#7980) to fix that tower log error you're seeing. Basically your node was sending updates for a channel that had already been closed.

@niteshbalusu11
Copy link

niteshbalusu11 commented Sep 16, 2023

I just experienced this on RC3. Some peers force closed on restart and there were no pending htlcs

@niteshbalusu11
Copy link

niteshbalusu11 commented Sep 17, 2023

@Roasbeef

Here are my logs for one of the channel that force closed. I can get you more logs later.

2023-09-16 19:24:08.379 [DBG] CNCT: New ChainEventSubscription(id=0) for ChannelPoint(55bbe001814f30f93ed4e3c3583440022bcfdfcf66ee022795ce4e0208c7256a:2)
2023-09-16 19:24:08.421 [DBG] CNCT: Starting chain watcher for ChannelPoint(55bbe001814f30f93ed4e3c3583440022bcfdfcf66ee022795ce4e0208c7256a:2)
2023-09-16 19:24:08.421 [DBG] NTFN: Using height hint 807717 retrieved from cache for outpoint=55bbe001814f30f93ed4e3c3583440022bcfdfcf66ee022795ce4e0208c7256a:2, script=0 f3cc926f542ef6f12c10eb1a153e9241c52ab0800541494e90d9410a434a2116 instead of 792125 for spend subscription
2023-09-16 19:24:08.425 [INF] NTFN: New spend subscription: spend_id=10, outpoint=55bbe001814f30f93ed4e3c3583440022bcfdfcf66ee022795ce4e0208c7256a:2, script=0 f3cc926f542ef6f12c10eb1a153e9241c52ab0800541494e90d9410a434a2116, height_hint=807717
2023-09-16 19:24:08.425 [INF] NTFN: Dispatching historical spend rescan for outpoint=55bbe001814f30f93ed4e3c3583440022bcfdfcf66ee022795ce4e0208c7256a:
, script=0 f3cc926f542ef6f12c10eb1a153e9241c52ab0800541494e90d9410a434a2116, start=807717, end=808020
2023-09-16 19:24:08.876 [INF] CNCT: Close observer for ChannelPoint(55bbe001814f30f93ed4e3c3583440022bcfdfcf66ee022795ce4e0208c7256a:2) active
2023-09-16 19:24:09.277 [DBG] CNCT: Starting ChannelArbitrator(55bbe001814f30f93ed4e3c3583440022bcfdfcf66ee022795ce4e0208c7256a:2), htlc_set=(map[contractcourt.HtlcSetKey]contractcourt.htlcSet) (len=2) {
2023-09-16 19:24:09.296 [INF] CNCT: ChannelArbitrator(55bbe001814f30f93ed4e3c3583440022bcfdfcf66ee022795ce4e0208c7256a:2): starting state=StateDefault, trigger=chainTrigger, triggerHeight=808020
2023-09-16 19:24:09.297 [DBG] CNCT: ChannelArbitrator(55bbe001814f30f93ed4e3c3583440022bcfdfcf66ee022795ce4e0208c7256a:2): attempting state step with trigger=chainTrigger from state=StateDefault
2023-09-16 19:24:09.297 [DBG] CNCT: ChannelArbitrator(55bbe001814f30f93ed4e3c3583440022bcfdfcf66ee022795ce4e0208c7256a:2): new block (height=808020) examining active HTLC's
2023-09-16 19:24:09.297 [DBG] CNCT: ChannelArbitrator(55bbe001814f30f93ed4e3c3583440022bcfdfcf66ee022795ce4e0208c7256a:2): checking commit chain actions at height=808020, in_htlc_count=0, out_htlc_count=0
2023-09-16 19:24:09.298 [DBG] CNCT: ChannelArbitrator(55bbe001814f30f93ed4e3c3583440022bcfdfcf66ee022795ce4e0208c7256a:2): no actions for chain trigger, terminating
2023-09-16 19:24:09.299 [DBG] CNCT: ChannelArbitrator(55bbe001814f30f93ed4e3c3583440022bcfdfcf66ee022795ce4e0208c7256a:2): terminating at state=StateDefault
2023-09-16 19:24:10.114 [DBG] CNCT: ChannelArbitrator(55bbe001814f30f93ed4e3c3583440022bcfdfcf66ee022795ce4e0208c7256a:2): attempting state step with trigger=chainTrigger from state=StateDefault
2023-09-16 19:24:10.118 [DBG] CNCT: ChannelArbitrator(55bbe001814f30f93ed4e3c3583440022bcfdfcf66ee022795ce4e0208c7256a:2): new block (height=808020) examining active HTLC's
2023-09-16 19:24:10.122 [DBG] CNCT: ChannelArbitrator(55bbe001814f30f93ed4e3c3583440022bcfdfcf66ee022795ce4e0208c7256a:2): checking commit chain actions at height=808020, in_htlc_count=0, out_htlc_count=0
2023-09-16 19:24:10.124 [DBG] CNCT: ChannelArbitrator(55bbe001814f30f93ed4e3c3583440022bcfdfcf66ee022795ce4e0208c7256a:2): no actions for chain trigger, terminating
2023-09-16 19:24:10.134 [DBG] CNCT: ChannelArbitrator(55bbe001814f30f93ed4e3c3583440022bcfdfcf66ee022795ce4e0208c7256a:2): terminating at state=StateDefault
2023-09-16 19:24:36.559 [INF] NTFN: Historical spend dispatch finished for request outpoint=55bbe001814f30f93ed4e3c3583440022bcfdfcf66ee022795ce4e0208c7256a:2, script=0 f3cc926f542ef6f12c10eb1a153e9241c52ab0800541494e90d9410a434a2116 (start=807717 end=808020) with details: 4888f0b5df539d1132798dd8a349a33bf4d1c6869605396bb1cc697183828cf3[0] spending 55bbe001814f30f93ed4e3c3583440022bcfdfcf66ee022795ce4e0208c7256a:2 at height=807950
2023-09-16 19:24:36.583 [DBG] NTFN: Updated spend hint to height=807950 for confirmed spend request outpoint=55bbe001814f30f93ed4e3c3583440022bcfdfcf66ee022795ce4e0208c7256a:2, script=0 f3cc926f542ef6f12c10eb1a153e9241c52ab0800541494e90d9410a434a2116
2023-09-16 19:24:36.583 [INF] NTFN: Dispatching confirmed spend notification for outpoint=55bbe001814f30f93ed4e3c3583440022bcfdfcf66ee022795ce4e0208c7256a:2, script=0 f3cc926f542ef6f12c10eb1a153e9241c52ab0800541494e90d9410a434a2116 at current height=808020: 4888f0b5df539d1132798dd8a349a33bf4d1c6869605396bb1cc697183828cf3[0] spending 55bbe001814f30f93ed4e3c3583440022bcfdfcf66ee022795ce4e0208c7256a:2 at height=807950
2023-09-16 19:24:36.584 [INF] CNCT: Unilateral close of ChannelPoint(55bbe001814f30f93ed4e3c3583440022bcfdfcf66ee022795ce4e0208c7256a:2) detected
2023-09-16 19:24:36.587 [INF] CNCT: ChannelArbitrator(55bbe001814f30f93ed4e3c3583440022bcfdfcf66ee022795ce4e0208c7256a:2): remote party has closed channel out on-chain
2023-09-16 19:24:36.647 [DBG] CNCT: ChannelArbitrator(55bbe001814f30f93ed4e3c3583440022bcfdfcf66ee022795ce4e0208c7256a:2): attempting state step with trigger=remoteCloseTrigger from state=StateDefault
2023-09-16 19:24:36.647 [DBG] CNCT: ChannelArbitrator(55bbe001814f30f93ed4e3c3583440022bcfdfcf66ee022795ce4e0208c7256a:2): new block (height=807950) examining active HTLC's
2023-09-16 19:24:36.647 [DBG] CNCT: ChannelArbitrator(55bbe001814f30f93ed4e3c3583440022bcfdfcf66ee022795ce4e0208c7256a:2): checking commit chain actions at height=807950, in_htlc_count=0, out_htlc_count=0
2023-09-16 19:24:40.995 [DBG] CNCT: ChannelArbitrator(55bbe001814f30f93ed4e3c3583440022bcfdfcf66ee022795ce4e0208c7256a:2): attempting state step with trigger=remoteCloseTrigger from state=StateContractClosed
2023-09-16 19:24:40.995 [DBG] CNCT: ChannelArbitrator(55bbe001814f30f93ed4e3c3583440022bcfdfcf66ee022795ce4e0208c7256a:2): checking commit chain actions at height=807950, in_htlc_count=0, out_htlc_count=0
2023-09-16 19:24:40.996 [DBG] CNCT: ChannelArbitrator(55bbe001814f30f93ed4e3c3583440022bcfdfcf66ee022795ce4e0208c7256a:2): inserting 2 contract resolvers
2023-09-16 19:24:41.009 [DBG] CNCT: ChannelArbitrator(55bbe001814f30f93ed4e3c3583440022bcfdfcf66ee022795ce4e0208c7256a:2): attempting to resolve *contractcourt.commitSweepResolver
2023-09-16 19:24:41.009 [DBG] CNCT: ChannelArbitrator(55bbe001814f30f93ed4e3c3583440022bcfdfcf66ee022795ce4e0208c7256a:2): contract *contractcourt.commitSweepResolver not yet resolved
2023-09-16 19:24:41.012 [DBG] CNCT: ChannelArbitrator(55bbe001814f30f93ed4e3c3583440022bcfdfcf66ee022795ce4e0208c7256a:2): attempting to resolve *contractcourt.anchorResolver
2023-09-16 19:24:41.012 [DBG] CNCT: ChannelArbitrator(55bbe001814f30f93ed4e3c3583440022bcfdfcf66ee022795ce4e0208c7256a:2): contract *contractcourt.anchorResolver not yet resolved
2023-09-16 19:24:41.040 [DBG] CNCT: ChannelArbitrator(55bbe001814f30f93ed4e3c3583440022bcfdfcf66ee022795ce4e0208c7256a:2): attempting state step with trigger=remoteCloseTrigger from state=StateWaitingFullResolution
2023-09-16 19:24:41.040 [INF] CNCT: ChannelArbitrator(55bbe001814f30f93ed4e3c3583440022bcfdfcf66ee022795ce4e0208c7256a:2): still awaiting contract resolution
2023-09-16 19:24:41.040 [DBG] CNCT: ChannelArbitrator(55bbe001814f30f93ed4e3c3583440022bcfdfcf66ee022795ce4e0208c7256a:2): still have unresolved contract: *contractcourt.commitSweepResolver
2023-09-16 19:24:41.040 [DBG] CNCT: ChannelArbitrator(55bbe001814f30f93ed4e3c3583440022bcfdfcf66ee022795ce4e0208c7256a:2): terminating at state=StateWaitingFullResolution
2023-09-16 19:25:26.581 [DBG] CNCT: *contractcourt.commitSweepResolver(55bbe001814f30f93ed4e3c3583440022bcfdfcf66ee022795ce4e0208c7256a:2): commit conf_height=807950, unlock_height=807951
2023-09-16 19:25:26.581 [DBG] CNCT: *contractcourt.commitSweepResolver(55bbe001814f30f93ed4e3c3583440022bcfdfcf66ee022795ce4e0208c7256a:2): waiting for CSV lock to expire at height 807951
2023-09-16 19:25:26.583 [DBG] CNCT: *contractcourt.commitSweepResolver(55bbe001814f30f93ed4e3c3583440022bcfdfcf66ee022795ce4e0208c7256a:2): isDelayedOutput=true, isLocalCommitTx=false
2023-09-16 19:25:26.583 [INF] CNCT: *contractcourt.commitSweepResolver(55bbe001814f30f93ed4e3c3583440022bcfdfcf66ee022795ce4e0208c7256a:2): Sweeping with witness type: CommitmentToRemoteConfirmed
2023-09-16 19:25:26.583 [INF] CNCT: *contractcourt.commitSweepResolver(55bbe001814f30f93ed4e3c3583440022bcfdfcf66ee022795ce4e0208c7256a:2): sweeping commit output
2023-09-16 19:26:56.647 [WRN] CHBU: Replacing disk backup for ChannelPoint(55bbe001814f30f93ed4e3c3583440022bcfdfcf66ee022795ce4e0208c7256a:2) w/ newer version
2023-09-16 19:48:46.429 [INF] CNCT: *contractcourt.commitSweepResolver(55bbe001814f30f93ed4e3c3583440022bcfdfcf66ee022795ce4e0208c7256a:2): local commitment output fully resolved by sweep tx: 0e12e6478ce4cb3b0c641a1014d52c14dbf23b1391006fe45501b8081b497fb8
2023-09-16 19:48:46.452 [DBG] CNCT: ChannelArbitrator(55bbe001814f30f93ed4e3c3583440022bcfdfcf66ee022795ce4e0208c7256a:2): marking contract *contractcourt.commitSweepResolver fully resolved
2023-09-16 19:48:46.475 [INF] CNCT: ChannelArbitrator(55bbe001814f30f93ed4e3c3583440022bcfdfcf66ee022795ce4e0208c7256a:2): a contract has been fully resolved!
2023-09-16 19:48:46.476 [DBG] CNCT: ChannelArbitrator(55bbe001814f30f93ed4e3c3583440022bcfdfcf66ee022795ce4e0208c7256a:2): attempting state step with trigger=chainTrigger from state=StateWaitingFullResolution
2023-09-16 19:48:46.476 [INF] CNCT: ChannelArbitrator(55bbe001814f30f93ed4e3c3583440022bcfdfcf66ee022795ce4e0208c7256a:2): still awaiting contract resolution
2023-09-16 19:48:46.503 [DBG] CNCT: ChannelArbitrator(55bbe001814f30f93ed4e3c3583440022bcfdfcf66ee022795ce4e0208c7256a:2): attempting state step with trigger=chainTrigger from state=StateFullyResolved
2023-09-16 19:48:46.503 [INF] CNCT: ChannelPoint(55bbe001814f30f93ed4e3c3583440022bcfdfcf66ee022795ce4e0208c7256a:2) has been fully resolved on-chain at height=808022
2023-09-16 19:48:46.503 [INF] CNCT: Marking ChannelPoint(55bbe001814f30f93ed4e3c3583440022bcfdfcf66ee022795ce4e0208c7256a:2) fully resolved
2023-09-16 19:48:46.566 [DBG] CNCT: Stopping ChannelArbitrator(55bbe001814f30f93ed4e3c3583440022bcfdfcf66ee022795ce4e0208c7256a:2)

@ziggie1984
Copy link
Collaborator

@niteshbalusu11 hmm the logs only show that your remote party closed the channel (force closed) would be interesting to see why your channel with this node was not able to resolve the htlcs on it? Can you share more logs before the force-close or even from your peer.

@niteshbalusu11
Copy link

I will share full logs soon.

@Fopstronaut
Copy link

I have the same issue on 0.17.2-beta.

Logs

2023-11-30 17:11:46.698 [INF] CNCT: ChannelArbitrator(<id>:0): fail dangling htlc=<htlc> from local/remote commitments diff
2023-11-30 17:11:46.715 [INF] CNCT: ChannelArbitrator(<id>:0): force closing chan
2023-11-30 17:11:46.725 [INF] HSWC: ChannelLink(<id>:0): stopping
2023-11-30 17:11:46.725 [INF] HSWC: ChannelLink(<id>:0): exited
2023-11-30 17:11:46.725 [INF] HSWC: Removing channel link with ChannelID(<id2>)
2023-11-30 17:11:46.735 [INF] HSWC: Garbage collected 4 shared secret hashes at height=819149
2023-11-30 17:11:46.746 [INF] CNCT: Broadcasting force close transaction <tx>, ChannelPoint(<id>:0): (*wire.MsgTx)(0xc0<......>c0)({

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Unintended code behaviour force closes P1 MUST be fixed or reviewed
Projects
None yet
Development

No branches or pull requests

10 participants