Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

multi: log additional information for local force closures #7787

Closed

Conversation

shaurya947
Copy link
Contributor

@shaurya947 shaurya947 commented Jun 22, 2023

Change Description

Addresses #7696 by logging the reason for a local force close in the DB, and echoing it within the ClosedChannels response. Since initiating a force close and marking the channel as closed happen at two distinct parts of the code, we utilize the ChannelArbitrator log to persist the necessary information between these two moments in time. Once we mark the channel as closed, we log that info in the closed channels bucket. In this PR we address 3 possible reasons of local force closure: user-initiated, link failure, and HTLC/chainTrigger-related.

Steps to Test

itests have been updated to test for all 3 conditions, but manual testing can be done by simulating any of the 3 local force close situations and then checking the channel information in the ClosedChannels response.

Pull Request Checklist

Testing

  • Your PR passes all CI checks.
  • Tests covering the positive and negative (error paths) are included.
  • Bug fixes contain tests triggering the bug to prevent regressions.

Code Style and Documentation

📝 Please see our Contribution Guidelines for further guidance.

A local force close could be due to one of the following reasons:
- User initiated
- Link failure (automatic)
- On-chain HTLC conditions such as timeout (automatic)
This commit adds a new struct that can encapsulate these different
reasons, and logs that struct to the channel's ArbitratorLog.
In this commit we use the functional optional for specifying the
local force close reason that we built in the previous commit.
The option can be for user-initiated FC, or link failure or FC,
or just logging the ChainActionMap in the case of chainTrigger.
During the local force close flow, at the time of actually marking
the channel as closed, we need to fetch the local force close info
from the channel's arbitrator log and write it into the persistent
closed channels bucket in the DB. This commit achieves that by
creating a new exported struct inside the channeldb package, populates
the struct, and writes it to the DB.
We add a new proto message to populate the local force close info
in the ClosedChannels response and generate the new RPC files. We
also add the info to the response.
@shaurya947 shaurya947 force-pushed the force-close-reason branch from 1c67ef3 to 08ca9fc Compare June 28, 2023 16:48
@shaurya947
Copy link
Contributor Author

Hey @saubyk this is mostly ready for review (I just need to investigate some failing tests) but I've fixed all the lint issues and added a release notes entry for 0.17.1 per the milestone you assigned. Let me know whom we can ask for reviews 🙂

@shaurya947 shaurya947 marked this pull request as ready for review June 28, 2023 16:58
@saubyk
Copy link
Collaborator

saubyk commented Jun 29, 2023

Hi @shaurya947 great work so far. thanks. Will start assigning reviewers when we start the dev cycle for 17.1
Excited to get this feature merged.

@saubyk saubyk modified the milestones: High Priority, v0.18.0 Aug 4, 2023
@ziggie1984
Copy link
Collaborator

looking forward to this PR !

@guggero guggero self-requested a review August 18, 2023 07:41
Copy link
Collaborator

@guggero guggero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea for approaching this quite complex task. Very nice work, looks pretty far along.
Main comments are around using TLV to serialize the force close info and to share the code between the arbitrator and the channel DB if possible.

b := new(bytes.Buffer)

// first byte is userInitiated bool
err = binary.Write(b, endian, info.userInitiated)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use TLV for any new data structure that we save to the DB. That will make it way easier to extend this info structure as well. See the example in fetchChanInfo() in channeldb/channel.go.

return err
}

// next bytes are actual map data
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: all inline comments should also be full sentences with proper capitalization and punctuation.

func UserInitiatedForceClose(ca *ChannelArbitrator) {
// Ignore errors since logging force close info is not a critical part
// of the force close flow.
_ = ca.log.LogLocalForceCloseInfo(localForceCloseInfo{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should at least log any errors?

// of ForceCloseContract(). We don't export this type directly, but rather the
// concrete functions implementing this type, so that callers can choose the
// concrete function that best describes their use case for force closing.
type forceCloseContractOption func(*ChannelArbitrator)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this doesn't look like the normal option pattern, as it's just a function type. Maybe just call it forceCloseContractCallback instead?

@@ -1071,6 +1104,7 @@ func (c *ChainArbitrator) ForceCloseContract(chanPoint wire.OutPoint) (*wire.Msg
}

log.Infof("Attempting to force close ChannelPoint(%v)", chanPoint)
opt(arbitrator)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice design with the callback 👍

@@ -870,6 +870,11 @@ func (c *ChannelArbitrator) stateStep(
// next state, as we still need to broadcast the commitment
// transaction.
case chainTrigger:
// Ignore errors since logging force close info is not a
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a named callback for this as well? Then we have all the code in one place.

// this may be extended in the future. For example, we may decide to log HTLC
// information that doesn't automatically result in a force close, but leaves
// the channel in a state whereby the user has no option but to force close.
type LocalForceCloseInfo struct {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use this in the contractcourt directly? I see that the HtlcActions isn't exactly the same type, but maybe we could convert? Otherwise we need encode/decode code in two places.
If we share this struct, it can have a public Encode() and Decode() method.

@@ -3579,6 +3606,18 @@ func serializeChannelCloseSummary(w io.Writer, cs *ChannelCloseSummary) error {
}
}

// Write whether the local force close info is present.
if err := WriteElements(w, cs.LocalFCInfo != nil); err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't work without changes to the WriteElement and ReadElement functions in channeldb/codec.go.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you say this? bool is handled by WriteElement.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I misread it as just WriteElements(w, cs.LocalFCInfo), missed the != nil part. So maybe extract to a bool to make it more obvious?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is the general pattern of writing it looking at the function.

for action, htlcs := range localFCInfo.HtlcActions {
htlcHashes := make([][]byte, len(htlcs))
for i, htlc := range htlcs {
htlcHashes[i] = htlc.RHash[:]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to use copy() here to avoid running into the famous golang loop variable issues.

@@ -1601,6 +1601,25 @@ enum Initiator {
INITIATOR_BOTH = 3;
}

message ListHTLCHash {
// Hash of HTLC
repeated bytes hash = 1;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: should be plural since it's a slice of hashes.

@Chinwendu20
Copy link
Contributor

Thanks @guggero for the review as per @saubyk message on slack, I will work on the PR to implement your suggestions and submit an update soon.

@saubyk
Copy link
Collaborator

saubyk commented Sep 11, 2023

Thanks @guggero for the review as per @saubyk message on slack, I will work on the PR to implement your suggestions and submit an update soon.

Thanks @Chinwendu20


// htlcActions is a non-empty map iff the force close was automatically
// initiated by an on-chain trigger such as HTLC timeout.
htlcActions ChainActionMap
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @guggero would be a good idea to just indicate that the closure was due to ahtlcAction and we just put the particular htlc action responsible for the closure. Would there be need for the actual htlc?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it would be nice to be able to correlate the closure resulting from an HTLC with the actual HTLC in the channel (to see why it timed out, what its CLTV delta was and so on).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay thanks.

@lightninglabs-deploy
Copy link

@shaurya947, remember to re-request review from reviewers when ready

@Chinwendu20
Copy link
Contributor

Continued here: #8072

@guggero
Copy link
Collaborator

guggero commented Oct 9, 2023

Replaced by #8072.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants