[feature]: Support latest two major releases #8374

Open
morehouse opened this issue Jan 11, 2024 · 6 comments
Labels
brainstorming Long term ideas/discussion/requests for feedback releases upgrading Related to the process of upgrading to newer versions of LND

Comments

@morehouse
Collaborator

Background

Currently only the latest major LND release is supported, and previous releases do not get any bug fixes. At the same time, each major release has such a large scope that it inevitably introduces new bugs. So, every release, node operators have to make a difficult decision -- either stay on the previous release and deal with known bugs, or upgrade and deal with unknown bugs.

Illustration

LND 0.17.0 was released October 3 with a whole host of bug fixes and new features. Alice, a large node operator running 0.16.4, didn't plan to use the new features in 0.17.0 but decided to upgrade immediately for the bug fixes. After upgrading, Alice noticed:

Alice was frustrated that she had to keep restarting her deadlocked node and considered downgrading to 0.16.4 again. But Alice did like having the bug fixes from 0.17.0 and was hesitant to attempt a downgrade since 0.17.0 contained a database migration. So she decided to wait for the next minor release and hope it fixed her issues.

LND 0.17.1 was released November 14, containing fixes for CPU utilization and the deadlock problem. Alice immediately upgraded, excited to have her issues fixed. But soon after upgrading, she noticed a new problem: her node started crashing during startup and during operation. This problem was much worse than the previous one, so she downgraded back to 0.17.0.

LND 0.17.2 was released November 21 with a fix for the crashing problem. Alice upgraded and all her problems were fixed.

Soon after, on December 6, LND 0.17.3 was released. Alice saw there were some important bug fixes, but she was tired of dealing with upgrades and downgrades, and she figured "if it ain't broke, don't fix it". Alice decided she probably wouldn't upgrade her node to 0.18.0 either once it is released.

Discussion

In the above scenario, Alice upgraded 3 times and downgraded once in less than two months. She has started to distrust new releases and may choose not to upgrade in the future. As a result, she will not get any important security or bug fixes going forward, putting her large node at substantial risk.

Alice would have been happy to stay on 0.16.4 or upgrade to a 0.16.5 containing all the bug fixes from 0.17.0.

Proposed solution

We should start supporting the latest two major releases. Node operators who want new features can upgrade to the latest release, while node operators more concerned about security and stability can stay on the previous release until any new bugs are ironed out in the latest release.
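As a rough sketch of what a "latest two major releases" policy could mean in practice, the snippet below picks the newest release in each of the two most recent release series. It assumes that a "major release" is identified by the first two version components (0.16, 0.17), as in the illustration above; the function names and the `window` parameter are hypothetical and not part of any LND tooling.

```python
def parse(tag):
    """Parse a tag such as '0.17.3' or 'v0.17.3-beta' into (0, 17, 3)."""
    core = tag.lstrip("v").split("-")[0]
    major, minor, patch = (int(part) for part in core.split("."))
    return major, minor, patch


def supported_releases(tags, window=2):
    """Return the newest tag of each of the `window` most recent series.

    A series is keyed by (major, minor), e.g. every 0.17.x tag belongs to
    the series (0, 17). This keying is an assumption made for illustration.
    """
    latest = {}  # series -> (patch number, tag)
    for tag in tags:
        major, minor, patch = parse(tag)
        series = (major, minor)
        if series not in latest or patch > latest[series][0]:
            latest[series] = (patch, tag)
    newest_series = sorted(latest, reverse=True)[:window]
    return [latest[series][1] for series in newest_series]


# Versions from the illustration, plus one older series to show the cutoff.
tags = ["0.15.5", "0.16.4", "0.17.0", "0.17.1", "0.17.2", "0.17.3"]
print(supported_releases(tags))
```

Under this policy, a fix landing in the 0.17 series would also be a candidate for a 0.16.5, while the 0.15 series would receive nothing.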

@morehouse morehouse added upgrading Related to the process of upgrading to newer versions of LND releases labels Jan 11, 2024
@Roasbeef
Member

I like this idea in theory, but those 3 releases were somewhat exceptional, as they were in response to security related issues, with a portion of the release fixing lingering deadlock or performance issues introduced in some prior releases. The two threads that led to that line of releases were:

  1. The mempool scanning logic not being efficient enough to keep up with the spike in mempool size due to recent activity. To mitigate, users could lower the polling interval, but likely didn't know about it. We knew the naive scanning code could be inefficient, and knew of the bitcoind RPC we could use, but wanted to get things out quicker as the disclosure was looming.
  2. A server bug reported by a user was patched, but that patch exposed another flaw in the area, which began the goose chase. Generally we want to entirely re-work this area, as it's pretty ancient, and we have some candidate PRs such as WIP: Introduce PeerConnManager to manage peer connections #7283.

If we had back ported the fixes in 0.17.1 into 0.16.4, then we would've had to make a 0.16.5 and a 0.16.6 as well, which would stretch us even thinner, as we'd need to run the normal rc and review process for those back ports.

I like the idea in theory, but in practice I don't think we currently have the resources to continually support the last two major releases. I do agree that the state of things does need to improve, though. Today, implementations with large user bases can be in a difficult spot at times. If we don't commit to taking on larger protocol features in a major release (which are very review intensive, and where the most bugs come from down the line), then we're accused of "not following the spec process", or holding things back or w/e. Stepping back a bit, the rough pipeline of updates devs are working on is rather deep at this point, which deserves some reflection, and also coming to terms w/ the reality that everyone won't necessarily be working on absolutely everything at once, which IMO is fine, as this is why we have feature bits. Some of those items also may not necessarily be what'll move the needle on addressing common UX pain points, or might just be stuff we thought was cool years ago, but don't necessarily solve concrete user/product issues.

With that said, I'd say a near term focus of ours is starting to chip away at some of the items I listed here re improving the UX of users in persistently high fee environments: https://groups.google.com/u/1/a/lightning.engineering/g/lnd/c/gz25tikv_3g. This'll take away some resources from other larger protocol related initiatives, but IMO it's well worth it to restore a better baseline level of stability w.r.t fee choppiness (eliminate configs users need to set, make better decisions when going to chain, address long lived issues in the sweeper, etc, etc).

Curious re your thoughts here @saubyk.

@saubyk
Collaborator

saubyk commented Jan 11, 2024

hi @morehouse I think you laid out two different problems here:

  1. The scope of the changes included in each release is rather large
  2. Bug fixes are not back ported into previous releases, forcing users to upgrade to the latest version to get them

Both of these problems need to be addressed eventually, and the reason it's difficult to handle both of them in the short term is due to the stage of development LND is in right now.

My perception is that at this stage of evolution of LND and the Lightning Network protocol as such, we are still quite early and haven't arrived at some sort of ossification of the code base. To be able to maintain multiple versions with back ported bug fixes, we either need significant resource commitments or need to slow down the development pace significantly.

Changes like Taproot Channels, Gossip 1.75, Dynamic Commitments, a relational DB schema for LND data stores, and Route Blinding, to name a few, are significant engineering projects which are impacting the code base quite a bit. Additionally, we also attempt to refactor the code base to pay down tech debt and reduce complexity on an ongoing basis. All of that makes it hard to maintain multiple versions and apply bug fixes to earlier releases.

My reading is that once we are past the major projects mentioned above (famous last words? 😅), we should be in a more steady state position, which can enable us to focus on maintaining the stability of releases via shorter change scope and maintaining two/three versions with bug fixes.

That being said, we did back port bug fixes to earlier releases in certain instances where debilitating issues were uncovered and upgrading to the latest version imposed operational challenges, such as also forcing a Bitcoin Core upgrade.

Would be great to hear more feedback and suggestions on this, as this is an area which we would definitely like to address in medium to long term.

@saubyk saubyk added the brainstorming Long term ideas/discussion/requests for feedback label Jan 11, 2024
@Roasbeef
Member

With all that said, I'm down to give it a shot for 0.18, on a case by case basis. I think there'll be some easy calls (stuff that may crash a node, p2p fixes, low hanging fee estimation stuff, etc). We're also looking to set up some additional branch automation to take out the manual steps currently involved in doing a minor release.

@saubyk
Collaborator

saubyk commented Jan 12, 2024

> With all that said, I'm down to give it a shot for 0.18, on a case by case basis. I think there'll be some easy calls (stuff that may crash a node, p2p fixes, low hanging fee estimation stuff, etc). We're also looking to set up some additional branch automation to take out the manual steps currently involved in doing a minor release.

I guess in the short term, a simple heuristic can be applied in making a selection of the bug fixes which can be back ported...If the bug fix is in an area with no updates or refactor (minimizing rebase overhead), then back port.
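The heuristic above could be sketched roughly as follows: a fix is a back port candidate when none of the areas it touches have changed on the development branch since the release branch was cut. Treating the top-level directory as the "area" is an assumption made for this sketch, and the function names and file paths are hypothetical inputs; in practice the two lists might come from something like `git diff --name-only`.

```python
def touched_areas(paths):
    """Map file paths to their top-level directory (or the file itself).

    Using the top-level directory as the unit of an 'area' is an
    assumption for illustration; a real policy might be finer grained.
    """
    return {path.split("/")[0] for path in paths}


def can_backport(fix_paths, changed_since_release):
    """Return True when the fix touches no area that has been updated or
    refactored since the release branch, so rebase overhead should be low."""
    fix_areas = touched_areas(fix_paths)
    changed_areas = touched_areas(changed_since_release)
    return fix_areas.isdisjoint(changed_areas)


# Hypothetical inputs: files changed by a fix vs. files changed since release.
print(can_backport(["peer/brontide.go"], ["sweep/sweeper.go"]))        # True
print(can_backport(["sweep/sweeper.go"], ["sweep/fee_estimator.go"]))  # False
```

A check like this only flags likely low-effort candidates; whether a given fix is worth back porting would still be a case by case call.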

@yyforyongyu
Member

Found this piece, I think the usage of stable and unstable branches can also be useful here - in short, stable branches for the majority, and unstable for hackers/testers/early adopters.

I think we should also consider moving to a time-based release: X major planned releases each year, and Y minor unplanned releases in between for bug fixes. There are some examples to learn from,

@ziggie1984
Collaborator

ziggie1984 commented Jan 18, 2024

I think it makes sense to decide on a way forward with regard to this issue (at least for the backporting, if we are already planning to do this for 0.18). I especially like the idea of having those different branches (stable vs. testers etc.).

There are already node runners interested in those backported bug-fix branches, so it would be cool to decide the next steps.
