kvdb/postgres: remove global application level lock #8529

guggero · 2024-03-07T16:25:45Z

Replacing #7992, pushing to upstream to enable collaboration.

Depends on btcsuite/btcwallet#913.
Fixes #8049.

In this PR, we remove the global application level lock from the postgres backend. This lock prevents multiple write transactions from happening at the same time, and will also block a writer if a read is on going. Since this lock was added, we know always open DB connections with the strongest level of concurrency control available: LevelSerializable. In concert with the new auto retry logic, we ensure that if db transactions conflict (writing the same key/row in this case), then the tx is retried automatically.

Removing this lock should increase perf for the postgres backend, as now concurrent write transactions can proceed, being serialized as needed. Rather then trying to handle concurrency at the application level, we'll set postgres do its job, with the application only needing to retry as necessary.

In this commit, we remove the global application level lock from the postgres backend. This lock prevents multiple write transactions from happening at the same time, and will also block a writer if a read is on going. Since this lock was added, we know always open DB connections with the strongest level of concurrency control available: `LevelSerializable`. In concert with the new auto retry logic, we ensure that if db transactions conflict (writing the same key/row in this case), then the tx is retried automatically. Removing this lock should increase perf for the postgres backend, as now concurrent write transactions can proceed, being serialized as needed. Rather then trying to handle concurrency at the application level, we'll set postgres do its job, with the application only needing to retry as necessary.

Some sub-systems like btcwallet will return an error from the database, but they won't properly wrap it. As a result, we were unable to actually catch the serialization errors in the first place. To work around this, we'll now attempt to parse the error string directly.

With the new postgres concurrency control, an error may come from a bucket function that's actually a postgres error. In this case, we need to return early so we can retry the txn. Otherwise, we'll be working with an aborted tx, and never actually return the error so we don't auto retry.

In this commit, we fix a bug that would cause the entire db to shutdown if hit a panic (since db operations in the main buckets exit with a panic) while executing a txn call back. This might be a postgres error we need to check, so we don't want to bail out, and instead want to pass up the error to the caller so we can retry if needed.

At times we'll get an error from the transaction call back itself, since we may be using postgres over streaming RPC. In this case, we still need to roll back then attempt to retry.

In this commit, we fix a bug in the commit retry loop, we'll now make sure that if we get an error on commit, we'll map it to the SQL error then attempt to decide if we need to retry or not.

Because rolling back any changes that haven't been committed yet is essential when re-trying, if rolling back we should fail hard.

coderabbitai · 2024-03-07T16:25:50Z

Important

Auto Review Skipped

Auto reviews are limited to the following labels: llm-review. Please add one of these labels to enable auto reviews.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository.

To trigger a single review, invoke the @coderabbitai review command.

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share

Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>.
- Generate unit-tests for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit tests for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai generate interesting stats about this repository and render them as a table.
- @coderabbitai show all the console.log statements in this repository.
- @coderabbitai read src/utils.ts and generate unit tests.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (invoked as PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger a review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
The JSON schema for the configuration file is available here.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/coderabbit-overrides.v2.json

CodeRabbit Discord Community

Join our Discord Community to get help, request features, and share feedback.

github-actions · 2024-03-07T16:38:40Z

Pull reviewers stats

Stats of the last 30 days for lnd:

User	Total reviews	Time to review	Total comments
guggero 🥇	9 ▀▀	1d 6h 54m ▀	26 ▀▀
ellemouton 🥈	8 ▀▀	4d 8h 16m ▀▀	26 ▀▀
ProofOfKeags 🥉	7 ▀	2d 23h 9m ▀	74 ▀▀▀▀▀
Roasbeef	5 ▀	12h 17m	4
bhandras	4 ▀	2d 18h 49m ▀	3
yyforyongyu	4 ▀	10h 33m	2
saubyk	3 ▀	16h 13m	0
Crypt-iQ	2	6d 11h 18m ▀▀▀	5
hieblmi	2	21m	1
Chinwendu20	2	21h 3m	3
morehouse	1	9h 41m	1
bitromortac	1	1h 13m	0
ziggie1984	1	1d 11h 28m ▀	2

guggero · 2024-03-07T16:43:45Z

Turned it into a draft, since it doesn't actually pass the integration tests yet. Still debugging the cause(s).

autopilot/agent.go

Includes a fix for a wrap, and module replace.

Roasbeef · 2024-03-07T23:43:00Z

kvdb/sqlbase/db.go

@@ -345,7 +345,7 @@ func (db *db) executeTransaction(f func(tx walletdb.ReadWriteTx) error,
 			}
 		}

-		return commitErr
+		return nil


I think this should still return the commitErr as otherwise, if it can't be mapped to a SQLError, then we always return nil, when the actual error should be thread through.

Also revisiting this section, it should be trying to map the commitErr here, now fnErrr (shadowed from above).

Roasbeef · 2024-03-07T23:59:19Z

channeldb/payment_control.go

@@ -171,14 +171,15 @@ func (p *PaymentControl) InitPayment(paymentHash lntypes.Hash,
 		// retrying the payment or return a specific error.
 		case err == nil:
 			if err := paymentStatus.initializable(); err != nil {
-				updateErr = err
+				updateErr = fmt.Errorf("initialize: %w", err)


So If I revert this back to just:

updateErr = err

Then strangely, the sendtoroute_multi_path_payment starts to pass again 🤔

Similarly, I also don't wrap the error down below, pushing this up as a commit to see how CI fares now.

Not sure why, but when we wrap the error here, this error check fails, causing the whole itest to fail:

lnd/routing/router.go

Line 2554 in 48a3d56

case mpp != nil && errors.Is(err, channeldb.ErrPaymentExists):

This causes itests to fail, still need to investigate why the other errors.Is check isn't catching this.

This fixes a bug as we were mapping the `fnErr`, which was already nil if we got to this point. We'll also return the `commitErr` as is, if we were unable to match it. This ensures we don't mask an error.

Roasbeef · 2024-03-08T00:18:31Z

Pushed a few commits. Unit tests should pass again now, allowing us to narrow down if the postgres itests are still failing. Left some comments above, and put some context in the temp commits pushed up.

We'll now actually catch the error for a failed rollback, so the error we return in the case of a severed connection differs from before, requiring an update to the tests.

Roasbeef · 2024-03-08T00:50:40Z

Pushed up another unit test fix. The error on panic due to a closed connection differs now as we catch the rollback error.

The latest itest failure is now:

       received error from payment stream: rpc error: code = Unknown desc = unable to init payment: db tx retries exceeded

This is from the async payment benchmark test (testAsyncPayments). The test ends up spawning 488 goroutines to make a concurrent payment. Perhaps we need to tweak the max connection settings to allow this to succeed?

This is an attempt to get the async payment test passing. This test creates 488 concurrent payments at once.

Roasbeef · 2024-03-08T01:26:26Z

Also I realize we need to port/merge these serialization retry changes to sqldb as well. Not sure if max connections is even set in the kvbd version right now?

lightninglabs-deploy · 2024-03-27T20:12:40Z

@guggero, remember to re-request review from reviewers when ready

guggero · 2024-04-02T16:50:46Z

Will be replaced by #8611.

Roasbeef and others added 11 commits March 7, 2024 09:43

temp: add replace so new code is used

9ee3aee

kvdb: implement txn retry when executing txn call back

53a5155

At times we'll get an error from the transaction call back itself, since we may be using postgres over streaming RPC. In this case, we still need to roll back then attempt to retry.

kvdb: fix bug in commit retry loop

f956545

In this commit, we fix a bug in the commit retry loop, we'll now make sure that if we get an error on commit, we'll map it to the SQL error then attempt to decide if we need to retry or not.

kvdb: return error if rollback fails

fb06c6d

Because rolling back any changes that haven't been committed yet is essential when re-trying, if rolling back we should fail hard.

mod: bump to btcwallet that wraps errors

d267747

multi: wrap all errors

0cd6017

wip: debug itest failures

43e6d9f

guggero added optimization concurrency postgres labels Mar 7, 2024

guggero requested review from Roasbeef, morehouse, ellemouton, bhandras, bitromortac, yyforyongyu, ProofOfKeags and Crypt-iQ as code owners March 7, 2024 16:25

guggero mentioned this pull request Mar 7, 2024

kvdb/postgres: remove global application level lock #7992

Closed

guggero marked this pull request as draft March 7, 2024 16:43

guggero removed request for bhandras, Roasbeef, ProofOfKeags and yyforyongyu March 7, 2024 16:43

guggero removed request for Crypt-iQ, morehouse, ellemouton and bitromortac March 7, 2024 16:43

Roasbeef reviewed Mar 7, 2024

View reviewed changes

autopilot/agent.go Show resolved Hide resolved

build: temp point to roasbeef's fork

5e5be37

Includes a fix for a wrap, and module replace.

Roasbeef reviewed Mar 7, 2024

View reviewed changes

Roasbeef added 2 commits March 7, 2024 15:43

funding: fix test warning due to error wrapping a logged err

2c12a92

channeldb: fix payment control unit tests w/ errors.Is

31d19e1

Roasbeef reviewed Mar 7, 2024

View reviewed changes

Roasbeef added 2 commits March 7, 2024 16:14

channeldb: don't wrap the error fail from initializable

40b5b60

This causes itests to fail, still need to investigate why the other errors.Is check isn't catching this.

kvdb/sqlbase: properly map commitErr on failed commit

04dcb65

This fixes a bug as we were mapping the `fnErr`, which was already nil if we got to this point. We'll also return the `commitErr` as is, if we were unable to match it. This ensures we don't mask an error.

kvdb/postgres: update TestPanic w/ rollback error

99506d9

We'll now actually catch the error for a failed rollback, so the error we return in the case of a severed connection differs from before, requiring an update to the tests.

kvdb: increase max number of retries to 50

8dd4dc2

This is an attempt to get the async payment test passing. This test creates 488 concurrent payments at once.

saubyk assigned guggero Mar 8, 2024

saubyk added this to the v0.18.0 milestone Mar 8, 2024

aakselrod mentioned this pull request Mar 14, 2024

multi: replace LockOutpoint with LeaseOutput #8496

Merged

8 tasks

levmi assigned bhandras and unassigned guggero Mar 19, 2024

bhandras marked this pull request as ready for review March 20, 2024 18:42

bhandras mentioned this pull request Apr 2, 2024

sqldb+kvdb: unify SQL error mapping and transaction retry for between the kvdb and sqldb packages #8611

Merged

guggero closed this Apr 2, 2024

saubyk removed this from the v0.18.0 milestone Apr 2, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

kvdb/postgres: remove global application level lock #8529

kvdb/postgres: remove global application level lock #8529

guggero commented Mar 7, 2024 •

edited

Loading

coderabbitai bot commented Mar 7, 2024 •

edited

Loading

Auto Review Skipped

Chat

CodeRabbit Commands (invoked as PR comments)

CodeRabbit Configration File (`.coderabbit.yaml`)

CodeRabbit Discord Community

github-actions bot commented Mar 7, 2024

guggero commented Mar 7, 2024

Roasbeef Mar 7, 2024

Roasbeef Mar 7, 2024

Roasbeef Mar 7, 2024

Roasbeef Mar 7, 2024

Roasbeef Mar 8, 2024

Roasbeef commented Mar 8, 2024

Roasbeef commented Mar 8, 2024

Roasbeef commented Mar 8, 2024

lightninglabs-deploy commented Mar 27, 2024

guggero commented Apr 2, 2024

kvdb/postgres: remove global application level lock #8529

kvdb/postgres: remove global application level lock #8529

Conversation

guggero commented Mar 7, 2024 • edited Loading

coderabbitai bot commented Mar 7, 2024 • edited Loading

Auto Review Skipped

Chat

CodeRabbit Commands (invoked as PR comments)

CodeRabbit Configration File (.coderabbit.yaml)

CodeRabbit Discord Community

github-actions bot commented Mar 7, 2024

Pull reviewers stats

guggero commented Mar 7, 2024

Roasbeef Mar 7, 2024

Choose a reason for hiding this comment

Roasbeef Mar 7, 2024

Choose a reason for hiding this comment

Roasbeef Mar 7, 2024

Choose a reason for hiding this comment

Roasbeef Mar 7, 2024

Choose a reason for hiding this comment

Roasbeef Mar 8, 2024

Choose a reason for hiding this comment

Roasbeef commented Mar 8, 2024

Roasbeef commented Mar 8, 2024

Roasbeef commented Mar 8, 2024

lightninglabs-deploy commented Mar 27, 2024

guggero commented Apr 2, 2024

guggero commented Mar 7, 2024 •

edited

Loading

coderabbitai bot commented Mar 7, 2024 •

edited

Loading

CodeRabbit Configration File (`.coderabbit.yaml`)