Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvdb/postgres: remove global application level lock #8529

Closed
wants to merge 18 commits into from

Conversation

guggero
Copy link
Collaborator

@guggero guggero commented Mar 7, 2024

Replacing #7992, pushing to upstream to enable collaboration.

Depends on btcsuite/btcwallet#913.
Fixes #8049.

In this PR, we remove the global application level lock from the postgres backend. This lock prevents multiple write transactions from happening at the same time, and will also block a writer if a read is on going. Since this lock was added, we know always open DB connections with the strongest level of concurrency control available: LevelSerializable. In concert with the new auto retry logic, we ensure that if db transactions conflict (writing the same key/row in this case), then the tx is retried automatically.

Removing this lock should increase perf for the postgres backend, as now concurrent write transactions can proceed, being serialized as needed. Rather then trying to handle concurrency at the application level, we'll set postgres do its job, with the application only needing to retry as necessary.

Roasbeef and others added 11 commits March 7, 2024 09:43
In this commit, we remove the global application level lock from the
postgres backend. This lock prevents multiple write transactions from
happening at the same time, and will also block a writer if a read is on
going. Since this lock was added, we know always open DB connections
with the strongest level of concurrency control available:
`LevelSerializable`. In concert with the new auto retry logic, we ensure
that if db transactions conflict (writing the same key/row in this
case), then the tx is retried automatically.

Removing this lock should increase perf for the postgres backend, as now
concurrent write transactions can proceed, being serialized as needed.
Rather then trying to handle concurrency at the application level, we'll
set postgres do its job, with the application only needing to retry as
necessary.
Some sub-systems like btcwallet will return an error from the database,
but they won't properly wrap it. As a result, we were unable to actually
catch the serialization errors in the first place. To work around this,
we'll now attempt to parse the error string directly.
With the new postgres concurrency control, an error may come from a
bucket function that's actually a postgres error. In this case, we need
to return early so we can retry the txn. Otherwise, we'll be working
with an aborted tx, and never actually return the error so we don't auto
retry.
In this commit, we fix a bug that would cause the entire db to shutdown
if hit a panic (since db operations in the main buckets exit with a
panic) while executing a txn call back. This might be a postgres error
we need to check, so we don't want to bail out, and instead want to pass
up the error to the caller so we can retry if needed.
At times we'll get an error from the transaction call back itself, since
we may be using postgres over streaming RPC. In this case, we still need
to roll back then attempt to retry.
In this commit, we fix a bug in the commit retry loop, we'll now make
sure that if we get an error on commit, we'll map it to the SQL error
then attempt to decide if we need to retry or not.
Because rolling back any changes that haven't been committed yet is
essential when re-trying, if rolling back we should fail hard.
Copy link
Contributor

coderabbitai bot commented Mar 7, 2024

Important

Auto Review Skipped

Auto reviews are limited to the following labels: llm-review. Please add one of these labels to enable auto reviews.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository.

To trigger a single review, invoke the @coderabbitai review command.

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share

Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>.
    • Generate unit-tests for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit tests for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai generate interesting stats about this repository and render them as a table.
    • @coderabbitai show all the console.log statements in this repository.
    • @coderabbitai read src/utils.ts and generate unit tests.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger a review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • The JSON schema for the configuration file is available here.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/coderabbit-overrides.v2.json

CodeRabbit Discord Community

Join our Discord Community to get help, request features, and share feedback.

Copy link

github-actions bot commented Mar 7, 2024

Pull reviewers stats

Stats of the last 30 days for lnd:

User Total reviews Time to review Total comments
guggero
🥇
9
▀▀
1d 6h 54m
26
▀▀
ellemouton
🥈
8
▀▀
4d 8h 16m
▀▀
26
▀▀
ProofOfKeags
🥉
7
2d 23h 9m
74
▀▀▀▀▀
Roasbeef
5
12h 17m
4
bhandras
4
2d 18h 49m
3
yyforyongyu
4
10h 33m
2
saubyk
3
16h 13m
0
Crypt-iQ
2
6d 11h 18m
▀▀▀
5
hieblmi
2
21m
1
Chinwendu20
2
21h 3m
3
morehouse
1
9h 41m
1
bitromortac
1
1h 13m
0
ziggie1984
1
1d 11h 28m
2

@guggero guggero marked this pull request as draft March 7, 2024 16:43
@guggero
Copy link
Collaborator Author

guggero commented Mar 7, 2024

Turned it into a draft, since it doesn't actually pass the integration tests yet. Still debugging the cause(s).

Includes a fix for a wrap, and module replace.
@@ -345,7 +345,7 @@ func (db *db) executeTransaction(f func(tx walletdb.ReadWriteTx) error,
}
}

return commitErr
return nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should still return the commitErr as otherwise, if it can't be mapped to a SQLError, then we always return nil, when the actual error should be thread through.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also revisiting this section, it should be trying to map the commitErr here, now fnErrr (shadowed from above).

@@ -171,14 +171,15 @@ func (p *PaymentControl) InitPayment(paymentHash lntypes.Hash,
// retrying the payment or return a specific error.
case err == nil:
if err := paymentStatus.initializable(); err != nil {
updateErr = err
updateErr = fmt.Errorf("initialize: %w", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So If I revert this back to just:

updateErr = err

Then strangely, the sendtoroute_multi_path_payment starts to pass again 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, I also don't wrap the error down below, pushing this up as a commit to see how CI fares now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why, but when we wrap the error here, this error check fails, causing the whole itest to fail:

case mpp != nil && errors.Is(err, channeldb.ErrPaymentExists):

Roasbeef added 2 commits March 7, 2024 16:14
This causes itests to fail, still need to investigate why the other
errors.Is check isn't catching this.
This fixes a bug as we were mapping the `fnErr`, which was already nil
if we got to this point. We'll also return the `commitErr` as is, if we
were unable to match it. This ensures we don't mask an error.
@Roasbeef
Copy link
Member

Roasbeef commented Mar 8, 2024

Pushed a few commits. Unit tests should pass again now, allowing us to narrow down if the postgres itests are still failing. Left some comments above, and put some context in the temp commits pushed up.

We'll now actually catch the error for a failed rollback, so the error
we return in the case of a severed connection differs from before,
requiring an update to the tests.
@Roasbeef
Copy link
Member

Roasbeef commented Mar 8, 2024

Pushed up another unit test fix. The error on panic due to a closed connection differs now as we catch the rollback error.

The latest itest failure is now:

       received error from payment stream: rpc error: code = Unknown desc = unable to init payment: db tx retries exceeded

This is from the async payment benchmark test (testAsyncPayments). The test ends up spawning 488 goroutines to make a concurrent payment. Perhaps we need to tweak the max connection settings to allow this to succeed?

This is an attempt to get the async payment test passing. This test
creates 488 concurrent payments at once.
@Roasbeef
Copy link
Member

Roasbeef commented Mar 8, 2024

Also I realize we need to port/merge these serialization retry changes to sqldb as well. Not sure if max connections is even set in the kvbd version right now?

@saubyk saubyk added this to the v0.18.0 milestone Mar 8, 2024
@levmi levmi assigned bhandras and unassigned guggero Mar 19, 2024
@bhandras bhandras marked this pull request as ready for review March 20, 2024 18:42
@lightninglabs-deploy
Copy link

@guggero, remember to re-request review from reviewers when ready

@guggero
Copy link
Collaborator Author

guggero commented Apr 2, 2024

Will be replaced by #8611.

@guggero guggero closed this Apr 2, 2024
@saubyk saubyk removed this from the v0.18.0 milestone Apr 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants