kvdb/postgres: remove global application level lock #7992
Conversation
Force-pushed from c5b85fc to c4ea65c.
Looks like transactions are actually conflicting even at startup:
I remember that we had to add an application level lock for the wallet DB, even for the code referenced at Line 314 in 0730337.
Though I'm not sure why that isn't a problem with SQLite... Perhaps we do need that single writer lock just for the wallet (because it was built even more with only bbolt in mind)?
In this commit, we remove the global application level lock from the postgres backend. This lock prevented multiple write transactions from happening at the same time, and would also block a writer while a read was ongoing. Since this lock was added, we now always open DB connections with the strongest level of concurrency control available: `LevelSerializable`. In concert with the new auto-retry logic, we ensure that if db transactions conflict (writing the same key/row in this case), then the tx is retried automatically. Removing this lock should increase performance for the postgres backend, as concurrent write transactions can now proceed, being serialized as needed. Rather than trying to handle concurrency at the application level, we'll let postgres do its job, with the application only needing to retry as necessary.
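As a rough sketch of the approach (not lnd's actual kvdb code; the package, constant, and function names below are illustrative), the idea is to begin every transaction with `sql.LevelSerializable` and simply re-run it when postgres aborts it with a serialization failure:

```go
package pgkv // illustrative package name, not lnd's kvdb package

import (
	"context"
	"database/sql"
	"strings"
)

// maxTxRetries bounds how often a conflicting tx is re-run; the name and
// value are assumptions for this sketch.
const maxTxRetries = 10

// isSerializationErr reports whether postgres aborted the tx due to a
// conflict (SQLSTATE 40001). A fuller version is sketched further below.
func isSerializationErr(err error) bool {
	if err == nil {
		return false
	}
	msg := err.Error()
	return strings.Contains(msg, "SQLSTATE 40001") ||
		strings.Contains(msg, "could not serialize access")
}

// executeTx runs f in a serializable transaction and retries it
// automatically if postgres aborts it due to a read/write conflict.
func executeTx(ctx context.Context, db *sql.DB, f func(*sql.Tx) error) error {
	var lastErr error
	for i := 0; i < maxTxRetries; i++ {
		tx, err := db.BeginTx(ctx, &sql.TxOptions{
			// Strongest isolation level: postgres serializes conflicting
			// transactions for us, no application level lock required.
			Isolation: sql.LevelSerializable,
		})
		if err != nil {
			return err
		}

		if err := f(tx); err != nil {
			// Always roll back before deciding whether to retry.
			_ = tx.Rollback()
			if isSerializationErr(err) {
				lastErr = err
				continue
			}
			return err
		}

		if err := tx.Commit(); err != nil {
			// A conflict can also surface only at commit time.
			_ = tx.Rollback()
			if isSerializationErr(err) {
				lastErr = err
				continue
			}
			return err
		}

		return nil
	}

	return lastErr
}
```

Under serializable isolation postgres detects the conflicting read/write sets itself, which is what makes the old application level mutex redundant.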
Some sub-systems like btcwallet will return an error from the database, but won't properly wrap it. As a result, we were unable to actually catch the serialization errors in the first place. To work around this, we'll now attempt to parse the error string directly.
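A hedged sketch of what such error-string parsing could look like in Go; the structured check via `pgconn.PgError` is a driver-specific assumption, and the exact import path may differ from what lnd actually uses:

```go
package pgkv // illustrative package name

import (
	"errors"
	"strings"

	"github.com/jackc/pgconn" // assumed driver error type
)

// isSerializationError reports whether err is a postgres serialization
// failure (SQLSTATE 40001). We first try the structured error type, but since
// some sub-systems (e.g. btcwallet) return the error without wrapping it, we
// fall back to matching the error text directly.
func isSerializationError(err error) bool {
	if err == nil {
		return false
	}

	// Happy path: the driver error is still wrapped and reachable.
	var pgErr *pgconn.PgError
	if errors.As(err, &pgErr) {
		return pgErr.Code == "40001"
	}

	// Fallback: the error was stringified somewhere along the way, so parse
	// the message itself.
	msg := err.Error()
	return strings.Contains(msg, "SQLSTATE 40001") ||
		strings.Contains(msg, "could not serialize access")
}
```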
With the new postgres concurrency control, an error returned from a bucket function may actually be a postgres error. In this case, we need to return early so we can retry the txn. Otherwise, we'd keep working with an aborted tx and never actually return the error, so the auto-retry would never trigger.
In this commit, we fix a bug that would cause the entire db to shut down if we hit a panic (since db operations in the main buckets exit with a panic) while executing a txn callback. The panic might carry a postgres error we need to check, so we don't want to bail out; instead we want to pass the error up to the caller so we can retry if needed.
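A minimal sketch, with a hypothetical helper name, of how a panic inside the txn callback can be converted into an ordinary error so the retry logic can still inspect it:

```go
package pgkv // illustrative package name

import "fmt"

// catchPanic runs f and converts any panic inside it into a normal error
// instead of letting it tear down the db. If the panicked value is itself an
// error (e.g. a postgres serialization error), it is returned as-is so the
// caller's retry check still matches it.
func catchPanic(f func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			if e, ok := r.(error); ok {
				err = e
				return
			}
			err = fmt.Errorf("panic in db tx: %v", r)
		}
	}()

	return f()
}
```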
At times we'll get an error from the transaction callback itself, since we may be using postgres over streaming RPC. In this case, we still need to roll back, then attempt to retry.
In this commit, we fix a bug in the commit retry loop: we'll now make sure that if we get an error on commit, we map it to the corresponding SQL error and then decide whether we need to retry.
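For illustration, a small hedged helper (not lnd's actual API) showing the shape of that fix: the commit error is passed through a classifier, such as the serialization check sketched earlier, before deciding whether the whole tx should be retried:

```go
package pgkv // illustrative package name

import "database/sql"

// tryCommit commits tx and reports whether the caller should re-run the whole
// transaction because the commit itself hit a retryable conflict, as decided
// by the supplied classifier.
func tryCommit(tx *sql.Tx, isRetryable func(error) bool) (retry bool, err error) {
	if cErr := tx.Commit(); cErr != nil {
		if isRetryable(cErr) {
			// The conflict only surfaced at commit time; signal a retry.
			return true, cErr
		}
		return false, cErr
	}

	return false, nil
}
```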
Force-pushed from e4e4b86 to c88e270.
Pushed up a series of new commits, fixing some issues:
With this, everything starts up locally for me, and I was able to pass
Did a first pass, looks pretty good. Though something's still not working correctly with the wallet. I'm not sure if we still have to keep some sort of lock just for the wallet, because that code was never updated to not rely on the assumption of a global per-database lock.
Looks good! Great that we're finally getting rid of this lock. Will also do some stress testing before final approval.
@Roasbeef, remember to re-request review from reviewers when ready
Replaced by #8529.