kvdb/postgres: remove global application level lock #7992

Roasbeef · 2023-09-15T23:15:06Z

In this commit, we remove the global application level lock from the postgres backend. This lock prevents multiple write transactions from happening at the same time, and will also block a writer if a read is ongoing. Since this lock was added, we now always open DB connections with the strongest level of concurrency control available: LevelSerializable. In concert with the new auto retry logic, we ensure that if db transactions conflict (writing the same key/row in this case), then the tx is retried automatically.

Removing this lock should increase perf for the postgres backend, as concurrent write transactions can now proceed, being serialized as needed. Rather than trying to handle concurrency at the application level, we'll let postgres do its job, with the application only needing to retry as necessary.
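
As a rough illustration of the isolation level mentioned above, the sketch below builds the transaction options that Go's standard `database/sql` package accepts. The helper name `serializableTxOptions` is hypothetical and not taken from this PR; only `sql.TxOptions` and `sql.LevelSerializable` are standard library names.

```go
package main

import "database/sql"

// serializableTxOptions is a hypothetical helper (not from this PR) that
// returns options requesting the strictest isolation level that
// database/sql defines.
func serializableTxOptions(readOnly bool) *sql.TxOptions {
	return &sql.TxOptions{
		Isolation: sql.LevelSerializable,
		ReadOnly:  readOnly,
	}
}
```

Options like these would be handed to `(*sql.DB).BeginTx`. With this level, postgres may abort one of two conflicting transactions, which is why the retry logic is needed on the application side.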

guggero · 2023-09-19T11:16:49Z

Looks like transactions are actually conflicting even at startup:

2023-09-18 20:47:30.985 UTC [57] ERROR:  could not serialize access due to read/write dependencies among transactions
2023-09-18 20:47:30.985 UTC [57] DETAIL:  Reason code: Canceled on identification as a pivot, during write.
2023-09-18 20:47:30.985 UTC [57] HINT:  The transaction might succeed if retried.
2023-09-18 20:47:30.985 UTC [57] STATEMENT:  INSERT INTO walletdb_kv (key, value, parent_id) VALUES($1, $2, $3) ON CONFLICT (key, parent_id) WHERE parent_id IS NOT NULL DO UPDATE SET value=$2 WHERE walletdb_kv.value IS NOT NULL
2023-09-18 20:47:31.003 UTC [57] ERROR:  could not serialize access due to read/write dependencies among transactions
2023-09-18 20:47:31.003 UTC [57] DETAIL:  Reason code: Canceled on identification as a pivot, during write.
2023-09-18 20:47:31.003 UTC [57] HINT:  The transaction might succeed if retried.
2023-09-18 20:47:31.003 UTC [57] STATEMENT:  INSERT INTO walletdb_kv (key, value, parent_id) VALUES($1, $2, $3) ON CONFLICT (key, parent_id) WHERE parent_id IS NOT NULL DO UPDATE SET value=$2 WHERE walletdb_kv.value IS NOT NULL
2023-09-18 20:47:31.061 UTC [57] ERROR:  could not serialize access due to read/write dependencies among transactions
2023-09-18 20:47:31.061 UTC [57] DETAIL:  Reason code: Canceled on identification as a pivot, during conflict out checking.
2023-09-18 20:47:31.061 UTC [57] HINT:  The transaction might succeed if retried.
2023-09-18 20:47:31.061 UTC [57] STATEMENT:  SELECT id FROM walletdb_kv WHERE parent_id=107 AND key=$1 AND value IS NULL

I remember that we had to add an application level lock for the wallet DB even for etcd due to a similar issue:

lnd/lncfg/db.go

Line 314 in 0730337

CloneWithSingleWriter(),

Though I'm not sure why that isn't a problem with SQLite...

Perhaps we do need that single writer lock just for the wallet (because it was even more built with only bbolt in mind)?
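
To make the idea of a wallet-only single writer lock concrete, here is a minimal sketch. The `singleWriter` type and its methods are hypothetical and are not how `CloneWithSingleWriter()` is implemented; the sketch only shows writes being funneled through one mutex while reads stay unrestricted.

```go
package main

import "sync"

// singleWriter is a hypothetical wrapper (not lnd's actual implementation)
// that serializes write transactions for one database namespace while
// leaving reads unrestricted.
type singleWriter struct {
	mu sync.Mutex
}

// Update runs fn while holding the writer lock, so at most one write
// callback executes at a time.
func (s *singleWriter) Update(fn func() error) error {
	s.mu.Lock()
	defer s.mu.Unlock()

	return fn()
}

// View runs fn without taking the writer lock.
func (s *singleWriter) View(fn func() error) error {
	return fn()
}
```

Scoping such a lock to the wallet namespace alone would keep the rest of the database free to rely on postgres for serialization.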

Some sub-systems like btcwallet will return an error from the database, but they won't properly wrap it. As a result, we were unable to actually catch the serialization errors in the first place. To work around this, we'll now attempt to parse the error string directly.
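
A minimal sketch of that fallback, assuming the driver error exposes a `SQLState() string` method and that the message contains the text seen in the logs above. The names `sqlStateError`, `fakePgError`, `flatten`, and `isSerializationError` are all hypothetical and only serve the illustration.

```go
package main

import (
	"errors"
	"strings"
)

// sqlStateError is a hypothetical interface for driver errors that expose a
// five-character SQLSTATE code.
type sqlStateError interface {
	error
	SQLState() string
}

// serializationFailure is the SQLSTATE code for a serialization failure.
const serializationFailure = "40001"

// fakePgError is a stand-in for a driver error, used only for illustration.
type fakePgError struct{ code, msg string }

func (e fakePgError) Error() string    { return e.msg }
func (e fakePgError) SQLState() string { return e.code }

// flatten mimics a caller that formats an error into a new string instead of
// wrapping it, which discards the original type.
func flatten(err error) error {
	return errors.New("wallet: " + err.Error())
}

// isSerializationError reports whether err looks like a serialization
// failure. It first tries errors.As, which only works when every layer
// wrapped the error properly, and then falls back to matching the message.
func isSerializationError(err error) bool {
	if err == nil {
		return false
	}

	var stateErr sqlStateError
	if errors.As(err, &stateErr) {
		return stateErr.SQLState() == serializationFailure
	}

	return strings.Contains(err.Error(), "could not serialize access")
}
```

String matching is brittle, since it depends on the server's message text, so it is best treated as a last resort behind the typed check.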

With the new postgres concurrency control, an error may come from a bucket function that's actually a postgres error. In this case, we need to return early so we can retry the txn. Otherwise, we'll keep working with an aborted tx and never return the error, so the auto retry is never triggered.

In this commit, we fix a bug that would cause the entire db to shut down if we hit a panic (since db operations in the main buckets exit with a panic) while executing a txn call back. The panic might stem from a postgres error we need to check, so we don't want to bail out; instead, we want to pass the error up to the caller so we can retry if needed.
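
The general pattern of turning a panic inside a callback into a returned error can be sketched as follows. `catchPanic` is a hypothetical helper written for this illustration and is not the code from this PR.

```go
package main

import "fmt"

// catchPanic runs fn and converts a panic into a returned error so the caller
// can decide whether to retry instead of shutting down. It is a hypothetical
// helper, not the code from this PR.
func catchPanic(fn func() error) (err error) {
	defer func() {
		r := recover()
		if r == nil {
			return
		}

		// Preserve the original error value if the panic carried one, so
		// type- or string-based checks still work upstream.
		if panicErr, ok := r.(error); ok {
			err = panicErr
			return
		}

		err = fmt.Errorf("panic in transaction callback: %v", r)
	}()

	return fn()
}
```

Keeping the original error value intact matters here, because the caller still has to inspect it to tell a retryable postgres error apart from a genuine bug.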

At times we'll get an error from the transaction call back itself, since we may be using postgres over streaming RPC. In this case, we still need to roll back then attempt to retry.

In this commit, we fix a bug in the commit retry loop: if we get an error on commit, we now map it to the SQL error and then decide whether we need to retry.
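
Putting the last two commit messages together, a retry loop that handles errors from both the call back and the commit could look roughly like this. Everything here (`tx`, `fakeTx`, `errSerialization`, `errRetriesExceeded`, `executeWithRetry`) is hypothetical scaffolding for the sketch; a real loop would also map driver errors and likely add a backoff between attempts.

```go
package main

import "errors"

// errSerialization is a placeholder for the mapped, retryable SQL error.
var errSerialization = errors.New("serialization failure")

// errRetriesExceeded is returned once all attempts have been used up.
var errRetriesExceeded = errors.New("transaction retries exceeded")

// tx is a hypothetical minimal transaction interface.
type tx interface {
	Commit() error
	Rollback() error
}

// fakeTx is a stand-in used only to exercise the loop.
type fakeTx struct {
	commitErr  error
	rolledBack bool
}

func (f *fakeTx) Commit() error   { return f.commitErr }
func (f *fakeTx) Rollback() error { f.rolledBack = true; return nil }

// executeWithRetry begins a transaction, runs fn, and commits. If fn or the
// commit fails with a retryable error, the transaction is rolled back and the
// whole sequence is attempted again, up to maxRetries attempts in total.
func executeWithRetry(maxRetries int, begin func() (tx, error),
	fn func(tx) error) error {

	for i := 0; i < maxRetries; i++ {
		dbTx, err := begin()
		if err != nil {
			return err
		}

		// An error from the call back still requires a rollback before
		// we decide whether to retry.
		if err := fn(dbTx); err != nil {
			_ = dbTx.Rollback()
			if errors.Is(err, errSerialization) {
				continue
			}
			return err
		}

		// The commit itself can fail with a retryable error as well.
		if err := dbTx.Commit(); err != nil {
			_ = dbTx.Rollback()
			if errors.Is(err, errSerialization) {
				continue
			}
			return err
		}

		return nil
	}

	return errRetriesExceeded
}
```

Note that the call back has to be safe to run more than once, since every retry executes it again from the start.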

Roasbeef · 2023-09-20T01:26:17Z

Pushed up a series of new commits, fixing some issues:

btcwallet won't wrap db errors properly, so the calls to errors.As failed. We'll now just check for the string directly.
With the way the kvdb mapping works, certain calls (based on the interface) can't return an error. This includes calls like Sequence, so it ends up panicking instead. When this happens, we no longer always call Criticalf. Instead, we'll check to see what the error is, so we can pass it to the retry loop.
Some DB calls get an error, check it, then continue. We can't do that anymore as that might be a signal that the txn is borked, and that we need to rollback+retry. I found one instance of this in the graph for pruning so far and fixed it.
We'll now properly map the error to a SQL error on commit fail.
We didn't go to retry when the call back returned an error. For btcwallet and some other cases, we'll get the error on call back rather than on begin or commit. We now have a retry check here.

With this, everything starts up locally for me, and I was able to pass icase=basic_funding_flow. Checkpointing here for now to move on to some other more near term things.

guggero

Did a first pass, looks pretty good. Though something's still not working correctly with the wallet. I'm not sure if we have to keep some sort of lock just for the wallet still? Because that code was never updated to not rely on the assumption of a global per-database lock.

kvdb/sqlbase/db.go

sqldb/sqlerrors.go

kvdb/sqlbase/sqlerrors_sqlite.go

bhandras

Looks good! Great we get rid of this lock finally. Will also do some stress testing before final approval.

kvdb/sqlbase/db.go

lightninglabs-deploy · 2024-03-01T00:00:05Z

@Roasbeef, remember to re-request review from reviewers when ready

guggero · 2024-03-07T16:25:55Z

Replaced by #8529.

Roasbeef added the optimization, concurrency, and postgres labels Sep 15, 2023

Roasbeef added this to the v0.18.0 milestone Sep 15, 2023

Roasbeef force-pushed the remove-postgres-master-lock branch from c5b85fc to c4ea65c Compare September 15, 2023 23:22

Roasbeef added 7 commits September 19, 2023 18:21

temp: add replace so new code is used

e2d1def

kvdb: implement txn retry when executing txn call back

f98e8a4

kvdb: fix bug in commit retry loop

c88e270

Roasbeef force-pushed the remove-postgres-master-lock branch from e4e4b86 to c88e270 Compare September 20, 2023 01:22

Roasbeef mentioned this pull request Sep 20, 2023

[bug]: lncli getinfo and LND in general, getting stuck at COMMIT when using Postgres #8009

Closed

guggero mentioned this pull request Oct 2, 2023

[bug]: postgres error: could not serialize access due to read/write dependencies among transactions (SQLSTATE 40001) #8049

Closed

saubyk modified the milestones: v0.18.0, v0.17.1 Oct 3, 2023

bhandras self-requested a review October 3, 2023 17:16

saubyk requested a review from guggero October 3, 2023 17:17

guggero reviewed Oct 4, 2023

saubyk assigned Roasbeef Oct 4, 2023

guggero mentioned this pull request Oct 6, 2023

Very slow start of LND even with Postgresql backend #6646

Open

Roasbeef modified the milestones: v0.17.1, v0.18.0 Oct 26, 2023

bhandras reviewed Nov 16, 2023

saubyk linked an issue Feb 6, 2024 that may be closed by this pull request

[bug]: postgres error: could not serialize access due to read/write dependencies among transactions (SQLSTATE 40001) #8049

Closed

guggero mentioned this pull request Mar 7, 2024

kvdb/postgres: remove global application level lock #8529

Closed

guggero closed this Mar 7, 2024

saubyk removed this from the v0.18.0 milestone Mar 8, 2024

Roasbeef mentioned this pull request Apr 12, 2024

kvdb/postgres: remove global application level lock #8644

Merged

blckbx mentioned this pull request Apr 16, 2024

[NEW BONUS GUIDE + REESTRUCTURE] PostgreSQL + other related minibolt-guide/minibolt#93

Merged
