-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support different lock retry configurations for different schema_statements? #67
Comments
I didn't get what the problem is. We use the default lock retrier and Lock retrier uses small lock timeouts with many attempts, so this won't block the database for a long time. Both adding an index and a foreign key are made idempotent in this gem, so if you use Also, you can configure different lock retriers by assigning to |
Our problem is a lock retrier with a short timeout is having troubles creating an index concurrently due to it having to wait for any existing transaction id, possibly minutes. That combined with the need for short timeouts for strong locking operations like adding a FK leaves our lock retrier implementation in a middle ground where we hit Currently we have a constant lock retrier configured like, it's a targetted to avoiding blocking queries with strong locks while trying to allow time for index builds. config.lock_retrier = OnlineMigrations::ConstantLockRetrier.new(
attempts: 9,
delay: 10.seconds,
lock_timeout: 5.seconds
) When adding an index with this config and there are some long running queries we're hitting The index builds fine with a retrier configuration like the following as it allows the lock to wait for any existing transaction that could modify or use the index to terminate. config.lock_retrier = OnlineMigrations::ConstantLockRetrier.new(
attempts: 4,
delay: 30.seconds,
lock_timeout: 3.minutes
) However with that configuration we have problematic blocking operations when adding foreign keys on busy tables as the queries can pile up behind the strong locks that schema statement uses. Ideally when adding FKs with strong locks we'd use a retrier configuration like config.lock_retrier = OnlineMigrations::ConstantLockRetrier.new(
attempts: 60,
delay: 5.seconds,
lock_timeout: 0.5.seconds
) So we have two distinct retrier configurations depending on the type of schema statement the migration is running. Ideally having an internal mechanism in this gem that detects the type of schema statement and applies a tailored lock retier mechanism based where it can, as developers can get it wrong and cause downtime selecting the wrong retrier configuration. Hopefully this helps clarify the problem we're experiencing with the current system. I'd be happy to work on this implementation myself but I couldn't see a clean pathway to doing so with the current gem api. |
Thanks for the explanation. The easiest solution I can currently think of is extending the implementation of lock retrier in the gem to also accept the current command and its arguments and allow people to do some logic depending on it. Something like: class MyLockRetrier < LockRetrier
def lock_timeout(_attempt, command: nil, arguments: nil)
if command == :add_index
30
else
5
end
end
# other methods
end Will that work? |
Yes that would work! Having the Ideally the same API surface exists for the Sadly, I couldn't quite see how to inject the command to the with_lock_retries method to do it myself. |
Should be done somewhere here online_migrations/lib/online_migrations/migration.rb Lines 19 to 31 in 02a0445
|
Some schema statements have different lock requirements and trying to create a generic lock retry mechanism that fits different needs can lead to timeouts and retry exhaustion which can then fail the migration process.
For instance when attempting to add_reference_concurrently(foreign_key: true) we add an index concurrently and then add a foreign key.
Adding an index concurrently results in a ShareUpdateExclusiveLock which allow reads on the table. However that process must wait for all existing transactions to finish before it can successfully build the index.
On a busy database with a longer statement timeout that index creation may fail with a shorter lock_timeout, depending on how the query patterns are on the database at any given moment (noting that ratcheting down the statement_timeout is ideal but currently unfeasible).
Then adding the FK constraint uses AccessExclusiveLock which blocks any query behind it so we really want a very short lock_timeout for this statement but with lots of retries :(
Our current problem is supporting these types of schema statements with the existing lock retry mechanism https://github.com/fatkodima/online_migrations/blob/master/docs/configuring.md#lock-timeout-retries for a migration run (on deploy).
Ideally we don't have to teach all our developers about the intricacies of database locks and how to split migrations up with overridden lock mechanics in migrations, they are better focussed on domain modelling and not deploying these migrations to a busy database.
Is there any way we can modify the
LockRetrier
implementation to support these conflicting use cases?Perhaps we can add different lock retries on the config and dynamically select them via the schema statement, e.g. add_index?
Or allow the lock_retrier to reflect on the migration and auto select the lock retrier implementation to use?
The text was updated successfully, but these errors were encountered: