-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rails transaction retry after rollback #258
Comments
The Problem is, that some of the objects keep there IDs and persisted status, so that Rails thinks in the retry, |
I think the bug comes form here: This code catches the And now the (Right now its not possible for me to create a test, so I cant fix this :/ ) |
I think we have encountered the same problem as well, where a db transaction is being retried but the record memory status is not being reset, hence the record is not saved, but the after_save callback was fired. We think that this problem can be reproduced fairly reliabliy using the following steps:
ActiveRecord::Base.logger = Logger.new(STDOUT)
ActiveRecord::Base.configurations = {
"cockroachdb" => {
adapter: "cockroachdb",
host: "localhost",
port: 26256,
user: "root",
database: "crdb_rollback_test",
}
}
ActiveRecord::Base.establish_connection(:cockroachdb)
ActiveRecord::Schema.define do
create_table :customers, force: true do |t|
t.string :name
end
create_table :customer_data, force: true do |t|
t.belongs_to :customer, null: false, foreign_key: true
t.string :city
t.datetime :deleted_at
end
end
class CockroachDbApplicationRecord < ActiveRecord::Base
self.abstract_class = true
connects_to database: { writing: :cockroachdb, reading: :cockroachdb }
end
class Customer < CockroachDbApplicationRecord
end
class CustomerData < CockroachDbApplicationRecord
belongs_to :customer
after_save :mark_as_current
# we want to set all other CustomerData to have deleted_at
def mark_as_current
if deleted_at.blank?
ids = CustomerData.where(customer_id: customer_id, deleted_at: nil).where.not(id: id).pluck(:id)
puts 'mark_as_current sleeping'
sleep 5
puts 'mark_as_current sleeping end'
CustomerData.where(id: ids).update_all(deleted_at: Time.current)
end
end
end
customer = Customer.create!(name: 'Name')
CustomerData.create!(customer_id: customer.id, city: 'SF')
customer_data = CustomerData.new(customer_id: customer.id, city: 'NYC')
ActiveRecord::Base.logger = Logger.new(STDOUT)
ActiveRecord::Base.configurations = {
"cockroachdb" => {
adapter: "cockroachdb",
host: "localhost",
port: 26256,
user: "root",
database: "crdb_rollback_test",
}
}
ActiveRecord::Base.establish_connection(:cockroachdb)
class Customer < CockroachDbApplicationRecord
end
class CustomerData < CockroachDbApplicationRecord
belongs_to :customer
after_save :mark_as_current
def mark_as_current
if deleted_at.blank?
ids = CustomerData.where(customer_id: customer_id, deleted_at: nil).where.not(id: id).pluck(:id)
CustomerData.where(id: ids).update_all(deleted_at: Time.current)
end
end
end
customer = Customer.first
customer_data = CustomerData.new(customer_id: customer.id, city: 'NYC')
customer_data.save!
customer_data.save! In the 2nd console, you should see the Transaction being committed without any problem CustomerData.where(customer_id: customer.id, deleted_at: nil) in either console, it will return an empty array, which is the wrong behavior. @BuonOmo Can you help look into it with the above steps to reproduce? Thanks |
@shfung I reproduce the behaviour you are saying (empty array). I'll make it simpler and in a single script to execute in tests. However, it feels like the behaviour is working quite normally. You are not taking advantage of any transaction, I understand how this could fail using any kind of database. So two things:
Async Designrun an def perform(customer_id, cd_id) # should be wrapped in a transaction
Locker.lock("c#{customer_id}-cd#{cd_id}) do # should raise to re-enqueue job if locked
ids = CustomerData.where(customer_id: customer_id, deleted_at: nil).where.not(id: cd_id).pluck(:id)
CustomerData.where(id: ids).update_all(deleted_at: Time.current)
CustomerData.where(id: cd_id).update_all(deleted_at: nil)
end
end Note that I don't fully understand your use case, as I can see that you can only delete data. And my script is kind of un-deleting. Another solution could be something like: cd = CustomerData.find(cd_id)
# Only delete older ones.
CustomerData.where(customer_id: customer_id).where("created_at < ?", cd.created_at).update_all(deleted_at: Time.current) Or just not have this column and use the latest customer data as the valid one... Simple designUse an update lock around the customer (depending on your usual queries, this can lead to overhead): def mark_as_current
if deleted_at.blank?
Customer.find(customer_id).with_lock do
ids = CustomerData.where(customer_id: customer_id, deleted_at: nil).where.not(id: id).pluck(:id)
CustomerData.where(id: ids).update_all(deleted_at: Time.current)
end
end
end Note that the first data created will get the priority in that case. And your could consider a @Piioo I did not see your message, I think it is quite important, and related to the bug I am currently trying to solve in #333. Thank you for the links, I'll have a look at this while finishing my PR. And we'll see if this fixes everything. Basically, the client retry logic is messing with rails internal logic |
@BuonOmo Thanks for the suggestions. We notice some places for improvement in our own implementation that we made changes to, but we still believe that there is a underlying retry issue caused by the gem. Can you elaborate more on what you meant by The Looking into the code, it seems that the db transaction errored out when it's being committed here in the
exception that is caught by the
This exception got raised to the patch by the this gem, and triggered the retry: Lines 25 to 32 in 6a2a593
Unfortunately, because the rollback failed, ActiveRecord actually didn't rollback the status of the db record in memory. So, when the db transaction is being retried, ActiveRecord thinks that the CustomerData was saved already and didn't insert it into the db, but it still execute the after save callback to delete the other CustomerData rows. |
That's for sure, that is the main thing I'm investigating on these days! But I am not sure resolving this issue will fully solve your design problems... These are tricky race condition issues that you'd need to test precisely !
Well if there is a wrapping transaction my comment is useless. The reproduction code you send wasn't using any, hence my comment. Maybe your codebase does wrap every http call in a transaction (default behaviour for rails if I remember correctly). |
We also encountered this bug while testing our application under load. It occurred when we made several concurrent requests to the same Create API (with different values); all the requests returned a success response with an ID for the new record, but only one or two records would actually be committed to the database. After a great deal of debugging, we came to exactly the same conclusion as @shfung did in their second comment: when Rails attempts to rollback the transaction, a new exception is raised and the state of the in-memory record is not reset. According to our testing, this can affect any model with a transactional callback that performs a query on the same table. TLDR: When a commit fails due a Clarifications:
Sequence of events, using the
Reproduction Script @mmalek-sa and I found that this bug can be trivially reproduced using a model validation and concurrent processes, without any need to explicitly # frozen_string_literal: true
Gem.paths = { "GEM_HOME" => Gem.user_dir }
require 'bundler/inline'
gemfile(true) do
source 'https://rubygems.org'
gem 'activerecord', "7.1.4"
gem 'activerecord-cockroachdb-adapter', "~> 7.1.X"
gem 'securerandom'
end
require 'active_record'
require 'activerecord-cockroachdb-adapter'
require 'minitest/autorun'
require 'logger'
require 'securerandom'
# You might need to change the database name or connection string.
ActiveRecord::Base.establish_connection('cockroachdb://root@localhost:26255/testdb?sslmode=disable')
ActiveRecord::Base.logger = Logger.new(STDOUT)
ActiveRecord::Schema.define do
create_table :customers, force: true do |t|
t.string :name
t.string :username
end
end
class Customer < ActiveRecord::Base
validate :validate_unique_username
def validate_unique_username
duplicate = Customer.where(username: username).any?
errors.add("Duplicate username!") if duplicate
end
def to_s
"#{id}:#{name}:#{username}"
end
end
# This monkey patch fixes the bug.
# ActiveRecord::ConnectionAdapters::TransactionManager.class_eval do
# def rollback_transaction(transaction = nil)
# @connection.lock.synchronize do
# transaction ||= @stack.last
# begin
# transaction.rollback
# rescue ActiveRecord::StatementInvalid => err
# # This is important to make ActiveRecord aware the record was not inserted/saved
# # Otherwise AR will assume save was successful and it doesn't retry the transaction
# # See this for more details:
# # https://github.com/cockroachdb/activerecord-cockroachdb-adapter/issues/258#issuecomment-2256633329
# if err.cause.is_a?(PG::NoActiveSqlTransaction)
# puts "rollback_transaction exception PG::NoActiveSqlTransaction, rolling back records #{transaction.records}"
# transaction.rollback_records
# end
#
# raise
# ensure
# @stack.pop if @stack.last == transaction
# end
# transaction.rollback_records
# end
# end
# end
class BugTest < ActiveSupport::TestCase
def test_concurrent_customer_insert_with_processes
concurrency_level = 3
time_to_start = 2.seconds.from_now
pids = []
concurrency_level.times do |i|
pids << Process.fork do
sleep_time = (time_to_start - Time.current).to_f
sleep(sleep_time) if sleep_time > 0
begin
c = Customer.new
c.name = "Customer #{i + 1}"
c.username = SecureRandom.uuid
c.save!
puts "Inserted customer with id #{c.id} successfully!"
rescue => e
puts "Process #{Process.pid} - Error: #{e.message}"
end
end
end
pids.each { |pid| Process.wait(pid) }
puts "Customers #{Customer.all.map { |c| c.to_s }}"
assert_equal concurrency_level, Customer.count
end
end Proposed solution
ActiveRecord::ConnectionAdapters::TransactionManager.class_eval do
def rollback_transaction(transaction = nil)
@connection.lock.synchronize do
transaction ||= @stack.last
begin
transaction.rollback
rescue ActiveRecord::StatementInvalid => err
# This is important to make Active Record aware the record was not inserted/saved
# Otherwise Active Record will assume save was successful and it doesn't retry the transaction
# See this thread for more details:
# https://github.com/cockroachdb/activerecord-cockroachdb-adapter/issues/258#issuecomment-2256633329
transaction.rollback_records if err.cause.is_a?(PG::NoActiveSqlTransaction)
raise
ensure
@stack.pop if @stack.last == transaction
end
transaction.rollback_records
end
end
end |
@arolling-sa thank you for the detailed response. Unfortunately, your reproduction is giving me even worse locally: a segv in the pg gem. I'll be on both issues fast. I believe the segv could be only related to my laptop by a mac m1. I'll try to do both things in parallel:
|
@BuonOmo That's odd, I was testing this on a MacBook M1 as well (I believe @mmalek-sa uses a Macbook M2) and neither of us saw a segv; it could be something specific to your environment. Also, we were using version 7.1.1 of the |
@arolling-sa I'm using EDIT: found the segv culprit ged/ruby-pg#311 (comment), local solution is |
I implemented the test and solution. But I want to check one last thing before we push. This seems to only fail in crdb 24.1 and I want to understand why. Can you confirm you are using this version @arolling-sa ? Ok it might not be that it fails only in 24.1 but that the test does not work aaall the time. I'll figure out a way to make it more robust. And in 23.2 there's no issue |
@BuonOmo - I was using 23.1.25 when I reproduced it, actually, and I believe my colleague tested against multiple CRDB versions. And the initial report that started this thread was from December 2022 - they don't mention the version, but according to the release page the latest possible version they could have been using was 22.2.0. As this is essentially a race condition, it may be affected by local resources and configuration. Have you tried increasing the concurrency level to make it more likely to fail? |
Hallo,
I have not much informations, but maybe you see something I miss.
We have a transaction where we save 7 objects in different tables.
On our staging/production system sometimes the transaction commit breaks with an error.
The transaction is rollbacked and rails try to retry it.
Our problem is, that 2nd (retried) commit saves only 3 of 7 objects in the DB without any error.
And here the errors we get from the Database:
After this rails try to retry begins a new transaction (last log message) and saves only token and role. There are no insert queries for admin or company.
Maybe the others has already an id (because token and role needs the id from the admin) and for this reason they are not saved.
Or company and admin are in cache and token and role not.
Token and Role may be only created when the admin.id is present.
Im trying to write a rspec test, but fail to trigger a retry transaction.
Have you a clue what we may do?
The text was updated successfully, but these errors were encountered: