
Release RowExclusiveLock on pg_dist_transaction as soon as remote xacts are recovered #7863

Merged
merged 5 commits into release-13.0 from avoid-deadlock-xact-recov on Jan 31, 2025

Conversation

@m3hm3t (Contributor) commented Jan 30, 2025

DESCRIPTION: Fixes a bug that might cause a deadlock when upgrading Citus.

As of this PR, after recovering the remote transactions we release the lock on
pg_dist_transaction when closing it, to avoid deadlocks that might occur because
of trying to acquire a lock on pg_dist_authinfo while holding a lock on
pg_dist_transaction. Such a scenario can only cause a deadlock if another transaction
is trying to acquire a strong lock on pg_dist_transaction while holding a lock on
pg_dist_authinfo. As of today, we (implicitly) acquire a strong lock on
pg_dist_transaction only when upgrading Citus to 11.3-1, and this happens when creating
a REPLICA IDENTITY on pg_dist_transaction.

Regardless of the code path we are in, it should be okay to release the lock at that
point, because all we do afterwards is abort the prepared transactions that are not
part of an in-progress distributed transaction, and releasing the lock before doing so
is fine.
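
For illustration, here is a minimal sketch of the pattern, not the actual Citus function (the function name and the surrounding logic are made up for this example): in PostgreSQL, closing the relation with the original lock mode releases the lock immediately, whereas closing it with NoLock keeps the lock until the end of the transaction, which is what previously made the deadlock possible.

```c
/*
 * Minimal sketch (illustrative only, not the actual Citus code path;
 * RecoverTwoPhaseTransactionsSketch is a made-up name). table_open() and
 * table_close() are the standard PostgreSQL catalog-access APIs.
 */
#include "postgres.h"

#include "access/table.h"
#include "utils/rel.h"

static void
RecoverTwoPhaseTransactionsSketch(Oid pgDistTransactionRelationId)
{
	Relation pgDistTransaction =
		table_open(pgDistTransactionRelationId, RowExclusiveLock);

	/* ... scan pg_dist_transaction and recover the remote transactions ... */

	/*
	 * Passing RowExclusiveLock here releases the lock right away. Closing
	 * with NoLock instead would keep the lock until commit, so a concurrent
	 * strong lock on pg_dist_transaction (e.g. the REPLICA IDENTITY command
	 * in the 11.3-1 upgrade script) could deadlock against us.
	 */
	table_close(pgDistTransaction, RowExclusiveLock);

	/*
	 * Only after releasing the lock: abort the prepared transactions that
	 * are not part of an in-progress distributed transaction.
	 */
}
```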

This also changes the blocking behavior between citus_create_restore_point and the
transaction recovery code path: now citus_create_restore_point doesn't wait until
transaction recovery finishes aborting the prepared transactions that are not
part of an in-progress distributed transaction. However, this should be fine because
such overlap was already possible before this change, e.g., if transaction recovery
failed to open a remote connection to a node.

Also fixes a flaky Citus upgrade test.

Finally, upgrades the download-artifact action to 4.1.8 and the upload-artifact action to 4.6.0,
since v3.x.y is no longer supported for either action. While doing that, some changes were needed
to distinguish the artifacts we produce in CI jobs, as documented in
actions/upload-artifact#480.


After merging this PR:

  • Need to backport these two commits to 11.3, 12.0 & 12.1, i.e., the release branches where we want to fix the deadlock issue that happens during Citus upgrades, unless we decide to revert this and go with a more appropriate fix, e.g., stopping the maintenance daemon during Citus upgrades:
  • Need to backport these three commits to any active release branch to get the CI working again:

@m3hm3t (Contributor, Author) commented Jan 30, 2025

pipenv run make check-citus-upgrade-local citus-old-version=v**11.1.1**

-------------------
pgenv current
**15.10**

diff -dU10 -w /workspaces/citus/src/test/regress/expected/upgrade_citus_finish_citus_upgrade_0.out /workspaces/citus/src/test/regress/results/upgrade_citus_finish_citus_upgrade.out
--- /workspaces/citus/src/test/regress/expected/upgrade_citus_finish_citus_upgrade_0.out.modified	2025-01-30 10:01:25.622350900 +0000
+++ /workspaces/citus/src/test/regress/results/upgrade_citus_finish_citus_upgrade.out.modified	2025-01-30 10:01:25.632351501 +0000
@@ -6,25 +6,25 @@
 SELECT substring(:'upgrade_test_old_citus_version', 'v(\d+)\.\d+\.\d+')::int < 11
 AS upgrade_test_old_citus_version_lt_11_0;
  upgrade_test_old_citus_version_lt_11_0 
 ----------------------------------------
  f
 (1 row)
 
 -- this is a transactional procedure, so rollback should be fine
 BEGIN;
 	CALL citus_finish_citus_upgrade();
-NOTICE:  already at the latest distributed schema version (11.3-1)
+NOTICE:  already at the latest distributed schema version (11.1-1)
 ROLLBACK;
 -- do the actual job
 CALL citus_finish_citus_upgrade();
-NOTICE:  already at the latest distributed schema version (11.3-1)
+NOTICE:  already at the latest distributed schema version (11.1-1)
 -- show that the upgrade is successfull
 SELECT metadata->>'last_upgrade_version' = extversion
 FROM pg_dist_node_metadata, pg_extension WHERE extname = 'citus';
  ?column? 
 ----------
  f
 (1 row)
 
 -- idempotent, should be called multiple times
 -- still, do not NOTICE the version as it changes per release
diff -dU10 -w /workspaces/citus/src/test/regress/expected/upgrade_pg_dist_cleanup_after_0.out /workspaces/citus/src/test/regress/results/upgrade_pg_dist_cleanup_after.out
--- /workspaces/citus/src/test/regress/expected/upgrade_pg_dist_cleanup_after_0.out.modified	2025-01-30 10:01:26.042376102 +0000
+++ /workspaces/citus/src/test/regress/results/upgrade_pg_dist_cleanup_after.out.modified	2025-01-30 10:01:26.052376702 +0000
@@ -15,16 +15,14 @@
 -- verify that the orphaned placement is deleted and cleanup record is created
 SELECT COUNT(*) FROM pg_dist_placement WHERE shardid IN (SELECT shardid FROM pg_dist_shard WHERE logicalrelid='table_with_orphaned_shards'::regclass);
  count 
 -------
     32
 (1 row)
 
 SELECT * FROM pg_dist_cleanup;
  record_id | operation_id | object_type | object_name | node_group_id | policy_type 
 -----------+--------------+-------------+-------------+---------------+-------------
-         1 |            0 |           1 | table_with_orphaned_shards_980001 |             1 |           0
-(1 row)
+(0 rows)
 
 CALL citus_cleanup_orphaned_resources();
-NOTICE:  cleaned up 1 orphaned resources
 DROP TABLE table_with_orphaned_shards;

-------------------------------------

pgenv current
**15.0**
also in **15.5**

diff -dU10 -w /workspaces/citus/src/test/regress/expected/upgrade_citus_finish_citus_upgrade_0.out /workspaces/citus/src/test/regress/results/upgrade_citus_finish_citus_upgrade.out
--- /workspaces/citus/src/test/regress/expected/upgrade_citus_finish_citus_upgrade_0.out.modified	2025-01-30 10:07:22.412642606 +0000
+++ /workspaces/citus/src/test/regress/results/upgrade_citus_finish_citus_upgrade.out.modified	2025-01-30 10:07:22.422643193 +0000
@@ -6,31 +6,31 @@
 SELECT substring(:'upgrade_test_old_citus_version', 'v(\d+)\.\d+\.\d+')::int < 11
 AS upgrade_test_old_citus_version_lt_11_0;
  upgrade_test_old_citus_version_lt_11_0 
 ----------------------------------------
  f
 (1 row)
 
 -- this is a transactional procedure, so rollback should be fine
 BEGIN;
 	CALL citus_finish_citus_upgrade();
-NOTICE:  already at the latest distributed schema version (11.3-1)
+NOTICE:  already at the latest distributed schema version (13.0-1)
 ROLLBACK;
 -- do the actual job
 CALL citus_finish_citus_upgrade();
-NOTICE:  already at the latest distributed schema version (11.3-1)
+NOTICE:  already at the latest distributed schema version (13.0-1)
 -- show that the upgrade is successfull
 SELECT metadata->>'last_upgrade_version' = extversion
 FROM pg_dist_node_metadata, pg_extension WHERE extname = 'citus';
  ?column? 
 ----------
- f
+ t
 (1 row)
 
 -- idempotent, should be called multiple times
 -- still, do not NOTICE the version as it changes per release
 SET client_min_messages TO WARNING;
 CALL citus_finish_citus_upgrade();
 -- we should be able to sync metadata in nontransactional way as well
 SET citus.metadata_sync_mode TO 'nontransactional';
 SELECT start_metadata_sync_to_all_nodes();
  start_metadata_sync_to_all_nodes 
diff -dU10 -w /workspaces/citus/src/test/regress/expected/upgrade_pg_dist_cleanup_after_0.out /workspaces/citus/src/test/regress/results/upgrade_pg_dist_cleanup_after.out
--- /workspaces/citus/src/test/regress/expected/upgrade_pg_dist_cleanup_after_0.out.modified	2025-01-30 10:07:22.862668933 +0000
+++ /workspaces/citus/src/test/regress/results/upgrade_pg_dist_cleanup_after.out.modified	2025-01-30 10:07:22.872669517 +0000
@@ -9,22 +9,20 @@
 (1 row)
 
 \gset
 \if :upgrade_test_old_citus_version_gte_11_2
 \q
 \endif
 -- verify that the orphaned placement is deleted and cleanup record is created
 SELECT COUNT(*) FROM pg_dist_placement WHERE shardid IN (SELECT shardid FROM pg_dist_shard WHERE logicalrelid='table_with_orphaned_shards'::regclass);
  count 
 -------
-    32
+    33
 (1 row)
 
 SELECT * FROM pg_dist_cleanup;
  record_id | operation_id | object_type | object_name | node_group_id | policy_type 
 -----------+--------------+-------------+-------------+---------------+-------------
-         1 |            0 |           1 | table_with_orphaned_shards_980001 |             1 |           0
-(1 row)
+(0 rows)
 
 CALL citus_cleanup_orphaned_resources();
-NOTICE:  cleaned up 1 orphaned resources
 DROP TABLE table_with_orphaned_shards;




@onurctirtir onurctirtir changed the title Release RowExclusiveLock on pg_dist_transaction as soon as remote xac… Release RowExclusiveLock on pg_dist_transaction as soon as remote xacts are recovered Jan 30, 2025
@onurctirtir onurctirtir marked this pull request as ready for review January 30, 2025 11:16
@onurctirtir (Member)

Given the presence of the upgrade script on versions >= 11.3, this needs to be backported at least to versions 11.3, 12.0 and 12.1.

@colm-mchugh (Contributor)

Running in my local environment with PG 15.7, make check-citus-upgrade-local citus-old-version=v11.1.5 shows diffs:

test upgrade_citus_finish_citus_upgrade ... FAILED    
test upgrade_pg_dist_cleanup_after ... FAILED      
test upgrade_basic_after          ... ok       
test upgrade_post_11_after        ... ok           
diff -dU10 -w /workspaces/citus/src/test/regress/expected/upgrade_citus_finish_citus_upgrade_0.out /workspaces/citus/src/test/regress/results/upgrade_citus_finish_citus_upgrade.out
--- /workspaces/citus/src/test/regress/expected/upgrade_citus_finish_citus_upgrade_0.out.modified 2025-01-30 11:32:08.522652721 +0000
+++ /workspaces/citus/src/test/regress/results/upgrade_citus_finish_citus_upgrade.out.modified  2025-01-30 11:32:08.532752002 +0000
@@ -6,25 +6,25 @@
 SELECT substring(:'upgrade_test_old_citus_version', 'v(\d+)\.\d+\.\d+')::int < 11
 AS upgrade_test_old_citus_version_lt_11_0;
  upgrade_test_old_citus_version_lt_11_0 
 ----------------------------------------
  f
 (1 row)
 
 -- this is a transactional procedure, so rollback should be fine
 BEGIN;
        CALL citus_finish_citus_upgrade();
-NOTICE:  already at the latest distributed schema version (11.3-1)
+NOTICE:  already at the latest distributed schema version (11.1-1)
 ROLLBACK;
 -- do the actual job
 CALL citus_finish_citus_upgrade();
-NOTICE:  already at the latest distributed schema version (11.3-1)
+NOTICE:  already at the latest distributed schema version (11.1-1)
 -- show that the upgrade is successfull
 SELECT metadata->>'last_upgrade_version' = extversion
 FROM pg_dist_node_metadata, pg_extension WHERE extname = 'citus';
  ?column? 
 ----------
  f
 (1 row)
 
 -- idempotent, should be called multiple times
 -- still, do not NOTICE the version as it changes per release
diff -dU10 -w /workspaces/citus/src/test/regress/expected/upgrade_pg_dist_cleanup_after_0.out /workspaces/citus/src/test/regress/results/upgrade_pg_dist_cleanup_after.out
--- /workspaces/citus/src/test/regress/expected/upgrade_pg_dist_cleanup_after_0.out.modified    2025-01-30 11:32:08.916524662 +0000
+++ /workspaces/citus/src/test/regress/results/upgrade_pg_dist_cleanup_after.out.modified       2025-01-30 11:32:08.926623944 +0000
@@ -15,16 +15,14 @@
 -- verify that the orphaned placement is deleted and cleanup record is created
 SELECT COUNT(*) FROM pg_dist_placement WHERE shardid IN (SELECT shardid FROM pg_dist_shard WHERE logicalrelid='table_with_orphaned_shards'::regclass);
  count 
 -------
     32
 (1 row)
 
 SELECT * FROM pg_dist_cleanup;
  record_id | operation_id | object_type | object_name | node_group_id | policy_type 
 -----------+--------------+-------------+-------------+---------------+-------------
-         1 |            0 |           1 | table_with_orphaned_shards_980001 |             1 |           0
-(1 row)
+(0 rows)
 
 CALL citus_cleanup_orphaned_resources();
-NOTICE:  cleaned up 1 orphaned resources
 DROP TABLE table_with_orphaned_shards;

With PG15.6 and PG15.7 the same test (make check-citus-upgrade-local citus-old-version=v11.1.5) has diffs:

test upgrade_citus_finish_citus_upgrade ... FAILED    
test upgrade_pg_dist_cleanup_after ... FAILED      
test upgrade_basic_after          ... ok       
test upgrade_post_11_after        ... ok           
diff -dU10 -w /workspaces/citus/src/test/regress/expected/upgrade_citus_finish_citus_upgrade_0.out /workspaces/citus/src/test/regress/results/upgrade_citus_finish_citus_upgrade.out
--- /workspaces/citus/src/test/regress/expected/upgrade_citus_finish_citus_upgrade_0.out.modified 2025-01-30 11:35:06.978462496 +0000
+++ /workspaces/citus/src/test/regress/results/upgrade_citus_finish_citus_upgrade.out.modified  2025-01-30 11:35:06.998801716 +0000
@@ -6,31 +6,31 @@
 SELECT substring(:'upgrade_test_old_citus_version', 'v(\d+)\.\d+\.\d+')::int < 11
 AS upgrade_test_old_citus_version_lt_11_0;
  upgrade_test_old_citus_version_lt_11_0 
 ----------------------------------------
  f
 (1 row)
 
 -- this is a transactional procedure, so rollback should be fine
 BEGIN;
        CALL citus_finish_citus_upgrade();
-NOTICE:  already at the latest distributed schema version (11.3-1)
+NOTICE:  already at the latest distributed schema version (13.0-1)
 ROLLBACK;
 -- do the actual job
 CALL citus_finish_citus_upgrade();
-NOTICE:  already at the latest distributed schema version (11.3-1)
+NOTICE:  already at the latest distributed schema version (13.0-1)
 -- show that the upgrade is successfull
 SELECT metadata->>'last_upgrade_version' = extversion
 FROM pg_dist_node_metadata, pg_extension WHERE extname = 'citus';
  ?column? 
 ----------
- f
+ t
 (1 row)
 
 -- idempotent, should be called multiple times
 -- still, do not NOTICE the version as it changes per release
 SET client_min_messages TO WARNING;
 CALL citus_finish_citus_upgrade();
 -- we should be able to sync metadata in nontransactional way as well
 SET citus.metadata_sync_mode TO 'nontransactional';
 SELECT start_metadata_sync_to_all_nodes();
  start_metadata_sync_to_all_nodes 
diff -dU10 -w /workspaces/citus/src/test/regress/expected/upgrade_pg_dist_cleanup_after_0.out /workspaces/citus/src/test/regress/results/upgrade_pg_dist_cleanup_after.out
--- /workspaces/citus/src/test/regress/expected/upgrade_pg_dist_cleanup_after_0.out.modified    2025-01-30 11:35:07.627223000 +0000
+++ /workspaces/citus/src/test/regress/results/upgrade_pg_dist_cleanup_after.out.modified       2025-01-30 11:35:07.627223000 +0000
@@ -9,22 +9,20 @@
 (1 row)
 
 \gset
 \if :upgrade_test_old_citus_version_gte_11_2
 \q
 \endif
 -- verify that the orphaned placement is deleted and cleanup record is created
 SELECT COUNT(*) FROM pg_dist_placement WHERE shardid IN (SELECT shardid FROM pg_dist_shard WHERE logicalrelid='table_with_orphaned_shards'::regclass);
  count 
 -------
-    32
+    33
 (1 row)
 
 SELECT * FROM pg_dist_cleanup;
  record_id | operation_id | object_type | object_name | node_group_id | policy_type 
 -----------+--------------+-------------+-------------+---------------+-------------
-         1 |            0 |           1 | table_with_orphaned_shards_980001 |             1 |           0
-(1 row)
+(0 rows)
 
 CALL citus_cleanup_orphaned_resources();
-NOTICE:  cleaned up 1 orphaned resources
 DROP TABLE table_with_orphaned_shards;

Similar results when running make check-citus-upgrade-local citus-old-version=v11.1.0. No deadlocks in any tests.

@onurctirtir (Member)

> Running in my local environment with PG 15.7, make check-citus-upgrade-local citus-old-version=v11.1.5 shows diffs: [...] Similar results when running make check-citus-upgrade-local citus-old-version=v11.1.0. No deadlocks in any tests.

Just fixed one of those tests in the latest commit.

@onurctirtir (Member) commented Jan 30, 2025

It's quite strange that we have this diff:

(Note that we should compare against upgrade_pg_dist_cleanup_after_0.out when investigating this test failure, not upgrade_pg_dist_cleanup_after.out.)

--- a/src/test/regress/expected/upgrade_pg_dist_cleanup_after_0.out
+++ b/src/test/regress/expected/upgrade_pg_dist_cleanup_after_0.out
@@ -20,11 +20,9 @@ SELECT COUNT(*) FROM pg_dist_placement WHERE shardid IN (SELECT shardid FROM pg_
 (1 row)
 
 SELECT * FROM pg_dist_cleanup;
- record_id | operation_id | object_type |            object_name            | node_group_id | policy_type
+ record_id | operation_id | object_type | object_name | node_group_id | policy_type
 ---------------------------------------------------------------------
-         1 |            0 |           1 | table_with_orphaned_shards_980001 |             1 |           0
-(1 row)
+(0 rows)
 
 CALL citus_cleanup_orphaned_resources();
-NOTICE:  cleaned up 1 orphaned resources
 DROP TABLE table_with_orphaned_shards;

We should already be registering the orphaned shards into pg_dist_cleanup in citus--11.1-1--11.2-1.sql:

-- drop orphaned shards after inserting records for them into pg_dist_cleanup
INSERT INTO pg_dist_cleanup
    SELECT nextval('pg_dist_cleanup_recordid_seq'), 0, 1, shard_name(sh.logicalrelid, sh.shardid) AS object_name, plc.groupid AS node_group_id, 0
        FROM pg_dist_placement plc
        JOIN pg_dist_shard sh ON sh.shardid = plc.shardid
        WHERE plc.shardstate = 4;

@onurctirtir onurctirtir self-requested a review January 30, 2025 12:58
@onurctirtir onurctirtir enabled auto-merge (squash) January 30, 2025 12:59
@onurctirtir (Member) left a comment


Taking my approval back until we figure out what to do with the isolation test failures.

@onurctirtir onurctirtir disabled auto-merge January 31, 2025 11:13
Commit: Release RowExclusiveLock on pg_dist_transaction as soon as remote xacts are recovered (the commit message repeats the PR description above).
@onurctirtir onurctirtir force-pushed the avoid-deadlock-xact-recov branch 12 times, most recently from 1d9d384 to cf49a07 Compare January 31, 2025 16:09
@onurctirtir onurctirtir force-pushed the avoid-deadlock-xact-recov branch from cf49a07 to 0d4c676 Compare January 31, 2025 22:03
@citusdata citusdata deleted a comment from codecov bot Jan 31, 2025
@onurctirtir onurctirtir self-requested a review January 31, 2025 22:43
@onurctirtir onurctirtir merged commit 26f16a7 into release-13.0 Jan 31, 2025
153 checks passed
@onurctirtir onurctirtir deleted the avoid-deadlock-xact-recov branch January 31, 2025 22:49

codecov bot commented Jan 31, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.49%. Comparing base (c55bc8c) to head (0d4c676).
Report is 11 commits behind head on release-13.0.

Additional details and impacted files
@@               Coverage Diff                @@
##           release-13.0    #7863      +/-   ##
================================================
+ Coverage         89.48%   89.49%   +0.01%     
================================================
  Files               276      276              
  Lines             60063    60067       +4     
  Branches           7524     7524              
================================================
+ Hits              53747    53758      +11     
+ Misses             4166     4160       -6     
+ Partials           2150     2149       -1     

* transactions that are not part of an in-progress distributed transaction and
* releasing the lock before doing so should be just fine.
*/
table_close(pgDistTransaction, RowExclusiveLock);
@marcoslot (Collaborator)

Allowing DDL commands to go through unencumbered before this transaction commits seems a bit risky / undefined.

(Member)

Thanks a lot for the heads-up @marcoslot! We'll certainly re-think this before moving forward.

@emelsimsek (Contributor) commented Feb 3, 2025

> Allowing DDL commands to go through unencumbered before this transaction commits seems a bit risky / undefined.

@marcoslot, do you mean there is a risk of partially committing propagated DDL commands with this change? We would appreciate it tremendously if you could elaborate on the risk and provide the scenario of concern for our testing. Thanks a lot.

5 participants