Change psql concurrency from autocommit to serializable. #1190

Open: acceso wants to merge 2 commits into base: master

acceso commented May 29, 2018

Autocommit mode is causing deadlocks in PostgreSQL when operations are sent in parallel. Using serializable transactions seems to fix it.

Example:

ERROR:  deadlock detected
DETAIL:  Process 176184 waits for ShareLock on transaction 15529683; blocked by process 191002.
 Process 191002 waits for ShareLock on transaction 15529684; blocked by process 178678.
 Process 178678 waits for ExclusiveLock on tuple (1386,16) of relation 43815 of database 16391; blocked by process 176184.
 Process 176184: DELETE FROM ip_net_plan AS p WHERE vrf_id = 0 AND prefix = '10.0.10.240/28'
 Process 191002: DELETE FROM ip_net_plan AS p WHERE vrf_id = 0 AND prefix = '10.0.11.0/28'
 Process 178678: DELETE FROM ip_net_plan AS p WHERE vrf_id = 0 AND prefix = '10.0.10.208/28'
HINT:  See server log for query details.
CONTEXT:  while locking tuple (1386,16) in relation "ip_net_plan"
 SQL statement "UPDATE ip_net_plan SET children =
      (SELECT COUNT(1)
      FROM ip_net_plan
      WHERE vrf_id = OLD.vrf_id
       AND iprange(prefix) << iprange(old_parent.prefix)
       AND indent = old_parent.indent+1)
     WHERE id = old_parent.id"
 PL/pgSQL function tf_ip_net_plan__prefix_iu_after() line 92 at SQL statement
STATEMENT:  DELETE FROM ip_net_plan AS p WHERE vrf_id = 0 AND prefix = '10.0.10.240/28'
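
Reading the DETAIL lines as a wait-for graph makes the cycle explicit: each process waits on a lock held by the next one, and the last waits on the first. The sketch below assumes the DETAIL lines have exactly the format shown in the log above; `wait_for_graph` and `find_cycle` are hypothetical helper names, not part of this project or of PostgreSQL.

```python
import re

# Matches lines such as:
#   Process 176184 waits for ShareLock on transaction 15529683; blocked by process 191002.
WAIT_RE = re.compile(r"Process (\d+) waits for .*?; blocked by process (\d+)\.")


def wait_for_graph(log_text):
    """Map each waiting PID to the PID that blocks it."""
    return {int(w): int(b) for w, b in WAIT_RE.findall(log_text)}


def find_cycle(graph, start):
    """Follow blocked-by edges from start; return the cycle as a list, or None."""
    path = [start]
    node = graph.get(start)
    while node is not None and node not in path:
        path.append(node)
        node = graph.get(node)
    if node is None:
        return None
    return path[path.index(node):] + [node]


DETAIL = """\
DETAIL:  Process 176184 waits for ShareLock on transaction 15529683; blocked by process 191002.
 Process 191002 waits for ShareLock on transaction 15529684; blocked by process 178678.
 Process 178678 waits for ExclusiveLock on tuple (1386,16) of relation 43815 of database 16391; blocked by process 176184.
"""

graph = wait_for_graph(DETAIL)
print(find_cycle(graph, 176184))
# [176184, 191002, 178678, 176184]
```

All three processes are running a DELETE on `ip_net_plan`, and the CONTEXT line shows that the contended tuple is being updated by the trigger function, which is probably why deletes of sibling prefixes collide on the same parent row.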

This problem is happening pretty often in our deployment (several times per day).

I can easily reproduce it with this simple (and partial) Python code. list_of_prefixes is a file with a prefix on each line:

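    import multiprocessing

    # delete_prefix(ipam, prefix) and ipam are defined elsewhere (not shown).
    with open('list_of_prefixes') as f:
        prefixes = [line.strip() for line in f if line.strip()]

    # One process per prefix, so all deletes hit the database concurrently.
    procs = {
        prefix: multiprocessing.Process(target=delete_prefix, args=(ipam, prefix))
        for prefix in prefixes
    }
    for proc in procs.values():
        proc.start()
    for proc in procs.values():
        proc.join()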

I suspect this patch only hides the problem, but changing every query, transaction, and function would be much harder.

@@ -771,7 +771,7 @@ def _connect_db(self):
         while True:
             try:
                 self._con_pg = psycopg2.connect(**db_args)
-                self._con_pg.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
+                self._con_pg.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_SERIALIZABLE)

line too long (98 > 79 characters)


acceso commented Jun 7, 2018

I have more feedback about this change. It works if concurrency is low, but when concurrency increases, the deadlocks cause timeouts.

I think the actual problem is harder to fix: there are concurrency issues in the database code. Unfortunately a fix for this is outside my reach.

So far we have mitigated the problem by adding retries in the frontend code. This hides the problem from our users.
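
The retry mitigation can be sketched roughly as follows. This is an assumption about its general shape, not the actual frontend code: `TransientDBError`, `with_retries`, and the backoff parameters are hypothetical. With psycopg2, the exception to catch would likely be `psycopg2.extensions.TransactionRollbackError`, which covers both deadlocks (SQLSTATE 40P01) and serialization failures (SQLSTATE 40001); a stand-in class is used here so the sketch runs without a database.

```python
import random
import time


class TransientDBError(Exception):
    """Stand-in for a deadlock or serialization failure (hypothetical)."""


def with_retries(operation, attempts=5, base_delay=0.05, sleep=time.sleep):
    """Call operation(); on a transient error, back off and try again."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TransientDBError:
            if attempt == attempts:
                raise  # give up and surface the error to the caller
            # Exponential backoff with jitter, so competing clients
            # do not all retry at the same instant.
            delay = base_delay * (2 ** (attempt - 1))
            sleep(delay * random.uniform(0.5, 1.5))


# Usage: an operation that fails twice, then succeeds.
calls = {"n": 0}


def flaky_delete():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientDBError("deadlock detected")
    return "deleted"


print(with_retries(flaky_delete, sleep=lambda s: None))  # deleted
```

On a connection that is not in autocommit mode, a real operation would also need to roll back the failed transaction before the next attempt.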


acceso commented Jul 1, 2018

Here is a different patch that undoes the previous change and locks the tables for the deletes. We have tested it for one week with no complaints so far. Performance is around 50% lower, but we see no more deadlocks.
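
The patch itself is not shown here, so the following is only a guess at the general technique: take an explicit table lock inside a transaction so that concurrent deletes run one at a time. The function name `delete_prefix_locked`, the choice of `EXCLUSIVE` lock mode, and the `conn` parameter (assumed to be a DB-API connection that is not in autocommit mode, so the first statement opens a transaction) are all assumptions. `LOCK TABLE` is only useful inside a transaction block, because the lock is released when the transaction ends.

```python
def delete_prefix_locked(conn, vrf_id, prefix):
    """Delete one prefix while holding an exclusive lock on ip_net_plan."""
    cur = conn.cursor()
    try:
        # EXCLUSIVE mode blocks other writers (and other takers of this
        # lock) until the transaction ends, while still allowing plain SELECTs.
        cur.execute("LOCK TABLE ip_net_plan IN EXCLUSIVE MODE")
        cur.execute(
            "DELETE FROM ip_net_plan WHERE vrf_id = %s AND prefix = %s",
            (vrf_id, prefix),
        )
        conn.commit()  # releases the table lock
    except Exception:
        conn.rollback()  # also releases the lock
        raise
    finally:
        cur.close()
```

Serializing the deletes this way trades throughput for the absence of lock cycles, which would fit the slowdown reported above.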
