Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nogil __len__ and __eq__ break atomicity on sets and dicts #129519

Open
luisggpina opened this issue Jan 31, 2025 · 2 comments
Open

nogil __len__ and __eq__ break atomicity on sets and dicts #129519

luisggpina opened this issue Jan 31, 2025 · 2 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-free-threading type-bug An unexpected behavior, bug, or error

Comments

@luisggpina
Copy link

luisggpina commented Jan 31, 2025

Bug report

Bug description:

Hi,

We're a research group focused on testing concurrent runtimes. Our work-in-progress prototype found a violation of atomicity on the current nogil build when using set __len__ or __eq__ with other concurrent operations on the same set/dict. The program below shows the wrong behavior for __len__ and __ixor__:

import threading
import sys
def t0(b1,s,res):
    b1.wait()
    res.append(s.__len__())

def t1(b1,s,res):
    b1.wait()
    s.__ixor__({0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 'a'})

def Test():
  s =  {17, 'a', 'b', 18, 'c', 'd', 'e'}
  threads=[]
  barrier = threading.Barrier(2)
  res = []
  threads.append(threading.Thread(target= t0, args=(barrier, s,res)))
  threads.append(threading.Thread(target= t1, args=(barrier, s,res)))
  for i in range(0, len(threads)):
      threads[i].start()
  for i in range(0, len(threads)):
      threads[i].join()
  if res[0] not in { 7 , 32 }:
      print("found bug: " + str(res[0]))

print("test begin...")
for i in range(0,50000):
  threads = []
  if i % 1000 == 0:
      print(i)
  for i in range(0,100):
      threads.append(threading.Thread(target= Test))
  for t in threads:
      t.start()
  for t in threads:
      t.join()
print("test Done")

Operation __len__ should see the initial set and return 7 or the set after the __ixor__ completes and return 32. However, it can see internal changes made by __ixor__ and return different numbers. Here's an example execution:

test begin...
0
found bug: 9
found bug: 17
found bug: 22
found bug: 30
found bug: 9
found bug: 9
found bug: 9
found bug: 9
found bug: 9

We have observed the same problem when __len__ runs concurrently with other operations: symmetric_difference_update, __isub__, and update. We'll share these test below in a comment.

We observed the same problem when dict's __len__ runs concurrently with __ior__:

import threading
import sys
def t0(b1,s,res):
    b1.wait()
    res.append(s.__len__())
​
def t1(b1,s,res):
    b1.wait()
    s.__ior__({0: 0, 1: 1, 2: 2, 3: 3, 4: 4})
​
def Test():
  d =  {'k1': 'v1', 'k2': 'v2', 'k3': 3, 10: 10}
  threads=[]
  barrier = threading.Barrier(2)
  res = []
  threads.append(threading.Thread(target= t0, args=(barrier, d,res)))
  threads.append(threading.Thread(target= t1, args=(barrier, d,res)))
  for i in range(0, len(threads)):
      threads[i].start()
  for i in range(0, len(threads)):
      threads[i].join()
  if res[0] not in { 4, 9 }:
      print("found bug: " + str(res[0]))
​
print("test begin...")
for i in range(0,50000):
  threads = []
  if i % 1000 == 0:
      print(i)
  for i in range(0,100):
      threads.append(threading.Thread(target= Test))
  for t in threads:
      t.start()
  for t in threads:
      t.join()
print("test Done")

Again, the length of the dict should be either 4 or 9, depending on whether __ior__ took place. However, __len__ can see internal changes from __ior__ as shown by the output:

test begin...
0
found bug: 5
found bug: 6
found bug: 5
found bug: 5
found bug: 8
found bug: 7

Finally, we observed set's __eq__ to be able to access intermediate results of __isub__, as shown by the program below:

import threading
import sys
def t0(b1,s,res):
    b1.wait()
    s.__isub__({1, 'b', 'd', 17})

def t1(b1,s,res):
    b1.wait()
    res.append(s.__eq__({'a', 'b', 18, 'c', 'd', 'e'}))

def Test():
  s =  {'a', 17, 'b', 18, 'c', 'd', 'e'}
  threads=[]
  barrier = threading.Barrier(2)
  res = []
  threads.append(threading.Thread(target= t0, args=(barrier, s,res)))
  threads.append(threading.Thread(target= t1, args=(barrier, s,res)))
  for i in range(0, len(threads)):
      threads[i].start()
  for i in range(0, len(threads)):
      threads[i].join()
  if res[0] != False:
      print("found bug: " + str(res[0]))

print("test begin...")
for i in range(0,50000):
  threads = []
  if i % 1000 == 0:
      print(i)
  for i in range(0,100):
      threads.append(threading.Thread(target= Test))
  for t in threads:
      t.start()
  for t in threads:
      t.join()
print("test Done")

The original and resulting sets are never equal to the argument passed to __eq__, however the operation can see internal changes from __isub__ as shown by the ouput:

test begin...
0
found bug: True
found bug: True
found bug: True
found bug: True
found bug: True
found bug: True

@flypoodles and @overlorde are part of the team, adding them so they get notified about further discussion.

We note that we observed these outputs in several x86_64 machines and one ARM machine.

CPython versions tested on:

CPython main branch, 3.14

Operating systems tested on:

Linux

@luisggpina luisggpina added the type-bug An unexpected behavior, bug, or error label Jan 31, 2025
@luisggpina
Copy link
Author

luisggpina commented Jan 31, 2025

Adding the tests mentioned in the main issue here, with sample outputs.

__len__ and symmetric_difference_update:

import threading
import sys
def t0(b1,s,res):
    b1.wait()
    res.append(s.__len__())

def t1(b1,s,res):
    b1.wait()
    s.symmetric_difference_update({3, 4, 'b', 'e', 'f'})

def Test():
  s =  {1, 2, 'a', 'b', 'c', 'd', 'e'}
  threads=[]
  barrier = threading.Barrier(2)
  res = []
  threads.append(threading.Thread(target= t0, args=(barrier, s,res)))
  threads.append(threading.Thread(target= t1, args=(barrier, s,res)))
  for i in range(0, len(threads)):
      threads[i].start()
  for i in range(0, len(threads)):
      threads[i].join()
  if res[0] not in { 7, 8 }:
      print("found bug: " + str(res))

print("test begin...")
for i in range(0,50000):
  threads = []
  if i % 1000 == 0:
      print(i)
  for i in range(0,100):
      threads.append(threading.Thread(target= Test))
  for t in threads:
      t.start()
  for t in threads:
      t.join()
print("test Done")

Output:

test begin...
0
found bug: [9]
found bug: [9]
found bug: [9]
found bug: [9]
found bug: [9]
found bug: [9]
found bug: [9]
found bug: [9]
found bug: [9]
found bug: [9]
found bug: [9]
found bug: [9]
found bug: [9]

__len__ and __isub__ (this one is rare, you may have to run it for a while to observe the behavior we mention; ARM hardware seems to observe it more often):

import threading
import sys
def t0(b1,s,res):
    b1.wait()
    s.__isub__({1, 'b', 17, 'd'})
​
def t1(b1,s,res):
    b1.wait()
    res.append(s.__len__())
​
def Test():
  s =  {'a', 17, 18, 'b', 'c', 'd', 'f'}
  threads=[]
  barrier = threading.Barrier(2)
  res = []
  threads.append(threading.Thread(target= t0, args=(barrier, s,res)))
  threads.append(threading.Thread(target= t1, args=(barrier, s,res)))
  for i in range(0, len(threads)):
      threads[i].start()
  for i in range(0, len(threads)):
      threads[i].join()
  if res[0] not in { 4, 7 }:
      print("found bug: " + str(res[0]))
​
print("test begin...")
for i in range(0,50000):
  threads = []
  if i % 1000 == 0:
      print(i)
  for i in range(0,100):
      threads.append(threading.Thread(target= Test))
  for t in threads:
      t.start()
  for t in threads:
      t.join()
print("test Done")

Output:

test begin...
0
1000
2000
3000
found bug: 5
4000
found bug: 5

__len__ and update:

import threading
import sys
def t0(b1,s,res):
    b1.wait()
    res.append(s.__len__())

def t1(b1,s,res):
    b1.wait()
    s.update({0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 'a'})

def Test():
  s =  {17, 18, 'a', 'b', 'c', 'd', 'e'}
  threads=[]
  barrier = threading.Barrier(2)
  res = []
  threads.append(threading.Thread(target= t0, args=(barrier, s,res)))
  threads.append(threading.Thread(target= t1, args=(barrier, s,res)))
  for i in range(0, len(threads)):
      threads[i].start()
  for i in range(0, len(threads)):
      threads[i].join()
  if res[0] not in { 7, 35 }:
      print("found bug: " + str(res[0]))

print("test begin...")
for i in range(0,50000):
  threads = []
  if i % 1000 == 0:
      print(i)
  for i in range(0,100):
      threads.append(threading.Thread(target= Test))
  for t in threads:
      t.start()
  for t in threads:
      t.join()
print("test Done")

Output:

test begin...
0
found bug: 24
found bug: 24
found bug: 21
found bug: 24
found bug: 32
found bug: 25
found bug: 28

@ZeroIntensity
Copy link
Member

There's two possibilities here, as far as I can tell:

  • The lockless-ness of the set implementation is causing issues--the length isn't protected by a lock, so it's free to be read by another thread before the set finishes the update. Fixing this would involve extra locking, which would hurt performance; I'm not convinced it'd be worth doing that for this single use case.
  • There might be an actual race on the used field of set objects. An atomic load/store is used for most cases, but some functions (e.g. set_merge_lock_held) read it non-atomically and rely on the critical section. I don't think that's thread-safe, because not all functions (including __len__ in this case) hold the lock.

@picnixz picnixz added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Jan 31, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-free-threading type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

4 participants