Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Order-preserving CommonDictionaryStorage #1908

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

slozier
Copy link
Contributor

@slozier slozier commented Feb 7, 2025

Cherry-picked my PR for 3.6. It's a nice feature to have so why not include it in 3.4...

@slozier slozier marked this pull request as ready for review February 7, 2025 01:34
@BCSharp
Copy link
Member

BCSharp commented Feb 7, 2025

Oh, it is not simple code. Looking at the diffs on GitHub I can't see how the thread safety is ensured without locking. I'll have to check it out locally.

@slozier
Copy link
Contributor Author

slozier commented Feb 7, 2025

Oh, it is not simple code. Looking at the diffs on GitHub I can't see how the thread safety is ensured without locking. I'll have to check it out locally.

I don't think I altered the locking strategy that was currently in place, however now that I look at it I think switching _buckets to a List<Bucket> might have broken it...

Anyway, I need to review this myself since I wrote it two years ago and am a bit foggy on the details. But I would still appreciate a review if you feel like providing one.

Side note, net6.0.CPython.test_keyword has been failing intermittently on macOS during Azure CI. I was going to blame a concurrency issue, but it's already annotated with NotParallelSafe=true. I haven't been able to reproduce it (and it's unclear why it would only occur on the mac).

@BCSharp BCSharp self-requested a review February 8, 2025 02:21
Copy link
Member

@BCSharp BCSharp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, unfortunately, I think the way it currently is, it is not thread-safe. The main challenge is because of the change from one structure containing data (Bucket[] _buckets) to two (int[] _indices and List<Bucket> _buckets). What was before an atomic operation on _buckets, now is sometimes split into two that are not atomic.

For instance, the call TryGetValue(_buckets, key, out value) would atomically grab field _buckets, and as long as operations that may disrupt readers were done on a new array, which then would be atomically assigned to the field, things were fine.

Now, the call TryGetValue(_indices, _buckets, key, out value) grabs two fields: _indices and _buckets in indeterminate order, that may not be in sync witch each other.

There are various ways of handling it; in my edit suggestions I propose the way based on the following rules:

  1. There is never a situation that _indices point to some bucket that does not exist. The opposite is OK - there may be a bucket that is not indexed yet.
    1. If something is added, it is first added to _buckets, and only after to _indices.
    2. If something is removed, it is first removed from _indices, only then from _buckets, and not really removed but marked as DUMMY/Bucket.Removed.
    3. List _buckets never gets shorter (except for Clear() - special case).
  2. If both _indices and _buckets have to be replaced by new objects (e.g. to shorten them like in Clear()), it is done in a predictable order to ensure Rule 1 is always satisfied.
    1. If they are reset, _indices are reset first, then _buckets.
    2. If they are read, _buckets are read first, then _indices.

Rule 2 is to ensure that we never get _indices that refer to _buckets that are not (or no longer) around. In other words, if we get a torn read of _indices and _buckets for the lookup, it is always new _indices and old _buckets.

To ensure proper ordering, Thread.MemoryBarrier() was needed in more places than before.

In my code suggestions I tried to demonstrate what I mean by all this. I hope I got all the relevant places, but don't take my word for it.

return false;
}
public override bool TryGetValue(object key, out object value)
=> TryGetValue(_indices, _buckets, key, out value);

/// <summary>
/// Static helper to try and get the value from the dictionary.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not static anymore.

Comment on lines 433 to 434
/// Used so the value lookup can run against a buckets while a writer
/// replaces the buckets.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This description seems obsolete.

@@ -226,7 +209,7 @@ private void UpdateHelperFunctions(Type t, object key) {

// we need to clone the buckets so any lock-free readers will only see
// the old buckets which are homogeneous
_buckets = (Bucket[])_buckets.Clone();
_indices = (int[])_indices.Clone();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
_indices = (int[])_indices.Clone();
_buckets = new List<Bucket>(_buckets);

Comment on lines +254 to +255
_indices = new int[(int)(size / Load) + 1];
_indices.AsSpan().Fill(FREE);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
_indices = new int[(int)(size / Load) + 1];
_indices.AsSpan().Fill(FREE);
var newIndices = new int[(int)(size / Load) + 1];
newIndices.AsSpan().Fill(FREE);
_indices = newIndices;

Comment on lines +324 to +326
indices[pair.Key] = buckets.Count;
buckets.Add(bucket);
_count++;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
indices[pair.Key] = buckets.Count;
buckets.Add(bucket);
_count++;
_count++;
buckets.Add(bucket);
Thread.MemoryBarrier();
indices[pair.Key] = buckets.Count;

Comment on lines +400 to +402
_indices[pair.Key] = DUMMY;
_buckets[pair.Value] = Bucket.Removed;
Thread.MemoryBarrier();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
_indices[pair.Key] = DUMMY;
_buckets[pair.Value] = Bucket.Removed;
Thread.MemoryBarrier();
_indices[pair.Key] = DUMMY;
Thread.MemoryBarrier();
_buckets[pair.Value] = Bucket.Removed;

Comment on lines +427 to +428
public override bool TryGetValue(object key, out object value)
=> TryGetValue(_indices, _buckets, key, out value);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public override bool TryGetValue(object key, out object value)
=> TryGetValue(_indices, _buckets, key, out value);
public override bool TryGetValue(object key, out object value) {
var buckets = _buckets;
Thread.MemoryBarrier();
var indices = _indices;
return TryGetValue(indices, buckets, key, out value);
}

Comment on lines +468 to +470
_indices = new int[8];
_indices.AsSpan().Fill(FREE);
_buckets.Clear();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_buckets.Clear() is not enough. A new list object is needed not to mess up readers in progress.

Suggested change
_indices = new int[8];
_indices.AsSpan().Fill(FREE);
_buckets.Clear();
var newIndices = new int[InitialBucketSize];
newIndices.AsSpan().Fill(FREE);
_indices = newIndices;
Thread.MemoryBarrier();
_buckets = new List<Bucket>();

@BCSharp
Copy link
Member

BCSharp commented Feb 8, 2025

Side note, net6.0.CPython.test_keyword has been failing intermittently on macOS during Azure CI. I was going to blame a concurrency issue, but it's already annotated with NotParallelSafe=true. I haven't been able to reproduce it (and it's unclear why it would only occur on the mac).

When I looked at the error messages from the failing test, it surely looked like a typical example of a concurrent execution of a test that is clearly not parallel-safe: the test creates a test file with a fixed name, uses it, and then deletes it. The error was "file not found" when trying to use the test file. It is exactly the same cause in all failures of test_keyword that I looked into. If I had to guess, I'd say that the mechanism intended to protect tests marked as NotParallelSafe is not working as intended.

Why it is happening intermittently, and why only on some OS and some instances could have been a pure incidental timing issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants