This accomplishes what my worker-stack change tried to do, but far more efficiently, and it also lets you mix permanent and temporary worker-record registration. According to my tests, performance appears to be unchanged.
There's a small bug fix in the makefile. Without it, the test/stress executables can't be debugged: trying to view variables in gdb usually results in "<optimized out>". I remembered that later flags passed to gcc override the settings of earlier flags, but that doesn't seem to hold any more. The fix was easy.
ringbuf_worker.registered is now a 64-bit field. Because the struct already had a 64-bit field, its size didn't change. This also saves me from having to implement another compare-and-swap function, since the existing 64-bit one can be reused.
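A minimal sketch of why widening the field is free, assuming the old `registered` was an `int` sitting next to a 64-bit offset and that the target is LP64 (e.g. x86-64), where a 64-bit member forces 8-byte struct alignment. The struct names, `ringbuf_off_t` typedef, and `cas64` helper are hypothetical stand-ins, not the library's actual definitions. The `int` version carries 4 bytes of tail padding that the 64-bit field simply fills, and a single 64-bit CAS helper then serves both fields.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t ringbuf_off_t;  /* assumption: offsets are 64-bit */

/* Hypothetical layout before the change: 8-byte offset + 4-byte flag.
 * On LP64 targets the struct is padded to 16 bytes, and the int flag
 * would need its own int-sized CAS. */
struct worker_before {
    _Atomic ringbuf_off_t seen_off;
    _Atomic int registered;
};

/* Hypothetical layout after the change: the flag is widened to 64 bits,
 * occupying what used to be tail padding, so the size stays 16 bytes. */
struct worker_after {
    _Atomic ringbuf_off_t seen_off;
    _Atomic uint64_t registered;
};

/* One 64-bit compare-and-swap helper now works for both fields.
 * Returns true if *p held `expected` and was replaced by `desired`. */
static bool cas64(_Atomic uint64_t *p, uint64_t expected, uint64_t desired)
{
    return atomic_compare_exchange_strong(p, &expected, desired);
}
```

On 32-bit targets where `uint64_t` is only 4-byte aligned inside structs (such as i386), the "before" layout would be 12 bytes, so the size claim is specific to the usual 64-bit ABIs.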
ringbuf.first_free_worker doesn't have to be exact; it is only a hint meant to minimize the number of records that have to be searched when registering another worker. This matters most when using many temporary worker registrations.
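The hint-based search can be sketched with C11 atomics as below. This is an illustration of the general technique under stated assumptions, not the PR's code: `registry_t`, `worker_t`, `registry_acquire`, `registry_release`, and `NWORKERS` are hypothetical names, and only `registered` and `first_free_worker` come from the description. Ownership is decided by a compare-and-swap on `registered`, so a stale or racing update to the hint only costs extra loop iterations, never a double registration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define NWORKERS 8  /* hypothetical capacity */

typedef struct {
    _Atomic uint64_t registered;  /* 0 = free, 1 = in use */
} worker_t;

typedef struct {
    worker_t workers[NWORKERS];
    _Atomic unsigned first_free_worker;  /* search hint; may be stale */
} registry_t;

static void registry_init(registry_t *r)
{
    for (unsigned i = 0; i < NWORKERS; i++)
        atomic_init(&r->workers[i].registered, 0);
    atomic_init(&r->first_free_worker, 0);
}

/* Claim a free slot, starting at the hint and wrapping around once.
 * Returns NULL if every slot is in use. */
static worker_t *registry_acquire(registry_t *r)
{
    unsigned start = atomic_load_explicit(&r->first_free_worker,
                                          memory_order_relaxed);
    for (unsigned n = 0; n < NWORKERS; n++) {
        unsigned i = (start + n) % NWORKERS;
        uint64_t expected = 0;
        /* The CAS, not the hint, decides who owns the slot. */
        if (atomic_compare_exchange_strong(&r->workers[i].registered,
                                           &expected, 1)) {
            /* Move the hint past the claimed slot; races here are harmless. */
            atomic_store_explicit(&r->first_free_worker, (i + 1) % NWORKERS,
                                  memory_order_relaxed);
            return &r->workers[i];
        }
    }
    return NULL;
}

/* Release a slot and point the hint at it so the next search finds it first. */
static void registry_release(registry_t *r, worker_t *w)
{
    assert(w >= r->workers && w < r->workers + NWORKERS);
    unsigned i = (unsigned)(w - r->workers);
    atomic_store_explicit(&w->registered, 0, memory_order_release);
    atomic_store_explicit(&r->first_free_worker, i, memory_order_relaxed);
}
```

With many short-lived registrations, releasing a slot and re-registering typically finds that slot on the first probe instead of scanning from index 0.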
I modified t_stress so that one of the worker records is registered permanently, just to demonstrate that mixing permanent and temporary registration works. I also had it print the total number of bytes received, which makes it easy to tell whether new features affect performance.
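The mixing pattern can be checked in isolation with a single-threaded sketch like the one below. It is a hypothetical model, not t_stress itself: `acquire_slot`, `release_slot`, `run_mixed`, and `NWORKERS` are illustrative names, and the registry is a stripped-down version of the hint-plus-CAS idea. One slot is held for the whole run while every iteration registers and releases a temporary slot; the temporary registrations must never be handed the permanent slot.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NWORKERS 4  /* hypothetical capacity */

/* Hypothetical registry: 0 = free, 1 = in use. Static atomics are
 * zero-initialized into a valid state. */
static _Atomic uint64_t registered[NWORKERS];
static _Atomic unsigned first_free_worker;  /* search hint, may be stale */

static int acquire_slot(void)
{
    unsigned start = atomic_load(&first_free_worker);
    for (unsigned n = 0; n < NWORKERS; n++) {
        unsigned i = (start + n) % NWORKERS;
        uint64_t expected = 0;
        if (atomic_compare_exchange_strong(&registered[i], &expected, 1)) {
            atomic_store(&first_free_worker, (i + 1) % NWORKERS);
            return (int)i;
        }
    }
    return -1;  /* all slots in use */
}

static void release_slot(int i)
{
    atomic_store(&registered[i], 0);
    atomic_store(&first_free_worker, (unsigned)i);
}

/* One permanent registration for the whole run, plus a temporary
 * register/release per iteration. Returns the number of temporary
 * registrations that succeeded without touching the permanent slot. */
static unsigned run_mixed(unsigned iterations)
{
    int permanent = acquire_slot();
    if (permanent < 0)
        return 0;
    unsigned ok = 0;
    for (unsigned it = 0; it < iterations; it++) {
        int tmp = acquire_slot();
        if (tmp < 0 || tmp == permanent)
            break;  /* would indicate a registry bug */
        ok++;
        release_slot(tmp);
    }
    release_slot(permanent);
    return ok;
}
```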
The rest of the changes should be easy to understand.