-
Notifications
You must be signed in to change notification settings - Fork 87
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf: forward default allocator methods and remove memory overprovisioning #1045
Conversation
Hmm..
Just to confirm, this was introduced by this PR? I'm going to submit a PR with only the size round-up change, no reallocation invocation, to A/B test this. edit: I'll just add a commit and undo the changes later, just squash if/when the changes are merged. |
Hmm. I don't think this is the allocator, then. Maybe there's a memory management bug in some platform-specific codepath in the kernel, or the size of some structure is meaningfully different? (That is avoided by allocating/deallocating more than needed? Or perhaps the rounding might be homogenizing the size of a deallocation of an allocation with a different size sizes which... is related how? I don't know.) I don't have aarch64 hardware nor the familiarity with the codebase to meaningfully assist there though. I can change the PR to just enable reallocation, and a separate issue could be opened to address this, or just keep the PR blocked as-is? Completely up to you. |
Thanks for opening this! Both changes are wonderful. I opened #1059 to track avoiding padding the size of allocations. I'll have to dig a bit into this, to locate the underlying issue. Changing this PR to resolve the reallocation issue would be great. Then we can move forward with that change while I dig into the other issue. |
I'm causing a mess, heh, sorry. I'll need to check this out properly when I have time. Hopefully soon. |
Interesting issues! Thanks a lot for your help! :) I'll look at #743. Maybe that will help us locate any issues on our side. |
e60e212
to
4142484
Compare
Co-authored-by: Martin Kröning <[email protected]> Signed-off-by: Martin Kröning <[email protected]>
Co-authored-by: Martin Kröning <[email protected]> Signed-off-by: Martin Kröning <[email protected]>
Increasing the alignment of an allocation is sufficient to avoid false sharing. Talc needs a little bit of metadata next to each allocation, and this would create N-8 sized holes in between every allocation due to the demands placed on it. By removing the size requirement, false sharing avoidance is unhindered, but this allows Talc to minimize the overhead of high-alignments. (Talc is designed to automatically place the metadata in a way that minimized false sharing, but this only helps if Talc can sneak it in within the same cache line, which is impossible if Talc thinks you want the whole cache line for user data.) Co-authored-by: Martin Kröning <[email protected]> Signed-off-by: Martin Kröning <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, @SFBdragon, as you have already noticed, I finally located the issue that was causing the AArch64 CI to fail (#1155). We had allocated too little memory for a struct, which then became corrupted by the allocator metadata.
I am relieved, that we can merge this now. Hopefully, we will find these cases as soon as we introduce them from now on.
Thanks a lot for your contribution! :)
Well spotted! This class of bug is a nightmare.. I've been thinking about the feasibility of talc providing a configuration that performs more thorough integrity checks, with extra memory overhead for debug data to help with catching issues like overwriting allocator metadata. I can't promise anything in the near future but I think it would be quite useful to embedded projects where unsafe code is common. (So you'd run your tests and CI in this configuration but leave it off in production, typically. Although I'd rather not rely on a system allocator necessarily.) Idle thoughts for now though. |
Thought I'd check up on how people were using talc in the wild :)
There are two ideas here:
talc
's reallocation mechanisms. This is recommended for faster in-place reallocation (avoiding the much-more-expensive memcpy that default-reallocation always invokes) and also allows for interleaving some allocator operations as introduced in v4.2 to somewhat improve allocation concurrency.N-8
sized holes in between every allocation due to the demands placed on it. By removing the size requirement, false sharing avoidance is unhindered, but this allows Talc to minimize the overhead of high-alignments. (Talc is designed to automatically place the metadata in a way that minimized false sharing, but this only helps if Talc can sneak it in within the same cache line, which is impossible if Talc thinks you want the whole cache line for user data.)Word salad aside, the changes are quite trivial.
Closes #1059