Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: forward default allocator methods and remove memory overprovisioning #1045

Merged
merged 3 commits into from
Apr 24, 2024

Conversation

SFBdragon
Copy link
Contributor

@SFBdragon SFBdragon commented Jan 31, 2024

Thought I'd check up on how people were using talc in the wild :)

There are two ideas here:

  1. Use talc's reallocation mechanisms. This is recommended for faster in-place reallocation (avoiding the much-more-expensive memcpy that default-reallocation always invokes) and also allows for interleaving some allocator operations as introduced in v4.2 to somewhat improve allocation concurrency.
  2. Avoid padding the size of allocations. Increasing the alignment of an allocation is sufficient to avoid false sharing. Talc needs a little bit of metadata next to each allocation, and this would create N-8 sized holes in between every allocation due to the demands placed on it. By removing the size requirement, false sharing avoidance is unhindered, but this allows Talc to minimize the overhead of high-alignments. (Talc is designed to automatically place the metadata in a way that minimized false sharing, but this only helps if Talc can sneak it in within the same cache line, which is impossible if Talc thinks you want the whole cache line for user data.)

Word salad aside, the changes are quite trivial.

Closes #1059

@mkroening mkroening self-requested a review January 31, 2024 23:20
@mkroening mkroening self-assigned this Jan 31, 2024
@SFBdragon
Copy link
Contributor Author

SFBdragon commented Feb 1, 2024

Hmm..

thread '<unnamed>' panicked at /home/runner/.rustup/toolchains/nightly-2024-01-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/local.rs:262:26:
cannot access a Thread Local Storage value during or after destruction: AccessError
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Just to confirm, this was introduced by this PR?

I'm going to submit a PR with only the size round-up change, no reallocation invocation, to A/B test this.

edit: I'll just add a commit and undo the changes later, just squash if/when the changes are merged.

@SFBdragon
Copy link
Contributor Author

SFBdragon commented Feb 1, 2024

Hmm. I don't think this is the allocator, then. Maybe there's a memory management bug in some platform-specific codepath in the kernel, or the size of some structure is meaningfully different? (That is avoided by allocating/deallocating more than needed? Or perhaps the rounding might be homogenizing the size of a deallocation of an allocation with a different size sizes which... is related how? I don't know.) I don't have aarch64 hardware nor the familiarity with the codebase to meaningfully assist there though.

I can change the PR to just enable reallocation, and a separate issue could be opened to address this, or just keep the PR blocked as-is? Completely up to you.

@mkroening
Copy link
Member

Thanks for opening this! Both changes are wonderful. I opened #1059 to track avoiding padding the size of allocations. I'll have to dig a bit into this, to locate the underlying issue.

Changing this PR to resolve the reallocation issue would be great. Then we can move forward with that change while I dig into the other issue.

@SFBdragon
Copy link
Contributor Author

I'm causing a mess, heh, sorry. I'll need to check this out properly when I have time. Hopefully soon.

@mkroening
Copy link
Member

Interesting issues! Thanks a lot for your help! :)

I'll look at #743. Maybe that will help us locate any issues on our side.

SFBdragon and others added 3 commits April 24, 2024 16:53
Co-authored-by: Martin Kröning <[email protected]>
Signed-off-by: Martin Kröning <[email protected]>
Co-authored-by: Martin Kröning <[email protected]>
Signed-off-by: Martin Kröning <[email protected]>
Increasing the alignment of an allocation is sufficient to avoid false sharing.
Talc needs a little bit of metadata next to each allocation, and this would create N-8 sized holes in between every allocation due to the demands placed on it.
By removing the size requirement, false sharing avoidance is unhindered, but this allows Talc to minimize the overhead of high-alignments.
(Talc is designed to automatically place the metadata in a way that minimized false sharing, but this only helps if Talc can sneak it in within the same cache line, which is impossible if Talc thinks you want the whole cache line for user data.)

Co-authored-by: Martin Kröning <[email protected]>
Signed-off-by: Martin Kröning <[email protected]>
Copy link
Member

@mkroening mkroening left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @SFBdragon, as you have already noticed, I finally located the issue that was causing the AArch64 CI to fail (#1155). We had allocated too little memory for a struct, which then became corrupted by the allocator metadata.

I am relieved, that we can merge this now. Hopefully, we will find these cases as soon as we introduce them from now on.

Thanks a lot for your contribution! :)

@mkroening mkroening changed the title Memory Utilization and Allocator Performance Improvements perf: forward default allocator methods and remove memory overprovisioning Apr 24, 2024
@mkroening mkroening added this pull request to the merge queue Apr 24, 2024
@mkroening mkroening removed this pull request from the merge queue due to a manual request Apr 24, 2024
@mkroening mkroening added this pull request to the merge queue Apr 24, 2024
Merged via the queue into hermit-os:main with commit 5fa96eb Apr 24, 2024
13 checks passed
@SFBdragon
Copy link
Contributor Author

Well spotted! This class of bug is a nightmare..

I've been thinking about the feasibility of talc providing a configuration that performs more thorough integrity checks, with extra memory overhead for debug data to help with catching issues like overwriting allocator metadata. I can't promise anything in the near future but I think it would be quite useful to embedded projects where unsafe code is common. (So you'd run your tests and CI in this configuration but leave it off in production, typically. Although I'd rather not rely on a system allocator necessarily.) Idle thoughts for now though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

alloc: avoid padding the size of allocations
2 participants