perf: forward default allocator methods and remove memory overprovisioning #1045

SFBdragon · 2024-01-31T21:19:21Z

Thought I'd check up on how people were using talc in the wild :)

There are two ideas here:

1. Use talc's reallocation mechanisms. This is recommended for faster in-place reallocation (avoiding the much more expensive memcpy that the default reallocation always invokes), and it also allows some allocator operations to be interleaved, as introduced in v4.2, which somewhat improves allocation concurrency.
2. Avoid padding the size of allocations. Increasing the alignment of an allocation is sufficient to avoid false sharing. Talc needs a little bit of metadata next to each allocation, and padding the size forces it to leave N-8 sized holes between allocations. Dropping the size requirement leaves false sharing avoidance intact and lets Talc minimize the overhead of high alignments. (Talc is designed to automatically place the metadata in a way that minimizes false sharing, but this only helps if Talc can sneak it in within the same cache line, which is impossible if Talc thinks you want the whole cache line for user data.)
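To make the second point concrete, here is a minimal sketch of the two ways a request could be adjusted before it reaches the allocator. The 64-byte `CACHE_LINE` constant and the function names `padded_layout` and `aligned_layout` are assumptions for illustration only; they are not taken from the kernel source.

```rust
use std::alloc::Layout;

/// Cache line size assumed for this sketch; real targets may differ.
const CACHE_LINE: usize = 64;

/// Raises the alignment AND rounds the size up to a whole cache line.
/// This is the overprovisioning the PR argues against.
fn padded_layout(layout: Layout) -> Layout {
    layout.align_to(CACHE_LINE).unwrap().pad_to_align()
}

/// Raises only the alignment and leaves the requested size untouched,
/// so the allocator may use the rest of the line for its own metadata.
fn aligned_layout(layout: Layout) -> Layout {
    layout.align_to(CACHE_LINE).unwrap()
}

fn main() {
    let request = Layout::from_size_align(24, 8).unwrap();
    let padded = padded_layout(request);
    let aligned = aligned_layout(request);
    println!("padded:  size={} align={}", padded.size(), padded.align());
    println!("aligned: size={} align={}", aligned.size(), aligned.align());
}
```

For a 24-byte request, the padded layout reserves a full 64 bytes, while the aligned layout still asks for 24 bytes. Both start on a cache line boundary, so two user allocations never begin in the same line either way.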

Word salad aside, the changes are quite trivial.
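As a rough illustration of the first change, the sketch below forwards `realloc` and `alloc_zeroed` to an inner allocator instead of relying on the `GlobalAlloc` defaults. The `Forwarding` wrapper and `demo` function are hypothetical names, and `std::alloc::System` stands in for talc so that the example runs without extra crates; the kernel's actual wrapper type is not shown in this thread.

```rust
use std::alloc::{GlobalAlloc, Layout, System};

/// Hypothetical wrapper; `A` stands in for the inner allocator (talc in the PR).
struct Forwarding<A>(A);

unsafe impl<A: GlobalAlloc> GlobalAlloc for Forwarding<A> {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        unsafe { self.0.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { self.0.dealloc(ptr, layout) }
    }

    // Without the two overrides below, the trait defaults apply:
    // `alloc_zeroed` is alloc + zero fill, and `realloc` is
    // alloc + copy + dealloc, even when the block could grow in place.
    unsafe fn alloc_zeroed(&self, layout: Layout) -> *mut u8 {
        unsafe { self.0.alloc_zeroed(layout) }
    }

    unsafe fn realloc(&self, ptr: *mut u8, layout: Layout, new_size: usize) -> *mut u8 {
        unsafe { self.0.realloc(ptr, layout, new_size) }
    }
}

/// Allocates zeroed memory, grows it, and reports what was observed.
fn demo() -> (u8, u8) {
    let a = Forwarding(System);
    let layout = Layout::from_size_align(16, 8).unwrap();
    unsafe {
        let p = a.alloc_zeroed(layout);
        assert!(!p.is_null());
        let zeroed = *p.add(15); // last byte of the zeroed block
        *p = 42;
        let q = a.realloc(p, layout, 64);
        assert!(!q.is_null());
        let kept = *q; // contents must survive the reallocation
        a.dealloc(q, Layout::from_size_align(64, 8).unwrap());
        (zeroed, kept)
    }
}

fn main() {
    println!("{:?}", demo());
}
```

Whether a given reallocation really happens in place depends on the inner allocator; the wrapper only makes sure the inner allocator gets the chance.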

SFBdragon · 2024-02-01T20:05:43Z

Hmm..

thread '<unnamed>' panicked at /home/runner/.rustup/toolchains/nightly-2024-01-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/local.rs:262:26:
cannot access a Thread Local Storage value during or after destruction: AccessError
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Just to confirm, this was introduced by this PR?

I'm going to submit a PR with only the size round-up change, no reallocation invocation, to A/B test this.

edit: I'll just add a commit and undo the changes later, just squash if/when the changes are merged.

SFBdragon · 2024-02-01T20:33:58Z

Hmm. I don't think this is the allocator, then. Maybe there's a memory management bug in some platform-specific codepath in the kernel, or the size of some structure is meaningfully different? (One that is masked by allocating/deallocating more than needed? Or perhaps the rounding is homogenizing the sizes, so that a deallocation passing a different size than the original allocation goes unnoticed? I don't know how that would be related.) I don't have aarch64 hardware, nor the familiarity with the codebase, to meaningfully assist there though.

I can change the PR to just enable reallocation, and a separate issue could be opened to address this, or just keep the PR blocked as-is? Completely up to you.

mkroening · 2024-02-13T12:32:01Z

Thanks for opening this! Both changes are wonderful. I opened #1059 to track avoiding padding the size of allocations. I'll have to dig a bit into this, to locate the underlying issue.

Changing this PR to resolve the reallocation issue would be great. Then we can move forward with that change while I dig into the other issue.

SFBdragon · 2024-02-13T13:17:07Z

I'm causing a mess, heh, sorry. I'll need to check this out properly when I have time. Hopefully soon.

mkroening · 2024-02-13T13:39:48Z

Interesting issues! Thanks a lot for your help! :)

I'll look at #743. Maybe that will help us locate any issues on our side.

mkroening

Hi, @SFBdragon, as you have already noticed, I finally located the issue that was causing the AArch64 CI to fail (#1155). We had allocated too little memory for a struct, which then became corrupted by the allocator metadata.

I am relieved that we can merge this now. Hopefully, we will find these cases as soon as we introduce them from now on.

Thanks a lot for your contribution! :)
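The class of bug described above can be sketched as follows. This is not the code from #1155; `Header`, `too_small`, and `header_then_data` are hypothetical names, and the sketch only shows how a layout that forgets part of a structure ends up smaller than what is later written.

```rust
use std::alloc::Layout;

/// Hypothetical header stored in front of a variable-sized data block.
#[repr(C)]
struct Header {
    len: usize,
    ptr: *mut u8,
}

/// Buggy variant: accounts for the data only and forgets the header,
/// so writing header + data overruns the allocation.
fn too_small(data_len: usize) -> Layout {
    Layout::from_size_align(data_len, 16).unwrap()
}

/// Fixed variant: the layout covers the header and the data.
/// Returns the combined layout and the offset of the data block.
fn header_then_data(data_len: usize) -> (Layout, usize) {
    let data = Layout::from_size_align(data_len, 16).unwrap();
    let (layout, offset) = Layout::new::<Header>().extend(data).unwrap();
    (layout.pad_to_align(), offset)
}

fn main() {
    let (layout, offset) = header_then_data(100);
    println!("buggy size: {}", too_small(100).size());
    println!("fixed size: {} (data at offset {})", layout.size(), offset);
}
```

With padded allocation sizes, a shortfall like this can go unnoticed because the slack absorbs the overrun; once the padding is gone, neighbouring allocator metadata and the structure can overlap.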

SFBdragon · 2024-04-24T19:43:41Z

Well spotted! This class of bug is a nightmare..

I've been thinking about the feasibility of talc providing a configuration that performs more thorough integrity checks, with extra memory overhead for debug data to help with catching issues like overwriting allocator metadata. I can't promise anything in the near future but I think it would be quite useful to embedded projects where unsafe code is common. (So you'd run your tests and CI in this configuration but leave it off in production, typically. Although I'd rather not rely on a system allocator necessarily.) Idle thoughts for now though.
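One common way to approximate such checks is a wrapper that places a canary after each allocation and verifies it on deallocation. The sketch below is not a talc feature or API; `Checked`, `CANARY`, and `overrun_demo` are hypothetical, `std::alloc::System` is used only to keep the example runnable, and only overruns past the end of a block are detected.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Pattern written directly after every allocation.
const CANARY: [u8; 8] = [0xA5; 8];

/// Hypothetical debug wrapper that counts damaged canaries.
struct Checked<A> {
    inner: A,
    corruptions: AtomicUsize,
}

impl<A> Checked<A> {
    /// Same alignment, with extra room for the trailing canary.
    fn padded(layout: Layout) -> Layout {
        Layout::from_size_align(layout.size() + CANARY.len(), layout.align()).unwrap()
    }
}

unsafe impl<A: GlobalAlloc> GlobalAlloc for Checked<A> {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        unsafe {
            let ptr = self.inner.alloc(Self::padded(layout));
            if !ptr.is_null() {
                // Write the canary right behind the user-visible bytes.
                ptr.add(layout.size())
                    .copy_from_nonoverlapping(CANARY.as_ptr(), CANARY.len());
            }
            ptr
        }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe {
            let tail = std::slice::from_raw_parts(ptr.add(layout.size()), CANARY.len());
            if tail != &CANARY[..] {
                // A real debug build might log or abort here instead.
                self.corruptions.fetch_add(1, Ordering::Relaxed);
            }
            self.inner.dealloc(ptr, Self::padded(layout));
        }
    }
}

/// Simulates a one-byte overrun and returns the number of detections.
fn overrun_demo() -> usize {
    let a = Checked { inner: System, corruptions: AtomicUsize::new(0) };
    let layout = Layout::from_size_align(16, 8).unwrap();
    unsafe {
        let p = a.alloc(layout);
        assert!(!p.is_null());
        *p.add(16) = 0; // simulated bug: one byte past the 16 requested
        a.dealloc(p, layout);
    }
    a.corruptions.load(Ordering::Relaxed)
}

fn main() {
    println!("corruptions detected: {}", overrun_demo());
}
```

The extra bytes per allocation are the memory overhead mentioned above, which is why a configuration like this would typically be limited to tests and CI.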

mkroening self-requested a review January 31, 2024 23:20

mkroening self-assigned this Jan 31, 2024

mkroening mentioned this pull request Feb 13, 2024

alloc: avoid padding the size of allocations #1059

Closed

mkroening approved these changes Feb 13, 2024

mkroening force-pushed the main branch 3 times, most recently from e60e212 to 4142484 Compare April 19, 2024 14:52

mkroening mentioned this pull request Apr 20, 2024

fix(virtio/pci): use volatile accesses for device features #1146

Merged

mkroening force-pushed the main branch from d46ae4f to ee9fddf Compare April 20, 2024 22:28

mkroening mentioned this pull request Apr 23, 2024

fix(aarch64): add size of _private to TaskTLS allocation #1155

Merged

mkroening force-pushed the main branch from 6725245 to d7fbbc9 Compare April 24, 2024 14:51

SFBdragon and others added 3 commits April 24, 2024 16:53

perf(alloc): forward alloc_zeroed

df7b858

Co-authored-by: Martin Kröning <[email protected]> Signed-off-by: Martin Kröning <[email protected]>

perf(alloc): forward realloc

be1134c

Co-authored-by: Martin Kröning <[email protected]> Signed-off-by: Martin Kröning <[email protected]>

mkroening force-pushed the main branch from d7fbbc9 to df7b858 Compare April 24, 2024 14:53

mkroening approved these changes Apr 24, 2024

mkroening changed the title ~~Memory Utilization and Allocator Performance Improvements~~ perf: forward default allocator methods and remove memory overprovisioning Apr 24, 2024

mkroening added this pull request to the merge queue Apr 24, 2024

mkroening removed this pull request from the merge queue due to a manual request Apr 24, 2024

mkroening added this pull request to the merge queue Apr 24, 2024

Merged via the queue into hermit-os:main with commit 5fa96eb Apr 24, 2024
13 checks passed
