Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix checksumming in the presence of large, mostly-zero buffers. #3355

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Keno
Copy link
Member

@Keno Keno commented Aug 2, 2022

Tools like asan allocate large regions of memory (multi-TB)
that are generally sparsely allocated using demand-paging.
Currently, when checksumming a process that has such regions,
we attempt to read the entire multi-TB region into a std::vector.
This obviously does not work. Switch our checksum method to
crc32c for performance and further improve performance by:

  1. Reading the pagemap to detect pages that were allocated, but
    are currently still 0 and waiting to demand-paged in. These
    can be fast-forwarded using a precomputed crc operator.
  2. Capping the amount of memory to be read at once at a reasonable
    max buffer size to avoid thrashing.

The crc32c implementation here is copied from Julia, which itself
is cobbled together from various places around the web - it's not
the prettiest, but keeping it aligned with the Julia version will
make it easier to port any future hardware acceleration improvements
over, if necessary.

Tools like asan allocate large regions of memory (multi-TB)
that are generally sparsely allocated using demand-paging.
Currently, when checksumming a process that has such regions,
we attempt to read the entire multi-TB region into a std::vector.
This obviously does not work. Switch our checksum method to
crc32c for performance and further improve performance by:

1. Reading the pagemap to detect pages that were allocated, but
are currently still 0 and waiting to demand-paged in. These
can be fast-forwarded using a precomputed crc operator.
2. Capping the amount of memory to be read at once at a reasonable
max buffer size to avoid thrashing.

The crc32c implementation here is copied from Julia, which itself
is cobbled together from various places around the web - it's not
the prettiest, but keeping it aligned with the Julia version will
make it easier to port any future hardware acceleration improvements
over, if necessary.
@rocallahan
Copy link
Collaborator

How much of a performance improvement is this fancy crc32 implementation?

@yuyichao
Copy link
Contributor

yuyichao commented Aug 8, 2022

You can see the benchmark result I did a while ago when I implemented the arm versions here JuliaLang/julia#22385 there isn’t any benchmark result from the x86 version when it was originally added afaict. JuliaLang/julia#18297

# endif
# endif

/* Table-driven software version as a fall-back. This is about 15 times slower
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually the comment suggests 15x faster for the hardware accelerated version which is similar with what I've seen on some older ARM cores.

@rocallahan
Copy link
Collaborator

rocallahan commented Aug 9, 2022

I think the pagemap optimization makes sense because the speedup is nearly infinite in those pathological cases with lots of unreserved memory.

I'm less convinced about the benefits of accelerating the CRC implementation for data that we've had to read from the tracee. Wouldn't it be good enough to use the crc32() function from zlib? We could replace the existing crc code in util.cc with that too.

Personally I almost never use the memory checksumming. Maybe you and Keno use it a lot?

@GitMensch
Copy link
Contributor

Friendly ping on this.

Wouldn't it be good enough to use the crc32() function from zlib?

As the zlib crc32 implementation has ARM optimizations since 2019, I think that's good to use and preferable over (another) 3rd party implementation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants