Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
While working on a personal data compression project that uses a modified version of cloudflare/zlib, I took an in-depth look at longest_match and was able to find some possible improvements.
This patch unrolls the inner loop in
longest_match()
, which is used to find the length of a potential match and depending on the data and compression level represents a major bottleneck.The existing loop checks 8 bytes at a time and the surrounding code guarantees that
strend - scan = MAXMATCH - 4 = 254
when the loop is entered (i.e., there are initially 254 bytes for the loop to check), so thescan < strend
condition only becomes false after the 32nd iteration of the loop and it is possible to unroll the loop to process 16 bytes per iteration and thus effectively ignore every odd-numbered check.On an x86 PNG benchmark using settings close to level 9, the new version speeds up deflate() by 7% using clang. I expect a smaller impact for a lower compression level or files with lower redundancy, but there should still be improvements across the board. I did not see a performance improvement when unrolling the loop further or trying to get the clang to unroll the loop itself.
I have some more improvements for longest_match if you are interested, but those may increase code complexity and require more discussion.