-
-
Notifications
You must be signed in to change notification settings - Fork 86
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speed up is_ascii_hex_digit #479
Conversation
It's possible to reduce the number of required comparisons by converting `c` into lowercase, so that instead of checking both lowercase and uppercase ranges, only lowercase range needs to be checked. This allows clang to generate branchless x86_64 asm (https://compiler-explorer.com/z/P6f7c8e4T). Instead of ``` is_ascii_hex_digit(char): # @is_ascii_hex_digit(char) lea ecx, [rdi - 48] mov al, 1 cmp cl, 10 jb .LBB0_3 lea ecx, [rdi - 65] cmp cl, 6 jb .LBB0_3 add dil, -97 cmp dil, 6 setb al .LBB0_3: ret ``` clang generates ``` is_ascii_hex_digit(char): # @is_ascii_hex_digit(char) lea eax, [rdi - 48] cmp al, 10 setb cl or dil, 32 add dil, -97 cmp dil, 6 setb al or al, cl ret ``` that is 2 instructions shorter and contains no branches.
Can you run and share the results of the benchmarks? |
I am using GCC 11 on an Ice Lake-based server. benchdataMain branch...
This PR:
bench_search_paramsMain branch...
This PR:
ConclusionIt does not appear that this PR improves the performance, at least according to these results. However, the PR does suggest that we can do better. |
@ttsugriy Can you try...
|
@lemire the reason you're not seeing any improvement is because you're using gcc 11 that produces a very different assembly (https://compiler-explorer.com/z/16eEbPezb) from the one I've shared above using clang 16. |
what's even more interesting is that GCC 13 generates an even better assembly for the original version (https://compiler-explorer.com/z/1j9oMTrbc)
Basically it's using a lookup table approach where all hex chars are packed into an integer value. |
@ttsugriy Can you share your benchmark results with clang 16? |
@anonrig I have only clang 15 on my machine and the results are not very promising :)
this PR
clang info:
As such this PR should probably be closed and it's probably best to try to write some sort of lookup table implementation so that clang generates the same super efficient assembly as GCC. |
The closest I could get with clang is by using
that generates
https://compiler-explorer.com/z/b93bYhn4j but it's still much slower on the benchmark running on M1 ARM. |
It's possible to reduce the number of required comparisons by converting
c
into lowercase, so that instead of checking both lowercase and uppercase ranges, only lowercase range needs to be checked.This allows clang to generate branchless x86_64 asm (https://compiler-explorer.com/z/P6f7c8e4T). Instead of
clang generates
that is 2 instructions shorter and contains no branches.