Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve dec_digits performance and benchmarking methodology #250

Merged
merged 2 commits into from
Oct 8, 2024

Conversation

bnbajwa
Copy link
Contributor

@bnbajwa bnbajwa commented Oct 7, 2024

Note to reviewers: This pull request consists of 2 commits, so view separately for easier reviewing.

The first commit changes how dec_digits is benchmarked (i.e. using randomly generated numbers).

In this commit, you may notice K=7 for the benchmarks. This means that 1 to 7 digit random numbers will be used for benchmarking i.e. [1, 10^7). The reasoning for that is because I think very large numbers (>7 digits) are much less common in production.

BEFORE commit (baseline results):

NAME                                                   COUNT       MIN       %50       %95       %99     %99.9    %99.99
------------------------------------------------------------------------------------------------------------------------
dec_digits_lib                                    3516840000         0         0         0         0         0         0
dec_digits_div                                    1205850000     0.002     0.002     0.002     0.003     0.004     0.005
dec_digits_math                                    152790000     0.018     0.019      0.02     0.021     0.027     0.034
dec_digits_stdio                                    90960000     0.028     0.032     0.033     0.034     0.046     0.062
dec_digits_less                                   3694740000         0         0         0         0         0         0

Note that the results indicate dec_digits_less is faster than than the dec_digits_lib by 1.05x (i.e. 5%)

AFTER commit (K=7):

NAME                                                   COUNT       MIN       %50       %95       %99     %99.9    %99.99
------------------------------------------------------------------------------------------------------------------------
dec_digits_lib                                     478000000     0.005     0.006     0.006     0.006     0.007     0.007
dec_digits_div                                     266000000     0.010     0.011     0.011     0.012     0.012     0.012
dec_digits_math                                    170000000     0.017     0.017     0.017     0.018     0.018     0.018
dec_digits_stdio                                    73000000     0.040     0.041     0.041     0.043     0.043     0.043
dec_digits_less                                    367000000     0.008     0.008     0.008     0.008     0.009     0.009

Now, the benchmark shows dec_digits_lib is faster than dec_digits_less by 1.3x when dealing with integers up to 10^7.

AFTER commit (K=19).

Adding K=19 benchmark results here if anyone is curious how dec_digits performs when dealing with integers of all digit sizes possible (for int64).

NAME                                                   COUNT       MIN       %50       %95       %99     %99.9    %99.99
------------------------------------------------------------------------------------------------------------------------
dec_digits_lib                                     291000000     0.010     0.010     0.010     0.010     0.010     0.010
dec_digits_div                                     137000000     0.020     0.021     0.023     0.024     0.024     0.024
dec_digits_math                                    150000000     0.018     0.019     0.022     0.028     0.032     0.032
dec_digits_stdio                                    60000000     0.049     0.050     0.052     0.055     0.055     0.055
dec_digits_less                                    322000000     0.008     0.009     0.009     0.009     0.010     0.010

But when dealing all integers up to 10^19, dec_digits_less takes the lead again -- 10% faster than it's rival.

The second commit provides a faster implementation of dec_digits along with support for negative integers.

Results with new implementation (K=7):

NAME                                                   COUNT       MIN       %50       %95       %99     %99.9    %99.99
------------------------------------------------------------------------------------------------------------------------
dec_digits_lib_unsigned                           4258000000         0         0         0         0         0         0
dec_digits_lib_signed                             3370000000         0         0         0         0         0         0
dec_digits_branch                                  441000000     0.006     0.006     0.006     0.006     0.007     0.007
dec_digits_div                                     257000000     0.011     0.011     0.011     0.012     0.012     0.012
dec_digits_math                                    164000000     0.018     0.018     0.018     0.018     0.018     0.018
dec_digits_stdio                                    70000000     0.042     0.043     0.044     0.045     0.045     0.045
dec_digits_less                                    386000000     0.007     0.007     0.007     0.008     0.008     0.008

The previous dec_digits_lib is now dec_digits_branch.

New dec_digits unsigned overload is about 9.6x faster than previous.
And the signed overload is about 7.6x faster.

Results with new implementation (K=19):

NAME                                                   COUNT       MIN       %50       %95       %99     %99.9    %99.99
------------------------------------------------------------------------------------------------------------------------
dec_digits_lib_unsigned                           4286000000         0         0         0         0         0         0
dec_digits_lib_signed                             3364000000         0         0         0         0         0         0
dec_digits_branch                                  263000000      0.01     0.011     0.011     0.012     0.013     0.013
dec_digits_div                                     130000000     0.022     0.023     0.023     0.023     0.024     0.024
dec_digits_math                                    150000000     0.018      0.02      0.02      0.02      0.02      0.02
dec_digits_stdio                                    57000000     0.052     0.053     0.055     0.055     0.055     0.055
dec_digits_less                                    308000000     0.009     0.009      0.01      0.01      0.01      0.01

Unsigned overload about 16x faster.
Signed overload about 12.8x faster

Previously, dec_digits was invoked with inputs starting from 0 and
increasing by 1. This is a very predictable pattern, and easy for the
branch predictor to learn. Thus, the benchmarks show unrepresentative
performance and favour branchy implementations of dec_digits.

Now, dec_digits is invoked with random numbers of mixed digits.
pow10 was previously in reactive-cpp but moved as it's now required in
toolbox
@bnbajwa bnbajwa requested a review from a team as a code owner October 7, 2024 15:32
The implementation of dec_digits is improved for better performance
and added support for negative numbers.
@bnbajwa bnbajwa merged commit 74bc557 into reactivemarkets:master Oct 8, 2024
5 checks passed
@bnbajwa bnbajwa deleted the perf branch October 8, 2024 11:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants