Fix CPU hog function #12

Open
wants to merge 4 commits into
base: master

marcusmueller

cpu_hog: don't use non-reentrant rand() in threads, do smth with result

Previously, stress -c did a terrible job at actually loading the CPU; it
was stalled most of the time:

$> perf stat stress -c 16 -t 5
stress: info: [526148] dispatching hogs: 16 cpu, 0 io, 0 vm, 0 hdd
stress: info: [526148] successful run completed in 5s

 Performance counter stats for 'stress -c 16 -t 5':

         79,580.45 msec task-clock:u                     #   15.910 CPUs utilized
                 0      context-switches:u               #    0.000 /sec
                 0      cpu-migrations:u                 #    0.000 /sec
               309      page-faults:u                    #    3.883 /sec
   418,716,815,425      cycles:u                         #    5.262 GHz
   262,176,845,042      stalled-cycles-frontend:u        #   62.61% frontend cycles idle
   617,055,840,870      instructions:u                   #    1.47  insn per cycle
                                                  #    0.42  stalled cycles per insn
   175,186,890,751      branches:u                       #    2.201 G/sec
       269,450,686      branch-misses:u                  #    0.15% of all branches

       5.001799550 seconds time elapsed

      79.463002000 seconds user
       0.007854000 seconds sys

This means that in more than half of the cycles, the CPU frontend
could not do anything. Why? A perf record -g trace of the same
invocation tells us that the CPU is spending > 99% of its time in
__random, waiting for an integer comparison that involves a data load.

No surprise there: rand() relies on global state that needs to get
synchronized.

With this percentage in mind, it's not so bad that the result of sqrt
never got used.

This commit changes both:

  • store the result of sqrt in a volatile double, so that it is
    actually used
  • to stay portable and use a very small-state algorithm for
    pseudo-random number generation, inline xoroshiro128+ [1],
    which is under an MIT-0 style "dedication to public domain" license.
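
For readers who want to see the shape of the two changes, here is a minimal sketch. It is not the code from this PR: the names hog_rng, hog_rng_next and hog_cpu_rounds are made up for illustration, and the per-worker struct is an assumption. Only the xoroshiro128+ step itself follows the reference algorithm from [1].

```c
#include <math.h>
#include <stdint.h>

/* Hypothetical per-worker generator state. Each worker owns its own copy,
 * so nothing is shared the way rand()'s global state is.
 * The state must be seeded to something other than all zeros. */
struct hog_rng {
    uint64_t s[2];
};

static inline uint64_t rotl64(uint64_t x, int k)
{
    return (x << k) | (x >> (64 - k));
}

/* One xoroshiro128+ step (rotation/shift constants 24, 16, 37). */
static inline uint64_t hog_rng_next(struct hog_rng *rng)
{
    const uint64_t s0 = rng->s[0];
    uint64_t s1 = rng->s[1];
    const uint64_t result = s0 + s1;

    s1 ^= s0;
    rng->s[0] = rotl64(s0, 24) ^ s1 ^ (s1 << 16);
    rng->s[1] = rotl64(s1, 37);
    return result;
}

/* Hypothetical hog loop: every sqrt result is written to a volatile
 * double, so the compiler has to perform each store and cannot drop
 * the computation as unused. Returns the last value stored. */
static inline double hog_cpu_rounds(struct hog_rng *rng, uint64_t rounds)
{
    volatile double sink = 0.0;

    for (uint64_t i = 0; i < rounds; i++)
        sink = sqrt((double)(hog_rng_next(rng) >> 11));
    return sink;
}
```

A worker would seed its own struct hog_rng with any non-zero value (for example derived from its PID, which is an assumption here) and then call hog_cpu_rounds in a loop until the timeout. Code that calls sqrt typically needs to be linked with -lm.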

We still don't "spin on sqrt()", because floating-point sqrt is very
fast on modern desktop/server CPUs; but at least we actually make
the CPU do its rounds.

With this change, the statistic now looks like this:

stress: info: [580362] dispatching hogs: 16 cpu, 0 io, 0 vm, 0 hdd
stress: info: [580362] successful run completed in 5s

 Performance counter stats for '/home/marcus/.usrlocal/bin/stress -c 16 -t 5':

         79,575.88 msec task-clock:u                     #   15.900 CPUs utilized
                 0      context-switches:u               #    0.000 /sec
                 0      cpu-migrations:u                 #    0.000 /sec
               453      page-faults:u                    #    5.693 /sec
   425,671,786,366      cycles:u                         #    5.349 GHz
       139,385,837      stalled-cycles-frontend:u        #    0.03% frontend cycles idle
 1,055,461,772,875      instructions:u                   #    2.48  insn per cycle
                                                  #    0.00  stalled cycles per insn
    45,889,827,063      branches:u                       #  576.680 M/sec
           220,005      branch-misses:u                  #    0.00% of all branches

       5.004837362 seconds time elapsed

      79.455434000 seconds user
       0.006330000 seconds sys

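So, within the same 5 s, the number of actually executed instructions goes from about 617 billion to about 1,055 billion (roughly 1.7x), and instructions per cycle from 1.47 to 2.48, showing that we're now really stressing our superscalar CPU.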
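
The comparison can be re-derived from the raw counters in the two perf stat outputs above. The helper below is only an illustrative sketch; the function and constant names are made up, and the numbers are copied from the outputs.

```c
#include <stdint.h>

/* Counter values copied from the two perf stat runs above. */
static const uint64_t insn_before   = 617055840870ULL;
static const uint64_t cycles_before = 418716815425ULL;
static const uint64_t insn_after    = 1055461772875ULL;
static const uint64_t cycles_after  = 425671786366ULL;

/* Plain floating-point ratio of two event counts. */
static inline double counter_ratio(uint64_t numerator, uint64_t denominator)
{
    return (double)numerator / (double)denominator;
}
```

counter_ratio(insn_before, cycles_before) and counter_ratio(insn_after, cycles_after) reproduce the "insn per cycle" column of each run, and counter_ratio(insn_after, insn_before) gives the growth in executed instructions.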

[1] https://prng.di.unimi.it/

Marcus Müller added 3 commits February 21, 2025 00:23
Signed-off-by: Marcus Müller <[email protected]>
Not a great idea to have to adjust release numbers in multiple files.

Signed-off-by: Marcus Müller <[email protected]>
Signed-off-by: Marcus Müller <[email protected]>