Adding ZigguratGaussianRng<RNG> #1981

Merged
lballabio merged 21 commits into lballabio:master from ralfkonrad:feature/ZigguratGaussianRng
May 28, 2024

@ralfkonrad commented May 28, 2024
Contributor

The improved Ziggurat method for generating normal random samples is significantly faster than Box-Muller. That is why it is, for example, the default generator for the StandardNormal distribution in rust-random.

The underlying RNG must provide 64-bit random integers through std::uint64_t nextInt64() const, so the new generator currently works only in combination with Xoshiro256StarStarUniformRng.
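
To illustrate what such a requirement looks like in code, here is a minimal sketch of a template that compiles only for generators exposing nextInt64(). The names ToyXoshiro256StarStar and UnitIntervalFrom64 are hypothetical and are not QuantLib classes; the generator follows the publicly documented xoshiro256** recurrence with splitmix64 seeding, and the consumer simply maps 64 random bits to a double instead of running the Ziggurat algorithm.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a 64-bit uniform generator. This is NOT
// QuantLib's Xoshiro256StarStarUniformRng, only the public xoshiro256**
// recurrence seeded through splitmix64.
class ToyXoshiro256StarStar {
  public:
    explicit ToyXoshiro256StarStar(std::uint64_t seed) {
        // splitmix64 expands the single seed word into four state words
        for (auto& word : s_) {
            seed += 0x9E3779B97F4A7C15ULL;
            std::uint64_t z = seed;
            z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
            z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
            word = z ^ (z >> 31);
        }
        // an all-zero state would make the generator emit zeros forever
        assert((s_[0] | s_[1] | s_[2] | s_[3]) != 0);
    }

    // Same signature the description above asks of the underlying RNG.
    std::uint64_t nextInt64() const {
        const std::uint64_t result = rotl(s_[1] * 5, 7) * 9;
        const std::uint64_t t = s_[1] << 17;
        s_[2] ^= s_[0];
        s_[3] ^= s_[1];
        s_[1] ^= s_[2];
        s_[0] ^= s_[3];
        s_[2] ^= t;
        s_[3] = rotl(s_[3], 45);
        return result;
    }

  private:
    static std::uint64_t rotl(std::uint64_t x, int k) {
        return (x << k) | (x >> (64 - k));
    }
    mutable std::uint64_t s_[4]; // mutable because nextInt64() is const
};

// Hypothetical generic consumer: usable with any RNG that has nextInt64().
template <class RNG>
class UnitIntervalFrom64 {
  public:
    explicit UnitIntervalFrom64(const RNG& rng) : rng_(rng) {}

    // Keep the top 53 bits and scale by 2^-53, giving a double in [0, 1).
    double next() const {
        return static_cast<double>(rng_.nextInt64() >> 11)
               * (1.0 / 9007199254740992.0);
    }

  private:
    RNG rng_; // stored by value, so each consumer owns its own stream
};
```

With these two pieces, UnitIntervalFrom64<ToyXoshiro256StarStar> u{ToyXoshiro256StarStar(42)}; followed by u.next() yields doubles in [0, 1), while a generator type without nextInt64() would fail to compile at that call.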

On my local machine I get the following benchmark values.

Windows, CXX compiler MSVC 19.40.33808.0: approx. two times faster than BoxMuller with MersenneTwister (5.39 ns vs. 11.7 ns):

Run on (16 X 2304 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x8)
  L1 Instruction 32 KiB (x8)
  L2 Unified 256 KiB (x8)
  L3 Unified 16384 KiB (x1)
-----------------------------------------------------------------------------------------
Benchmark                                               Time             CPU   Iterations
-----------------------------------------------------------------------------------------
xoshiro256StarStarZigguratGaussianNext.next();       5.39 ns         5.47 ns    100000000
xoshiro256StarStarBoxMullerGaussian.next();          9.30 ns         9.21 ns     74666667
mersenneTwisterBoxMullerGaussian.next();             11.7 ns         11.2 ns     56000000

WSL Ubuntu 22.04, CXX compiler GNU 11.4.0: approx. four times faster than BoxMuller with MersenneTwister (2.82 ns vs. 11.3 ns):

Run on (16 X 2304.01 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x8)
  L1 Instruction 32 KiB (x8)
  L2 Unified 256 KiB (x8)
  L3 Unified 16384 KiB (x1)
Load Average: 0.27, 0.10, 0.27
-----------------------------------------------------------------------------------------
Benchmark                                               Time             CPU   Iterations
-----------------------------------------------------------------------------------------
xoshiro256StarStarZigguratGaussianNext.next();       2.82 ns         2.82 ns    245704472
xoshiro256StarStarBoxMullerGaussian.next();          9.27 ns         9.27 ns     74894426
mersenneTwisterBoxMullerGaussian.next();             11.3 ns         11.3 ns     61261372

@lballabio
Owner

Thanks! Out of curiosity, how does it compare to our current default, InverseCumulativeRng<MersenneTwisterUniformRng, InverseCumulativeNormal>?

@coveralls

Coverage Status

coverage: 72.551% (+0.008%) from 72.543%
when pulling a4d5392 on ralfkonrad:feature/ZigguratGaussianRng
into 02426c0 on lballabio:master.

@ralfkonrad
Contributor Author

That's interesting, @lballabio.

On native Windows, the current default is slightly faster than Ziggurat (4.4 ns vs. 5.4 ns); on WSL Ubuntu it is slower (4.8 ns vs. 2.8 ns).

Windows CXX compiler MSVC 19.40.33808.0

-----------------------------------------------------------------------------------------
Benchmark                                               Time             CPU   Iterations
-----------------------------------------------------------------------------------------
xoshiro256StarStarZigguratGaussianNext.next();       5.39 ns         5.47 ns    100000000
xoshiro256StarStarBoxMullerGaussian.next();          9.30 ns         9.21 ns     74666667
mersenneTwisterBoxMullerGaussian.next();             11.7 ns         11.2 ns     56000000
inverseCumulativeRng.next();                         4.41 ns         4.35 ns    154482759

WSL Ubuntu 22.04 CXX compiler GNU 11.4.0

-----------------------------------------------------------------------------------------
Benchmark                                               Time             CPU   Iterations
-----------------------------------------------------------------------------------------
xoshiro256StarStarZigguratGaussianNext.next();       2.82 ns         2.82 ns    245704472
xoshiro256StarStarBoxMullerGaussian.next();          9.27 ns         9.27 ns     74894426
mersenneTwisterBoxMullerGaussian.next();             11.3 ns         11.3 ns     61261372
inverseCumulativeRng.next();                         4.80 ns         4.80 ns    145683037

The benchmarks are available at https://github.com/ralfkonrad/ql_performance_testing, so you could also run the comparison on your Mac(?).
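
For readers who want a rough per-call figure without cloning that repository, the following is a minimal timing sketch using only the standard library. It is not the harness used for the numbers above: nanosPerCall and StdNormalSampler are hypothetical names, and the sampler uses std::mt19937_64 with std::normal_distribution as a stand-in for the QuantLib generators, so its timings are not comparable to the tables in this thread.

```cpp
#include <chrono>
#include <cstddef>
#include <random>

// Average wall-clock nanoseconds per call of sample(), over `calls` calls.
// The running sum is written to `sink` so the compiler cannot drop the loop.
template <class Sampler>
double nanosPerCall(Sampler sample, std::size_t calls, double& sink) {
    const auto start = std::chrono::steady_clock::now();
    double sum = 0.0;
    for (std::size_t i = 0; i < calls; ++i)
        sum += sample();
    const auto stop = std::chrono::steady_clock::now();
    sink = sum;
    return std::chrono::duration<double, std::nano>(stop - start).count()
           / static_cast<double>(calls);
}

// Stand-in sampler built from the standard library only (assumption:
// it replaces the QuantLib generators purely for illustration).
struct StdNormalSampler {
    std::mt19937_64 engine{42};
    std::normal_distribution<double> normal{0.0, 1.0};
    double operator()() { return normal(engine); }
};
```

Calling nanosPerCall(StdNormalSampler{}, 10000000, sink) with a double sink returns the average cost of one draw. A single pass like this is much noisier than a dedicated benchmark framework, which typically repeats the measurement and adapts the iteration count.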

@lballabio
Owner

The default is slightly slower on my Mac as well:

-----------------------------------------------------------------------------------------
Benchmark                                               Time             CPU   Iterations
-----------------------------------------------------------------------------------------
xoshiro256StarStarZigguratGaussianNext.next();       3.81 ns         3.81 ns    185110260
xoshiro256StarStarBoxMullerGaussian.next();          7.54 ns         7.53 ns     92264298
mersenneTwisterBoxMullerGaussian.next();             11.8 ns         11.8 ns     59168597
inverseCumulativeRng.next();                         4.30 ns         4.29 ns    163303394

@lballabio lballabio added this to the Release 1.35 milestone May 28, 2024
@lballabio lballabio merged commit 1089567 into lballabio:master May 28, 2024
42 checks passed
@ralfkonrad
Contributor Author

Interesting how differently this pure number crunching behaves across architectures and compilers...

@ralfkonrad ralfkonrad deleted the feature/ZigguratGaussianRng branch May 28, 2024 19:51