
Random and pseudo-random generation of portable packed SIMD vector types #497

Closed
gnzlbg opened this issue Jun 6, 2018 · 5 comments

@gnzlbg

gnzlbg commented Jun 6, 2018

To showcase the portable packed SIMD vector facilities in std::simd I've ported the Ambient Occlusion ray casting benchmark (aobench) from ISPC to Rust: https://github.com/gnzlbg/aobench

The scalar version of the benchmark needs to generate random f32s, and the vectorized version needs to generate pseudo-random SIMD vectors of f32s. I did not know how to do that with the rand crate, so I ended up hacking together a pseudo-random number generator for the scalar version (src here) and explicitly vectorizing it to generate SIMD vectors (src here).
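The explicit vectorization boils down to running one independent copy of a small PRNG per lane, all stepped in lock-step. A minimal sketch on stable Rust, using plain arrays in place of packed SIMD types (the lane-wise structure is the same; `xorshift32` here is the classic Marsaglia generator, not necessarily the exact PRNG used in aobench):

```rust
// Classic Marsaglia xorshift32: one step of the scalar PRNG.
fn xorshift32(state: &mut u32) -> u32 {
    let mut x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    x
}

// Lane-wise "vectorized" variant: 8 independent states stepped in
// lock-step. With std::simd this would be a single u32x8; a compiler
// can often auto-vectorize a fixed-length loop like this one.
fn xorshift32x8(state: &mut [u32; 8]) -> [u32; 8] {
    let mut out = [0u32; 8];
    for i in 0..8 {
        out[i] = xorshift32(&mut state[i]);
    }
    out
}

// Map the top 24 bits of each u32 to an f32 in [0, 1).
fn to_f32x8(bits: [u32; 8]) -> [f32; 8] {
    let mut out = [0f32; 8];
    for i in 0..8 {
        out[i] = (bits[i] >> 8) as f32 * (1.0 / 16777216.0);
    }
    out
}

fn main() {
    // Seeds must be non-zero and distinct per lane.
    let mut state = [1u32, 2, 3, 4, 5, 6, 7, 8];
    let v = to_f32x8(xorshift32x8(&mut state));
    println!("{:?}", v);
}
```

One caveat of this construction: the stream quality is only as good as the underlying scalar generator, and correlated seeds give correlated lanes.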

I've added some benchmarks (src here):

  • scalar (Xeon E5-2690 v4 @ 2.60GHz): throughput: 174*10^6 f32/s, 5.7ns per function call

  • vector (Xeon E5-2690 v4 @ 2.60GHz): throughput: 2072*10^6 f32/s (12x larger), 3.8ns per function call (generates one f32x8 per call)

  • scalar (Intel Core i5 @ 1.8 GHz): throughput: 190*10^6 f32/s, 5.2ns per function call

  • vector (Intel Core i5 @ 1.8 GHz): throughput: 673*10^6 f32/s (3.5x larger), 11.9ns per function call (generates one f32x8 per call)

These numbers do not make much sense to me (feel free to investigate further), but they hint that an explicitly vectorized PRNG might make sense in some cases. For example, if my intent were to populate a large vector of f32s with pseudo-random numbers, on my laptop the vector version has twice the latency per call but still 3.5x the throughput.

It would be cool if some of the pseudo-random number generators in the rand crate could be explicitly vectorized to generate random SIMD vectors.

@gnzlbg
Author

gnzlbg commented Jun 6, 2018

Duplicate of #377

@gnzlbg gnzlbg closed this as completed Jun 6, 2018
@dhardy
Member

dhardy commented Jun 6, 2018

Interesting case study. No, those benchmarks don't make much sense (vector generation takes less time than scalar on Xeon?). What happens to throughput if you run this in many threads at once? (Hyperthreading and possibly frequency adjustment should reduce the gains.)

Ah, your RNG is essentially several copies of a small RNG. I guess this is an easy way to construct a fast SIMD RNG, though better speed/quality compromises are probably possible. Can I ask: is there any reason why transmuting a large integer or byte array shouldn't work well (e.g. next_u256()), other than endianness (which we try to address, but don't technically have to for every RNG)?
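A sketch of what the byte-array route could look like, assuming a hypothetical `next_u256`-style API that hands back 32 raw bytes (e.g. via `RngCore::fill_bytes`); decoding with an explicit byte order is what keeps the result identical across platforms:

```rust
// Hypothetical decoding step: 32 raw RNG bytes -> 8 u32 lanes.
// Using u32::from_le_bytes fixes the byte order, so the lanes are
// the same on big- and little-endian targets; a plain transmute
// would not be.
fn u32x8_from_le_bytes(bytes: &[u8; 32]) -> [u32; 8] {
    let mut lanes = [0u32; 8];
    for (i, chunk) in bytes.chunks_exact(4).enumerate() {
        lanes[i] = u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
    }
    lanes
}

fn main() {
    // Stand-in for RNG output: any 32 bytes.
    let raw: [u8; 32] = *b"0123456789abcdef0123456789abcdef";
    println!("{:?}", u32x8_from_le_bytes(&raw));
}
```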

@gnzlbg
Author

gnzlbg commented Jun 7, 2018

What happens to throughput if you run this in many threads at once?

I'll try to benchmark this; I can probably come up with something using rayon::split.

Can I ask, is there any reason why transmuting a large integer or byte-array shouldn't work well (e.g. next_u256())

I can't think of any serious reason. For vectors of floating-point numbers one typically wants to avoid generating NaNs, but that's something every implementation needs to deal with anyway. Otherwise, transmuting a [u32; 8] into an f32x8 should just work (the f32 values might be endian-dependent).

@dhardy
Member

dhardy commented Jun 7, 2018

I wasn't talking about converting ints to floats by transmutation, just e.g. u128 -> [u32; 4].

@gnzlbg
Author

gnzlbg commented Jun 7, 2018

@dhardy For 256-bit we would need u256 or [u128; 2] I guess.
