Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SmallRng: Replace PCG algorithm with xoshiro{128,256}++ #1038

Merged
merged 8 commits into from
Sep 15, 2020

Conversation

vks
Copy link
Collaborator

@vks vks commented Sep 6, 2020

Due to close correlations of PCG streams (#907) and lack of right-state propagation (#905), the SmallRng algorithm is switched to xoshiro{128,256}++. The implementation is taken from the rand_xoshiro crate and slightly simplified.

Fixes #910.

src/rngs/mod.rs Show resolved Hide resolved
@vks
Copy link
Collaborator Author

vks commented Sep 8, 2020

I decided to use the default impl of seed_from_u64 instead of pulling in the splitmix RNG as well. This means that the results will be different than with rand_xoshiro. I think this is okay, since we don't guarantee value stability, but maybe we should avoid this gotcha.

Copy link
Member

@dhardy dhardy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. I like the simplified implementations and agree with the decision not to use SplitMix for seed_from_u64 (IIRC the official advice is simply to use a significantly different PRNG, so our existing PCG32 impl should be fine).

src/rngs/small.rs Outdated Show resolved Hide resolved
Comment on lines 44 to 67
impl RngCore for Xoshiro256PlusPlus {
#[inline]
fn next_u32(&mut self) -> u32 {
self.next_u64() as u32
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This takes the low-order bytes. @vigna is it preferable to take the high-order bytes given that we're discarding half anyway? Or is there no point caring for the ++ variant?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I'd take the high bits (longer carry chain propagation, which means higher linear complexity), but, really, the difference in quality is not detectable.

Note that if you use PCG32 to initialize the state usinga 64-bit seed, seeds with a large number of lower bits in common will initialize the state in a similar way (try with 0 and 2^63), and consequently the very first outputs will be correlated. The effect might be however very mild in practice.

You can avoid this problem by mixing the seed with a MurmurHash mix (or any other quick bijective mixing function) before using it as a seed for PCG32 (maybe you're already doing something like that, in which case forget this comment).

Of course there will be always seeds leading to similar initialization if you use PCG32, but at least they will be random-looking.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taking the higher bits makes sense (this should also be done for the implementations in rand_xoshiro).

@vigna Would switching to splitmix for 64-bit seeding also avoid this problem?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, using SplitMix avoids visible dependence on the 64-bit seed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there's an issue with our seed_from_u64 method, I'd prefer we deal with that directly.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see. So the issue propagates to every generator using that method. I would have used something with less dependence from the initial value, but I agree that it's unlikely significant problems might surface.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dhardy

I'm tempted to ignore it: it's not very significant.

What is the argument for continuing to use PCG? Avoiding churn? I think it would be nice to avoid issues such as #905, even if it does not matter that much in this case.

We could just use a cryptographic hash function here; it's not performance critical, though we prefer to minimise code size.

Switching to splitmix is approximately the same number of lines and does not suffer from the correlation problem. As far as I can tell, the only downside is that this is a value-breaking change for everyone using seed_from_u64.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vks #905 is not an "issue" so much as incorrect documentation. To quote @vigna:

Personally, I don't believe in the "middle ground" of "difficult-to-predict-but-not-crypto": usually it just means it is so easy to break that nobody tries. I mean, I can claim that xoshiro256++ is "difficult to predict"—and maybe that's even true. But until someone tries for real, that's just a vague and unsubstatiated claim.

The main issue with PCG, as far as I am aware, is that streams tend to be quite similar; this is (part) of the issue in #1032 for example. This really doesn't matter for seed_from_u64 since it only uses a single stream.

If we really want a perfect initialiser we should use a proven cryptographic PRNG or hash function, but to some extent this is overkill considering we only start with (a maximum of) 64-bits of entropy. It seems to me it is "good enough" for intended uses. If @vigna strongly disagrees, I will listen to his advice (though what I understood from the above is that we should use a hash function such as murmurhash and a PRNG).

We also are trying to keep the code-size of rand_core minimal however. I'm not entirely convinced the split between rand and rand_core still makes sense, but changing that now is more churn than justified in a mostly-mature library.

Between SplitMix64 and PCG32, I remain unconvinced that either is significantly better than the other. PCG does more mixing overall (since it emits only half the state per round, and does both multiplication and addition vs just addition), while SplitMix may have a stronger output function. Either way, I think getting good avalanche from low-order bits is much more important than from high-order bits due to the way users are likely to use this, and in this case I think both PRNGs are acceptable. But if @vigna knows better, please do correct me.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#905 is not an "issue" so much as incorrect documentation.

I was referring to the issue with similar sequences for different seeds.

This really doesn't matter for seed_from_u64 since it only uses a single stream.

Yes, but a similar issue persists anyway: Different (but similar) seeds give very similar sequences, as discussed above.

I'm not entirely convinced the split between rand and rand_core still makes sense, but changing that now is more churn than justified in a mostly-mature library.

I agree. For Rand 1.0 I could imagine merging the two crates and making rand without default features equivalent to the old rand_core.

Between SplitMix64 and PCG32, I remain unconvinced that either is significantly better than the other. PCG does more mixing overall (since it emits only half the state per round, and does both multiplication and addition vs just addition), while SplitMix may have a stronger output function. Either way, I think getting good avalanche from low-order bits is much more important than from high-order bits due to the way users are likely to use this, and in this case I think both PRNGs are acceptable. But if @vigna knows better, please do correct me.

Alright, I will revert the splitmix commit. If necessary, we can perform this change in the future.

Copy link

@vigna vigna Sep 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the issues with PCG is that some subsequences of the same generator are very correlated (similar bit patterns), as shown above. You're confusing this issue with the issue that sequences from different streams are similar. The first one is obviously proven by the example you're showing. This is the issue at hand.

The persistent bit patterns of PCG32 for correlated seed cannot happen with SplitMix, so, yes, there is a significant and measurable difference. You can take two states of PCG32 and get highly correlated subsequences in which repeated bit patterns appear. You cannot do the same with SplitMix. You might be unconvinced, but it's mathematics. I'm not claiming that SplitMix will have uncorrelated subsequences, because the state mixing happens only leftward, as with PCG, but the problem is mitigated enormously by the powerful mix function.

If you're OK with repeated bit patterns in the initial state of your generators, that's another question—it's your library, if you want them, keep them.

@vks
Copy link
Collaborator Author

vks commented Sep 10, 2020

@dhardy I restored the PCG32-based implementation of seed_from_u64. Is this ok to merge?

@dhardy
Copy link
Member

dhardy commented Sep 14, 2020

@vks thanks for the quick changes, but we should finish planning first. @vigna has a good argument that SplitMix is slightly better for seed_from_u64 and I cannot counter that, but...

  1. Breaking changes should be minimised. This isn't applicable for SmallRng since it's changing anyway, but is for other RNGs.
  2. There is now some vague possibility of a significant redesign which might impact what we ultimately wish to do here (after v0.8) by allowing the use of a stronger RNG/hash function.

Perhaps, then, the best option is to override seed_from_u64 to use SplitMix for the new SmallRng but not change anything else for now: that way, users of other RNGs are not impacted for now.

@vigna
Copy link

vigna commented Sep 14, 2020

Yes, that's probably the best solution. Fix it locally for now, put together a more comprehensive solution for the future.

Note that if you use a 64-bit seed to initialize a larger state, there are some things which are mathematically impossible to avoid. For example, there will be always seeds leading to a similar first word of state (or any fixed word of state): if the initializing generator you are using does not output every possible value (e.g., PCG32), there will be even some seeds leading to an identical first word of state; and even if the initializing generator outputs exactly once every possible 64-bit value, there will be first words of state sharing, say, the lower 63 bits. This is a fact of life—you cannot avoid this even with a crypto-strength initializing generator.

What you can avoid is that more than one word of state is correlated, because you are tapping into a much larger state space. A crypto-strength initializing generator will do that for you. SplitMix will provide words that have no visible similarity artifacts, and that are sufficiently uncorrelated to initialize a generator. It is true, however, that if you use two seeds differing only in the higher bit the difference in initialization will be only due to the spreading of the influence of that bit by the mixing function (which, once again, is more than sufficient IMHO).

@vks
Copy link
Collaborator Author

vks commented Sep 14, 2020

Alright, I migrated SmallRng::from_seed_u64 to SplitMix64, avoiding the value-breaking change to rand_core that can be considered after Rand 0.8 is released. I think this PR can be merged now?

Comment on lines 22 to 23
/// Note that depending on the application, [`StdRng`] is faster on many modern
/// platforms while providing higher-quality randomness. Furthermore, `SmallRng`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we really say it is faster on many platforms? Micro-benchmarks aren't fully representative, and I don't think we have enough other data to go on, so better simply to say that it may be faster?

Comment on lines 25 to 26
/// - Security against prediction or reproducibility are important.
/// Use [`StdRng`] instead.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is two separate things: don't recommend StdRng for reproducibility!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll just remove "reproducibility", because this is discussed in detail below.

vks added 8 commits September 15, 2020 15:40
Due to close correlations of PCG streams (rust-random#907) and lack of right-state
propagation (rust-random#905), the `SmallRng` algorithm is switched to
xoshiro{128,256}++. The implementation is taken from the `rand_xoshiro`
crate and slightly simplified.

Fixes rust-random#910.
@vks vks merged commit f01b65d into rust-random:master Sep 15, 2020
@vks vks deleted the replace-pcg branch September 15, 2020 13:56
jfroy added a commit to jfroy/rrt that referenced this pull request Feb 22, 2021
This update in particular changes the SmallRng to xoshiro, which I
explored in an experimental branch. Benchmarking shows that this change
is performance-neutral and the output looks the same. See
rust-random/rand#1038 for more.

The experimental branch was more complicated because it replaced the
use of thread_rng with a custom rayon pool. I may eventually pick up
that branch again because it offers more control, but for now this new
rand version gets me the better PRNG.

AMD Ryzen 9 3900X 12-Core Processor (AMD64 Family 23 Model 113 Stepping 0)

tracescene/10x10x4      time:   [295.42 us 295.79 us 296.17 us]
                        change: [-1.4129% -1.1793% -0.9318%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low severe
  5 (5.00%) low mild
  1 (1.00%) high mild
  1 (1.00%) high severe
jfroy added a commit to jfroy/rrt that referenced this pull request Feb 22, 2021
This update in particular changes `SmallRng` to xoshiro. `SmallRng` is
only used for scene construction, as the core continues to use
`thread_rng`, which wraps `StdRng`. See
rust-random/rand#1038 for more.

As expected, benchmarking shows that this change is performance-neutral
and the output looks correct.

There is an ongoing experimental branch that effectively implements a
custom thread-local RNG solution using a custom rayon pool. This will
in particular allow using xoshiro in the core trace loop.

Benchmarking:

AMD Ryzen 9 3900X 12-Core Processor (AMD64 Family 23 Model 113 Stepping 0)

tracescene/10x10x4      time:   [295.42 us 295.79 us 296.17 us]
                        change: [-1.4129% -1.1793% -0.9318%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low severe
  5 (5.00%) low mild
  1 (1.00%) high mild
  1 (1.00%) high severe
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Investigate replacements for SmallRng algorithm
4 participants