Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revise Rng methods #293

Closed
5 tasks done
dhardy opened this issue Mar 10, 2018 · 13 comments
Closed
5 tasks done

Revise Rng methods #293

dhardy opened this issue Mar 10, 2018 · 13 comments
Labels
B-API Breakage: API F-new-int Functionality: new, within Rand

Comments

@dhardy
Copy link
Member

dhardy commented Mar 10, 2018

Summary:


Original post:

Rng has:

  • fill(dest), try_fill(dest) — will slices/arrays; mostly added to avoid need to use RngCore for its fill_bytes
  • sample(distribution) — alternative to distribution.sample(rng); possibly not very useful
  • gen() — sample anything supporting the Uniform distribution; @clarcharr suggested removing this but I don't think that is likely
  • gen_iter(), gen_ascii_chars() — deprecated
  • gen_range(low, high) — shortcut for Range::sample_single(low, high, rng); probably worth keeping
  • gen_weighted_bool(n) — simple extension of gen_range(); likely not used massively but simple and clear, so I don't see a motive to remove
  • choose(slice), choose_mut(slice) — get a ref to a random element; simple and somewhat useful
  • shuffle(slice) — shuffling algorithm; since this is an actual algorithm and not so directly related to RNGs it may be better removing from Rng and adding some shuffle trait instead (allowing slice.shuffle(rng))

Possible additions:

  • p(p) / chance(p) / gen_bernoulli(p) — sample a boolean with given chance (i.e. roughly rng.gen() < p, but we might implement a distribution with more accurate sampling for small p)

Summary (suggested changes):

  • gen_iter and gen_ascii_chars are being removed
  • shuffle may be removed
  • sample and gen could be removed, but less likely
  • a method to sample booleans may be added
@dhardy dhardy added E-question Participation: opinions wanted B-API Breakage: API labels Mar 10, 2018
@pitdicker
Copy link
Contributor

I always though of gen_weighted_bool(n) as a strange function. It will only work for small fractions like 1/2, 1/3, 1/4, 1/5. It can't be used to generate bools with something like 40, 60 or 80% chance. And I don't imagine it ever getting used with a large n, because the difference between 1/99 and 1/100 is not often meaningful.

For such small integers the implementation of gen_weighted_bool(n) to use gen_range is maybe not even the best choice. For gen_weighted_bool(5) it can be much faster to just compare a generated value against u32::MAX / 5 (instead of using gen_range), with just about unmeasurable bias.

Or gen_weighted_bool() could be changed to use a floating point fraction between 0.0 and 1.0. That wouls also be faster and more flexible to what we have now.

Because using gen_weighted_bool is not all that usable at the moment beyond chances between 1/2 and 1/6, and because writing things manually is just as short and easy, I would suggest removing it.

Examples of what to replace rng.gen_weighted_bool(3) with (from fastest to slowest):

  • rng.gen() < std::u32::MAX / 3
  • rng.gen() < (1.0f32 / 3.0)
  • rng.gen_range(0, 3) < 1

@pitdicker
Copy link
Contributor

Forgot to say, good overview!

  • sample(distribution) — alternative to distribution.sample(rng); possibly not very useful

I think it is very useful, for the reason that it makes it easy to use distributions. And I see it as the counterargument against removing gen() because it makes using other kinds of distributions as easy as Uniform.

  • shuffle(slice) — shuffling algorithm; since this is an actual algorithm and not so directly related to RNGs it may be better removing from Rng and adding some shuffle trait instead (allowing slice.shuffle(rng))

I don't think the reason "this is an actual algorithm and not so directly related to RNGs" is very strong. Especially because I don't see even the possibility of a different algorithm (although we could make it faster by making much better use of the bits generated by the RNG). But making it possible to write slice.shuffle(rng) seems like a win.

I would definitely like to keep gen() and gen_range(low, high), and choose(slice) and choose_mut(slice) also seem like a good idea to keep.

If we add some other method to sample from booleans, gen_weighted_bool(n) definitely has to go. I wouldn't pick names like p(p) or gen_bernoulli(p). If you have a mathematical background these names are fine, but I would pick a name that speaks to more people. Maybe just gen_bool(p). But I am happy to have none.

@dhardy
Copy link
Member Author

dhardy commented Mar 11, 2018

Fair point about gen_weighted_bool; I'm happy to lose that.

I don't think we should support rng.shuffle(slice) and slice.shuffle(rng). If we prefer the latter, we have to add another trait that users have to import.

I guess gen_bool is as good a name as any. We don't have to add this, but the alternative is that users tend to write rng.gen() < p a lot, which is not quite so clear and doesn't let us try to improve precision near 0.

@TheIronBorn
Copy link
Contributor

TheIronBorn commented Mar 11, 2018 via email

vks added a commit to vks/rand that referenced this issue Mar 12, 2018
@vks
Copy link
Collaborator

vks commented Mar 12, 2018

We could move shuffle and choose to rand::algorithms or similar, under a different trait or as free functions. This would be less ergonomic however. On the other hand, slice.shuffle(rng) is closer to how sort and friends work.

Generating weighted booleans seems very niche and could be supported more cleanly by distributions, so I thinks it makes sense to remove gen_weighted_bool.

sample(distribution) — alternative to distribution.sample(rng); possibly not very useful

Note that it is also possible to use Distribution::sample(&dist, &mut rng) and Rng::sample(&mut rng, dist). They are not equivalent, because the first burrows the distribution while the last consumes it. Currently, this does not make a difference in rand, because all distributions seem to be zero-sized types or implement Copy.

Personally, I prefer Distribution::sample, because this is where the more important documentation is (the RNGs are interchangeable, the distributions not so much). Also, this is where I would expect the other statistical functions/properties of the distribution that might be added in the future (see #290). So I think we should remove Rng::sample as well as Rng::gen and Rng::gen_range. (FWIW, this is similar to the C++ standard library.)

@pitdicker
Copy link
Contributor

To put one more idea in the mix: how would we describe the trait?

I would describe Rng as an ergonomic interface to various common uses of randomness. And RngCore the trait with basic functionality RNGs should implement.

Would it then make sense to rename Rng to Random, and RngCore to Rng?

@pitdicker
Copy link
Contributor

What about making a trait and module for use with slices? Than you can do both: slice.shuffle(&mut rng), and rng.shuffle(&mut slice). And the implementation could live in the specialized module, not in the Rng trait.

@dhardy
Copy link
Member Author

dhardy commented Mar 12, 2018

Yes, putting this all in one place makes sense, though I don't know that we need it in Rng too. See: dhardy#82 (comment)

@vks
Copy link
Collaborator

vks commented Mar 13, 2018

If we do that and move the sampling to distributions, we could get rid of the the Rng/RngCore distinction.

@dhardy
Copy link
Member Author

dhardy commented Mar 13, 2018

In theory yes, but I'm not inclined to remove gen or fill.

@dhardy dhardy added F-new-int Functionality: new, within Rand P-medium and removed E-question Participation: opinions wanted labels Mar 14, 2018
@dhardy
Copy link
Member Author

dhardy commented Mar 17, 2018

There are two minor issues with renaming Rng → Random:

  1. Breakage — the current renaming was carefully structured so that a lot of existing code using the old Rng will continue to work without modification
  2. Random as a name less obviously describes the purpose — e.g. foo<R: Random>(rand: &mut R) -> u32 { rand.gen() } is less clear than with the current names IMO

The idea has some merit, but needs more discussion than as a little question in a vaguely-related issue. If you want to push the idea further create a new issue and try to catch more attention (labels, maybe a reddit post, examples of user code). I kind of like the current approach though because RngCore is obviously more of a details name, so instantly says the type is less likely of interest to end users than Rng.

@pitdicker
Copy link
Contributor

You are right. I only thought about the imports that would have to change, but this also impacts things like trait bounds. And there is something to say for a clear 'details' trait name as RngCore.

@pitdicker
Copy link
Contributor

I think this issue is solved?

@dhardy dhardy closed this as completed Mar 26, 2018
pitdicker pushed a commit that referenced this issue Apr 4, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
B-API Breakage: API F-new-int Functionality: new, within Rand
Projects
None yet
Development

No branches or pull requests

4 participants