Support sampling integers from discrete distributions #155

SabrinaJewson · 2021-11-05T07:53:08Z

No description provided.

henryjac · 2024-03-06T21:07:44Z

Very reasonable PR, adding to milestone 0.17 and we'll probably get this merged! Preliminary checks looks good to me.

vks · 2024-03-10T16:17:51Z

I would be careful with returning integers for distributions that use floats internally. For large numbers, this is lossy.

SabrinaJewson · 2024-03-10T21:15:26Z

For most the distributions, their implementations effectively use integers anyway and this PR just exposes that functionality.

The only exception is the geometric distribution:

x.log(1.0 - self.p).ceil() as u64

The largest finite value this expression can be is if x = 1.4e-45 and 1.0 - self.p = 0.9999999999999999, in which case we get 930262250532780300, which when casted to a u64 is 930262250532780288. Importantly it does stay within the range of a u64, so I would say the lossiness is acceptable here.

src/distribution/geometric.rs

SabrinaJewson · 2024-08-30T16:01:44Z

Now resolved the merge conflicts (and squashed everything to one commit).

codecov · 2024-08-31T15:21:19Z

Codecov Report

Attention: Patch coverage is 0% with 61 lines in your changes missing coverage. Please review.

Project coverage is 93.76%. Comparing base (a514992) to head (3deb714).
Report is 2 commits behind head on master.

Files with missing lines	Patch %	Lines
src/distribution/multinomial.rs	0.00%	19 Missing ⚠️
src/distribution/geometric.rs	0.00%	10 Missing ⚠️
src/distribution/categorical.rs	0.00%	9 Missing ⚠️
src/distribution/binomial.rs	0.00%	6 Missing ⚠️
src/distribution/hypergeometric.rs	0.00%	6 Missing ⚠️
src/distribution/bernoulli.rs	0.00%	4 Missing ⚠️
src/distribution/discrete_uniform.rs	0.00%	4 Missing ⚠️
src/distribution/poisson.rs	0.00%	3 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #155      +/-   ##
==========================================
- Coverage   94.03%   93.76%   -0.28%     
==========================================
  Files          53       53              
  Lines       12055    12090      +35     
==========================================
  Hits        11336    11336              
- Misses        719      754      +35

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

YeungOnion · 2024-09-01T00:54:49Z

Thanks for getting back on this despite how long this was left sitting!

Two requests,

Would you reword to use a conventional commit message so I can rely on automatic change log generation tools? Just a feat: prefix on the first line of the commit would be great.
Are you able to write some regression tests against the pre-existing float output?

YeungOnion · 2024-09-10T16:53:36Z

Hey, would you want me to resolve the conflicts? I imagine most of them are for the "rand" feature and you've already done the first merge conflict, and I think this is a good feature.

SabrinaJewson · 2024-09-22T13:07:20Z

I’ve rebased on master and used conventional commits. I’m not sure how to do the regression tests: random number generation is hard to test because, well, it’s random – we could fix a seed but I’m sceptical of adding that to the tests, because it’s not tesitng public API (meaning we could make a non-SemVer-breaking change and have to change the tests, which isn’t considered good testing practice).

That said, I’ve refactored some of the implementations to share as much code as possible, so it should be obviously the case that the correctness of the new algorithm is the same as the correctness of the old algorithm.

YeungOnion · 2024-09-22T18:33:33Z

Thanks! It all looks good, but I suppose it won't hurt to run CI checks again.

An aside regarding regression tests.
Initially I thought perhaps to just sample an integer and cast to float, then compare it to float, but as you mentioned at the start of this PR, most rely on integer internally, so that's just checking float representation accuracy.

YeungOnion · 2024-09-22T22:45:10Z

Thanks again!

(3 distributions) * (density evalution + sampling) lint: run clippy specify required features for random_clap random_clap can pmf for multinomial refactoring for multivariate densities set some default parameters, on distn basis fix: specify type sampled after merging statrs-dev#155 fix: sampling should use DistIter, but not collect it chore: lint for reserve-after-new doc: specify delimiter in help messages

henryjac added this to the 0.17 milestone Mar 6, 2024

YeungOnion reviewed May 24, 2024

View reviewed changes

src/distribution/geometric.rs Outdated Show resolved Hide resolved

SabrinaJewson force-pushed the sample-integer branch from b7efa3a to d09b3f4 Compare August 30, 2024 15:59

YeungOnion modified the milestones: 0.17, 0.18 Sep 18, 2024

feat: support sampling integers from discrete distributions

b97707c

SabrinaJewson force-pushed the sample-integer branch from d09b3f4 to b97707c Compare September 22, 2024 13:04

fix: MSRV

3deb714

YeungOnion merged commit ac52c63 into statrs-dev:master Sep 22, 2024
9 of 10 checks passed

SabrinaJewson deleted the sample-integer branch September 23, 2024 06:37

YeungOnion mentioned this pull request Sep 24, 2024

Feature request: Implement Distribution<u64> for Geometric #300

Closed

YeungOnion added a commit to YeungOnion/statrs that referenced this pull request Sep 24, 2024

fix: specify type sampled after merging statrs-dev#155

0457e05

YeungOnion added a commit to YeungOnion/statrs that referenced this pull request Dec 3, 2024

fix: specify type sampled after merging statrs-dev#155

2992dec

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support sampling integers from discrete distributions #155

Support sampling integers from discrete distributions #155

SabrinaJewson commented Nov 5, 2021

henryjac commented Mar 6, 2024

vks commented Mar 10, 2024

SabrinaJewson commented Mar 10, 2024

SabrinaJewson commented Aug 30, 2024

codecov bot commented Aug 31, 2024 •

edited

Loading

YeungOnion commented Sep 1, 2024

YeungOnion commented Sep 10, 2024

SabrinaJewson commented Sep 22, 2024

YeungOnion commented Sep 22, 2024

YeungOnion commented Sep 22, 2024

Support sampling integers from discrete distributions #155

Support sampling integers from discrete distributions #155

Conversation

SabrinaJewson commented Nov 5, 2021

henryjac commented Mar 6, 2024

vks commented Mar 10, 2024

SabrinaJewson commented Mar 10, 2024

SabrinaJewson commented Aug 30, 2024

codecov bot commented Aug 31, 2024 • edited Loading

Codecov Report

YeungOnion commented Sep 1, 2024

YeungOnion commented Sep 10, 2024

SabrinaJewson commented Sep 22, 2024

YeungOnion commented Sep 22, 2024

YeungOnion commented Sep 22, 2024

codecov bot commented Aug 31, 2024 •

edited

Loading