Re-write median(Binomial) to improve speed #1928
Conversation
Follow suggestion from JuliaStats#1926 to reduce `cdf(Binomial)` evaluations.
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@ Coverage Diff @@
## master #1928 +/- ##
==========================================
+ Coverage 85.99% 86.01% +0.01%
==========================================
Files 144 144
Lines 8684 8696 +12
==========================================
+ Hits 7468 7480 +12
Misses 1216 1216

☔ View full report in Codecov by Sentry.
I think it might be useful to link to the paper in a comment. It would make it easier to reason about the implementation for a future reader of the source.
Co-authored-by: Andreas Noack <[email protected]>
Co-authored-by: David Widmann <[email protected]>
Following the suggestions, given

rv = Binomial(25,0.3)
rv2 = Binomial(25,0.756)

the previous version yielded:

julia> @benchmark median(rv)
BenchmarkTools.Trial: 10000 samples with 33 evaluations.
Range (min … max): 924.576 ns … 9.896 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 929.455 ns ┊ GC (median): 0.00%
Time (mean ± σ): 980.001 ns ± 151.139 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
█▅ ▁ ▁ ▄ ▃ ▁
██▅▇▇▇▇▅▇▅▃▇▄▅▇▁▃█▃██▁▁█▇▅███▃▆█▇▅▃█▆▃▄▇▄▁▁▄█▅▁▃▃█▅▄▁▅▇▆▄▁▁▃▇ █
925 ns Histogram: log(frequency) by time 1.49 μs <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark median(rv2)
BenchmarkTools.Trial: 10000 samples with 26 evaluations.
Range (min … max): 941.346 ns … 13.187 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 945.231 ns ┊ GC (median): 0.00%
Time (mean ± σ): 1.007 μs ± 187.974 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
█ ▁ ▂ ▂ ▄ ▃ ▁
█▇█▅▅█▅▆▇▄█▅▁▇▅▃██▁▃█▃▁█▅▁▃█▃▁▄█▅▆▇█▄▅▅█▇▄▃▁█▄▁▃▃█▃▁▁▁▆█▁▄▄▁█ █
941 ns Histogram: log(frequency) by time 1.51 μs <
Memory estimate: 0 bytes, allocs estimate: 0.

and the new version yields:

julia> @benchmark median(rv)
BenchmarkTools.Trial: 10000 samples with 469 evaluations.
Range (min … max): 225.928 ns … 557.132 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 228.358 ns ┊ GC (median): 0.00%
Time (mean ± σ): 235.036 ns ± 21.306 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
██▂▆▇▁ ▁▁ ▁ ▂
████████▅▅▅▇▇▆▇▅█▇▄▄█▆▄▅▆▅▄▄▅▅▄▄██▆▅▄█▇▅▅▄▇▅▃▄▂▄▄▃▄▄▄▆▄▃▄▄▅▅▄ █
226 ns Histogram: log(frequency) by time 340 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark median(rv2)
BenchmarkTools.Trial: 10000 samples with 996 evaluations.
Range (min … max): 17.037 ns … 52.826 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 18.434 ns ┊ GC (median): 0.00%
Time (mean ± σ): 19.582 ns ± 2.704 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▅██▇▇▅▆▆▅▃ ▁ ▁▂ ▁▁▃▂▄▄▄▃▃▂▃▂ ▁▁ ▂
▄▇██████████▇█▇▄███▇██▇▇███████████████▇▆▇▇▇▆▇▇▇▄▆▇▃▇▆▅█▂▇▆ █
17 ns Histogram: log(frequency) by time 29 ns <
Memory estimate: 0 bytes, allocs estimate: 0.

I will check how much the
Here is a more thorough comparison. To get full coverage across the branches, I extended the benchmarks to

rv = Binomial(25,0.3)
rv2 = Binomial(25,0.756)
rv3 = Binomial(25,1//2)
rv4 = Binomial(25, 3//5)

and then ran

function median(dist::Binomial)
bound = min(dist.p, 1-dist.p) # from http://dx.doi.org/10.1111/j.1467-9574.1980.tb00681.x
dist_mean = mean(dist)
floor_mean = floor(Int, dist_mean)
difference = dist_mean - floor_mean
if difference < bound
floor_mean
elseif difference > 1 - bound
floor_mean + 1
elseif cdf(dist, floor_mean) >= 0.5
floor_mean
else
floor_mean + 1
end
end

The results are:

julia> for (a,b) in zip(b2,b3)
println(BenchmarkTools.judge(median(a),median(b)))
end
TrialJudgement(+0.40% => invariant)
TrialJudgement(-9.69% => improvement)
TrialJudgement(-0.04% => invariant)
TrialJudgement(-3.12% => invariant)
julia> for (a,b) in zip(b2,b3)
println(BenchmarkTools.judge(var(a),var(b)))
end
TrialJudgement(+12.14% => regression)
TrialJudgement(-49.32% => improvement)
TrialJudgement(-22.95% => improvement)
TrialJudgement(-12.07% => improvement)

which means the simplification is a definite improvement, both for the median and the variance. The new bug-free implementations are still much slower than the buggy implementation, but that feels like an unfair comparison.

julia> for (a,b) in zip(b1,b2)
println(BenchmarkTools.judge(median(a),median(b)))
end
TrialJudgement(+308.74% => regression)
TrialJudgement(+5373.54% => regression)
TrialJudgement(+343.01% => regression)
TrialJudgement(+3152.94% => regression)
julia> for (a,b) in zip(b1,b3)
println(BenchmarkTools.judge(median(a),median(b)))
end
TrialJudgement(+310.37% => regression)
TrialJudgement(+4843.23% => regression)
TrialJudgement(+342.83% => regression)
TrialJudgement(+3051.34% => regression)
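The branch logic of the median function above translates almost line-for-line outside Julia. A Python sketch, using only the standard library (`binom_cdf` and `binomial_median` are illustrative names, and the cdf here is naive summation rather than Distributions.jl's implementation):

```python
from math import comb, floor

def binom_cdf(m, n, p):
    """P(X <= m) for X ~ Binomial(n, p), by direct summation."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(m + 1))

def binomial_median(n, p):
    """Median of Binomial(n, p) via the Kaas & Buhrman (1980) bound:
    the median is floor(np) or floor(np) + 1, and the fractional part
    of np decides it outright except in a narrow window, where a
    single cdf evaluation settles it."""
    bound = min(p, 1 - p)
    mean = n * p
    fm = floor(mean)
    diff = mean - fm
    if diff < bound:
        return fm
    elif diff > 1 - bound:
        return fm + 1
    elif binom_cdf(fm, n, p) >= 0.5:
        return fm
    else:
        return fm + 1
```

The point of the bound is that the cdf call, the expensive part, is skipped entirely whenever the fractional part of np falls outside the window [min(p, 1-p), 1 - min(p, 1-p)].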
Co-authored-by: David Widmann <[email protected]>
I see the following difference to the master branch:

This PR:

julia> using Distributions, Chairmarks
julia> @be median($(Binomial(44, 0.1)))
Benchmark: 3317 samples with 186 evaluations
min 135.301 ns
median 152.554 ns
mean 153.703 ns
max 250.892 ns
julia> @be median($(Binomial(40, 0.1)))
Benchmark: 4932 samples with 4265 evaluations
min 3.976 ns
median 4.298 ns
mean 4.556 ns
max 9.066 ns
julia> @be median($(Binomial(25, 0.3)))
Benchmark: 3891 samples with 147 evaluations
min 137.755 ns
median 157.592 ns
mean 163.907 ns
max 284.014 ns
julia> @be median($(Binomial(25, 0.756)))
Benchmark: 4933 samples with 3917 evaluations
min 4.266 ns
median 4.606 ns
mean 4.900 ns
max 21.966 ns
julia> @be median($(Binomial(25, 0.5)))
Benchmark: 4809 samples with 4545 evaluations
min 3.988 ns
median 4.171 ns
mean 4.351 ns
max 9.332 ns
julia> @be median($(Binomial(25, 0.6)))
Benchmark: 4921 samples with 4227 evaluations
min 3.982 ns
median 4.298 ns
mean 4.538 ns
max 9.148 ns

Master branch:

julia> using Distributions, Chairmarks
julia> @be median($(Binomial(44, 0.1)))
Benchmark: 3387 samples with 125 evaluations
min 214.664 ns
median 215.672 ns
mean 222.091 ns
max 366.672 ns
julia> @be median($(Binomial(40, 0.1)))
Benchmark: 4111 samples with 86 evaluations
min 239.826 ns
median 253.395 ns
mean 265.475 ns
max 431.686 ns
julia> @be median($(Binomial(25, 0.3)))
Benchmark: 3938 samples with 101 evaluations
min 215.347 ns
median 225.663 ns
mean 234.931 ns
max 417.901 ns
julia> @be median($(Binomial(25, 0.756)))
Benchmark: 3951 samples with 94 evaluations
min 225.617 ns
median 242.904 ns
mean 250.102 ns
max 419.777 ns
julia> @be median($(Binomial(25, 0.5)))
Benchmark: 3920 samples with 100 evaluations
min 216.250 ns
median 227.080 ns
mean 237.701 ns
max 419.170 ns
julia> @be median($(Binomial(25, 0.6)))
Benchmark: 3643 samples with 105 evaluations
min 217.057 ns
median 233.733 ns
mean 245.306 ns
max 732.143 ns
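Taking the minimum times from the two runs above, the implied speedups can be tallied in a few lines of Python (the numbers are copied from the listings; the pairing by distribution is mine):

```python
# (distribution, master min ns, PR min ns), from the listings above
results = [
    ("Binomial(44, 0.1)",   214.664, 135.301),
    ("Binomial(40, 0.1)",   239.826,   3.976),
    ("Binomial(25, 0.3)",   215.347, 137.755),
    ("Binomial(25, 0.756)", 225.617,   4.266),
    ("Binomial(25, 0.5)",   216.250,   3.988),
    ("Binomial(25, 0.6)",   217.057,   3.982),
]
for name, master, pr in results:
    print(f"{name}: {master / pr:.1f}x faster")
```

Roughly, the cases that hit the cdf-free shortcut branches come out ~50-60x faster, while even the cases that still evaluate the cdf gain about 1.6x.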
Co-authored-by: David Widmann <[email protected]>
Fix #1926 by following the suggestions therein to reduce the average number of cdf(Binomial) evaluations.