[BUG] Numerical inaccuracy in summation based routines #379

krachyon · 2021-04-08T10:28:35Z

Describe the bug

Bottleneck's summation-based routines yield different results from numpy's for floating-point input. This seems to stem from numpy using some sort of compensated summation algorithm to increase accuracy, while bottleneck uses a straight running sum, e.g.:
bottleneck/src/reduce_template.c

FOR {
    const npy_DTYPE0 ai = AI(DTYPE0);
    if (!bn_isnan(ai)) {
        asum += ai;
    }
}
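To see why the straight sum loses the small terms, here is a Python imitation of the loop above. It is a sketch that assumes the accumulator stays in the array's own dtype (float32 for the reproduction input). The reported bottleneck result of 2.0 is consistent with that. `naive_nansum` is a hypothetical name and not part of bottleneck.

```python
import numpy as np

def naive_nansum(arr):
    """Straight running sum in the input dtype, skipping NaNs.

    Illustrative imitation of the C loop; not bottleneck's actual code.
    """
    asum = arr.dtype.type(0)      # accumulator kept in the input dtype
    for ai in arr:
        if not np.isnan(ai):
            asum = asum + ai      # every addition rounds to the dtype's precision
    return asum

eps = np.finfo(np.float32).eps
arr = np.hstack(([np.float32(2.)], np.repeat(eps, 100000).astype(np.float32)))
# 2.0f + eps lies exactly halfway between 2.0f and the next float32,
# so round-to-even returns 2.0f and every eps term is lost.
print(naive_nansum(arr))
```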

To Reproduce

import numpy as np
import bottleneck

# adding float32 eps to 2.0f rounds back to 2.0f, so a more accurate algorithm (e.g. Kahan summation) is needed to get a result != 2.0f
arr = np.hstack(([np.float32(2.)], np.repeat(np.finfo(np.float32).eps, 100000).astype(np.float32)))
print('numpy: ', np.nansum(arr))
print('bottleneck: ', bottleneck.nansum(arr))

numpy:  2.011919
bottleneck:  2.0
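As the comment in the reproduction notes, compensated summation avoids this loss. Below is a minimal Kahan summation sketch. `kahan_nansum` is a hypothetical name, and the code is not taken from numpy or bottleneck. It accumulates in float32 like the straight loop, but carries the rounding error of each addition in a compensation term `c` and feeds it back in on the next step. On this particular input it recovers 2 + 100000·eps, which happens to be exactly representable in float32.

```python
import numpy as np

def kahan_nansum(arr):
    """Kahan (compensated) summation in the input dtype, skipping NaNs.

    Illustrative sketch only; not numpy's or bottleneck's implementation.
    """
    s = arr.dtype.type(0)   # running sum
    c = arr.dtype.type(0)   # negated rounding error carried from the last step
    for x in arr:
        if np.isnan(x):
            continue
        y = x - c           # add back the error lost in the previous addition
        t = s + y           # low-order bits of y may be rounded away here
        c = (t - s) - y     # (t - s) is what was actually added; minus y = -error
        s = t
    return s

eps = np.finfo(np.float32).eps
arr = np.hstack(([np.float32(2.)], np.repeat(eps, 100000).astype(np.float32)))
print(kahan_nansum(arr))  # 2 + 100000*eps, exactly representable in float32
```

The four floating-point operations per element (versus one for the straight sum) are the source of the performance cost discussed in this thread.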

System:
Linux-5.11.11-arch1-1-x86_64-with-glibc2.33
Python 3.9.2 (default, Feb 20 2021, 18:40:11)
[GCC 10.2.0]
bottleneck 1.3.2

Expected behavior
As implementations can be switched for non-obvious reasons (like a fallback to numpy routines in the case of non-native byte order), results from bottleneck routines and numpy should match.
If a complete match of results is not attainable, the documentation should state clearly that bottleneck does not always reproduce numpy results.

Additional context
astropy/astropy#11492


sebasv · 2021-08-04T19:55:48Z

If I draw up a PR with Kahan summation, does it have a chance of being accepted? Or will bottleneck refuse to take a performance hit for the sake of precision?

qwhelan · 2021-08-04T20:02:51Z

@sebasv Sorry for the lack of response here, I've had significantly less bandwidth this year.

I believe it's possible to match numpy's output while also being faster. I have some unfinished local commits that accomplish part of this; the biggest issue is that I would want to fix this for every function in one release.

sebasv · 2021-08-05T11:16:47Z

Thank you for the quick response! Let me know if I can help in some capacity.

krachyon · 2021-08-06T22:22:15Z

> If I draw up a PR with Kahan summation, does it have a chance of being accepted? Or will bottleneck refuse to take a performance hit for the sake of precision?

As you seem to have put a little more work into this than I did, do you happen to know whether numpy uses Kahan summation, pairwise summation, or something completely different? I couldn't really follow the dispatch logic...

sebasv · 2021-08-07T07:19:20Z

I now believe that numpy uses pairwise summation (see numpy/numpy#3685).
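A simplified recursive sketch of pairwise summation follows. `pairwise_sum` is a hypothetical name, and the base-case size `BLOCK = 128` is an assumed value. numpy's actual routine differs in details such as loop unrolling and does its own NaN handling, so its output on the reproduction input need not match this sketch exactly. The sketch also does not skip NaNs.

```python
import numpy as np

BLOCK = 128  # assumed base-case size: blocks this small are summed with a straight loop

def pairwise_sum(arr, lo=0, hi=None):
    """Recursive pairwise summation of arr[lo:hi] in the input dtype.

    Simplified illustration of the idea; not numpy's implementation.
    """
    if hi is None:
        hi = len(arr)
    n = hi - lo
    if n <= BLOCK:
        s = arr.dtype.type(0)
        for i in range(lo, hi):
            s = s + arr[i]          # base case: plain running sum
        return s
    mid = lo + n // 2
    # The two halves are summed independently, so rounding error grows with
    # the recursion depth (about log2(n / BLOCK)) rather than with n.
    return pairwise_sum(arr, lo, mid) + pairwise_sum(arr, mid, hi)

eps = np.finfo(np.float32).eps
arr = np.hstack(([np.float32(2.)], np.repeat(eps, 100000).astype(np.float32)))
print(pairwise_sum(arr))  # close to 2.0119; only the eps terms in 2.0's block are lost
```

Because the base case is a straight loop, the only overhead over a naive sum for large blocks is the recursion itself.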
Naive summation has an O(n) worst-case error bound, pairwise summation O(log n), and Kahan summation O(1). With a large base case, naive and pairwise summation are about equally fast (pairwise only adds minimal recursion overhead), whereas Kahan requires about 4 times as many additions. So perhaps pairwise summation is the best fit for Bottleneck?
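For reference, these orders come from the standard first-order worst-case bounds for computing $S_n = \sum_{i=1}^{n} x_i$ (see e.g. Higham's analysis of floating-point summation). Here $u$ is the unit roundoff ($2^{-24}$ for float32, i.e. half of `np.finfo(np.float32).eps`) and $\gamma_k = ku/(1-ku)$:

$$
\begin{aligned}
|E_n^{\text{naive}}| &\le \gamma_{n-1} \sum_{i=1}^{n} |x_i|, \\
|E_n^{\text{pairwise}}| &\le \gamma_{\lceil \log_2 n \rceil} \sum_{i=1}^{n} |x_i|, \\
|E_n^{\text{Kahan}}| &\le \left(2u + O(nu^2)\right) \sum_{i=1}^{n} |x_i|.
\end{aligned}
$$

The O(n), O(log n) and O(1) above describe how the leading constant grows with n.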

krachyon added the bug label Apr 8, 2021

krachyon assigned qwhelan Apr 8, 2021

sebasv mentioned this issue Aug 4, 2021

BUG: np.mean(pd.Series) != np.mean(pd.Series.values) pandas-dev/pandas#42878 (Closed)

mathause mentioned this issue Oct 21, 2021

Rolling() gives values different from pd.rolling() pydata/xarray#5877 (Open)

bluss mentioned this issue Jun 16, 2022

BUG: aggregation of np.float16/np.float32 is wrong for big dataset pandas-dev/pandas#47370 (Closed)

JMBurley mentioned this issue Jul 15, 2022

opt out of bottleneck for nanmean pandas-dev/pandas#47716 (Merged)

mathause mentioned this issue Oct 5, 2022

rolling().sum() is numerically unstable pydata/xarray#7128 (Closed)

saimn mentioned this issue Oct 16, 2024

Unexpected results from sigma_clipped_stats for large np.float32 input arrays astropy/astropy#17185 (Closed)
