Thanks for writing and maintaining such a fast and useful library!
I've noticed some precision issues where the erf function with f32 inputs may exceed the documented 1.0 ULP error bound when FMA is not available.
I ran my testing on x86, where scalar and AVX2 return consistently accurate values, but SSE4 and SSE2 sometimes return inaccurate values.
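For readers who want to check figures like these themselves, here is a minimal sketch of one way a ULP error could be measured for an `f32` result. The helper name `ulp_error_f32` is hypothetical, and this is not the code used for the measurements in this report. It assumes a higher-precision reference value is available, for instance from a double-precision `erf`.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper: distance between a float result `got` and a
   higher-precision reference `ref`, in units of the float ULP at `ref`. */
static double ulp_error_f32(float got, double ref) {
    assert(isfinite(ref));
    int e = -126;                 /* exponent used for zero and subnormals */
    if (ref != 0.0) {
        (void)frexp(ref, &e);     /* ref = m * 2^e with 0.5 <= |m| < 1 */
        e -= 1;                   /* now 2^e <= |ref| < 2^(e+1) */
        if (e < -126) e = -126;   /* subnormal range shares one ULP size */
    }
    double ulp = ldexp(1.0, e - 23);  /* float has 23 explicit mantissa bits */
    return fabs((double)got - ref) / ulp;
}
```

The reference itself carries rounding error, so values very close to a bound such as 1.0 ULP should be confirmed against an even more precise reference.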
For example, the input `0x1.16945ap-126` (`1.279174322e-38`) has a 1.3 ULP error, and the input `-0x1.d1b6c8p-127` (`-1.069226882e-38`) has a 1.8 ULP error.

Here is some example code to reproduce these results, tested on a fresh pull of the main branch (a99491a):
When run, this outputs:
It appears the differences arise in the call to `dfmul_vf2_vf2_vf` on line 3524 of `sleef/src/libm/sleefsimdsp.c` (at a99491a).
An algorithm update to improve non-FMA precision would of course be appreciated. For my use-case, however, I don't need the full 1.0 ULP precision. It would be fine to just update the docs to state that the error bound is 2.0 ULP when FMA isn't available.