ULP requirements for fp16 divide #1278
Comments
As an additional consideration, for fp32 the specification defines the
In case implementations want to keep advertising a correctly rounded divide for half/double, we could consider extending the
Discussed in the November 5th teleconference. The decisions we need to make are:
See related CTS issue: KhronosGroup/OpenCL-CTS#1996
Agreed on WG call 11/12 to:
With our current GPU implementation, we have a 0.5 ULP error with some inputs:
If I relax the ULP requirements to 0.5 ULP then the test passes:
Tested on an Arc A750 with the command line:
Our current CPU implementation passes without relaxing the ULP requirements.
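For readers wondering how a half divide can miss the correctly rounded result at all, here is a minimal sketch of one hypothetical way it can happen: forming the quotient as x * (1/y) with the reciprocal rounded to fp16 first. This is an assumption for illustration only and does not describe any implementation mentioned in this thread. The helper names (`to_half`, `divide_correctly_rounded`, `divide_via_reciprocal`) are invented for the example.

```python
import struct

def to_half(x: float) -> float:
    """Round a Python float (fp64) to the nearest fp16 value, ties to even."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def divide_correctly_rounded(x: float, y: float) -> float:
    # For fp16 inputs, the fp64 quotient rounded to fp16 matches the
    # correctly rounded fp16 quotient: fp64 carries enough extra bits
    # that the second rounding does not change the answer.
    return to_half(x / y)

def divide_via_reciprocal(x: float, y: float) -> float:
    # Hypothetical shortcut: round 1/y to fp16, multiply, round again.
    # The two roundings can land on a neighbouring fp16 value.
    return to_half(x * to_half(1.0 / y))

x, y = 5.0, 3.0
exact = divide_correctly_rounded(x, y)
approx = divide_via_reciprocal(x, y)
print(exact, approx, exact == approx)
```

For 5.0 / 3.0 the two results differ by one fp16 step (2^-10 at this magnitude), so the shortcut would fail a correctly rounded check for this input while staying within 1 ULP of the exact quotient.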
Qualcomm would like to request 1 ULP for divide.
Referring to the OpenCL C spec on ULP requirements:
The ULP requirements for single precision divide (x/y) and reciprocal (1.0/x) are ≤ 2.5 ulp.
However, for half precision these are defined as needing to be 'correctly rounded'.
We would like to propose that these be defined with a specific (relaxed) ULP bound, following the pattern for other built-ins.
Specifically, we would like to set a precision requirement of ≤ 1 ulp for both of these cases for fp16.
The double precision requirements for these cases suffer from the same discrepancy (a specific ULP bound for float, correctly rounded for double), so these should be reviewed as well.
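To make the difference between "correctly rounded" and "≤ 1 ulp" concrete, here is a minimal sketch of measuring the ULP error of an fp16 result against a higher precision reference. It is not the CTS implementation. `half_ulp` uses a simplified notion of ULP (the fp16 spacing in the binade of the reference), which may differ from the specification's definition at exact powers of two, and all function names are assumptions made for this example.

```python
import math
import struct

HALF_MIN_EXP = -14    # exponent of the smallest normal fp16 value
HALF_MANT_BITS = 10   # explicitly stored mantissa bits in fp16

def to_half(x: float) -> float:
    """Round a Python float (fp64) to the nearest fp16 value, ties to even."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def half_ulp(x: float) -> float:
    """Spacing of fp16 values in the binade containing x (simplified)."""
    if x == 0.0:
        # Smallest subnormal spacing.
        return 2.0 ** (HALF_MIN_EXP - HALF_MANT_BITS)
    exp = math.frexp(abs(x))[1] - 1          # floor(log2(|x|))
    return 2.0 ** (max(exp, HALF_MIN_EXP) - HALF_MANT_BITS)

def ulp_error(result: float, reference: float) -> float:
    """Distance from result to the reference, in fp16 ULPs of the reference."""
    return abs(result - reference) / half_ulp(reference)

reference = 1.0 / 3.0                  # fp64 stand-in for the exact quotient
nearest = to_half(reference)           # correctly rounded fp16 result
neighbour = nearest + half_ulp(reference)  # next fp16 value up

print(ulp_error(nearest, reference))    # about 0.33 ULP
print(ulp_error(neighbour, reference))  # about 0.67 ULP
```

In this example the correctly rounded result is about 0.33 ULP from the reference, while the neighbouring fp16 value is about 0.67 ULP away: it would be rejected by a correctly rounded requirement but accepted under the proposed ≤ 1 ulp bound.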