-
Notifications
You must be signed in to change notification settings - Fork 205
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added corrections to re-enable reciprocal test in math_brute_force suite for relaxed math mode #2221
base: main
Are you sure you want to change the base?
Added corrections to re-enable reciprocal test in math_brute_force suite for relaxed math mode #2221
Conversation
Possibly dumb question: why are we introducing a special path through binary_operator_float to test reciprocal? Reciprocal is a unary operation, not a binary operation... |
Yep, good point. I went that way just because I wasn't able to choose a better spot. There is no unary float operator, right? To create one I would have to duplicate a lot of code. Any other clues? |
Treating reciprocal as a binary operator with its first argument being the constant 1 seems the least worst solution for now. I'm in the process of reducing code duplication in the math bruteforce suite, so eventually we may be able to move it into its own place. But for now I'm okay with @shajder's chosen solution. |
-reciprocal activated for relaxed float math -reciprocal test added for fp16 and fp64
INFINITY, | ||
FTZ_OFF, | ||
RELAXED_ON, | ||
binaryOperatorF }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear, I'm fine with this as-is, but one side effect of testing reciprocal as part of binaryOperatorF
is that we seem to be testing more than we need to. As one extreme example, if I only test the fp16 reciprocal, I'm seeing 128K kernel enqueues of 32K work-items.... even though there are only 64K possible fp16 values to take a reciprocal of. If I test a true fp16 unary function (e.g. asin) instead, I only see two 32K kernel enqueues, which covers all 64K fp16 values.
For a fp32 reciprocal, we're testing a total of 4B work-items (256K kernel enqueues of 16K work-items), but the data is generated randomly, so we're not guaranteed to have full coverage of all possible fp32 inputs (although we will certainly have fairly good coverage). I think if we properly considered reciprocal to be a "unary function" instead, we could guarantee full coverage of all fp16 and fp32 values. We will still need random sampling for fp64 testing, but that's unavoidable.
PS: With two input buffers for the "binary operator", we're also transferring twice as much input data as we need to. For some of the devices I'm testing it takes more time to transfer the data than it does to execute the reciprocal kernel, so the extra transfers are significant.
fixes #2145
As suggested by @svenvh reciprocal has different precision requirements than divide. This PR introduces special path for reciprocal for binar_float_operator to test reciprocal with relaxed math. If this PR will get approvals, invalidate PR #2162