SPIR-V Environment Spec does not describe required accuracy for OpFMod and OpFRem #859

bashbaug · 2022-11-10T00:18:37Z

See related discussion: KhronosGroup/OpenCL-CTS#1548

In short, the OpenCL SPIR-V environment spec currently does not describe the required accuracy for the OpFMod and OpFRem instructions (although it does describe the required accuracy for the fmod and remainder extended instructions).

OpenCL SPIR-V environment spec reference:

https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_Env.html#_ulp_values_for_math_instructions_full_profile

The spirv_new CTS tests are currently testing that OpFMod and OpFRem matches the behavior of these OpenCL C instructions, which may or may not be correct:

#define spirv_frem(a, b) fmod(a, b)              
#define spirv_fmod(a, b) copysign(fmod(a,b),b)

For Vulkan, the accuracy of OpFMod and OpFRem is derived from trunc and floor, which makes them cheap but potentially highly inaccurate:

https://registry.khronos.org/vulkan/specs/1.3-extensions/html/vkspec.html#spirvenv-op-prec

The OpFRem and OpFMod instructions use cheap approximations of remainder, and the error can be large due to the discontinuity in trunc() and floor(). This can produce mathematically unexpected results in some cases, such as FMod(x,x) computing x rather than 0, and can also cause the result to have a different sign than the infinitely precise result.

What do we want to do for OpenCL?

The text was updated successfully, but these errors were encountered:

karolherbst · 2022-11-10T10:26:59Z

For somebody like me trying to implement OpenCL on native drivers but also on top of Vulkan, I think it might be worth saying that if nothing is specified otherwise it follows Vulkan rules.

dealing with OpFDiv is already a bit of an issues, but manageable without too much overhead.

I think it's a good idea that with SPIR-V you can opt into getting less precision using the existing opcodes. It might make sense though to fill the gaps by adding more OpenCL C builtins if such exist e.g. adding more native_ variants which would map 1:1 to SPIR-V instructions.

StuartDBrady · 2022-11-11T12:29:54Z

Could you help explain a little about the layering with Vulkan, please? Naïvely, I had previously assumed the translation of OpFRem instructions in SPIR-V using the Kernel execution into SPIR-V using the GLCompute execution model would be similar to the translation of uses of OpExtInst OpenCL fmod, and that translation of OpFMod would be implemented in terms of translation of OpFRem. Does layering of OpenCL on Vulkan not work the way that I would have assumed?

Note in particular that in the SPIR-V/LLVM Translator, LLVM's frem instruction is translated into OpFRem. The LangRef states "This is the same output as a libm 'fmod' function". Perhaps this is a bug in the SPIR-V/LLVM Translator, and we should be translating to fmod in the OpenCL extended instruction set, instead?

karolherbst · 2022-11-11T14:46:01Z

Could you help explain a little about the layering with Vulkan, please? Naïvely, I had previously assumed the translation of OpFRem instructions in SPIR-V using the Kernel execution into SPIR-V using the GLCompute execution model would be similar to the translation of uses of OpExtInst OpenCL fmod, and that translation of OpFMod would be implemented in terms of translation of OpFRem. Does layering of OpenCL on Vulkan not work the way that I would have assumed?

That would make things just difficult. OpFRem should be the same on both sides. The OpenCL SPIR-V Env spec does not specify the precision of OpFRem at all, and imho it shouldn't have stricter rules than Vulkan here in the first place.

If one wants to have the precision of OpenCL C builtins, the OpExtInst ones exist for that reason. We shouldn't make it harder than it's supposed to be. If you want to write fast code in OpenCL having the option of using SPIR-V instructions directly to get more speed if the precision doesn't matter is I think something which we should keep, though I suspect some/most implementations are more strict here.

Note in particular that in the SPIR-V/LLVM Translator, LLVM's frem instruction is translated into OpFRem. The LangRef states "This is the same output as a libm 'fmod' function". Perhaps this is a bug in the SPIR-V/LLVM Translator, and we should be translating to fmod in the OpenCL extended instruction set, instead?

yeah.. from a gut feeling I'd say it shall translate to the OpExtInst instead as the SPIR-V env spec doesn't give any guarantees whatsoever on OpFRem. But that's kind of out of scope, because OpenCL C only has fmod here in the first place.

b-sumner · 2022-11-11T15:26:30Z

I was against introducing OpFRem, and still believe that it should be removed rather than producing completely wrong answers in certain cases. Why can't those that want an approximation that only works sometimes just generate the sequence of SPIR-V ops that produce it?

bashbaug · 2022-11-15T01:55:31Z

We need to support OpFRem and OpFMod because these are standard SPIR-V instructions, so removing them is not an option. We do have options how they should behave, though.

Option 1: Derive behavior from `trunc` and `floor`, similar to Vulkan.

Pros: Provides a faster option than the fmod and remainder extended instructions in cases when high accuracy is not required. Provides uniformity with Vulkan, simplifying layered implementations.

Cons: Does not match the currently tested behavior and changing the required accuracy could break existing SPIR-V users.

Option 2: Match the `fmod` and `remainder` extended instructions (0 ULP).

Pros: Matches the currently tested behavior, less likelihood for confusion.

Cons: Lower performance in the cases where high accuracy is not required, potentially complicates layered implementations or SPIR-V generators targeting OpenCL and Vulkan.

Other options?

We could look into other options like allowing the derived behavior with "unsafe math optimizations" enabled, but we'd need to justify the added complexity.

gfxstrand · 2022-11-15T15:51:29Z

I'm a bit torn. On the one hand, I want consistency. On the other hand, this is the libclc implementation of fmod(): https://github.com/llvm/llvm-project/blob/main/libclc/generic/lib/math/clc_fmod.cl#L30. It's horrible. If you've got a hardware instruction that does precise fmod, that's great. If you don't and need an emulation, getting 0 ULP is a giant pile of work that you don't want unless you really need the precision. Back to the first hand, if you want x - y * floor(x/y), maybe just type that? I guess it is 3-4x the instructions if someone happens to have a FREM instruction but are people really fmod-bound? That would be a weird app...

From a driver compiler point of view, I really want OpFMod and OpFRem to really be the same for Vulkan and OpenCL. Mesa currently tries very hard to not have special cases in our SPIR-V parser for the target API. I'd rather those mean the same thing regardless of whether you're targeting Vulkan or OpenCL and have a separate thing for if you want the super-precise version for which we probably have to invoke libclc. Yeah, ok, I think I convinced myself that these should match Vulkan. 😁

bashbaug · 2022-11-18T01:41:31Z

Yeah, our implementation of the precise fmod is similar (we don't have a dedicated HW instruction).

In some ways the only reason we're having this discussion is because we currently have two different ways to perform similar operations - in this case a core SPIR-V instruction and an extended instruction - therefore we can (possibly) assign different behavior (accuracy requirements) to each operation. Two interesting philosophical questions are:

Are we OK having two ways of performing a similar operation or should we try to have only a single way to perform an operation?
In cases where we have two ways of performing a similar operation, woudl we prefer them to behave the same, or is it OK if they behave differently?

If we want to match Vulkan I think we'll need at least some cases where there are two ways of performing a similar operation, because of the differences in behavior (specifically, accuracy requirements).

gfxstrand · 2022-11-18T07:07:22Z

The consistency argument cuts both ways, unfortunately. Do we want the SPIR-V opcode to behave consistently across APIs or do we want the two things with the same name in OpenCL to behave the same in OpenCL?

karolherbst · 2022-11-18T14:24:02Z

I'm a bit skeptical of treating SPIR-V opcodes and OpenCL C builtins the same.

For the builtins we have ExtInst and that's good and helpful. The biggest problem in making SPIR-V and CLC builtins the same is, that some builtins might be implemented on top of the SPIR-V instruction having the same name.

Not having a good example here, but e.g. to satisfy OpenCL's requirement on OpFDiv we have to do some scaling around hw div instructions. Some builtins might actually be implemented in a similar way.

Granted, that's mostly a Mesa issue as our internal program IR is SPIR-V, which we also use for linking, so other runtimes might not care about that as much as we do, but at least for us, everything goes through SPIR-V one way or the other.

bashbaug · 2022-11-29T17:13:24Z

Discussed in the November 29th teleconference. We're trending towards matching Vulkan for the SPIR-V OpFMod and OpFRem. I will make a PR to clarify this in the OpenCL SPIR-V Environment Spec.

StuartDBrady · 2022-11-29T17:27:58Z

As noted in today's WG call, FPFastMathMode decorations are always valid for OpFRem, but they are only valid for fmod in the OpenCL Extended Instruction Set when using a SPIR-V environment that explicitly allows it. As we don't yet have a SPIR-V environment for OpenCL for SPIR-V 1.3 or above, we can (and IMO, we should) make sure that this is included in any such environment.

@bashbaug, any thoughts as to how we should best track this? (Also, we should update the environment to allow ZeroExtend and SignExtend for images with SPIR-V 1.4 - and probably do a more general search for "missing before version 1.x".)

alycm · 2023-01-19T01:17:08Z

On the call on 2023-01-17 there was a concern that SPIR-V LLVM translator could be less accurate than the Vulkan requirements and thus not meet the changes in #871. That would need further investigation.

StuartDBrady · 2023-08-29T15:56:02Z

I've been asked to clarify my concerns with defining the required accuracy of OpFRem and OpFMod in terms of sequences using trunc and floor.

I don't have any specific cases for which this would cause a problem. My concern is whether all existing implementations already provide the required accuracy for all possible inputs, depending on rounding behavior. This is a concern for both OpFRem and OpFMod, regardless of the behavior of the SPIR-V/LLVM Translator.

Even if we assume that OpFRem is always implemented along the lines of x − y × trunc(x ÷ y), as per Vulkan (of which I do not know), I am unsure whether we can be sure that OpFMod would have accuracy as good as x − y × floor(x ÷ y) for all possible inputs when using the SPIR-V/LLVM Translator, given the algorithm used in the translator and how it may be affected by rounding.

The SPIR-V/LLVM Translator implements OpFMod in terms of LLVM's frem op with a correction applied to it. frem is stated by the specification to match the "libm 'fmod' function" – for OpenCL, there is no libm, but it seems reasonable (but not necessarily guaranteed) that this would be taken as OpenCL's fmod builtin function, which is defined in terms of x − y × trunc(x ÷ y). The correction is a final step that adds or subtracts y to produce the correct answer.

I agree that implementations using Vulkan's accuracy for OpFRem and OpFMod should be considered accurate enough.

I was previously concerned that there may be content that uses OpFRem for the OpenCL fmod function, but if the view is that such SPIR-V content is invalid and should be regenerated to use the OpenCL Extended Instruction Set, then there accuracy of OpFRem seems a lot less important. In any case, fmod is defined as x − y × trunc(x ÷ y).

The only question that remains is whether any existing implementations of OpFMod are less accurate than this, though.

bashbaug added the SPIR-V Environment Spec Issues related to the OpenCL SPIR-V Environment specification. label Nov 10, 2022

bashbaug mentioned this issue Nov 10, 2022

spirv_new frem/fmod tests may require excessive precision KhronosGroup/OpenCL-CTS#1548

Open

bashbaug self-assigned this Nov 29, 2022

bashbaug linked a pull request Jan 16, 2023 that will close this issue

clarify the required accuracy for OpFMod and OpFRem #871

Draft

bashbaug added this to OpenCL specification maintenance Aug 14, 2024

bashbaug moved this to Needs WG discussion in OpenCL specification maintenance Aug 14, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SPIR-V Environment Spec does not describe required accuracy for OpFMod and OpFRem #859

SPIR-V Environment Spec does not describe required accuracy for OpFMod and OpFRem #859

bashbaug commented Nov 10, 2022

karolherbst commented Nov 10, 2022 •

edited

Loading

StuartDBrady commented Nov 11, 2022

karolherbst commented Nov 11, 2022

b-sumner commented Nov 11, 2022

bashbaug commented Nov 15, 2022

gfxstrand commented Nov 15, 2022

bashbaug commented Nov 18, 2022

gfxstrand commented Nov 18, 2022

karolherbst commented Nov 18, 2022

bashbaug commented Nov 29, 2022

StuartDBrady commented Nov 29, 2022

alycm commented Jan 19, 2023

StuartDBrady commented Aug 29, 2023 •

edited

Loading

SPIR-V Environment Spec does not describe required accuracy for OpFMod and OpFRem #859

SPIR-V Environment Spec does not describe required accuracy for OpFMod and OpFRem #859

Comments

bashbaug commented Nov 10, 2022

karolherbst commented Nov 10, 2022 • edited Loading

StuartDBrady commented Nov 11, 2022

karolherbst commented Nov 11, 2022

b-sumner commented Nov 11, 2022

bashbaug commented Nov 15, 2022

Option 1: Derive behavior from trunc and floor, similar to Vulkan.

Option 2: Match the fmod and remainder extended instructions (0 ULP).

Other options?

gfxstrand commented Nov 15, 2022

bashbaug commented Nov 18, 2022

gfxstrand commented Nov 18, 2022

karolherbst commented Nov 18, 2022

bashbaug commented Nov 29, 2022

StuartDBrady commented Nov 29, 2022

alycm commented Jan 19, 2023

StuartDBrady commented Aug 29, 2023 • edited Loading

karolherbst commented Nov 10, 2022 •

edited

Loading

Option 1: Derive behavior from `trunc` and `floor`, similar to Vulkan.

Option 2: Match the `fmod` and `remainder` extended instructions (0 ULP).

StuartDBrady commented Aug 29, 2023 •

edited

Loading