Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SPIR-V Environment Spec does not describe required accuracy for OpFMod and OpFRem #859

Open
bashbaug opened this issue Nov 10, 2022 · 13 comments · May be fixed by #871
Open

SPIR-V Environment Spec does not describe required accuracy for OpFMod and OpFRem #859

bashbaug opened this issue Nov 10, 2022 · 13 comments · May be fixed by #871
Assignees
Labels
SPIR-V Environment Spec Issues related to the OpenCL SPIR-V Environment specification.

Comments

@bashbaug
Copy link
Contributor

See related discussion: KhronosGroup/OpenCL-CTS#1548

In short, the OpenCL SPIR-V environment spec currently does not describe the required accuracy for the OpFMod and OpFRem instructions (although it does describe the required accuracy for the fmod and remainder extended instructions).

OpenCL SPIR-V environment spec reference:

https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_Env.html#_ulp_values_for_math_instructions_full_profile

The spirv_new CTS tests are currently testing that OpFMod and OpFRem matches the behavior of these OpenCL C instructions, which may or may not be correct:

#define spirv_frem(a, b) fmod(a, b)              
#define spirv_fmod(a, b) copysign(fmod(a,b),b)  

For Vulkan, the accuracy of OpFMod and OpFRem is derived from trunc and floor, which makes them cheap but potentially highly inaccurate:

https://registry.khronos.org/vulkan/specs/1.3-extensions/html/vkspec.html#spirvenv-op-prec

The OpFRem and OpFMod instructions use cheap approximations of remainder, and the error can be large due to the discontinuity in trunc() and floor(). This can produce mathematically unexpected results in some cases, such as FMod(x,x) computing x rather than 0, and can also cause the result to have a different sign than the infinitely precise result.

What do we want to do for OpenCL?

@bashbaug bashbaug added the SPIR-V Environment Spec Issues related to the OpenCL SPIR-V Environment specification. label Nov 10, 2022
@karolherbst
Copy link
Contributor

karolherbst commented Nov 10, 2022

For somebody like me trying to implement OpenCL on native drivers but also on top of Vulkan, I think it might be worth saying that if nothing is specified otherwise it follows Vulkan rules.

dealing with OpFDiv is already a bit of an issues, but manageable without too much overhead.

I think it's a good idea that with SPIR-V you can opt into getting less precision using the existing opcodes. It might make sense though to fill the gaps by adding more OpenCL C builtins if such exist e.g. adding more native_ variants which would map 1:1 to SPIR-V instructions.

@StuartDBrady
Copy link
Contributor

Could you help explain a little about the layering with Vulkan, please? Naïvely, I had previously assumed the translation of OpFRem instructions in SPIR-V using the Kernel execution into SPIR-V using the GLCompute execution model would be similar to the translation of uses of OpExtInst OpenCL fmod, and that translation of OpFMod would be implemented in terms of translation of OpFRem. Does layering of OpenCL on Vulkan not work the way that I would have assumed?

Note in particular that in the SPIR-V/LLVM Translator, LLVM's frem instruction is translated into OpFRem. The LangRef states "This is the same output as a libm 'fmod' function". Perhaps this is a bug in the SPIR-V/LLVM Translator, and we should be translating to fmod in the OpenCL extended instruction set, instead?

@karolherbst
Copy link
Contributor

Could you help explain a little about the layering with Vulkan, please? Naïvely, I had previously assumed the translation of OpFRem instructions in SPIR-V using the Kernel execution into SPIR-V using the GLCompute execution model would be similar to the translation of uses of OpExtInst OpenCL fmod, and that translation of OpFMod would be implemented in terms of translation of OpFRem. Does layering of OpenCL on Vulkan not work the way that I would have assumed?

That would make things just difficult. OpFRem should be the same on both sides. The OpenCL SPIR-V Env spec does not specify the precision of OpFRem at all, and imho it shouldn't have stricter rules than Vulkan here in the first place.

If one wants to have the precision of OpenCL C builtins, the OpExtInst ones exist for that reason. We shouldn't make it harder than it's supposed to be. If you want to write fast code in OpenCL having the option of using SPIR-V instructions directly to get more speed if the precision doesn't matter is I think something which we should keep, though I suspect some/most implementations are more strict here.

Note in particular that in the SPIR-V/LLVM Translator, LLVM's frem instruction is translated into OpFRem. The LangRef states "This is the same output as a libm 'fmod' function". Perhaps this is a bug in the SPIR-V/LLVM Translator, and we should be translating to fmod in the OpenCL extended instruction set, instead?

yeah.. from a gut feeling I'd say it shall translate to the OpExtInst instead as the SPIR-V env spec doesn't give any guarantees whatsoever on OpFRem. But that's kind of out of scope, because OpenCL C only has fmod here in the first place.

@b-sumner
Copy link

I was against introducing OpFRem, and still believe that it should be removed rather than producing completely wrong answers in certain cases. Why can't those that want an approximation that only works sometimes just generate the sequence of SPIR-V ops that produce it?

@bashbaug
Copy link
Contributor Author

We need to support OpFRem and OpFMod because these are standard SPIR-V instructions, so removing them is not an option. We do have options how they should behave, though.

Option 1: Derive behavior from trunc and floor, similar to Vulkan.

Pros: Provides a faster option than the fmod and remainder extended instructions in cases when high accuracy is not required. Provides uniformity with Vulkan, simplifying layered implementations.

Cons: Does not match the currently tested behavior and changing the required accuracy could break existing SPIR-V users.

Option 2: Match the fmod and remainder extended instructions (0 ULP).

Pros: Matches the currently tested behavior, less likelihood for confusion.

Cons: Lower performance in the cases where high accuracy is not required, potentially complicates layered implementations or SPIR-V generators targeting OpenCL and Vulkan.

Other options?

We could look into other options like allowing the derived behavior with "unsafe math optimizations" enabled, but we'd need to justify the added complexity.

@gfxstrand
Copy link

I'm a bit torn. On the one hand, I want consistency. On the other hand, this is the libclc implementation of fmod(): https://github.com/llvm/llvm-project/blob/main/libclc/generic/lib/math/clc_fmod.cl#L30. It's horrible. If you've got a hardware instruction that does precise fmod, that's great. If you don't and need an emulation, getting 0 ULP is a giant pile of work that you don't want unless you really need the precision. Back to the first hand, if you want x - y * floor(x/y), maybe just type that? I guess it is 3-4x the instructions if someone happens to have a FREM instruction but are people really fmod-bound? That would be a weird app...

From a driver compiler point of view, I really want OpFMod and OpFRem to really be the same for Vulkan and OpenCL. Mesa currently tries very hard to not have special cases in our SPIR-V parser for the target API. I'd rather those mean the same thing regardless of whether you're targeting Vulkan or OpenCL and have a separate thing for if you want the super-precise version for which we probably have to invoke libclc. Yeah, ok, I think I convinced myself that these should match Vulkan. 😁

@bashbaug
Copy link
Contributor Author

Yeah, our implementation of the precise fmod is similar (we don't have a dedicated HW instruction).

In some ways the only reason we're having this discussion is because we currently have two different ways to perform similar operations - in this case a core SPIR-V instruction and an extended instruction - therefore we can (possibly) assign different behavior (accuracy requirements) to each operation. Two interesting philosophical questions are:

  1. Are we OK having two ways of performing a similar operation or should we try to have only a single way to perform an operation?
  2. In cases where we have two ways of performing a similar operation, woudl we prefer them to behave the same, or is it OK if they behave differently?

If we want to match Vulkan I think we'll need at least some cases where there are two ways of performing a similar operation, because of the differences in behavior (specifically, accuracy requirements).

@gfxstrand
Copy link

The consistency argument cuts both ways, unfortunately. Do we want the SPIR-V opcode to behave consistently across APIs or do we want the two things with the same name in OpenCL to behave the same in OpenCL?

@karolherbst
Copy link
Contributor

I'm a bit skeptical of treating SPIR-V opcodes and OpenCL C builtins the same.

For the builtins we have ExtInst and that's good and helpful. The biggest problem in making SPIR-V and CLC builtins the same is, that some builtins might be implemented on top of the SPIR-V instruction having the same name.

Not having a good example here, but e.g. to satisfy OpenCL's requirement on OpFDiv we have to do some scaling around hw div instructions. Some builtins might actually be implemented in a similar way.

Granted, that's mostly a Mesa issue as our internal program IR is SPIR-V, which we also use for linking, so other runtimes might not care about that as much as we do, but at least for us, everything goes through SPIR-V one way or the other.

@bashbaug
Copy link
Contributor Author

Discussed in the November 29th teleconference. We're trending towards matching Vulkan for the SPIR-V OpFMod and OpFRem. I will make a PR to clarify this in the OpenCL SPIR-V Environment Spec.

@bashbaug bashbaug self-assigned this Nov 29, 2022
@StuartDBrady
Copy link
Contributor

As noted in today's WG call, FPFastMathMode decorations are always valid for OpFRem, but they are only valid for fmod in the OpenCL Extended Instruction Set when using a SPIR-V environment that explicitly allows it. As we don't yet have a SPIR-V environment for OpenCL for SPIR-V 1.3 or above, we can (and IMO, we should) make sure that this is included in any such environment.

@bashbaug, any thoughts as to how we should best track this? (Also, we should update the environment to allow ZeroExtend and SignExtend for images with SPIR-V 1.4 - and probably do a more general search for "missing before version 1.x".)

@bashbaug bashbaug linked a pull request Jan 16, 2023 that will close this issue
@alycm
Copy link
Contributor

alycm commented Jan 19, 2023

On the call on 2023-01-17 there was a concern that SPIR-V LLVM translator could be less accurate than the Vulkan requirements and thus not meet the changes in #871. That would need further investigation.

@StuartDBrady
Copy link
Contributor

StuartDBrady commented Aug 29, 2023

I've been asked to clarify my concerns with defining the required accuracy of OpFRem and OpFMod in terms of sequences using trunc and floor.

I don't have any specific cases for which this would cause a problem. My concern is whether all existing implementations already provide the required accuracy for all possible inputs, depending on rounding behavior. This is a concern for both OpFRem and OpFMod, regardless of the behavior of the SPIR-V/LLVM Translator.

Even if we assume that OpFRem is always implemented along the lines of xy × trunc(x ÷ y), as per Vulkan (of which I do not know), I am unsure whether we can be sure that OpFMod would have accuracy as good as xy × floor(x ÷ y) for all possible inputs when using the SPIR-V/LLVM Translator, given the algorithm used in the translator and how it may be affected by rounding.

The SPIR-V/LLVM Translator implements OpFMod in terms of LLVM's frem op with a correction applied to it. frem is stated by the specification to match the "libm 'fmod' function" – for OpenCL, there is no libm, but it seems reasonable (but not necessarily guaranteed) that this would be taken as OpenCL's fmod builtin function, which is defined in terms of xy × trunc(x ÷ y). The correction is a final step that adds or subtracts y to produce the correct answer.

I agree that implementations using Vulkan's accuracy for OpFRem and OpFMod should be considered accurate enough.

I was previously concerned that there may be content that uses OpFRem for the OpenCL fmod function, but if the view is that such SPIR-V content is invalid and should be regenerated to use the OpenCL Extended Instruction Set, then there accuracy of OpFRem seems a lot less important. In any case, fmod is defined as xy × trunc(x ÷ y).

The only question that remains is whether any existing implementations of OpFMod are less accurate than this, though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
SPIR-V Environment Spec Issues related to the OpenCL SPIR-V Environment specification.
Projects
Status: Needs WG discussion
Development

Successfully merging a pull request may close this issue.

6 participants