Setting the floating-point control word no longer works as expected with new exception handling in 9.0 #109997

Closed
koyote opened this issue Nov 20, 2024 · 14 comments


koyote commented Nov 20, 2024

Description

_control87 can be used to set the floating-point control word for the running process.
This changes the behaviour of floating-point arithmetic to, for example, throw an exception on floating-point errors such as division by zero.

Since .NET 9.0, these exceptions are no longer catchable; instead the process exits with no information other than InternalError.

Is this breaking change expected?
Is the unhelpful error message expected?

Reproduction Steps

using System.Runtime.InteropServices;

[DllImport("msvcrt.dll", CallingConvention = CallingConvention.Cdecl)]
static extern uint _control87(uint @new, uint mask);

[DllImport("msvcrt.dll", CallingConvention = CallingConvention.Cdecl)]
static extern uint _clearfp();

const uint MCW_EM = 0x0008001f;
const uint _EM_DENORMAL = 0x00080000;
const uint _EM_UNDERFLOW = 0x00000002;
const uint _EM_INEXACT = 0x00000001;

_clearfp();
_control87(_EM_DENORMAL | _EM_UNDERFLOW | _EM_INEXACT, MCW_EM);
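// In _control87(new, mask), exception bits that are SET in 'new' stay masked,
// so the call above masks only the denormal, underflow, and inexact exceptions
// and unmasks invalid, zero-divide, and overflow. The division below therefore
// raises a hardware FP exception instead of quietly producing +Infinity.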

double zero = 0.0;
double theDouble = 0.0;
try
{
    theDouble = 123.321 / zero;
}
catch (Exception ex)
{
    Console.WriteLine(ex.Message);
}

Console.WriteLine("Result: " + theDouble);

Expected behavior

In <9.0 we get:

Attempted to divide by zero.
Result: 0

Actual behavior

Process terminated. InternalError
   at System.Environment.FailFast(System.Runtime.CompilerServices.StackCrawlMarkHandle, System.String, System.Runtime.CompilerServices.ObjectHandleOnStack, System.String)
   at System.Environment.FailFast(System.Threading.StackCrawlMark ByRef, System.String, System.Exception, System.String)
   at System.Environment.FailFast(System.String)
   at System.Runtime.EH.FallbackFailFast(System.Runtime.RhFailFastReason, System.Object)
   at System.Runtime.EH.RhThrowHwEx(UInt32, ExInfo ByRef)
   at Program.<Main>$(System.String[])

Regression?

Yes, this works at least in .NET 7 and 8.

Known Workarounds

Setting the following in the runtimeconfig.json file:

"System.Runtime.LegacyExceptionHandling": true
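
For reference, a minimal runtimeconfig.json carrying this switch would look like the sketch below; the configProperties section is where runtime switches of this kind live:

{
  "runtimeOptions": {
    "configProperties": {
      "System.Runtime.LegacyExceptionHandling": true
    }
  }
}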

Configuration

.NET 9.0
Windows 11 x64

Other information

No response

@dotnet-issue-labeler bot added the needs-area-label label Nov 20, 2024
@dotnet-policy-service bot added the untriaged label Nov 20, 2024
@huoyaoyuan added the area-ExceptionHandling-coreclr label and removed the needs-area-label label Nov 20, 2024
@teo-tsirpanis (Contributor)

According to #76257 (comment), .NET does not support running with floating point exceptions.

@teo-tsirpanis closed this as not planned Nov 20, 2024
@dotnet-policy-service bot removed the untriaged label Nov 20, 2024
@tannergooding (Member)

> According to #76257 (comment), .NET does not support running with floating point exceptions.

Right, .NET explicitly does not support modifying the floating-point control word and doing so will cause undefined behavior. This is true even prior to .NET 9 and it may have just happened or appeared to work in the isolated scenarios that you had been testing.

Enabling IEEE 754 floating-point exceptions is incredibly uncommon in production apps (with many modern languages not providing support), and the spec explicitly allows a language or runtime to deviate from the default behavior where they would be enabled, and even to not provide support for particular status flags.

Most of the "faulting" behaviors are expensive and rarely needed, and the spec-required result, irrespective of the fault, is often mathematically intuitive (particularly when getting into domains involving Complex numbers, scientific computing, ML/AI, and beyond). Where detection of a fault is necessary, it is typically more appropriate to check for the scenario explicitly, such as via IsNaN, IsInfinity, IsSubnormal, or the other helper APIs that exist on the floating-point types such as float (System.Single), double (System.Double), and Half (System.Half).

koyote (Author) commented Nov 20, 2024

> Right, .NET explicitly does not support modifying the floating-point control word and doing so will cause undefined behavior. This is true even prior to .NET 9 and it may have just happened or appeared to work in the isolated scenarios that you had been testing.

Fair enough!

As you mention "explicitly", is this documented somewhere?

Just as an aside, this issue will probably break whatever @grospelliergilles was trying to do in the tagged issue #76257:

> Even if you are not setting the floating-point control word in your C# app, if you are referencing a C# assembly in a C++ application that sets the control word (via C++/CLI or hosting the dotnet runtime), then the C# code will behave in a similar way.

(Although we've found that in .NET 9 it actually just crashes regardless of what the C# code does, because the runtime JIT now divides by zero on purpose.)

Therefore, if there is documentation about this, it might be worth mentioning that it also impacts C# assemblies that are loaded into a non-C# process that has the control word set.

@grospelliergilles (Contributor)

Thank you @koyote for reporting this. I think the runtime should not divide by zero as you showed in

weight_t const blockRelResidual = change / oldWeight;

because in C++ dividing by zero may be undefined behavior (see the C++ standard), and in the current context it is easy to prevent. I will open a PR to remove that division by zero. I hope it will be accepted.

@tannergooding (Member)

@grospelliergilles Division by zero is well defined for C and C++ under the IEC 60559 (IEEE 754) annex, which defines the required behavior for compilers implementing IEEE 754-compliant floating-point arithmetic.

@grospelliergilles (Contributor)

Yes, I agree with you @tannergooding. This is why I said it 'may' be undefined behavior.

That being said, it is easy to remove the division by zero by adding a small value to the divisor (change / (oldWeight + 1.0e-16), for example). It is also possible to rewrite the test to avoid the division. I may be wrong, but looking at the algorithm it seems that if oldWeight is zero then relResidual will always be infinite and the algorithm will never converge. Detecting this would speed up the function because you could exit the loop earlier.

If you agree, I can try to do a PR to remove this division.

@tannergooding (Member)

> This is why I said it 'may' be undefined behavior.

It is strictly never undefined behavior for .NET, because we require the underlying C++ compiler to meet various baseline requirements, such as defining __STDC_IEC_559__, ensuring that int8_t, int16_t, int32_t, and int64_t exist, etc. We correspondingly have tests validating these behaviors, and they will fail if the requirements are not met. -- Notably, newer versions of the C language spec also allow an implementation to define __STDC_IEC_60559_BFP__, which more accurately describes what .NET actually requires/expects given its implementation.

> That being said, it is easy to remove the division by zero by adding a small value to the divisor (change / (oldWeight + 1.0e-16), for example). It is also possible to rewrite the test to avoid the division. I may be wrong, but looking at the algorithm it seems that if oldWeight is zero then relResidual will always be infinite and the algorithm will never converge. Detecting this would speed up the function because you could exit the loop earlier.

This can lead to misleading results as well as other undesirable deviations. As per the surrounding comments, the code already explicitly accounts for infinities and is already time-boxed to ensure well-defined behavior.

> If you agree, I can try to do a PR to remove this division.

The code is already correct. .NET explicitly does not support IEEE 754 floating-point exceptions or modifying the control word, as has been covered by many different issues.

This is also covered under the ECMA-335 runtime spec which, notably, needs to be taken in coordination with https://github.com/dotnet/runtime/blob/main/docs/design/specs/Ecma-335-Augments.md. The sections covering IEC 60559 (IEEE 754) are a little bit outdated at this point, still referring to the 1989 spec and not having been updated to account for the 2008 or 2019 specifications and improvements that have been made to maintain compliance. It would likely be worthwhile for us to update the augments doc with changes relevant to 1.12.1.3 Handling of floating-point data types to account for said changes, such as requiring correct handling of denormal values, more explicitly calling out that having a modified floating-point environment while any part of the managed runtime is executing is undefined behavior, etc.
 
I do not think we should be trying to add changes that allow such unsupported and undefined behavior to potentially work in additional scenarios; it just hurts developers more in the long run than failing fast.

CC @jkotas

@grospelliergilles (Contributor)

Thank you for your response.
I understand your arguments.
My proposal was only to make .NET run in HPC environments, because it is a wonderful runtime, far better than the Python one in many respects (speed, multi-threading, static typing). To explain my usage a bit more: in HPC environments, we often enable trapping of FPEs to find bugs and stop execution at the source of the error.
But I understand this is not an important use case for you.

@tannergooding (Member)

> To explain my usage a bit more: in HPC environments, we often enable trapping of FPEs to find bugs and stop execution at the source of the error.

Enabling IEEE 754 floating-point exceptions can come with a general overhead cost to execution. There is then additional overhead incurred by toggling the floating-point environment. This overhead is typically far greater than inserting explicit checks at key points in your algorithm and is often considered quite orthogonal to achieving high performance. Correspondingly, newer CPUs have provided ways to control things like rounding behavior without requiring modification of the floating-point environment at all, which results in significant throughput increases. Use of SIMD often ignores such exceptions, GPUs often do not provide support for such exceptions at all, and the same tends to apply to distributed computing and other environments that are typical for HPC applications (which applies to all languages, even languages like C/C++ where enabling such features "may" be supported).

-- Notably, some CPUs and situations do handle floating-point exceptions without a significant performance penalty, but this is not universally applicable and is often microarchitecture- and even instruction- or input-specific. Correspondingly, the official optimization manuals from the various hardware vendors for both Arm64 and x64 tend to recommend that code avoid floating-point exceptions where possible. Some exception types, such as denormalized operands, arithmetic underflow, and arithmetic overflow, are well known to not be efficiently handled and to frequently be the cause of performance degradation. Such degradation notably applies when the exceptions are enabled; handling of these cases when the exceptions are disabled is often explicitly documented to be efficient.

> But I understand this is not an important use case for you.

Performance itself is something that .NET cares about and focuses on, but that happens where it is relevant and with consideration of how it applies to real-world workloads, especially as they apply at scale.

IEEE 754 floating-point exceptions are typically not part of any scenario where achieving high performance or determinism is desirable, and the broader industry optimization recommendations back this up.

@grospelliergilles (Contributor)

Thanks for taking the time to respond to me.

> To explain my usage a bit more: in HPC environments, we often enable trapping of FPEs to find bugs and stop execution at the source of the error.

> Enabling IEEE 754 floating-point exceptions can come with a general overhead cost to execution. There is then additional overhead incurred by toggling the floating-point environment. This overhead is typically far greater than inserting explicit checks at key points in your algorithm and is often considered quite orthogonal to achieving high performance.

I have been working in HPC for many years, and on our supercomputers (Adastra, Topaze) there is zero overhead to enabling floating-point exceptions, because when they occur we stop the execution (we only trap division by zero and invalid values). We do daily runs with several thousand CPU cores without any performance loss from this option. We have done many comparisons with FPE enabled and disabled on several architectures and did not see a performance difference in our case.

> Correspondingly, newer CPUs have provided ways to control things like rounding behavior without requiring modification of the floating-point environment at all, which results in significant throughput increases. Use of SIMD often ignores such exceptions, GPUs often do not provide support for such exceptions at all, and the same tends to apply to distributed computing and other environments that are typical for HPC applications (which applies to all languages, even languages like C/C++ where enabling such features "may" be supported).

> -- Notably, some CPUs and situations do handle floating-point exceptions without a significant performance penalty, but this is not universally applicable and is often microarchitecture- and even instruction- or input-specific. Correspondingly, the official optimization manuals from the various hardware vendors for both Arm64 and x64 tend to recommend that code avoid floating-point exceptions where possible. Some exception types, such as denormalized operands, arithmetic underflow, and arithmetic overflow, are well known to not be efficiently handled and to frequently be the cause of performance degradation. Such degradation notably applies when the exceptions are enabled; handling of these cases when the exceptions are disabled is often explicitly documented to be efficient.

I agree with you for some kinds of FPE, like denormalized values or underflow, but we do not enable exceptions for these cases. When they occur there is some loss of precision, but people are working to improve the numerical schemes to handle that. We trap division by zero and invalid values (for example, the square root of a negative number) because once Inf or NaN appears in a computation it propagates, and it is very difficult to trace back to the source of the error. We use a lot of external libraries such as linear solvers (Hypre, PETSc) or partitioning tools (like ParMETIS), and it is not always possible to selectively enable or disable FPE trapping when calling these libraries. Our simulations run for several hours, and when a numerical problem occurs in one of these libraries or in our code (often after many hours) we do not have many ways to find it. FPE trapping is one of them. Without it we would have to launch the simulation again (which is costly with several thousand cores) and make sure the run produces exactly the same result. And in these cases we cannot use a debugger.

For your information, in our simulations we use C# as a way for users to customize some parts of the code. It is a secure environment and easier to use than C++ for physicists. Because we use multi-threading, we cannot do that in Python without degrading performance (because of the GIL).

With .NET we tried to find a way to disable these FPEs before executing some parts of the runtime, but we did not find one. Maybe if there were a way to be notified when the JIT or GC begins and finishes its work, we would be able to disable FPEs at that moment and re-enable them afterwards?

> But I understand this is not an important use case for you.

> Performance itself is something that .NET cares about and focuses on, but that happens where it is relevant and with consideration of how it applies to real-world workloads, especially as they apply at scale.

> IEEE 754 floating-point exceptions are typically not part of any scenario where achieving high performance or determinism is desirable, and the broader industry optimization recommendations back this up.

@tannergooding (Member)

> I have been working in HPC for many years, and on our supercomputers (Adastra, Topaze) there is zero overhead to enabling floating-point exceptions, because when they occur we stop the execution (we only trap division by zero and invalid values).

The consideration here depends on the CPU in question. While many CPUs (particularly modern x64 and Arm64) have zero performance overhead for some floating-point exceptions being enabled, not all do. You are likely operating on a CPU set where these two exception types are handled without a performance penalty, but migrating to another microarchitecture could incur a cost even if the exceptions are never thrown.

The performance impact of denormalized values is another case that is CPU dependent. Some CPU architectures (particularly modern x64 and Arm64 chips) have zero overhead for handling such values for +, -, *, and /, while the impact on other intrinsic functions such as sqrt varies more greatly across hardware.

> We use a lot of external libraries such as linear solvers (Hypre, PETSc) or partitioning tools (like ParMETIS), and it is not always possible to selectively enable or disable FPE trapping when calling these libraries.

Native code is free to change the floating-point environment itself, but it must restore said environment prior to returning control to the managed runtime. This includes the P/Invoke returning normally or some reverse P/Invoke being invoked (such as via a delegate, function pointer, or callback; including to UnmanagedCallersOnly-annotated functions).

The floating-point environment is often per core (or logical CPU thread) and is typically tracked per OS thread as part of the CONTEXT save/restore structure, such that two different OS threads won't cause incorrect behavior.

Saving/restoring this state (or even just the "floating-point control word") as part of every P/Invoke transition would be expensive and have a negative impact on the ecosystem for typical workloads, particularly given most never touch the fpenv themselves.
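
To make the shape of that contract concrete, here is a minimal C# sketch (the library name, export, and callback type are hypothetical, not from this issue). The native side owns any FPE toggling and must have restored the original environment both before invoking the callback and before returning:

using System.Runtime.InteropServices;

static class NativeSolver
{
    [UnmanagedFunctionPointer(CallingConvention.Cdecl)]
    public delegate void ProgressCallback(int iteration);

    // Hypothetical native export. Its implementation may unmask floating-point
    // exceptions internally, but per the contract above it must restore the
    // original floating-point environment:
    //   1. before invoking onProgress (a reverse P/Invoke into managed code), and
    //   2. before returning control to the managed caller.
    [DllImport("mysolver", CallingConvention = CallingConvention.Cdecl)]
    public static extern double SolveWithFpeTrapping(double input, ProgressCallback onProgress);
}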

> Maybe if there were a way to be notified when the JIT or GC begins and finishes its work, we would be able to disable FPEs at that moment and re-enable them afterwards?

I don't believe we have any such APIs, even in the profiling APIs (@jkotas ?).

@grospelliergilles (Contributor)

> I have been working in HPC for many years, and on our supercomputers (Adastra, Topaze) there is zero overhead to enabling floating-point exceptions, because when they occur we stop the execution (we only trap division by zero and invalid values).

> The consideration here depends on the CPU in question. While many CPUs (particularly modern x64 and Arm64) have zero performance overhead for some floating-point exceptions being enabled, not all do. You are likely operating on a CPU set where these two exception types are handled without a performance penalty, but migrating to another microarchitecture could incur a cost even if the exceptions are never thrown.

I agree with what you said. Fortunately, HPC processors are only x64 and Arm64, which is why we do not see a performance impact from enabling FPE in our case.

> The performance impact of denormalized values is another case that is CPU dependent. Some CPU architectures (particularly modern x64 and Arm64 chips) have zero overhead for handling such values for +, -, *, and /, while the impact on other intrinsic functions such as sqrt varies more greatly across hardware.

I still agree with you. In our simulations we do not have denormalized values because we are using flush to zero.

> We use a lot of external libraries such as linear solvers (Hypre, PETSc) or partitioning tools (like ParMETIS), and it is not always possible to selectively enable or disable FPE trapping when calling these libraries.

> Native code is free to change the floating-point environment itself, but it must restore said environment prior to returning control to the managed runtime. This includes the P/Invoke returning normally or some reverse P/Invoke being invoked (such as via a delegate, function pointer, or callback; including to UnmanagedCallersOnly-annotated functions).
>
> The floating-point environment is often per core (or logical CPU thread) and is typically tracked per OS thread as part of the CONTEXT save/restore structure, such that two different OS threads won't cause incorrect behavior.
>
> Saving/restoring this state (or even just the "floating-point control word") as part of every P/Invoke transition would be expensive and have a negative impact on the ecosystem for typical workloads, particularly given most never touch the fpenv themselves.

Yes, we tried changing the floating-point environment around P/Invoke calls when we could control when the .NET runtime would be called. But as you said, this has a very negative impact on performance. And there are still some parts of the runtime for which we cannot do anything (the JIT or GC, for example).

> Maybe if there were a way to be notified when the JIT or GC begins and finishes its work, we would be able to disable FPEs at that moment and re-enable them afterwards?

> I don't believe we have any such APIs, even in the profiling APIs (@jkotas ?).

The .NET runtime and environment are wonderful, high-performance tools. If there were an API to do that, it would be great!

@tannergooding (Member)

> In our simulations we do not have denormalized values because we are using flush to zero.

Worth noting that flush to zero is similarly part of the floating-point environment, and changing it is unsupported for managed code. You will break APIs like T.IsSubnormal, T.Epsilon, and various other APIs where denormals are required for correctness.

If you're modifying it for native, it also must be restored to the original state (i.e. denormals are preserved) prior to managed code or the runtime itself executing. -- This applies to all aspects of the floating-point environment, including but not limited to FTZ/DAZ, rounding direction, and exception masking.
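
As a small illustration of that breakage (a sketch assuming the default floating-point environment; not code from this thread):

// Under the default environment, double.Epsilon is the smallest positive
// subnormal value, so these checks hold.
Console.WriteLine(double.IsSubnormal(double.Epsilon));     // True
Console.WriteLine(double.IsSubnormal(double.Epsilon * 2)); // True: still subnormal
Console.WriteLine(double.Epsilon / 2);                     // 0: rounds below the smallest subnormal

// With FTZ/DAZ enabled in hardware, arithmetic that produces subnormal
// results flushes them to zero instead, so the answers above change.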

And there are still some parts of the runtime for which we can not do anything (jit or GC for example).

The JIT/GC do not just spontaneously run on existing threads. They either have their own threads (in which case the normal context save/restore should be preserving the existing floating-point environment) or they are only invoked by control returning to managed (such as a P/Invoke returning or a reverse P/Invoke being executed by native).

Provided the floating-point environment is mutated only on the native side and is restored prior to any managed code executing (which includes prior to things like callbacks or reverse P/Invokes), then it should be safe and the JIT/GC and general managed code will never see the exception masking as enabled.

@grospelliergilles (Contributor)

> In our simulations we do not have denormalized values because we are using flush to zero.

> Worth noting that flush to zero is similarly part of the floating-point environment, and changing it is unsupported for managed code. You will break APIs like T.IsSubnormal, T.Epsilon, and various other APIs where denormals are required for correctness.
>
> If you're modifying it for native, it also must be restored to the original state (i.e. denormals are preserved) prior to managed code or the runtime itself executing. -- This applies to all aspects of the floating-point environment, including but not limited to FTZ/DAZ, rounding direction, and exception masking.

Thanks for the information. We are only using FTZ in some parts of our code and restore the environment after that, so it should be safe with respect to .NET.

> And there are still some parts of the runtime for which we cannot do anything (the JIT or GC, for example).

> The JIT/GC do not just spontaneously run on existing threads. They either have their own threads (in which case the normal context save/restore should be preserving the existing floating-point environment) or they are only invoked by control returning to managed (such as a P/Invoke returning or a reverse P/Invoke being executed by native).

Is there a way to be notified (via the profiling API, for example) of the creation of threads for the GC?

> Provided the floating-point environment is mutated only on the native side and is restored prior to any managed code executing (which includes prior to things like callbacks or reverse P/Invokes), then it should be safe and the JIT/GC and general managed code will never see the exception masking as enabled.

Thanks for the suggestion. I am not sure it is possible in our case. We are using SWIG to wrap C++ for C#. A C++ method may call a C# method, which can itself call a C++ method (which may also call a C# method, and so on). I will look into it, and into whether there is a performance penalty associated with this.
