What to do about fast-math? #19743
Labels: `codegen/llvm` (LLVM code generation compiler backend), `codegen` (shared code generation infrastructure and dialects), `performance ⚡` (performance/optimization related work across the compiler and runtime)
This is joint research with @kuhar.
Overview
At the moment, there is a disconnect in "fast math" semantics between what we do in MLIR rewrites and what we allow LLVM to do after we have lowered to LLVM: in MLIR we perform rewrites that are only exact on a subset of floating-point values (for example, reassociations such as `(x+y)+z -> x+(y+z)`), but we do not enable the corresponding fast-math flags in the LLVM IR we generate. That inconsistency is problematic because it guarantees a suboptimal trade-off of performance and exactness. If a transformation is acceptable in MLIR, then we are leaving performance on the table by not also enabling it in LLVM.
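As a quick standalone illustration (not part of the original issue) of why such a rewrite needs fast-math-style permission in the first place: floating-point addition is not associative, so reassociation can change results.

```cpp
#include <cstdio>

int main() {
    float x = 1.0e30f, y = -1.0e30f, z = 1.0f;
    float lhs = (x + y) + z;  // (1e30 + -1e30) + 1  ==  1
    float rhs = x + (y + z);  // 1e30 + (-1e30 + 1)  ==  0, since 1 is absorbed by -1e30
    std::printf("%g %g\n", lhs, rhs);  // prints "1 0": the two associations differ
    return 0;
}
```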
The problems with LLVM's fast-math semantics
LLVM issue 1: fast-math introduces UB
LLVM's language reference says that fast-math flags are treated as assumptions and that, when violated, the result can be poison. For example, the documentation on the `ninf` flag says that if an argument is `+/-Inf`, or if the result would be `+/-Inf`, the instruction produces a poison value instead.

Reading this, we started wondering whether we could put together a crazy demo of UB conditioned on `-ffast-math`, and sure enough: in this Compiler Explorer experiment we have a C++ program, together with a comparison of what it prints across {Clang, GCC} × {`-O2`, `-O2 -ffast-math`}.
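The exact program lives in the linked Compiler Explorer experiment; the following is only an illustrative sketch of the kind of program involved (assumed, not copied from the experiment): an `isinf` guard that `-ffast-math` may legally delete, followed by a float-to-int conversion that is UB when the value really is infinite.

```cpp
// Illustrative sketch only (hypothetical; not the program from the linked experiment).
#include <cmath>
#include <cstdio>

int to_index(float x) {
    // Under -ffast-math (-ffinite-math-only), the compiler may assume x is never
    // infinite and fold this guard away.
    if (std::isinf(x)) return -1;
    // If the guard is gone and x really is +inf, this float->int conversion is UB.
    return static_cast<int>(x);
}

int main() {
    volatile float big = 3.0e38f;        // volatile so the overflow happens at run time
    float inf = big * big;               // overflows to +inf
    std::printf("%d\n", to_index(inf));  // -1 without fast-math; unpredictable with it
    return 0;
}
```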
Thoughts about LLVM issue 1

Users of `-ffast-math` generally want infinities to be neglected in a different sense, one that doesn't make it UB if they actually arise. An obvious way out would be to simply not set the `nnan` and `ninf` fast-math flags. Unfortunately, that doesn't work: `ninf` and `nnan` are necessary for the optimizations that we care about, such as reassociations and optimizing away divisions.
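For example, here is a standalone illustration (not from the issue) of a division optimization that needs NaNs to be neglected: `x / x` is `1.0f` for every finite nonzero `x`, but NaN when `x` is zero, infinite, or NaN.

```cpp
// Without fast-math, the division must be kept because of the 0, inf, and NaN cases.
// With -ffast-math (which sets nnan on the generated fdiv), Clang can fold this
// down to a constant 1.0f.
float one(float x) {
    return x / x;
}
```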
LLVM issue 2: fine-grained fast-math flags are halfway implemented
LLVM fast-math flags consist of fine-grained flags such as `nnan`, `ninf`, and `reassoc`, each controlling an individual aspect of fast-math semantics, plus a `fast` flag that "is a shorthand for specifying all fast-math flags at once". In itself, that's a good thing, as it allows us to cherry-pick aspects of fast-math.

In practice, though, the only thing that really works as expected (ignoring the above issue 1) is wholesale `fast`. The fine-grained flags seem like a good idea with an unfinished implementation. The following Compiler Explorer experiment shows which specific flags are relevant to enabling specific rewrites. For example, look at the first two functions, `a_plus_b_minus_a_1` and `a_plus_b_minus_a_2`, and at what `opt` does to them (a reduced sketch follows).
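The exact IR lives in the linked experiment; the following is only a minimal sketch consistent with the discussion below (the function bodies and flag placement are assumed, not copied from the experiment). The first function carries `reassoc nsz`, the second only `reassoc`:

```llvm
; Hypothetical reduction, not the exact IR from the linked experiment.
; Both functions compute (a + b) - a and differ only in fast-math flags.

; With reassoc and nsz, opt (e.g. -passes=instcombine) folds this to `ret float %b`.
define float @a_plus_b_minus_a_1(float %a, float %b) {
  %t = fadd reassoc nsz float %a, %b
  %r = fsub reassoc nsz float %t, %a
  ret float %r
}

; With only reassoc, the fold is not performed.
define float @a_plus_b_minus_a_2(float %a, float %b) {
  %t = fadd reassoc float %a, %b
  %r = fsub reassoc float %t, %a
  ret float %r
}
```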
This shows that for the rewrite of `a + b - a` into `b`, LLVM requires these two flags: `reassoc` and `nsz`. Just `reassoc` alone isn't enough, as seen on `a_plus_b_minus_a_2`. That means that the semantics of `reassoc` alone aren't enough to neglect the difference between positive and negative zero. That's fine in itself, but to be consistent, the rewrite should then also require the `ninf` and `nnan` flags, which are supposed to be how we allow neglecting infinities and NaNs, respectively. The reassociation does lose both some infinities and some NaNs: for instance, if `x` is a large positive finite value such that `x + x == +inf`, then `x + x - x` is also `+inf`, but the rewrite results in `x` instead of `+inf`. So the bug here is that the rewrite was performed on `a_plus_b_minus_a_1` even though it did not have the `ninf` and `nnan` flags. A concrete instance of the lost infinity is sketched below.
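Here is a quick standalone check of that example (not from the issue), using a value large enough that `x + x` overflows to `+inf`:

```cpp
#include <cstdio>

int main() {
    float x = 3.0e38f;            // x + x overflows float's maximum (~3.4e38)
    float exact = (x + x) - x;    // (+inf) - 3e38 == +inf
    float rewritten = x;          // what the reassociated form computes instead
    std::printf("%g vs %g\n", exact, rewritten);  // prints "inf vs 3e+38"
    return 0;
}
```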
Thoughts about LLVM issue 2
In practice, this means that the only LLVM fast-math setting we can rely on today is wholesale `fast`, not the fine-grained flags, which Issue 2 shows to have half-way implemented semantics.

What we can do for now in IREE or in MLIR