
What to do about fast-math? #19743

Open
bjacob opened this issue Jan 20, 2025 · 1 comment
Labels
codegen/llvm LLVM code generation compiler backend codegen Shared code generation infrastructure and dialects performance ⚡ Performance/optimization related work across the compiler and runtime

Comments

bjacob (Contributor) commented Jan 20, 2025

This is joint research with @kuhar.

Overview

At the moment, there is a disconnect in "fast math" semantics between what we do in MLIR rewrites, and what we allow LLVM to do after we have lowered to LLVM:

  • In MLIR rewrites, we perform many "fast math"-like transformations as custom rewrites. For instance, reassociations ((x+y)+z -> x+(y+z)).
    • Reassociations are necessary to implement something like a matrix multiplication efficiently, at multiple levels. At the instruction level, if we are targeting matrix-multiplication instructions, using them is in itself a reassociation. At the workgroup-distribution level, whenever we split a reduction dimension, that is also a reassociation.
  • The LLVM IR that we bottom out on does not have fast-math flags.

That inconsistency is problematic because it guarantees a suboptimal trade-off between performance and exactness: if a transformation is acceptable in MLIR, then we are leaving performance on the table by not also enabling it in LLVM.

The problems with LLVM's fast-math semantics

LLVM issue 1: fast-math introduces UB

LLVM's language reference says that fast-math flags are treated as assumptions and that when violated, the result can be poison. For example, this is the documentation on the ninf flag:

No Infs - Allow optimizations to assume the arguments and result are not +/-Inf. If an argument is +/-Inf, or the result would be +/-Inf, it produces a poison value instead.

Reading this, we started wondering if we could put together a crazy demo of UB conditioned on -ffast-math, and sure enough... in this Compiler Explorer experiment, we have the following C++ program:

#include <stdio.h>

bool foo(float f) {
    float inf = 1.0f / 0.0f;
    return (f + inf) != f;
}

int main() {
    if (foo(0.0f)) {
        printf("Hello!\n");
    }
}

void some_function_we_never_call() {
    printf("How'd you get there?!\n");
}

Here is what it prints across {Clang,GCC} x {default, fast-math}:

                 Clang                   GCC
-O2              Hello!                  Hello!
-O2 -ffast-math  How'd you get there?!   Hello!

Thoughts about LLVM issue 1

  • While it is customary in compiler design to think in terms of assumptions and to treat violations as UB, fast-math flags are not a good fit for that model. The basic problem is that infinity and NaN values are normal, even inevitable, occurrences. A compiler that treats them as violations isn't usable. Users of -ffast-math want infinities to be neglected in a different sense, one that doesn't make it UB when they actually arise.
  • Until this issue is resolved, in order to rely on LLVM fast-math semantics, we would have to be careful to avoid the affected fast-math flags. Unfortunately, there are 2 problems with that:
    1. Some of the affected flags like ninf and nnan are necessary for the optimizations that we care about, such as reassociations and optimizing away divisions.
    2. The "issue 2" below prevents us from using fine-grained flags in the near term.

LLVM issue 2: fine-grained fast-math flags are halfway implemented.

LLVM fast-math flags consist of fine-grained flags such as nnan, ninf, reassoc controlling individual aspects of fast-math semantics, and a fast flag that "is a shorthand for specifying all fast-math flags at once".

In itself, that's a good thing, as that allows us to cherry-pick aspects of fast-math.

In practice, the only thing that really works as expected (ignoring the above issue 1) is wholesale fast. The fine-grained flags seem like a good idea with an unfinished implementation.

The following Compiler Explorer experiment shows which specific flags are relevant to enabling specific rewrites. For example, looking at the first two functions, a_plus_b_minus_a_1 and a_plus_b_minus_a_2:

define dso_local noundef float @a_plus_b_minus_a_1(float noundef %0, float noundef %1) local_unnamed_addr #0 {
  %3 = fadd reassoc nsz float %0, %1
  %4 = fsub reassoc nsz float %3, %0
  ret float %4
}

define dso_local noundef float @a_plus_b_minus_a_2(float noundef %0, float noundef %1) local_unnamed_addr #0 {
  %3 = fadd reassoc float %0, %1
  %4 = fsub reassoc float %3, %0
  ret float %4
}

Here is the result of opt:

define dso_local noundef float @a_plus_b_minus_a_1(float noundef %0, float noundef returned %1) local_unnamed_addr #0 {
  ret float %1
}

define dso_local noundef float @a_plus_b_minus_a_2(float noundef %0, float noundef %1) local_unnamed_addr #0 {
  %3 = fadd reassoc float %0, %1
  %4 = fsub reassoc float %3, %0
  ret float %4
}

This shows that for the rewrite of a + b - a into b, LLVM requires two flags: reassoc and nsz. Just reassoc alone isn't enough, as seen on a_plus_b_minus_a_2. That means the semantics of reassoc alone aren't enough to neglect the difference between positive and negative zero. That's fine in itself, but to be consistent, the rewrite should then also require the ninf and nnan flags, which are supposed to be how we allow neglecting infinities and NaNs respectively. The reassociation does lose both some infinities and some NaNs: for instance, if x is a large positive finite value such that x + x == +inf, then x + x - x is also +inf, but the rewrite results in x instead of +inf. So the bug here is that the rewrite was performed on a_plus_b_minus_a_1 even though it did not have the ninf and nnan flags.

Thoughts about LLVM issue 2

  • If we did use LLVM fast-math (assuming Issue 1 could somehow be averted), for the time being we would use wholesale fast, not the fine-grained flags, which Issue 2 shows to have halfway-implemented semantics.
  • This means that we can't use solutions to Issue 1 of the form "just use this specific set of fine-grained flags that avoid the UB issues". We need UB to be rooted out of fast-math semantics altogether.

What we can do for now in IREE or in MLIR

  • In IREE:
    • We can just continue doing the rewrites we need on structured values (vectors, tensors, etc.) in MLIR.
    • We can introduce late MLIR arith dialect rewrite patterns to perform fast-math-style rewrites before going to LLVM IR.
    • We can introduce a non-default flag to enable LLVM fast-math semantics for experimentation. It may yield a useful speedup in some cases. We just won't be able to make it default until the above LLVM issues are resolved.
  • In MLIR:
    • We could introduce fast-math flags in MLIR dialects such as arith/math/vector, and design them to avoid the issues with the LLVM flags.
@bjacob bjacob added codegen Shared code generation infrastructure and dialects codegen/llvm LLVM code generation compiler backend performance ⚡ Performance/optimization related work across the compiler and runtime labels Jan 20, 2025
kuhar (Member) commented Jan 20, 2025

cc: @dcaballe @chelini

I think the last bullet point might interest you. In short, I think we might benefit from a set of MLIR fast math flags that is not just a mirror of those in LLVM IR, and instead decouples rewrite legality from assumptions.
