Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bump to 9d66dcaf172c (May 11, TOSA reshape fix) (43) #302

Merged
merged 916 commits into from
Aug 27, 2024

Conversation

mgehre-amd
Copy link
Collaborator

@mgehre-amd mgehre-amd commented Aug 23, 2024

Required by Xilinx/torch-mlir#259

topperc and others added 30 commits May 8, 2024 16:20
…lvm#90989)

The lowering produces fir.dummy_scope operation if the current
function has dummy arguments. Each hlfir.declare generated
for a dummy argument is then using the result of fir.dummy_scope
as its dummy_scope operand. This is only done for HLFIR.

I was not able to find a reliable way to identify dummy symbols
in `genDeclareSymbol`, so I added a set of registered dummy symbols
that is alive during the variables instantiation for the current
function. The set is initialized during the mapping of the dummy
argument symbols to their MLIR values. It is reset right after
all variables are instantiated - this is done to avoid generating
hlfir.declare operations with dummy_scope for the clones of
the dummy symbols (e.g. this happens with OpenMP privatization).

If this can be done in a cleaner way, please advise.
This adds the instrumentation lowering pass.

(Tracking Issue: llvm#89287, RFC referenced there)
…SCVISAInfo::parseArchString. NFC (llvm#91538)

We can use a SmallVector<StringRef>.

Adjust the code so we check for empty strings in the loop instead of
making a copy of the vector returned from StringRef::split.

This overlaps with llvm#91532 which also removed the std::vector, but
that PR may be more controversial.
It is inaccurate and needs to be corrected.
A compiler can generate a redundant indirection for a jump via a fixed
jump table target. Add a test case that covers such pattern that covers
PIC case. We already have non-PIC case detection.

Currently XFAIL.
…pression. (llvm#90918)

Fixes llvm#90498.

Same as 5337efc for atomic builtins, but for `std::atomic` this
time. This is useful because even though the actual builtin atomic is
still there, it may be buried beyond the inlining depth limit.

Also add one popular custom smart pointer class name to the name-based
heuristics, which isn't necessary to fix the bug but arguably a good
idea regardless.
…4132)

Under some circumstance (library loaded with the main program), TLS
initial-exec model can be applied to local-dynamic access(es). We
could use some simple heuristic to decide the update at function level:
* If there is equal or less than a number of TLS local-dynamic access(es)
in the function, use TLS initial-exec model. (the threshold which default to
1 is controlled by hidden option)
`Count` and `Skip` should use `uint64_t` as they are encoded/decoded
using 64-bit ULEB128.

In `*_OPCODE_DO_*_ULEB_TIMES_SKIPPING_ULEB`, `Skip` could be encoded as
a two's complement for moving `SegmentOffset` backwards. Having a 32-bit
`Skip` truncates the encoded value and leads to a malformed
`AdvanceAmount`
and invalid `SegmentOffset` that extends past valid sections.
These tests did not test what they were supposed to. The transform
fails to actually handle the commuted cases.
These used both lower and upper case variants of the same name,
resulting in malformed check lines when regenerated.
…88846)

String pool merging currently, for a reason that's not entirely clear to
me, tries to create GEP instructions instead of GEP constant expressions
when replacing constant references. It only uses constant expressions in
cases where this is required. However, it does not catch all cases where
such a requirement exists. For example, the landingpad catch clause has
to be a constant.

Fix this by always using the constant expression variant, which also
makes the implementation simpler.

Additionally, there are some edge cases where even replacement with a
constant GEP is not legal. The one I am aware of is the
llvm.eh.typeid.for intrinsic, so add a special case to forbid
replacements for it.

Fixes llvm#88844.
…#91546)

We failed to use BSETI when bit 31 was set and a few bits above bit 31
were set. We also failed to use multiple BSETI when the low 32 bits were
zero.

I've removed the special cases for constants 0x80000000-0xffffffff and
wrote a more generic algorithm for BSETI.

I've rewritten the BCLRI handling to be similar to the new BSETI
algorithm. This picks up cases where bit 31 is 0 and only a few high
bits are 0.
…o pass constructor.

It is already initialized in RISCVTargetMachine.cpp
…andled Arguments in __kmpc_fork_call_if (llvm#82221)

Root cause: Segmentation fault is caused by null pointer dereference
inside the __kmpc_fork_call_if function at
https://github.com/llvm/llvm-project/blob/main/openmp/runtime/src/z_Linux_asm.S#L1186
. __kmpc_fork_call_if is missing case to handle argc=0 .

Fix: Added a check inside the __kmp_invoke_microtask function to handle
the case when argc is 0.

---------

Co-authored-by: Singh <[email protected]>
llvm#91517)

Fixes bug llvm#90769. Many thanks
to @Meinersbur for providing the initial thought and solution to this.
Test all commuted variants of the pattern, most of which currently
fail to fold.
This pattern only handled commutation in the "or", while all
involved operations are commutative. Make sure we handle all
sixteen patterns.
…lvm#91189)

The implementation is straight-forward, but comes with a big disclaimer.
See llvm#91186 for details.
…lvm#91321)

A leaf function may not store the link register to stack, but we it can
still end up being a non-zero frame if it gets interrupted by a signal.
Currently, we were unable to unwind past this function because we could
not read the link register value.

To make this work, this patch:
- changes the function-entry unwind plan to include the `fp|lr = <same>`
rules. This in turn necessitated an adjustment in the generic
instruction emulation logic to ensure that `lr=[sp-X]` can override the
`<same>` rule.
- allows the `<same>` rule for pc and lr in all
`m_all_registers_available` frames (and not just frame zero).

The test verifies that we can unwind in a situation like this, and that
the backtrace matches the one we computed before getting a signal.
…lvm#91560)

I'm planning to remove StringRef::equals in favor of
StringRef::operator==.

- StringRef::operator==/!= outnumber StringRef::equals by a factor of
  10 under mlir/ in terms of their usage.

- The elimination of StringRef::equals brings StringRef closer to
  std::string_view, which has operator== but not equals.

- S == "foo" is more readable than S.equals("foo"), especially for
  !Long.Expression.equals("str") vs Long.Expression != "str".
…#89026)

Generally, IR and assembly test files benefit from being cleaned to
remove unnecessary details. However, for tests requiring elaborate
IR or assembly files where cleanup is less practical (e.g., large amount
of debug information output from Clang), the current practice is to
include the C/C++ source file and the generation instructions as
comments.

This is inconvenient when regeneration is needed. This patch adds
`llvm/utils/update_test_body.py` to allow easier regeneration.

`ld.lld --debug-names` tests (llvm#86508) utilize this script for
Clang-generated assembly tests.

Note: `-o pipefail` is standard (since
https://www.austingroupbugs.net/view.php?id=789) but not supported by
dash.

Link:
https://discourse.llvm.org/t/utility-to-generate-elaborated-assembly-ir-tests/78408
This PR address issue llvm#89002.

#### Changes in this PR

* Added a simple implementation of `cpp::lock_guard` (an equivalent of
`std::lock_guard`) in libc/src/__support/CPP inspired by the libstdc++
implementation
* Added tests for `cpp::lock_guard` in
/libc/test/src/__support/CPP/mutex_test.cpp
* Replaced all references to `MutexLock` with `cpp::lock_guard`

---------

Co-authored-by: Guillaume Chatelet <[email protected]>
The following program produces a diagnostic in Clang and EDG, but
compiles correctly in GCC and MSVC:
```cpp
#include <vector>

consteval std::vector<int> fn() { return {1,2,3}; }
constexpr int a = fn()[1];
```

Clang's diagnostic is as follows:
```cpp
<source>:6:19: error: call to consteval function 'fn' is not a constant expression
    6 | constexpr int a = fn()[1];
      |                   ^
<source>:6:19: note: pointer to subobject of heap-allocated object is not a constant expression
/opt/compiler-explorer/gcc-snapshot/lib/gcc/x86_64-linux-gnu/14.0.1/../../../../include/c++/14.0.1/bits/allocator.h:193:31: note: heap allocation performed here
  193 |             return static_cast<_Tp*>(::operator new(__n));
      |                                      ^
1 error generated.
Compiler returned: 1
```

Based on my understanding of
[`[dcl.constexpr]/6`](https://eel.is/c++draft/dcl.constexpr#6):
> In any constexpr variable declaration, the full-expression of the
initialization shall be a constant expression

It seems to me that GCC and MSVC are correct: the initializer `fn()[1]`
does not evaluate to an lvalue referencing a heap-allocated value within
the `vector` returned by `fn()`; it evaluates to an lvalue-to-rvalue
conversion _from_ that heap-allocated value.

This PR turns out to be a bug fix on the implementation of
[P2564R3](https://wg21.link/p2564r3); as such, it only applies to C++23
and later. The core problem is that the definition of a
constant-initialized variable
([`[expr.const/2]`](https://eel.is/c++draft/expr.const#2)) is contingent
on whether the initializer can be evaluated as a constant expression:

> A variable or temporary object o is _constant-initialized_ if [...]
the full-expression of its initialization is a constant expression when
interpreted as a _constant-expression_, [...]

That can't be known until we've finished parsing the initializer, by
which time we've already added immediate invocations and consteval
references to the current expression evaluation context. This will have
the effect of evaluating said invocations as full expressions when the
context is popped, even if they're subexpressions of a larger constant
expression initializer. If, however, the variable _is_
constant-initialized, then its initializer is [manifestly
constant-evaluated](https://eel.is/c++draft/expr.const#20):

> An expression or conversion is _manifestly constant-evaluated_ if it
is [...] **the initializer of a variable that is usable in constant
expressions or has constant initialization** [...]

which in turn means that any subexpressions naming an immediate function
are in an [immediate function
context](https://eel.is/c++draft/expr.const#16):

> An expression or conversion is in an immediate function context if it
is potentially evaluated and either [...] it is a **subexpression of a
manifestly constant-evaluated expression** or conversion

and therefore _are not to be considered [immediate
invocations](https://eel.is/c++draft/expr.const#16) or
[immediate-escalating
expressions](https://eel.is/c++draft/expr.const#17) in the first place_:

> An invocation is an _immediate invocation_ if it is a
potentially-evaluated explicit or implicit invocation of an immediate
function and **is not in an immediate function context**.

> An expression or conversion is _immediate-escalating_ if **it is not
initially in an immediate function context** and [...]


The approach that I'm therefore proposing is:
1. Create a new expression evaluation context for _every_ variable
initializer (rather than only nonlocal ones).
2. Attach initializers to `VarDecl`s _prior_ to popping the expression
evaluation context / scope / etc. This sequences the determination of
whether the initializer is in an immediate function context _before_ any
contained immediate invocations are evaluated.
3. When popping an expression evaluation context, elide all evaluations
of constant invocations, and all checks for consteval references, if the
context is an immediate function context. Note that if it could be
ascertained that this was an immediate function context at parse-time,
we [would never have
registered](https://github.com/llvm/llvm-project/blob/760910ddb918d77e7632be1678f69909384d69ae/clang/lib/Sema/SemaExpr.cpp#L17799)
these immediate invocations or consteval references in the first place.

Most of the test changes previously made for this PR are now reverted
and passing as-is. The only test updates needed are now as follows:
- A few diagnostics in `consteval-cxx2a.cpp` are updated to reflect that
it is the `consteval tester::tester` constructor, not the more narrow
`make_name` function call, which fails to be evaluated as a constant
expression.
- The reclassification of `warn_impcast_integer_precision_constant` as a
compile-time diagnostic adds a (somewhat duplicative) warning when
attempting to define an enum constant using a narrowing conversion. It
also, however, retains the existing diagnostics which @erichkeane
(rightly) objected to being lost from an earlier revision of this PR.

---------

Co-authored-by: cor3ntin <[email protected]>
RKSimon and others added 24 commits May 10, 2024 22:40
Initializes the MachineFunctionInfo for unittest introduced in llvm#88257

Additionally, also rephrases the test condition considering the abort/exit does not occur within emitKernel but is generally triggered through the Ctx.hadError() call.
We can extract each extension as we process them without much
complexity.
…lvm#91679)

libfuzzer's -jobs option will, depending on the number of CPUs, spin up
a
WorkerThread and end up printing the log file using CopyFileToErr.
This leads to an MSan false positive. This patch disables the MSan
interceptor checks,
similarly to other instances in https://reviews.llvm.org/D48891

Side-note: this false positive issue first appeared when printf()
was replaced by puts() (90b4d1b).
The interceptor check was always present; however, MSan does not
check_printf by default.
A few tests were checking for all driver flags.
We should revert this commit when we revert llvm#91811
…lvm#91782)

`vp.reduce.fmaximum/fminimum` are the VP version of
`vector.reduce.fmaximum/fminimum`.
…t instead of `APInt *`

The `APInt *` version is pretty useless as any case one needs an
`APInt *` out, they could just replace whatever they have the
`m_Checked...` lambda with direct checks on the `APInt`.

Leaving other helpers such as `m_Negative`, `m_Power2`,
etc... unchanged as the `APInt` out version is used mostly for
convenience and rarely change functionality when converted output a
`Constant *`.

Closes llvm#91377
The uses have been removed by commit 2b5cd8b
Remove FormatToken::isSimpleTypeSpecifier() and call
Token::isSimpleTypeSpecifier(LangOpts) instead.
There are 100+ references.
Use a wrapper similar to a623a4c
…sion expand from macro when ``IgnoreMacro`` option is enabled. (llvm#91757)

Fixes: llvm#91487
Align BAT YAML to fdata profile.

Test Plan: updated register-fragments-bolt-symbols.s

Reviewers: dcci, rafaelauler, ayermolo, maksfb

Reviewed By: dcci

Pull Request: llvm#91773
Align with DataReader::readProfile that sets entry block counts from
FuncBranchData->EntryData.

Test Plan: updated bolt-address-translation-yaml.test
[P1787R6](https://wg21.link/p1787r6):
> [CWG1820](https://cplusplus.github.io/CWG/issues/1820.html) is
resolved by requiring that a qualified declarator-id declare an entity.

P1787R6 changes wording of [dcl.pre]/9. Quote from the current draft
([[dcl.pre]/5](https://eel.is/c++draft/dcl.pre#5)):
> If a
[declarator-id](https://eel.is/c++draft/dcl.decl.general#nt:declarator-id)
is a name, the
[init-declarator](https://eel.is/c++draft/dcl.decl.general#nt:init-declarator)
and (hence) the declaration introduce that
name[.](https://eel.is/c++draft/dcl.pre#5.sentence-1)
> [Note [3](https://eel.is/c++draft/dcl.pre#note-3): Otherwise, the
[declarator-id](https://eel.is/c++draft/dcl.decl.general#nt:declarator-id)
is a
[qualified-id](https://eel.is/c++draft/expr.prim.id.qual#nt:qualified-id)
or names a destructor or its
[unqualified-id](https://eel.is/c++draft/expr.prim.id.unqual#nt:unqualified-id)
is a [template-id](https://eel.is/c++draft/temp.names#nt:template-id)
and no name is
introduced[.](https://eel.is/c++draft/dcl.pre#5.sentence-2)
— end note]
GCC 12 and 13 generate incorrect code for a pattern in the
tosa-to-tensor pass responsible for lowering tosa.reshape. This results
in the tosa.reshape lowering producing IR which fails to verify. I've
narrowed down the set of cmake flags needed to reproduce the issue to
this:

    cmake -G Ninja ../llvm \
      -DLLVM_ENABLE_PROJECTS="mlir" \
      -DLLVM_TARGETS_TO_BUILD=host \
      -DLLVM_ENABLE_PROJECTS=mlir \
      -DCMAKE_BUILD_TYPE="Release" \
      -DCMAKE_CXX_FLAGS_RELEASE="-O2" \
      -DCMAKE_CXX_FLAGS="-O2" \
      -DCMAKE_CXX_COMPILER=g++ \
      -DCMAKE_C_COMPILER=gcc

This is the failing test case:

func.func @fails_in_gcc_12(%arg0: tensor<?xf32>) -> tensor<1x1x1x?xf32>
{
%0 = tosa.reshape %arg0 {new_shape = array<i64: 1, 1, 1, -1>} :
(tensor<?xf32>) -> tensor<1x1x1x?xf32>
      return %0 : tensor<1x1x1x?xf32>
    }

This should lower to a tensor.expand_shape operation like so:

    func.func @foo(%arg0: tensor<?xf32>) -> tensor<1x1x1x?xf32> {
      %c0 = arith.constant 0 : index
      %dim = tensor.dim %arg0, %c0 : tensor<?xf32>
      %c1 = arith.constant 1 : index
%expanded = tensor.expand_shape %arg0 [[0, 1, 2, 3]] output_shape [1, 1,
1, %dim] : tensor<?xf32> into tensor<1x1x1x?xf32>
      return %expanded : tensor<1x1x1x?xf32>
    }

Under GCC 12/13 with the above cmake configuration, the
tensor.expand_shape looks like this

%2 = "tensor.expand_shape"(%arg0) <{reassociation = [[0, 1, 2, 3]],
static_output_shape = array<i64>}> : (tensor<?xf32>) ->
tensor<?x1x1x?xf32>

The key difference is the computed output type of `tensor<?x1x1x?xf32>`
rather than the expected `tensor<1x1x1x?xf32>`. This expand_shape fails
to verify with this error message:

error: 'tensor.expand_shape' op expected number of static shape dims to
be equal to the output rank (4) but found 0 inputs instead

The problematic code is calculating the intermediate shape of the
generated tensor.expand_shape operation in the
expand_shape/collapse_shape sequence that implements tosa.reshape.

    // Compute result shape
    bool resultIsStatic = true;
    auto resultShape = llvm::map_to_vector(newShape, [&](int64_t size) {
      // Omitted

// If we do not know the total size of the tensor, keep this dimension
      // dynamic in the result shape.
      if (!inputIsStatic) {
        resultIsStatic = false;
        return ShapedType::kDynamic;
      }
    });

    if (resultIsStatic) {
      // do something
      return;
    }

    // do something else
    return;

The failure point seems to be the update of the resultIsStatic variable
in the lambda body. The assignment of false is not propagated to the use
in the if-statement, resulting in the branch being taken when it should
not.

I've found several modification to the code that gets around the bug.
The version I settled on is one which makes the logic a little more
obvious.
@mgehre-amd mgehre-amd changed the title Bump to 9d66dcaf172c (May 11, TOSA reshape fix) Bump to 9d66dcaf172c (May 11, TOSA reshape fix) (43) Aug 23, 2024
@mgehre-amd mgehre-amd requested a review from cferry-AMD August 26, 2024 09:09
@mgehre-amd mgehre-amd merged commit d92bf11 into bump_to_82383d5f3fa8 Aug 27, 2024
5 checks passed
@mgehre-amd mgehre-amd deleted the bump_to_9d66dcaf172c branch August 27, 2024 07:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.