Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AutoBump] Merge with f4b9839d (Sep 04) (20) #373

Merged
merged 289 commits into from
Dec 10, 2024

Conversation

mgehre-amd
Copy link
Collaborator

No description provided.

xen0n and others added 30 commits August 31, 2024 07:06
…#106835)

The target name and the message are wrong -- both should say "cuda" for
the filtering to work.

Fixes commit 300e5b9 (llvm#93186).
In llvm#92581 the `LibomptargetUitls.cmake` helpers have been removed, but
only uses of `libomptarget_say` were migrated. Migrate the remaining few
warning and error messages so the `check-offload` target would not fail
due to missing `libomptarget_warning_say`.

While at it, update the `check-offload` unavailability message to say
`check-offload` instead of `check-libomptarget`.

Fixes llvm#92581
…lvm#106634)

Summary:
The `langinfo.h` header is a POSIX extension, so ideally we would be
able to build the C++ library without it. Currently the LLVM C library
doesn't support / provide it. This allows us to build the C++ library
with locales enabled. We can either disable it here, or just provide
stubs that do nothing as in
llvm#106620.
…6632)

Summary:
We currently do not provide a more complicated rune table, so we want
the
default.
This patch adds cost model support for [u|s]cmp.
/llvm-project/llvm/lib/Target/RISCV/RISCVISelLowering.cpp:21558:14: error: unused variable 'ValLMUL' [-Werror,-Wunused-variable]
    unsigned ValLMUL =
             ^
/llvm-project/llvm/lib/Target/RISCV/RISCVISelLowering.cpp:21561:14: error: unused variable 'PartLMUL' [-Werror,-Wunused-variable]
    unsigned PartLMUL =
             ^
2 errors generated.
…mber of elements in operands.

Patch adds basic support for non-power-of-2 number of elements in
operands. The patch still requires that this number addresses whole
registers.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: llvm#106449
First step for support WaveSize attribute in
 https://microsoft.github.io/DirectX-Specs/d3d/HLSL_SM_6_6_WaveSize.html
and
https://microsoft.github.io/hlsl-specs/proposals/0013-wave-size-range.html

A new attribute HLSLWaveSizeAttr was supported in the AST.

Implement both the wave size and the wave size range, rather than
separately which might require more work.

For llvm#70118
HLSL output parameters are denoted with the `inout` and `out` keywords
in the function declaration. When an argument to an output parameter is
constructed a temporary value is constructed for the argument.

For `inout` pamameters the argument is initialized via copy-initialization
from the argument lvalue expression to the parameter type. For `out`
parameters the argument is not initialized before the call.

In both cases on return of the function the temporary value is written
back to the argument lvalue expression through an implicit assignment
binary operator with casting as required.

This change introduces a new HLSLOutArgExpr ast node which represents
the output argument behavior. The OutArgExpr has three defined children:
- An OpaqueValueExpr of the argument lvalue expression.
- An OpaqueValueExpr of the copy-initialized parameter.
- A BinaryOpExpr assigning the first with the value of the second.

Fixes llvm#87526

---------

Co-authored-by: Damyan Pepper <[email protected]>
Co-authored-by: John McCall <[email protected]>
…lvm#106828)

Stop adding liveins for virtual registers. In the livein interface, the
register goes through a MCPhysReg which is uint16_t. This causes the
virtual register bit to be dropped making it alias to some nonsense
physical register.

Recompute the liveins for the continue block to handle any live
registers that are needed by instructions that were spliced from the
original block. This fixing the machine verifier error so we can remove
that fixme now.
llvm#105740)

…limination

ArgumentPromotion and DeadArgumentElimination passes may change function
signature. This makes bpf tracing difficult since users either not aware
of signature change or need to poke into IR or assembly to understand
the function signature change.

This patch enabled to emit some remarks so if recompiling with
-foptimization-record-file=<file>, users can check remarks to see what
kind of signature changes for a particular function. The following are
some examples for implemented remarks:
```
  Pass:            deadargelim
  Name:            ReturnValueRemoved
  DebugLoc:        { File: 'bpf-next/net/mptcp/protocol.c', Line: 572, Column: 0 }
  Function:        mptcp_check_data_fin
  Args:
    - String:          'removing return value '
    - String:          '0'

  Pass:            deadargelim
  Name:            ArgumentRemoved
  DebugLoc:        { File: 'bpf-next/kernel/bpf/syscall.c', Line: 1670, Column: 0 }
  Function:        map_delete_elem
  Args:
      - String:          'eliminating argument '
      - ArgName:         uattr.coerce0
      - String:          '('
      - ArgIndex:        '1'
      - String:          ')'

  Pass:            argpromotion
  Name:            ArgumentPromoted
  DebugLoc:        { File: 'bpf-next/net/mptcp/protocol.h', Line: 570, Column: 0 }
  Function:        mptcp_subflow_ctx
  Args:
    - String:          'promoting argument '
    - ArgName:         sk
    - String:          '('
    - ArgIndex:        '0'
    - String:          ')'
    - String:          ' to pass by value'
```
  [1] llvm#104678
This applies to function template non-call partial ordering the same
provisional wording change applied in the call context: Don't perform
the consistency check on return type and parameters which didn't have
any template parameters deduced from.

Fixes regression introduced in llvm#100692, which was reported on the PR.
…m#101414)

We have been discussing changes to our commit access polices recently
and based on some feedback from clattner here:

https://discourse.llvm.org/t/rfc-new-criteria-for-commit-access/76290/81

We need to update our Developer Policy so that it matches what we are
actually doing in this project. We currently grant commit access to
anyone with a valid justification, not just contributors who have
submitted high-quality patches in the past.

---------

Co-authored-by: Shilei Tian <[email protected]>
This matches the MachineBasicBlock liveins used to populate it.
…vm#105728)

Basic infrastructure to collect Function properties in Metadata Analysis
- Add a `SmallVector` of entry properties to the metadata information.
- Add a structure to represent function properties. Currently
`numthreads` and shader kind properties of shader entry functions are
represented.
…llvm#105510)

This patch replaces all dominated uses of condition with true/false to
improve context-sensitive optimizations. It eliminates a bunch of
branches in llvm-opt-benchmark.

As a side effect, it may introduce new phi nodes in some corner cases.
See the following case:
```
define i1 @test(i1 %cmp, i1 %cond) {
entry:
   br i1 %cond, label %bb1, label %bb2
bb1:
   br i1 %cmp, label %if.then, label %if.else
if.then:
   br %bb2
if.else:
   br %bb2
bb2:
  %res = phi i1 [%cmp, %entry], [%cmp, %if.then], [%cmp, %if.else]
  ret i1 %res
}
```
It will be simplified into:
```
define i1 @test(i1 %cmp, i1 %cond) {
entry:
   br i1 %cond, label %bb1, label %bb2
bb1:
   br i1 %cmp, label %if.then, label %if.else
if.then:
   br %bb2
if.else:
   br %bb2
bb2:
  %res = phi i1 [%cmp, %entry], [true, %if.then], [false, %if.else]
  ret i1 %res
}
```

I am planning to fix this in late pipeline/CGP since this problem exists
before the patch.
Fix the DeclID not being set in global temporaries and use the same
strategy for deciding if a temporary is readable as the current
interpreter.
As far as I can tell, there's no way to call this. There are no calls in
the X86 directory. It has the same name as a function in MCRegisterInfo,
but that function takes a MCRegister and isn't virtual.

The function in MCRegisterInfo uses a DenseMap populated by
`X86_MC::initLLVMToSEHAndCVRegMapping`. The DenseMap is populated for
every physical register using the encoding value. I think that means the
function in MCRegisterInfo would return the same value as the function
in X86RegisterInfo.
…106886)

The LegalizeDAG expansion will go through memory since i16 isn't a legal
type. Avoid this by using FMV nodes.
…n Windows (llvm#106794)

Suppresses the copyright banner for `ml64` compiling BLAKE3 assembly
sources with MSVC and Ninja on Windows:

```
[157/3758] Building ASM_MASM object lib\Support\BLAKE3\CMa...upportBlake3.dir\blake3_avx512_x86-64_windows_msvc.asm.obj
Microsoft (R) Macro Assembler (x64) Version 14.41.34120.0
Copyright (C) Microsoft Corporation.  All rights reserved.

 Assembling: C:\path\to\llvm-project\llvm\lib\Support\BLAKE3\blake3_avx512_x86-64_windows_msvc.asm
```

is now just:

```
 Assembling: C:\path\to\llvm-project\llvm\lib\Support\BLAKE3\blake3_avx512_x86-64_windows_msvc.asm
```

We can suppress that last line with `/quiet` in more recent versions of
`ml64` (from MSVC 2022 17.6) but it is not supported by all potential
MASM compilers.
)

This doesn't seem to have any use other than the possibility of merge
conflicts and accidentally forgetting to update `NUM_PREDEF_DECL_IDS`.
When the shuffle masks are `PoisonMaskElem`, there is not need to check
the cost of `SK_ExtractSubvector`. It is free. Otherwise, it will cause
the compiler to crash.

Assertion `(Idx + EltsPerVector) <= alignTo(NumElts, EltsPerVector) &&
"SK_ExtractSubvector index out of range"' failed.
topperc and others added 25 commits September 3, 2024 16:04
Select only needs branches and moves so we don't need to promote it.
Promoting would canonicalize NaNs which select shouldn't do.
The only instance where we weren't already passing a `StringRef` with a
known length to `Symbol`'s constructor is where the argument is a string
literal. Even in that case, lazy `strlen` calls don't make sense, as the
compiler can constant-evaluate the `StringRef(const char*)` constructor.

For symbols that go into the symbol table we need the length when
calculating the hash anyway. We could get away with not calling
`getName()` for local symbols, but the total contribution of `strlen` to
the run time is already below 1%, so that would just complicate the code
for a negligible benefit.
Add an overload of `InlineFunction` that updates the contextual profile. If there is no contextual profile, this overload is equivalent to the non-contextual profile variant.

Post-inlining, the update mainly consists of:
- making the PGO instrumentation of the callee "the caller's": the owner function (the "name" parameter of the instrumentation instructions) becomes the caller, and new index values are allocated for each of the callee's indices (this happens for both increment and callsite instrumentation instructions)
- in the contextual profile:
   - each context corresponding to the caller has its counters updated to incorporate the counters inherited from the callee at the inlined callsite. Counter values are copied as-is because no scaling is required since the profile is contextual.
   - the contexts of the callee (at the inlined callsite) are moved to the caller.
   - the callee context at the inlined callsite is deleted.
Uses a static lock to ensure multiple threads reporting issues at the
same time don't have printing collisions. This isn't so important now,
but will be with continue mode in the future.
`memory read` will return an error if you try to read more than 1k bytes
in a single command, instructing you to set
`target.max-memory-read-size` or use `--force` if you intended to read
more than that. This is a safeguard for a command where people are being
explicit about how much memory they would like lldb to read (either to
display, or save to a file) and is an annoyance every time you need to
read more than a small amount. If someone confuses the --count argument
with the start address, lldb may begin dumping gigabytes of data but I'd
rather that behavior than requiring everyone to special-case their way
around a common use case.

I don't want to remove the setting because many people have added (much
larger) default max read sizes to their ~/.lldbinit files after hitting
this behavior. Another option would be to stop reading/using the value
in Target.cpp, but I see no harm in leaving the setting if someone
really does prefer to have a small cap on their memory read size.
Despite the stale comments, none of these actually use TTI, and they're
solely generating standard LLVM IR.
…ion) (llvm#107131)

[CWG2486](https://cplusplus.github.io/CWG/issues/2486.html) "Call to
`noexcept` function via `noexcept(false)` pointer/lvalue" allows
`noexcept` functions to be called via `noexcept(false)` pointers or
values. There appears to be no implementation divergence whatsoever:
https://godbolt.org/z/3afTfeEM8. That said, in C++14 and earlier we do
not issue all the diagnostics we issue in C++17 and newer, so I'm
specifying the status of the issue accordingly.
This patch implements sandboxir:: ConstantAggregate, ConstantStruct,
ConstantArray and ConstantVector, mirroring LLVM IR.
…in/Zvfhmin+Zfbfmin/Zfhmin. (llvm#106637)

Previously, if Zfbfmin/Zfhmin were enabled, we only handled
build_vectors that could be turned into splat_vectors. We promoted them
to f32 splats by extending in the scalar domain and narrowing in the
vector domain.

This patch fixes a crash where we failed to account for whether the f32
vector type fit in LMUL<=8.

Because the new lowering occurs after type legalization, we have to be
careful to use XLenVT for the scalar integer type and use custom cast
nodes.
…m#107157)

The `Kind` argument does not need to passed separately.
This patch covers Core issues about language linkage during declaration
matching resolved in
[P1787R6](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1787r6.html),
namely [CWG563](https://cplusplus.github.io/CWG/issues/563.html) and
[CWG1818](https://cplusplus.github.io/CWG/issues/1818.html).

[CWG563](https://cplusplus.github.io/CWG/issues/563.html) "Linkage
specification for objects"
-----------

[P1787R6](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1787r6.html):
> [CWG563](https://cplusplus.github.io/CWG/issues/563.html) is resolved
by simplifications that follow its suggestions.

Wording ([[dcl.link]/5](https://eel.is/c++draft/dcl.link#5)):
> In a
[linkage-specification](https://eel.is/c++draft/dcl.link#nt:linkage-specification),
the specified language linkage applies to the function types of all
function declarators and to all functions and variables whose names have
external linkage[.](https://eel.is/c++draft/dcl.link#5.sentence-5)

Now the wording clearly says that linkage-specification applies to
variables with external linkage.

[CWG1818](https://cplusplus.github.io/CWG/issues/1818.html) "Visibility
and inherited language linkage"
------------

[P1787R6](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1787r6.html):
>
[CWG386](http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#386),
[CWG1839](http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#1839),
[CWG1818](http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#1818),
[CWG2058](http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#2058),
[CWG1900](http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#1900),
and Richard’s observation in [“are non-type names ignored in a
class-head-name or
enum-head-name?”](http://lists.isocpp.org/core/2017/01/1604.php) are
resolved by describing the limited lookup that occurs for a
declarator-id, including the changes in Richard’s [proposed resolution
for
CWG1839](http://wiki.edg.com/pub/Wg21cologne2019/CoreWorkingGroup/cwg1839.html)
(which also resolves CWG1818 and what of CWG2058 was not resolved along
with CWG2059) and rejecting the example from
[CWG1477](http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#1477).

Wording ([[dcl.link]/6](https://eel.is/c++draft/dcl.link#6)):
> A redeclaration of an entity without a linkage specification inherits
the language linkage of the entity and (if applicable) its
type[.](https://eel.is/c++draft/dcl.link#6.sentence-2).

Answer to the question in the example is `extern "C"`, and not linkage
mismatch. Further analysis of the example is provided as inline comments
in the test itself. Note that https://eel.is/c++draft/dcl.link#7 does
NOT apply in this example, as it's focused squarely at declarations that
are already known to have C language linkage, and declarations of
variables in the global scope.
These don't look like they've been used since the original 'use-diet'
branch was merged in 2008 ( f6caff6)
…5614)

In this patch, we implement the `computeCost()` function in
`VPWidenMemoryRecipe`.
…vm#105686)

The lowering happens in post-legalizer lowering if any source registers
from G_BUILD_VECTOR are not constants.

Add pattern pragment setting `scalar_to_vector ($src)` asequivalent to
`vector_insert (undef), ($src), (i61 0)`
…lvm#106094)

Fix `RankedTensorType` equality check in unpack op canonicalization.
…7074)

A macro definition needs its own scope stack in the annotator, so we add
the MacroBodyScopes stack and use ScopeStack to refer to it when in the
macro definition body.

Also, we need to have a scope type for a child block because its parent
line is parsed (and thus the scope type for the braces is popped off the
scope stack) before the lines in the child block are.

Fixes llvm#99271.
…SPIRV (llvm#107110)

This patch add a type check for `tensor.extract` in TensorToSPIRV.
Only convert `tensor.extract` with supported element type. Fix llvm#74466.
Base automatically changed from bump_to_1293ab35 to feature/fused-ops December 9, 2024 18:02
@mgehre-amd mgehre-amd merged commit 379903b into feature/fused-ops Dec 10, 2024
38 checks passed
@mgehre-amd mgehre-amd deleted the bump_to_f4b9839d branch December 10, 2024 21:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.