[AutoBump] Merge with 9997e039 (5) #252

mgehre-amd · 2024-08-13T10:01:51Z

No description provided.

Overloads already exist for signless and signed integers, so that the width does not need to be specified as an argument. This adds the same for integers without checking for signedness.

…NDVB instruction names Matches the SSE variants (which has a 0 qualifier to indicate the xmm0 explicit dependency)

…tage When we copied the IceLake model from the SkylakeServer model we missed this diff Confirmed with uops.info and Agner

This includes capturing symbols for global variables, functions, classes, and templated definitions. As predetermining what symbols are generated from C++ declarations can be non-trivial, InstallAPI only parses select declarations for symbol generation when parsing C++. For example, InstallAPI only looks at explicit template instantiations or full template specializations, instead of general function or class templates, for symbol emission.

Testcase shows miscompile when dropping disjoint flag from disjoint or during vectorization.

This allows sharing the LLVM version number in libc++.

Recently building libc++ requires building libunwind too. This updates the LLDB instructions. I noticed this recently and it was separately filed as llvm#84053

llvm#84154) If the `m_editor_status` is `EditorStatus::Editing`, PrintAsync clears the currently edited line. In some situations, the edited line is not saved. After the stream flushes, PrintAsync tries to display the unsaved line, causing the loss of the edited line. The issue arose while I was debugging REPRLRun in [Fuzzilli](https://github.com/googleprojectzero/fuzzilli). I started LLDB and attempted to set a breakpoint in libreprl-posix.c. I entered `breakpoint set -f lib` and used the "tab" key for command completion. After completion, the edited line was flushed, leaving a blank line.

Introduced by llvm#83441.

Walter Erquinigo added optional instruction annotations for x86 instructions in 2022 for the `thread trace dump instruction` command, and code to DisassemblerLLVMC to add annotations for instructions that change flow control, v. https://reviews.llvm.org/D128477

This was added as an option to `disassemble`, and the trace dump command enables it by default, but several other instruction dumpers were changed to display them by default as well. These are only implemented for Intel instructions, so our disassembly on other targets ends up looking like

```
(lldb) x/5i 0x1000086e4
0x1000086e4: 0xa9be6ffc unknown stp x28, x27, [sp, #-0x20]!
0x1000086e8: 0xa9017bfd unknown stp x29, x30, [sp, #0x10]
0x1000086ec: 0x910043fd unknown add x29, sp, #0x10
0x1000086f0: 0xd11843ff unknown sub sp, sp, #0x610
0x1000086f4: 0x910c63e8 unknown add x8, sp, #0x318
```

instead of `disassemble`'s output style of

```
lldb`main:
lldb[0x1000086e4] <+0>: stp x28, x27, [sp, #-0x20]!
lldb[0x1000086e8] <+4>: stp x29, x30, [sp, #0x10]
lldb[0x1000086ec] <+8>: add x29, sp, #0x10
lldb[0x1000086f0] <+12>: sub sp, sp, #0x610
lldb[0x1000086f4] <+16>: add x8, sp, #0x318
```

Adding symbolic annotations for assembly instructions is something I'm interested in too, because we may have users investigating a crash or apparent-incorrect behavior who must debug optimized assembly and they may not be familiar with the ISA they're using, so short of flipping through a many-thousand-page PDF to understand each instruction, they're lost. They don't write assembly or work at that level, but to understand a bug, they have to understand what the instructions are actually doing.

But the annotations that exist today don't move us forward much on that front - I'd argue that the flow control instructions on Intel are not hard to understand from their names, but that might just be my personal bias. Much trickier instructions exist in any event.

Displaying this information by default for all targets when we only have one class of instructions on one target is not a good default.

Also, in 2011 when Greg implemented the `memory read -f i` (aka `x/i`) command

```
commit 5009f9d
Author: Greg Clayton <[email protected]>
Date: Thu Oct 27 17:55:14 2011 +0000

[...]
eFormatInstruction will print out disassembly with bytes and it will use the current target's architecture. The format character for this is "i" (which used to be being used for the integer format, but the integer format also has "d", so we gave the "i" format to disassembly), the long format is "instruction".
```

he had DumpDataExtractor's DumpInstructions print the bytes of the instruction -- that's the first field we see above for the `x/5i` after the address -- and this is only useful for people who are debugging the disassembler itself, I would argue. I don't want this displayed by default either.

tl;dr this patch removes both fields from `memory read -f -i` and I think this is the right call today. While I'm really interested in instruction annotation, I don't think `x/i` is the right place to have it enabled by default unless it's really compelling on at least some of our major targets.

This is a bit awkward.

This testcase was added to show miscompile in llvm#81872

This pull request fixes llvm#72116 by introducing a new flag for compatibility with GCC 14: the functionality of -Wreturn-type is modified to split some of its behaviors into -Wreturn-mismatch. Fixes llvm#72116

G_INSERT and G_EXTRACT are not sufficient to represent both INSERT/EXTRACT on a subregister and INSERT/EXTRACT on a vector. We would like to be able to INSERT/EXTRACT on vectors in cases where INSERT/EXTRACT on vector subregisters is not sufficient, so we add these opcodes. I tried to do a patch where we treated G_EXTRACT as both G_EXTRACT_SUBVECTOR and G_EXTRACT_SUBREG, but ran into an infinite loop at this [point](https://github.com/llvm/llvm-project/blob/8b5b294ec2cf876bc5eb5bd5fcb56ef487e36d60/llvm/lib/Target/RISCV/RISCVISelLowering.cpp#L9932) in the SDAG equivalent code.

) NFC. Test coverage on VOPC shows NotHasTrue16BitInsts on the pre-gfx11 instructions is necessary (we cannot use the default NoTrue16Predicate). Update the VOP2 instructions in the same manner.

Some comparison intrinsics were described as returning the "result" without specifying how. The "cmp" intrinsics return zero or all 1's in the corresponding elements of a returned vector; the "com" intrinsics return an integer 0 or 1. Also removed some redundant information.
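The difference between the two result shapes can be modeled in a few lines. This is only an illustrative Python model, not the intrinsics themselves: the function names `cmp_eq_mask` and `com_eq_flag` are hypothetical, 32-bit lanes are assumed, NaN handling is ignored, and the assumption that the "com" form compares only the lowest element follows scalar compares such as `_mm_comieq_ss`.

```python
ALL_ONES_32 = 0xFFFFFFFF  # assumed 32-bit lanes


def cmp_eq_mask(a, b):
    """'cmp'-style result: one mask per element, all 1's if equal, else zero."""
    return [ALL_ONES_32 if x == y else 0 for x, y in zip(a, b)]


def com_eq_flag(a, b):
    """'com'-style result: a single integer 0 or 1.

    Assumption: only the lowest element of each operand is compared.
    """
    return 1 if a[0] == b[0] else 0
```

For `a = [1.0, 2.0, 3.0, 4.0]` and `b = [1.0, 0.0, 3.0, 5.0]`, the first function yields a vector of masks with the first and third lanes set, while the second yields the plain integer 1.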

llvm#84789)

…84717) This is for cases where we simply want to rename from ps.Mnemonic, but ps.Mnemonic itself is not supported as an alias.

…llvm#84382) Account for the descriptor containing a zero-length string. Also, avoid iterating backwards too far. This was detected by address sanitizer.
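The two conditions mentioned here (a zero-length string, and not scanning backwards past the start) can be sketched as a single guarded loop. This is a Python illustration of the idea only, not the flang runtime code; the function name and the `bytes` input are assumptions made for the example.

```python
def length_without_trailing_spaces(buf: bytes) -> int:
    """Return the length of buf once trailing ASCII spaces are dropped."""
    n = len(buf)
    # The n > 0 guard stops the backwards scan at the start of the buffer,
    # which also covers the zero-length case without reading buf[-1].
    while n > 0 and buf[n - 1:n] == b" ":
        n -= 1
    return n
```

An empty input and an all-spaces input both return 0, because the guard is checked before any element is read.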

…lvm#84813) Reverts llvm#82899 Per the discussion on the PR, this needs more design and justification.

…84257) delete other static_assert

When pretty printing an Objective-C interface declaration, Clang previously didn't print any attributes that are applied to the declaration.

llvm#84583) symbol This patch puts the default breakpoint on the sanitizers_address_on_report symbol, and uses the old symbol as a backup if the default case is not found rdar://123911522

With llvm#83471 it reduces UBSAN overhead from 44% to 6%. Measured as "Geomean difference" on "test-suite/MultiSource/Benchmarks" with PGO build. On real large server binary we see 95% of code is still instrumented, with 10% -> 1.5% UBSAN overhead improvements. We can pass this test only with subset of UBSAN, so base overhead is smaller. We have followup patches to improve it even further.

Summary: The current behavior of HIP is that when --offload-device-only is set it still bundles the outputs into a fat binary. Even though this is different from how all the other targets handle this, it seems to be depended on by some tooling so just make it backwards compatible for the `-fno-gpu-rdc` case.

The foreign TU list immediately follows the local TU list and they both use the same index, so that if there are N local TU entries, the index for the first foreign TU is N. Changed so that the size of local TU is accounted for when setting foreign TU index.
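The index arithmetic described above is small enough to state directly. The following Python sketch uses hypothetical names and only illustrates the offset; it is not the code from the patch.

```python
def foreign_tu_index(local_tu_count: int, foreign_position: int) -> int:
    """Index of a foreign TU in the shared local+foreign index space.

    local_tu_count is N, the number of local TU entries; foreign_position
    is the 0-based position within the foreign TU list.
    """
    if local_tu_count < 0 or foreign_position < 0:
        raise ValueError("counts and positions must be non-negative")
    # Foreign TUs come right after the N local TUs, so offset by N.
    return local_tu_count + foreign_position
```

With three local TUs, the first foreign TU gets index 3 and the second gets index 4.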

Problem description: llvm#81008 (comment) Solution: llvm#81008 (comment) (choose plan2)

This header guard is wrong and conflicts with the one from Transport.h

@danakj

…m#78876) Previously, `__bounded_iter` only checked `operator*`. It allowed the pointer to go out of bounds with `operator++`, etc., and relied on `operator*` (which checked `begin <= current < end`) to handle everything. This has several unfortunate consequences:

First, pointer arithmetic is UB if it goes out of bounds. So by the time `operator*` checks, it may be too late and the optimizer may have done something bad. Checking both operations is safer.

Second, `std::copy` and friends currently bypass bounded iterator checks. I think the only hope we have to fix this is to key on `iter + n` doing a check. See llvm#78771 for further discussion. Note this PR is not sufficient to fix this. It adds the output bounds check, but ends up doing it after the `memmove`, which is too late.

Finally, doing these checks is actually *more* optimizable. See llvm#78829, which is fixed by this PR. Keeping the iterator always in bounds means `operator*` can rely on some invariants and only needs to check `current != end`. This aligns better with common iterator patterns, which use `!=` instead of `<`, so it's easier to delete checks with local reasoning.

See https://godbolt.org/z/vEWrWEf8h for how this new `__bounded_iter` impacts compiler output. The old `__bounded_iter` injected checks inside the loops for all the `sum()` functions, which not only added a check inside a loop, but also impeded Clang's vectorization. The new `__bounded_iter` allows all the checks to be optimized out and we emit the same code as if it wasn't here.

Not everything is ideal however. `add_and_deref` ends up emitting two comparisons now instead of one. This is because of a missed optimization in Clang. I've filed llvm#78875 for that. I suspect (with no data) that this PR is still a net performance win because impeding ranged-for loops is particularly egregious. But ideally we'd fix the optimizer and make `add_and_deref` fine too.

There's also something funny going on with `std::ranges::find` which I have not figured out yet, but I suspect there are some further missed optimization opportunities. Fixes llvm#78829. (CC @danakj)
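The invariant described in this message (check on every move so the position always stays within `[begin, end]`, then only compare against `end` on dereference) can be modeled outside of C++. The class below is a hypothetical Python model of that invariant, not libc++'s `__bounded_iter`; the names `BoundedCursor`, `advance`, and `deref` are made up for the sketch.

```python
class BoundedCursor:
    """Cursor over a sequence that is never allowed to leave [0, len(data)]."""

    def __init__(self, data, pos=0):
        if not 0 <= pos <= len(data):
            raise IndexError("initial position out of bounds")
        self._data = data
        self._pos = pos

    def advance(self, n=1):
        """Move by n, checking before the position is updated."""
        new_pos = self._pos + n
        if not 0 <= new_pos <= len(self._data):
            raise IndexError("cursor moved out of bounds")
        self._pos = new_pos
        return self

    def deref(self):
        """Read the current element.

        Because advance() keeps the position in range, the only invalid
        position left to rule out here is the one-past-the-end position.
        """
        if self._pos == len(self._data):
            raise IndexError("dereferenced end cursor")
        return self._data[self._pos]
```

Advancing to the end position is allowed, as with a C++ end iterator; advancing past it or dereferencing it raises `IndexError`, and a failed `advance` leaves the cursor where it was.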

Factor complex unary operations into their own function.

These tests were accidentally missed in llvm#83863

The DIV32/64 throughput has improved since Goldmont in the Atom architecture. Alder Lake-E shows similar numbers too. So we shouldn't add such tunings to Gracemont and later products. Checked against Agner Fog's tables and uops.info.

…vm#84873)

…ing G_BUILD_VECTOR (llvm#84452) It is safe to ignore undef values when selecting G_BUILD_VECTOR as undef values choose random registers for copying values from.

This only happens in C as far as I can tell. The complex variable will have undergone a conversion to bool in C++ before reaching the unary operator.

For non polymorphic entities, semantics knows the type size and rewrites sizeof to `"cst element size" * size(x)`. Lowering has to deal with the polymorphic case where the type size must be retrieved from the descriptor (note that the lowering implementation would work with any entity, polymorphic or not, it is just not used for the non polymorphic cases).

atom.add.noftz.f16 is supported since SM 7.0

)

This patch adds the thread ID to the subprocess memory shared memory names. This avoids conflicts for downstream consumers that might want to consume llvm-exegesis across multiple threads, which would otherwise run into conflicts due to the same PID running multiple instances.
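Why the thread ID matters can be seen from how a name keyed only on the PID collides. The sketch below is a Python illustration with an invented name layout; the actual format used by llvm-exegesis is not given here and may differ.

```python
def shared_memory_name(pid: int, tid: int, value_id: int) -> str:
    """Build a shared-memory name that is unique per (process, thread, value).

    The layout is hypothetical and chosen only for illustration.
    """
    return f"/exegesis-{pid}-t{tid}-memdef{value_id}"


def pid_only_name(pid: int, value_id: int) -> str:
    """Name keyed only on the PID, shown to illustrate the conflict."""
    return f"/exegesis-{pid}-memdef{value_id}"
```

Two threads in the same process asking for value 0 get the same string from `pid_only_name`, but different strings from `shared_memory_name`.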

…vm#84451)" This reverts commit 6bbe8a2. This breaks building LLVM on macOS, failing with llvm/tools/llvm-exegesis/lib/SubprocessMemory.cpp:146:33: error: out-of-line definition of 'setupAuxiliaryMemoryInSubprocess' does not match any declaration in 'llvm::exegesis::SubprocessMemory' Expected<int> SubprocessMemory::setupAuxiliaryMemoryInSubprocess(

…her (llvm#84339) At the moment, getUnderlyingObjects simply continues for phis that do not refer to the same underlying object in loops, without adding them to the list of underlying objects, effectively ignoring those phis. Instead of ignoring those phis, add them to the list of underlying objects. This fixes a miscompile where LoopAccessAnalysis fails to identify a memory dependence, because no underlying objects can be found for a set of memory accesses. Fixes llvm#82665. PR: llvm#84339

A mold argument needs to be added to the hlfir.element_addr and set in lowering so that when the hlfir.element_addr needs to be turned into an hlfir.elemental operation because the designator must be turned into a value, the mold can be set on the hlfir.elemental to later allocate the temporary according to the dynamic type. This situation happens whenever the vector subscripted polymorphic designator does not appear as an assignment left-hand side, or as an IO-input item.

I initially thought retrieving the mold would be tricky if the dynamic type of the designator was set by a part-ref on the right of the vector subscripts ("array(vector)%polymorphic_comp"), but this turned out to be impossible because:

1. A derived type component can be polymorphic only if it has the POINTER or ALLOCATABLE attribute (F2023 C708).
2. Vector-subscripted parts are ranked and F2023 C919 prohibits any part-ref on the right of the ranked part from having the POINTER or ALLOCATABLE attribute.

=> If a vector subscripted designator is polymorphic, the vector subscripted part is the rightmost part, and the mold is the base of the vector subscripted part. This makes the retrieval of the mold easy in lowering.

The mold argument is always set to be the base of the vector subscripted part when lowering the vector subscripted part, and it is removed at the end of the designator lowering if the designator is not polymorphic. This way there is no need to find back the mold from the inside of the hlfir.element_addr body.

COMPILER_RT_HAS_AUXV is now used in builtins, so the test needs to be in builtin-config-ix.cmake too.

This is a one line fix for a Windows specific (I believe) build break. The build failure looks like this: `D:\a\_work\1\s\lldb\source\Symbol\Symtab.cpp(128): error C2440: '<function-style-cast>': cannot convert from 'lldb_private::ConstString' to 'llvm::StringRef' D:\a\_work\1\s\lldb\source\Symbol\Symtab.cpp(128): note: 'llvm::StringRef::StringRef': ambiguous call to overloaded function D:\a\_work\1\s\llvm\include\llvm/ADT/StringRef.h(840): note: could be 'llvm::StringRef::StringRef(llvm::StringRef &&)' D:\a\_work\1\s\llvm\include\llvm/ADT/StringRef.h(104): note: or 'llvm::StringRef::StringRef(std::string_view)' D:\a\_work\1\s\lldb\source\Symbol\Symtab.cpp(128): note: while trying to match the argument list '(lldb_private::ConstString)' D:\a\_work\1\s\lldb\source\Symbol\Symtab.cpp(128): error C2672: 'std::multimap<llvm::StringRef,const lldb_private::Symbol *,std::less<llvm::StringRef>,std::allocator<std::pair<const llvm::StringRef,const lldb_private::Symbol *>>>::emplace': no matching overloaded function found C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.37.32822\include\map(557): note: could be 'std::_Tree_iterator<std::_Tree_val<std::_Tree_simple_types<std::pair<const llvm::StringRef,const lldb_private::Symbol *>>>> std::multimap<llvm::StringRef,const lldb_private::Symbol *,std::less<llvm::StringRef>,std::allocator<std::pair<const llvm::StringRef,const lldb_private::Symbol *>>>::emplace(_Valty &&...)' ` The StringRef constructor here is intended to take a ConstString object, which I assume is implicitly converted to a std::string_view by compilers other than Visual Studio's. To fix the VS build I made the StringRef initialization more explicit, as you can see in the diff.

Use the new range attribute from llvm#84617 to simplify comparisons where both sides have range information.
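When both operands carry range information, some comparisons can be decided without knowing the exact values. The Python sketch below illustrates this for an unsigned less-than. It assumes half-open, non-wrapping `(lo, hi)` ranges and uses a hypothetical function name; it is not the InstSimplify implementation.

```python
from typing import Optional, Tuple

Range = Tuple[int, int]  # half-open [lo, hi), assumed non-empty and non-wrapping


def fold_unsigned_less_than(lhs: Range, rhs: Range) -> Optional[bool]:
    """Try to decide `x < y` from the ranges of x and y alone.

    Returns True or False when the ranges settle the answer, else None.
    """
    lhs_lo, lhs_hi = lhs
    rhs_lo, rhs_hi = rhs
    if lhs_hi <= lhs_lo or rhs_hi <= rhs_lo:
        raise ValueError("ranges must be non-empty and non-wrapping")
    # Largest possible x is lhs_hi - 1; if that is below every y, x < y holds.
    if lhs_hi - 1 < rhs_lo:
        return True
    # Smallest possible x is lhs_lo; if that is >= every y, x < y never holds.
    if lhs_lo >= rhs_hi - 1:
        return False
    return None
```

Disjoint ranges fold to a constant in either direction, while overlapping ranges return `None` and leave the comparison in place.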

The TranspBlocks set was used to cache aliasing decisions for all processed loads in the parent loop. This is incorrect, because each load can access a different location, which means one load not being modified in a block doesn't translate to another load not being modified in the same block. All loads access the same underlying object, so we could perhaps use a location without size for all loads and retain the cache, but that would mean we lose precision. For now, just drop the cache. Fixes llvm#84807 PR: llvm#84835

Add the ability to set the number of tablegen jobs that can run in parallel similar to the LLVM_PARALLEL_[COMPILE|LINK]_JOBS options that already exist.

…#84739) Have DIBuilder conditionally insert either debug intrinsics or DbgRecord depending on the module's IsNewDbgInfoFormat flag. The insertion methods now return a `DbgInstPtr` (a `PointerUnion<Instruction *, DbgRecord *>`). Add a unittest for both modes (I couldn't find an existing test testing insertion behaviours specifically). This patch changes the existing assumption that DbgRecords are only ever inserted if there's an instruction to insert-before because clang currently inserts debug intrinsics while CodeGening (like any other instruction) meaning it'll try inserting to the end of a block without a terminator. We already have machinery in place to maintain the DbgRecords when a terminator is removed - these become "trailing DbgRecords" which are re-attached when a new instruction is inserted. All I've done is allow this state to occur while inserting DbgRecords too, i.e., it's not only removing terminators that causes this valid transient state, but inserting DbgRecords into incomplete blocks too. The C API will be updated in follow up patches. --- Note: this doesn't mean clang is emitting DbgRecords yet, because the modules it creates are still always in the old debug mode. That will come in a future patch.

jayfoad and others added 30 commits March 11, 2024 15:42

[CodeGen] Remove unused MachineRegisterInfo methods

575ca67

[mlir][IR] Add isInteger() (without width) (llvm#84467)

a924da6

Overloads already exist for signless and signed integers, so that the width does not need to be specified as an argument. This adds the same for integers without checking for signedness.

[X86] Add missing register qualifier to the VBLENDVPD/VBLENDVPS/VPBLE…

0858c90

…NDVB instruction names Matches the SSE variants (which has a 0 qualifier to indicate the xmm0 explicit dependency)

[X86] (V)MPSADBW instructions can run on Port1 or Port5 for one uop s…

ad8c828

…tage When we copied the IceLake model from the SkylakeServer model we missed this diff Confirmed with uops.info and Agner

Precommit testcase for pr81872 (llvm#84782)

34acdb3

Testcase shows miscompile when dropping disjoint flag from disjoint or during vectorization.

[X86] Add AVX512 (x86-64-v4) coverage to generic shift combines tests

7dc4d5f

[X86] Add base SSE2 coverage to SRL/SRA combines tests

6cd68c2

[cmake] Exposes LLVM version number in the runtimes. (llvm#84641)

81e2047

This allows sharing the LLVM version number in libc++.

[LLDB][doc] Updates build instructions. (llvm#84630)

9a9aa41

Recently building libc++ requires building libunwind too. This updates the LLDB instructions. I noticed this recently and it was separately filed as llvm#84053

[libc] Fix forward arm32 builtbot (llvm#84794)

07d7b9c

Introduced by llvm#83441.

[bazel] Grab correct version info after 81e2047

36a2752

This is a bit awkward.

[LV] Address postcommit review for PR84782 (llvm#84797)

866ac9a

This testcase was added to show miscompile in llvm#81872

Add new flag -Wreturn-mismatch (llvm#82872)

8467457

This pull request fixes llvm#72116 by introducing a new flag for compatibility with GCC 14: the functionality of -Wreturn-type is modified to split some of its behaviors into -Wreturn-mismatch. Fixes llvm#72116

[AMDGPU][True16] Make NotHasTrue16BitInsts a True16Predicate (llvm#84771

2a3f27c

) NFC. Test coverage on VOPC shows NotHasTrue16BitInsts on the pre-gfx11 instructions is necessary (we cannot use the default NoTrue16Predicate). Update the VOP2 instructions in the same manner.

[AMDGPU] Add missing tests for GFX10 (t)buffer format d16 instructions (

2126046

llvm#84789)

AMDGPU: Add an argument to DS_Real_gfx12 to disable alias, NFC (llvm#…

23be732

…84717) This is for cases where we simply want to rename from ps.Mnemonic, but ps.Mnemonic itself is not supported as an alias.

[flang][unittests] Fix buffer underrun in LengthWithoutTrailingSpaces (…

5b4c350

…llvm#84382) Account for the descriptor containing a zero-length string. Also, avoid iterating backwards too far. This was detected by address sanitizer.

Revert "[CMake][LIT] Add option to run lit testsuites in parallel" (l…

8846b91

…lvm#84813) Reverts llvm#82899 Per the discussion on the PR, this needs more design and justification.

[NFC] [scudo] move static_assert closer to class it relates to (llvm#…

b4e0890

…84257) delete other static_assert

[Clang][AST] Print attributes of Obj-C interfaces

a8eb2f0

When pretty printing an Objective-C interface declaration, Clang previously didn't print any attributes that are applied to the declaration.

[NFC] [scudo] Move static_assert to class it concerns (llvm#84245)

337a200

[LLDB] ASanLibsanitizers Use sanitizers_address_on_report breakpoint (

08a9207

llvm#84583) symbol This patch puts the default breakpoint on the sanitizers_address_on_report symbol, and uses the old symbol as a backup if the default case is not found rdar://123911522

lifengxiang1025 and others added 25 commits March 12, 2024 11:00

[MemProf] Match function's summary and definition strictly (llvm#83665)

e40cabf

Problem description: llvm#81008 (comment) Solution: llvm#81008 (comment) (choose plan2)

[MLIR][LSP][NFC] Fix a header guard (llvm#84862)

e4a5467

This header guard is wrong and conflicts with the one from Transport.h

[clang][Interp] Implement _Complex negation

d02d8df

Factor complex unary operations into their own function.

[X86][test] Add missing enc/dec tests for CTEST

71590e7

These tests were accidentally missed in llvm#83863

[flang] Fixed compiler build on glibc 2.17 systems after 3149c93. (ll…

f95710c

…vm#84873)

[AArch64][GlobalISel] Avoid generating inserts for undefs when select…

1d900e2

…ing G_BUILD_VECTOR (llvm#84452) It is safe to ignore undef values when selecting G_BUILD_VECTOR as undef values choose random registers for copying values from.

[clang][Interp] Implement _Complex Not unary operators

1dd104d

This only happens in C as far as I can tell. The complex variable will have undergone a conversion to bool in C++ before reaching the unary operator.

[clang][Interp] Implement more easy _Complex unary operators

103469b

[NVPTX] Add support for atomic add for f16 type (llvm#84295)

8e0f4b9

atom.add.noftz.f16 is supported since SM 7.0

[AMDGPU] Add missing GFX10 buffer format d16 hi instructions (llvm#84809

36dece0

)

[AArch64] Fix COMPILER_RT_HAS_AUXV for builtins. (llvm#84816)

9d16e79

COMPILER_RT_HAS_AUXV is now used in builtins, so the test needs to be in builtin-config-ix.cmake too.

[InstSimpliy] Use range attribute to simplify comparisons (llvm#84627)

a3b5250

Use the new range attribute from llvm#84617 to simplify comparisons where both sides have range information.

[CMake] Add tablegen job pool support (llvm#84762)

9228859

Add the ability to set the number of tablegen jobs that can run in parallel similar to the LLVM_PARALLEL_[COMPILE|LINK]_JOBS options that already exist.

Update test past bdbad0d (llvm#84889)

ce1fd92

[AutoBump] Merge with 9997e03

10013b0

cferry-AMD approved these changes Aug 13, 2024

Base automatically changed from bump_to_818af71b to feature/fused-ops August 15, 2024 07:01

An error occurred while trying to automatically change base from bump_to_818af71b to feature/fused-ops August 15, 2024 07:01

mgehre-amd merged commit 6b89ba9 into feature/fused-ops Aug 15, 2024
10 checks passed

mgehre-amd deleted the bump_to_9997e039 branch August 15, 2024 07:01
