forked from llvm/llvm-project
[AutoBump] Merge with 9997e039 (5) #252
Merged
Overloads already exist for signless and signed integers, so that the width does not need to be specified as an argument. This adds the same for integers without checking for signedness.
…NDVB instruction names Matches the SSE variants (which have a 0 qualifier to indicate the explicit xmm0 dependency).
…tage When we copied the IceLake model from the SkylakeServer model we missed this diff. Confirmed with uops.info and Agner.
This includes capturing symbols for global variables, functions, classes, and templated definitions. As predetermining which symbols are generated from C++ declarations can be non-trivial, InstallAPI only parses select declarations for symbol generation when parsing C++. For example, InstallAPI only looks at explicit template instantiations or full template specializations, instead of general function or class templates, for symbol emission.
Testcase shows miscompile when dropping disjoint flag from disjoint or during vectorization.
This allows sharing the LLVM version number in libc++.
Recently building libc++ requires building libunwind too. This updates the LLDB instructions. I noticed this recently and it was separately filed as llvm#84053
llvm#84154) If the `m_editor_status` is `EditorStatus::Editing`, PrintAsync clears the currently edited line. In some situations, the edited line is not saved. After the stream flushes, PrintAsync tries to display the unsaved line, causing the loss of the edited line. The issue arose while I was debugging REPRLRun in [Fuzzilli](https://github.com/googleprojectzero/fuzzilli). I started LLDB and attempted to set a breakpoint in libreprl-posix.c. I entered `breakpoint set -f lib` and used the "tab" key for command completion. After completion, the edited line was flushed, leaving a blank line.
Introduced by llvm#83441.
Walter Erquinigo added optional instruction annotations for x86 instructions in 2022 for the `thread trace dump instruction` command, and code to DisassemblerLLVMC to add annotations for instructions that change flow control, v. https://reviews.llvm.org/D128477

This was added as an option to `disassemble`, and the trace dump command enables it by default, but several other instruction dumpers were changed to display them by default as well. These are only implemented for Intel instructions, so our disassembly on other targets ends up looking like

```
(lldb) x/5i 0x1000086e4
0x1000086e4: 0xa9be6ffc unknown stp x28, x27, [sp, #-0x20]!
0x1000086e8: 0xa9017bfd unknown stp x29, x30, [sp, #0x10]
0x1000086ec: 0x910043fd unknown add x29, sp, #0x10
0x1000086f0: 0xd11843ff unknown sub sp, sp, #0x610
0x1000086f4: 0x910c63e8 unknown add x8, sp, #0x318
```

instead of `disassemble`'s output style of

```
lldb`main:
lldb[0x1000086e4] <+0>: stp x28, x27, [sp, #-0x20]!
lldb[0x1000086e8] <+4>: stp x29, x30, [sp, #0x10]
lldb[0x1000086ec] <+8>: add x29, sp, #0x10
lldb[0x1000086f0] <+12>: sub sp, sp, #0x610
lldb[0x1000086f4] <+16>: add x8, sp, #0x318
```

Adding symbolic annotations for assembly instructions is something I'm interested in too, because we may have users investigating a crash or apparent-incorrect behavior who must debug optimized assembly and they may not be familiar with the ISA they're using, so short of flipping through a many-thousand-page PDF to understand each instruction, they're lost. They don't write assembly or work at that level, but to understand a bug, they have to understand what the instructions are actually doing. But the annotations that exist today don't move us forward much on that front - I'd argue that the flow control instructions on Intel are not hard to understand from their names, but that might just be my personal bias. Much trickier instructions exist in any event.
Displaying this information by default for all targets when we only have one class of instructions on one target is not a good default. Also, in 2011 when Greg implemented the `memory read -f i` (aka `x/i`) command

```
commit 5009f9d
Author: Greg Clayton <[email protected]>
Date: Thu Oct 27 17:55:14 2011 +0000

[...]
eFormatInstruction will print out disassembly with bytes and it will use the current target's architecture. The format character for this is "i" (which used to be being used for the integer format, but the integer format also has "d", so we gave the "i" format to disassembly), the long format is "instruction".
```

he had DumpDataExtractor's DumpInstructions print the bytes of the instruction -- that's the first field we see above for the `x/5i` after the address -- and this is only useful for people who are debugging the disassembler itself, I would argue. I don't want this displayed by default either.

tl;dr this patch removes both fields from `memory read -f i` and I think this is the right call today. While I'm really interested in instruction annotation, I don't think `x/i` is the right place to have it enabled by default unless it's really compelling on at least some of our major targets.
This is a bit awkward.
This testcase was added to show miscompile in llvm#81872
This pull request fixes llvm#72116 by introducing a new flag for compatibility with GCC 14: the functionality of -Wreturn-type is modified to split some of its behaviors into -Wreturn-mismatch. Fixes llvm#72116
G_INSERT and G_EXTRACT are not sufficient to represent both INSERT/EXTRACT on a subregister and INSERT/EXTRACT on a vector. We would like to be able to INSERT/EXTRACT on vectors in cases where INSERT/EXTRACT on vector subregisters is not sufficient, so we add these opcodes. I tried to do a patch where we treated G_EXTRACT as both G_EXTRACT_SUBVECTOR and G_EXTRACT_SUBREG, but ran into an infinite loop at this [point](https://github.com/llvm/llvm-project/blob/8b5b294ec2cf876bc5eb5bd5fcb56ef487e36d60/llvm/lib/Target/RISCV/RISCVISelLowering.cpp#L9932) in the SDAG equivalent code.
Some comparison intrinsics were described as returning the "result" without specifying how. The "cmp" intrinsics return zero or all 1's in the corresponding elements of a returned vector; the "com" intrinsics return an integer 0 or 1. Also removed some redundant information.
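The two return conventions described above can be illustrated with a small portable model. This is only a sketch: `cmpeq_model` and `comieq_model` are hypothetical names, not the real intrinsics, and the model assumes four `float` lanes and a scalar "com" form that compares only the lowest element.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Model of a packed "cmp"-style equality compare (hypothetical helper, not a
// real intrinsic): each result lane is all ones when the corresponding input
// lanes are equal, and zero otherwise.
std::array<std::uint32_t, 4> cmpeq_model(const std::array<float, 4> &a,
                                         const std::array<float, 4> &b) {
  std::array<std::uint32_t, 4> mask{};
  for (std::size_t i = 0; i < 4; ++i)
    mask[i] = (a[i] == b[i]) ? 0xFFFFFFFFu : 0u;
  return mask;
}

// Model of a scalar "com"-style equality compare (hypothetical helper):
// assumed to look only at the lowest element and to return an integer 0 or 1.
int comieq_model(const std::array<float, 4> &a,
                 const std::array<float, 4> &b) {
  return a[0] == b[0] ? 1 : 0;
}
```

The vector-mask form is convenient for lane-wise selection, while the integer form can be used directly in a branch condition.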
…84717) This is for cases where we simply want to rename from ps.Mnemonic, but ps.Mnemonic itself is not supported as an alias.
…llvm#84382) Account for the descriptor containing a zero-length string. Also, avoid iterating backwards too far. This was detected by address sanitizer.
…lvm#84813) Reverts llvm#82899 Per the discussion on the PR, this needs more design and justification.
…84257) delete other static_assert
When pretty printing an Objective-C interface declaration, Clang previously didn't print any attributes that are applied to the declaration.
llvm#84583) symbol This patch puts the default breakpoint on the sanitizers_address_on_report symbol, and uses the old symbol as a backup if the default case is not found rdar://123911522
With llvm#83471 it reduces UBSAN overhead from 44% to 6%. Measured as "Geomean difference" on "test-suite/MultiSource/Benchmarks" with a PGO build. On a real large server binary we see 95% of code is still instrumented, with 10% -> 1.5% UBSAN overhead improvements. We can pass this test only with a subset of UBSAN, so base overhead is smaller. We have followup patches to improve it even further.
Summary: The current behavior of HIP is that when --offload-device-only is set it still bundles the outputs into a fat binary. Even though this is different from how all the other targets handle this, it seems to be depended on by some tooling so just make it backwards compatible for the `-fno-gpu-rdc` case.
The foreign TU list immediately follows the local TU list and they both use the same index, so that if there are N local TU entries, the index for the first foreign TU is N. Changed so that the size of local TU is accounted for when setting foreign TU index.
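The index arithmetic described above is small enough to sketch directly. The helper name `foreignTUIndex` is hypothetical and only illustrates the shared index space; it is not the function the patch changes.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: local and foreign type units share one index space,
// with the N local entries first, so the foreign entry at position
// `foreignPos` within the foreign list sits at index N + foreignPos.
constexpr std::size_t foreignTUIndex(std::size_t numLocalTUs,
                                     std::size_t foreignPos) {
  return numLocalTUs + foreignPos;
}
```

With three local TUs, the first foreign TU gets index 3, which is the offset the message says was previously not accounted for.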
Problem description: llvm#81008 (comment) Solution: llvm#81008 (comment) (choose plan2)
This header guard is wrong and conflicts with the one from Transport.h
…m#78876) Previously, `__bounded_iter` only checked `operator*`. It allowed the pointer to go out of bounds with `operator++`, etc., and relied on `operator*` (which checked `begin <= current < end`) to handle everything. This has several unfortunate consequences:

First, pointer arithmetic is UB if it goes out of bounds. So by the time `operator*` checks, it may be too late and the optimizer may have done something bad. Checking both operations is safer.

Second, `std::copy` and friends currently bypass bounded iterator checks. I think the only hope we have to fix this is to key on `iter + n` doing a check. See llvm#78771 for further discussion. Note this PR is not sufficient to fix this. It adds the output bounds check, but ends up doing it after the `memmove`, which is too late.

Finally, doing these checks is actually *more* optimizable. See llvm#78829, which is fixed by this PR. Keeping the iterator always in bounds means `operator*` can rely on some invariants and only needs to check `current != end`. This aligns better with common iterator patterns, which use `!=` instead of `<`, so it's easier to delete checks with local reasoning.

See https://godbolt.org/z/vEWrWEf8h for how this new `__bounded_iter` impacts compiler output. The old `__bounded_iter` injected checks inside the loops for all the `sum()` functions, which not only added a check inside a loop, but also impeded Clang's vectorization. The new `__bounded_iter` allows all the checks to be optimized out and we emit the same code as if it wasn't here.

Not everything is ideal however. `add_and_deref` ends up emitting two comparisons now instead of one. This is because of a missed optimization in Clang. I've filed llvm#78875 for that. I suspect (with no data) that this PR is still a net performance win because impeding ranged-for loops is particularly egregious. But ideally we'd fix the optimizer and make `add_and_deref` fine too.

There's also something funny going on with `std::ranges::find` which I have not yet figured out, but I suspect there are some further missed optimization opportunities. Fixes llvm#78829. (CC @danakj)
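The invariant-based design can be sketched with a simplified iterator. `BoundedIter` and `sum` below are illustrative stand-ins, not libc++'s actual `__bounded_iter`, and plain `assert` is assumed in place of libc++'s hardening macros.

```cpp
#include <cassert>
#include <cstddef>

// Simplified stand-in for a bounds-checked pointer iterator.
// Invariant: begin_ <= current_ <= end_ holds after every operation.
template <class T>
class BoundedIter {
public:
  BoundedIter(T *current, T *begin, T *end)
      : current_(current), begin_(begin), end_(end) {
    assert(begin_ <= current_ && current_ <= end_ && "iterator out of range");
  }

  // Because the invariant always holds, dereference only has to rule out
  // the one-past-the-end position.
  T &operator*() const {
    assert(current_ != end_ && "dereferencing end iterator");
    return *current_;
  }

  // Check before moving, so an out-of-bounds pointer is never formed.
  BoundedIter &operator++() {
    assert(current_ != end_ && "incrementing past end");
    ++current_;
    return *this;
  }

  // Arbitrary advance: the offset must stay within [begin_, end_].
  BoundedIter &operator+=(std::ptrdiff_t n) {
    assert(n >= begin_ - current_ && n <= end_ - current_ &&
           "advance out of range");
    current_ += n;
    return *this;
  }

  friend bool operator==(const BoundedIter &a, const BoundedIter &b) {
    return a.current_ == b.current_;
  }
  friend bool operator!=(const BoundedIter &a, const BoundedIter &b) {
    return !(a == b);
  }

private:
  T *current_;
  T *begin_;
  T *end_;
};

// A typical `!=`-style loop: the loop condition is the same comparison
// that `operator*` and `operator++` assert on.
inline int sum(BoundedIter<const int> first, BoundedIter<const int> last) {
  int total = 0;
  for (; first != last; ++first)
    total += *first;
  return total;
}
```

In `sum`, the `first != last` loop condition matches the `current_ != end_` checks when `last` is the end iterator, which is the kind of local reasoning the message says lets the compiler delete the checks.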
Factor complex unary operations into their own function.
These tests were accidentally missed in llvm#83863
The DIV32/64 throughput was improved since Goldmont in the Atom architecture. Alder Lake-E shows similar numbers too. So we shouldn't add such tunings to Gracemont and later products. Checked against Agner Fog's table and uops.info.
…ing G_BUILD_VECTOR (llvm#84452) It is safe to ignore undef values when selecting G_BUILD_VECTOR as undef values choose random registers for copying values from.
This only happens in C as far as I can tell. The complex variable will have undergone a conversion to bool in C++ before reaching the unary operator.
For non-polymorphic entities, semantics knows the type size and rewrites sizeof to `"cst element size" * size(x)`. Lowering has to deal with the polymorphic case where the type size must be retrieved from the descriptor (note that the lowering implementation would work with any entity, polymorphic or not, it is just not used for the non-polymorphic cases).
atom.add.noftz.f16 is supported since SM 7.0
This patch adds the thread ID to the subprocess shared memory names. This avoids conflicts for downstream consumers that might want to run llvm-exegesis across multiple threads, which would otherwise collide because the same PID is running multiple instances.
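A minimal sketch of why adding the thread ID removes the collision. The name layout below (`/<pid>t<tid>memdef<counter>`) and the helper `sharedMemoryName` are assumptions made for illustration, not the format llvm-exegesis actually uses.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical naming scheme: build a shared memory object name from the
// process ID, the thread ID, and a per-thread counter. Two threads in the
// same process then get distinct names even for the same counter value.
std::string sharedMemoryName(std::uint64_t pid, std::uint64_t tid,
                             unsigned counter) {
  return "/" + std::to_string(pid) + "t" + std::to_string(tid) + "memdef" +
         std::to_string(counter);
}
```

Keying on the PID alone would give both threads the same name for counter 0; including the thread ID keeps them apart.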
…vm#84451)" This reverts commit 6bbe8a2. This breaks building LLVM on macOS, failing with llvm/tools/llvm-exegesis/lib/SubprocessMemory.cpp:146:33: error: out-of-line definition of 'setupAuxiliaryMemoryInSubprocess' does not match any declaration in 'llvm::exegesis::SubprocessMemory' Expected<int> SubprocessMemory::setupAuxiliaryMemoryInSubprocess(
…her (llvm#84339) At the moment, getUnderlyingObjects simply continues for phis that do not refer to the same underlying object in loops, without adding them to the list of underlying objects, effectively ignoring those phis. Instead of ignoring those phis, add them to the list of underlying objects. This fixes a miscompile where LoopAccessAnalysis fails to identify a memory dependence, because no underlying objects can be found for a set of memory accesses. Fixes llvm#82665. PR: llvm#84339
A mold argument needs to be added to the hlfir.element_addr and set in lowering so that when the hlfir.element_addr needs to be turned into an hlfir.elemental operation because the designator must be turned into a value, the mold can be set on the hlfir.elemental to later allocate the temporary according to the dynamic type. This situation happens whenever the vector subscripted polymorphic designator does not appear as an assignment left-hand side, or as an IO-input item.

I initially thought retrieving the mold would be tricky if the dynamic type of the designator was set by a part-ref on the right of the vector subscripts ("array(vector)%polymorphic_comp"), but this turned out to be impossible because:

1. A derived type component can be polymorphic only if it has the POINTER or ALLOCATABLE attribute (F2023 C708).
2. Vector-subscripted parts are ranked, and F2023 C919 prohibits any part-ref on the right of the ranked part from having the POINTER or ALLOCATABLE attribute.

=> If a vector subscripted designator is polymorphic, the vector subscripted part is the rightmost part, and the mold is the base of the vector subscripted part. This makes the retrieval of the mold easy in lowering.

The mold argument is always set to be the base of the vector subscripted part when lowering the vector subscripted part, and it is removed at the end of the designator lowering if the designator is not polymorphic. This way there is no need to find the mold again from inside the hlfir.element_addr body.
COMPILER_RT_HAS_AUXV is now used in builtins, so the test needs to be in builtin-config-ix.cmake too.
This is a one line fix for a Windows specific (I believe) build break. The build failure looks like this: `D:\a\_work\1\s\lldb\source\Symbol\Symtab.cpp(128): error C2440: '<function-style-cast>': cannot convert from 'lldb_private::ConstString' to 'llvm::StringRef' D:\a\_work\1\s\lldb\source\Symbol\Symtab.cpp(128): note: 'llvm::StringRef::StringRef': ambiguous call to overloaded function D:\a\_work\1\s\llvm\include\llvm/ADT/StringRef.h(840): note: could be 'llvm::StringRef::StringRef(llvm::StringRef &&)' D:\a\_work\1\s\llvm\include\llvm/ADT/StringRef.h(104): note: or 'llvm::StringRef::StringRef(std::string_view)' D:\a\_work\1\s\lldb\source\Symbol\Symtab.cpp(128): note: while trying to match the argument list '(lldb_private::ConstString)' D:\a\_work\1\s\lldb\source\Symbol\Symtab.cpp(128): error C2672: 'std::multimap<llvm::StringRef,const lldb_private::Symbol *,std::less<llvm::StringRef>,std::allocator<std::pair<const llvm::StringRef,const lldb_private::Symbol *>>>::emplace': no matching overloaded function found C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.37.32822\include\map(557): note: could be 'std::_Tree_iterator<std::_Tree_val<std::_Tree_simple_types<std::pair<const llvm::StringRef,const lldb_private::Symbol *>>>> std::multimap<llvm::StringRef,const lldb_private::Symbol *,std::less<llvm::StringRef>,std::allocator<std::pair<const llvm::StringRef,const lldb_private::Symbol *>>>::emplace(_Valty &&...)' ` The StringRef constructor here is intended to take a ConstString object, which I assume is implicitly converted to a std::string_view by compilers other than Visual Studio's. To fix the VS build I made the StringRef initialization more explicit, as you can see in the diff.
Use the new range attribute from llvm#84617 to simplify comparisons where both sides have range information.
The TranspBlocks set was used to cache aliasing decisions for all processed loads in the parent loop. This is incorrect, because each load can access a different location, which means one load not being modified in a block doesn't translate to another load not being modified in the same block. All loads access the same underlying object, so we could perhaps use a location without size for all loads and retain the cache, but that would mean we lose precision. For now, just drop the cache. Fixes llvm#84807 PR: llvm#84835
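The problem with a cache keyed only on the block can be shown with a toy model. `Block`, `Load`, `isTransparent`, and `PerBlockCache` are hypothetical stand-ins, and memory locations are modeled as plain strings rather than LLVM's real alias queries.

```cpp
#include <cassert>
#include <set>
#include <string>

// Toy model: a block records which locations it writes, and a load reads
// exactly one location.
struct Block {
  std::set<std::string> writes;
};
struct Load {
  std::string location;
};

// Uncached query: the block is "transparent" for a load if it does not
// write the location that this particular load reads.
bool isTransparent(const Load &load, const Block &block) {
  return block.writes.count(load.location) == 0;
}

// Buggy cache keyed on the block alone: once any load finds the block
// transparent, every later load reuses that answer, even if it reads a
// location the block does write.
struct PerBlockCache {
  std::set<const Block *> transparent;

  bool query(const Load &load, const Block &block) {
    if (transparent.count(&block))
      return true; // stale answer computed for a different load
    bool result = isTransparent(load, block);
    if (result)
      transparent.insert(&block);
    return result;
  }
};
```

A block that writes `q` is transparent for a load of `p` but not for a load of `q`; the per-block cache reports it as transparent for both once the `p` query has been cached, which is the kind of unsound reuse the message describes.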
Add the ability to set the number of tablegen jobs that can run in parallel similar to the LLVM_PARALLEL_[COMPILE|LINK]_JOBS options that already exist.
…#84739) Have DIBuilder conditionally insert either debug intrinsics or DbgRecord depending on the module's IsNewDbgInfoFormat flag. The insertion methods now return a `DbgInstPtr` (a `PointerUnion<Instruction *, DbgRecord *>`). Add a unittest for both modes (I couldn't find an existing test testing insertion behaviours specifically). This patch changes the existing assumption that DbgRecords are only ever inserted if there's an instruction to insert-before because clang currently inserts debug intrinsics while CodeGening (like any other instruction) meaning it'll try inserting to the end of a block without a terminator. We already have machinery in place to maintain the DbgRecords when a terminator is removed - these become "trailing DbgRecords" which are re-attached when a new instruction is inserted. All I've done is allow this state to occur while inserting DbgRecords too, i.e., it's not only removing terminators that causes this valid transient state, but inserting DbgRecords into incomplete blocks too. The C API will be updated in follow up patches. --- Note: this doesn't mean clang is emitting DbgRecords yet, because the modules it creates are still always in the old debug mode. That will come in a future patch.
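The "return either kind of pointer depending on the module's mode" shape can be sketched with `std::variant` as a stand-in for `PointerUnion`. All types and the `insertDebugInfo` function below are simplified, hypothetical models and do not reflect DIBuilder's real signatures.

```cpp
#include <cassert>
#include <variant>

// Empty stand-ins for the real LLVM classes.
struct Instruction {};
struct DbgRecord {};

// Model of DbgInstPtr: holds either an Instruction* or a DbgRecord*.
using DbgInstPtrModel = std::variant<Instruction *, DbgRecord *>;

// Model of a module that carries the debug-info format flag.
struct ModuleModel {
  bool IsNewDbgInfoFormat;
};

// Hypothetical insertion routine: hand back the record in the new format
// and the intrinsic instruction in the old one.
DbgInstPtrModel insertDebugInfo(const ModuleModel &module,
                                Instruction &intrinsic, DbgRecord &record) {
  if (module.IsNewDbgInfoFormat)
    return &record;
  return &intrinsic;
}
```

Callers can then branch on which alternative is held, for example with `std::holds_alternative<DbgRecord *>(result)`, instead of assuming an instruction was inserted.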
cferry-AMD
approved these changes
Aug 13, 2024
An error occurred while trying to automatically change base from
bump_to_818af71b
to
feature/fused-ops
August 15, 2024 07:01