[AutoBump] Merge with cc04bbb2 (Jun 11) (70) #334

mgehre-amd · 2024-09-10T12:20:53Z

No description provided.

…reTable (llvm#95082) The wasm backend fetches the tan runtime lib call in `llvm/include/llvm/IR/RuntimeLibcalls.def` via `StaticLibcallNameMap()`, but ignores the runtime function because a function signature mapping is not specified in RuntimeLibcallSignatureTable(). The fix is to specify the function signatures for float32-128. This is a fix for a build break reported on PR llvm#94559 (comment).

Following a rather direct approach to expose PDL usage from C and then Python. This doesn't yet plumb through adding support for custom matchers through this interface, so it is constrained to basics initially. This also exposes the greedy rewrite driver. The only way currently to define patterns is via PDL (just to keep it small). The creation of the PDL pattern module could be improved to avoid folks potentially accessing the module used to construct it post construction. No ergonomic work done yet. --------- Signed-off-by: Jacques Pienaar <[email protected]>

…EG + SETO/SETNO (llvm#94948) For i64 this avoids loading a 64-bit value into register, for smaller registers this just avoids an immediate operand. For i8+i16, limit to one use case as we save fewer bytes and these can be wasted entirely on extra register moves. Fixes llvm#67709

Fragments are allocated with `operator new` and stored in an ilist with Prev/Next/Parent pointers. A more efficient representation would be an array of fragments without the overhead of Prev/Next pointers.

As the first step, replace ilist with singly-linked lists.

* `getPrevNode` uses have been eliminated by previous changes.
* The last use of the `Prev` pointer remains: for each subsection, there is an insertion point and the current insertion point is stored at `CurInsertionPoint`.
* `HexagonAsmBackend::finishLayout` needs a backward iterator. Save all fragments within `Frags`. Hexagon programs are usually small, and the performance does not matter that much.

To eliminate `Prev`, change the subsection representation to singly-linked lists for subsections and a pointer to the active singly-linked list. The fragments from all subsections will be chained together at layout time.

Since fragment lists are disconnected before layout time, we can remove `MCFragment::SubsectionNumber` (https://reviews.llvm.org/D69411).

The current implementation of `AttemptToFoldSymbolOffsetDifference` requires future improvement for robustness.

Pull Request: llvm#95077
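
The representation described above can be sketched roughly as follows. This is a simplified illustration, not the actual MC code: `Frag`, `FragList`, `chainSubsections` and `collectIds` are hypothetical names, and the per-subsection `Tail` pointer stands in for the insertion point.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical, simplified fragment: it has a Next pointer but no Prev.
struct Frag {
  int Id;
  Frag *Next = nullptr;
  explicit Frag(int FragId) : Id(FragId) {}
};

// One singly-linked list per subsection; Tail acts as the insertion point.
struct FragList {
  Frag *Head = nullptr;
  Frag *Tail = nullptr;

  void append(Frag *F) {
    assert(F && !F->Next && "fragment must not be linked yet");
    if (Tail)
      Tail->Next = F;
    else
      Head = F;
    Tail = F;
  }
};

// Chains the per-subsection lists together in ascending subsection order,
// as would happen once at layout time. Returns the head of the combined list.
inline Frag *chainSubsections(std::map<unsigned, FragList> &Subsections) {
  Frag *Head = nullptr;
  Frag *Tail = nullptr;
  for (auto &Entry : Subsections) {
    FragList &L = Entry.second;
    if (!L.Head)
      continue; // skip empty subsections
    if (Tail)
      Tail->Next = L.Head;
    else
      Head = L.Head;
    Tail = L.Tail;
  }
  return Head;
}

// Collects fragment ids in list order (helper for inspection).
inline std::vector<int> collectIds(const Frag *F) {
  std::vector<int> Ids;
  for (; F; F = F->Next)
    Ids.push_back(F->Id);
  return Ids;
}
```

Because the lists stay disconnected until they are chained, appending to any subsection only touches that subsection's `Tail`, so no backward pointer is needed.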

In some modules, e.g. Kotlin-generated IR, we end up with a huge RefSCC and the call graph updates done as a result of the inliner take a long time. This is due to RefSCC::removeInternalRefEdges() getting called many times, each time removing one function from the RefSCC, but each call to removeInternalRefEdges() is proportional to the size of the RefSCC.

There are two places that call removeInternalRefEdges(), in updateCGAndAnalysisManagerForPass() and LazyCallGraph::removeDeadFunction().

1) Since LazyCallGraph can deal with spurious (edges that exist in the graph but not in the IR) ref edges, we can simply not call removeInternalRefEdges() in updateCGAndAnalysisManagerForPass().

2) LazyCallGraph::removeDeadFunction() still ends up taking the brunt of compile time with the above change for the original reason. So instead we batch all the dead function removals so we can call removeInternalRefEdges() just once. This requires some changes to callers of removeDeadFunction() to not actually erase the function from the module, but defer it to when we batch delete dead functions at the end of the CGSCC run, leaving the function body as "unreachable" in the meantime. We still need to ensure that call edges are accurate. I had also tried deleting dead functions after visiting a RefSCC, but deleting them all at once at the end was simpler.

Many test changes are due to not performing unnecessary revisits of an SCC (the CGSCC infrastructure deems ref edge refinements as unimportant when it comes to revisiting SCCs, although that seems to not be consistently true given these changes) because we don't remove some ref edges.

Specifically for devirt-invalidated.ll this seems to expose an inlining order issue with the inliner. Probably unimportant for this type of intentionally weird call graph.

Compile time: https://llvm-compile-time-tracker.com/compare.php?from=6f2c61071c274a1b5e212e6ad4114641ec7c7fc3&to=b08c90d05e290dd065755ea776ceaf1420680224&stat=instructions:u
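
To illustrate why batching helps, here is a minimal sketch that does not use the LazyCallGraph API. `Component`, `removeOneAtATime` and `removeBatched` are hypothetical, and a flat vector of node ids stands in for the RefSCC. Each removal pass costs time proportional to the component size, so removing dead nodes one by one costs one full pass per dead node, while the batched version needs a single pass.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a large component: a flat list of node ids.
struct Component {
  std::vector<int> Nodes;
  std::size_t Scans = 0; // number of full passes over Nodes so far

  // Removes every node in Dead with a single pass over the component.
  void removeAll(const std::unordered_set<int> &Dead) {
    ++Scans;
    std::vector<int> Kept;
    Kept.reserve(Nodes.size());
    for (int N : Nodes)
      if (!Dead.count(N))
        Kept.push_back(N);
    assert(Kept.size() <= Nodes.size() && "removal cannot add nodes");
    Nodes.swap(Kept);
  }
};

// One pass per dead node: total work grows with |Dead| * |Nodes|.
inline void removeOneAtATime(Component &C, const std::vector<int> &Dead) {
  for (int N : Dead)
    C.removeAll({N});
}

// Defer and batch: one pass regardless of how many nodes are dead.
inline void removeBatched(Component &C, const std::vector<int> &Dead) {
  C.removeAll(std::unordered_set<int>(Dead.begin(), Dead.end()));
}
```

Both functions leave the same nodes behind; only the number of passes differs.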

…5076) The character reduce runtime functions expect a pointer to a scalar character of the correct length for the result of character reduce. A descriptor was passed so far. Fix the lowering so a proper temporary is created and passed to the runtime.

… check (llvm#94920) Before this PR, clangd forcefully disabled misc-const-correctness in disableUnusableChecks(). Now we have a FastCheckFilter configuration whose default value (Strict) also disables it. This patch removes misc-const-correctness from disableUnusableChecks() so it's possible to enable by setting FastCheckFilter to None. Fixes llvm#89758

`GetDeclContextDIEs` and `DIEDeclContextsMatch` are unused (possibly since we added support for simplified template names, but I haven't checked). `GetDeclContextDIEs` is also very similar (but subtly different) from `GetDeclContext` and `GetTypeLookupContext`. I am keeping `GetParentDeclContextDIE` as that one still has some callers, but I want to look into the possibility of merging it with at least one of the functions mentioned above.

The .altinstructions section contains a list of structures where some fields can have different sizes and other fields may or may not be present, depending on the kernel version. Add automatic detection of such variations and use it by default. The user can still override the automatic detection with the `--alt-inst-has-padlen` and `--alt-inst-feature-size` options.

This change makes sure the preferred switch condition int type size remains the same throughout CodeGen optimizations. The change fixes running several OpenCL applications with -O2 or higher opt levels, and fixes Basic/stream/stream_max_stmt_exceed.cpp DPC++ E2E test with -O2.

`convertCallToIndirectCall` applies the PLTCall optimization and returns an (updated if needed) iterator to the converted call instruction. Since AArch64 requires injecting additional instructions to implement this pass, the relevant BasicBlock and an iterator are passed to `convertCallToIndirectCall`. `NumCallsOptimized` is updated only on successful application of the pass.

Tests:
- Inputs/plt-tailcall.c: an example of a tail-call-optimized PLT call.
- AArch64/plt-call.test: the actual A64 test, which runs the PLTCall optimization on the above input file and verifies that the pass is applied to the calls to 'printf' and 'puts'.

…pointee types (llvm#94952) This PR is a tweak to ensure that DuplicatesTracker is working with TypedPointers pointee types rather than with original llvm's untyped pointers. This enforces DuplicatesTracker promise to avoid emission of several identical OpTypePointer instructions.

…#95054) As stated in `UnwindInfoSectionImpl::prepareRelocations`'s comments, the unwind info uses section+addend relocations for personality functions defined in the same file as the function itself. As personality functions are always accessed via the GOT, we need to resolve those to a symbol. Previously, we did this by keeping a map which resolves these to symbols, creating a synthetic symbol if we didn't find it in the map. This approach has an issue: if we process the object file containing the personality function before any external uses, the entry in the map remains unpopulated, so we create a synthetic symbol and a corresponding GOT entry. If we encounter a relocation to it in a later file which requires GOT (such as in `__eh_frame`), we add that symbol to the GOT, too, effectively creating two entries which point to the same piece of code. This commit fixes that by searching the personality function's section for a symbol at that offset which already has a GOT entry, and only creating a synthetic symbol if there is none. As all non-unwind sections are already processed by this point, it ensures no duplication. This should only really affect our tests (and make them clearer), as personality functions are usually defined in platform runtime libraries. Or even if they are local, they are likely not in the first object file to be linked.

This PR improves the legalization process of SPIR-V instructions. Namely, it introduces validation and fixing of the bit width of scalar registers as part of the pre-legalizer. A test case is added that demonstrates the ability to legalize instructions with non-8/16/32/64 bit widths both with and without the vendor-specific SPIR-V extension (SPV_INTEL_arbitrary_precision_integers). In the absence of the extension, the generated SPIR-V code will fall back to 8/16/32/64 bit widths in OpTypeInt, but the SPIR-V Backend is still able to legalize operations with the original integer sizes.
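
As a rough illustration of the fallback to 8/16/32/64-bit widths, the sketch below rounds a width up to the next supported size. This is an assumption about how the fallback could work, not the backend's actual code: `fallbackIntWidth` is a hypothetical helper, and rounding up (rather than some other mapping) is assumed.

```cpp
#include <cassert>

// Hypothetical helper: rounds an integer bit width up to the nearest of
// 8, 16, 32 or 64. Widths above 64 are outside what this sketch handles.
inline unsigned fallbackIntWidth(unsigned Bits) {
  assert(Bits >= 1 && Bits <= 64 && "sketch only covers 1..64 bits");
  if (Bits <= 8)
    return 8;
  if (Bits <= 16)
    return 16;
  if (Bits <= 32)
    return 32;
  return 64;
}
```

Under this assumption, a 6-bit integer would be emitted as an 8-bit OpTypeInt and a 13-bit integer as a 16-bit one.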

…m#94996) This avoids breaking code that should arguably be valid but technically isn't after enforcing the constraints on shared_ptr's constructors. A new LWG issue was filed to fix this in the Standard. This patch applies the expected resolution of this issue to avoid flip-flopping users whose code should always be considered valid. See llvm#93071 for more context.

Instead of hardcoding a loop for small strings, always call char_traits::compare which ends up desugaring to __builtin_memcmp. Note that the original code dates back 11 years, when we didn't lower to intrinsics in `char_traits::compare`. Fixes llvm#94222
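
A minimal sketch of the idea, not the libc++ implementation: `equalChars` and `equalCharsLoop` are hypothetical helpers that contrast a call to `std::char_traits<char>::compare` with a hand-written loop of the kind the commit describes replacing.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: equality of two character ranges of length N.
// std::char_traits<char>::compare returns 0 when the first N characters match.
inline bool equalChars(const char *LHS, const char *RHS, std::size_t N) {
  assert((N == 0 || (LHS && RHS)) && "null range with nonzero length");
  return std::char_traits<char>::compare(LHS, RHS, N) == 0;
}

// Hand-rolled character loop, shown only for comparison.
inline bool equalCharsLoop(const char *LHS, const char *RHS, std::size_t N) {
  for (std::size_t I = 0; I != N; ++I)
    if (LHS[I] != RHS[I])
      return false;
  return true;
}
```

Both helpers give the same answer; the first simply delegates to the library routine for every length.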

It looks like the last references got removed in c747bd0. It removed a __zero() function, which was probably created at some point in the ancient past to optimize copying the string representation. The __zero() function got simplified to an assignment as part of making string constexpr, rendering this code unnecessary.

llvm#94846) The function that calculated the declaration context for a DIE was incorrectly transparently traversing across DW_TAG_subprogram DIEs when climbing the parent DIE chain. This meant that types defined in functions would appear to have the declaration context of anything above the function. I fixed the GetTypeLookupContextImpl(...) function in DWARFDIE.cpp to not transparently skip over functions, lexical blocks, inlined functions, and compile and type units. Added a test to verify things are working.

Some of the options only fed into the full sparse pipeline. However, some backends prefer to use the sparse minipipeline. This change exposes some important optimization flags to the pass as well. This prepares some SIMDization of PyTorch sparsified code.

…ereferencing pointer to pointers. llvm#94100" (llvm#95174) The option is causing the binary output to be different when compiled under `-O0`, because it introduces dbg.declare on pseudovariables. Going to change this implementation to use dbg.value instead.

farzonl and others added 30 commits June 11, 2024 10:43

Restore 'REQUIRES: shell' for another test after 878deae

b746bab

Otherwise this would fail when using gnuwin32.

[libc][math][c23] Add {totalorder,totalordermag}f16 C23 math functions (

f5dcfb9

llvm#95014) Part of llvm#93566.

[HWASan] make get_info.local_time.pass.cpp UNSUPPORTED

00c5474

[clang][Interp] Fix visiting non-FieldDecl MemberExprs

4cf607f

Ignore the base and visit the Member decl like a regular DeclRefExpr.

[lldb] Skip TestAttachDenied under asan

2e007b8

Like many other tests, this one times out when run under the address sanitizer. To reduce noise, this commit skips it in those builds.

[test] Skip some tests on Windows only (llvm#95095)

5ccdce9

These tests pass on Linux using lit's internal shell.

[flang] Add runtime support for Fortran intrinsic ERFC_SCALED (llvm#9…

a03e93e

…5040) Co-authored-by: David Parks <[email protected]>

[Tablegen][NFC] Add a check for duplicate features (llvm#94223)

41f81ad

We hit this downstream and the only evidence of the mistake was that the results of `Find` on `SubtargetFeatureKV` were corrupted.
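
A duplicate check of this kind can be sketched as below. This is not the TableGen implementation: `findDuplicate` is a hypothetical helper that works on plain strings rather than feature records.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: returns the first name (in sorted order) that appears
// more than once, or an empty string if all names are unique.
inline std::string findDuplicate(std::vector<std::string> Names) {
  std::sort(Names.begin(), Names.end());
  // After sorting, equal names sit next to each other.
  assert(std::is_sorted(Names.begin(), Names.end()));
  auto It = std::adjacent_find(Names.begin(), Names.end());
  return It == Names.end() ? std::string() : *It;
}
```

Reporting the duplicate up front gives a direct diagnostic instead of leaving corrupted lookup results as the only symptom.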

[BOLT][DWARF][NFC] Remove old GDB Index functions (llvm#95019)

727ecbe

Remove old usages of GDB Index functions after replacing them with new ones.

[clang] Fix a few comment typos to cycle bots

9b4f8ac

LAA: refactor analyzeLoop to return bool (NFC) (llvm#93824)

18a8983

Avoid wastefully setting CanVecMem in several places in analyzeLoop, which complicates the logic. Instead, make the function return a bool and set CanVecMem in the caller.

[ProfileData] Simplify InstrProfValueSiteRecord (NFC) (llvm#95143)

3af3525

std::list default-constructs itself as an empty list, so we don't need to call ValueData.clear() in the constructor.
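
A standalone illustration of this point, not the actual `InstrProfValueSiteRecord` definition: `SiteRecord` and its `int` element type are hypothetical stand-ins.

```cpp
#include <cassert>
#include <list>

// Hypothetical stand-in for a record that owns a std::list member.
struct SiteRecord {
  std::list<int> ValueData;

  // No ValueData.clear() is needed here: the member is already
  // default-constructed as an empty list before the body runs.
  SiteRecord() { assert(ValueData.empty()); }
};

// Returns true if a freshly constructed record holds no values.
inline bool startsEmpty() {
  SiteRecord R;
  return R.ValueData.empty() && R.ValueData.size() == 0;
}
```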

[mlir][sparse] implement sparse space collapse pass. (llvm#89003)

c6d85ba

[SPIR-V] Implement insertion of OpGenericCastToPtr using builtin func…

5752098

…tions (llvm#95055) This PR implements insertion of OpGenericCastToPtr using builtin functions (both opencl `to_global|local|private` and `__spirv_` wrappers), and improves type inference.

[lldb] Fix declaration of thread argument in CommandObjectThreadStepW…

982b4b6

…ithTypeAndScope (llvm#95146) `thread step-in` (and other step commands) take a `<thread-index>`, not a `<thread-id>`.

ldionne and others added 9 commits June 11, 2024 16:48

[libc++] Update with LWG issue number for shared-ptr constructor

f638f7b

[HLSL] Fix FileCheck annotation typos (llvm#95155)

c6ee562

These are the HLSL specific fixes from llvm#93193. Thanks klensy!

[scudo] Fix the calculation of PushedBytesDelta (llvm#95177)

cc04bbb

BytesInBG is always greater than or equal to BG->BytesInBGAtLastCheckpoint. Note that the bug led to unnecessary page-release attempts and has no critical impact on correctness.
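
The invariant matters because the byte counts are unsigned. The sketch below assumes the bug was a subtraction with the operands in the wrong order, which the commit message suggests but does not state. `pushedBytesDelta` and `reversedDelta` are hypothetical helpers, not scudo functions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper following the stated invariant: the current byte count
// never falls below the value recorded at the last checkpoint.
inline std::uint64_t pushedBytesDelta(std::uint64_t BytesInBG,
                                      std::uint64_t BytesInBGAtLastCheckpoint) {
  assert(BytesInBG >= BytesInBGAtLastCheckpoint && "invariant violated");
  return BytesInBG - BytesInBGAtLastCheckpoint;
}

// Assumed buggy form: with the operands reversed, unsigned arithmetic wraps
// around to a huge value instead of producing a negative number.
inline std::uint64_t reversedDelta(std::uint64_t BytesInBG,
                                   std::uint64_t BytesInBGAtLastCheckpoint) {
  return BytesInBGAtLastCheckpoint - BytesInBG;
}
```

A wrapped-around delta would look like a very large number of newly pushed bytes, which is consistent with the unnecessary page-release attempts mentioned above.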

[AutoBump] Merge with cc04bbb (Jun 11)

e36fc76

Base automatically changed from bump_to_d5863721 to feature/fused-ops September 11, 2024 12:07

cferry-AMD approved these changes Sep 11, 2024

mgehre-amd merged commit e6eae35 into feature/fused-ops Sep 11, 2024
11 checks passed

mgehre-amd deleted the bump_to_cc04bbb2 branch September 11, 2024 13:50
