[AutoBump] Merge with 21ba91c4 (Jun 17) (81) #345

mgehre-amd · 2024-09-12T19:51:11Z

No description provided.

Update test due to llvm#91724

…95775) Saying that a call preserves $noreg seems weird and required a workaround in MachineLICM.

This PR adds debug support for allocatable. The allocatable arrays use the existing functionality to read the array information from the descriptor. The allocatable for the scalar shows up as a pointer to the scalar. While testing this, I noticed that the values of the allocated and associated flags were swapped. This is also fixed in this PR. Here is what debugging an allocatable looks like with this patch in place.

    integer, allocatable :: ar1(:, :)
    real, allocatable :: sc
    allocate(sc)
    allocate(ar1(3, 4))

    (gdb) ptype ar1
    type = integer, allocatable (3,4)
    (gdb) p ar1
    $1 = ((5, 6, 7) (9, 10, 11) (13, 14, 15) (17, 18, 19))
    (gdb) p sc
    $2 = (PTR TO -> ( real )) 0x205300
    (gdb) p *sc
    $3 = 3.1400001

... in preparation for llvm#92528

…bf16

…ro (llvm#95686) This is a follow-up to llvm#80282. The transitive includes of `<locale>` in `<vector>` were all guarded by the availability macro -- the new include should also be guarded, otherwise any users who compile with localization disabled will start getting errors trying to include `<vector>`.

This patch enables the -mlink-builtin-bitcode flag in fc1 so that bitcode libraries can be linked in. This is needed for OpenMP offloading libraries.

We can't preserve the context across a non-speculatable instruction, as this might introduce a trap. Alternatively, we could also insert all the replacement instructions at the use-site, but that would be a more intrusive change for the sake of this edge case. Fixes llvm#95547.

This patch improves the ROCDL gpu serialization API by:
- Introducing the enum `AMDGCNLibraries` for specifying the AMD GCN device code libraries to use during linking.
- Removing `getCommonBitcodeLibs` in favor of `AMDGCNLibraries`. Previously `getCommonBitcodeLibs` would try to load all AMD GCN bitcode libraries; now it will only load the requested libraries.
- Exposing the `compileToBinary` method and making it virtual, allowing downstream users to re-use this method.
- Exposing `moduleToObjectImpl`; this method provides a prototype flow for compiling to binary, allowing downstream users to re-use this method.
- It also avoids constructing the control variables if no device libraries are being used.

This patch also changes the behavior of the CMake flag `DEFAULT_ROCM_PATH`. Before, it would fall back to a default value of `/opt/rocm` if not specified. However, that default value causes fragile builds in environments with ROCm. Now, the flag falls back to the empty string, making it clear that **the user must provide a value at LLVM build time**.

Share the implementation for floating-point complex-complex multiplication with the current interpreter. This means we need a new opcode for this, but there's no good way around that.

Reverts llvm#95456

When looking up through shuffles, a Value can be multiple different leaf types (for example an identity from one position, a splat from another). We currently detect this by recalculating which type of leaf it is when generating, but as more types of leaves are added (llvm#94954) this doesn't scale very well. This patch switches it to use Use, not Value, to more accurately detect which type of leaf each Use should have.

…flat (llvm#95394)" This reverts commit 95b77d9.

In review around llvm#94686, we had a discussion about a possible O0 specific miscompile case without test coverage. The particular case turned out not to be possible to exercise in practice, but improving our test coverage remains a good idea if we're going to have differences in the dataflow with and without live intervals.

…#95800) This PR relands the commit reverted in llvm#95607. Fixes:
- Now literals are only used for the indices of `vector.insert` and `vector.extract`.
- `arith.constant` needs to be used for the `memref.load` and `memref.store` since otherwise there will be a failure to parse the input IR.

In addition to looking for dependent (input) PDB files next to the associated .OBJ file, we now also look in the output folder. This mimics MSVC link.exe behavior. Fixes llvm#94152
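The lookup order described above can be sketched as follows. This is a minimal Python illustration of "search next to the .OBJ first, then the output folder", not LLD's actual implementation; the function name `find_dependent_pdb` and its parameters are hypothetical.

```python
from pathlib import Path
from typing import Optional


def find_dependent_pdb(pdb_name: str, obj_path: Path, output_dir: Path) -> Optional[Path]:
    """Return the first existing PDB candidate, or None if none is found.

    Hypothetical helper: the search order (directory of the .OBJ file,
    then the output folder) is taken from the commit description only.
    """
    for directory in (obj_path.parent, output_dir):
        candidate = directory / pdb_name
        if candidate.is_file():
            return candidate
    return None
```

A file next to the .OBJ wins over one in the output folder; the output folder is consulted only as a fallback.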

As noted in one of the existing comments, the job AVLIsIgnored was filling was really more of a demanded field role. Since we recently realized we can use the values of VL on MI even in the backwards pass, let's exploit that to improve demanded fields, and delete AVLIsIgnored. Note that the test change is a real regression, but only incidental to this patch. The backwards pass doesn't have the information that the VL following a VL-preserving vtype is non-zero. This is an existing problem; this patch just adds a few more cases where we prove vl-preserving is legal.

…alence class (llvm#95729) Fixes: llvm#95658 Unqualified canonical type should be used instead of normal QualType for type equality comparison

…elease script (llvm#95781) Before this fix, when building the Windows LLVM package with the latest cmake 3.29.3 I was seeing:
```
C:\git\llvm-project>llvm\utils\release\build_llvm_release.bat --version 19.0.0 --x64 --skip-checkout --local-python
...
-- Looking for FE_INEXACT
-- Looking for FE_INEXACT - found
-- Performing Test HAVE_BUILTIN_THREAD_POINTER
-- Performing Test HAVE_BUILTIN_THREAD_POINTER - Failed
-- Looking for mach/mach.h
-- Looking for mach/mach.h - not found
-- Looking for CrashReporterClient.h
-- Looking for CrashReporterClient.h - not found
-- Looking for pfm_initialize in pfm
-- Looking for pfm_initialize in pfm - not found
-- Could NOT find ZLIB (missing: ZLIB_LIBRARY ZLIB_INCLUDE_DIR)
CMake Error at C:/Program Files/CMake/share/cmake-3.29/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
  Could NOT find LibXml2 (missing: LIBXML2_INCLUDE_DIR)
Call Stack (most recent call first):
  C:/Program Files/CMake/share/cmake-3.29/Modules/FindPackageHandleStandardArgs.cmake:600 (_FPHSA_FAILURE_MESSAGE)
  C:/Program Files/CMake/share/cmake-3.29/Modules/FindLibXml2.cmake:108 (FIND_PACKAGE_HANDLE_STANDARD_ARGS)
  cmake/config-ix.cmake:167 (find_package)
  CMakeLists.txt:921 (include)
-- Configuring incomplete, errors occurred!
```
It looks like `LIBXML2_INCLUDE_DIRS` (with the extra 'S') is a result variable that is set by cmake after a call to `find_package(LibXml2)`. It is actually `LIBXML2_INCLUDE_DIR` (without the 'S') that should be used as an input before the `find_package` call, since the 'S' variable is unconditionally overwritten, see https://github.com/Kitware/CMake/blob/master/Modules/FindLibXml2.cmake#L96. I am unsure exactly why that worked with older cmake versions.

To fix CI after llvm#93712 landed.

@ayalz

splitBlock will create an unconditional branch between the middle block and scalar preheader. Instead of creating and replacing the same branch again when a scalar epilogue is needed, simply add an early exit. As suggested by @ayalz in llvm#92651 to clarify the existing code.

…m#94632) Currently we use DW_OP_plus_uconst to handle the bitfield offset and handle the bitfield size by choosing a type size that matches, but this doesn't work if either offset or size aren't byte-aligned. Extracting the bits using DW_OP_LLVM_extract_bits means we can handle any kind of offset or size.
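The difference between byte-granular addressing and bit extraction can be illustrated with a small Python sketch. It is not the DWARF expression evaluator and does not reproduce the exact semantics of `DW_OP_LLVM_extract_bits`; the helper name `extract_bits`, its parameters, and the LSB-first bit numbering are assumptions made for illustration.

```python
def extract_bits(value: int, offset: int, size: int, signed: bool = False) -> int:
    """Return `size` bits of `value` starting at bit `offset` (bit 0 = LSB).

    Hypothetical helper: works for any offset and size, byte-aligned or not.
    """
    if size <= 0 or offset < 0:
        raise ValueError("size must be positive and offset non-negative")
    mask = (1 << size) - 1
    field = (value >> offset) & mask
    # Sign-extend when the field is signed and its top bit is set.
    if signed and field & (1 << (size - 1)):
        field -= 1 << size
    return field


# A 3-bit field starting at bit 3 of one storage byte: neither the
# offset nor the size is a multiple of 8.
storage = 0b10110110
unsigned_field = extract_bits(storage, 3, 3)
signed_field = extract_bits(storage, 3, 3, signed=True)
```

A shift followed by a mask handles any offset and size, which is the property the byte offset plus matching type size approach lacks.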

For DO CONCURRENT REDUCE, every nested loop should have a REDUCE clause so that we can lower reduction without analysis.

…#95623) When Jason was looking into the issue caused by llvm#95606 he suggested using the Checksum from the original file in LineEntry. I like the idea because it makes sense semantically, but also allows us to get rid of the Update method and ensures we make a new copy, in case someone else is holding onto the old SupportFile.

…NFC. (llvm#95808)

Since clang-format 18.1.4, there have been a number of commits that fixed various kinds of issues:
- Bug 3ceccbd
- Regression 6dbaa89 51ff7f3 35fea10 7699b34 768118d 8c0fe0d
- Crash f1491c7
- Invalid code generation 0abb89a

In a similar manner as in https://reviews.llvm.org/D133494 use `TBL` to place bytes in the *upper* part of `i32` elements and then convert to float using fixed-point `scvtf`, i.e. scvtf Vd.4s, Vn.4s, #24
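Why the upper-byte placement works can be checked with a few lines of Python. This is a sketch of the arithmetic only, assuming that a fixed-point convert with 24 fractional bits behaves like a signed division by 2^24; the function name `sitofp_via_upper_byte` is hypothetical and nothing here models the actual instructions.

```python
def sitofp_via_upper_byte(b: int) -> float:
    """Convert a signed 8-bit value to float via the upper byte of an i32."""
    if not -128 <= b <= 127:
        raise ValueError("expected a signed 8-bit value")
    # Place the byte in bits 31..24 of a 32-bit word; the low 24 bits are zero.
    word = (b & 0xFF) << 24
    # Reinterpret the 32-bit pattern as a signed integer.
    signed_word = word - (1 << 32) if word & (1 << 31) else word
    # Treat it as fixed-point with 24 fractional bits.
    return signed_word / (1 << 24)


results = [sitofp_via_upper_byte(b) for b in range(-128, 128)]
```

Because the byte's sign bit lands in bit 31 of the word, no separate sign extension is needed: the word equals `b * 2**24`, and the fixed-point scale divides that factor back out.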

…ding handling. Move load/store folding 'free costs' inside the adjustTableCost helper so we can handle some additional intrinsics in the future. The plan is to do something similar for other cost callbacks as well (getArithmeticInstrCost etc.).

…avg(x, y)) folds m_BinOp doesn't need a compile time opcode - so we can merge these into signed/unsigned cases.

More reliably detect whether the API tests are running in a virtual environment by comparing sys.prefix and sys.base_prefix [1]. [1] https://docs.python.org/3/library/sys.html#sys.base_prefix
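The check described above amounts to a one-line comparison. The sketch below is a minimal standalone version, not the code from the lldb test suite; the function name `in_virtual_environment` is hypothetical.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the interpreter is running inside a virtual environment.

    In a virtual environment, sys.prefix points at the environment while
    sys.base_prefix still points at the base Python installation.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment" if in_virtual_environment() else "base interpreter")
```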

… invalid pointer offset computation (llvm#95479) Fixes llvm#95366

…-pedantic` (llvm#95762)

Used to implement CWG2191 where `typeid` for a polymorphic glvalue only becomes potentially-throwing if the `typeid` operand was already potentially throwing or a `nullptr` check was inserted: https://cplusplus.github.io/CWG/issues/2191.html

Also change `Expr::hasSideEffects` for `CXXTypeidExpr` to check the operand for side-effects instead of always reporting that there are side-effects.

Remove the `IsDeref` parameter of `CGCXXABI::shouldTypeidBeNullChecked` because it should never return `true` if `!IsDeref` (we shouldn't add a null check that wasn't there in the first place).

`icmp ult (add X, C2), C` can be folded to `icmp ne (and X, C), 2C`, subject to `C == -C2` and C2 being a power of 2. Proofs: https://alive2.llvm.org/ce/z/P-VVmQ. Fixes: llvm#75613.
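The fold can also be checked exhaustively for a small bit width. The Python sketch below is independent of the linked Alive2 proof: it models wrapping unsigned arithmetic with a mask and compares both forms for every power-of-two `C2` with `C == -C2`. The function name `check_fold` is hypothetical.

```python
def check_fold(bits: int = 8) -> bool:
    """Exhaustively compare both forms of the comparison at a given bit width."""
    mask = (1 << bits) - 1
    for k in range(bits):
        c2 = 1 << k            # C2 is a power of 2
        c = (-c2) & mask       # C == -C2 in wrapping arithmetic
        for x in range(1 << bits):
            before = ((x + c2) & mask) < c        # icmp ult (add X, C2), C
            after = (x & c) != ((2 * c) & mask)   # icmp ne (and X, C), 2C
            if before != after:
                return False
    return True


if __name__ == "__main__":
    print(check_fold(8))
```

Intuitively, `C` is a mask of the high bits, and the add only reaches or exceeds `C` when those high bits of `X` equal the single pattern `2C`.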

…#95542) The algorithm added by PR llvm#87375 can be potentially quadratic in the number of anchors. This is almost never a problem because normally functions have a reasonable number of function calls. However, in some rare cases of auto-generated code we observed very large functions that trigger quadratic behaviour here (resulting in >130GB of peak heap memory usage for clang). Let's add a knob for controlling the max number of callsites in a function above which stale profile matching won't be performed.

john-brawn-arm and others added 30 commits June 17, 2024 13:38

[DebugInfo] Update sroa-extract-bits.ll test (llvm#95774)

fb59d9b

Update test due to llvm#91724

[CodeGen] Do not include $noreg in any regmask operands. NFCI. (llvm#…

457e895

…95775) Saying that a call preserves $noreg seems weird and required a workaround in MachineLICM.

[AArch64] Refactor creation of a shuffle mask for TBL (NFC) (llvm#92529)

96e8d0f

... in preparation for llvm#92528

AMDGPU: Fix legalization for llvm.amdgcn.struct.buffer.atomic.fadd.v2…

405882d

…bf16

[flang] Add -mlink-builtin-bitcode option to fc1 (llvm#94763)

b75e7c6

This patch enables the -mlink-builtin-bitcode flag in fc1 so that bitcode libraries can be linked in. This is needed for OpenMP offloading libraries.

[InstCombine] Add test for llvm#95547 (NFC)

7767f0d

[clang][Interp] Implement Complex-complex multiplication (llvm#94891)

4bf160e

Share the implementation for floating-point complex-complex multiplication with the current interpreter. This means we need a new opcode for this, but there's no good way around that.

Revert [mlir][Target] Improve ROCDL gpu serialization API (llvm#95790)

57b8be4

Reverts llvm#95456

Reapply "AMDGPU: Handle legal v2f16/v2bf16 atomicrmw fadd for global/…

4cf1a19

…flat (llvm#95394)" This reverts commit 95b77d9.

[TableGen][Docs] Fix !range markup (llvm#95540)

c659e3a

[LLD][COFF] Support finding pdb files from outputpath (llvm#94153)

c11677e

In addition to looking for dependent (input) PDB files next to the associated .OBJ file, we now also look in the output folder. This mimics MSVC link.exe behavior. Fixes llvm#94152

[clang][analyzer] use unqualified canonical type during merging equiv…

0851d7b

…alence class (llvm#95729) Fixes: llvm#95658 Unqualified canonical type should be used instead of normal QualType for type equality comparison

[lldb] Add packaging to testing requirements.txt (llvm#95806)

13c6638

To fix CI after llvm#93712 landed.

[flang] Add a REDUCE clause to each nested loop (llvm#95555)

85f4593

For DO CONCURRENT REDUCE, every nested loop should have a REDUCE clause so that we can lower reduction without analysis.

[AMDGPU] Move FeatureMaxHardClauseLength32 into FeatureISAVersion12. …

e577f96

…NFC. (llvm#95808)

[AArch64] Lower extending sitofp using tbl (llvm#92528)

d1a4f0c

In a similar manner as in https://reviews.llvm.org/D133494 use `TBL` to place bytes in the *upper* part of `i32` elements and then convert to float using fixed-point `scvtf`, i.e. scvtf Vd.4s, Vn.4s, #24

RKSimon and others added 9 commits June 17, 2024 18:01

[DAG] visitAVG - avoid duplication in the avg(ext(x), ext(y)) -> ext(…

7e3507e

…avg(x, y)) folds m_BinOp doesn't need a compile time opcode - so we can merge these into signed/unsigned cases.

[lldb] More reliably detect a virtual environment

a1994ae

More reliably detect whether the API tests are running in a virtual environment by comparing sys.prefix and sys.base_prefix [1]. [1] https://docs.python.org/3/library/sys.html#sys.base_prefix

[Clang] Disallow non-lvalue values in constant expressions to prevent…

2ebe479

… invalid pointer offset computation (llvm#95479) Fixes llvm#95366

[NFC] Refactor [[nodiscard]] test to not use macros and run under `…

4447e25

…-pedantic` (llvm#95762)

[InstCombine] Canonicalize icmp ult (add X, C2), C expressions

a4b44c0

`icmp ult (add X, C2), C` can be folded to `icmp ne (and X, C), 2C`, subject to `C == -C2` and C2 being a power of 2. Proofs: https://alive2.llvm.org/ce/z/P-VVmQ. Fixes: llvm#75613.

[mlir][sparse] implement lowering rules for IterateOp. (llvm#95286)

3a2e442

[AutoBump] Merge with 21ba91c (Jun 17)

75073a8

cferry-AMD approved these changes Sep 13, 2024

Base automatically changed from bump_to_3cead572 to feature/fused-ops September 16, 2024 10:59

mgehre-amd merged commit 75073a8 into feature/fused-ops Sep 16, 2024
10 checks passed

mgehre-amd deleted the bump_to_21ba91c4 branch September 16, 2024 10:59
