forked from llvm/llvm-project
[AutoBump] Merge with 21ba91c4 (Jun 17) (81) #345
Merged
Update test due to llvm#91724
…95775) Saying that a call preserves $noreg seems weird and required a workaround in MachineLICM.
This PR adds debug support for allocatable variables. Allocatable arrays use the existing functionality to read the array information from the descriptor. A scalar allocatable shows up as a pointer to the scalar. While testing this, I noticed that the values of the allocated and associated flags were swapped. This is also fixed in this PR. Here is what debugging an allocatable looks like with this patch in place:

    integer, allocatable :: ar1(:, :)
    real, allocatable :: sc
    allocate(sc)
    allocate(ar1(3, 4))

    (gdb) ptype ar1
    type = integer, allocatable (3,4)
    (gdb) p ar1
    $1 = ((5, 6, 7) (9, 10, 11) (13, 14, 15) (17, 18, 19))
    (gdb) p sc
    $2 = (PTR TO -> ( real )) 0x205300
    (gdb) p *sc
    $3 = 3.1400001
... in preparation for llvm#92528
…ro (llvm#95686) This is a follow-up to llvm#80282. The transitive includes of `<locale>` in `<vector>` were all guarded by the availability macro -- the new include should also be guarded, otherwise any users who compile with localization disabled will start getting errors trying to include `<vector>`.
This patch enables the -mlink-builtin-bitcode flag in fc1 so that bitcode libraries can be linked in. This is needed for OpenMP offloading libraries.
We can't preserve the context across a non-speculatable instruction, as this might introduce a trap. Alternatively, we could insert all the replacement instructions at the use-site, but that would be a more intrusive change for the sake of this edge case. Fixes llvm#95547.
This patch improves the ROCDL GPU serialization API by:
- Introducing the enum `AMDGCNLibraries` for specifying the AMD GCN device code libraries to use during linking.
- Removing `getCommonBitcodeLibs` in favor of `AMDGCNLibraries`. Previously `getCommonBitcodeLibs` would try to load all AMD GCN bitcode libraries; now it will only load the requested libraries.
- Exposing the `compileToBinary` method and making it virtual, allowing downstream users to re-use this method.
- Exposing `moduleToObjectImpl`. This method provides a prototype flow for compiling to binary, allowing downstream users to re-use it.
- Avoiding construction of the control variables if no device libraries are being used.

This patch also changes the behavior of the CMake flag `DEFAULT_ROCM_PATH`. Before, it would fall back to a default value of `/opt/rocm` if not specified. However, that default value causes fragile builds in environments with ROCm. Now, the flag falls back to the empty string, making it clear that **the user must provide a value at LLVM build time**.
Share the implementation for floating-point complex-complex multiplication with the current interpreter. This means we need a new opcode for this, but there's no good way around that.
When looking up through shuffles, a Value can be multiple different leaf types (for example an identity from one position, a splat from another). We currently detect this by recalculating which type of leaf it is when generating, but as more leaf types are added (llvm#94954) this doesn't scale very well. This patch switches it to use Use, not Value, to more accurately detect which type of leaf each Use should have.
…flat (llvm#95394)" This reverts commit 95b77d9.
In review around llvm#94686, we had a discussion about a possible O0-specific miscompile case without test coverage. The particular case turned out not to be possible to exercise in practice, but improving our test coverage remains a good idea if we're going to have differences in the dataflow with and without live intervals.
…#95800) This PR relands the commit reverted in llvm#95607. Fixes: - Now literals are only used for the indices of `vector.insert` and `vector.extract`. - `arith.constant` needs to be used for the `memref.load` and `memref.store` since otherwise there will be a failure to parse the input IR.
In addition to looking for dependent (input) PDB files next to the associated .OBJ file, we now also look into the output folder as well. This mimics MSVC link.exe behavior. Fixes llvm#94152
As noted in one of the existing comments, the job AVLIsIgnored was filling was really more of a demanded-field role. Since we recently realized we can use the values of VL on MI even in the backwards pass, let's exploit that to improve demanded fields, and delete AVLIsIgnored. Note that the test change is a real regression, but only incidental to this patch. The backwards pass doesn't have the information that the VL following a VL-preserving vtype is non-zero. This is an existing problem; this patch just adds a few more cases where we prove vl-preserving is legal.
…alence class (llvm#95729) Fixes: llvm#95658 Unqualified canonical type should be used instead of normal QualType for type equality comparison
…elease script (llvm#95781) Before this fix, when building the Windows LLVM package with the latest cmake 3.29.3 I was seeing:
```
C:\git\llvm-project>llvm\utils\release\build_llvm_release.bat --version 19.0.0 --x64 --skip-checkout --local-python
...
-- Looking for FE_INEXACT
-- Looking for FE_INEXACT - found
-- Performing Test HAVE_BUILTIN_THREAD_POINTER
-- Performing Test HAVE_BUILTIN_THREAD_POINTER - Failed
-- Looking for mach/mach.h
-- Looking for mach/mach.h - not found
-- Looking for CrashReporterClient.h
-- Looking for CrashReporterClient.h - not found
-- Looking for pfm_initialize in pfm
-- Looking for pfm_initialize in pfm - not found
-- Could NOT find ZLIB (missing: ZLIB_LIBRARY ZLIB_INCLUDE_DIR)
CMake Error at C:/Program Files/CMake/share/cmake-3.29/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
  Could NOT find LibXml2 (missing: LIBXML2_INCLUDE_DIR)
Call Stack (most recent call first):
  C:/Program Files/CMake/share/cmake-3.29/Modules/FindPackageHandleStandardArgs.cmake:600 (_FPHSA_FAILURE_MESSAGE)
  C:/Program Files/CMake/share/cmake-3.29/Modules/FindLibXml2.cmake:108 (FIND_PACKAGE_HANDLE_STANDARD_ARGS)
  cmake/config-ix.cmake:167 (find_package)
  CMakeLists.txt:921 (include)
-- Configuring incomplete, errors occurred!
```
It looks like `LIBXML2_INCLUDE_DIRS` (with the extra 'S') is a result variable that is set by cmake after a call to `find_package(LibXml2)`. It is actually `LIBXML2_INCLUDE_DIR` (without the 'S') that should be used as an input before the `find_package` call, since the 'S' variable is unconditionally overwritten, see https://github.com/Kitware/CMake/blob/master/Modules/FindLibXml2.cmake#L96. I am unsure exactly why that worked with older cmake versions.
To fix CI after llvm#93712 landed.
splitBlock will create an unconditional branch between the middle block and the scalar preheader. Instead of creating and replacing the same branch again when a scalar epilogue is needed, simply add an early exit. As suggested by @ayalz in llvm#92651 to clarify the existing code.
…m#94632) Currently we use DW_OP_plus_uconst to handle the bitfield offset and handle the bitfield size by choosing a type size that matches, but this doesn't work if either offset or size aren't byte-aligned. Extracting the bits using DW_OP_LLVM_extract_bits means we can handle any kind of offset or size.
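The difference between byte-granular addressing and bit extraction can be illustrated outside of DWARF. The following is only a minimal sketch in Python: `extract_bits` is a hypothetical helper, not the semantics of `DW_OP_LLVM_extract_bits` itself, and it assumes little-endian storage with bit 0 as the least significant bit of the first byte.

```python
def extract_bits(storage: bytes, bit_offset: int, bit_size: int,
                 signed: bool = False) -> int:
    """Return the bit_size-wide field starting at bit_offset.

    Hypothetical helper: assumes little-endian storage where bit 0 is
    the least significant bit of the first byte.
    """
    value = int.from_bytes(storage, "little")
    field = (value >> bit_offset) & ((1 << bit_size) - 1)
    # Sign-extend when the field is signed and its top bit is set.
    if signed and bit_size > 0 and field & (1 << (bit_size - 1)):
        field -= 1 << bit_size
    return field


# A 5-bit field at bit offset 3: neither the offset nor the size is
# byte-aligned, so "add a byte offset and pick a matching type size"
# cannot describe it.
storage = bytes([0b10110100, 0b00000001])
unsigned_field = extract_bits(storage, 3, 5)
signed_field = extract_bits(storage, 3, 5, signed=True)
```

Shifting and masking handles any offset and size uniformly, which is the property the commit message attributes to the extract-bits approach.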
For DO CONCURRENT REDUCE, every nested loop should have a REDUCE clause so that we can lower reduction without analysis.
…#95623) When Jason was looking into the issue caused by llvm#95606 he suggested using the Checksum from the original file in LineEntry. I like the idea because it makes sense semantically, but also allows us to get rid of the Update method and ensures we make a new copy, in case someone else is holding onto the old SupportFile.
In a similar manner as in https://reviews.llvm.org/D133494 use `TBL` to place bytes in the *upper* part of `i32` elements and then convert to float using fixed-point `scvtf`, i.e. scvtf Vd.4s, Vn.4s, #24
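The arithmetic behind this trick can be checked with a small model. This is a sketch under assumptions, not the actual lowering: it assumes the source elements are signed 8-bit values, models the `TBL` shuffle as placing each byte in bits 24..31 of a 32-bit lane, and models `scvtf` with 24 fractional bits as a signed divide by 2^24. The function names are made up for illustration.

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def scvtf_fixed(word: int, fbits: int = 24) -> float:
    """Model of a signed fixed-point to float conversion with `fbits` fractional bits."""
    return to_signed(word, 32) / (1 << fbits)


def byte_to_float_via_upper_bits(b: int) -> float:
    """Place the byte in the upper 8 bits of a 32-bit lane, then convert."""
    word = (b & 0xFF) << 24  # what the byte shuffle is assumed to produce
    return scvtf_fixed(word)


# Every byte pattern converts to the float value of its signed 8-bit reading.
all_match = all(
    byte_to_float_via_upper_bits(b) == float(to_signed(b, 8))
    for b in range(256)
)
```

Placing the byte in the upper part makes its sign bit the sign bit of the lane, so the fixed-point scale of 24 bits undoes the shift and sign-extends in one step.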
…ding handling. Move load/store folding 'free costs' inside the adjustTableCost helper so we can handle some additional intrinsics in the future. The plan is to do something similar for other cost callbacks as well (getArithmeticInstrCost etc.).
…avg(x, y)) folds m_BinOp doesn't need a compile time opcode - so we can merge these into signed/unsigned cases.
More reliably detect whether the API tests are running in a virtual environment by comparing sys.prefix and sys.base_prefix [1]. [1] https://docs.python.org/3/library/sys.html#sys.base_prefix
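The check described above amounts to a one-line comparison. A minimal sketch follows; the function name `in_virtual_environment` is chosen here for illustration and is not taken from the test suite.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the interpreter runs inside a virtual environment.

    Inside a venv, sys.prefix points at the environment while
    sys.base_prefix keeps pointing at the base installation, so the two
    values differ.
    """
    return sys.prefix != sys.base_prefix


running_in_venv = in_virtual_environment()
```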
… invalid pointer offset computation (llvm#95479) Fixes llvm#95366
Used to implement CWG2191, where `typeid` for a polymorphic glvalue only becomes potentially-throwing if the `typeid` operand was already potentially-throwing or a `nullptr` check was inserted: https://cplusplus.github.io/CWG/issues/2191.html. Also change `Expr::hasSideEffects` for `CXXTypeidExpr` to check the operand for side effects instead of always reporting that there are side effects. Remove the `IsDeref` parameter of `CGCXXABI::shouldTypeidBeNullChecked` because it should never return `true` if `!IsDeref` (we shouldn't add a null check that wasn't there in the first place).
`icmp ult (add X, C2), C` can be folded to `icmp ne (and X, C), 2C`, subject to `C == -C2` and C2 being a power of 2. Proofs: https://alive2.llvm.org/ce/z/P-VVmQ. Fixes: llvm#75613.
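Independently of the Alive2 proof, the fold can be sanity-checked by brute force at a small bit width. The sketch below models wrapping unsigned arithmetic with an explicit mask; `fold_holds` is a name invented for this example, and checking 4 and 8 bits is only an illustration, not a proof for all widths.

```python
def fold_holds(bits: int = 8) -> bool:
    """Exhaustively compare both forms for every power-of-two C2 and every X."""
    mask = (1 << bits) - 1
    for k in range(bits):
        c2 = 1 << k               # C2 is a power of 2
        c = (-c2) & mask          # C == -C2 in wrapping arithmetic
        two_c = (2 * c) & mask    # 2C, also wrapping
        for x in range(1 << bits):
            original = ((x + c2) & mask) < c   # icmp ult (add X, C2), C
            folded = (x & c) != two_c          # icmp ne (and X, C), 2C
            if original != folded:
                return False
    return True


holds_8 = fold_holds(8)
holds_4 = fold_holds(4)
```

Intuitively, `C` is a mask of the high bits above the power of two, and the only values of `X` for which the add lands in the top range are those whose masked high bits equal `2C`.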
…#95542) The algorithm added by PR llvm#87375 can be potentially quadratic in the number of anchors. This is almost never a problem because normally functions have a reasonable number of function calls. However, in some rare cases of auto-generated code we observed very large functions that trigger quadratic behaviour here (resulting in >130GB of peak heap memory usage for clang). Let's add a knob for controlling the max number of callsites in a function above which stale profile matching won't be performed.
cferry-AMD
approved these changes
Sep 13, 2024