[AutoBump] Merge with 21ba91c4 (Jun 17) (81) #345

Merged
merged 39 commits into from
Sep 16, 2024

mgehre-amd
Collaborator

No description provided.

john-brawn-arm and others added 30 commits June 17, 2024 13:38
…95775)

Saying that a call preserves $noreg seems weird and required a
workaround in MachineLICM.
This PR adds debug support for allocatable variables. Allocatable arrays
use the existing functionality to read the array information from the
descriptor. A scalar allocatable shows up as a pointer to the scalar.

While testing this, I noticed that the values of the allocated and
associated flags were swapped. This is also fixed in this PR.

Here is what debugging an allocatable looks like with this patch in
place.

integer, allocatable :: ar1(:, :)
real, allocatable :: sc

allocate(sc)
allocate(ar1(3, 4))

(gdb) ptype ar1
type = integer, allocatable (3,4)
(gdb) p ar1
$1 = ((5, 6, 7) (9, 10, 11) (13, 14, 15) (17, 18, 19))
(gdb) p sc
$2 = (PTR TO -> ( real )) 0x205300
(gdb) p *sc
$3 = 3.1400001
…ro (llvm#95686)

This is a follow-up to llvm#80282.
The transitive includes of `<locale>` in `<vector>` were all guarded by
the availability macro -- the new include should also be guarded,
otherwise any users who compile with localization disabled will start
getting errors trying to include `<vector>`.
This patch enables the -mlink-builtin-bitcode flag in fc1 so that
bitcode libraries can be linked in. This is needed for OpenMP offloading
libraries.
We can't preserve the context across a non-speculatable instruction,
as this might introduce a trap. Alternatively, we could also
insert all the replacement instructions at the use-site, but that
would be a more intrusive change for the sake of this edge case.

Fixes llvm#95547.
This patch improves the ROCDL gpu serialization API by:
- Introducing the enum `AMDGCNLibraries` for specifying the AMD GCN
device code libraries to use during linking.
- Removing `getCommonBitcodeLibs` in favor of `AMDGCNLibraries`.
Previously `getCommonBitcodeLibs` would try to load all AMD GCN bitcode
libraries; now it will only load the requested libraries.
- Exposing the `compileToBinary` method and making it virtual, allowing
downstream users to re-use this method.
- Exposing `moduleToObjectImpl`, this method provides a prototype flow
for compiling to binary, allowing downstream users to re-use this
method.
- It also avoids constructing the control variables if no device
libraries are being used.

This patch also changes the behavior of the CMake flag
`DEFAULT_ROCM_PATH`. Before it would fall back to a default value of
`/opt/rocm` if not specified. However, that default value causes fragile
builds in environments with ROCm. Now, the flag falls back to the empty
string, making it clear that **the user must provide a value at LLVM
build time**.
Share the implementation for floating-point complex-complex
multiplication with the current interpreter. This means we need a new
opcode for this, but there's no good way around that.
When looking up through shuffles, a Value can be multiple different leaf types
(for example an identity from one position, a splat from another). We currently
detect this by recalculating which type of leaf it is when generating, but as
more types of leaves are added (llvm#94954) this doesn't scale very well.

This patch switches it to use Use, not Value, to more accurately detect which
type of leaf each Use should have.
In review around llvm#94686, we had
a discussion about a possible O0 specific miscompile case without test
coverage. The particular case turned out not to be possible to exercise in
practice, but improving our test coverage remains a good idea if we're
going to have differences in the dataflow with and without live intervals.
…#95800)

This PR relands the commit reverted in llvm#95607. 

Fixes:
- Now literals are only used for the indices of `vector.insert` and
`vector.extract`.
- `arith.constant` needs to be used for the `memref.load` and
`memref.store` since otherwise there will be a failure to parse the
input IR.
In addition to looking for dependent (input) PDB files next to the associated .OBJ file, we now also look in the output folder. This mimics MSVC link.exe behavior.

Fixes llvm#94152
As noted in one of the existing comments, the job AVLIsIgnored was
filling was really more of a demanded field role. Since we recently
realized we can use the values of VL on MI even in the backwards pass,
let's exploit that to improve demanded fields, and delete AVLIsIgnored.

Note that the test change is a real regression, but only incidental to
this patch. The backwards pass doesn't have the information that the VL
following a VL-preserving vtype is non-zero. This is an existing
problem, this patch just adds a few more cases where we prove
vl-preserving is legal.
…alence class (llvm#95729)

Fixes: llvm#95658
Unqualified canonical type should be used instead of normal QualType for
type equality comparison
…elease script (llvm#95781)

Before this fix, when building the Windows LLVM package with the latest
cmake 3.29.3 I was seeing:
```
C:\git\llvm-project>llvm\utils\release\build_llvm_release.bat --version 19.0.0 --x64 --skip-checkout --local-python
...
-- Looking for FE_INEXACT
-- Looking for FE_INEXACT - found
-- Performing Test HAVE_BUILTIN_THREAD_POINTER
-- Performing Test HAVE_BUILTIN_THREAD_POINTER - Failed
-- Looking for mach/mach.h
-- Looking for mach/mach.h - not found
-- Looking for CrashReporterClient.h
-- Looking for CrashReporterClient.h - not found
-- Looking for pfm_initialize in pfm
-- Looking for pfm_initialize in pfm - not found
-- Could NOT find ZLIB (missing: ZLIB_LIBRARY ZLIB_INCLUDE_DIR)
CMake Error at C:/Program Files/CMake/share/cmake-3.29/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
  Could NOT find LibXml2 (missing: LIBXML2_INCLUDE_DIR)
Call Stack (most recent call first):
  C:/Program Files/CMake/share/cmake-3.29/Modules/FindPackageHandleStandardArgs.cmake:600 (_FPHSA_FAILURE_MESSAGE)
  C:/Program Files/CMake/share/cmake-3.29/Modules/FindLibXml2.cmake:108 (FIND_PACKAGE_HANDLE_STANDARD_ARGS)
  cmake/config-ix.cmake:167 (find_package)
  CMakeLists.txt:921 (include)


-- Configuring incomplete, errors occurred!
```
It looks like `LIBXML2_INCLUDE_DIRS` (with the extra 'S') is a result
variable that is set by cmake after a call to `find_package(LibXml2)`.
It is actually `LIBXML2_INCLUDE_DIR` (without the 'S') that should be
used as an input before the `find_package` call, since the 'S' variable
is unconditionally overwritten, see
https://github.com/Kitware/CMake/blob/master/Modules/FindLibXml2.cmake#L96.
I am unsure exactly why that worked with older cmake versions.
splitBlock will create an unconditional branch between the middle block
and scalar preheader. Instead of creating and replacing the same branch
again when scalar epilogue is needed, simply add an early exit.

As suggested by @ayalz in
llvm#92651 to clarify the existing
code.
…m#94632)

Currently we use DW_OP_plus_uconst to handle the bitfield offset and
handle the bitfield size by choosing a type size that matches, but this
doesn't work if either offset or size aren't byte-aligned. Extracting
the bits using DW_OP_LLVM_extract_bits means we can handle any kind of
offset or size.
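
To illustrate why extracting bits handles offsets and sizes that are not
byte-aligned, here is a minimal Python sketch of the shift-and-mask idea.
It is not the DWARF evaluator or the LLVM implementation; the helper name
`extract_bits` and its parameters are hypothetical and only model the
arithmetic on an integer that holds the storage unit.

```python
def extract_bits(value: int, offset: int, size: int, signed: bool = False) -> int:
    """Return `size` bits of `value` starting at bit `offset` (bit 0 is the LSB).

    Hypothetical helper: models the arithmetic only, not DWARF semantics.
    """
    # Drop the bits below the field, then keep only `size` bits.
    field = (value >> offset) & ((1 << size) - 1)
    # For a signed field, sign-extend from the field's top bit.
    if signed and size > 0 and field & (1 << (size - 1)):
        field -= 1 << size
    return field


if __name__ == "__main__":
    storage = 0b110101100  # a 5-bit field sits at bit offset 3
    print(extract_bits(storage, offset=3, size=5))
    print(extract_bits(storage, offset=3, size=5, signed=True))
```

Because the offset and size are plain bit counts, nothing in the
computation depends on either being a multiple of 8.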
For DO CONCURRENT REDUCE, every nested loop should have a REDUCE clause
so that we can lower reduction without analysis.
…#95623)

When Jason was looking into the issue caused by llvm#95606 he suggested
using the Checksum from the original file in LineEntry. I like the idea
because it makes sense semantically, but also allows us to get rid of
the Update method and ensures we make a new copy, in case someone else
is holding onto the old SupportFile.
Since clang-format 18.1.4, there have been a number of commits that
fixed various kinds of issues:

- Bug
3ceccbd

- Regression
6dbaa89
51ff7f3
35fea10
7699b34
768118d
8c0fe0d

- Crash
f1491c7

- Invalid code generation
0abb89a
In a similar manner as in https://reviews.llvm.org/D133494
use `TBL` to place bytes in the *upper* part of `i32` elements
and then convert to float using fixed-point `scvtf`, i.e.

    scvtf Vd.4s, Vn.4s, #24
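
The arithmetic behind this can be sketched in Python for a single lane,
assuming the byte is treated as a signed 8-bit value. The function name is
hypothetical, and the sketch models only the numeric effect of placing the
byte in bits 24..31 and then converting with 24 fractional bits; it does
not model `TBL` or the vector lanes.

```python
def byte_to_float_via_fixed_point(b: int) -> float:
    """Model one i32 lane: byte in the upper 8 bits, then a fixed-point convert.

    Hypothetical illustration only, not the generated AArch64 code.
    """
    # Place the byte in the upper part of a 32-bit element.
    lane = (b & 0xFF) << 24
    # Reinterpret the 32-bit pattern as a signed integer.
    if lane & 0x80000000:
        lane -= 1 << 32
    # A signed fixed-point conversion with 24 fractional bits divides by 2**24.
    return lane / float(1 << 24)


if __name__ == "__main__":
    for raw in (0x00, 0x7F, 0x80, 0xFF):
        print(hex(raw), byte_to_float_via_fixed_point(raw))
```

The sign bit of the byte becomes the sign bit of the 32-bit element, so the
division by 2**24 yields the sign-extended byte value without a separate
extension step.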
…ding handling.

Move load/store folding 'free costs' inside the adjustTableCost helper so we can handle some additional intrinsics in the future.

The plan is to do something similar for other costs callbacks as well (getArithmeticInstrCost etc.).
RKSimon and others added 9 commits June 17, 2024 18:01
…avg(x, y)) folds

m_BinOp doesn't need a compile time opcode - so we can merge these into signed/unsigned cases.
More reliably detect whether the API tests are running in a virtual
environment by comparing sys.prefix and sys.base_prefix [1].

[1] https://docs.python.org/3/library/sys.html#sys.base_prefix
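
A minimal sketch of the check described above, using only the documented
`sys.prefix` and `sys.base_prefix` attributes. The function name
`in_virtual_environment` is an assumption for illustration and is not
necessarily what the test suite uses.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the interpreter is running inside a virtual environment."""
    # sys.base_prefix always names the base installation; inside a virtual
    # environment sys.prefix points at the environment directory instead,
    # so the two values differ.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment:", in_virtual_environment())
```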
Used to implement CWG2191 where `typeid` for a polymorphic glvalue only
becomes potentially-throwing if the `typeid` operand was already
potentially throwing or a `nullptr` check was inserted:
https://cplusplus.github.io/CWG/issues/2191.html

Also change `Expr::hasSideEffects` for `CXXTypeidExpr` to check the
operand for side-effects instead of always reporting that there are
side-effects

Remove `IsDeref` parameter of `CGCXXABI::shouldTypeidBeNullChecked`
because it should never return `true` if `!IsDeref` (we shouldn't add a
null check that wasn't there in the first place)
`icmp ult (add X, C2), C` can be folded to `icmp ne (and X, C), 2C`,
subject to `C == -C2` and C2 being a power of 2.

Proofs: https://alive2.llvm.org/ce/z/P-VVmQ.

Fixes: llvm#75613.
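
As a sanity check alongside the Alive2 proof, the fold can be verified
exhaustively for a small bit width. The Python sketch below assumes 8-bit
wrapping arithmetic and a hypothetical helper `fold_holds`; it is only an
illustration of the stated preconditions, not part of the patch.

```python
BITS = 8
MASK = (1 << BITS) - 1


def fold_holds(c2: int) -> bool:
    """Check `icmp ult (add X, C2), C` == `icmp ne (and X, C), 2C` for every 8-bit X.

    C is derived from the precondition C == -C2 (modulo 2**BITS).
    """
    c = (-c2) & MASK
    two_c = (2 * c) & MASK
    for x in range(1 << BITS):
        before = ((x + c2) & MASK) < c  # unsigned compare after a wrapping add
        after = (x & c) != two_c
        if before != after:
            return False
    return True


if __name__ == "__main__":
    powers_of_two = [1 << k for k in range(BITS)]
    print(all(fold_holds(c2) for c2 in powers_of_two))
```

Trying a value of C2 that is not a power of 2, such as 3, makes the check
fail, which shows why that precondition is needed.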
…#95542)

The algorithm added by PR llvm#87375 can be potentially quadratic in the
number of anchors. This is almost never a problem because normally
functions have a reasonable number of function calls.

However, in some rare cases of auto-generated code we observed very
large functions that trigger quadratic behaviour here (resulting in
>130GB of peak heap memory usage for clang). Let's add a knob for
controlling the max number of callsites in a function above which stale
profile matching won't be performed.
Base automatically changed from bump_to_3cead572 to feature/fused-ops September 16, 2024 10:59
@mgehre-amd mgehre-amd merged commit 75073a8 into feature/fused-ops Sep 16, 2024
10 checks passed
@mgehre-amd mgehre-amd deleted the bump_to_21ba91c4 branch September 16, 2024 10:59