[AutoBump] Merge with 9997e039 (5) #252

Merged
merged 96 commits into from
Aug 15, 2024


mgehre-amd
Collaborator

No description provided.

jayfoad and others added 30 commits March 11, 2024 15:42
For signless and signed integers, overloads exist so that the width
does not need to be specified as an argument. This adds the same for
integers without checking for signedness.
…NDVB instruction names

Matches the SSE variants (which have a 0 qualifier to indicate the explicit xmm0 dependency).
…tage

When we copied the IceLake model from the SkylakeServer model, we missed this difference.

Confirmed with uops.info and Agner Fog's tables.
This includes capturing symbols for global variables, functions,
classes, and templated definitions. As predetermining which symbols are
generated from C++ declarations can be non-trivial, InstallAPI only
parses select declarations for symbol generation when parsing C++.

For example, InstallAPI only looks at explicit template instantiations
or full template specializations, instead of general function or class
templates, for symbol emission.
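
To make the distinction concrete, here is a small C++ sketch (all names are hypothetical) contrasting templates that emit no symbols by themselves with the explicit instantiations and full specializations that do. It only shows which C++ constructs produce code; it does not reproduce InstallAPI's actual parsing logic.

```cpp
// A primary function template: no symbol is emitted unless it is instantiated.
template <typename T>
T twice(T value) { return value + value; }

// Explicit instantiation definition: the compiler emits twice<int> here.
template int twice<int>(int);

// Full specialization: an ordinary function definition with its own symbol.
template <>
double twice<double>(double value) { return value * 2.0; }

// A primary class template: no member symbols until it is instantiated.
template <typename T>
struct Box {
  T value;
  T get() const { return value; }
};

// Explicit instantiation definition of the class: emits Box<int>'s members.
template struct Box<int>;
```

Only the explicit instantiations and the full specialization correspond to a fixed set of symbols that can be known from the declarations alone, which is why a scanner would restrict itself to them.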
This test case shows a miscompile when the disjoint flag is dropped from
a disjoint `or` during vectorization.
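
For context, LLVM's `or disjoint` asserts that the two operands have no set bits in common (otherwise the result is poison), which is what allows it to be treated like an `add`. The C++ sketch below (helper names are hypothetical; this models the IR semantics and is not LLVM code) checks that equivalence on concrete values.

```cpp
#include <cstdint>

// Two values are "disjoint" when no bit position is set in both.
bool areDisjoint(std::uint32_t a, std::uint32_t b) { return (a & b) == 0; }

// With disjoint operands no carries can occur, so OR and ADD agree.
bool orEqualsAdd(std::uint32_t a, std::uint32_t b) { return (a | b) == a + b; }
```

For example, `0b1010 | 0b0101` and `0b1010 + 0b0101` are both 15, while `3 | 1` is 3 but `3 + 1` is 4: without disjointness, treating an `or` as an `add` changes the result.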
This allows sharing the LLVM version number in libc++.
Building libc++ now also requires building libunwind. This updates
the LLDB build instructions accordingly.

I noticed this recently and it was separately filed as
llvm#84053
llvm#84154)

If `m_editor_status` is `EditorStatus::Editing`, PrintAsync clears
the currently edited line. In some situations, the edited line is not
saved. After the stream flushes, PrintAsync tries to redisplay the
unsaved line, so the edited line is lost.

The issue arose while I was debugging REPRLRun in
[Fuzzilli](https://github.com/googleprojectzero/fuzzilli). I started
LLDB and attempted to set a breakpoint in libreprl-posix.c. I entered
`breakpoint set -f lib` and used the "tab" key for command completion.
After completion, the edited line was flushed, leaving a blank line.
Walter Erquinigo added optional instruction annotations for x86
instructions in 2022 for the `thread trace dump instruction` command,
and added code to DisassemblerLLVMC to annotate instructions that
change flow control; see https://reviews.llvm.org/D128477.

This was added as an option to `disassemble`, and the trace dump command
enables it by default, but several other instruction dumpers were
changed to display them by default as well. These are only implemented
for Intel instructions, so our disassembly on other targets ends up
looking like

```
(lldb) x/5i 0x1000086e4
0x1000086e4: 0xa9be6ffc   unknown     stp    x28, x27, [sp, #-0x20]!
0x1000086e8: 0xa9017bfd   unknown     stp    x29, x30, [sp, #0x10]
0x1000086ec: 0x910043fd   unknown     add    x29, sp, #0x10
0x1000086f0: 0xd11843ff   unknown     sub    sp, sp, #0x610
0x1000086f4: 0x910c63e8   unknown     add    x8, sp, #0x318
```

instead of `disassemble`'s output style of

```
lldb`main:
lldb[0x1000086e4] <+0>:  stp    x28, x27, [sp, #-0x20]!
lldb[0x1000086e8] <+4>:  stp    x29, x30, [sp, #0x10]
lldb[0x1000086ec] <+8>:  add    x29, sp, #0x10
lldb[0x1000086f0] <+12>: sub    sp, sp, #0x610
lldb[0x1000086f4] <+16>: add    x8, sp, #0x318
```

Adding symbolic annotations for assembly instructions is something I'm
interested in too. We may have users investigating a crash or
apparently incorrect behavior who must debug optimized assembly and who
may not be familiar with the ISA they're using; short of flipping
through a many-thousand-page PDF to understand each instruction, they're
lost. They don't write assembly or work at that level, but to understand
a bug, they have to understand what the instructions are actually doing.

But the annotations that exist today don't move us forward much on that
front. I'd argue that the flow control instructions on Intel are not
hard to understand from their names, but that might just be my personal
bias. Much trickier instructions exist in any event.

Displaying this information by default on all targets, when we only
have it for one class of instructions on one target, is not a good default.

Also, in 2011 when Greg implemented the `memory read -f i` (aka `x/i`)
command
```
commit 5009f9d
Author: Greg Clayton <[email protected]>
Date:   Thu Oct 27 17:55:14 2011 +0000
[...]
    eFormatInstruction will print out disassembly with bytes and it will use the
    current target's architecture. The format character for this is "i" (which
    used to be being used for the integer format, but the integer format also has
    "d", so we gave the "i" format to disassembly), the long format is
    "instruction".
```

he had DumpDataExtractor's DumpInstructions print the bytes of the
instruction -- that's the first field we see above for the `x/5i` after
the address -- and this is only useful for people who are debugging the
disassembler itself, I would argue. I don't want this displayed by
default either.

tl;dr: this patch removes both fields from `memory read -f i`, and I
think this is the right call today. While I'm really interested in
instruction annotation, I don't think `x/i` is the right place to have
it enabled by default unless it's really compelling on at least some of
our major targets.
This test case was added to show the miscompile in
llvm#81872.
This pull request fixes llvm#72116 by introducing a new flag for
compatibility with GCC 14: some of the behaviors of -Wreturn-type are
split out into the new -Wreturn-mismatch.

Fixes llvm#72116
G_INSERT and G_EXTRACT are not sufficient to represent both
INSERT/EXTRACT on a subregister and INSERT/EXTRACT on a vector.

We would like to be able to INSERT/EXTRACT on vectors in cases where
INSERT/EXTRACT on vector subregisters is not sufficient, so we add
these opcodes.

I tried to do a patch where we treated G_EXTRACT as both
G_EXTRACT_SUBVECTOR and G_EXTRACT_SUBREG, but ran into an infinite loop
at this
[point](https://github.com/llvm/llvm-project/blob/8b5b294ec2cf876bc5eb5bd5fcb56ef487e36d60/llvm/lib/Target/RISCV/RISCVISelLowering.cpp#L9932)
in the SDAG equivalent code.
)

NFC.
Test coverage on VOPC shows that NotHasTrue16BitInsts is necessary on the
pre-gfx11 instructions (we cannot use the default NoTrue16Predicate).
Update the VOP2 instructions in the same manner.
Some comparison intrinsics were described as returning the "result" without
specifying how. The "cmp" intrinsics return all 0s or all 1s in each
corresponding element of the returned vector; the "com" intrinsics return
an integer 0 or 1.

Also removed some redundant information.
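
The difference can be seen with the x86 SSE intrinsics `_mm_cmpeq_ps` (a "cmp" intrinsic) and `_mm_comieq_ss` (a "com" intrinsic). This sketch assumes an x86-64 target, where SSE is always available; the wrapper function names are hypothetical.

```cpp
#include <immintrin.h>
#include <cstdint>
#include <cstring>

// "cmp": a per-element mask. Every bit of an element is 1 where the
// comparison holds and 0 where it does not.
void cmpeqLanes(const float a[4], const float b[4], std::uint32_t out[4]) {
  __m128 mask = _mm_cmpeq_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));
  std::memcpy(out, &mask, sizeof(mask));  // view the four 32-bit lanes as integers
}

// "com": compares only the lowest elements and returns a plain int, 0 or 1.
int comieqLowest(const float a[4], const float b[4]) {
  return _mm_comieq_ss(_mm_loadu_ps(a), _mm_loadu_ps(b));
}
```

With `a = {1, 2, 3, 4}` and `b = {1, 0, 3, 0}`, `cmpeqLanes` yields `{0xFFFFFFFF, 0, 0xFFFFFFFF, 0}`, while `comieqLowest` yields 1 because only the first elements are compared.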
…84717)

This is for cases where we simply want to rename from ps.Mnemonic, but
ps.Mnemonic itself is not supported as an alias.
…llvm#84382)

Account for the descriptor containing a zero-length string. Also, avoid
iterating backwards too far.

This was detected by address sanitizer.
…lvm#84813)

Reverts llvm#82899

Per the discussion on the PR, this needs more design and justification.
When pretty printing an Objective-C interface declaration, Clang
previously didn't print any attributes that are applied to the
declaration.
llvm#84583)

symbol

This patch puts the default breakpoint on the
sanitizers_address_on_report symbol and falls back to the old symbol
if the default one is not found.

rdar://123911522
With llvm#83471 it reduces UBSAN overhead from 44% to 6%, measured as
the "Geomean difference" on "test-suite/MultiSource/Benchmarks" with a
PGO build.

On a large real-world server binary, 95% of the code is still
instrumented, and UBSAN overhead improves from 10% to 1.5%. That binary
only passes its tests with a subset of UBSAN, so its base overhead is smaller.

We have followup patches to improve it even further.
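
For reference, a geometric-mean difference is commonly computed as the geometric mean of the per-benchmark ratios (instrumented time over baseline time) minus one. The sketch below assumes that definition; it is not taken from the test-suite's comparison tooling, and the function name is hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Geometric mean of instrumented/baseline ratios, minus 1.
// A result of 0.06 would be reported as a 6% overhead.
double geomeanDifference(const std::vector<double> &baseline,
                         const std::vector<double> &instrumented) {
  if (baseline.empty() || baseline.size() != instrumented.size())
    return 0.0;  // nothing meaningful to compare
  double logSum = 0.0;
  for (std::size_t i = 0; i < baseline.size(); ++i)
    logSum += std::log(instrumented[i] / baseline[i]);
  return std::exp(logSum / static_cast<double>(baseline.size())) - 1.0;
}
```

Averaging in log space means a 2x slowdown on one benchmark and a 2x speedup on another cancel out, which an arithmetic mean of percentages would not do.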
Summary:
The current behavior of HIP is that when --offload-device-only is set, it
still bundles the outputs into a fat binary. Even though this differs
from how all the other targets handle it, some tooling seems to depend
on it, so just keep it backwards compatible for the `-fno-gpu-rdc` case.
The foreign TU list immediately follows the local TU list, and both
use the same index space: if there are N local TU entries, the index
of the first foreign TU is N.

Changed so that the number of local TUs is accounted for when setting
the foreign TU index.
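
A minimal sketch of that indexing rule (the struct and method names are hypothetical, not LLVM's API): local TUs occupy indices 0 to N-1, and foreign TUs continue from N.

```cpp
#include <cstdint>

// Local and foreign type units share one index space, local ones first.
struct TypeUnitIndexer {
  std::uint32_t numLocalTUs;

  // The i-th local TU keeps its position.
  std::uint32_t localIndex(std::uint32_t i) const { return i; }

  // The i-th foreign TU is offset by the number of local TUs; leaving
  // out this offset is the mistake the change above corrects.
  std::uint32_t foreignIndex(std::uint32_t i) const { return numLocalTUs + i; }
};
```

With three local TUs, the first foreign TU gets index 3 rather than 0, so it no longer collides with the first local TU.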
lifengxiang1025 and others added 25 commits March 12, 2024 11:00
This header guard is wrong and conflicts with the one from Transport.h
…m#78876)

Previously, `__bounded_iter` only checked `operator*`. It allowed the
pointer to go out of bounds with `operator++`, etc., and relied on
`operator*` (which checked `begin <= current < end`) to handle
everything. This has several unfortunate consequences:

First, pointer arithmetic is UB if it goes out of bounds. So by the time
`operator*` checks, it may be too late and the optimizer may have done
something bad. Checking both operations is safer.

Second, `std::copy` and friends currently bypass bounded iterator
checks. I think the only hope we have to fix this is to key on `iter +
n` doing a check. See llvm#78771 for further discussion. Note this PR is not
sufficient to fix this. It adds the output bounds check, but ends up
doing it after the `memmove`, which is too late.

Finally, doing these checks is actually *more* optimizable. See llvm#78829,
which is fixed by this PR. Keeping the iterator always in bounds means
`operator*` can rely on some invariants and only needs to check `current
!= end`. This aligns better with common iterator patterns, which use
`!=` instead of `<`, so it's easier to delete checks with local
reasoning.
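
The invariant described above can be sketched as a simplified bounded iterator. This is not libc++'s `__bounded_iter`: the class name is hypothetical, and where libc++ uses its hardening assertions, this sketch throws `std::out_of_range` so violations are easy to observe. Arithmetic is checked before the pointer moves, so the pointer never leaves `[begin, end]`, and dereference only has to check `current != end`.

```cpp
#include <cstddef>
#include <stdexcept>

template <typename T>
class BoundedIter {
public:
  BoundedIter(T *current, T *begin, T *end)
      : current_(current), begin_(begin), end_(end) {
    check(begin_ <= current_ && current_ <= end_, "constructed out of bounds");
  }

  // Because current_ is always within [begin, end], only the end needs checking.
  T &operator*() const {
    check(current_ != end_, "dereferencing end");
    return *current_;
  }

  BoundedIter &operator++() {
    check(current_ != end_, "incrementing past end");
    ++current_;
    return *this;
  }

  // Compare distances instead of forming an out-of-bounds pointer first.
  BoundedIter &operator+=(std::ptrdiff_t n) {
    check(n >= begin_ - current_ && n <= end_ - current_, "advancing out of bounds");
    current_ += n;
    return *this;
  }

  friend bool operator!=(const BoundedIter &a, const BoundedIter &b) {
    return a.current_ != b.current_;
  }

private:
  static void check(bool ok, const char *what) {
    if (!ok)
      throw std::out_of_range(what);
  }

  T *current_;
  T *begin_;
  T *end_;
};
```

A ranged loop using `!=` and `++` only ever hits the `current != end` checks, which are the ones an optimizer can most easily prove redundant.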

See https://godbolt.org/z/vEWrWEf8h for how this new `__bounded_iter`
impacts compiler output. The old `__bounded_iter` injected checks inside
the loops for all the `sum()` functions, which not only added a check
inside a loop, but also impeded Clang's vectorization. The new
`__bounded_iter` allows all the checks to be optimized out and we emit
the same code as if it wasn't here.

Not everything is ideal, however. `add_and_deref` ends up emitting two
comparisons now instead of one. This is because of a missed optimization in
Clang. I've filed llvm#78875 for that. I suspect (with no data) that this PR
is still a net performance win because impeding ranged-for loops is
particularly egregious. But ideally we'd fix the optimizer and make
`add_and_deref` fine too.

There's also something funny going on with `std::ranges::find` which I
have not figured out yet, but I suspect there are some further
missed optimization opportunities.

Fixes llvm#78829.

(CC @danakj)
Factor complex unary operations into their own function.
These tests were accidentally missed in llvm#83863
DIV32/64 throughput has improved in the Atom line since Goldmont, and
Alder Lake-E shows similar numbers. So we shouldn't add such tunings
to Gracemont and later products.

Checked against Agner Fog's tables and uops.info.
…ing G_BUILD_VECTOR (llvm#84452)

It is safe to ignore undef values when selecting G_BUILD_VECTOR, as an
undef element may take its value from any register.
This only happens in C, as far as I can tell. In C++, the complex variable
will have undergone a conversion to bool before reaching
the unary operator.
For non-polymorphic entities, semantics knows the type size and rewrites
sizeof to `"cst element size" * size(x)`.

Lowering has to deal with the polymorphic case, where the type size must
be retrieved from the descriptor (note that the lowering implementation
would work with any entity, polymorphic or not; it is just not used for
the non-polymorphic cases).
atom.add.noftz.f16 is supported since SM 7.0
This patch adds the thread ID to the shared memory names used for
subprocess memory. This avoids conflicts for downstream consumers that
might want to use llvm-exegesis from multiple threads, which would
otherwise collide because multiple instances run under the same PID.
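
The commit does not show the actual name format. As a rough sketch, assuming a POSIX system and a hypothetical `/<prefix>-<pid>-<tid>-<index>` scheme (neither the format nor the function names come from llvm-exegesis), a per-thread shared memory name could be built like this:

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <unistd.h>

// Hypothetical naming scheme: "/<prefix>-<pid>-<tid>-<index>".
std::string sharedMemoryName(const std::string &prefix, long pid,
                             std::size_t tid, int index) {
  return "/" + prefix + "-" + std::to_string(pid) + "-" +
         std::to_string(tid) + "-" + std::to_string(index);
}

// Including a thread identifier keeps names distinct for two threads in the
// same process, which share a PID.
std::string currentThreadSharedMemoryName(const std::string &prefix, int index) {
  std::size_t tid = std::hash<std::thread::id>{}(std::this_thread::get_id());
  return sharedMemoryName(prefix, static_cast<long>(getpid()), tid, index);
}
```

With the PID alone, two threads in one process would compute the same name and open the same shared memory object.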
…vm#84451)"

This reverts commit 6bbe8a2.

This breaks building LLVM on macOS, failing with

    llvm/tools/llvm-exegesis/lib/SubprocessMemory.cpp:146:33: error: out-of-line definition of 'setupAuxiliaryMemoryInSubprocess' does not match any declaration in 'llvm::exegesis::SubprocessMemory'
    Expected<int> SubprocessMemory::setupAuxiliaryMemoryInSubprocess(
…her (llvm#84339)

At the moment, getUnderlyingObjects simply continues for phis that do
not refer to the same underlying object in loops, without adding them to
the list of underlying objects, effectively ignoring those phis.

Instead of ignoring those phis, add them to the list of underlying
objects. This fixes a miscompile where LoopAccessAnalysis fails to
identify a memory dependence, because no underlying objects can be found
for a set of memory accesses.

Fixes llvm#82665.

PR: llvm#84339
A mold argument needs to be added to the hlfir.element_addr and set in
lowering so that, when the hlfir.element_addr needs to be turned into an
hlfir.elemental operation because the designator must be turned into a
value, the mold can be set on the hlfir.elemental to later allocate the
temporary according to the dynamic type.

This situation happens whenever the vector-subscripted polymorphic
designator does not appear as an assignment left-hand side or as an
IO input item.

I initially thought retrieving the mold would be tricky if the dynamic
type of the designator was set by a part-ref to the right of the vector
subscripts ("array(vector)%polymorphic_comp"), but this turned out to be
impossible because:
1. A derived type component can be polymorphic only if it has the
POINTER or ALLOCATABLE attribute (F2023 C708).
2. Vector-subscripted parts are ranked, and F2023 C919 prohibits any
part-ref to the right of the rank part from having the POINTER or
ALLOCATABLE attribute.

=> If a vector-subscripted designator is polymorphic, the vector-subscripted
part is the rightmost part, and the mold is the base of the
vector-subscripted part. This makes retrieving the mold easy in
lowering. The mold argument is always set to the base of the
vector-subscripted part when lowering that part, and it is
removed at the end of the designator lowering if the designator is not
polymorphic. This way, there is no need to recover the mold from
inside the hlfir.element_addr body.
COMPILER_RT_HAS_AUXV is now used in builtins, so the test needs to be in
builtin-config-ix.cmake too.
This is a one-line fix for a (I believe) Windows-specific build break.

The build failure looks like this:

    D:\a\_work\1\s\lldb\source\Symbol\Symtab.cpp(128): error C2440: '<function-style-cast>': cannot convert from 'lldb_private::ConstString' to 'llvm::StringRef'
    D:\a\_work\1\s\lldb\source\Symbol\Symtab.cpp(128): note: 'llvm::StringRef::StringRef': ambiguous call to overloaded function
    D:\a\_work\1\s\llvm\include\llvm/ADT/StringRef.h(840): note: could be 'llvm::StringRef::StringRef(llvm::StringRef &&)'
    D:\a\_work\1\s\llvm\include\llvm/ADT/StringRef.h(104): note: or 'llvm::StringRef::StringRef(std::string_view)'
    D:\a\_work\1\s\lldb\source\Symbol\Symtab.cpp(128): note: while trying to match the argument list '(lldb_private::ConstString)'
    D:\a\_work\1\s\lldb\source\Symbol\Symtab.cpp(128): error C2672: 'std::multimap<llvm::StringRef,const lldb_private::Symbol *,std::less<llvm::StringRef>,std::allocator<std::pair<const llvm::StringRef,const lldb_private::Symbol *>>>::emplace': no matching overloaded function found
    C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.37.32822\include\map(557): note: could be 'std::_Tree_iterator<std::_Tree_val<std::_Tree_simple_types<std::pair<const llvm::StringRef,const lldb_private::Symbol *>>>> std::multimap<llvm::StringRef,const lldb_private::Symbol *,std::less<llvm::StringRef>,std::allocator<std::pair<const llvm::StringRef,const lldb_private::Symbol *>>>::emplace(_Valty &&...)'

The StringRef constructor here is intended to take a ConstString object,
which I assume is implicitly converted to a std::string_view by
compilers other than Visual Studio's. To fix the VS build I made the
StringRef initialization more explicit, as you can see in the diff.
Use the new range attribute from llvm#84617
to simplify comparisons where both sides have range information.
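
A simplified model of the idea, assuming closed unsigned intervals (LLVM's own ConstantRange is half-open and handles wrapping, which this sketch ignores); the names are hypothetical:

```cpp
#include <cstdint>
#include <optional>

// Hypothetical closed interval [lo, hi] of unsigned values, as a range
// attribute might describe.
struct URange {
  std::uint64_t lo;
  std::uint64_t hi;
};

// Try to fold "a <u b" using only the ranges of both operands.
std::optional<bool> foldULT(URange a, URange b) {
  if (a.hi < b.lo)
    return true;         // every value of a is below every value of b
  if (a.lo >= b.hi)
    return false;        // no value of a is below any value of b
  return std::nullopt;   // ranges overlap: the comparison must stay
}
```

For instance, an operand known to be in `[0, 5]` is always less than one in `[10, 20]`, so the comparison folds to `true` without knowing either value.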
The TranspBlocks set was used to cache aliasing decisions for all
processed loads in the parent loop. This is incorrect, because each load
can access a different location, which means one load not being modified
in a block doesn't imply that another load is not modified in the
same block.

All loads access the same underlying object, so we could perhaps use a
location without size for all loads and retain the cache, but that would
mean we lose precision.

For now, just drop the cache.

Fixes llvm#84807

PR: llvm#84835
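
A toy model of the bug (all names are hypothetical, and a "block" is reduced to the set of locations it writes): a cache keyed only by block answers a later query about a different location with the result computed for the first one. The uncached function mirrors the patch's approach of dropping the cache.

```cpp
#include <map>
#include <set>

// Toy model: a block is just the set of memory locations it may write.
using Block = std::set<int>;

// Buggy: remembers "transparent" per block, ignoring which location was asked about.
class PerBlockCache {
public:
  bool isTransparent(const Block &block, int location) {
    auto it = cache_.find(&block);
    if (it != cache_.end())
      return it->second;  // may be the answer for a different location
    bool result = block.count(location) == 0;
    cache_[&block] = result;
    return result;
  }

private:
  std::map<const Block *, bool> cache_;
};

// Without a cache, every query is answered for its own location.
bool isTransparentUncached(const Block &block, int location) {
  return block.count(location) == 0;
}
```

After asking about location 2 in a block that only writes location 1, the cached version wrongly reports the block as transparent for location 1 as well.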
Add the ability to set the number of tablegen jobs that can run in
parallel, similar to the LLVM_PARALLEL_[COMPILE|LINK]_JOBS options that
already exist.
…#84739)

Have DIBuilder conditionally insert either debug intrinsics or DbgRecord
depending on the module's IsNewDbgInfoFormat flag. The insertion methods
now return a `DbgInstPtr` (a `PointerUnion<Instruction *, DbgRecord
*>`).

Add a unittest for both modes (I couldn't find an existing test testing
insertion behaviours specifically).

This patch changes the existing assumption that DbgRecords are only ever
inserted if there's an instruction to insert before. Clang currently
inserts debug intrinsics while CodeGen-ing (like any other instruction),
which means it will try inserting at the end of a block without a
terminator. We already have machinery in place to maintain the
DbgRecords when a terminator is removed: these become "trailing
DbgRecords", which are re-attached when a new instruction is inserted.
All I've done is allow this state to occur while inserting DbgRecords
too; i.e., it's not only removing terminators that causes this valid
transient state, but also inserting DbgRecords into incomplete blocks.
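
A heavily simplified sketch of the insertion scheme described above. The types are hypothetical stand-ins, `std::variant` plays the role of `DbgInstPtr` (`PointerUnion<Instruction *, DbgRecord *>`), and the block is assumed to have no terminator yet, so new-format records end up as trailing records.

```cpp
#include <memory>
#include <variant>
#include <vector>

// Hypothetical stand-ins for llvm::Instruction and llvm::DbgRecord.
struct Instruction { int id; };
struct DbgRecord { int id; };

// Stand-in for DbgInstPtr: either an instruction or a record.
using DbgInstPtr = std::variant<Instruction *, DbgRecord *>;

// A block that does not have a terminator yet.
struct Block {
  bool isNewDbgInfoFormat = false;
  std::vector<std::unique_ptr<Instruction>> instructions;
  // With nothing to insert before, records are kept as "trailing" records.
  std::vector<std::unique_ptr<DbgRecord>> trailingRecords;
};

// Insert debug info in whichever form the block's format flag selects.
DbgInstPtr insertDebugInfo(Block &block, int id) {
  if (block.isNewDbgInfoFormat) {
    block.trailingRecords.push_back(std::make_unique<DbgRecord>(DbgRecord{id}));
    return block.trailingRecords.back().get();
  }
  block.instructions.push_back(std::make_unique<Instruction>(Instruction{id}));
  return block.instructions.back().get();
}
```

Callers inspect the returned union to learn which form was created, which is why the insertion methods can no longer simply return an `Instruction *`.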

The C API will be updated in follow up patches.

---

Note: this doesn't mean clang is emitting DbgRecords yet, because the
modules it creates are still always in the old debug mode. That will
come in a future patch.
Base automatically changed from bump_to_818af71b to feature/fused-ops August 15, 2024 07:01
mgehre-amd merged commit 6b89ba9 into feature/fused-ops on Aug 15, 2024
10 checks passed
mgehre-amd deleted the bump_to_9997e039 branch on August 15, 2024 07:01