Merge upstream changes #186

Merged
merged 5,304 commits into from
Jul 6, 2022

Conversation

Ralender
Contributor

@Ralender Ralender commented Jun 2, 2022

#185 is integrated into this PR because it is needed to run the tests

brads55 and others added 30 commits May 4, 2022 12:57
…jects are present

Without SVE, after a dynamic stack allocation has modified the SP, it is
presumed that a frame pointer restoration will revert the SP back to
its correct value prior to any caller stack being restored. However the
SVE frame is restored using the stack pointer directly, as it is located
after the frame pointer. This means that in the presence of a dynamic
stack allocation, any SVE callee state gets corrupted as SP has the
incorrect value when the SVE state is restored.

To address this issue, when variable sized objects and SVE CSRs are
present, treat the stack as having been realigned, hence restoring the
stack pointer from the frame pointer prior to restoring the SVE state.

Differential Revision: https://reviews.llvm.org/D124615
This is a speculative fix for a build bot which does not put the LLVM
revision information into the PCH hash.

http://45.33.8.238/linux/75290/step_7.txt
…#55224

Fix uninitialized variables introduced by D116325.

Differential Revision: https://reviews.llvm.org/D124916
Demanded bits analysis may replace a full-width not with an
any_extend (not (truncate X)) pattern. This patch looks through
this kind of pattern in haveNoCommonBitsSet(). Of course, we can
only do this if we only need negated bits in the non-extended part,
as the other bits may now be arbitrary. For example, if we have
haveNoCommonBitsSet(~X & Y, X) then ~X only needs to actually
negate bits set in Y.

This is only a partial solution to the problem in that it allows
add -> or conversion, but the resulting or doesn't get folded yet.
(I guess that will involve exposing getBitwiseNotOperand() as a
more general helper and using that in the relevant transform.)

Differential Revision: https://reviews.llvm.org/D124856
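
The reasoning in the commit message above can be checked on plain integers. The following is a minimal sketch, not LLVM code: it uses `uint32_t` instead of LLVM values, all helper names are hypothetical, and the narrow width is assumed to be 8 bits purely for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true when A and B have no set bit in common.
bool noCommonBits(std::uint32_t A, std::uint32_t B) { return (A & B) == 0; }

// The first operand from the commit message: ~X & Y.
std::uint32_t maskedNot(std::uint32_t X, std::uint32_t Y) { return ~X & Y; }

// Models any_extend(not(truncate X)) for an assumed 8-bit narrow type:
// the low 8 bits are the negated bits of X, the upper 24 bits are
// arbitrary and taken from HighBits.
std::uint32_t anyExtNotTrunc(std::uint32_t X, std::uint32_t HighBits) {
  return (HighBits & 0xFFFFFF00u) | static_cast<std::uint8_t>(~X);
}

// Rewrites A + B as A | B. Only valid when the operands share no set
// bits, because then the addition cannot produce a carry.
std::uint32_t addAsOr(std::uint32_t A, std::uint32_t B) {
  assert(noCommonBits(A, B) && "add -> or requires disjoint operands");
  return A | B;
}
```

As long as Y only has bits set in the low 8 bits, `anyExtNotTrunc(X, H) & Y` equals `maskedNot(X, Y)` for any `H`, which is why the arbitrary upper bits do not matter in that case.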
This test shows incorrect cross-bb insertion. We'd expect to see
a SEW=8 vsetvli, something like:

        vsetvli zero, zero, e8, mf8, ta, mu
        vluxei64.v      v1, (a2), v8, v0.t

But the vsetvli is omitted and an inherited SEW=64 vsetvli is used
instead:
        vmv1r.v v9, v1
        vsetvli a3, zero, e64, m1, ta, mu
        vmseq.vi        v9, v1, 0
        vmv1r.v v8, v0
        vmandn.mm       v0, v9, v2
        beqz    a0, .LBB0_2
    # %bb.1:
        vluxei64.v      v1, (a2), v8, v0.t
        vmv1r.v v3, v1

The "mask reg op" vmandn.mm in bb.1 appears to be confusing the insertion
process, as it is able to elide its own vsetvli as its VLMAX (SEW=8,
LMUL=MF8) is identical to the previous one (SEW=64, LMUL=1).

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D124089
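
The VLMAX equality mentioned in the commit message above follows from the RISC-V vector definition VLMAX = LMUL * VLEN / SEW. A small sketch of the arithmetic; the function name and the VLEN value of 128 bits used in the usage note are assumptions for illustration, not taken from the patch.

```cpp
#include <cassert>

// VLMAX = (VLEN / SEW) * LMUL, with LMUL given as the fraction
// LMulNum / LMulDen so that fractional settings such as mf8 (1/8)
// can be expressed with integers.
unsigned vlmax(unsigned VLenBits, unsigned SEWBits, unsigned LMulNum,
               unsigned LMulDen) {
  assert(SEWBits != 0 && LMulDen != 0 && "SEW and LMUL denominator must be non-zero");
  return (VLenBits * LMulNum) / (SEWBits * LMulDen);
}
```

With an assumed VLEN of 128, `vlmax(128, 8, 1, 8)` (e8, mf8) and `vlmax(128, 64, 1, 1)` (e64, m1) both evaluate to 2. The two settings share the same SEW/LMUL ratio, so their VLMAX is equal for any VLEN even though the element width differs.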
We cache device library programs (like fallback assertion) in the
context. In multi-threaded applications simultaneous access to the
cache of device lib programs is possible in program_manager::build().
That's why access to this cache needs to be guarded to avoid a data
race.
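
The kind of guarding described above can be sketched as follows. This is a generic illustration, not the actual `program_manager` code: `DeviceLibProgram`, `DeviceLibCache` and `getOrBuild` are hypothetical names, and the "build" step is reduced to constructing a struct.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>

// Hypothetical stand-in for a built device library program.
struct DeviceLibProgram {
  std::string Name;
};

class DeviceLibCache {
public:
  // Returns the cached program for Name, building it at most once.
  // The lock covers both the lookup and the insertion, so two threads
  // cannot both miss and then both insert.
  const DeviceLibProgram &getOrBuild(const std::string &Name) {
    assert(!Name.empty() && "library name must not be empty");
    std::lock_guard<std::mutex> Lock(Mutex);
    auto It = Cache.find(Name);
    if (It == Cache.end()) {
      ++Builds;
      It = Cache.emplace(Name, DeviceLibProgram{Name}).first;
    }
    // std::map keeps references valid across later insertions.
    return It->second;
  }

  // Number of times a program was actually built.
  int builds() const {
    std::lock_guard<std::mutex> Lock(Mutex);
    return Builds;
  }

private:
  mutable std::mutex Mutex;
  std::map<std::string, DeviceLibProgram> Cache;
  int Builds = 0;
};
```

Calling `getOrBuild` twice with the same name returns the same object and increments `builds()` only once.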
`IgnoreParenImpCasts` will remove implicit casts to bool
(e.g. `PointerToBoolean`), such that the resulting expression may not
be of the `bool` type. The `cast_or_null<BoolValue>` in
`extendFlowCondition` will then trigger an assert, as the pointer
expression will not have a `BoolValue`.

Instead, we only skip `ExprWithCleanups` and `ParenExpr` nodes, as the
CFG does not emit them.

Differential Revision: https://reviews.llvm.org/D124807
https://alive2.llvm.org/ce/z/sD-JVv

This extends 432c199 with a 3 arg intrinsic to demonstrate
that the code works with the extra operand.

Eventually, we will want to use llvm::isTriviallyVectorizable()
or create some new API for this list, but for now, I am intentionally
making a minimum change to reduce risk and only affect an intrinsic
with regression tests in place.
- Exit early when constraint caching is disabled.
- Use unique_ptr to manage temporary lifetime.
- Fix a typo in a comment (InsertPos instead of InsertNode).

The new code duplicates the forwarding call to CheckConstraintSatisfaction,
but reduces the number of interconnected if statements and simplifies lifetime
management.

This increases the overall readability.

Reviewed By: sammccall

Differential Revision: https://reviews.llvm.org/D124923
The fold added with 9c4770e neglected to propagate FMF.
This was missed when extending the fold to allow fma with
9c4770e
Test for printing plan with additions from D123537.
We were failing to check if the controlling expression is dependent or
not when testing whether it has side effects. This would trigger an
assertion. Instead, if the controlling expression is dependent, we
suppress the check and diagnostic.

This fixes Issue 50227.
tapi & clang-extractapi both attempt to construct then check against
how a header was included to determine api information when working
against multiple search paths, headermap, and vfsoverlay mechanisms.
Validating this against what the preprocessor sees during lookup time
makes this check more reliable.

Reviewed By: zixuw, jansvoboda11

Differential Revision: https://reviews.llvm.org/D124638
Support int8, int16, int32 and int64. Also fix source code format in mlir_pytaco_utils.py.

Add tests.

Reviewed By: aartbik

Differential Revision: https://reviews.llvm.org/D124925
CUDA/HIP needs to mangle for the aux target. When mangling for the aux target,
the mangler should use the mangling number for the aux target. Previously,
in https://reviews.llvm.org/D122734, a state was introduced in
ASTContext to let the mangler get the mangling number for the aux target
from ASTContext. This patch removes that state from ASTContext
and adds an IsAux member to MangleContext to indicate that
the mangle context is for the aux target. This reflects the reality that
the mangle context is created for mangling the aux target and makes
ASTContext cleaner.

Reviewed by: Artem Belevich, Reid Kleckner

Differential Revision: https://reviews.llvm.org/D124842
…ands

This extends 432c199 and 9c4770e with an intrinsic
cited directly in issue #46238

Eventually, we will want to use llvm::isTriviallyVectorizable()
or create some new API for this list, but for now, I am intentionally
making a minimum change to reduce risk and only affect an intrinsic
with regression tests in place.
This check is in the related fold for binops,
but it was missed when the code was adapted
for intrinsics in 432c199. The new test
would crash when trying to create a new
intrinsic with mismatched types.
When constant evaluating the initializer for an object of vector type,
we would call APInt::trunc() but truncate to the same bit-width the
object already had, which would cause an assertion. Instead, use
APInt::truncOrSelf() so that we no longer assert in this situation.

Fix #50216
This patch transforms the given input headers to relative include names
using header search entries and some heuristics.
For example: `/Path/To/Header.h` will be included as `<Header.h>` with a
search path of `-I /Path/To/`; and
`/Path/To/Framework.framework/Headers/Header.h` will be included as
`<Framework/Header.h>`, given a search path of `-F /Path/To`.
Headermaps will also be queried in reverse to find a spelled name to
include headers.

Differential Revision: https://reviews.llvm.org/D123831
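
The two path examples in the commit message above can be reproduced with simple string handling. This is a rough sketch under stated assumptions: the function names are hypothetical, paths use `/` separators, an empty string means "no match", and headermap lookup is not modelled. It is not the logic used by clang-extractapi.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns Path with "Dir/" removed from the front,
// or an empty string when Path is not located under Dir.
std::string stripDirPrefix(const std::string &Path, std::string Dir) {
  assert(!Path.empty() && "expected a header path");
  if (Dir.empty())
    return "";
  if (Dir.back() != '/')
    Dir += '/';
  if (Path.compare(0, Dir.size(), Dir) != 0)
    return "";
  return Path.substr(Dir.size());
}

// -I style search path: /Path/To/Header.h with /Path/To/ -> <Header.h>
std::string includeNameForHeaderDir(const std::string &Path,
                                    const std::string &SearchDir) {
  std::string Rel = stripDirPrefix(Path, SearchDir);
  return Rel.empty() ? "" : "<" + Rel + ">";
}

// -F style search path:
// <Dir>/<Name>.framework/Headers/<Rest> -> <Name/Rest>
std::string includeNameForFrameworkDir(const std::string &Path,
                                       const std::string &FrameworkDir) {
  std::string Rel = stripDirPrefix(Path, FrameworkDir);
  const std::string Marker = ".framework/Headers/";
  std::string::size_type Pos = Rel.find(Marker);
  if (Pos == std::string::npos || Pos == 0)
    return "";
  std::string Name = Rel.substr(0, Pos);
  std::string Rest = Rel.substr(Pos + Marker.size());
  // Nested directories before the framework are not handled in this sketch.
  if (Name.find('/') != std::string::npos || Rest.empty())
    return "";
  return "<" + Name + "/" + Rest + ">";
}
```

For the inputs from the commit message, `includeNameForHeaderDir("/Path/To/Header.h", "/Path/To/")` yields `<Header.h>` and `includeNameForFrameworkDir("/Path/To/Framework.framework/Headers/Header.h", "/Path/To")` yields `<Framework/Header.h>`.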
4c262fe accidentally added a local,
unfinished test case clang/test/Index/annotate-comments-enum-constant.c.
This patch removes it.
If LLDB index cache is enabled and everything is cached, then loading of debug
info is essentially single-threaded, because it's done from PreloadSymbols()
called from GetOrCreateModule(), which is called from a loop calling
LoadModuleAtAddress() in DynamicLoaderPOSIXDYLD. Parallelizing the entire
loop could be unsafe because of GetOrCreateModule() operating on a module
list, so instead move only the PreloadSymbols() call to Target::ModulesDidLoad()
and parallelize there, which should be safe.

This may greatly reduce the load time if the debugged program uses a large
number of binaries (as opposed to monolithic programs where this presumably
doesn't make a difference). In my specific case of LibreOffice Calc this reduces
startup time from 6s to 2s.

Differential Revision: https://reviews.llvm.org/D122975
Reorganize the test and simplify the #ifdefs. Fix a typo in __powerpc64__
as a fly-by, and also add a test for the unstable ABI.

Differential Revision: https://reviews.llvm.org/D124403
jinge90 and others added 12 commits May 26, 2022 09:41
When this option is used, SYCL device libraries are linked online.
Otherwise they are linked offline: all wrapper and fallback device libraries
are linked with the user's device image at compile time.

Signed-off-by: jinge90 [email protected]
We need to destroy resources when the Ctx object is destructed. Otherwise the
destruction of internal RT objects would be delayed until global context
destruction that happens *after* ~PiMock.
Previously the max local mem allocation in DPC++ for CUDA backend was 48KB for most devices.

However as https://docs.nvidia.com/cuda/ampere-tuning-guide/index.html notes, for the A100 the max local mem dynamic allocation is in fact 164KB.

This PR introduces an environment variable SYCL_PI_CUDA_MAX_LOCAL_MEM_SZ which allows you to manually specify the max local memory in bytes allowed to be allocated per kernel for a given application. If an invalid value is specified (one that exceeds the device's capabilities/is negative) then a runtime error will be thrown.

Using:
SYCL_PI_CUDA_MAX_LOCAL_MEM_SZ=166912 ./a.out

Allows the application to use up to 163KB of local memory, if the device supports it.
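
The validation described above (reject negative values and values above the device limit) might look roughly like the sketch below. Only the environment variable name comes from the PR text; the function names, the treatment of non-numeric input as an error, and the way the device limit is passed in are assumptions, not the CUDA plugin's actual code.

```cpp
#include <cassert>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical helper. Value is the raw environment string (nullptr when
// unset), DefaultBytes is used when the variable is unset, and
// DeviceMaxBytes is the limit reported by the device.
long long resolveMaxLocalMem(const char *Value, long long DefaultBytes,
                             long long DeviceMaxBytes) {
  assert(DefaultBytes <= DeviceMaxBytes && "default must itself be valid");
  if (Value == nullptr)
    return DefaultBytes;

  long long Requested = 0;
  std::size_t Parsed = 0;
  try {
    Requested = std::stoll(Value, &Parsed);
  } catch (const std::exception &) {
    // Assumption: non-numeric input is treated like any other invalid value.
    throw std::runtime_error("SYCL_PI_CUDA_MAX_LOCAL_MEM_SZ is not a number");
  }
  if (Parsed != std::string(Value).size())
    throw std::runtime_error("SYCL_PI_CUDA_MAX_LOCAL_MEM_SZ has trailing characters");
  if (Requested < 0 || Requested > DeviceMaxBytes)
    throw std::runtime_error("SYCL_PI_CUDA_MAX_LOCAL_MEM_SZ is out of range for this device");
  return Requested;
}

// Reads the variable from the process environment.
long long maxLocalMemFromEnv(long long DefaultBytes, long long DeviceMaxBytes) {
  return resolveMaxLocalMem(std::getenv("SYCL_PI_CUDA_MAX_LOCAL_MEM_SZ"),
                            DefaultBytes, DeviceMaxBytes);
}
```

Keeping the parsing separate from the `std::getenv` call makes the range check easy to exercise without touching the environment. Note that 166912 bytes is exactly 163 * 1024.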
A pull request to solve the issue intel/llvm#6123

Co-authored-by: Jin Z <[email protected]>
Co-authored-by: Nicolas Miller <[email protected]>
…annotations_member (#5884)

* [SYCL][Doc] Adjust design for compile-time properties through add_ir_annotations_member

During implementation of the attribute and translation of annotations on fields, the design was conflicting with existing features. This commit makes the following design changes:

 * Change the `llvm.ptr.annotation` intrinsic call produced by  `[[__sycl_detail__::add_ir_annotations_member()]]` to use a pointer to a constant global variable instead of metadata. This is done to adhere to the signature of the intrinsic.
 * Change the representation consumed by the translator from a new SPIR-V builtin to an extended version of existing decoration parsing using `llvm.ptr.annotation`.

Implementations of these are in review in intel/llvm#5879 and KhronosGroup/SPIRV-LLVM-Translator#1446

Signed-off-by: Larsen, Steffen <[email protected]>
intel/llvm#6180 moved the retain logic for
events to the make_event interop API. However, this means that the old
OpenCL interop constructor for event does not retain the native event.
These changes add the retain logic directly to the old OpenCL interop
constructor for OpenCL.

Signed-off-by: Larsen, Steffen <[email protected]>
* [SYCL][ESIMD][EMU] LSC support for ESIMD_EMULATOR backend

36 out of 40 LSC tests are passing
…ns (#6192)

This commit removes mutex locks from the PI release and retain functions. They
were added under the impression that two threads releasing resources
simultaneously could both reach ref count 0 and cause a double free.
But since our ref count is atomic and the decrement is an atomic operation,
only a single thread will execute the code under if (--(PiObj->RefCount) == 0).
Two threads cannot both reach ref count 0.

Behavior is undefined if the DPCPP runtime somehow uses a deleted pi object
(with ref count 0) or provides it as an argument to any of the pi functions.
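
The argument above can be illustrated with `std::atomic`. This is a minimal sketch with hypothetical names (`PiObject`, `retain`, `release`, `countZeroObservers`), not the plugin's actual types: each atomic decrement returns a distinct value, so exactly one caller observes zero.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for a reference-counted PI object.
struct PiObject {
  std::atomic<int> RefCount{1};
};

void retain(PiObject &Obj) { ++Obj.RefCount; }

// Returns true for exactly the one caller whose decrement reaches zero;
// that caller is the only one allowed to free the object.
bool release(PiObject &Obj) {
  int Remaining = --Obj.RefCount; // atomic read-modify-write
  assert(Remaining >= 0 && "released more often than retained");
  return Remaining == 0;
}

// Lets NumThreads threads release one reference each and counts how many
// of them observed the count reaching zero.
int countZeroObservers(int NumThreads) {
  PiObject Obj; // starts with one reference
  for (int I = 1; I < NumThreads; ++I)
    retain(Obj); // now holds NumThreads references
  std::atomic<int> SawZero{0};
  std::vector<std::thread> Threads;
  for (int I = 0; I < NumThreads; ++I)
    Threads.emplace_back([&Obj, &SawZero] {
      if (release(Obj))
        ++SawZero;
    });
  for (std::thread &T : Threads)
    T.join();
  return SawZero.load();
}
```

No mutex is needed for the count itself. What the atomic does not protect against is a caller that keeps using the object after the last release, which is the undefined behavior mentioned above.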
@Ralender
Contributor Author

Ralender commented Jul 5, 2022

could we get this merged?

Member

@keryell keryell left a comment

Great!

@keryell keryell merged commit bdfea91 into triSYCL:sycl/unified/master Jul 6, 2022
@Ralender
Contributor Author

Ralender commented Jul 7, 2022

I just realized it was pushed onto master. we need to update next
