Merge upstream changes #186

Ralender · 2022-06-02T00:07:17Z

#185 is integrated into this PR because it is needed to run the tests.

…jects are present Without SVE, after a dynamic stack allocation has modified the SP, it is presumed that a frame pointer restoration will revert the SP back to its correct value prior to any caller stack being restored. However, the SVE frame is restored using the stack pointer directly, as it is located after the frame pointer. This means that in the presence of a dynamic stack allocation, any SVE callee state gets corrupted as SP has the incorrect value when the SVE state is restored. To address this issue, when variable sized objects and SVE CSRs are present, treat the stack as having been realigned, hence restoring the stack pointer from the frame pointer prior to restoring the SVE state. Differential Revision: https://reviews.llvm.org/D124615

This is a speculative fix for a build bot which does not put the LLVM revision information into the PCH hash. http://45.33.8.238/linux/75290/step_7.txt

This should address failing test bots: https://lab.llvm.org/buildbot/#/builders/68/builds/31828

…#55224 Fix uninitialized variables introduced by D116325. Differential Revision: https://reviews.llvm.org/D124916

Demanded bits analysis may replace a full-width not with an any_extend (not (truncate X)) pattern. This patch looks through this kind of pattern in haveNoCommonBitsSet(). Of course, we can only do this if we only need negated bits in the non-extended part, as the other bits may now be arbitrary. For example, if we have haveNoCommonBitsSet(~X & Y, X) then ~X only needs to actually negate bits set in Y. This is only a partial solution to the problem in that it allows add -> or conversion, but the resulting or doesn't get folded yet. (I guess that will involve exposing getBitwiseNotOperand() as a more general helper and using that in the relevant transform.) Differential Revision: https://reviews.llvm.org/D124856
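The reasoning above can be illustrated with a small sketch. It is not the LLVM implementation: `noCommonBitsSet`, `anyExtNotTrunc8` and `addDisjoint` are hypothetical helpers, and the 8-bit truncation width is an assumption chosen only for the example. It shows that when Y only has bits in the non-extended part, the arbitrary upper bits do not matter, and that disjoint operands allow an add to be written as an or.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true when no bit position is set in both values.
inline bool noCommonBitsSet(std::uint32_t a, std::uint32_t b) {
  return (a & b) == 0;
}

// Models any_extend(not(truncate X)) for an assumed 8-bit truncation: the low
// 8 bits are the negation of X, the upper 24 bits are arbitrary (from `junk`).
inline std::uint32_t anyExtNotTrunc8(std::uint32_t x, std::uint32_t junk) {
  return (~x & 0xFFu) | (junk & ~0xFFu);
}

// With disjoint operands no bit position produces a carry, so add equals or.
inline std::uint32_t addDisjoint(std::uint32_t a, std::uint32_t b) {
  assert(noCommonBitsSet(a, b) && "operands must not share set bits");
  return a | b;
}
```

As long as `y` fits in the low 8 bits, `anyExtNotTrunc8(x, junk) & y` shares no bits with `x`, whatever `junk` is.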

This test shows incorrect cross-bb insertion. We'd expect to see a SEW=8 vsetvli, something like:

    vsetvli zero, zero, e8, mf8, ta, mu
    vluxei64.v v1, (a2), v8, v0.t

But instead the vsetvli is omitted and instead an inherited SEW=64 vsetvli is used:

    vmv1r.v v9, v1
    vsetvli a3, zero, e64, m1, ta, mu
    vmseq.vi v9, v1, 0
    vmv1r.v v8, v0
    vmandn.mm v0, v9, v2
    beqz a0, .LBB0_2
    # %bb.1:
    vluxei64.v v1, (a2), v8, v0.t
    vmv1r.v v3, v1

The "mask reg op" vmandn.mm in bb.1 appears to be confusing the insertion process, as it is able to elide its own vsetvli as its VLMAX (SEW=8, LMUL=MF8) is identical to the previous one (SEW=64, LMUL=1).

Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D124089
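The VLMAX equality mentioned above follows from VLMAX = VLEN * LMUL / SEW. Below is a minimal sketch of that arithmetic, not code from the RISC-V backend: the `Lmul` struct and both helper names are hypothetical, and any concrete VLEN passed in is an assumption.

```cpp
#include <cassert>

// LMUL kept as a fraction so fractional settings such as mf8 (1/8) stay exact.
struct Lmul {
  unsigned num;
  unsigned den;
};

// VLMAX = VLEN * LMUL / SEW.
inline unsigned vlmax(unsigned vlenBits, unsigned sewBits, Lmul lmul) {
  assert(sewBits != 0 && lmul.den != 0);
  return (vlenBits * lmul.num) / (sewBits * lmul.den);
}

// Two settings give the same VLMAX for every VLEN when SEW/LMUL matches.
inline bool sameSewLmulRatio(unsigned sewA, Lmul a, unsigned sewB, Lmul b) {
  return sewA * a.den * b.num == sewB * b.den * a.num;
}
```

For SEW=8 with LMUL=1/8 and SEW=64 with LMUL=1 the ratio is 64 in both cases, which is why the two configurations are interchangeable as far as VLMAX is concerned even though the element width differs.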

We cache device library programs (like fallback assertion) in the context. In multi-threaded applications, simultaneous access to the cache of device lib programs is possible in program_manager::build(). That's why access to this cache needs to be guarded to avoid a data race.
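A minimal sketch of the guarding pattern described above, assuming a plain `std::mutex` around the cache. `DeviceLibCache`, `getOrBuild` and the `int` program handle are hypothetical stand-ins, not the SYCL runtime's actual types.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>

// Hypothetical stand-in for the per-context cache of device library programs.
class DeviceLibCache {
public:
  // Returns the cached program for `key`, calling `build` only on a miss.
  // The lock covers both the lookup and the insertion.
  template <typename BuildFn>
  int getOrBuild(const std::string &key, BuildFn build) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = programs_.find(key);
    if (it != programs_.end())
      return it->second;
    int program = build();
    bool inserted = programs_.emplace(key, program).second;
    assert(inserted && "key was checked to be absent under the lock");
    (void)inserted;
    return program;
  }

private:
  std::mutex mutex_;
  std::map<std::string, int> programs_;
};
```

Holding the lock across the build keeps the sketch simple, at the cost of serializing builds. Whether the real code holds it that long is not stated in the commit message.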

`IgnoreParenImpCasts` will remove implicit casts to bool (e.g. `PointerToBoolean`), such that the resulting expression may not be of the `bool` type. The `cast_or_null<BoolValue>` in `extendFlowCondition` will then trigger an assert, as the pointer expression will not have a `BoolValue`. Instead, we only skip `ExprWithCleanups` and `ParenExpr` nodes, as the CFG does not emit them. Differential Revision: https://reviews.llvm.org/D124807

Differential Revision: https://reviews.llvm.org/D124931

https://alive2.llvm.org/ce/z/sD-JVv This extends 432c199 with a 3 arg intrinsic to demonstrate that the code works with the extra operand. Eventually, we will want to use llvm::isTriviallyVectorizable() or create some new API for this list, but for now, I am intentionally making a minimum change to reduce risk and only affect an intrinsic with regression tests in place.

- Exit early when constraint caching is disabled.
- Use unique_ptr to manage temporary lifetime.
- Fix a typo in a comment (InsertPos instead of InsertNode).

The new code duplicates the forwarding call to CheckConstraintSatisfaction, but reduces the number of interconnected if statements and simplifies lifetime management. This increases the overall readability. Reviewed By: sammccall Differential Revision: https://reviews.llvm.org/D124923

The fold added with 9c4770e neglected to propagate FMF.

This was missed when extending the fold to allow fma with 9c4770e

Test for printing plan with additions from D123537.

We were failing to check if the controlling expression is dependent or not when testing whether it has side effects. This would trigger an assertion. Instead, if the controlling expression is dependent, we suppress the check and diagnostic. This fixes Issue 50227.

tapi & clang-extractapi both attempt to construct then check against how a header was included to determine api information when working against multiple search paths, headermap, and vfsoverlay mechanisms. Validating this against what the preprocessor sees during lookup time makes this check more reliable. Reviewed By: zixuw, jansvoboda11 Differential Revision: https://reviews.llvm.org/D124638

Support int8, int16, int32 and int32. Also fix source code format in mlir_pytaco_utils.py. Add tests. Reviewed By: aartbik Differential Revision: https://reviews.llvm.org/D124925

CUDA/HIP needs to mangle for aux target. When mangling for aux target, the mangler should use mangling number for aux target. Previously in https://reviews.llvm.org/D122734 a state was introduced in ASTContext to let the mangler get mangling number for aux target from ASTContext. This patch removes that state from ASTContext and adds an IsAux member to MangleContext to indicate that the mangle context is for aux target. This reflects the reality that the mangle context is created for mangling aux target and makes ASTContext cleaner. Reviewed by: Artem Belevich, Reid Kleckner Differential Revision: https://reviews.llvm.org/D124842

…ands This extends 432c199 and 9c4770e with an intrinsic cited directly in issue #46238 Eventually, we will want to use llvm::isTriviallyVectorizable() or create some new API for this list, but for now, I am intentionally making a minimum change to reduce risk and only affect an intrinsic with regression tests in place.

This check is in the related fold for binops, but it was missed when the code was adapted for intrinsics in 432c199. The new test would crash when trying to create a new intrinsic with mismatched types.

When constant evaluating the initializer for an object of vector type, we would call APInt::trunc() but truncate to the same bit-width the object already had, which would cause an assertion. Instead, use APInt::truncOrSelf() so that we no longer assert in this situation. Fix #50216

This patch transforms the given input headers to relative include names using header search entries and some heuristics. For example: `/Path/To/Header.h` will be included as `<Header.h>` with a search path of `-I /Path/To/`; and `/Path/To/Framework.framework/Headers/Header.h` will be included as `<Framework/Header.h>`, given a search path of `-F /Path/To`. Headermaps will also be queried in reverse to find a spelled name to include headers. Differential Revision: https://reviews.llvm.org/D123831
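The two examples above can be sketched as a simplified prefix-stripping routine. This is an illustration only, not the clang implementation: `stripDirPrefix` and `includeNameFor` are hypothetical names, the first matching search directory wins, and headermap lookups are left out entirely.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: removes `dir` (with a trailing '/') from the front of
// `path`. Returns false, leaving `path` untouched, if `dir` is not a prefix.
inline bool stripDirPrefix(std::string &path, std::string dir) {
  if (dir.empty())
    return false;
  if (dir.back() != '/')
    dir += '/';
  if (path.size() <= dir.size() || path.compare(0, dir.size(), dir) != 0)
    return false;
  path.erase(0, dir.size());
  return true;
}

// Returns the angled include spelling for `headerPath`, or an empty string
// when no search directory applies. -I directories are tried before -F ones.
inline std::string includeNameFor(const std::string &headerPath,
                                  const std::vector<std::string> &includeDirs,
                                  const std::vector<std::string> &frameworkDirs) {
  assert(!headerPath.empty() && "expected an absolute header path");
  for (const std::string &dir : includeDirs) {
    std::string rest = headerPath;
    if (stripDirPrefix(rest, dir))
      return "<" + rest + ">";
  }
  const std::string marker = ".framework/Headers/";
  for (const std::string &dir : frameworkDirs) {
    std::string rest = headerPath;
    if (!stripDirPrefix(rest, dir))
      continue;
    std::string::size_type pos = rest.find(marker);
    if (pos == std::string::npos || pos == 0)
      continue;
    // Foo.framework/Headers/Bar.h is spelled <Foo/Bar.h>.
    return "<" + rest.substr(0, pos) + "/" + rest.substr(pos + marker.size()) + ">";
  }
  return "";
}
```

Taking the first match keeps the sketch short. The commit message does not say how the patch orders overlapping search paths.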

4c262fe accidentally added local unfinished test case clang/test/Index/annotate-comments-enum-constant.c This patch removes it.

If LLDB index cache is enabled and everything is cached, then loading of debug info is essentially single-threaded, because it's done from PreloadSymbols() called from GetOrCreateModule(), which is called from a loop calling LoadModuleAtAddress() in DynamicLoaderPOSIXDYLD. Parallelizing the entire loop could be unsafe because of GetOrCreateModule() operating on a module list, so instead move only the PreloadSymbols() call to Target::ModulesDidLoad() and parallelize there, which should be safe. This may greatly reduce the load time if the debugged program uses a large number of binaries (as opposed to monolithic programs where this presumably doesn't make a difference). In my specific case of LibreOffice Calc this reduces startup time from 6s to 2s. Differential Revision: https://reviews.llvm.org/D122975

Reorganize the test and simplify the #ifdefs. Fix a typo in __powerpc64__ as a fly-by, and also add a test for the unstable ABI. Differential Revision: https://reviews.llvm.org/D124403

When this option is used, we use online linking for SYCL device libraries; otherwise we use offline linking, in which all wrapper and fallback device libraries are linked with the user's device image at compile time. Signed-off-by: jinge90 [email protected]

We need to destroy resources when the Ctx object is destructed. Otherwise the destruction of internal RT objects would be delayed until global context destruction that happens *after* ~PiMock.

Previously the max local mem allocation in DPC++ for the CUDA backend was 48KB for most devices. However, as https://docs.nvidia.com/cuda/ampere-tuning-guide/index.html notes, for the A100 the max local mem dynamic allocation is in fact 164KB. This PR introduces an environment variable SYCL_PI_CUDA_MAX_LOCAL_MEM_SZ which allows you to manually specify the max local memory in bytes allowed to be allocated per kernel for a given application. If an invalid value is specified (one that exceeds the device's capabilities or is negative) then a runtime error will be thrown. Running SYCL_PI_CUDA_MAX_LOCAL_MEM_SZ=166912 ./a.out allows the application to use up to 163KB of local memory, if the device supports it.
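The validation rules described above could look roughly like the following sketch. It is not the CUDA plugin's code: `resolveMaxLocalMem` is a hypothetical helper, and the default and device maximum are passed in by the caller as assumptions rather than queried from a device.

```cpp
#include <cassert>
#include <cstdlib>
#include <stdexcept>

// Hypothetical helper: turns the text of SYCL_PI_CUDA_MAX_LOCAL_MEM_SZ into a
// byte count. A null `envText` means the variable is unset, so the default is
// kept. Non-numeric, negative or too-large values raise a runtime error.
inline long resolveMaxLocalMem(const char *envText, long defaultBytes,
                               long deviceMaxBytes) {
  assert(deviceMaxBytes >= 0 && "device limit must be non-negative");
  if (envText == nullptr)
    return defaultBytes;
  char *end = nullptr;
  long requested = std::strtol(envText, &end, 10);
  if (end == envText || *end != '\0')
    throw std::runtime_error("max local memory value is not an integer");
  if (requested < 0 || requested > deviceMaxBytes)
    throw std::runtime_error("max local memory value is out of range");
  return requested;
}
```

A caller would typically pass `std::getenv("SYCL_PI_CUDA_MAX_LOCAL_MEM_SZ")` as the first argument. The value 166912 from the example is 163 * 1024 bytes.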

A pull request to solve the issue intel/llvm#6123 Co-authored-by: Jin Z <[email protected]> Co-authored-by: Nicolas Miller <[email protected]>

…annotations_member (#5884)

* [SYCL][Doc] Adjust design for compile-time properties through add_ir_annotations_member

During implementation of the attribute and translation of annotations on fields, the design was conflicting with existing features. This commit makes the following design changes:

* Change the `llvm.ptr.annotation` intrinsic call produced by `[[__sycl_detail__::add_ir_annotations_member()]]` to use a pointer to a constant global variable instead of metadata. This is done to adhere to the signature of the intrinsic.
* Change the representation consumed by the translator from a new SPIR-V builtin to an extended version of existing decoration parsing using `llvm.ptr.annotation`.

Implementations of these are in review in intel/llvm#5879 and KhronosGroup/SPIRV-LLVM-Translator#1446

Signed-off-by: Larsen, Steffen <[email protected]>

intel/llvm#6180 moved the retain logic for events to the make_event interop API. However, this means that the old OpenCL interop constructor for event does not retain the native event. These changes add the retain logic directly to the old OpenCL interop constructor for OpenCL. Signed-off-by: Larsen, Steffen <[email protected]>

* [SYCL][ESIMD][EMU] LSC support for ESIMD_EMULATOR backend 36 out of 40 LSC tests are passing

…ns (#6192) This commit removes mutex locks from the PI release and retain functions. They were added under the impression that two threads working simultaneously on releasing resources could both reach ref count 0 and cause a double free. But since our ref count is atomic and the decrement is an atomic operation, only a single thread will execute the code under if (--(PiObj->RefCount) == 0). There can't be two threads that both reach ref count 0. Behavior is undefined if the DPCPP runtime somehow uses a deleted pi object (with ref count 0) or provides it as an argument to any of the pi functions.
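The argument above relies on the atomic pre-decrement returning the new value to exactly one caller per step. A minimal sketch, assuming a `std::atomic` counter: `PiObject`, `retain` and `release` are hypothetical stand-ins for the PI types and entry points.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for a PI object with an atomic reference count.
struct PiObject {
  std::atomic<unsigned> RefCount{1};
};

inline void retain(PiObject &obj) { ++obj.RefCount; }

// Returns true for exactly one caller: the one whose atomic decrement takes
// the count from 1 to 0. Only that caller may go on to free the object.
inline bool release(PiObject &obj) {
  assert(obj.RefCount.load() > 0 && "releasing an already dead object");
  return --obj.RefCount == 0;
}
```

Because each decrement is a single atomic read-modify-write, two threads releasing concurrently observe distinct results, so no mutex is needed to decide who performs the cleanup.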

Ralender · 2022-07-05T09:53:00Z

could we get this merged ?

keryell

Great!

Ralender · 2022-07-07T16:08:58Z

I just realized it was pushed onto master. we need to update next

brads55 and others added 30 commits May 4, 2022 12:57

Bump the serialization major version number

1587f6b

This is a speculative fix for a build bot which does not put the LLVM revision information into the PCH hash. http://45.33.8.238/linux/75290/step_7.txt

Do not rely on implicit int for this test

2df9bd3

This should address failing test bots: https://lab.llvm.org/buildbot/#/builders/68/builds/31828

[X86] Fix uninitialized variable warnings in cetintrin.h reported by …

2d18a86

…#55224 Fix uninitialized variables introduced by D116325. Differential Revision: https://reviews.llvm.org/D124916

[SCEV] Add additional poison implication tests (NFC)

2f64a6c

[VectorCombine] Add tests for shuffle binops patterns. NFC

7aadfc5

[mlir] Add a flag to allow equivalent results.

e8f7d01

Differential Revision: https://reviews.llvm.org/D124931

[InstCombine] add tests for fma with shuffled operands; NFC

03e36d8

[InstCombine] add FMF to tests for better coverage; NFC

4954f0d

The fold added with 9c4770e neglected to propagate FMF.

[InstCombine] propagate FMF when reordering intrinsics and shuffles

15042f4

This was missed when extending the fold to allow fma with 9c4770e

[VPlan] Add test for printing plan with an exit value.

ff8d0b3

Test for printing plan with additions from D123537.

[mlir][sparse][taco] Support more data types.

1cd13e6

Support int8, int16, int32 and int32. Also fix source code format in mlir_pytaco_utils.py. Add tests. Reviewed By: aartbik Differential Revision: https://reviews.llvm.org/D124925

[InstCombine] add tests for funnel-shift with shuffled operands; NFC

629e1e8

[InstCombine] add type constraint to intrinsic+shuffle fold

14f2576

This check is in the related fold for binops, but it was missed when the code was adapted for intrinsics in 432c199. The new test would crash when trying to create a new intrinsic with mismatched types.

[NFC] Remove unfinished test case

5f841c7

4c262fe accidentally added local unfinished test case clang/test/Index/annotate-comments-enum-constant.c This patch removes it.

[libc++] Refactor max_size.pass.cpp

0e2fb8a

Reorganize the test and simplify the #ifdefs. Fix a typo in __powerpc64__ as a fly-by, and also add a test for the unstable ABI. Differential Revision: https://reviews.llvm.org/D124403

[HWASan] cleanup imports in hwasan_symbolize.

1b2704f

jinge90 and others added 12 commits May 26, 2022 09:41

[SYCL] Fix unittest crashes after PR #6128 (#6206)

7384e43

We need to destroy resources when the Ctx object is destructed. Otherwise the destruction of internal RT objects would be delayed until global context destruction that happens *after* ~PiMock.

[SYCL] Fix returned alignment of allocation functions (#6205)

3114f02

[SYCL][HIP] Vendor ID for AMD devices is expected to be 4098 (#6127)

a1b42aa

A pull request to solve the issue intel/llvm#6123 Co-authored-by: Jin Z <[email protected]> Co-authored-by: Nicolas Miller <[email protected]>

[SYCL][ESIMD][EMU] LSC support for ESIMD_EMULATOR backend (#6099)

b78bf00

* [SYCL][ESIMD][EMU] LSC support for ESIMD_EMULATOR backend 36 out of 40 LSC tests are passing

Merge remote-tracking branch 'intel/sycl' into MergeIntel

1e39828

Minimize diff with upstream

d04250e

[SYCL] Add newPM support for passes in LLVMSYCL

3fa9b01

Ralender force-pushed the MergeIntel branch from 61210d4 to dabc796 Compare June 2, 2022 16:02

Merge branch 'sycl/unified/next' into MergeIntel

5a97a7d

Ralender force-pushed the MergeIntel branch from dabc796 to 5a97a7d Compare June 27, 2022 14:31

Gauthier Harnisch added 3 commits July 5, 2022 01:15

Merge branch 'sycl/unified/next' into MergeIntel

b87a74e

[VXX][test] Minimize the set of environment variables needed for tests

018a553

[VXX][test] reorganize test target names

a326f83

Ralender force-pushed the MergeIntel branch from 7194479 to a326f83 Compare July 5, 2022 09:41

This was linked to issues Jul 5, 2022

There are still some xocc and XOCC in sycl/test/lit.cfg.py #188

Open

Check whether XILINX_HLS is still used by the test infrastructure #187

Closed

Ralender removed a link to an issue Jul 5, 2022

There are still some xocc and XOCC in sycl/test/lit.cfg.py #188

Open

Merge branch 'sycl/unified/next' into MergeIntel

97aa7b7

Ralender removed a link to an issue Jul 6, 2022

Check whether XILINX_HLS is still used by the test infrastructure #187

Closed

Ralender linked an issue Jul 6, 2022 that may be closed by this pull request

Check whether XILINX_HLS is still used by the test infrastructure #187

Closed

Ralender mentioned this pull request Jul 6, 2022

Fixes next vitis #191

Merged

keryell approved these changes Jul 6, 2022

keryell merged commit bdfea91 into triSYCL:sycl/unified/master Jul 6, 2022
