Merge upstream changes #186
Merged
…jects are present Without SVE, after a dynamic stack allocation has modified the SP, it is presumed that a frame pointer restoration will revert the SP back to its correct value prior to any caller stack being restored. However, the SVE frame is restored using the stack pointer directly, as it is located after the frame pointer. This means that in the presence of a dynamic stack allocation, any SVE callee state gets corrupted as SP has the incorrect value when the SVE state is restored. To address this issue, when variable sized objects and SVE CSRs are present, treat the stack as having been realigned, hence restoring the stack pointer from the frame pointer prior to restoring the SVE state. Differential Revision: https://reviews.llvm.org/D124615
This is a speculative fix for a build bot which does not put the LLVM revision information into the PCH hash. http://45.33.8.238/linux/75290/step_7.txt
This should address failing test bots: https://lab.llvm.org/buildbot/#/builders/68/builds/31828
…#55224 Fix uninitialized variables introduced by D116325. Differential Revision: https://reviews.llvm.org/D124916
Demanded bits analysis may replace a full-width not with an any_extend (not (truncate X)) pattern. This patch looks through this kind of pattern in haveNoCommonBitsSet(). Of course, we can only do this if we only need negated bits in the non-extended part, as the other bits may now be arbitrary. For example, if we have haveNoCommonBitsSet(~X & Y, X) then ~X only needs to actually negate bits set in Y. This is only a partial solution to the problem in that it allows add -> or conversion, but the resulting or doesn't get folded yet. (I guess that will involve exposing getBitwiseNotOperand() as a more general helper and using that in the relevant transform.) Differential Revision: https://reviews.llvm.org/D124856
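The reasoning above can be checked on concrete values. The following is a minimal sketch, not the SelectionDAG code: `noCommonBits`, `addEqualsOr`, and `anyExtNotTrunc` are hypothetical helpers, and the 32-bit to 8-bit truncation width is an assumption chosen for illustration. It models the any_extend's arbitrary upper bits with a caller-supplied `junk` value.

```cpp
#include <cstdint>

// Hypothetical helper: true when the two values share no set bit.
// This is the property haveNoCommonBitsSet() proves, shown here on
// concrete values instead of DAG nodes.
constexpr bool noCommonBits(std::uint32_t a, std::uint32_t b) {
  return (a & b) == 0;
}

// With no common bits there is never a carry, so add and or agree.
constexpr bool addEqualsOr(std::uint32_t a, std::uint32_t b) {
  return a + b == (a | b);
}

// Models any_extend(not(truncate X)) for a 32-bit X truncated to 8 bits
// (the widths are an assumption for this sketch): the low 8 bits are
// negated, the upper 24 bits are arbitrary and taken from `junk`.
constexpr std::uint32_t anyExtNotTrunc(std::uint32_t x, std::uint32_t junk) {
  return (~x & 0xFFu) | (junk & ~0xFFu);
}
```

If `Y` only has bits set in the low 8 bits, `anyExtNotTrunc(X, junk) & Y` shares no bits with `X` whatever `junk` is, so the add can become an or. If `Y` has bits set in the extended part, the arbitrary upper bits can overlap with `X` and the property no longer holds.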
This test shows incorrect cross-bb insertion. We'd expect to see a SEW=8 vsetvli, something like:

    vsetvli zero, zero, e8, mf8, ta, mu
    vluxei64.v v1, (a2), v8, v0.t

But the vsetvli is omitted and an inherited SEW=64 vsetvli is used instead:

    vmv1r.v v9, v1
    vsetvli a3, zero, e64, m1, ta, mu
    vmseq.vi v9, v1, 0
    vmv1r.v v8, v0
    vmandn.mm v0, v9, v2
    beqz a0, .LBB0_2
    # %bb.1:
    vluxei64.v v1, (a2), v8, v0.t
    vmv1r.v v3, v1

The "mask reg op" vmandn.mm in bb.1 appears to be confusing the insertion process, as it is able to elide its own vsetvli as its VLMAX (SEW=8, LMUL=MF8) is identical to the previous one (SEW=64, LMUL=1). Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D124089
We cache device library programs (like fallback assertion) in the context. In multi-threaded applications, simultaneous access to the cache of device lib programs is possible in program_manager::build(). That's why access to this cache needs to be guarded to avoid a data race.
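A minimal sketch of guarding such a cache with a mutex follows. It is not the DPC++ runtime code: `DeviceLibCache`, `getOrBuild`, and the use of `int` as a stand-in for a program handle are all hypothetical and only illustrate the locking pattern.

```cpp
#include <map>
#include <mutex>
#include <string>

// Hypothetical stand-in for a per-context cache of device library programs.
// An int plays the role of the program handle.
class DeviceLibCache {
public:
  // Returns the cached program for `name`, calling `build` only on a miss.
  // The lock covers both the lookup and the insertion, so two threads can
  // never modify the map at the same time or build the same entry twice.
  template <typename BuildFn>
  int getOrBuild(const std::string &name, BuildFn build) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = cache_.find(name);
    if (it != cache_.end())
      return it->second;
    int program = build();
    cache_.emplace(name, program);
    return program;
  }

private:
  std::mutex mutex_;
  std::map<std::string, int> cache_;
};
```

Holding the lock while building keeps the sketch simple at the cost of serializing builds. A real runtime may prefer finer-grained locking.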
`IgnoreParenImpCasts` will remove implicit casts to bool (e.g. `PointerToBoolean`), such that the resulting expression may not be of the `bool` type. The `cast_or_null<BoolValue>` in `extendFlowCondition` will then trigger an assert, as the pointer expression will not have a `BoolValue`. Instead, we only skip `ExprWithCleanups` and `ParenExpr` nodes, as the CFG does not emit them. Differential Revision: https://reviews.llvm.org/D124807
Differential Revision: https://reviews.llvm.org/D124931
https://alive2.llvm.org/ce/z/sD-JVv This extends 432c199 with a 3 arg intrinsic to demonstrate that the code works with the extra operand. Eventually, we will want to use llvm::isTriviallyVectorizable() or create some new API for this list, but for now, I am intentionally making a minimum change to reduce risk and only affect an intrinsic with regression tests in place.
- Exit early when constraint caching is disabled.
- Use unique_ptr to manage temporary lifetime.
- Fix a typo in a comment (InsertPos instead of InsertNode).

The new code duplicates the forwarding call to CheckConstraintSatisfaction, but reduces the number of interconnected if statements and simplifies lifetime management. This increases the overall readability. Reviewed By: sammccall Differential Revision: https://reviews.llvm.org/D124923
The fold added with 9c4770e neglected to propagate FMF.
This was missed when extending the fold to allow fma with 9c4770e
Test for printing plan with additions from D123537.
We were failing to check if the controlling expression is dependent or not when testing whether it has side effects. This would trigger an assertion. Instead, if the controlling expression is dependent, we suppress the check and diagnostic. This fixes Issue 50227.
tapi & clang-extractapi both attempt to construct then check against how a header was included to determine api information when working against multiple search paths, headermap, and vfsoverlay mechanisms. Validating this against what the preprocessor sees during lookup time makes this check more reliable. Reviewed By: zixuw, jansvoboda11 Differential Revision: https://reviews.llvm.org/D124638
Support int8, int16, int32 and int64. Also fix source code format in mlir_pytaco_utils.py. Add tests. Reviewed By: aartbik Differential Revision: https://reviews.llvm.org/D124925
CUDA/HIP needs to mangle for aux target. When mangling for aux target, the mangler should use mangling number for aux target. Previously in https://reviews.llvm.org/D122734 a state was introduced in ASTContext to let the mangler get mangling number for aux target from ASTContext. This patch removes that state from ASTContext and adds an IsAux member to MangleContext to indicate that the mangle context is for aux target. This reflects the reality that the mangle context is created for mangling aux target and makes ASTContext cleaner. Reviewed by: Artem Belevich, Reid Kleckner Differential Revision: https://reviews.llvm.org/D124842
…ands This extends 432c199 and 9c4770e with an intrinsic cited directly in issue #46238 Eventually, we will want to use llvm::isTriviallyVectorizable() or create some new API for this list, but for now, I am intentionally making a minimum change to reduce risk and only affect an intrinsic with regression tests in place.
This check is in the related fold for binops, but it was missed when the code was adapted for intrinsics in 432c199. The new test would crash when trying to create a new intrinsic with mismatched types.
When constant evaluating the initializer for an object of vector type, we would call APInt::trunc() but truncate to the same bit-width the object already had, which would cause an assertion. Instead, use APInt::truncOrSelf() so that we no longer assert in this situation. Fix #50216
This patch transforms the given input headers to relative include names using header search entries and some heuristics. For example: `/Path/To/Header.h` will be included as `<Header.h>` with a search path of `-I /Path/To/`; and `/Path/To/Framework.framework/Headers/Header.h` will be included as `<Framework/Header.h>`, given a search path of `-F /Path/To`. Headermaps will also be queried in reverse to find a spelled name to include headers. Differential Revision: https://reviews.llvm.org/D123831
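The two mappings in the example can be sketched with plain string operations. This is an illustration only, not the clang-extractapi implementation: `includeNameForSearchPath` and `includeNameForFrameworkPath` are hypothetical helpers, they handle a single search directory each, and they ignore headermaps and VFS overlays.

```cpp
#include <optional>
#include <string>

// Hypothetical helper: maps an absolute header path to an angled include
// name relative to one -I search directory, or nullopt if it is not under it.
std::optional<std::string> includeNameForSearchPath(const std::string &headerPath,
                                                    std::string searchDir) {
  if (searchDir.empty())
    return std::nullopt;
  if (searchDir.back() != '/')
    searchDir += '/';
  if (headerPath.size() <= searchDir.size() ||
      headerPath.compare(0, searchDir.size(), searchDir) != 0)
    return std::nullopt;
  return "<" + headerPath.substr(searchDir.size()) + ">";
}

// Hypothetical helper: maps <dir>/Name.framework/Headers/<rest> to
// <Name/rest> for one -F framework search directory.
std::optional<std::string> includeNameForFrameworkPath(const std::string &headerPath,
                                                       std::string frameworkDir) {
  if (frameworkDir.empty())
    return std::nullopt;
  if (frameworkDir.back() != '/')
    frameworkDir += '/';
  if (headerPath.compare(0, frameworkDir.size(), frameworkDir) != 0)
    return std::nullopt;
  const std::string rest = headerPath.substr(frameworkDir.size());
  const std::string marker = ".framework/Headers/";
  const std::size_t pos = rest.find(marker);
  if (pos == std::string::npos || pos == 0)
    return std::nullopt;
  const std::string name = rest.substr(0, pos);
  // Frameworks nested in subdirectories are out of scope for this sketch.
  if (name.find('/') != std::string::npos)
    return std::nullopt;
  const std::string header = rest.substr(pos + marker.size());
  if (header.empty())
    return std::nullopt;
  return "<" + name + "/" + header + ">";
}
```

With the paths from the commit message, the first helper yields `<Header.h>` for `-I /Path/To/` and the second yields `<Framework/Header.h>` for `-F /Path/To`.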
4c262fe accidentally added local unfinished test case clang/test/Index/annotate-comments-enum-constant.c This patch removes it.
If LLDB index cache is enabled and everything is cached, then loading of debug info is essentially single-threaded, because it's done from PreloadSymbols() called from GetOrCreateModule(), which is called from a loop calling LoadModuleAtAddress() in DynamicLoaderPOSIXDYLD. Parallelizing the entire loop could be unsafe because of GetOrCreateModule() operating on a module list, so instead move only the PreloadSymbols() call to Target::ModulesDidLoad() and parallelize there, which should be safe. This may greatly reduce the load time if the debugged program uses a large number of binaries (as opposed to monolithic programs where this presumably doesn't make a difference). In my specific case of LibreOffice Calc this reduces startup time from 6s to 2s. Differential Revision: https://reviews.llvm.org/D122975
Reorganize the test and simplify the #ifdefs. Fix a typo in __powerpc64__ as a fly-by, and also add a test for the unstable ABI. Differential Revision: https://reviews.llvm.org/D124403
When this option is used, we use online linking for SYCL device libraries; otherwise, we use offline linking, in which all wrapper and fallback device libraries are linked with the user's device image at compile time. Signed-off-by: jinge90 [email protected]
We need to destroy resources when the Ctx object is destructed. Otherwise the destruction of internal RT objects would be delayed until global context destruction that happens *after* ~PiMock.
Previously the max local mem allocation in DPC++ for CUDA backend was 48KB for most devices. However as https://docs.nvidia.com/cuda/ampere-tuning-guide/index.html notes, for the A100 the max local mem dynamic allocation is in fact 164KB. This PR introduces an environment variable SYCL_PI_CUDA_MAX_LOCAL_MEM_SZ which allows you to manually specify the max local memory in bytes allowed to be allocated per kernel for a given application. If an invalid value is specified (one that exceeds the device's capabilities/is negative) then a runtime error will be thrown. Using: SYCL_PI_CUDA_MAX_LOCAL_MEM_SZ=166912 ./a.out Allows the application to use up to 163KB of local memory, if the device supports it.
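The validation rule described above (reject negative values and values beyond the device's capability) can be sketched as follows. This is not the CUDA plugin code: `resolveMaxLocalMem` and its parameters are hypothetical, and the device default and maximum are passed in by the caller because the real values come from device queries.

```cpp
#include <cerrno>
#include <cstdlib>
#include <stdexcept>

// Hypothetical helper: turns the text of SYCL_PI_CUDA_MAX_LOCAL_MEM_SZ into a
// byte count. `envValue` is what std::getenv returned (nullptr when unset).
// Returns `deviceDefault` when the variable is unset, and throws when the
// value is malformed, negative, or larger than `deviceMax`.
long long resolveMaxLocalMem(const char *envValue, long long deviceDefault,
                             long long deviceMax) {
  if (envValue == nullptr)
    return deviceDefault;
  errno = 0;
  char *end = nullptr;
  const long long requested = std::strtoll(envValue, &end, 10);
  if (end == envValue || *end != '\0' || errno == ERANGE)
    throw std::runtime_error("max local memory size is not a valid integer");
  if (requested < 0)
    throw std::runtime_error("max local memory size must not be negative");
  if (requested > deviceMax)
    throw std::runtime_error("max local memory size exceeds device capability");
  return requested;
}
```

A caller might write `resolveMaxLocalMem(std::getenv("SYCL_PI_CUDA_MAX_LOCAL_MEM_SZ"), 48 * 1024, 163 * 1024)`, where 163 * 1024 = 166912 bytes matches the value used in the example above.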
A pull request to solve the issue intel/llvm#6123 Co-authored-by: Jin Z <[email protected]> Co-authored-by: Nicolas Miller <[email protected]>
…annotations_member (#5884)
* [SYCL][Doc] Adjust design for compile-time properties through add_ir_annotations_member

During implementation of the attribute and translation of annotations on fields, the design was conflicting with existing features. This commit makes the following design changes:
* Change the `llvm.ptr.annotation` intrinsic call produced by `[[__sycl_detail__::add_ir_annotations_member()]]` to use a pointer to a constant global variable instead of metadata. This is done to adhere to the signature of the intrinsic.
* Change the representation consumed by the translator from a new SPIR-V builtin to an extended version of existing decoration parsing using `llvm.ptr.annotation`.

Implementations of these changes are in review at intel/llvm#5879 and KhronosGroup/SPIRV-LLVM-Translator#1446 Signed-off-by: Larsen, Steffen <[email protected]>
intel/llvm#6180 moved the retain logic for events to the make_event interop API. However, this means that the old OpenCL interop constructor for event does not retain the native event. These changes add the retain logic directly to the old OpenCL interop constructor for OpenCL. Signed-off-by: Larsen, Steffen <[email protected]>
* [SYCL][ESIMD][EMU] LSC support for ESIMD_EMULATOR backend 36 out of 40 LSC tests are passing
…ns (#6192) This commit removes mutex locks from the PI release and retain functions. They were added under the impression that two threads working simultaneously on releasing resources could both reach ref count 0 and cause a double free. But since our ref count is atomic and the decrement is an atomic operation, only a single thread will execute the code under if (--(PiObj->RefCount) == 0). Two threads cannot both reach ref count 0. Behavior is undefined if the DPCPP runtime somehow uses a deleted PI object (with ref count 0) or provides it as an argument to any of the PI functions.
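The argument can be illustrated with a small sketch. `PiObject`, `retain`, and `release` are hypothetical names, not the PI plugin's types. The point is that the pre-decrement on a `std::atomic<int>` is a single read-modify-write that returns the new value, so exactly one caller can observe 0.

```cpp
#include <atomic>

// Hypothetical stand-in for a reference-counted PI object.
struct PiObject {
  std::atomic<int> RefCount{1};
};

// Takes an additional reference. No mutex is needed for the increment.
void retain(PiObject *obj) { ++obj->RefCount; }

// Drops one reference. The atomic pre-decrement returns the new count, so
// only the caller that moves the count from 1 to 0 enters the branch and
// deletes the object. Returns true if this call deleted the object.
bool release(PiObject *obj) {
  if (--(obj->RefCount) == 0) {
    delete obj;
    return true;
  }
  return false;
}
```

As the commit message notes, this only holds while callers respect ownership: touching the object after its count has reached 0 is undefined behavior.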
Jul 5, 2022
Could we get this merged?
keryell
approved these changes
Jul 6, 2022
Great!
I just realized it was pushed onto master. We need to update next.
#185 is integrated into this PR because it is needed to run the tests.