forked from intel/llvm
-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL][Graph] 3D kernel update regression test #373
Closed
Closed
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This PR helps to avoid hardcoded path symbols and let OS dependent tools to work with paths. Original version produces `path/....\lib\xptifw.lib` which causes skip of library linkage on windows. `os.path.normpath` is the main change here that helps to handle path separators properly. Signed-off-by: Tikhomirova, Kseniya <[email protected]>
…el#14011) This PR migrates changes to `get_version`, `get_major_version` and `get_minor_version`, adds new free function equivalents which accept a `sycl::device` argument, and adds relevant tests and documentation. --------- Signed-off-by: Joe Todd <[email protected]>
…#13987) Bump UR L0 commit to oneapi-src/unified-runtime#1694 so that the SYCL device aspect for supporting update in graphs is correctly reported for L0 devices. Currently, support can be incorrectly reported. --------- Co-authored-by: Kenneth Benzie (Benie) <[email protected]>
Running the `Graph/Update` E2E tests on Level Zero with `UR_L0_LEAKS_DEBUG=1` shows that we are leaking a PI kernel and module. On investigation this was because we are retaining these objects in `getOrCreateKernel()` but not releasing them. Added release calls similar to how it is done in [enqueueImpCommandBufferKernel](https://github.com/intel/llvm/blob/b49303c7e13ca0a69454eaaaeb8c3d094916218d/sycl/source/detail/scheduler/commands.cpp#L2550) by the scheduler
This upgrades the docker to use the cuda 12.5 image. I've ran the test-e2e locally using cuda 12.5 and all is well. cuda 12.5 also fixed an issue introduced by the cuda 12.4 driver: see intel#13661 (comment) Signed-off-by: JackAKirk <[email protected]>
…lock_load/slm_block_store APIs accepting simd_view (intel#13978) Co-authored-by: Nick Sarnie <[email protected]>
… build libdevice with thinLTO (intel#14036) This is the first change in my work on thinLTO for SYCL. --------- Signed-off-by: Sarnie, Nick <[email protected]>
Scheduled igc dev drivers uplift Co-authored-by: GitHub Actions <[email protected]>
See intel#14053 Signed-off-by: Sarnie, Nick <[email protected]>
…mbiguity (intel#14018) This change avoids the ambiguity between the deprecated `sycl::ext::oneapi::sub_group` and `sycl::sub_group` when both namespaces are used. This fixes a failure on windows for cuda. --------- Signed-off-by: JackAKirk <[email protected]>
…ate for local_accessors API accepting simd_view (intel#14024)
intel#14000) Co-authored-by: Nick Sarnie <[email protected]>
…el#14061) See intel#14060 --------- Signed-off-by: Sarnie, Nick <[email protected]>
Supported matrix dimensions are queried from the device, and inform the tests which tile sizes one can use. This is a subset of all tests that are planned to be modified. Test manually tested on PVC and SPR --------- Co-authored-by: Yury Plyakhin <[email protected]>
Supported matrix dimensions are queried from the device, and inform the tests which tile sizes one can use. This is a subset of all tests that are planned to be modified. Test manually tested on PVC and SPR - no new regresssions The following tests have been marked as XFAIL on all platforms. I removed them from XMX8 folder. Once they are passing then they can be modified to query the supported matrix dimensions form the device. * joint_matrix_colA_rowB_colC.cpp * joint_matrix_out_bounds.cpp * joint_matrix_unaligned_k.cpp
…ntel#14067) Co-authored-by: Ewan Crawford <[email protected]>
By some reason, we used to only emit unused member functions if they are explicitly annotated with `sycl_device` attribute (through `SYCL_EXTERNAL` macro). This logic was introduced in 3baec18 and there is no clear indication as to why exactly we have a check that the attribute is explicit. SYCL extension for virtual functions introduces an alternative markup for specifying which function and that markup is SYCL compile-time properties that we turn into attributes implicitly under the hood. Essentially, we now have a situation where an implicit `sycl_device` attribute on a member function should be treated as an explicit one, because it could be a result of SYCL compile-time property being applied to that method. Considering our current codebase, it seems like we intend to have member function to be emitted in all cases where `sycl_device` is being implicitly added and therefore this patch removes the requirement for the attribute to be explicit.
This PR adds functionalities for: * Listing devices in stdout * Filtering devices Tests and docs updated accordingly. --------- Signed-off-by: Alberto Cabrera <[email protected]> Co-authored-by: Joe Todd <[email protected]>
…intel#14015) This PR adds a `wait_and_free` func. This makes it safer and less likely to release memory during or before it is used by enqueued commands. `async_free` is renamed `enqueue_free`, to make its behaviour clearer This PR updates the comments and tests accordingly
…d to run on Windows (intel#13957) [Windows doesn't support cudaMemPrefetchAsync()](bitsandbytes-foundation/bitsandbytes#453) which is used in the call to `prefetch` in the test. [urEnqueueUSMPrefetch](https://github.com/oneapi-src/unified-runtime/blob/c0c607c3a88933b4c5c20a0aca4539781c678411/source/adapters/cuda/enqueue.cpp#L1629) is also commented with a note for not having the support for CUDA on Windows.
temp fix for problems from cuda 12.5 uplift that were caused by intel#14049. Should fix intel#14071 --------- Signed-off-by: JackAKirk <[email protected]>
…e exception message (intel#14055) - C++ thrown exception message not shown when running from Windows terminal. - The patch fixes [cuda-max-local-mem-size.cpp](https://github.com/intel/llvm/blob/sycl/sycl/test-e2e/Plugin/cuda-max-local-mem-size.cpp) test failure.
… GPUs through OpenCL. (intel#14072) Extend the `sycl-ls-gpu-default.cpp` test to cover the support of Intel GPUs through OpenCL. The patch fixes the failure when running the test on a system with Intel and CUDA gpus.
Fixes ansi-alias violation and reads from uninitialized buffers. Fixes intel#13790.
…UDA (intel#14058) Fails in Nightly testing on the self-hosted CUDA runner: intel#12995.
…date ACC API accepting simd_view (intel#14065) Co-authored-by: Nick Sarnie <[email protected]>
…date USM API accepting simd_view (intel#14043)
…#14087) See intel#14086 Signed-off-by: Sarnie, Nick <[email protected]>
…ntel#14088) CODEOWNERS seems to be missing a line attributing `sycl/test/check_device_code/matrix` tests to intel/sycl-matrix-reviewers (As per [this discussion](intel#14063 (comment))). This PR remedies this. Although, I noticed the current CODEOWNERS section for the matrix reviewers uses paths; Let me know if I should use `sycl/**/matrix` instead.
The runner seems to be broken, don't run the tests until it's fixed.
This PR 1. Updates extension to add math builtins for `vec<bfloat16>` and corresponding swizzles. 2. Implements **unoptimized** support for `vec<bloat16>`/swizzles in math builtins and adds a test case for the same. 3. Adds a test to check the device code generated for `vec<bloat16>` math builtins. I will make a follow-up PR to optimize `vec<bfloat16>` math builtins. I think we can use elementwise builtins for `ext_vector_type` (https://clang.llvm.org/docs/LanguageExtensions.html#vectors-and-extended-vectors) to optimize `vec<bfloat16>` math builtins. The device code test case will help visualizing/reviewing math builtin optimizations.
…l#13943) Changed CUDA sycl/test/check_device_code lit test cases to use SYCL_EXTERNAL functions instead of submitting kernels to the queue everytime.
Bindless Images should now properly work on Windows, with the exception of Vulkan interop, which requires extra work. Required a few fixes to non-conformant C++ code.
This was tracked down to a bug in ROCm that seems to be fixed with newer versions, and the CI is now on ROCm 6+ so these should be fine. ROCm ticket: ROCm/clr#13 The reduce over group test works on W6800 and MI210, but it seems for gfx1031 it reports not supporting shared USM, note that HIP for gfx1031 isn't officially supported by AMD. --------- Co-authored-by: Steffen Larsen <[email protected]>
Added SYCL Module Splitting as a library. ESIMD splitting is not present in this patch and will be added in an upcoming patch. Added particular testing tool sycl-module-split that invokes added functionality. Not all `device-code-split` tests were updated in this patch because the rest of them (mostly) don't test module splitting itself. They will be migrated in an upcoming patch.
This PR adds math `extend_v*2` operators _(18 in total)_ along with unit-tests for signed and unsigned `int32` cases. --------- Co-authored-by: Alberto Cabrera Pérez <[email protected]> Co-authored-by: Joe Todd <[email protected]>
…ntel#14017) intel#13512 implemented the sycl_ext_oneapi_enqueue_functions extension. Following this, the corresponding extension document is moved to experimental and the feature test macro is defined. --------- Signed-off-by: Larsen, Steffen <[email protected]>
…14081) We don't actually need cpp20, this was left over from a previous iteration of the test. Signed-off-by: Sarnie, Nick <[email protected]>
(fixed issues with post commit testing) --------- Co-authored-by: jason1.li <[email protected]>
…13974) After intel#13486, aspect name information is visible in `sycl-post-link` without the use of `!sycl_aspects`, so this PR updates `sycl-post-link` to use the aspect names that are now available within the `!sycl_used_aspects` metadata instead of `!sycl_aspects`. Aditionally, this PR also adds E2E related to optional kernel features for AOT enabled by these changes
Add a proposed extension specification for `work_group_memory`, a lighter weight API to allocate device local memory for an nd-range kernel. Also related, add a list of restrictions that, when followed, provide a guarantee that a kernel written in the free-function kernel syntax can be launched directly via Level Zero or OpenCL.
… in the SPEC (intel#13947) Follow-up of and blocked by: intel#13945 For `vec<std::byte>`, we only allow math operations that are valid on `std::byte` itself. (https://en.cppreference.com/w/cpp/types/byte)
…13975) Currently `std::byte*` scratch pointer is not aligned and `reinterpret_cast`ed as `T*` where type `T` may have alignment requirement different from `byte*`, this is UB. As a solution, use `std::align` to align the required buffer in the scratch and use placement `new` so that dynamic type of the buffer in the scratch will be `T*`.
This change is intended to fix a CI failure on the OpenCL backend for the `memory_management_test3`. This is because the memory is allocated on a different q to the one it is released on.
the address pointer should be the USM pointer
Add an E2E regression test for updating kernel nodes with 3 dimensions. Test contains a graph with two nodes, the first node with an NDRange containing a user specified local size, and the second node containing a Range with implementation determined local size.
EwanC
force-pushed
the
ewan/update_3d_regression
branch
from
June 10, 2024 08:07
3d8edb2
to
4e7f2dc
Compare
Closing in favor of upstream PR intel#14110 since this test is a confirmed reproducer |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Add an E2E regression test for updating kernel nodes with 3 dimensions. Test contains a graph with two nodes, the first node with an NDRange containing a user specified local size, and the second node containing a Range with implementation determined local size.