[SYCL][Graph] Add specification for kernel binary update #378
Commits on Jul 29, 2024
Commit e664798
Commits on Jul 30, 2024
[SYCL][COMPAT] New launch API to enable passing kernel & launch properties (intel#14441)

This PR defines a new user-facing struct `launch_strategy`, and two new `launch` overloads (currently in `syclcompat::experimental`) which accept a `launch_strategy`.

## Extensions & Properties

This work builds on top of the [kernel_properties](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_kernel_properties.asciidoc) and [enqueue_functions](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_enqueue_functions.asciidoc) extensions. The latter defines APIs for passing `launch_properties` as part of a `launch_config` object. These are the `parallel_for` and `nd_launch` overloads used by the new `launch`. See the [note](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_enqueue_functions.asciidoc#launch-configuration) in the Launch configuration section which describes how `kernel_properties` must be passed via a `get(properties_tag)` method of a kernel functor.

## Local Memory

Note also that in order to properly handle local memory, we **must** construct the `KernelFunctor` object within the `cgh` lambda, passing in a `local_accessor` to the constructor. Then within `KernelFunctor::operator()` (the SYCL 'kernel') we can at last grab the local memory pointer with `local_acc.get_multi_ptr<sycl::access::decorated::no>()`, since CUDA-style device functions expect to receive their dynamic local memory as a `char *`.

Signed-off-by: Joe Todd
Commit 599fcd0
[SYCL] Mark ASAN tests that are failing the nightly as unsupported. (intel#14820)
These are due to a known regression introduced by the PI removal patch. We have a fix, but for now it's more expedient to simply disable the tests and unblock the nightly workflow.
Commit 0f7b261
[SYCL][Bindless][Doc] Rename interop related structs/funcs to external (intel#14444)
Rename interop-related structs/funcs to use the "external" keyword instead of "interop", to align better with existing structs/funcs and other 3rd party APIs. Remove the "handle" keyword from imported external memory/semaphore objects to distinguish between 3rd party external handles and imported external handles.
Co-authored-by: Sean Stirling
Co-authored-by: chedy.najjar
Commit 95604ae
[SYCL][ESIMD] Move spirv global translation out of the function processing to improve compilation time (intel#14786)
Commit ff35d2f
[GHA] Uplift Linux GPU RT version to 24.26.30049.6 (intel#14838)
Scheduled drivers uplift
Co-authored-by: GitHub Actions
Commit 1bc6751
[SYCL] Rename `detail::memcpy` to `detail::memcpy_no_adl` (intel#14836)

`detail::memcpy`, even though in a different namespace, can cause ambiguity with libc's `memcpy`, due to argument dependent lookup (ADL). For example, the compiler throws a compilation error due to `memcpy` ambiguity in the following code:

```
#include <sycl/vector.hpp>

template <typename T>
void foo(T *dst, T *src, size_t count) {
   memcpy(dst, src, count * sizeof(T));
}

using T = sycl::vec<int, 1>;

SYCL_EXTERNAL void bar(T *dst, T *src, size_t count) {
   foo(dst, src, count * sizeof(T));
}
```

Compilation error:

```
memcpy_test.cpp:5:4: error: call to 'memcpy' is ambiguous
    5 |    memcpy(dst, src, count * sizeof(T));
      |    ^~~~~~
memcpy_test.cpp:11:4: note: in instantiation of function template specialization 'foo<sycl::vec<int, 1>>' requested here
   11 |    foo(dst, src, count * sizeof(T));
      |    ^
/usr/include/string.h:43:14: note: candidate function
   43 | extern void *memcpy (void *__restrict __dest, const void *__restrict __src,
      |              ^
llvm/build/bin/../include/sycl/detail/memcpy.hpp:16:13: note: candidate function
   16 | inline void memcpy(void *Dst, const void *Src, size_t Size) {
      |             ^
1 error generated.
```

To fix this error, this PR renames `detail::memcpy` to `detail::memcpy_no_adl`.
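
The effect of the rename can be sketched without the SYCL headers. The following is only an illustration under stated assumptions: `mylib`, `vec_base`, and `copy_elems` are hypothetical names, and the sketch assumes the `detail` namespace becomes an associated namespace of the vector type through a base class, which may not be how the real headers arrive there. Once the helper no longer shares the name `memcpy`, the unqualified call has a single candidate.

```cpp
#include <cassert>
#include <cstddef>
#include <string.h> // declares ::memcpy in the global namespace

namespace mylib {
namespace detail {
// Hypothetical helper. Had it been named `memcpy`, ADL would add it to the
// overload set of the unqualified call in copy_elems below, with the same
// signature as ::memcpy, and the call would be ambiguous.
inline void memcpy_no_adl(void *dst, const void *src, std::size_t n) {
  assert(n == 0 || (dst != nullptr && src != nullptr));
  auto *d = static_cast<unsigned char *>(dst);
  auto *s = static_cast<const unsigned char *>(src);
  for (std::size_t i = 0; i < n; ++i)
    d[i] = s[i];
}

// Empty base class: it makes mylib::detail an associated namespace of vec.
struct vec_base {};
} // namespace detail

template <typename T, int N> struct vec : detail::vec_base {
  T data[N];
};
} // namespace mylib

// User code with an unqualified call. Ordinary lookup finds ::memcpy; ADL
// also searches mylib and mylib::detail, which no longer contain a `memcpy`.
template <typename T>
void copy_elems(T *dst, const T *src, std::size_t count) {
  memcpy(dst, src, count * sizeof(T));
}
```

Calling `copy_elems` on an array of `mylib::vec<int, 2>` compiles and copies the elements; qualifying the call as `::memcpy` in user code would avoid ADL as well, but a library cannot rely on every caller doing that.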
Commit a574ce6
[SYCL][ESIMD] Fix driver check in two tests (intel#14832)
When using L0 we always see the x.y.zzzzz style version, even on Windows. These tests were incorrectly running on Windows because of this problem.
Signed-off-by: Sarnie, Nick
Commit 70268e6
[SYCL][E2E] Add `-Werror` flag to sycl e2e tests (intel#14689)
This patch adds the `-Werror` flag to all SYCL e2e tests to stop the introduction of new warnings. Added `-Wno-error=` to existing tests that have warnings (or made changes to resolve the warnings).
Commit e0a222f
Commit 822d63e
[SYCL] Fix XPASS of Matrix test on new GPU driver (intel#14849)
It's passing in the new driver uplift: https://github.com/intel/llvm/actions/runs/10164787306/job/28112343861
Signed-off-by: Sarnie, Nick
Commit f990a8a
[SYCL][Doc] Mark spec for prefetch extension as supported (intel#14735)
According to the release notes, the extension was implemented by intel@e7139b0, intel@0229456 and intel@b5d69df.
Co-authored-by: Greg Lueck
Commit c79c3df
Commits on Jul 31, 2024
[NFC] Reflow files in buildbot and sycl-fusion dirs (intel#14792)
Run `black` on python files in buildbot and fusion directories. Those files skipped the original formatting effort, so any change to them now would cause a formatting CI job to fail.
Commit 300b1f8
[SYCL][E2E] Fix bindless images tests to run on Level Zero devices (intel#14779)
Signed-off-by: Neil R. Spruit
Commit 1354ff2
[NFC][Bindless] Add a test of DX12 interop without semaphore (intel#14790)
This allows testing of DX12 interop in the L0 backend, which doesn't support semaphore importing yet. Fix getDX12Adapter to increment the index when a software adapter is seen.
Commit 20351f5
Commit a66958b
[DeviceSanitizer] Force outline call for setting private shadow memory (intel#14818)
By default, address sanitizer will inline the call for setting private shadow memory when the size is small. However, if the work group size is too large, allocation of the private shadow memory may fail. We need to check whether the shadow base is null before trying to poison it.
Co-authored-by: Yang Zhao
Commit f203826
[SYCL][COMPAT] Disable memory_async.cpp tests (intel#14855)
These are failing intermittently, possibly due to a runtime race condition.
Commit 7886c87
Commit d0415e0
[SYCL][E2E] Fix another matrix XPASS (intel#14866)
Passing on Arc in postcommit: https://github.com/intel/llvm/actions/runs/10180580875/job/28159431987
Signed-off-by: Sarnie, Nick
Commit 98beefd
[SYCL][ESIMD][E2E] Disable two tests hanging on Windows (intel#14869)
intel#14868
Signed-off-by: Sarnie, Nick
Commit 2dfdf1d
[SYCL][E2E] Fix SG32/joint_matrix_prefetch.cpp (intel#14870)
Feedback from intel#14866
Signed-off-by: Sarnie, Nick
Commit ff7cf16
[SYCL][XPTI] Fix off-by-one error in USMAnalyzer (intel#13936)
When looking for the correct allocation, the upper bound check was inclusive (`Ptr <= Alloc.first + Alloc.second.Length`). If we have two allocations back-to-back, the pointer to the beginning of the second allocation would incorrectly be determined as belonging to the first allocation. This caused false-positive errors about out-of-bounds memory operations.
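
The difference between the inclusive and the exclusive upper bound can be illustrated with a small stand-alone sketch. It is not the USMAnalyzer code: `AllocMap`, `ownerInclusive`, `ownerHalfOpen`, and the addresses are hypothetical, and the sketch assumes a linear scan over allocations in address order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical allocation table: base address -> length in bytes.
using AllocMap = std::map<std::uintptr_t, std::size_t>;

// Inclusive upper bound (the off-by-one): the one-past-the-end address is
// treated as part of the allocation. Returns the owner's base, or 0 if none.
inline std::uintptr_t ownerInclusive(const AllocMap &Allocs,
                                     std::uintptr_t Ptr) {
  for (const auto &Alloc : Allocs)
    if (Ptr >= Alloc.first && Ptr <= Alloc.first + Alloc.second)
      return Alloc.first;
  return 0;
}

// Half-open range [Begin, Begin + Length): the one-past-the-end address is
// outside the allocation. Returns the owner's base, or 0 if none.
inline std::uintptr_t ownerHalfOpen(const AllocMap &Allocs,
                                    std::uintptr_t Ptr) {
  for (const auto &Alloc : Allocs)
    if (Ptr >= Alloc.first && Ptr < Alloc.first + Alloc.second)
      return Alloc.first;
  return 0;
}

// Two back-to-back allocations: the second starts where the first ends.
inline void backToBackExample() {
  const AllocMap Allocs{{0x1000u, 0x100u}, {0x1100u, 0x100u}};
  assert(ownerInclusive(Allocs, 0x1100u) == 0x1000u); // misattributed
  assert(ownerHalfOpen(Allocs, 0x1100u) == 0x1100u);  // correct owner
}
```

With the inclusive check, any operation starting at `0x1100` is measured against the first allocation and appears to run past its end, which is how a false out-of-bounds report can arise.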
Commit e4620f6
Commits on Aug 1, 2024
[SYCL][Doc] Fix ptrdiff_t type namespace qualifier (intel#14888)
ptrdiff_t is declared in std namespace. Define syclex namespace alias used in the usage examples.
Commit 6532637
[SYCL][CUDA] Implement root group barrier (intel#14828)
This PR adds an algorithm for doing a GPU-wide barrier in the CUDA backend. Rough outline of the algorithm:
- Every `0th` thread from each workgroup performs `atomic.add(1)`.
- The same thread checks the atomic result with `ld.acquire` in a loop until it's equal to the total number of workgroups.
- All threads call group-wide `barrier.sync`.
One caveat is that the atomic start value is never initialized. If this barrier is called several times in a kernel, then on the second call the counter already holds the result of the previous barrier. That's why the while loop actually spins while `current value % totalWgroups != 0`.
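
The reason for the modulo test can be traced on the host. The sketch below is a single-threaded trace of the counter arithmetic only, not a barrier implementation and not the CUDA code from the PR; `releasedModulo`, `releasedEquals`, and `traceBarriers` are hypothetical names, and arrivals of consecutive barriers are assumed not to interleave.

```cpp
#include <cassert>
#include <vector>

// Release test described in the commit message: a leader stops spinning
// once the counter is a multiple of the number of work-groups.
inline bool releasedModulo(unsigned Counter, unsigned NumGroups) {
  assert(NumGroups > 0);
  return Counter % NumGroups == 0;
}

// Naive release test, correct only if the counter starts at zero.
inline bool releasedEquals(unsigned Counter, unsigned NumGroups) {
  return Counter == NumGroups;
}

// Replays the leader arrivals of `Barriers` consecutive barriers on one
// counter that is never reset. For each barrier, records the 1-based
// arrival at which the release test first passed, or 0 if it never did.
template <typename ReleaseTest>
std::vector<unsigned> traceBarriers(unsigned NumGroups, unsigned Barriers,
                                    ReleaseTest Released) {
  std::vector<unsigned> FirstRelease;
  unsigned Counter = 0; // shared counter, deliberately never reset
  for (unsigned B = 0; B < Barriers; ++B) {
    unsigned ReleasedAt = 0;
    for (unsigned Arrival = 1; Arrival <= NumGroups; ++Arrival) {
      ++Counter; // stands in for atomic.add(1) by one work-group leader
      if (ReleasedAt == 0 && Released(Counter, NumGroups))
        ReleasedAt = Arrival;
    }
    FirstRelease.push_back(ReleasedAt);
  }
  return FirstRelease;
}
```

For four work-groups and three barriers, the modulo test releases each barrier exactly on the fourth arrival, while the equality test releases only the first barrier and would spin forever on the later ones.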
Commit 132f763
[SYCL][AMD] Propagate metadata in createURProgram (intel#14831)
SYCL properties weren't converted when calling createURProgram, leading to an issue during finalization in KernelFusion for AMD.
Fixes intel#14841
Commit 41d8977
Commit 914561a
[sycl-post-link] Fix spec constant pattern match for DeviceSanitizer (intel#14740)
Adjust the spec constant pattern match for the base alloca + offset case in device sanitizer. Address sanitizer merges static allocas into a large layout base alloca, and the original alloca is replaced with base + offset.
Commit 623bf14
[UR][DeviceSanitizer] Enable Symbolizer for UR sanitizer layer (intel#14513)
UR Part: oneapi-src/unified-runtime#1844
Commit 56b1410
Bump UR tag to include PrintTrace fix (intel#14728)
Tests oneapi-src/unified-runtime#1884
Fixes intel#14704
Fixes intel#14721
Commit 8411a7b
[SYCL] Fix XPTI/basic_event_collection_linux failure (intel#14857)
The test was expecting 'kernel_name' metadata on an edge_create event, which should not exist. It was still sometimes matching anyway, but that appears to have been a fluke and not intended behavior.
Fixes intel#14744
Commit 495645c
[SYCL][Doc] Cluster Group Extension Document (intel#13594)
Initial public working draft for thread block cluster support in SYCL, intended to get feedback. Contains the proposal for:
1. Launching a kernel with cluster group
2. Accessing the various `ids` associated with the cluster_group from the kernel
3. Cluster level barrier
4. Accessing another workgroup's local memory
Co-authored-by: Greg Lueck
Co-authored-by: Gordon Brown
Co-authored-by: John Pennycook
Co-authored-by: Ruyman
Commit 20bcfea
[SYCL][Graph] Add specification for kernel binary updates
Adds the kernel binary update feature to the SYCL graph specification. This introduces a new dynamic_command_group class which can be used to update the command-group function of kernel nodes in graphs.
Commit 4637510