Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL][Graph] 3D kernel update regression test #373

Closed
wants to merge 55 commits into from

Conversation

EwanC
Copy link
Collaborator

@EwanC EwanC commented Jun 3, 2024

Add an E2E regression test for updating kernel nodes with 3 dimensions. Test contains a graph with two nodes, the first node with an NDRange containing a user specified local size, and the second node containing a Range with implementation determined local size.

KseniyaTikhomirova and others added 30 commits June 4, 2024 09:15
This PR helps to avoid hardcoded path symbols and let OS dependent tools
to work with paths.
Original version produces `path/....\lib\xptifw.lib` which causes skip
of library linkage on windows.
`os.path.normpath` is the main change here that helps to handle path
separators properly.

Signed-off-by: Tikhomirova, Kseniya <[email protected]>
…el#14011)

This PR migrates changes to `get_version`, `get_major_version` and
`get_minor_version`, adds new free function equivalents which accept a
`sycl::device` argument, and adds relevant tests and documentation.

---------

Signed-off-by: Joe Todd <[email protected]>
…#13987)

Bump UR L0 commit to
oneapi-src/unified-runtime#1694 so that the SYCL
device aspect for supporting update in graphs is correctly reported for
L0 devices. Currently, support can be incorrectly reported.

---------

Co-authored-by: Kenneth Benzie (Benie) <[email protected]>
Running the `Graph/Update` E2E tests on Level Zero with
`UR_L0_LEAKS_DEBUG=1` shows that we are leaking a PI kernel and module.

On investigation this was because we are retaining these objects in
`getOrCreateKernel()` but not releasing them. Added release calls
similar to how it is done in
[enqueueImpCommandBufferKernel](https://github.com/intel/llvm/blob/b49303c7e13ca0a69454eaaaeb8c3d094916218d/sycl/source/detail/scheduler/commands.cpp#L2550)
by the scheduler
This upgrades the docker to use the cuda 12.5 image.

I've ran the test-e2e locally using cuda 12.5 and all is well. cuda 12.5
also fixed an issue introduced by the cuda 12.4 driver: see
intel#13661 (comment)

Signed-off-by: JackAKirk <[email protected]>
…lock_load/slm_block_store APIs accepting simd_view (intel#13978)

Co-authored-by: Nick Sarnie <[email protected]>
… build libdevice with thinLTO (intel#14036)

This is the first change in my work on thinLTO for SYCL.

---------

Signed-off-by: Sarnie, Nick <[email protected]>
Scheduled igc dev drivers uplift

Co-authored-by: GitHub Actions <[email protected]>
…mbiguity (intel#14018)

This change avoids the ambiguity between the deprecated
`sycl::ext::oneapi::sub_group` and `sycl::sub_group` when both
namespaces are used. This fixes a failure on windows for cuda.

---------

Signed-off-by: JackAKirk <[email protected]>
Supported matrix dimensions are queried from the device, and inform the
tests which tile sizes one can use.

This is a subset of all tests that are planned to be modified.

Test manually tested on PVC and SPR

---------

Co-authored-by: Yury Plyakhin <[email protected]>
Supported matrix dimensions are queried from the device, and inform the
tests which tile sizes one can use.

This is a subset of all tests that are planned to be modified.

Test manually tested on PVC and SPR - no new regresssions

The following tests have been marked as XFAIL on all platforms. I
removed them from XMX8 folder. Once they are passing then they can be
modified to query the supported matrix dimensions form the device.
* joint_matrix_colA_rowB_colC.cpp
* joint_matrix_out_bounds.cpp
* joint_matrix_unaligned_k.cpp
By some reason, we used to only emit unused member functions if they
are explicitly annotated with `sycl_device` attribute (through
`SYCL_EXTERNAL` macro).

This logic was introduced in 3baec18
and there is no clear indication as to why exactly we have a check that
the attribute is explicit.

SYCL extension for virtual functions introduces an alternative markup
for specifying which function and that markup is SYCL compile-time
properties that we turn into attributes implicitly under the hood.

Essentially, we now have a situation where an implicit `sycl_device`
attribute on a member function should be treated as an explicit one,
because it could be a result of SYCL compile-time property being applied
to that method.

Considering our current codebase, it seems like we intend to
have member function to be emitted in all cases where 
`sycl_device` is being implicitly added and therefore this patch removes
the requirement for the attribute to be explicit.
This PR adds functionalities for:
* Listing devices in stdout
* Filtering devices

Tests and docs updated accordingly.

---------

Signed-off-by: Alberto Cabrera <[email protected]>
Co-authored-by: Joe Todd <[email protected]>
…intel#14015)

This PR adds a `wait_and_free` func. This makes it safer and less likely
to release memory during or before it is used by enqueued commands.

`async_free` is renamed `enqueue_free`, to make its behaviour clearer

This PR updates the comments and tests accordingly
…d to run on Windows (intel#13957)

[Windows doesn't support
cudaMemPrefetchAsync()](bitsandbytes-foundation/bitsandbytes#453)
which is used in the call to `prefetch` in the test.

[urEnqueueUSMPrefetch](https://github.com/oneapi-src/unified-runtime/blob/c0c607c3a88933b4c5c20a0aca4539781c678411/source/adapters/cuda/enqueue.cpp#L1629)
is also commented with a note for not having the support for CUDA on
Windows.
temp fix for problems from cuda 12.5 uplift that were caused by
intel#14049. Should fix
intel#14071

---------

Signed-off-by: JackAKirk <[email protected]>
…e exception message (intel#14055)

- C++ thrown exception message not shown when running from Windows
terminal.
- The patch fixes
[cuda-max-local-mem-size.cpp](https://github.com/intel/llvm/blob/sycl/sycl/test-e2e/Plugin/cuda-max-local-mem-size.cpp)
test failure.
… GPUs through OpenCL. (intel#14072)

Extend the `sycl-ls-gpu-default.cpp` test to cover the support of Intel
GPUs through OpenCL.
The patch fixes the failure when running the test on a system with Intel
and CUDA gpus.
Fixes ansi-alias violation and reads from uninitialized buffers. Fixes
intel#13790.
…UDA (intel#14058)

Fails in Nightly testing on the self-hosted CUDA runner:
intel#12995.
fineg74 and others added 25 commits June 6, 2024 20:38
…date ACC API accepting simd_view (intel#14065)

Co-authored-by: Nick Sarnie <[email protected]>
…ntel#14088)

CODEOWNERS seems to be missing a line attributing
`sycl/test/check_device_code/matrix` tests to
intel/sycl-matrix-reviewers (As per [this
discussion](intel#14063 (comment))).
This PR remedies this.

Although, I noticed the current CODEOWNERS section for the matrix
reviewers uses paths; Let me know if I should use `sycl/**/matrix`
instead.
The runner seems to be broken, don't run the tests until it's fixed.
This PR
1. Updates extension to add math builtins for `vec<bfloat16>` and
corresponding swizzles.
2. Implements **unoptimized** support for `vec<bloat16>`/swizzles in
math builtins and adds a test case for the same.
3. Adds a test to check the device code generated for `vec<bloat16>`
math builtins.

I will make a follow-up PR to optimize `vec<bfloat16>` math builtins. I
think we can use elementwise builtins for `ext_vector_type`
(https://clang.llvm.org/docs/LanguageExtensions.html#vectors-and-extended-vectors)
to optimize `vec<bfloat16>` math builtins. The device code test case
will help visualizing/reviewing math builtin optimizations.
…l#13943)

Changed CUDA sycl/test/check_device_code lit test cases to use
SYCL_EXTERNAL functions instead of submitting kernels to the queue
everytime.
Bindless Images should now properly work on Windows, with the exception
of Vulkan interop, which requires extra work. Required a few fixes to
non-conformant C++ code.
This was tracked down to a bug in ROCm that seems to be fixed with newer
versions, and the CI is now on ROCm 6+ so these should be fine.

ROCm ticket: ROCm/clr#13

The reduce over group test works on W6800 and MI210, but it seems for
gfx1031 it reports not supporting shared USM, note that HIP for gfx1031
isn't officially supported by AMD.

---------

Co-authored-by: Steffen Larsen <[email protected]>
Added SYCL Module Splitting as a library. ESIMD splitting is not present
in this patch and will be added in an upcoming patch.
Added particular testing tool sycl-module-split that invokes added
functionality.

Not all `device-code-split` tests were updated in this patch because the rest
of them (mostly) don't test module splitting itself. They will be migrated in an
upcoming patch.
This PR adds math `extend_v*2` operators _(18 in total)_ along with
unit-tests for signed and unsigned `int32` cases.

---------

Co-authored-by: Alberto Cabrera Pérez <[email protected]>
Co-authored-by: Joe Todd <[email protected]>
…ntel#14017)

intel#13512 implemented the
sycl_ext_oneapi_enqueue_functions extension. Following this, the
corresponding extension document is moved to experimental and the
feature test macro is defined.

---------

Signed-off-by: Larsen, Steffen <[email protected]>
…14081)

We don't actually need cpp20, this was left over from a previous
iteration of the test.

Signed-off-by: Sarnie, Nick <[email protected]>
(fixed issues with post commit testing)

---------

Co-authored-by: jason1.li <[email protected]>
…13974)

After intel#13486, aspect name information is visible in `sycl-post-link`
without the use of `!sycl_aspects`, so this PR updates `sycl-post-link`
to use the aspect names that are now available within the
`!sycl_used_aspects` metadata instead of `!sycl_aspects`.

Aditionally, this PR also adds E2E related to optional kernel features
for AOT enabled by these changes
Add a proposed extension specification for `work_group_memory`, a
lighter weight API to allocate device local memory for an nd-range
kernel.

Also related, add a list of restrictions that, when followed, provide a
guarantee that a kernel written in the free-function kernel syntax can
be launched directly via Level Zero or OpenCL.
… in the SPEC (intel#13947)

Follow-up of and blocked by: intel#13945

For `vec<std::byte>`, we only allow math operations that are valid on
`std::byte` itself. (https://en.cppreference.com/w/cpp/types/byte)
…13975)

Currently `std::byte*` scratch pointer is not aligned and
`reinterpret_cast`ed as `T*` where type `T` may have alignment
requirement different from `byte*`, this is UB.

As a solution, use `std::align` to align the required buffer in the
scratch and use placement `new` so that dynamic type of the buffer in
the scratch will be `T*`.
This change is intended to fix a CI failure on the OpenCL backend for
the `memory_management_test3`. This is because the memory is allocated
on a different q to the one it is released on.
the address pointer should be the USM pointer
Add an E2E regression test for updating kernel nodes
with 3 dimensions. Test contains a graph with two nodes,
the first node with an NDRange containing a user specified
local size, and the second node containing a Range with
implementation determined local size.
@EwanC EwanC force-pushed the ewan/update_3d_regression branch from 3d8edb2 to 4e7f2dc Compare June 10, 2024 08:07
@EwanC
Copy link
Collaborator Author

EwanC commented Jun 10, 2024

Closing in favor of upstream PR intel#14110 since this test is a confirmed reproducer

@EwanC EwanC closed this Jun 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.