[SYCL][Graph] Add specification for kernel binary update #378

Closed
wants to merge 33 commits into from

Commits on Jul 29, 2024

  1. [SYCL] Fix use of removed ArchType enum (intel#14833)

    Two concurrent PRs conflicted over this enum: commit 63c61d8 added a
    new use of it, while dc37699 removed it.
    frasercrmck authored Jul 29, 2024
    e664798

Commits on Jul 30, 2024

  1. [SYCL][COMPAT] New launch API to enable passing kernel & launch properties (intel#14441)
    
    This PR defines a new user-facing struct `launch_strategy`, and two new
    `launch` overloads (currently in `syclcompat::experimental`) which
    accept a `launch_strategy`.
    
    ## Extensions & Properties
    
    This work builds on top of the
    [kernel_properties](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_kernel_properties.asciidoc)
    and
    [enqueue_functions](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_enqueue_functions.asciidoc)
    extensions. The latter defines APIs for passing `launch_properties` as
    part of a `launch_config` object. These are the `parallel_for` and
    `nd_launch` overloads used by the new `launch`.
    
    See the
    [note](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_oneapi_enqueue_functions.asciidoc#launch-configuration)
    in the Launch configuration section which describes how
    `kernel_properties` must be passed via a `get(properties_tag)` method of
    a kernel functor.
    
    ## Local Memory
    
    Note also that in order to properly handle local memory, we **must**
    construct the `KernelFunctor` object within the `cgh` lambda, passing in
    a `local_accessor` to the constructor. Then within
    `KernelFunctor::operator()` (the SYCL 'kernel') we can at last grab the
    local memory pointer with
    `local_acc.get_multi_ptr<sycl::access::decorated::no>()`, since
    CUDA-style device functions expect to receive their dynamic local memory
    as a `char *`.
    
    ---------
    
    Signed-off-by: Joe Todd <[email protected]>
    joeatodd authored Jul 30, 2024
    599fcd0
  2. [SYCL] Mark ASAN tests that are failing the nightly as unsupported. (intel#14820)

    These failures are due to a known regression introduced by the PI
    removal patch. We have a fix, but for now it is more expedient to
    disable the tests and unblock the nightly workflow.
    aarongreig authored Jul 30, 2024
    0f7b261
  3. [SYCL][Bindless][Doc] Rename interop related structs/funcs to external (intel#14444)

    Rename interop-related structs/funcs to use the "external" keyword
    instead of "interop", to align better with existing structs/funcs and
    other third-party APIs.
    
    Remove "handle" keyword from imported external memory/semaphore objects
    to distinguish between 3rd party external handles and imported external
    handles.
    
    ---------
    
    Co-authored-by: Sean Stirling <[email protected]>
    Co-authored-by: chedy.najjar <[email protected]>
    3 people authored Jul 30, 2024
    95604ae
  4. [SYCL][ESIMD] Move spirv global translation out of the function processing to improve compilation time (intel#14786)

    fineg74 authored Jul 30, 2024
    ff35d2f
  5. [GHA] Uplift Linux GPU RT version to 24.26.30049.6 (intel#14838)

    Scheduled drivers uplift
    
    Co-authored-by: GitHub Actions <[email protected]>
    bb-sycl and actions-user authored Jul 30, 2024
    1bc6751
  6. [SYCL] Rename detail::memcpy to detail::memcpy_no_adl (intel#14836)

    `detail::memcpy`, even though in a different namespace, can cause
    ambiguity with libc's `memcpy`, due to argument dependent lookup (ADL).
    For example, the compiler throws a compilation error due to `memcpy`
    ambiguity in the following code:
    ```
    #include <sycl/vector.hpp>
    
    template <typename T>
    void foo(T *dst, T *src, size_t count) {
              memcpy(dst, src, count * sizeof(T));
    }
    
    using T = sycl::vec<int, 1>;
    
    SYCL_EXTERNAL void bar(T *dst, T *src, size_t count) {
              foo(dst, src, count * sizeof(T));
    }
    ```
    
    Compilation error:
    ```
    memcpy_test.cpp:5:4: error: call to 'memcpy' is ambiguous
        5 |           memcpy(dst, src, count * sizeof(T));
          |           ^~~~~~
    memcpy_test.cpp:11:4: note: in instantiation of function template specialization 'foo<sycl::vec<int, 1>>' requested here
       11 |           foo(dst, src, count * sizeof(T));
          |           ^
    /usr/include/string.h:43:14: note: candidate function
       43 | extern void *memcpy (void *__restrict __dest, const void *__restrict __src,
          |              ^
    llvm/build/bin/../include/sycl/detail/memcpy.hpp:16:13: note: candidate function
       16 | inline void memcpy(void *Dst, const void *Src, size_t Size) {
          |             ^
    1 error generated.
    ```
    
    To fix this error, this PR renames `detail::memcpy` to
    `detail::memcpy_no_adl`
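    
    The effect of the rename can be reproduced without SYCL. The sketch
    below is only an illustration: the `lib` namespace, `vec_base`, and
    `vec` are hypothetical stand-ins, and it assumes the internal namespace
    becomes an associated namespace through a base class (whether
    `sycl::vec` does exactly this is not stated in this PR). With the helper
    named `memcpy_no_adl`, the unqualified `memcpy` call in `foo` resolves
    only to libc's `memcpy`.
    
    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <string.h> // declares ::memcpy in the global namespace
    
    namespace lib {
    namespace detail {
    // Hypothetical empty base class. Deriving from it makes lib::detail an
    // associated namespace of lib::vec for argument-dependent lookup.
    struct vec_base {};
    
    // Had this helper been named `memcpy`, the unqualified call in foo()
    // below would be ambiguous with libc's memcpy.
    inline void memcpy_no_adl(void *Dst, const void *Src, std::size_t Size) {
      auto *D = static_cast<unsigned char *>(Dst);
      const auto *S = static_cast<const unsigned char *>(Src);
      for (std::size_t I = 0; I < Size; ++I)
        D[I] = S[I];
    }
    } // namespace detail
    
    // Hypothetical stand-in for a library vector type.
    struct vec : detail::vec_base {
      int value;
    };
    } // namespace lib
    
    template <typename T>
    void foo(T *dst, const T *src, std::size_t count) {
      // Unqualified call: ordinary lookup finds ::memcpy, and ADL also
      // searches lib and lib::detail because T is lib::vec.
      memcpy(dst, src, count * sizeof(T));
    }
    ```
    
    Calling `foo` on two `lib::vec` arrays copies the elements as expected;
    renaming `memcpy_no_adl` back to `memcpy` in this sketch should bring
    back an ambiguity error of the kind shown above.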
    uditagarwal97 authored Jul 30, 2024
    a574ce6
  7. [SYCL][ESIMD] Fix driver check in two tests (intel#14832)

    When using L0 we always see the x.y.zzzzz style version, even on
    Windows. These tests were incorrectly running on Windows because of this
    problem.
    
    Signed-off-by: Sarnie, Nick <[email protected]>
    sarnex authored Jul 30, 2024
    70268e6
  8. [SYCL][E2E] Add -Werror flag to sycl e2e tests (intel#14689)

    This patch adds the `-Werror` flag to all SYCL e2e tests to stop the
    introduction of new warnings.
    
    Existing tests that produce warnings either get a `-Wno-error=` flag or
    were changed to resolve the warnings.
    ayylol authored Jul 30, 2024
    e0a222f
  9. 822d63e
  10. f990a8a
  11. [SYCL][Doc] Mark spec for prefetch extension as supported (intel#14735)

    According to the release notes, the extension was implemented by
    intel@e7139b0, intel@0229456, and intel@b5d69df.
    
    ---------
    
    Co-authored-by: Greg Lueck <[email protected]>
    bader and gmlueck authored Jul 30, 2024
    c79c3df

Commits on Jul 31, 2024

  1. [NFC] Reflow files in buildbot and sycl-fusion dirs (intel#14792)

    Run `black` on Python files in the buildbot and fusion directories.
    Those files were skipped by the original formatting effort, so any
    change to them now would cause a formatting CI job to fail.
    jchlanda authored Jul 31, 2024
    300b1f8
  2. 1354ff2
  3. [NFC][Bindless] Add a test of DX12 interop without semaphore (intel#14790)

    This allows testing DX12 interop on the L0 backend, which does not yet
    support semaphore importing.
    Also fix getDX12Adapter to increment the index when a software adapter
    is seen.
    wenju-he authored Jul 31, 2024
    20351f5
  4. a66958b
  5. [DeviceSanitizer] Force outline call for setting private shadow memory (intel#14818)

    By default, address sanitizer inlines the call that sets private shadow
    memory when the size is small. However, if the work-group size is too
    large, allocating the private shadow memory may fail. We need to check
    whether the shadow base is null before trying to poison it.
    
    ---------
    
    Co-authored-by: Yang Zhao <[email protected]>
    zhaomaosu and AllanZyne authored Jul 31, 2024
    f203826
  6. [SYCL][COMPAT] Disable memory_async.cpp tests (intel#14855)

    These are failing intermittently, possibly due to a runtime race
    condition.
    joeatodd authored Jul 31, 2024
    7886c87
  7. d0415e0
  8. 98beefd
  9. 2dfdf1d
  10. [SYCL][E2E] Fix SG32/joint_matrix_prefetch.cpp (intel#14870)

    Feedback from intel#14866
    
    Signed-off-by: Sarnie, Nick <[email protected]>
    sarnex authored Jul 31, 2024
    ff7cf16
  11. [SYCL][XPTI] Fix off-by-one error in USMAnalyzer (intel#13936)

    When looking for the allocation that contains a pointer, the upper-bound
    check was inclusive (`Ptr <= Alloc.first + Alloc.second.Length`). With
    two back-to-back allocations, a pointer to the beginning of the second
    allocation was incorrectly attributed to the first. This caused
    false-positive errors about out-of-bounds memory operations.
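    
    A minimal sketch of the corrected lookup, assuming a map from base
    address to allocation record. `AllocInfo`, `AllocMap`, and `findOwner`
    are hypothetical names for illustration, not the analyzer's actual
    code; only the `Alloc.first + Alloc.second.Length` expression comes
    from the description above.
    
    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <cstdint>
    #include <map>
    
    // Hypothetical stand-in for the analyzer's allocation record.
    struct AllocInfo {
      std::size_t Length;
    };
    
    // Maps the base address of each allocation to its record.
    using AllocMap = std::map<std::uintptr_t, AllocInfo>;
    
    // Returns the base address of the allocation containing Ptr, or 0 if
    // there is none. An allocation covers the half-open range
    // [Base, Base + Length).
    inline std::uintptr_t findOwner(const AllocMap &Allocs, std::uintptr_t Ptr) {
      for (const auto &Alloc : Allocs) {
        const std::uintptr_t Begin = Alloc.first;
        const std::uintptr_t End = Alloc.first + Alloc.second.Length;
        assert(End >= Begin && "allocation range must not wrap around");
        // An inclusive check (Ptr <= End) would also claim the first byte
        // of an allocation that starts exactly at End.
        if (Ptr >= Begin && Ptr < End)
          return Begin;
      }
      return 0;
    }
    ```
    
    With allocations at `0x1000` and `0x1100`, each `0x100` bytes long,
    `findOwner` attributes `0x1100` to the second allocation; an inclusive
    upper bound would have matched the first one instead.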
    al42and authored Jul 31, 2024
    e4620f6

Commits on Aug 1, 2024

  1. [SYCL][Doc] Fix ptrdiff_t type namespace qualifier (intel#14888)

    ptrdiff_t is declared in std namespace.
    
    Define syclex namespace alias used in the usage examples.
    bader authored Aug 1, 2024
    6532637
  2. [SYCL][CUDA] Implement root group barrier (intel#14828)

    This PR adds an algorithm for a GPU-wide barrier in the CUDA backend.
    
    Rough outline of the algorithm:
    - The 0th thread of each work-group performs `atomic.add(1)`.
    - The same thread then checks the atomic result with `ld.acquire` in a
      loop until it equals the total number of work-groups.
    - All threads call the group-wide `barrier.sync`.
    
    One caveat is that the barrier does not initialize the atomic start
    value. If the barrier is called several times in a kernel, then on the
    second call the start value already contains the result of the previous
    barrier. That is why the while loop actually spins while
    `current value % totalWgroups != 0`.
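    
    The modulo condition can be illustrated with a small host-side model.
    This is a sketch under assumptions, not the PR's device code:
    `RootBarrierModel` and its members are hypothetical names, a
    `std::atomic` counter stands in for the device atomic, and the model
    only tracks the counter arithmetic. It does not model the group-wide
    `barrier.sync` or the timing of leaders running concurrently.
    
    ```cpp
    #include <atomic>
    #include <cassert>
    #include <cstdint>
    #include <thread>
    
    // Hypothetical host-side model of the counter shared by all work-groups.
    class RootBarrierModel {
    public:
      explicit RootBarrierModel(std::uint32_t NumGroups)
          : Counter(0), TotalGroups(NumGroups) {
        assert(NumGroups > 0 && "need at least one work-group");
      }
    
      // Step 1: the leader (0th thread) of a work-group announces arrival.
      void arrive() { Counter.fetch_add(1, std::memory_order_release); }
    
      // Step 2 condition: the counter is never reset, so the barrier is
      // open whenever the running total is a multiple of the group count.
      // Only meaningful after the caller has arrived.
      bool released() const {
        return Counter.load(std::memory_order_acquire) % TotalGroups == 0;
      }
    
      // Step 2 as a spin loop, mirroring `current value % totalWgroups != 0`.
      void wait() const {
        while (!released())
          std::this_thread::yield();
      }
    
      // Current running total of arrivals across all barrier rounds.
      std::uint32_t count() const {
        return Counter.load(std::memory_order_relaxed);
      }
    
    private:
      std::atomic<std::uint32_t> Counter;
      const std::uint32_t TotalGroups;
    };
    ```
    
    With four groups, the model opens after the fourth arrival (counter 4)
    and again after the eighth (counter 8). A plain comparison against the
    group count would never be satisfied in the second round.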
    konradkusiak97 authored Aug 1, 2024
    132f763
  3. [SYCL][AMD] Propagate metadata in createURProgram (intel#14831)

    SYCL properties weren't converted when calling createURProgram, leading
    to an issue during finalization in KernelFusion for AMD.
    
    Fixes intel#14841
    Naghasan authored Aug 1, 2024
    41d8977
  4. 914561a
  5. [sycl-post-link] Fix spec constant pattern match for DeviceSanitizer (intel#14740)

    Adjust the spec constant pattern match for the base alloca + offset case
    in device sanitizer.
    Address sanitizer merges static allocas into one large layout base
    alloca, and each original alloca is replaced with base + offset.
    wenju-he authored Aug 1, 2024
    623bf14
  6. 56b1410
  7. 8411a7b
  8. [SYCL] Fix XPTI/basic_event_collection_linux failure (intel#14857)

    The test was expecting 'kernel_name' metadata on an edge_create event,
    which should not exist.
    
    The check still sometimes matched anyway, but that appears to have been
    a fluke and not intended behavior.
    
    Fixes intel#14744
    callumfare authored Aug 1, 2024
    495645c
  9. [SYCL][Doc] Cluster Group Extension Document (intel#13594)

    Initial public working draft for thread block cluster support in SYCL,
    intended to get feedback.
    
    Contains proposals for:
    1. Launching a kernel with cluster group
    2. Accessing the various `ids` associated with the cluster_group from
    the kernel
    3. Cluster level barrier
    4. Accessing another workgroup's local memory
    
    ---------
    
    Co-authored-by: Greg Lueck <[email protected]>
    Co-authored-by: Gordon Brown <[email protected]>
    Co-authored-by: John Pennycook <[email protected]>
    Co-authored-by: Ruyman <[email protected]>
    5 people authored Aug 1, 2024
    20bcfea
  10. [SYCL][Graph] Add specification for kernel binary updates

    Adds the kernel binary update feature to the SYCL Graph specification.
    This introduces a new dynamic_command_group class, which can be used
    to update the command-group function of kernel nodes in graphs.
    fabiomestre committed Aug 1, 2024
    4637510