Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reconsider the use of forwarding references throughout all SYCL backend kernel submitters #2041

Open
mmichel11 opened this issue Feb 3, 2025 · 0 comments
Labels

Comments

@mmichel11
Copy link
Contributor

Describe the Bug:
Almost all of oneDPL's SYCL backend kernel submitters rely on forwarding references of the execution policy and any ranges used within the kernel. Here is an example from our reduce submitter:

template <typename _ExecutionPolicy, typename _Size, typename _ReduceOp, typename _TransformOp, typename... _Ranges>
auto
operator()(oneapi::dpl::__internal::__device_backend_tag, _ExecutionPolicy&& __exec, const _Size __n,
    const _Size __work_group_size, const _Size __iters_per_work_item, _ReduceOp __reduce_op,
    __transform_op,
    const __result_and_scratch_storage<_ExecutionPolicy, _Tp>& __scratch_container,
    _Ranges&&... __rngs) ...

With these forwarding references, different cv / ref qualifiers on the same execution policy will lead to separate function template instantiations by the compiler. Similarly, the same can be said for the ranges.

With these separate submitter function template instantiations, a new kernel is compiled per instantiation. However, _ExecutionPolicy&& __exec is only used for kernel submission, so logically no new kernel is needed. Similarly, for ranges, these will be passed in as lightweight views, and we may be able to just accept these by-value.

If the user is compiling with unnamed lambda naming and the submitter is using our internal "kernel name provider", then I think there is no risk of a compilation error here. However, we may compile more kernels than what is logically necessary depending on how the user passes policies / ranges. This will lead to long JIT / AOT compile times.

If the user is trying to name kernels themselves or the underlying submitter is using our "kernel compiler" internal API for kernel bundles, then we may see compilation errors regarding duplicate kernel names.

To Reproduce:
The following results in compilation error using no unnamed lambdas (icpx 2025.0.0 and oneDPL 2022.7.0) despite the same underlying policy being used:

// icpx -fsycl -fno-sycl-unnamed-lambda reduce.cpp
// We will see a compilation error.
#include <oneapi/dpl/execution>
#include <oneapi/dpl/algorithm>

#include <iostream>

int main()
{
    sycl::queue q;
    int n = 10;
    int *ptr = sycl::malloc_shared<int>(n, q);
    q.fill(ptr, 1, n).wait();

    oneapi::dpl::execution::device_policy<class kernel> policy{q};

    auto res1 = oneapi::dpl::reduce(policy, ptr, ptr + n);
    auto res2 = oneapi::dpl::reduce(std::move(policy), ptr, ptr + n);

    std::cout << res1 << " " << res2 << std::endl;
}

results in the following error:

In file included from reduce.cpp:1:
In file included from /opt/intel/oneapi/dpl/2022.7/include/oneapi/dpl/execution:67:
In file included from /opt/intel/oneapi/dpl/2022.7/include/oneapi/dpl/pstl/algorithm_impl.h:26:
In file included from /opt/intel/oneapi/dpl/2022.7/include/oneapi/dpl/pstl/execution_impl.h:22:
In file included from /opt/intel/oneapi/dpl/2022.7/include/oneapi/dpl/pstl/parallel_backend.h:32:
In file included from /opt/intel/oneapi/dpl/2022.7/include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl.h:38:
/opt/intel/oneapi/dpl/2022.7/include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl_reduce.h:149:17: error: definition with same mangled name '_ZTSN6oneapi3dpl20__par_backend_hetero21__reduce_small_kernelIJZ4mainE6kernelEEE' as another definition
  149 |                 [=](sycl::nd_item<1> __item_id) {
      |                 ^
/opt/intel/oneapi/dpl/2022.7/include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl_reduce.h:149:17: note: previous definition is here
/opt/intel/oneapi/dpl/2022.7/include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl_reduce.h:214:17: error: definition with same mangled name '_ZTSN6oneapi3dpl20__par_backend_hetero26__reduce_mid_device_kernelIJZ4mainE6kernelEEE' as another definition
  214 |                 [=](sycl::nd_item<1> __item_id) {
      |                 ^
/opt/intel/oneapi/dpl/2022.7/include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl_reduce.h:214:17: note: previous definition is here
/opt/intel/oneapi/dpl/2022.7/include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl_reduce.h:262:17: error: definition with same mangled name '_ZTSN6oneapi3dpl20__par_backend_hetero30__reduce_mid_work_group_kernelIJZ4mainE6kernelEEE' as another definition
  262 |                 [=](sycl::nd_item<1> __item_id) {
      |                 ^
/opt/intel/oneapi/dpl/2022.7/include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl_reduce.h:262:17: note: previous definition is here
/opt/intel/oneapi/dpl/2022.7/include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl_reduce.h:377:21: error: definition with same mangled name '_ZTSN6oneapi3dpl20__par_backend_hetero15__reduce_kernelIJZ4mainE6kernelEEE' as another definition
  377 |                     [=](sycl::nd_item<1> __item_id) {
      |                     ^
/opt/intel/oneapi/dpl/2022.7/include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl_reduce.h:377:21: note: previous definition is here
4 errors generated.

Similarly for the above case with unnamed lambdas, we compile more kernels than is necessary. Here are the kernel names produced from a shader dump using "lazy compilation mode" (-fsycl-device-code-split=per_kernel) which shows that two reduction kernels are compiled when only one is needed:

Kernel 1:

.kernel "_ZTSZZNK6oneapi3dpl20__par_backend_hetero43__parallel_transform_reduce_small_submitterIiSt17integral_constantIbLb1EELh4ENS1_10__internal22__optional_kernel_nameIJEEEEclINS0_9execution5__dpl13device_policyINSB_17DefaultKernelNameEEEtSt4plusIiENS0_13unseq_backend6walk_nISE_NS0_10__internal7__no_opEEENSH_12__init_valueIiEEJNS0_8__ranges10guard_viewIPiEEEEEDaNSJ_20__device_backend_tagEOT_T0_SV_SV_T1_T2_T3_DpOT4_ENKUlRN4sycl3_V17handlerEE_clES15_EUlNS13_7nd_itemILi1EEEE_"

Kernel 2:

.kernel "_ZTSZZNK6oneapi3dpl20__par_backend_hetero43__parallel_transform_reduce_small_submitterIiSt17integral_constantIbLb1EELh4ENS1_10__internal22__optional_kernel_nameIJEEEEclIRNS0_9execution5__dpl13device_policyINSB_17DefaultKernelNameEEEtSt4plusIiENS0_13unseq_backend6walk_nISF_NS0_10__internal7__no_opEEENSI_12__init_valueIiEEJNS0_8__ranges10guard_viewIPiEEEEEDaNSK_20__device_backend_tagEOT_T0_SW_SW_T1_T2_T3_DpOT4_ENKUlRN4sycl3_V17handlerEE_clES16_EUlNS14_7nd_itemILi1EEEE_"
@mmichel11 mmichel11 added the bug label Feb 3, 2025
mmichel11 added a commit that referenced this issue Feb 3, 2025
By adding `_ExecutionPolicy` into the kernel name, we can work around the duplicate kernel name issue in reduce-then-scan based algorithms. However, a library wide solution is still needed for #2041

---------

Signed-off-by: Matthew Michel <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant