diff --git a/sycl/doc/design/KernelFusionJIT.md b/sycl/doc/design/KernelFusionJIT.md index 224a1984d2902..8878e11ac6132 100644 --- a/sycl/doc/design/KernelFusionJIT.md +++ b/sycl/doc/design/KernelFusionJIT.md @@ -313,12 +313,44 @@ computing the private memory size. As range rounding only applies to basic kernels (parametrized by a `sycl::range`), local internalization is not affected by the range rounding transformation. +### Reductions + +Kernel fusion of reductions is partially supported. In order to preserve the +legality of the fused kernel, i.e., the fact that fused kernel must perform the +same work as the graph of kernels to be fused, only the fusion of following +reduction strategies at the time of writing is supported: + +- `group_reduce_and_last_wg_detection` +- `local_atomic_and_atomic_cross_wg` +- `range_basic` +- `group_reduce_and_atomic_cross_wg` +- `local_mem_tree_and_atomic_cross_wg` + +Other strategies require implicit inter-work-group synchronization, not +supported in kernel fusion. + +This way, users should not use `sycl::reduction` directly when performing kernel +fusion in their code, as an unsupported algorithm might be chosen. They should +instead use `sycl::detail::reduction_parallel_for`, forcing a supported fusion +strategy. Reductions implementation in +[`sycl/reduction.hpp`](../../include/sycl/reduction.hpp) might give users an +insight into which kind of reductions to use for their purposes: + +```c++ +q.submit([&](sycl::handler &cgh) { + sycl::accessor in(dataBuf, cgh, sycl::read_only); + sycl::reduction sum(sumBuf, cgh, sycl::plus<>{}); + // Force supported 'group_reduce_and_last_wg_detection' strategy + sycl::detail::reduction_parallel_for(...); +}); +``` + ### Unsupported SYCL constructs The following SYCL API constructs are currently not officially supported for kernel fusion and should be considered untested/unsupported: -- Reductions - `sycl::stream` - Specialization constants and `sycl::kernel_handler` - Images (`sycl::unsampled_image` and `sycl::sampled_image`)