Skip to content

Commit

Permalink
[SYCL][Doc] Update docs to reflect PI removal. (#15057)
Browse files Browse the repository at this point in the history
Fixes #14928
  • Loading branch information
aarongreig authored Nov 26, 2024
1 parent b2634a1 commit 3cc67ce
Show file tree
Hide file tree
Showing 120 changed files with 1,557 additions and 2,399 deletions.
6 changes: 3 additions & 3 deletions .github/CODEOWNERS
Validating CODEOWNERS rules …
Original file line number Diff line number Diff line change
Expand Up @@ -41,11 +41,11 @@ sycl/include/sycl/detail/ur.hpp @intel/unified-runtime-reviewers
sycl/source/detail/posix_ur.cpp @intel/unified-runtime-reviewers
sycl/source/detail/ur.cpp @intel/unified-runtime-reviewers
sycl/source/detail/windows_ur.cpp @intel/unified-runtime-reviewers
sycl/test-e2e/Plugin/ @intel/unified-runtime-reviewers
sycl/test-e2e/Adapters/ @intel/unified-runtime-reviewers

# Win Proxy Loader
sycl/pi_win_proxy_loader @intel/llvm-reviewers-runtime
sycl/test-e2e/Plugin/dll-detach-order.cpp @intel/llvm-reviewers-runtime
sycl/ur_win_proxy_loader @intel/llvm-reviewers-runtime
sycl/test-e2e/Adapters/dll-detach-order.cpp @intel/llvm-reviewers-runtime

# CUDA specific runtime implementations
sycl/include/sycl/ext/oneapi/experimental/cuda/ @intel/llvm-reviewers-cuda
Expand Down
2 changes: 1 addition & 1 deletion CONTRIBUTING.md
Original file line number Diff line number Diff line change
Expand Up @@ -58,7 +58,7 @@ To contribute:
- [The seven rules of a great Git commit message](https://cbea.ms/git-commit)
are recommended read and follow.
- To a reasonable extent, title tags can be used to signify the component
changed, e.g.: `[PI]`, `[CUDA]`, `[Doc]`.
changed, e.g.: `[UR]`, `[CUDA]`, `[Doc]`.
- Create a pull request (PR) for your changes following
[Creating a pull request instructions](https://help.github.com/articles/creating-a-pull-request/).
- Make sure PR has a good description explaining all of the changes made,
Expand Down
10 changes: 6 additions & 4 deletions buildbot/configure.py
Original file line number Diff line number Diff line change
Expand Up @@ -69,7 +69,7 @@ def do_configure(args):
if sys.platform != "darwin":
sycl_enabled_backends.append("level_zero")

# lld is needed on Windows or for the HIP plugin on AMD
# lld is needed on Windows or for the HIP adapter on AMD
if platform.system() == "Windows" or (args.hip and args.hip_platform == "AMD"):
llvm_enable_projects += ";lld"

Expand Down Expand Up @@ -152,8 +152,8 @@ def do_configure(args):
libclc_targets_to_build += libclc_nvidia_target_names
libclc_gen_remangled_variants = "ON"

if args.enable_plugin:
sycl_enabled_backends += args.enable_plugin
if args.enable_backends:
sycl_enabled_backends += args.enable_backends

if args.disable_preview_lib:
sycl_preview_lib = "OFF"
Expand Down Expand Up @@ -374,7 +374,9 @@ def main():
parser.add_argument(
"--ci-defaults", action="store_true", help="Enable default CI parameters"
)
parser.add_argument("--enable-plugin", action="append", help="Enable SYCL plugin")
parser.add_argument(
"--enable-backends", action="append", help="Enable SYCL backend"
)
parser.add_argument(
"--disable-preview-lib",
action="store_true",
Expand Down
25 changes: 12 additions & 13 deletions sycl/doc/EnvironmentVariables.md
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,7 @@ compiler and runtime.
| `SYCL_ENABLE_DEFAULT_CONTEXTS` | '1' or '0' | Enable ('1') or disable ('0') creation of default platform contexts in SYCL runtime. The default context for each platform contains all devices in the platform. Refer to [Platform Default Contexts](extensions/supported/sycl_ext_oneapi_default_context.asciidoc) extension to learn more. Enabled by default on Linux and disabled on Windows. |
| `SYCL_RT_WARNING_LEVEL` | Positive integer | The higher warning level is used the more warnings and performance hints the runtime library may print. Default value is '0', which means no warning/hint messages from the runtime library are allowed. The value '1' enables performance warnings from device runtime/codegen. The values greater than 1 are reserved for future use. |
| `SYCL_USM_HOSTPTR_IMPORT` | Integer | Enable by specifying non-zero value. Buffers created with a host pointer will result in host data promotion to USM, improving data transfer performance. To use this feature, also set SYCL_HOST_UNIFIED_MEMORY=1. |
| `SYCL_EAGER_INIT` | Integer | Enable by specifying non-zero value. Tells the SYCL runtime to do as much as possible initialization at objects construction as opposed to doing lazy initialization on the fly. This may mean doing some redundant work at warmup but ensures fastest possible execution on the following hot and reportable paths. It also instructs PI plugins to do the same. Default is "0". |
| `SYCL_EAGER_INIT` | Integer | Enable by specifying non-zero value. Tells the SYCL runtime to do as much as possible initialization at objects construction as opposed to doing lazy initialization on the fly. This may mean doing some redundant work at warmup but ensures fastest possible execution on the following hot and reportable paths. It also instructs UR adapters to do the same. Default is "0". |
| `SYCL_REDUCTION_PREFERRED_WORKGROUP_SIZE` | See [below](#sycl_reduction_preferred_workgroup_size) | Controls the preferred work-group size of reductions. |
| `SYCL_ENABLE_FUSION_CACHING` | '1' or '0' | Enable ('1') or disable ('0') caching of JIT compilations for kernel fusion. Caching avoids repeatedly running the JIT compilation pipeline if the same sequence of kernels is fused multiple times. Default value is '1'. |
| `SYCL_JIT_AMDGCN_PTX_KERNELS` | '1' or '0' | Enable ('1') or disable ('0') JIT compilation of kernels. Only supported for Nvidia and AMD backends. Note, that it is required to have a valid binary for the desired backend (AMD or CUDA), that was compiled with `-fsycl-embed-ir` in order to use JIT-ing. When JIT-ing is enabled SYCL runtime will try to cache and reuse JIT-compiled kernels, furthermore if a kernel uses specialization constants the compiler will attempt to materialize the values in place, turning them to de-facto compile time constants. Default is '0'. |
Expand Down Expand Up @@ -153,23 +153,23 @@ For a description of parallel for range rounding in DPC++ see
| | | `MinRangeX`: The minimum X dimension of the range such that range rounding is activated (Default 1024) |


## Controlling DPC++ Level Zero Plugin
## Controlling DPC++ Level Zero Adapter

| Environment variable | Values | Description |
| -------------------- | ------ | ----------- |
| `SYCL_ENABLE_PCI` (Deprecated) | Integer | When set to 1, enables obtaining the GPU PCI address when using the Level Zero backend. The default is 1. This option is kept for compatibility reasons and is immediately deprecated. |
| `SYCL_PI_LEVEL_ZERO_DISABLE_USM_ALLOCATOR` | Any(\*) | Disable USM allocator in Level Zero plugin (each memory request will go directly to Level Zero runtime) |
| `SYCL_PI_LEVEL_ZERO_TRACK_INDIRECT_ACCESS_MEMORY` | Any(\*) | Enable support of the kernels with indirect access and corresponding deferred release of memory allocations in the Level Zero plugin. |
| `SYCL_PI_LEVEL_ZERO_DISABLE_USM_ALLOCATOR` | Any(\*) | Disable USM allocator in Level Zero adapter (each memory request will go directly to Level Zero runtime) |
| `SYCL_PI_LEVEL_ZERO_TRACK_INDIRECT_ACCESS_MEMORY` | Any(\*) | Enable support of the kernels with indirect access and corresponding deferred release of memory allocations in the Level Zero adapter. |

`(*) Note: Any means this environment variable is effective when set to any non-null value.`

## Controlling DPC++ CUDA Plugin
## Controlling DPC++ CUDA Adapter

| Environment variable | Values | Description |
| -------------------- | ------ | ----------- |
| `SYCL_PI_CUDA_MAX_LOCAL_MEM_SIZE` | Integer | Specifies the maximum size of a local memory allocation in bytes. If the value exceeds the device's capabilities then a `sycl::runtime_error` is thrown. In order for the full error message to be printed, `SYCL_RT_WARNING_LEVEL=2` must be set. The default value for `SYCL_PI_CUDA_MAX_LOCAL_MEM_SIZE` is determined by the hardware. |

## Controlling DPC++ HIP Plugin
## Controlling DPC++ HIP Adapter

| Environment variable | Values | Description |
| -------------------- | ------ | ----------- |
Expand Down Expand Up @@ -232,7 +232,6 @@ variables in production code.</span>
| after_addHostAcc | print graph after addHostAccessor method |
| always | print graph before and after each of the above methods |


### `SYCL_UR_TRACE` Options

`SYCL_UR_TRACE` accepts a bit-mask, so individual tracing types can be enabled.
Expand All @@ -258,7 +257,7 @@ Supported tracing levels are in the table below
Any valid combination of the above bit-masks can be used to enable/disable tracing of the corresponding caches. If the input value is not 0 and not a valid number, the disk cache tracing will be enabled (deprecated behavior).
The default value is 0 and no tracing is enabled.

## Debugging variables for Level Zero Plugin
## Debugging variables for Level Zero Adapter

:warning: **Warning:** <span style="color:red">the environment variables
described below are used for development and debugging of DPC++ compiler
Expand All @@ -267,15 +266,15 @@ variables in production code.</span>

| Environment variable | Values | Description |
| -------------------- | ------ | ----------- |
| `SYCL_PI_LEVEL_ZERO_SINGLE_THREAD_MODE` | Integer | A single-threaded app has an opportunity to enable this mode to avoid overhead from mutex locking in the Level Zero plugin. A value greater than 0 enables single thread mode. A value of 0 disables single thread mode. The default is 0. |
| `SYCL_PI_LEVEL_ZERO_SINGLE_THREAD_MODE` | Integer | A single-threaded app has an opportunity to enable this mode to avoid overhead from mutex locking in the Level Zero adapter. A value greater than 0 enables single thread mode. A value of 0 disables single thread mode. The default is 0. |
| `SYCL_PI_LEVEL_ZERO_USM_ALLOCATOR` | [EnableBuffers][;[MaxPoolSize][;[host\|device\|shared:][MaxPoolableSize][,[Capacity][,SlabMinSize]]]...] | EnableBuffers enables pooling for SYCL buffers, default 1, set to 0 to disable. MaxPoolSize is the maximum size of the pool, by default there is no size limit. MemType is host, device, shared or read_only_shared. Other parameters are values specified as positive integers with optional K, M or G suffix. MaxPoolableSize is the maximum allocation size that may be pooled, default 0 for shared, 2MB for host, 4MB for device and read_only_shared. Capacity is the number of allocations in each size range freed by the program but retained in the pool for reallocation, default 4. Size ranges follow this pattern: 64, 96, 128, 192, and so on, i.e., powers of 2, with one range in between. SlabMinSize is the minimum allocation size, 64KB for host and device, 2MB for shared and read_only_shared. Example: SYCL_PI_LEVEL_ZERO_USM_ALLOCATOR=1;32M;host:1M,4,64K;device:1M,4,64K;shared:0,0,2M|
| `SYCL_PI_LEVEL_ZERO_BATCH_SIZE` | Integer | Sets a preferred number of compute commands to batch into a command list before executing the command list. A value of 0 causes the batch size to be adjusted dynamically. A value greater than 0 specifies fixed size batching, with the batch size set to the specified value. The default is 0. |
| `SYCL_PI_LEVEL_ZERO_COPY_BATCH_SIZE` | Integer | Sets a preferred number of copy commands to batch into a command list before executing the command list. A value of 0 causes the batch size to be adjusted dynamically. A value greater than 0 specifies fixed size batching, with the batch size set to the specified value. The default is 0. |
| `SYCL_PI_LEVEL_ZERO_FILTER_EVENT_WAIT_LIST` | Integer | When set to 0, disables filtering of signaled events from wait lists when using the Level Zero backend. The default is 0. |
| `SYCL_PI_LEVEL_ZERO_USE_COPY_ENGINE` | Any(\*) | This environment variable enables users to control use of copy engines for copy operations. If the value is an integer, it will allow the use of copy engines, if available in the device, in Level Zero plugin to transfer SYCL buffer or image data between the host and/or device(s) and to fill SYCL buffer or image data in device or shared memory. The value of this environment variable can also be a pair of the form "lower_index:upper_index" where the indices point to copy engines in a list of all available copy engines. The default is 0:0 when immediate command lists are being used on the device and 1 otherwise. (Also see description of SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS). |
| `SYCL_PI_LEVEL_ZERO_USE_COPY_ENGINE` | Any(\*) | This environment variable enables users to control use of copy engines for copy operations. If the value is an integer, it will allow the use of copy engines, if available in the device, in Level Zero adapter to transfer SYCL buffer or image data between the host and/or device(s) and to fill SYCL buffer or image data in device or shared memory. The value of this environment variable can also be a pair of the form "lower_index:upper_index" where the indices point to copy engines in a list of all available copy engines. The default is 0:0 when immediate command lists are being used on the device and 1 otherwise. (Also see description of SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS). |
| `SYCL_PI_LEVEL_ZERO_USE_COMPUTE_ENGINE` | Integer | It can be set to an integer (>=0) in which case all compute commands will be submitted to the command-queue with the given index in the compute command group. If it is instead set to a negative value then all available compute engines may be used. The default value is "0" |
| `SYCL_PI_LEVEL_ZERO_USE_COPY_ENGINE_FOR_D2D_COPY` (experimental) | Integer | Allows the use of copy engine, if available in the device, in Level Zero plugin for device to device copy operations. The default is 0. This option is experimental and will be removed once heuristics are added to make a decision about use of copy engine for device to device copy operations. |
| `SYCL_PI_LEVEL_ZERO_DEVICE_SCOPE_EVENTS` | Any(\*) | Enable support of device-scope events whose state is not visible to the host. If enabled mode is SYCL_PI_LEVEL_ZERO_DEVICE_SCOPE_EVENTS=1 the Level Zero plugin would create all events having device-scope only and create proxy host-visible events for them when their status is needed (wait/query) on the host. If enabled mode is SYCL_PI_LEVEL_ZERO_DEVICE_SCOPE_EVENTS=2 the Level Zero plugin would create all events having device-scope and add proxy host-visible event at the end of each command-list submission. The default is 0, meaning all events have host visibility. SYCL_PI_LEVEL_ZERO_DEVICE_SCOPE_EVENTS is ignored when using immediate command lists (SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS = 1) and all events use default scope of 0. |
| `SYCL_PI_LEVEL_ZERO_USE_COPY_ENGINE_FOR_D2D_COPY` (experimental) | Integer | Allows the use of copy engine, if available in the device, in Level Zero adapter for device to device copy operations. The default is 0. This option is experimental and will be removed once heuristics are added to make a decision about use of copy engine for device to device copy operations. |
| `SYCL_PI_LEVEL_ZERO_DEVICE_SCOPE_EVENTS` | Any(\*) | Enable support of device-scope events whose state is not visible to the host. If enabled mode is SYCL_PI_LEVEL_ZERO_DEVICE_SCOPE_EVENTS=1 the Level Zero adapter would create all events having device-scope only and create proxy host-visible events for them when their status is needed (wait/query) on the host. If enabled mode is SYCL_PI_LEVEL_ZERO_DEVICE_SCOPE_EVENTS=2 the Level Zero adapter would create all events having device-scope and add proxy host-visible event at the end of each command-list submission. The default is 0, meaning all events have host visibility. SYCL_PI_LEVEL_ZERO_DEVICE_SCOPE_EVENTS is ignored when using immediate command lists (SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS = 1) and all events use default scope of 0. |
| `SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS` | Integer | When set to a positive value enables use of Level Zero immediate commandlists, which means there is no batching and all commands are immediately submitted for execution. When set to 1, unique immediate commandlists are created for each SYCL queue. When set to 2, unique immediate commandlists are created per host thread per SYCL queue. Default is 1 on Intel® Data Center GPU Max Series running Linux and 0 elsewhere. |
| `SYCL_PI_LEVEL_ZERO_USE_MULTIPLE_COMMANDLIST_BARRIERS` | Integer | When set to a positive value enables use of multiple Level Zero commandlists when submitting barriers. Default is 1. |
| `SYCL_PI_LEVEL_ZERO_USE_COPY_ENGINE_FOR_FILL` | Integer | When set to a positive value enables use of a copy engine for memory fill operations. Default is 0. |
Expand All @@ -287,7 +286,7 @@ variables in production code.</span>
| `SYCL_PI_LEVEL_ZERO_USM_RESIDENT` | Integer | Bit-mask controls if/where to make USM allocations resident at the time of allocation. Input value is of the form 0xHSD, where 4-bits of D control device allocations, 4-bits of S control shared allocations, and 4-bits of H control host allocations. Each 4-bit component is holding one of the following values: "0" - then no special residency is forced, "1" - then allocation is made resident at the device of allocation, or "2" - then allocation is made resident on all devices in the context of allocation that have P2P access to the device of allocation. Default is 0x002, i.e. force full residency for device allocations only. |
| `SYCL_PI_LEVEL_ZERO_USE_NATIVE_USM_MEMCPY2D` | Integer | When set to a positive value enables the use of Level Zero USM 2D memory copy operations. Default is 0. |

## Debugging variables for CUDA Plugin
## Debugging variables for CUDA Adapter

:warning: **Warning:** <span style="color:red">the environment variables
described below are used for development and debugging of DPC++ compiler
Expand Down
5 changes: 3 additions & 2 deletions sycl/doc/FAQ.md
Original file line number Diff line number Diff line change
Expand Up @@ -138,8 +138,9 @@ OpenCL 2.1, so any device, capable of OpenCL 2.1, should be supported.
Otherwise, your OpenCL device must support `cl_khr_il_program` extension.

Furthermore, developers can extend capabilities of the DPC++ Runtime to
non-OpenCL devices by writing correspondent plugins. To learn more, please
check out our [Plugin Interface Guide](design/PluginInterface.md).
non-OpenCL devices by writing correspondent adapters. To learn more, please
check out the
[Unified Runtime project](https://github.com/oneapi-src/unified-runtime).

### Q: DPC++ applications hang on Intel GPUs while working well on other devices
**A:** One of the common reasons is Intel GPUs feature called "hang check".
Expand Down
Loading

0 comments on commit 3cc67ce

Please sign in to comment.