Skip to content

Commit

Permalink
[SYCL][Doc] Update SYCL --offload-arch option. (#15531)
Browse files Browse the repository at this point in the history
Document SYCL offloading using `--offload-arch` to Intel CPUs, Intel
GPUs, NVidia GPUs and AMD GPUs.
  • Loading branch information
srividya-sundaram authored Oct 1, 2024
1 parent 9efd43a commit 74e4ae1
Showing 1 changed file with 151 additions and 1 deletion.
152 changes: 151 additions & 1 deletion sycl/doc/design/OffloadDesign.md
Original file line number Diff line number Diff line change
Expand Up @@ -283,7 +283,7 @@ list to be passed along.
*Example: spir64_gen enabling options*

> --gpu-tool-arg="-device pvc -options extraopt_pvc"
--gpu-tool-arg="-device skl -options -extraopt_skl"
--gpu-tool-arg="-options -extraopt_skl"

*Example: clang-linker-wrapper options*

Expand All @@ -296,6 +296,128 @@ resemble `--gpu-tool-arg=<arch> <arg>`. This corresponds to the existing
option syntax of `-fsycl-targets=intel_gpu_arch` where `arch` can be a fixed
set of targets.

#### --offload-arch

For SYCL offloading to Intel GPUs, Intel CPUs, NVidia and AMD GPUs, specify the device architecture using ``--offload-arch`` option. For instance
``--offload-arch=sm_80`` to target an NVidia Tesla A100,
``--offload-arch=gfx90a`` to target an AMD Instinct MI250X, or
``--offload-arch=sm_80,gfx90a`` to target both.

For Intel Graphics AOT target, valid values for ``--offload-arch`` are mapped to
valid device names accepted by OCLOC (the Intel GPU AOT compiler) via the ``-device`` option.

SYCL offloading with ``--offload-arch`` for Intel CPUs and Intel GPUs is currently enabled only with ``--offload-new-driver`` option.

```
Example:
$ clang++ -fsycl -offload-arch=bdw --offload-new-driver -c foo.cpp // SYCL AOT for Intel GPU.
$ clang++ -fsycl -offload-arch=broadwell --offload-new-driver -c foo.cpp // SYCL AOT for Intel CPU.
```

The following table shows a mapping of the accepted values for `--offload-arch` to enable SYCL offloading to Intel GPUs and the corresponding `-device` value passed to OCLOC.

| Intel GPU device | ``--offload-arch`` accepted value | OCLOC -device value |
|------------------|-------------------------|------------------------|
| Intel(R) microarchitecture code name Broadwell Intel graphics architecture | bdw | bdw |
| Intel(R) microarchitecture code name Skylake Intel graphics architecture | skl | skl |
| Kaby Lake Intel graphics architecture | kbl | kbl |
| Coffee Lake Intel graphics architecture | cfl | cfl |
| Apollo Lake Intel graphics architecture | apl | apl |
| Broxton Intel graphics architecture | bxt | apl |
| Gemini Lake Intel graphics architecture | glk | glk |
| Whiskey Lake Intel graphics architecture | whl | whl |
| Amber Lake Intel graphics architecture | aml | aml |
| Comet Lake Intel graphics architecture | cml | cml |
| Ice Lake Intel graphics architecture | icl, icllp | icllp |
| Elkhart Lake Intel graphics architecture | ehl | ehl |
| Jasper Lake Intel graphics architecture | jsl | jsl |
| Tiger Lake Intel graphics architecture | tgl, tgllp | tgllp |
| Rocket Lake Intel graphics architecture | rkl | rkl |
| Alder Lake S Intel graphics architecture | adl_s | adl_s |
| Raptor Lake Intel graphics architecture | rpl_s | adl_s |
| Alder Lake P Intel graphics architecture | adl_p | adl_p |
| Alder Lake N Intel graphics architecture | adl_n | adl_n |
| DG1 Intel graphics architecture | dg1 | dg1 |
| Alchemist G10 Intel graphics architecture | acm_g10, dg2_g10 | acm_g10 |
| Alchemist G11 Intel graphics architecture | acm_g11, dg2_g11 | acm_g11 |
| Alchemist G12 Intel graphics architecture | acm_g12, dg2_g12 | acm_g12 |
| Ponte Vecchio Intel graphics architecture | pvc | pvc |
| Ponte Vecchio VG Intel graphics architecture | pvc_vg | pvc_vg |
| Meteor Lake U/S or Arrow Lake U/S Intel graphics architecture | mtl_u, mtl_s, arl_u | mtl_s |
| Meteor Lake H Intel graphics architecture | mtl_h | mtl_h |
| Arrow Lake H Intel graphics architecture | arl_h | arl_h |
| Battlemage G21 Intel graphics architecture | bmg_g21 | bmg_g21 |
| Lunar Lake Intel graphics architecture | lnl_m | lnl_m |

#### nvptx64-nvidia-cuda support
For SYCL offloading to NVidia GPUs using ``--offload-arch`` option, the following table
lists the accepted values.

| NVidia GPU device name | ``--offload-arch`` accepted values for NVidia GPUs |
|------------------------|----------------------------------------------------|
| NVIDIA Maxwell architecture (compute capability 5.0) | sm_50 |
| NVIDIA Maxwell architecture (compute capability 5.2) | sm_52 |
| NVIDIA Maxwell architecture (compute capability 5.3) | sm_53 |
| NVIDIA Pascal architecture (compute capability 6.0) | sm_60 |
| NVIDIA Pascal architecture (compute capability 6.1) | sm_61 |
| NVIDIA Pascal architecture (compute capability 6.2) | sm_62 |
| NVIDIA Volta architecture (compute capability 7.0) | sm_70 |
| NVIDIA Volta architecture (compute capability 7.2) | sm_72 |
| NVIDIA Turing architecture (compute capability 7.5) | sm_75 |
| NVIDIA Ampere architecture (compute capability 8.0) | sm_80 |
| NVIDIA Ampere architecture (compute capability 8.6) | sm_86 |
| NVIDIA Jetson/Drive AGX Orin architecture | sm_87 |
| NVIDIA Ada Lovelace architecture | sm_89 |
| NVIDIA Hopper architecture | sm_90 |
| NVIDIA Hopper architecture (with wgmma and setmaxnreg instructions) | sm_90a |

#### amdgcn-amd-amdhsa support

For SYCL offloading to AMD GPUs using ``--offload-arch`` option, the following table
lists the accepted values.

| AMD GPU device name | ``--offload-arch`` accepted values for AMD GPUs |
|------------------------|----------------------------------------------------|
| AMD GCN GFX7 (Sea Islands (CI)) architecture | gfx700 |
| AMD GCN GFX7 (Sea Islands (CI)) architecture | gfx701 |
| AMD GCN GFX7 (Sea Islands (CI)) architecture | gfx702 |
| AMD GCN GFX8 (Volcanic Islands (VI)) architecture | gfx801 |
| AMD GCN GFX8 (Volcanic Islands (VI)) architecture | gfx802 |
| AMD GCN GFX8 (Volcanic Islands (VI)) architecture | gfx803 |
| AMD GCN GFX8 (Volcanic Islands (VI)) architecture | gfx805 |
| AMD GCN GFX8 (Volcanic Islands (VI)) architecture | gfx810 |
| AMD GCN GFX9 (Vega) architecture | gfx900 |
| AMD GCN GFX9 (Vega) architecture | gfx902 |
| AMD GCN GFX9 (Vega) architecture | gfx904 |
| AMD GCN GFX9 (Vega) architecture | gfx906 |
| AMD GCN GFX9 (Vega) architecture | gfx908 |
| AMD GCN GFX9 (Vega) architecture | gfx909 |
| AMD GCN GFX9 (Vega) architecture | gfx90a |
| AMD GCN GFX9 (Vega) architecture | gfx90c |
| AMD GCN GFX9 (Vega) architecture | gfx940 |
| AMD GCN GFX9 (Vega) architecture | gfx941 |
| AMD GCN GFX9 (Vega) architecture | gfx942 |
| AMD GCN GFX10.1 (RDNA 1) architecture | gfx1010 |
| AMD GCN GFX10.1 (RDNA 1) architecture | gfx1011 |
| AMD GCN GFX10.1 (RDNA 1) architecture | gfx1012 |
| AMD GCN GFX10.1 (RDNA 1) architecture | gfx1013 |
| AMD GCN GFX10.3 (RDNA 2) architecture | gfx1030 |
| GCN GFX10.3 (RDNA 2) architecture | gfx1031 |
| GCN GFX10.3 (RDNA 2) architecture | gfx1032 |
| GCN GFX10.3 (RDNA 2) architecture | gfx1033 |
| GCN GFX10.3 (RDNA 2) architecture | gfx1034 |
| GCN GFX10.3 (RDNA 2) architecture | gfx1035 |
| GCN GFX10.3 (RDNA 2) architecture | gfx1036 |
| GCN GFX11 (RDNA 3) architecture | gfx1100 |
| GCN GFX11 (RDNA 3) architecture | gfx1101 |
| GCN GFX11 (RDNA 3) architecture | gfx1102 |
| GCN GFX11 (RDNA 3) architecture | gfx1103 |
| GCN GFX11 (RDNA 3) architecture | gfx1150 |
| GCN GFX11 (RDNA 3) architecture | gfx1151 |
| GCN GFX12 (RDNA 4) architecture | gfx1200 |
| GCN GFX12 (RDNA 4) architecture | gfx1201 |

#### spir64_fpga support

Compilation behaviors involving AOT for FPGA involve an additional call to
Expand Down Expand Up @@ -355,6 +477,34 @@ Additional options passed by the user via the
`-Xsycl-target-backend=spir64_x86_64 <opts>` command will be processed by a new
option to the wrapper, `--cpu-tool-arg=<arg>`

Similar to SYCL offloading to Intel GPUs using `--offload-arch`, SYCL AOT for Intel CPUs
will also leverage the `--offload-arch` option.
The valid CPU device names accepted for `--offload-arch` are CPU names from ``clang -march``.
These names are more verbose, and do not overlap with the Intel GPU names.
These user input CPU names are mapped to the corresponding ``opencl-aot -march`` option.

The following table shows a mapping of the accepted values for `--offload-arch` to enable SYCL offloading to Intel CPUs and the corresponding `-march` value passed to opencl-aot.

| Intel CPU device | ``--offload-arch`` accepted value | opencl-aot -march value |
|----------------|-------------------------|----------------------------|
| Intel(R) Advanced Vector Extensions 512 | skylake-avx512 | avx512 |
| Intel(R) Advanced Vector Extensions 2 | core-avx2 | avx2 |
| Intel(R) Advanced Vector Extensions | corei7-avx | avx |
| Intel(R) Streaming SIMD Extensions 4.2 | corei7 | sse4.2 |
| Intel(R) microarchitecture code name Westmere | westmere | wsm |
| Intel(R) microarchitecture code name Sandy Bridge | sandybridge | snb |
| Intel(R) microarchitecture code name Ivy Bridge | ivybridge | ivyb |
| Intel(R) microarchitecture code name Broadwell | broadwell | bdw |
| Intel(R) microarchitecture code name Coffee Lake | coffeelake | cfl |
| Intel(R) microarchitecture code name Alder Lake | alderlake | adl |
| Intel(R) microarchitecture code name Skylake (client) | skylake | skylake |
| Intel(R) microarchitecture code name Skylake (server) | skx | skx |
| Intel(R) microarchitecture code name Cascade Lake | cascadelake | clk |
| Intel(R) microarchitecture code name Ice Lake (client) | icelake-client | icl |
| Intel(R) microarchitecture code name Ice Lake (server) | icelake-server | icx |
| Intel(R) microarchitecture code name Sapphire Rapids | sapphirerapids | spr |
| Intel(R) microarchitecture code name Granite Rapids | graniterapids | gnr |

### Wrapping of device image

Once the device binary is pulled out of the fat binary, the binary must be
Expand Down

0 comments on commit 74e4ae1

Please sign in to comment.