diff --git a/sycl/doc/design/OffloadDesign.md b/sycl/doc/design/OffloadDesign.md index cf3a43713b9cf..8db5748217d0d 100644 --- a/sycl/doc/design/OffloadDesign.md +++ b/sycl/doc/design/OffloadDesign.md @@ -283,7 +283,7 @@ list to be passed along. *Example: spir64_gen enabling options* > --gpu-tool-arg="-device pvc -options extraopt_pvc" ---gpu-tool-arg="-device skl -options -extraopt_skl" +--gpu-tool-arg="-options -extraopt_skl" *Example: clang-linker-wrapper options* @@ -296,6 +296,128 @@ resemble `--gpu-tool-arg= `. This corresponds to the existing option syntax of `-fsycl-targets=intel_gpu_arch` where `arch` can be a fixed set of targets. +#### --offload-arch + +For SYCL offloading to Intel GPUs, Intel CPUs, NVidia and AMD GPUs, specify the device architecture using ``--offload-arch`` option. For instance + ``--offload-arch=sm_80`` to target an NVidia Tesla A100, + ``--offload-arch=gfx90a`` to target an AMD Instinct MI250X, or + ``--offload-arch=sm_80,gfx90a`` to target both. + +For Intel Graphics AOT target, valid values for ``--offload-arch`` are mapped to +valid device names accepted by OCLOC (the Intel GPU AOT compiler) via the ``-device`` option. + +SYCL offloading with ``--offload-arch`` for Intel CPUs and Intel GPUs is currently enabled only with ``--offload-new-driver`` option. + +``` +Example: + +$ clang++ -fsycl -offload-arch=bdw --offload-new-driver -c foo.cpp // SYCL AOT for Intel GPU. +$ clang++ -fsycl -offload-arch=broadwell --offload-new-driver -c foo.cpp // SYCL AOT for Intel CPU. +``` + +The following table shows a mapping of the accepted values for `--offload-arch` to enable SYCL offloading to Intel GPUs and the corresponding `-device` value passed to OCLOC. + +| Intel GPU device | ``--offload-arch`` accepted value | OCLOC -device value | +|------------------|-------------------------|------------------------| +| Intel(R) microarchitecture code name Broadwell Intel graphics architecture | bdw | bdw | +| Intel(R) microarchitecture code name Skylake Intel graphics architecture | skl | skl | +| Kaby Lake Intel graphics architecture | kbl | kbl | +| Coffee Lake Intel graphics architecture | cfl | cfl | +| Apollo Lake Intel graphics architecture | apl | apl | +| Broxton Intel graphics architecture | bxt | apl | +| Gemini Lake Intel graphics architecture | glk | glk | +| Whiskey Lake Intel graphics architecture | whl | whl | +| Amber Lake Intel graphics architecture | aml | aml | +| Comet Lake Intel graphics architecture | cml | cml | +| Ice Lake Intel graphics architecture | icl, icllp | icllp | +| Elkhart Lake Intel graphics architecture | ehl | ehl | +| Jasper Lake Intel graphics architecture | jsl | jsl | +| Tiger Lake Intel graphics architecture | tgl, tgllp | tgllp | +| Rocket Lake Intel graphics architecture | rkl | rkl | +| Alder Lake S Intel graphics architecture | adl_s | adl_s | +| Raptor Lake Intel graphics architecture | rpl_s | adl_s | +| Alder Lake P Intel graphics architecture | adl_p | adl_p | +| Alder Lake N Intel graphics architecture | adl_n | adl_n | +| DG1 Intel graphics architecture | dg1 | dg1 | +| Alchemist G10 Intel graphics architecture | acm_g10, dg2_g10 | acm_g10 | +| Alchemist G11 Intel graphics architecture | acm_g11, dg2_g11 | acm_g11 | +| Alchemist G12 Intel graphics architecture | acm_g12, dg2_g12 | acm_g12 | +| Ponte Vecchio Intel graphics architecture | pvc | pvc | +| Ponte Vecchio VG Intel graphics architecture | pvc_vg | pvc_vg | +| Meteor Lake U/S or Arrow Lake U/S Intel graphics architecture | mtl_u, mtl_s, arl_u | mtl_s | +| Meteor Lake H Intel graphics architecture | mtl_h | mtl_h | +| Arrow Lake H Intel graphics architecture | arl_h | arl_h | +| Battlemage G21 Intel graphics architecture | bmg_g21 | bmg_g21 | +| Lunar Lake Intel graphics architecture | lnl_m | lnl_m | + +#### nvptx64-nvidia-cuda support +For SYCL offloading to NVidia GPUs using ``--offload-arch`` option, the following table +lists the accepted values. + +| NVidia GPU device name | ``--offload-arch`` accepted values for NVidia GPUs | +|------------------------|----------------------------------------------------| +| NVIDIA Maxwell architecture (compute capability 5.0) | sm_50 | +| NVIDIA Maxwell architecture (compute capability 5.2) | sm_52 | +| NVIDIA Maxwell architecture (compute capability 5.3) | sm_53 | +| NVIDIA Pascal architecture (compute capability 6.0) | sm_60 | +| NVIDIA Pascal architecture (compute capability 6.1) | sm_61 | +| NVIDIA Pascal architecture (compute capability 6.2) | sm_62 | +| NVIDIA Volta architecture (compute capability 7.0) | sm_70 | +| NVIDIA Volta architecture (compute capability 7.2) | sm_72 | +| NVIDIA Turing architecture (compute capability 7.5) | sm_75 | +| NVIDIA Ampere architecture (compute capability 8.0) | sm_80 | +| NVIDIA Ampere architecture (compute capability 8.6) | sm_86 | +| NVIDIA Jetson/Drive AGX Orin architecture | sm_87 | +| NVIDIA Ada Lovelace architecture | sm_89 | +| NVIDIA Hopper architecture | sm_90 | +| NVIDIA Hopper architecture (with wgmma and setmaxnreg instructions) | sm_90a | + +#### amdgcn-amd-amdhsa support + +For SYCL offloading to AMD GPUs using ``--offload-arch`` option, the following table +lists the accepted values. + +| AMD GPU device name | ``--offload-arch`` accepted values for AMD GPUs | +|------------------------|----------------------------------------------------| +| AMD GCN GFX7 (Sea Islands (CI)) architecture | gfx700 | +| AMD GCN GFX7 (Sea Islands (CI)) architecture | gfx701 | +| AMD GCN GFX7 (Sea Islands (CI)) architecture | gfx702 | +| AMD GCN GFX8 (Volcanic Islands (VI)) architecture | gfx801 | +| AMD GCN GFX8 (Volcanic Islands (VI)) architecture | gfx802 | +| AMD GCN GFX8 (Volcanic Islands (VI)) architecture | gfx803 | +| AMD GCN GFX8 (Volcanic Islands (VI)) architecture | gfx805 | +| AMD GCN GFX8 (Volcanic Islands (VI)) architecture | gfx810 | +| AMD GCN GFX9 (Vega) architecture | gfx900 | +| AMD GCN GFX9 (Vega) architecture | gfx902 | +| AMD GCN GFX9 (Vega) architecture | gfx904 | +| AMD GCN GFX9 (Vega) architecture | gfx906 | +| AMD GCN GFX9 (Vega) architecture | gfx908 | +| AMD GCN GFX9 (Vega) architecture | gfx909 | +| AMD GCN GFX9 (Vega) architecture | gfx90a | +| AMD GCN GFX9 (Vega) architecture | gfx90c | +| AMD GCN GFX9 (Vega) architecture | gfx940 | +| AMD GCN GFX9 (Vega) architecture | gfx941 | +| AMD GCN GFX9 (Vega) architecture | gfx942 | +| AMD GCN GFX10.1 (RDNA 1) architecture | gfx1010 | +| AMD GCN GFX10.1 (RDNA 1) architecture | gfx1011 | +| AMD GCN GFX10.1 (RDNA 1) architecture | gfx1012 | +| AMD GCN GFX10.1 (RDNA 1) architecture | gfx1013 | +| AMD GCN GFX10.3 (RDNA 2) architecture | gfx1030 | +| GCN GFX10.3 (RDNA 2) architecture | gfx1031 | +| GCN GFX10.3 (RDNA 2) architecture | gfx1032 | +| GCN GFX10.3 (RDNA 2) architecture | gfx1033 | +| GCN GFX10.3 (RDNA 2) architecture | gfx1034 | +| GCN GFX10.3 (RDNA 2) architecture | gfx1035 | +| GCN GFX10.3 (RDNA 2) architecture | gfx1036 | +| GCN GFX11 (RDNA 3) architecture | gfx1100 | +| GCN GFX11 (RDNA 3) architecture | gfx1101 | +| GCN GFX11 (RDNA 3) architecture | gfx1102 | +| GCN GFX11 (RDNA 3) architecture | gfx1103 | +| GCN GFX11 (RDNA 3) architecture | gfx1150 | +| GCN GFX11 (RDNA 3) architecture | gfx1151 | +| GCN GFX12 (RDNA 4) architecture | gfx1200 | +| GCN GFX12 (RDNA 4) architecture | gfx1201 | + #### spir64_fpga support Compilation behaviors involving AOT for FPGA involve an additional call to @@ -355,6 +477,34 @@ Additional options passed by the user via the `-Xsycl-target-backend=spir64_x86_64 ` command will be processed by a new option to the wrapper, `--cpu-tool-arg=` +Similar to SYCL offloading to Intel GPUs using `--offload-arch`, SYCL AOT for Intel CPUs +will also leverage the `--offload-arch` option. +The valid CPU device names accepted for `--offload-arch` are CPU names from ``clang -march``. +These names are more verbose, and do not overlap with the Intel GPU names. +These user input CPU names are mapped to the corresponding ``opencl-aot -march`` option. + +The following table shows a mapping of the accepted values for `--offload-arch` to enable SYCL offloading to Intel CPUs and the corresponding `-march` value passed to opencl-aot. + +| Intel CPU device | ``--offload-arch`` accepted value | opencl-aot -march value | +|----------------|-------------------------|----------------------------| +| Intel(R) Advanced Vector Extensions 512 | skylake-avx512 | avx512 | +| Intel(R) Advanced Vector Extensions 2 | core-avx2 | avx2 | +| Intel(R) Advanced Vector Extensions | corei7-avx | avx | +| Intel(R) Streaming SIMD Extensions 4.2 | corei7 | sse4.2 | +| Intel(R) microarchitecture code name Westmere | westmere | wsm | +| Intel(R) microarchitecture code name Sandy Bridge | sandybridge | snb | +| Intel(R) microarchitecture code name Ivy Bridge | ivybridge | ivyb | +| Intel(R) microarchitecture code name Broadwell | broadwell | bdw | +| Intel(R) microarchitecture code name Coffee Lake | coffeelake | cfl | +| Intel(R) microarchitecture code name Alder Lake | alderlake | adl | +| Intel(R) microarchitecture code name Skylake (client) | skylake | skylake | +| Intel(R) microarchitecture code name Skylake (server) | skx | skx | +| Intel(R) microarchitecture code name Cascade Lake | cascadelake | clk | +| Intel(R) microarchitecture code name Ice Lake (client) | icelake-client | icl | +| Intel(R) microarchitecture code name Ice Lake (server) | icelake-server | icx | +| Intel(R) microarchitecture code name Sapphire Rapids | sapphirerapids | spr | +| Intel(R) microarchitecture code name Granite Rapids | graniterapids | gnr | + ### Wrapping of device image Once the device binary is pulled out of the fat binary, the binary must be