-
Notifications
You must be signed in to change notification settings - Fork 744
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL][Doc] Add spec to get device image backend content #14811
Changes from 1 commit
f09c9d6
2cd6478
793b739
bf6b734
a548105
5f17499
ef53b68
acd6b94
d288119
236c32e
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,254 @@ | ||
= sycl_ext_oneapi_device_image_content | ||
|
||
:source-highlighter: coderay | ||
:coderay-linenums-mode: table | ||
|
||
// This section needs to be after the document title. | ||
:doctype: book | ||
:toc2: | ||
:toc: left | ||
:encoding: utf-8 | ||
:lang: en | ||
:dpcpp: pass:[DPC++] | ||
:endnote: —{nbsp}end{nbsp}note | ||
|
||
// Set the default source code type in this document to C++, | ||
// for syntax highlighting purposes. This is needed because | ||
// docbook uses c++ and html5 uses cpp. | ||
:language: {basebackend@docbook:c++:cpp} | ||
|
||
|
||
== Notice | ||
|
||
[%hardbreaks] | ||
Copyright (C) 2024 Intel Corporation. All rights reserved. | ||
|
||
Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks | ||
of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. used by | ||
permission by Khronos. | ||
|
||
|
||
== Contact | ||
|
||
To report problems with this extension, please open a new issue at: | ||
|
||
https://github.com/intel/llvm/issues | ||
|
||
|
||
== Dependencies | ||
|
||
This extension is written against the SYCL 2020 revision 8 specification. | ||
All references below to the "core SYCL specification" or to section numbers in | ||
the SYCL specification refer to that revision. | ||
|
||
|
||
== Status | ||
|
||
This is a proposed extension specification, intended to gather community | ||
feedback. | ||
Interfaces defined in this specification may not be implemented yet or may be | ||
in a preliminary state. | ||
The specification itself may also change in incompatible ways before it is | ||
finalized. | ||
*Shipping software products should not rely on APIs defined in this | ||
specification.* | ||
|
||
|
||
== Overview | ||
|
||
This extension adds a mechanism to obtain the raw content of the device images | ||
that are in a kernel bundle. | ||
The format of this content is implementation-defined, so applications that make | ||
use of this extension are normally not portable to other implementations of | ||
SYCL. | ||
|
||
|
||
== Specification | ||
|
||
=== Feature test macro | ||
|
||
This extension provides a feature-test macro as described in the core SYCL | ||
specification. | ||
An implementation supporting this extension must predefine the macro | ||
`SYCL_EXT_ONEAPI_DEVICE_IMAGE_CONTENT` to one of the values defined in the table | ||
below. | ||
Applications can test for the existence of this macro to determine if the | ||
implementation supports this feature, or applications can test the macro's | ||
value to determine which of the extension's features the implementation | ||
supports. | ||
|
||
[%header,cols="1,5"] | ||
|=== | ||
|Value | ||
|Description | ||
|
||
|1 | ||
|The APIs of this experimental extension are not versioned, so the | ||
feature-test macro always has this value. | ||
|=== | ||
|
||
=== New member functions in the `device_image` class | ||
|
||
This extension adds the following member functions to the `device_image` class. | ||
|
||
[source,c++] | ||
---- | ||
namespace sycl { | ||
|
||
template <bundle_state State> | ||
class device_image { | ||
public: | ||
std::vector<std::byte> ext_oneapi_get_content() const; | ||
std::span<std::byte> ext_oneapi_get_content_view() const; | ||
|
||
/*...*/ | ||
}; | ||
|
||
} // namespace sycl | ||
---- | ||
|
||
''' | ||
|
||
[frame=all,grid=none,separator="@"] | ||
!==== | ||
a@ | ||
[source,c++] | ||
---- | ||
std::vector<std::byte> ext_oneapi_get_content() const; | ||
---- | ||
!==== | ||
|
||
_Returns:_ A copy of the raw bytes for this device image. | ||
The format of this data is implementation-defined. | ||
See below for a description of this format on {dpcpp}. | ||
|
||
''' | ||
|
||
[frame=all,grid=none,separator="@"] | ||
!==== | ||
a@ | ||
[source,c++] | ||
---- | ||
std::span<std::byte> ext_oneapi_get_content_view() const; | ||
---- | ||
!==== | ||
|
||
Available on when the compiler is {cpp}20 or higher. | ||
|
||
_Returns:_ A view of the raw bytes for this device image. | ||
The data behind this view has the same lifetime as the `kernel_bundle` that | ||
contains this `device_image`. | ||
|
||
''' | ||
|
||
|
||
== Device image format for {dpcpp} | ||
|
||
This section is non-normative and applies only to the {dpcpp} implementation. | ||
The format of the data returned by `device_image::ext_oneapi_get_content` and | ||
`device_image::ext_oneapi_get_content_view` depends on the backend of the kernel | ||
bundle that contains the device image and also on the `State` of the device | ||
image. | ||
|
||
=== Format on Level Zero | ||
|
||
The format depends on the `State` of the device image: | ||
|
||
* `bundle_state::input`: The format could change in the future, but it is | ||
currently a SPIR-V module representing one or more kernels. | ||
This SPIR-V module may be partially linked, with references to internal | ||
library functions that are not defined in the SPIR-V module. | ||
|
||
* `bundle_state::object`: The same format as `bundle_state::input`. | ||
|
||
* `bundle_state::executable`: The device image content is native ISA for the | ||
device, which can be passed to `zeModuleCreate` as `ZE_MODULE_FORMAT_NATIVE` | ||
format. | ||
|
||
:ref1: ../proposed/sycl_ext_oneapi_free_function_kernels.asciidoc#level-zero-and-opencl-compatibility | ||
|
||
[_Note:_ The interface to kernels in the device image is not defined in the | ||
general case, which means there is no portable way to invoke kernels from a | ||
Level Zero module that is created from the raw device image content. | ||
However, see link:{ref1}[here] for a limited case where this portability is | ||
guaranteed. | ||
_{endnote}_] | ||
|
||
=== Format on OpenCL | ||
|
||
The format depends on the `State` of the device image: | ||
|
||
* `bundle_state::input`: The format could change in the future, but it is | ||
currently a SPIR-V module as described above for Level Zero. | ||
|
||
* `bundle_state::object`: The device image content is an unspecified format that | ||
is created by calling `clCompileProgram`. | ||
|
||
* `bundle_state::executable`: The device image content is executable binary | ||
device code representing one or more kernels, which can be passed to | ||
`clCreateProgramWithBinary`. | ||
|
||
[_Note:_ The interface to kernels in the device image is not defined in the | ||
general case, which means there is no portable way to invoke kernels from a | ||
OpenCL `cl_program` object that is created from the raw device image content. | ||
However, see link:{ref1}[here] for a limited case where this portability is | ||
guaranteed. | ||
_{endnote}_] | ||
|
||
=== Format on CUDA | ||
|
||
The format depends on the `State` of the device image: | ||
|
||
* `bundle_state::input`: The device image content is a PTX module representing | ||
one or more kernels. | ||
|
||
* `bundle_state::object`: ??? | ||
|
||
* `bundle_state::executable`: The device image content is a CUBIN module | ||
representing one or more kernels. | ||
|
||
|
||
== Example | ||
|
||
:ref2: ../proposed/sycl_ext_oneapi_free_function_kernels.asciidoc | ||
|
||
A kernel bundle can contain multiple device images with different | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is there any way to get information about device for a given device image? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not directly. You can indirectly ask if a device image is compatible with a device via
I think we could add a query like There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. if there is device_image::has_kernel(kernel_id, device), then I think it is fine. |
||
representations of the same kernel for different devices. | ||
This example shows how to get the device image content for a particular kernel | ||
for a particular device. | ||
Note that this example also uses the kernel syntax described in link:{ref2}[ | ||
sycl_ext_oneapi_free_function_kernels], but it is not necessary to define | ||
kernels in that syntax when using this extension. | ||
|
||
[source,c++] | ||
---- | ||
#include <sycl/sycl.hpp> | ||
namespace syclext = sycl::ext::oneapi; | ||
namespace syclexp = sycl::ext::oneapi::experimental; | ||
|
||
SYCL_EXT_ONEAPI_FUNCTION_PROPERTY((syclexp::nd_range_kernel<1>)) | ||
void iota(float start, float *ptr) { | ||
size_t id = syclext::this_work_item::get_nd_item().get_global_linear_id(); | ||
ptr[id] = start + static_cast<float>(id); | ||
} | ||
|
||
void main() { | ||
sycl::device d; | ||
sycl::queue q{d}; | ||
sycl::context ctxt = q.get_context(); | ||
|
||
// Get a kernel bundle that contains the kernel "iota". | ||
sycl::kernel_id iota = syclexp::get_kernel_id<iota>(); | ||
auto exe_bndl = | ||
sycl::get_kernel_bundle<sycl::bundle_state::executable>(ctxt, {iota}); | ||
|
||
std::vector<std::byte> bytes; | ||
for (auto& img: bundle) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Just for my education, are there any cases where there will be more than 1 image returned for this case? Except the multi-device case There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is a good question! For DPC++, this will be the case (at least for now). @AerialMantis proposed the initial draft for the kernel bundle APIs, and he originally thought there could be multiple device images for a single device. I think the idea was that the compiler could create multiple versions that are optimized for different scenarios. I think this was just a hypothetical use case, though. DPC++ does not currently do this. If we did add this in the future, we would need some sort of query to distinguish between them. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Oh wait. Are you asking if a
There is no guarantee that |
||
// Search for the device image that contains "iota" for this device. | ||
if (img.has_kernel(iota, dev)) { | ||
bytes = img.ext_oneapi_get_content(); | ||
break; | ||
} | ||
} | ||
} | ||
---- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AerialMantis: Can you help with this part? What is the device image content for an
object
state bundle on CUDA?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no meaningful state for the cuda and HIP backend, they each ignore transition to this state and jump to the last.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds the same as as with Level Zero. Can we say:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I decided to restrict this extension to only
bundle_state::executable
, so I think the issue here is now moot.