Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL][Doc] Add spec to get device image backend content #14811

Merged
merged 10 commits into from
Dec 3, 2024
Original file line number Diff line number Diff line change
@@ -0,0 +1,254 @@
= sycl_ext_oneapi_device_image_content

:source-highlighter: coderay
:coderay-linenums-mode: table

// This section needs to be after the document title.
:doctype: book
:toc2:
:toc: left
:encoding: utf-8
:lang: en
:dpcpp: pass:[DPC++]
:endnote: —{nbsp}end{nbsp}note

// Set the default source code type in this document to C++,
// for syntax highlighting purposes. This is needed because
// docbook uses c++ and html5 uses cpp.
:language: {basebackend@docbook:c++:cpp}


== Notice

[%hardbreaks]
Copyright (C) 2024 Intel Corporation. All rights reserved.

Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks
of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. used by
permission by Khronos.


== Contact

To report problems with this extension, please open a new issue at:

https://github.com/intel/llvm/issues


== Dependencies

This extension is written against the SYCL 2020 revision 8 specification.
All references below to the "core SYCL specification" or to section numbers in
the SYCL specification refer to that revision.


== Status

This is a proposed extension specification, intended to gather community
feedback.
Interfaces defined in this specification may not be implemented yet or may be
in a preliminary state.
The specification itself may also change in incompatible ways before it is
finalized.
*Shipping software products should not rely on APIs defined in this
specification.*


== Overview

This extension adds a mechanism to obtain the raw content of the device images
that are in a kernel bundle.
The format of this content is implementation-defined, so applications that make
use of this extension are normally not portable to other implementations of
SYCL.


== Specification

=== Feature test macro

This extension provides a feature-test macro as described in the core SYCL
specification.
An implementation supporting this extension must predefine the macro
`SYCL_EXT_ONEAPI_DEVICE_IMAGE_CONTENT` to one of the values defined in the table
below.
Applications can test for the existence of this macro to determine if the
implementation supports this feature, or applications can test the macro's
value to determine which of the extension's features the implementation
supports.

[%header,cols="1,5"]
|===
|Value
|Description

|1
|The APIs of this experimental extension are not versioned, so the
feature-test macro always has this value.
|===

=== New member functions in the `device_image` class

This extension adds the following member functions to the `device_image` class.

[source,c++]
----
namespace sycl {

template <bundle_state State>
class device_image {
public:
std::vector<std::byte> ext_oneapi_get_content() const;
std::span<std::byte> ext_oneapi_get_content_view() const;

/*...*/
};

} // namespace sycl
----

'''

[frame=all,grid=none,separator="@"]
!====
a@
[source,c++]
----
std::vector<std::byte> ext_oneapi_get_content() const;
----
!====

_Returns:_ A copy of the raw bytes for this device image.
The format of this data is implementation-defined.
See below for a description of this format on {dpcpp}.

'''

[frame=all,grid=none,separator="@"]
!====
a@
[source,c++]
----
std::span<std::byte> ext_oneapi_get_content_view() const;
----
!====

Available on when the compiler is {cpp}20 or higher.

_Returns:_ A view of the raw bytes for this device image.
The data behind this view has the same lifetime as the `kernel_bundle` that
contains this `device_image`.

'''


== Device image format for {dpcpp}

This section is non-normative and applies only to the {dpcpp} implementation.
The format of the data returned by `device_image::ext_oneapi_get_content` and
`device_image::ext_oneapi_get_content_view` depends on the backend of the kernel
bundle that contains the device image and also on the `State` of the device
image.

=== Format on Level Zero

The format depends on the `State` of the device image:

* `bundle_state::input`: The format could change in the future, but it is
currently a SPIR-V module representing one or more kernels.
This SPIR-V module may be partially linked, with references to internal
library functions that are not defined in the SPIR-V module.

* `bundle_state::object`: The same format as `bundle_state::input`.

* `bundle_state::executable`: The device image content is native ISA for the
device, which can be passed to `zeModuleCreate` as `ZE_MODULE_FORMAT_NATIVE`
format.

:ref1: ../proposed/sycl_ext_oneapi_free_function_kernels.asciidoc#level-zero-and-opencl-compatibility

[_Note:_ The interface to kernels in the device image is not defined in the
general case, which means there is no portable way to invoke kernels from a
Level Zero module that is created from the raw device image content.
However, see link:{ref1}[here] for a limited case where this portability is
guaranteed.
_{endnote}_]

=== Format on OpenCL

The format depends on the `State` of the device image:

* `bundle_state::input`: The format could change in the future, but it is
currently a SPIR-V module as described above for Level Zero.

* `bundle_state::object`: The device image content is an unspecified format that
is created by calling `clCompileProgram`.

* `bundle_state::executable`: The device image content is executable binary
device code representing one or more kernels, which can be passed to
`clCreateProgramWithBinary`.

[_Note:_ The interface to kernels in the device image is not defined in the
general case, which means there is no portable way to invoke kernels from a
OpenCL `cl_program` object that is created from the raw device image content.
However, see link:{ref1}[here] for a limited case where this portability is
guaranteed.
_{endnote}_]

=== Format on CUDA

The format depends on the `State` of the device image:

* `bundle_state::input`: The device image content is a PTX module representing
one or more kernels.

* `bundle_state::object`: ???
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AerialMantis: Can you help with this part? What is the device image content for an object state bundle on CUDA?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no meaningful state for the cuda and HIP backend, they each ignore transition to this state and jump to the last.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds the same as as with Level Zero. Can we say:

The same format as bundle_state::input.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to restrict this extension to only bundle_state::executable, so I think the issue here is now moot.


* `bundle_state::executable`: The device image content is a CUBIN module
representing one or more kernels.


== Example

:ref2: ../proposed/sycl_ext_oneapi_free_function_kernels.asciidoc

A kernel bundle can contain multiple device images with different

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any way to get information about device for a given device image?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not directly. You can indirectly ask if a device image is compatible with a device via device_image::has_kernel(kernel_id, device). There are a couple reasons why it's hard to provide a direct query:

  • For device images that are not in executable state, the device image could be compatible with many different devices. For example, a SPIR-V module can be compiled for many different devices, including future devices that don't even exist yet.

  • Even for executable device images, the image could be compatible with many different devices. For example, I think this is the case for Nvidia CUBIN modules.

I think we could add a query like device_image::is_compatible(device) if that would help. This would be similar to device_image::has_kernel(kernel_id, device), except it asks about the device image a whole, rather than a specific kernel.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if there is device_image::has_kernel(kernel_id, device), then I think it is fine.

representations of the same kernel for different devices.
This example shows how to get the device image content for a particular kernel
for a particular device.
Note that this example also uses the kernel syntax described in link:{ref2}[
sycl_ext_oneapi_free_function_kernels], but it is not necessary to define
kernels in that syntax when using this extension.

[source,c++]
----
#include <sycl/sycl.hpp>
namespace syclext = sycl::ext::oneapi;
namespace syclexp = sycl::ext::oneapi::experimental;

SYCL_EXT_ONEAPI_FUNCTION_PROPERTY((syclexp::nd_range_kernel<1>))
void iota(float start, float *ptr) {
size_t id = syclext::this_work_item::get_nd_item().get_global_linear_id();
ptr[id] = start + static_cast<float>(id);
}

void main() {
sycl::device d;
sycl::queue q{d};
sycl::context ctxt = q.get_context();

// Get a kernel bundle that contains the kernel "iota".
sycl::kernel_id iota = syclexp::get_kernel_id<iota>();
auto exe_bndl =
sycl::get_kernel_bundle<sycl::bundle_state::executable>(ctxt, {iota});

std::vector<std::byte> bytes;
for (auto& img: bundle) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for my education, are there any cases where there will be more than 1 image returned for this case? Except the multi-device case

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good question! For DPC++, this will be the case (at least for now).

@AerialMantis proposed the initial draft for the kernel bundle APIs, and he originally thought there could be multiple device images for a single device. I think the idea was that the compiler could create multiple versions that are optimized for different scenarios. I think this was just a hypothetical use case, though. DPC++ does not currently do this. If we did add this in the future, we would need some sort of query to distinguish between them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wait. Are you asking if a kernel_bundle could contain more than one device image when the system has just one device? Yes, that can definitely happen. Consider, for example, a case where you ask for a bundle that contains two kernels:

sycl::kernel_id foo = syclexp::get_kernel_id<foo>();
sycl::kernel_id bar = syclexp::get_kernel_id<bar>();
auto exe_bndl =
    sycl::get_kernel_bundle<sycl::bundle_state::executable>(ctxt, {foo, bar});

There is no guarantee that foo and bar are in the same device image, so the kernel bundle might contain two device images in this case. The compiler decides how to pack kernels together into device images depending on the compilation flags (e.g. -fsycl-device-code-split) and also some internal algorithms.

// Search for the device image that contains "iota" for this device.
if (img.has_kernel(iota, dev)) {
bytes = img.ext_oneapi_get_content();
break;
}
}
}
----
Original file line number Diff line number Diff line change
Expand Up @@ -773,15 +773,14 @@ int main() {
```


[[level-zero-and-opencl-compatibility]]
== {dpcpp} guaranteed compatibility with Level Zero and OpenCL backends

The contents of this section are non-normative and apply only to the {dpcpp}
implementation.
Kernels written using the free function kernel syntax can be submitted to a
device by using the Level Zero or OpenCL backends, without going through the
SYCL host runtime APIs.
This works only when the kernel is AOT compiled to native device code using the
`-fsycl-targets` compiler option.

The interface to the kernel in the native device code module is only guaranteed
when the kernel adheres to the following restrictions:
Expand All @@ -795,6 +794,17 @@ when the kernel adheres to the following restrictions:
* The translation unit containing the kernel is compiled with the
`-fno-sycl-dead-args-optimization` option.

In order to invoke a kernel using Level Zero or OpenCL, the application must
first obtain the raw native ISA content of the device image that contains the
kernel.
One way to do this is by using
link:../proposed/sycl_ext_oneapi_device_image_content.asciidoc[
sycl_ext_oneapi_device_image_content] to obtain the content of the device image
in `bundle_state::executable` state.
It is also possible to compile the application in AOT mode via the
`-fsycl-targets` compiler option and then extract the device image from the
executable file.

Both Level Zero and OpenCL identify a kernel via a _name_ string.
(See `zeKernelCreate` and `clCreateKernel` in their respective specifications.)
When a kernel is defined according to the restrictions above, the _name_ is
Expand Down