Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL][Doc] Add spec to get device image backend content #14811

Merged
merged 10 commits into from
Dec 3, 2024
Original file line number Diff line number Diff line change
@@ -0,0 +1,234 @@
= sycl_ext_oneapi_device_image_backend_content

:source-highlighter: coderay
:coderay-linenums-mode: table

// This section needs to be after the document title.
:doctype: book
:toc2:
:toc: left
:encoding: utf-8
:lang: en
:dpcpp: pass:[DPC++]
:endnote: —{nbsp}end{nbsp}note

// Set the default source code type in this document to C++,
// for syntax highlighting purposes. This is needed because
// docbook uses c++ and html5 uses cpp.
:language: {basebackend@docbook:c++:cpp}


== Notice

[%hardbreaks]
Copyright (C) 2024 Intel Corporation. All rights reserved.

Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks
of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. used by
permission by Khronos.


== Contact

To report problems with this extension, please open a new issue at:

https://github.com/intel/llvm/issues


== Dependencies

This extension is written against the SYCL 2020 revision 9 specification.
All references below to the "core SYCL specification" or to section numbers in
the SYCL specification refer to that revision.


== Status

This is a proposed extension specification, intended to gather community
feedback.
Interfaces defined in this specification may not be implemented yet or may be
in a preliminary state.
The specification itself may also change in incompatible ways before it is
finalized.
*Shipping software products should not rely on APIs defined in this
specification.*


== Overview

This extension adds a mechanism to obtain the raw backend content of the device
images that are in a kernel bundle.
The format of this content is implementation-defined, so applications that make
use of this extension are normally not portable to other implementations of
SYCL.
gmlueck marked this conversation as resolved.
Show resolved Hide resolved


== Specification

=== Feature test macro

This extension provides a feature-test macro as described in the core SYCL
specification.
An implementation supporting this extension must predefine the macro
`SYCL_EXT_ONEAPI_DEVICE_IMAGE_BACKEND_CONTENT` to one of the values defined in
the table below.
Applications can test for the existence of this macro to determine if the
implementation supports this feature, or applications can test the macro's
value to determine which of the extension's features the implementation
supports.

[%header,cols="1,5"]
|===
|Value
|Description

|1
|The APIs of this experimental extension are not versioned, so the
feature-test macro always has this value.
|===

=== New member functions in the `device_image` class

This extension adds the following member functions to the `device_image` class.

[source,c++]
----
namespace sycl {

template <bundle_state State>
class device_image {
public:
std::vector<std::byte> ext_oneapi_get_backend_content() const;
std::span<std::byte> ext_oneapi_get_backend_content_view() const;
gmlueck marked this conversation as resolved.
Show resolved Hide resolved

/*...*/
};

} // namespace sycl
----

'''

[frame=all,grid=none,separator="@"]
!====
a@
[source,c++]
----
std::vector<std::byte> ext_oneapi_get_backend_content() const;
----
!====

_Constraints:_ Available only when `State` is `bundle_state::executable`.

_Returns:_ A copy of the raw backend content for this device image.
The format of this data is implementation-defined.
See below for a description of this format on {dpcpp}.
gmlueck marked this conversation as resolved.
Show resolved Hide resolved

'''

[frame=all,grid=none,separator="@"]
!====
a@
[source,c++]
----
std::span<std::byte> ext_oneapi_get_content_backend_view() const;
----
!====

Available only when the compiler is {cpp}20 or higher.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should drop this, and rely on the ifdef in the synopsis. This is really saying "Available only when the library provides an implementation of std::span", and that's implied by usage of the type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's try to agree on standard wording for cases like this because I think it will come up more and more.

I feel like we should say somewhere that this API is tied to C++20. How do you feel about the wording I added in d288119?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have said this is good enough, but if you'd like to settle on standard wording I'd like to talk about it a bit more.

I think it would be more accurate to say that the implementation defines the macro. ISO C++ says the implementation only defines the macro if you include <version> or <span>.

Given the above, do you want to say anything about who is responsible for including <span> somewhere? It probably shouldn't be part of the function synopsis, but we should make it clear that if this extension is implemented then sycl.hpp is expected to include <span> if it's available (see __has_include), and that this isn't a user's responsibility. For the actual SYCL specification, I imagine we'd have a single dedicated section to explain how compilation with different versions of C++ works, and we could cover this there instead of in each extension.

Finally, I think we could improve the formatting. We should add a title to the paragraph (like we do with Constraints, etc) and consider rephrasing the "Available only when" (to further differentiate from constraints). How about something like:

Required C++ Features: The implementation must define the __cpp_lib_span feature-test macro (which is defined in C++20 and higher).

I deliberately put "C++" in the title in an attempt to be future-proof. Even if ISO C++ decided to introduce a similar concept with the same name, it would probably just be called "Required Features" (since the C++ would be implied).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this is making me think that my original wording was better. There are plenty of C++ feature-test macros even for features that exist in C++17. For example, __cpp_lib_byte tells whether std::byte is available. Are we going to add similar wording for every API that uses std::byte or every API that uses any C++ feature with a corresponding feature-test macro? This seems unnecessarily complex to me.

I think the SYCL spec should assume a C++ compiler that is fully conformant to whatever C++ version it claims to be. Implementations can then decide to make some of these APIs conditionally available for the benefit of partially conformant C++ compilers if they so desire. But, this would be a quality of implementation thing, not part of the specification.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it would be unnecessary to go back and add the features for C++17, but I think that's because SYCL currently requires C++17 as a base and so we can assume that all C++17 features are available. If we were to change that and say that SYCL is compatible with every version of C++, but that different subsets of features are available based on the capabilities of your C++ implementation, then I absolutely think we should clearly label the requirements of each SYCL feature.

What I like about separating the macros from the C++ version is that it allows for a world where developers can opt-in to certain things, which I think we (and other implementations) might want.

For example, let's say that we design a feature (like a replacement for accessors) that uses std::mdspan. I think we'd want to make that available to as many developers as possible as quickly as possible. But there may be a lot of developers who are not willing/able to adopt C++23, or there might only be incomplete support for C++23 in their compiler/library. In such a situation, we could add a compiler option to make std::mdspan available with earlier versions of C++: DPC++ would provide <mdspan>, __has_include(<mdspan>) would evaluate to true, and __cpp_lib_mdspan would be defined appropriately, but the __cplusplus macro would still evaluate to C++17.

I believe my thinking here is aligned with the reasoning behind introducing these macros in the first place (e.g., see https://isocpp.org/std/standing-documents/sd-6-sg10-feature-test-recommendations#explanation-and-rationale-for-the-approach).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this is going to be a longer conversation, and I'd like to get implementation started on this extension. Would you be OK approving what we have now, and I'll open an issue in the Khronos repo to continue discussion about the precise wording that we'll use going forward?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, absolutely!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


_Constraints:_ Available only when `State` is `bundle_state::executable`.

_Returns:_ A view of the raw backend content for this device image.
The data behind this view has the same lifetime as the `kernel_bundle` that
contains this `device_image`.
gmlueck marked this conversation as resolved.
Show resolved Hide resolved

'''


== Device image format for {dpcpp}

This section is non-normative and applies only to the {dpcpp} implementation.
The format of the data returned by
`device_image::ext_oneapi_get_backend_content` and
`device_image::ext_oneapi_get_backend_content_view` depends on the backend of the
kernel bundle that contains the device image.
Pennycook marked this conversation as resolved.
Show resolved Hide resolved

=== Format on Level Zero

The device image's backend content is native ISA for the device, which can be
AlexeySachkov marked this conversation as resolved.
Show resolved Hide resolved
passed to `zeModuleCreate` as `ZE_MODULE_FORMAT_NATIVE` format.

:ref1: ../proposed/sycl_ext_oneapi_free_function_kernels.asciidoc#level-zero-and-opencl-compatibility

[_Note:_ The interface to kernels in the device image backend content is not
defined in the general case, which means there is no portable way to invoke
kernels from a Level Zero module that is created from the raw device image
content.
However, see link:{ref1}[here] for a limited case where this portability is
guaranteed.
_{endnote}_]

=== Format on OpenCL

The device image's backend content is executable binary device code representing
one or more kernels, which can be passed to `clCreateProgramWithBinary`.

[_Note:_ The interface to kernels in the device image backend content is not
defined in the general case, which means there is no portable way to invoke
kernels from a OpenCL `cl_program` object that is created from the raw device
image content.
However, see link:{ref1}[here] for a limited case where this portability is
guaranteed.
_{endnote}_]

=== Format on CUDA

The device image's backend content is a CUBIN module representing one or more
kernels.


== Example

:ref2: ../proposed/sycl_ext_oneapi_free_function_kernels.asciidoc

A kernel bundle can contain multiple device images with different
representations of the same kernel for different devices.
This example shows how to get the device image's backend content for a
particular kernel for a particular device.
Note that this example also uses the kernel syntax described in link:{ref2}[
sycl_ext_oneapi_free_function_kernels], but it is not necessary to define
kernels in that syntax when using this extension.

[source,c++]
----
#include <sycl/sycl.hpp>
namespace syclext = sycl::ext::oneapi;
namespace syclexp = sycl::ext::oneapi::experimental;

SYCL_EXT_ONEAPI_FUNCTION_PROPERTY((syclexp::nd_range_kernel<1>))
void iota(float start, float *ptr) {
size_t id = syclext::this_work_item::get_nd_item().get_global_linear_id();
ptr[id] = start + static_cast<float>(id);
}

void main() {
sycl::device d;
sycl::queue q{d};
sycl::context ctxt = q.get_context();

// Get a kernel bundle that contains the kernel "iota".
sycl::kernel_id iota = syclexp::get_kernel_id<iota>();
auto exe_bndl =
sycl::get_kernel_bundle<sycl::bundle_state::executable>(ctxt, {iota});

std::vector<std::byte> bytes;
for (auto& img: bundle) {
// Search for the device image that contains "iota" for this device.
if (img.has_kernel(iota, dev)) {
bytes = img.ext_oneapi_get_backend_content();
break;
}
}
}
----
Original file line number Diff line number Diff line change
Expand Up @@ -883,15 +883,14 @@ int main() {
```


[[level-zero-and-opencl-compatibility]]
== {dpcpp} guaranteed compatibility with Level Zero and OpenCL backends

The contents of this section are non-normative and apply only to the {dpcpp}
implementation.
Kernels written using the free function kernel syntax can be submitted to a
device by using the Level Zero or OpenCL backends, without going through the
SYCL host runtime APIs.
This works only when the kernel is AOT compiled to native device code using the
`-fsycl-targets` compiler option.

The interface to the kernel in the native device code module is only guaranteed
when the kernel adheres to the following restrictions:
Expand All @@ -905,6 +904,16 @@ when the kernel adheres to the following restrictions:
* The translation unit containing the kernel is compiled with the
`-fno-sycl-dead-args-optimization` option.

In order to invoke a kernel using Level Zero or OpenCL, the application must
first obtain the raw backend content of the device image that contains the
kernel.
One way to do this is by using
link:../proposed/sycl_ext_oneapi_device_image_backend_content.asciidoc[
sycl_ext_oneapi_device_image_backend_content].
It is also possible to compile the application in AOT mode via the
`-fsycl-targets` compiler option and then extract the device image's backend
content from the executable file.

Both Level Zero and OpenCL identify a kernel via a _name_ string.
(See `zeKernelCreate` and `clCreateKernel` in their respective specifications.)
When a kernel is defined according to the restrictions above, the _name_ is
Expand Down