From 65567ac85ee2b2d7976594b21a6e656c75bb2ed7 Mon Sep 17 00:00:00 2001
From: Gordon Brown
Date: Thu, 5 Sep 2024 12:30:48 +0100
Subject: [PATCH] [SYCL] Minor changes to async memory alloc ext based on feedback

---
 ...ycl_ext_oneapi_async_memory_alloc.asciidoc | 122 +++++++++---------
 1 file changed, 60 insertions(+), 62 deletions(-)

diff --git a/sycl/doc/extensions/proposed/sycl_ext_oneapi_async_memory_alloc.asciidoc b/sycl/doc/extensions/proposed/sycl_ext_oneapi_async_memory_alloc.asciidoc
index 5bf043e0e07e..bbd4659e8583 100644
--- a/sycl/doc/extensions/proposed/sycl_ext_oneapi_async_memory_alloc.asciidoc
+++ b/sycl/doc/extensions/proposed/sycl_ext_oneapi_async_memory_alloc.asciidoc
@@ -76,11 +76,11 @@ from that memory pool specifically, however, if no memory pool object is
 provided there is a default memory pool which will be used.
 
 The memory pool introduced is a dynamic memory pool, as such no memory is
-immediately allocated on construction, instead memory is allocated by the OS or
-an existing memory allocation provided by the user, for the pool when
-allocations are made and are freed back to the OS or the memory allocation
-provided by the user, when the queue which enqueued the malloc and free commands
-is synchronized with.
+immediately allocated on construction, instead memory is allocated by the SYCL
+runtime or an existing memory allocation provided by the user, for the pool when
+allocations are made and are freed back to the SYCL runtime or the memory
+allocation provided by the user, when the queue which enqueued the malloc and
+free commands is synchronized with.
 
 The immediate benefit of using the asynchronous malloc and free functions, as
 opposed to the existing synchronous malloc and free functions, is that they no
@@ -95,8 +95,8 @@ re-use memory allocations from one kernel to another, without any intermediate
 freeing and re-allocation. This benefit can be further extended beyond queue
 synchronization by specifying a threshold, which will instruct the runtime to
 (if possible) maintain a certain size of memory and not release that back to the
-OS or to the memory allocation provided by the user, even when the queue is
-synchronized with.
+SYCL runtime or to the memory allocation provided by the user, even when the
+queue is synchronized with.
 
 There are also other properties which can be used when constructing a memory
 pool object to control other aspects of how the memory is allocated.
@@ -121,8 +121,8 @@ this are not included in this version of the extension.
 There are various ways in which this extension can be used but a typical usage
 of the memory pool and the asynchronous malloc and free commands is described
 below. In this example an explicit memory pool is created and this is used to
-share memory allocated from the OS between multiple asynchronous malloc
-commands.
+share memory allocated from the SYCL runtime between multiple asynchronous
+malloc commands.
 
 [source,c++]
 ----
@@ -134,7 +134,7 @@ int main(int argc, char *argv[])
     q.get_device(), usm::alloc::device);
 
   {
-    // memory pool allocates memory from the OS
+    // memory pool allocates memory from the SYCL runtime
     void *temp = async_malloc_from_pool(q, 1024, memPool);
 
     // memory allocation is used for first kernel
@@ -159,7 +159,7 @@ int main(int argc, char *argv[])
     async_free(q, temp);
   }
 
-  // memory pool releases memory back to the OS
+  // memory pool releases memory back to the SYCL runtime
   q.wait();
 }
 ----
@@ -179,15 +179,15 @@ int main(int argc, char *argv[])
   {
     void *temp = null;
 
-    // memory pool allocates memory from the OS
+    // memory pool allocates memory from the SYCL runtime
     auto e1 = q.submit_with_event([&](handler &cgh) {
-      temp = async_malloc_from_pool(q, 1024, memPool);
+      temp = async_malloc_from_pool(cgh, 1024, memPool);
     });
 
     // memory allocation is used for first kernel
     auto e2 = q.submit_with_event([&](handler &cgh) {
       cgh.depends_on(e1);
-      parallel_for(q, range{1024}, [=](id<1> idx) {
+      parallel_for(cgh, range{1024}, [=](id<1> idx) {
         do_something(idx, temp);
       });
     });
@@ -195,7 +195,7 @@ int main(int argc, char *argv[])
     // memory is available to be used by another allocation
     auto e3 = q.submit_with_event([&](handler &cgh) {
       cgh.depends_on(e2);
-      async_free(q, temp);
+      async_free(cgh, temp);
     });
   }
 
@@ -204,13 +204,14 @@ int main(int argc, char *argv[])
 
     // memory pool re-uses previously allocated memory
     auto e4 = q.submit_with_event([&](handler &cgh) {
-      temp = async_malloc_from_pool(q, 1024, memPool);
+      cgh.depends_on(e3);
+      temp = async_malloc_from_pool(cgh, 1024, memPool);
     });
 
     // memory allocation is used for second kernel
     auto e5 = q.submit_with_event([&](handler &cgh) {
       cgh.depends_on(e4);
-      parallel_for(q, range{1024}, [=](id<1> idx) {
+      parallel_for(cgh, range{1024}, [=](id<1> idx) {
         do_something_else(idx, temp);
       });
     });
@@ -218,11 +219,11 @@ int main(int argc, char *argv[])
     // memory is available to be used by another allocation
     q.submit_with_event([&](handler &cgh) {
       cgh.depends_on(e5);
-      async_free(q, temp);
+      async_free(cgh, temp);
     });
   }
 
-  // memory pool releases memory back to the OS
+  // memory pool releases memory back to the SYCL runtime
   q.wait();
 }
 ----
@@ -231,9 +232,9 @@ Another example of memory pool usage is described in the example below. In this
 example rather than creating an explicit memory pool the default memory pool is
 being used instead. There is also additional queue synchronization between the
 commands enqueued which would ordinarily lead to memory being released back to
-the OS, however, the threshold for the memory pool is extended to the SYCL RT
-will maintain this memory allocation, and therefore still provide the benefit of
-re-allocating memory from the memory pool.
+the SYCL runtime, however, the allocation threshold for the memory pool is
+extended so the memory pool maintains the allocations and therefore still
+provide the benefit of re-allocating memory from the memory pool.
 
 [source,c++]
 ----
@@ -247,7 +248,7 @@ int main(int argc, char *argv[])
   memPool.set_new_threshold(1024);
 
   {
-    // memory pool allocates memory from the OS
+    // memory pool allocates memory from the SYCL runtime
     void *temp = async_malloc_from_pool(q, 1024, memPool);
 
     // memory allocation is used for first kernel
@@ -259,8 +260,8 @@ int main(int argc, char *argv[])
     async_free(q, temp);
   }
 
-  // memory pool does not release memory back to the OS as it is still within
-  // the specified threshold
+  // memory pool does not release memory back to the SYCL runtime as it is still
+  // within the specified threshold
   q.wait();
 
   {
@@ -276,7 +277,7 @@ int main(int argc, char *argv[])
     async_free(q, temp);
   }
 
-  // again memory pool does not release memory back to the OS
+  // again memory pool does not release memory back to the SYCL runtime
   q.wait();
 }
 ----
@@ -312,9 +313,9 @@ SYCL common reference semantics.
 
 Memory pools have the following properties:
 
-* A memory pool can allocate memory from two possible sources; either the OS or
-  an existing USM memory allocation provided by the user. The default source is
-  the OS.
+* A memory pool can allocate memory from two possible sources; either the SYCL
+  runtime or an existing USM memory allocation provided by the user. The default
+  source is the SYCL runtime.
 * A maximum allocation size (in bytes) is used to manage the total amount of
   memory which can be allocated in the memory pool. If the maximum size is
   exceeded an error is thrown. The default maximum size is
@@ -379,7 +380,7 @@ memory_pool(context ctx, Properties props = {});
 ----
 
 _Effects_: Constructs a memory pool associated with `ctx` and all SYCL devices
-associated with it, with the allocation kind `usm::host` and applying any
+associated with it, with the allocation kind `usm::alloc::host` and applying any
 properties in `props`.
 
 [source, c++]
@@ -392,7 +393,7 @@ _Effects_: Constructs a memory pool associated with `ctx` and `dev`, with the
 allocation kind `kind` and applying any properties in `props`.
 
 _Throws_: An exception with the `errc::invalid` error code if `kind` is
-`usm::host`.
+`usm::alloc::host`.
 
 [source, c++]
 ----
@@ -518,8 +519,8 @@ inline constexpr use_existing_memory_key::value_t use_existing_memory;
 
 |`read_only`
 |The `read_only` property is a performance hint which asserts that all memory
- allocations from the memory pool will only ever be read from, this can be used
- by the SYCL runtime to optimize for performance.
+ allocations from the memory pool will only ever be read from within SYCL kernel
+ functions, this can be used by the SYCL runtime to optimize for performance.
 
 |`zero_init`
 |The `zero_init` property adds the requirement that all memory allocated to the
@@ -532,13 +533,14 @@ inline constexpr use_existing_memory_key::value_t use_existing_memory;
 |`use_existing_memory`
 |The `use_existing_memory` property adds the requirement that the memory pool
  will use an existing USM memory allocation provided by the user instead of
- allocating from the OS. This property takes a pointer to a valid USM memory
- allocation of the same allocation kind as the memory pool is initialized with
- and the size of that memory allocation. Using this property will implicitly set
- the `maximum_size` and `initial_threshold` property to be that of the size
- provided, and as such using the `maximum_size` or `initial_threshold`
- properties in conjunction with this property will cause the `memory_pool`
- constructor to throw an exception with the `errc::invalid` error code.
+ allocating from the SYCL runtime. This property takes a pointer to a valid USM
+ memory allocation of the same allocation kind as the memory pool is initialized
+ with and the size of that memory allocation. Using this property will
+ implicitly set the `maximum_size` and `initial_threshold` property to be that
+ of the size provided, and as such using the `maximum_size` or
+ `initial_threshold` properties in conjunction with this property will cause the
+ `memory_pool` constructor to throw an exception with the `errc::invalid` error
+ code.
 
 |===
 
@@ -572,7 +574,7 @@ memory_pool context::ext_oneapi_get_default_memory_pool() const;
 ----
 
 _Returns_: The default memory pool associated with the context for allocating
-with the allocation kind `usm::host`.
+with the allocation kind `usm::alloc::host`.
 
 [source, c++]
 ----
@@ -580,7 +582,7 @@ memory_pool context::ext_oneapi_get_default_memory_pool(device dev) const;
 ----
 
 _Returns_: The default memory pool associated with the context and `dev` for
-allocating with the allocation kind `usm::device`.
+allocating with the allocation kind `usm::alloc::device`.
 
 
 === Asynchronous malloc & free
@@ -590,7 +592,7 @@ asynchronous malloc and free commands which operate with the memory pools also
 introduced in this extension.
 
 All enqueue functions introduced have overloads which take a SYCL `queue` and a
-SYCL `handler`. None of enqueue functions return a SYCL `event` direction, as
+SYCL `handler`. None of enqueue functions return a SYCL `event` directly, as
 this extension is in line with the
 link:../experimental/sycl_ext_oneapi_enqueue_functions.asciidoc[
 sycl_ext_oneapi_enqueue_functions] extension, so events are returned when
@@ -633,18 +635,16 @@ memory pool `pool` if provided, otherwise allocation from the default memory
 pool associated with the SYCL context and device associated with `q` or `h`.
 Memory is first allocated from the memory pool if possible, otherwise memory is
 allocated from the source to the memory pool to provide enough memory in the
-memory pool for the allocation.
+memory pool for the allocation. Accessing the memory at the address of the
+pointer returned by asynchronous malloc functions before the command has
+completed execution is undefined behavior.
 
-_Returns_: A pointer to the address of a memory reservation.
+_Returns_: A pointer to the address of a memory reservation if `size` is
+non-zero, otherwise returns `nullptr`.
 
-_Throws_: An exception with the `errc::invalid` error code if `size` is zero.
-An exception with the `errc::memory_allocation` error code if the allocation
-brings the memory pool over the maximum size. This error can be thrown
-asynchronously.
-
-[_Note:_ Accessing the memory at the address of the pointer returned by
-asynchronous malloc functions before the command has completed execution is
-undefined behavior. _{endnote}_]
+_Throws_: An exception with the `errc::memory_allocation` error code if the
+allocation brings the memory pool over the maximum size. This error must be
+thrown asynchronously.
 
 [source, c++]
 ----
@@ -658,14 +658,12 @@ will asynchronously free the memory allocation at the address of `ptr`. Memory
 will be freed from the memory pool to be used by other asynchronous malloc
 commands which execute later, and will not free to the source until the SYCL
 queue associated with the asynchronous allocation command has been synchronized
-with.
+with. Accessing the memory at the address of `ptr` after the asynchronous free
+command has completed execution is undefined behavior.
 
 _Throws_: An exception with the `errc::invalid` error code if `ptr` is not the
 address of a memory allocation allocated to a memory pool.
 
-[_Note:_ Accessing the memory at the address of `ptr` after the asynchronous
-free command has completed execution is undefined behavior. _{endnote}_]
-
 
 === Memory pool lifetimes
 
@@ -735,14 +733,14 @@ that is not associated with the queue that the command is enqueued from, should
 this result in an error?
 --
 
-. Should we have a default memory pool for `usm::shared`?
+. Should we have a default memory pool for `usm::alloc::shared`?
 +
 --
 *UNRESOLVED*: Currently the proposed API means that there cannot be default
-memory pool for allocations of allocation kind `usm::shared`, and therefore a
-user must create their own explicit memory pool to do so. Is this reasonable or
-should we extend the API to include a default memory pool for allocations of
-allocation kind `usm::shared`?
+memory pool for allocations of allocation kind `usm::alloc::shared`, and
+therefore a user must create their own explicit memory pool to do so. Is this
+reasonable or should we extend the API to include a default memory pool for
+allocations of allocation kind `usm::alloc::shared`?
 --
 
 . Should we allow setting a new threshold that is lower?