[SYCL] Minor changes to async memory alloc ext based on feedback
AerialMantis committed Oct 31, 2024
1 parent f7a211f commit 65567ac
Showing 1 changed file with 60 additions and 62 deletions.
@@ -76,11 +76,11 @@ from that memory pool specifically, however, if no memory pool object is
provided there is a default memory pool which will be used.

The memory pool introduced is a dynamic memory pool, as such no memory is
-immediately allocated on construction, instead memory is allocated by the OS or
-an existing memory allocation provided by the user, for the pool when
-allocations are made and are freed back to the OS or the memory allocation
-provided by the user, when the queue which enqueued the malloc and free commands
-is synchronized with.
+immediately allocated on construction, instead memory is allocated by the SYCL
+runtime or an existing memory allocation provided by the user, for the pool when
+allocations are made and are freed back to the SYCL runtime or the memory
+allocation provided by the user, when the queue which enqueued the malloc and
+free commands is synchronized with.

The immediate benefit of using the asynchronous malloc and free functions, as
opposed to the existing synchronous malloc and free functions, is that they no
@@ -95,8 +95,8 @@ re-use memory allocations from one kernel to another, without any intermediate
freeing and re-allocation. This benefit can be further extended beyond queue
synchronization by specifying a threshold, which will instruct the runtime to
(if possible) maintain a certain size of memory and not release that back to the
-OS or to the memory allocation provided by the user, even when the queue is
-synchronized with.
+SYCL runtime or to the memory allocation provided by the user, even when the
+queue is synchronized with.

There are also other properties which can be used when constructing a memory
pool object to control other aspects of how the memory is allocated.
@@ -121,8 +121,8 @@ this are not included in this version of the extension.
There are various ways in which this extension can be used but a typical usage
of the memory pool and the asynchronous malloc and free commands is described
below. In this example an explicit memory pool is created and this is used to
-share memory allocated from the OS between multiple asynchronous malloc
-commands.
+share memory allocated from the SYCL runtime between multiple asynchronous
+malloc commands.

[source,c++]
----
@@ -134,7 +134,7 @@ int main(int argc, char *argv[])
q.get_device(), usm::alloc::device);
{
-// memory pool allocates memory from the OS
+// memory pool allocates memory from the SYCL runtime
void *temp = async_malloc_from_pool(q, 1024, memPool);
// memory allocation is used for first kernel
@@ -159,7 +159,7 @@ int main(int argc, char *argv[])
async_free(q, temp);
}
-// memory pool releases memory back to the OS
+// memory pool releases memory back to the SYCL runtime
q.wait();
}
----
@@ -179,23 +179,23 @@ int main(int argc, char *argv[])
{
void *temp = null;
-// memory pool allocates memory from the OS
+// memory pool allocates memory from the SYCL runtime
auto e1 = q.submit_with_event([&](handler &cgh) {
-temp = async_malloc_from_pool(q, 1024, memPool);
+temp = async_malloc_from_pool(cgh, 1024, memPool);
});
// memory allocation is used for first kernel
auto e2 = q.submit_with_event([&](handler &cgh) {
cgh.depends_on(e1);
-parallel_for(q, range{1024}, [=](id<1> idx) {
+parallel_for(cgh, range{1024}, [=](id<1> idx) {
do_something(idx, temp);
});
});
// memory is available to be used by another allocation
auto e3 = q.submit_with_event([&](handler &cgh) {
cgh.depends_on(e2);
-async_free(q, temp);
+async_free(cgh, temp);
});
}
@@ -204,25 +204,26 @@ int main(int argc, char *argv[])
// memory pool re-uses previously allocated memory
auto e4 = q.submit_with_event([&](handler &cgh) {
-temp = async_malloc_from_pool(q, 1024, memPool);
+cgh.depends_on(e3);
+temp = async_malloc_from_pool(cgh, 1024, memPool);
});
// memory allocation is used for second kernel
auto e5 = q.submit_with_event([&](handler &cgh) {
cgh.depends_on(e4);
-parallel_for(q, range{1024}, [=](id<1> idx) {
+parallel_for(cgh, range{1024}, [=](id<1> idx) {
do_something_else(idx, temp);
});
});
// memory is available to be used by another allocation
q.submit_with_event([&](handler &cgh) {
cgh.depends_on(e5);
-async_free(q, temp);
+async_free(cgh, temp);
});
}
-// memory pool releases memory back to the OS
+// memory pool releases memory back to the SYCL runtime
q.wait();
}
----
@@ -231,9 +232,9 @@ Another example of memory pool usage is described in the example below. In this
example rather than creating an explicit memory pool the default memory pool is
being used instead. There is also additional queue synchronization between the
commands enqueued which would ordinarily lead to memory being released back to
-the OS, however, the threshold for the memory pool is extended to the SYCL RT
-will maintain this memory allocation, and therefore still provide the benefit of
-re-allocating memory from the memory pool.
+the SYCL runtime, however, the allocation threshold for the memory pool is
+extended so the memory pool maintains the allocations and therefore still
+provide the benefit of re-allocating memory from the memory pool.

[source,c++]
----
@@ -247,7 +248,7 @@ int main(int argc, char *argv[])
memPool.set_new_threshold(1024);
{
-// memory pool allocates memory from the OS
+// memory pool allocates memory from the SYCL runtime
void *temp = async_malloc_from_pool(q, 1024, memPool);
// memory allocation is used for first kernel
@@ -259,8 +260,8 @@ int main(int argc, char *argv[])
async_free(q, temp);
}
-// memory pool does not release memory back to the OS as it is still within
-// the specified threshold
+// memory pool does not release memory back to the SYCL runtime as it is still
+// within the specified threshold
q.wait();
{
@@ -276,7 +277,7 @@ int main(int argc, char *argv[])
async_free(q, temp);
}
-// again memory pool does not release memory back to the OS
+// again memory pool does not release memory back to the SYCL runtime
q.wait();
}
----
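
The threshold behavior in the example above can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions, not the extension's API: `ToyPool`, `allocate`, `release`, `synchronize` and `bytes_taken_for_two_rounds` are hypothetical names, no real memory is allocated, and the rule "keep idle memory up to the threshold when the queue is synchronized" is one reading of the prose above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical toy model, not the extension's API: it only tracks byte
// counts and performs no real allocation.
class ToyPool {
public:
  explicit ToyPool(std::size_t threshold = 0) : threshold_(threshold) {}

  // Models an asynchronous malloc: idle pool memory is reused first and
  // only the shortfall is taken from the source.
  void allocate(std::size_t size) {
    const std::size_t idle = reserved_ - in_use_;
    if (size > idle) {
      const std::size_t shortfall = size - idle;
      reserved_ += shortfall;
      taken_from_source_ += shortfall;
    }
    in_use_ += size;
  }

  // Models an asynchronous free: memory returns to the pool, not the source.
  void release(std::size_t size) {
    assert(size <= in_use_);
    in_use_ -= size;
  }

  // Models queue synchronization: idle memory above the threshold is
  // handed back to the source (assumed rule, see the lead-in).
  void synchronize() {
    const std::size_t keep = in_use_ > threshold_ ? in_use_ : threshold_;
    if (reserved_ > keep) reserved_ = keep;
  }

  std::size_t reserved() const { return reserved_; }
  std::size_t taken_from_source() const { return taken_from_source_; }

private:
  std::size_t threshold_;
  std::size_t reserved_ = 0;          // bytes currently held by the pool
  std::size_t in_use_ = 0;            // bytes handed out to allocations
  std::size_t taken_from_source_ = 0; // cumulative bytes requested from the source
};

// Runs allocate/free/synchronize twice, mirroring the two scoped blocks in
// the example, and reports the total bytes requested from the source.
std::size_t bytes_taken_for_two_rounds(std::size_t threshold) {
  ToyPool pool(threshold);
  for (int round = 0; round < 2; ++round) {
    pool.allocate(1024);
    pool.release(1024);
    pool.synchronize();
  }
  return pool.taken_from_source();
}
```

In this model, a threshold of 0 makes the second round go back to the source, so `bytes_taken_for_two_rounds(0)` is 2048. A threshold of 1024 lets the second round reuse the retained memory, so `bytes_taken_for_two_rounds(1024)` is 1024.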
@@ -312,9 +313,9 @@ SYCL common reference semantics.

Memory pools have the following properties:

-* A memory pool can allocate memory from two possible sources; either the OS or
-an existing USM memory allocation provided by the user. The default source is
-the OS.
+* A memory pool can allocate memory from two possible sources; either the SYCL
+runtime or an existing USM memory allocation provided by the user. The default
+source is the SYCL runtime.
* A maximum allocation size (in bytes) is used to manage the total amount of
memory which can be allocated in the memory pool. If the maximum size is
exceeded an error is thrown. The default maximum size is
@@ -379,7 +380,7 @@ memory_pool(context ctx, Properties props = {});
----

_Effects_: Constructs a memory pool associated with `ctx` and all SYCL devices
-associated with it, with the allocation kind `usm::host` and applying any
+associated with it, with the allocation kind `usm::alloc::host` and applying any
properties in `props`.

[source, c++]
@@ -392,7 +393,7 @@ _Effects_: Constructs a memory pool associated with `ctx` and `dev`, with the
allocation kind `kind` and applying any properties in `props`.

_Throws_: An exception with the `errc::invalid` error code if `kind` is
-`usm::host`.
+`usm::alloc::host`.

[source, c++]
----
@@ -518,8 +519,8 @@ inline constexpr use_existing_memory_key::value_t use_existing_memory;

|`read_only`
|The `read_only` property is a performance hint which asserts that all memory
-allocations from the memory pool will only ever be read from, this can be used
-by the SYCL runtime to optimize for performance.
+allocations from the memory pool will only ever be read from within SYCL kernel
+functions, this can be used by the SYCL runtime to optimize for performance.

|`zero_init`
|The `zero_init` property adds the requirement that all memory allocated to the
@@ -532,13 +533,14 @@ inline constexpr use_existing_memory_key::value_t use_existing_memory;
|`use_existing_memory`
|The `use_existing_memory` property adds the requirement that the memory pool
will use an existing USM memory allocation provided by the user instead of
-allocating from the OS. This property takes a pointer to a valid USM memory
-allocation of the same allocation kind as the memory pool is initialized with
-and the size of that memory allocation. Using this property will implicitly set
-the `maximum_size` and `initial_threshold` property to be that of the size
-provided, and as such using the `maximum_size` or `initial_threshold`
-properties in conjunction with this property will cause the `memory_pool`
-constructor to throw an exception with the `errc::invalid` error code.
+allocating from the SYCL runtime. This property takes a pointer to a valid USM
+memory allocation of the same allocation kind as the memory pool is initialized
+with and the size of that memory allocation. Using this property will
+implicitly set the `maximum_size` and `initial_threshold` property to be that
+of the size provided, and as such using the `maximum_size` or
+`initial_threshold` properties in conjunction with this property will cause the
+`memory_pool` constructor to throw an exception with the `errc::invalid` error
+code.

|===

@@ -572,15 +574,15 @@ memory_pool context::ext_oneapi_get_default_memory_pool() const;
----

_Returns_: The default memory pool associated with the context for allocating
-with the allocation kind `usm::host`.
+with the allocation kind `usm::alloc::host`.

[source, c++]
----
memory_pool context::ext_oneapi_get_default_memory_pool(device dev) const;
----

_Returns_: The default memory pool associated with the context and `dev` for
-allocating with the allocation kind `usm::device`.
+allocating with the allocation kind `usm::alloc::device`.


=== Asynchronous malloc & free
@@ -590,7 +592,7 @@ asynchronous malloc and free commands which operate with the memory pools also
introduced in this extension.

All enqueue functions introduced have overloads which take a SYCL `queue` and a
-SYCL `handler`. None of enqueue functions return a SYCL `event` direction, as
+SYCL `handler`. None of enqueue functions return a SYCL `event` directly, as
this extension is in line with the
link:../experimental/sycl_ext_oneapi_enqueue_functions.asciidoc[
sycl_ext_oneapi_enqueue_functions] extension, so events are returned when
@@ -633,18 +635,16 @@ memory pool `pool` if provided, otherwise allocation from the default memory
pool associated with the SYCL context and device associated with `q` or `h`.
Memory is first allocated from the memory pool if possible, otherwise memory is
allocated from the source to the memory pool to provide enough memory in the
-memory pool for the allocation.
+memory pool for the allocation. Accessing the memory at the address of the
+pointer returned by asynchronous malloc functions before the command has
+completed execution is undefined behavior.

-_Returns_: A pointer to the address of a memory reservation.
+_Returns_: A pointer to the address of a memory reservation if `size` is
+non-zero, otherwise returns `nullptr`.

-_Throws_: An exception with the `errc::invalid` error code if `size` is zero.
-An exception with the `errc::memory_allocation` error code if the allocation
-brings the memory pool over the maximum size. This error can be thrown
-asynchronously.
-
-[_Note:_ Accessing the memory at the address of the pointer returned by
-asynchronous malloc functions before the command has completed execution is
-undefined behavior. _{endnote}_]
+_Throws_: An exception with the `errc::memory_allocation` error code if the
+allocation brings the memory pool over the maximum size. This error must be
+thrown asynchronously.

[source, c++]
----
@@ -658,14 +658,12 @@ will asynchronously free the memory allocation at the address of `ptr`. Memory
will be freed from the memory pool to be used by other asynchronous malloc
commands which execute later, and will not free to the source until the SYCL
queue associated with the asynchronous allocation command has been synchronized
-with.
+with. Accessing the memory at the address of `ptr` after the asynchronous free
+command has completed execution is undefined behavior.

_Throws_: An exception with the `errc::invalid` error code if `ptr` is not the
address of a memory allocation allocated to a memory pool.

-[_Note:_ Accessing the memory at the address of `ptr` after the asynchronous
-free command has completed execution is undefined behavior. _{endnote}_]
-

=== Memory pool lifetimes

@@ -735,14 +733,14 @@ that is not associated with the queue that the command is enqueued from, should
this result in an error?
--

-. Should we have a default memory pool for `usm::shared`?
+. Should we have a default memory pool for `usm::alloc::shared`?
+
--
*UNRESOLVED*: Currently the proposed API means that there cannot be default
-memory pool for allocations of allocation kind `usm::shared`, and therefore a
-user must create their own explicit memory pool to do so. Is this reasonable or
-should we extend the API to include a default memory pool for allocations of
-allocation kind `usm::shared`?
+memory pool for allocations of allocation kind `usm::alloc::shared`, and
+therefore a user must create their own explicit memory pool to do so. Is this
+reasonable or should we extend the API to include a default memory pool for
+allocations of allocation kind `usm::alloc::shared`?
--

. Should we allow setting a new threshold that is lower?