USM node clarifications and API generalization #66

reble · 2023-01-09T22:32:32Z

addressing #39

sycl/doc/extensions/proposed/sycl_ext_oneapi_graph.asciidoc

Bensuo · 2023-01-10T13:03:40Z

sycl/doc/extensions/proposed/sycl_ext_oneapi_graph.asciidoc

+the memory. In that case the memory is accessible after the first execution of
+a `command_graph`. When freed inside a graph, all memory is owned by and associated
+to the lifetime of one specific `command_graph<state::executable>` object.


This seems a bit error prone in that users could add allocation nodes but never free the memory. I assume the intention is so that a graph can write some values to a shared allocation which is visible on the host as well, and since the memory isn't freed by the graph it can be accessed after execution on the host. However at that point wouldn't it make more sense for the user to just allocate the memory themselves outside of the graph and handle both allocation and freeing? Alternatively if we support memcpy as a node of the graph they can copy out to some memory that they manage themselves on the host.

I think the wording is also confusing because if there is a free node inside the graph the lifetime of the memory is tied to the execution of the graph not the executable object itself. In practice yes we'd likely not actually do a free to reuse the memory but this wording seems to guarantee that the memory lives until the destruction of the graph itself. This could lead users to think they can access the memory as long as the graph is alive.

A note that from cudaGraphAddMemAllocNode that there is similar behaviour in adding alloc nodes that don't need to be freed in the graph

If the allocation is not freed in the same graph, then it can be accessed not only by nodes in the graph which are ordered after the allocation node, but also by stream operations ordered after the graph's execution but before the allocation is freed.

That's a fair point, I think the wording could still be improved in this case though.

sycl/doc/extensions/proposed/sycl_ext_oneapi_graph.asciidoc

EwanC · 2023-01-10T14:20:16Z

sycl/doc/extensions/proposed/sycl_ext_oneapi_graph.asciidoc

-node add_malloc_device(void*& data, size_t numBytes, const property_list& propList = {});
+template <typename T>
+node 
+add_malloc(T*& data, size_t count, sycl::usm::alloc kind, const property_list& propList = {});


I think we may need exception behaviour for when the device doesn't support the requested USM kind, however this may need to be in finalize as it's at that point we know what device we're targetting

I'm actually curious how we can return a device USM pointer here, when we don't know what device we're targetting at this point in time.

The allocation won't happen before at least the executable graph object is created. That is when the context is know.

That make sense, but what does it mean for the data return parameter? Would it be set to a placeholder for that USM allocation since we haven't done the allocation yet? I think we'll need some wording to let the user know what valid usage of the value returned in data is, e.g CUDA graphs says - "When cudaGraphAddMemAllocNode creates an allocation node, it returns the address of the allocation in nodeParams.dptr. The allocation's address remains fixed across instantiations and launches."

command_graph g; float* foo = nullptr; g.add_malloc<float>(foo, sycl::usm::alloc::device, n); // is foo set by this call to a placeholder? auto node_i = g.add([&](sycl::handler& h) { h.parallel_for(n, [=](sycl::id<1> it){ const size_t i = it[0]; foo[i] = 3.14; }); } auto exec_graph = g.finalize(q.get_context()); // Do the USM allocation q.memset(foo, 42.0, sizeof(float); // does foo still point to the placeholder? sycl::free(foo, q.get_context());

I've just realized that a context could contain multiple devices, so we won't know which one to create the device allocation for at graph finalization time.

Looking at the core SYCL sycl::malloc & sycl::free USM functions, they either take a sycl::context & sycl::device or a sycl::queue, maybe add_malloc and add_free could do the same.

However, I'm now starting to think this discussion is now touching on aspects of the multi-device graph concept. i.e should we do something like just pass a context when we create the modifiable graph, and then have a parameter (queue or device) when creating the individual nodes to specify the exact device.

EwanC · 2023-01-10T14:27:44Z

sycl/doc/extensions/proposed/sycl_ext_oneapi_graph.asciidoc

+the memory. In that case the memory is accessible after the first execution of
+a `command_graph`. When freed inside a graph, all memory is owned by and associated
+to the lifetime of one specific `command_graph<state::executable>` object.


A note that from cudaGraphAddMemAllocNode that there is similar behaviour in adding alloc nodes that don't need to be freed in the graph

If the allocation is not freed in the same graph, then it can be accessed not only by nodes in the graph which are ordered after the allocation node, but also by stream operations ordered after the graph's execution but before the allocation is freed.

sycl/doc/extensions/proposed/sycl_ext_oneapi_graph.asciidoc

Co-authored-by: Ewan Crawford <[email protected]> Co-authored-by: Ben Tracy <[email protected]>

reble · 2023-01-18T14:28:34Z

sycl/doc/extensions/proposed/sycl_ext_oneapi_graph.asciidoc

+If the memory is not freed inside a graph, it's the user's resposiblity to release 
+the memory. In that case the memory is accessible after the first execution of
+a `command_graph`. When freed inside a graph, all memory is owned by and associated
+to the lifetime of one specific `command_graph<state::executable>` object.


Suggested change

If the memory is not freed inside a graph, it's the user's resposiblity to release

the memory. In that case the memory is accessible after the first execution of

a `command_graph`. When freed inside a graph, all memory is owned by and associated

to the lifetime of one specific `command_graph<state::executable>` object.

If the memory is not freed inside a graph, it's accessible outside a graph and the user

has the responsibility to release memory. Accessing memory before related

`command_graph<state::executable>` object is created or while it's being executed

is undefined behavior. When freed inside a graph, memory is internal to related

`command_graph<state::executable>` object. Access of nodes connected in the DAG before

related `malloc` and after related `free` node is undefined behavior.

reble · 2023-03-21T19:39:37Z

Memory allocation nodes have been removed from initial spec. Closed this PR.

some clarifications and malloc API generalization

a8bb2cd

reble added the Graph Specification Extension Specification related label Jan 9, 2023

Bensuo reviewed Jan 10, 2023

View reviewed changes

EwanC reviewed Jan 10, 2023

View reviewed changes

Apply suggestions from code review

6a2f2f6

Co-authored-by: Ewan Crawford <[email protected]> Co-authored-by: Ben Tracy <[email protected]>

reble commented Jan 18, 2023

View reviewed changes

EwanC mentioned this pull request Feb 20, 2023

[SYCL] Remove memory allocation/free nodes #79

Merged

reble closed this Mar 21, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

USM node clarifications and API generalization #66

USM node clarifications and API generalization #66

reble commented Jan 9, 2023

Bensuo Jan 10, 2023

EwanC Jan 10, 2023

Bensuo Jan 10, 2023

EwanC Jan 10, 2023

EwanC Jan 10, 2023

reble Jan 11, 2023

EwanC Jan 11, 2023 •

edited

Loading

EwanC Jan 13, 2023

EwanC Jan 10, 2023

reble Jan 18, 2023

reble commented Mar 21, 2023

-If the memory is not freed inside a graph, it's the user's resposiblity to release
-the memory. In that case the memory is accessible after the first execution of
-a `command_graph`. When freed inside a graph, all memory is owned by and associated
-to the lifetime of one specific `command_graph<state::executable>` object.
+If the memory is not freed inside a graph, it's accessible outside a graph and the user
+has the responsibility to release memory. Accessing memory before related
+`command_graph<state::executable>` object is created or while it's being executed
+is undefined behavior. When freed inside a graph, memory is internal to related
+`command_graph<state::executable>` object. Access of nodes connected in the DAG before
+related `malloc` and after related `free` node is undefined behavior.

USM node clarifications and API generalization #66

USM node clarifications and API generalization #66

Conversation

reble commented Jan 9, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

EwanC Jan 11, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

reble commented Mar 21, 2023

EwanC Jan 11, 2023 •

edited

Loading