-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
USM node clarifications and API generalization #66
Conversation
the memory. In that case the memory is accessible after the first execution of | ||
a `command_graph`. When freed inside a graph, all memory is owned by and associated | ||
to the lifetime of one specific `command_graph<state::executable>` object. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems a bit error prone in that users could add allocation nodes but never free the memory. I assume the intention is so that a graph can write some values to a shared allocation which is visible on the host as well, and since the memory isn't freed by the graph it can be accessed after execution on the host. However at that point wouldn't it make more sense for the user to just allocate the memory themselves outside of the graph and handle both allocation and freeing? Alternatively if we support memcpy as a node of the graph they can copy out to some memory that they manage themselves on the host.
I think the wording is also confusing because if there is a free node inside the graph the lifetime of the memory is tied to the execution of the graph not the executable object itself. In practice yes we'd likely not actually do a free to reuse the memory but this wording seems to guarantee that the memory lives until the destruction of the graph itself. This could lead users to think they can access the memory as long as the graph is alive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A note that from cudaGraphAddMemAllocNode that there is similar behaviour in adding alloc nodes that don't need to be freed in the graph
If the allocation is not freed in the same graph, then it can be accessed not only by nodes in the graph which are ordered after the allocation node, but also by stream operations ordered after the graph's execution but before the allocation is freed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a fair point, I think the wording could still be improved in this case though.
node add_malloc_device(void*& data, size_t numBytes, const property_list& propList = {}); | ||
template <typename T> | ||
node | ||
add_malloc(T*& data, size_t count, sycl::usm::alloc kind, const property_list& propList = {}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we may need exception behaviour for when the device doesn't support the requested USM kind, however this may need to be in finalize as it's at that point we know what device we're targetting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm actually curious how we can return a device USM pointer here, when we don't know what device we're targetting at this point in time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The allocation won't happen before at least the executable graph object is created. That is when the context is know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That make sense, but what does it mean for the data
return parameter? Would it be set to a placeholder for that USM allocation since we haven't done the allocation yet? I think we'll need some wording to let the user know what valid usage of the value returned in data
is, e.g CUDA graphs says - "When cudaGraphAddMemAllocNode creates an allocation node, it returns the address of the allocation in nodeParams.dptr. The allocation's address remains fixed across instantiations and launches."
command_graph g;
float* foo = nullptr;
g.add_malloc<float>(foo, sycl::usm::alloc::device, n); // is foo set by this call to a placeholder?
auto node_i = g.add([&](sycl::handler& h) {
h.parallel_for(n, [=](sycl::id<1> it){
const size_t i = it[0];
foo[i] = 3.14;
});
}
auto exec_graph = g.finalize(q.get_context()); // Do the USM allocation
q.memset(foo, 42.0, sizeof(float); // does foo still point to the placeholder?
sycl::free(foo, q.get_context());
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've just realized that a context could contain multiple devices, so we won't know which one to create the device allocation for at graph finalization time.
Looking at the core SYCL sycl::malloc
& sycl::free
USM functions, they either take a sycl::context
& sycl::device
or a sycl::queue
, maybe add_malloc
and add_free
could do the same.
However, I'm now starting to think this discussion is now touching on aspects of the multi-device graph concept. i.e should we do something like just pass a context when we create the modifiable graph, and then have a parameter (queue or device) when creating the individual nodes to specify the exact device.
the memory. In that case the memory is accessible after the first execution of | ||
a `command_graph`. When freed inside a graph, all memory is owned by and associated | ||
to the lifetime of one specific `command_graph<state::executable>` object. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A note that from cudaGraphAddMemAllocNode that there is similar behaviour in adding alloc nodes that don't need to be freed in the graph
If the allocation is not freed in the same graph, then it can be accessed not only by nodes in the graph which are ordered after the allocation node, but also by stream operations ordered after the graph's execution but before the allocation is freed.
Co-authored-by: Ewan Crawford <[email protected]> Co-authored-by: Ben Tracy <[email protected]>
If the memory is not freed inside a graph, it's the user's resposiblity to release | ||
the memory. In that case the memory is accessible after the first execution of | ||
a `command_graph`. When freed inside a graph, all memory is owned by and associated | ||
to the lifetime of one specific `command_graph<state::executable>` object. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the memory is not freed inside a graph, it's the user's resposiblity to release | |
the memory. In that case the memory is accessible after the first execution of | |
a `command_graph`. When freed inside a graph, all memory is owned by and associated | |
to the lifetime of one specific `command_graph<state::executable>` object. | |
If the memory is not freed inside a graph, it's accessible outside a graph and the user | |
has the responsibility to release memory. Accessing memory before related | |
`command_graph<state::executable>` object is created or while it's being executed | |
is undefined behavior. When freed inside a graph, memory is internal to related | |
`command_graph<state::executable>` object. Access of nodes connected in the DAG before | |
related `malloc` and after related `free` node is undefined behavior. |
Memory allocation nodes have been removed from initial spec. Closed this PR. |
addressing #39