Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

USM node clarifications and API generalization #66

Closed
wants to merge 2 commits into from

Conversation

reble
Copy link
Owner

@reble reble commented Jan 9, 2023

addressing #39

@reble reble added the Graph Specification Extension Specification related label Jan 9, 2023
Comment on lines +972 to +974
the memory. In that case the memory is accessible after the first execution of
a `command_graph`. When freed inside a graph, all memory is owned by and associated
to the lifetime of one specific `command_graph<state::executable>` object.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems a bit error prone in that users could add allocation nodes but never free the memory. I assume the intention is so that a graph can write some values to a shared allocation which is visible on the host as well, and since the memory isn't freed by the graph it can be accessed after execution on the host. However at that point wouldn't it make more sense for the user to just allocate the memory themselves outside of the graph and handle both allocation and freeing? Alternatively if we support memcpy as a node of the graph they can copy out to some memory that they manage themselves on the host.

I think the wording is also confusing because if there is a free node inside the graph the lifetime of the memory is tied to the execution of the graph not the executable object itself. In practice yes we'd likely not actually do a free to reuse the memory but this wording seems to guarantee that the memory lives until the destruction of the graph itself. This could lead users to think they can access the memory as long as the graph is alive.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A note that from cudaGraphAddMemAllocNode that there is similar behaviour in adding alloc nodes that don't need to be freed in the graph

If the allocation is not freed in the same graph, then it can be accessed not only by nodes in the graph which are ordered after the allocation node, but also by stream operations ordered after the graph's execution but before the allocation is freed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a fair point, I think the wording could still be improved in this case though.

node add_malloc_device(void*& data, size_t numBytes, const property_list& propList = {});
template <typename T>
node
add_malloc(T*& data, size_t count, sycl::usm::alloc kind, const property_list& propList = {});
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we may need exception behaviour for when the device doesn't support the requested USM kind, however this may need to be in finalize as it's at that point we know what device we're targetting

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm actually curious how we can return a device USM pointer here, when we don't know what device we're targetting at this point in time.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The allocation won't happen before at least the executable graph object is created. That is when the context is know.

Copy link
Collaborator

@EwanC EwanC Jan 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That make sense, but what does it mean for the data return parameter? Would it be set to a placeholder for that USM allocation since we haven't done the allocation yet? I think we'll need some wording to let the user know what valid usage of the value returned in data is, e.g CUDA graphs says - "When cudaGraphAddMemAllocNode creates an allocation node, it returns the address of the allocation in nodeParams.dptr. The allocation's address remains fixed across instantiations and launches."

command_graph g;
float* foo = nullptr;
g.add_malloc<float>(foo, sycl::usm::alloc::device, n); // is foo set by this call to a placeholder?
auto node_i = g.add([&](sycl::handler& h) {
    h.parallel_for(n, [=](sycl::id<1> it){
      const size_t i = it[0];
      foo[i] = 3.14;
    });
}
auto exec_graph = g.finalize(q.get_context()); // Do the USM allocation
q.memset(foo, 42.0, sizeof(float);   // does foo still point to the placeholder?
sycl::free(foo, q.get_context());

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just realized that a context could contain multiple devices, so we won't know which one to create the device allocation for at graph finalization time.

Looking at the core SYCL sycl::malloc & sycl::free USM functions, they either take a sycl::context & sycl::device or a sycl::queue, maybe add_malloc and add_free could do the same.

However, I'm now starting to think this discussion is now touching on aspects of the multi-device graph concept. i.e should we do something like just pass a context when we create the modifiable graph, and then have a parameter (queue or device) when creating the individual nodes to specify the exact device.

Comment on lines +972 to +974
the memory. In that case the memory is accessible after the first execution of
a `command_graph`. When freed inside a graph, all memory is owned by and associated
to the lifetime of one specific `command_graph<state::executable>` object.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A note that from cudaGraphAddMemAllocNode that there is similar behaviour in adding alloc nodes that don't need to be freed in the graph

If the allocation is not freed in the same graph, then it can be accessed not only by nodes in the graph which are ordered after the allocation node, but also by stream operations ordered after the graph's execution but before the allocation is freed.

Co-authored-by: Ewan Crawford <[email protected]>
Co-authored-by: Ben Tracy <[email protected]>
Comment on lines +971 to +974
If the memory is not freed inside a graph, it's the user's resposiblity to release
the memory. In that case the memory is accessible after the first execution of
a `command_graph`. When freed inside a graph, all memory is owned by and associated
to the lifetime of one specific `command_graph<state::executable>` object.
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
If the memory is not freed inside a graph, it's the user's resposiblity to release
the memory. In that case the memory is accessible after the first execution of
a `command_graph`. When freed inside a graph, all memory is owned by and associated
to the lifetime of one specific `command_graph<state::executable>` object.
If the memory is not freed inside a graph, it's accessible outside a graph and the user
has the responsibility to release memory. Accessing memory before related
`command_graph<state::executable>` object is created or while it's being executed
is undefined behavior. When freed inside a graph, memory is internal to related
`command_graph<state::executable>` object. Access of nodes connected in the DAG before
related `malloc` and after related `free` node is undefined behavior.

@reble
Copy link
Owner Author

reble commented Mar 21, 2023

Memory allocation nodes have been removed from initial spec. Closed this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Graph Specification Extension Specification related
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants