Revise RAJA plugin support #1742

adayton1 · 2024-09-24T19:24:21Z

Is your feature request related to a problem? Please describe.

RAJA plugins are used by CHAI to make sure the data backing ManagedArray is in the correct memory space and that it is up to date. However, the current approach is not stream aware, which leads to suboptimal performance on GPU platforms. Where there are two memory spaces (CUDA), memory copies to the host are done on stream 0, which forces the whole device to synchronize. Where there is a single memory space (HIP), we have to synchronize the whole device to make sure the data is valid during host accesses.

Describe the solution you'd like

Making CHAI stream aware would be relatively straightforward if the camp resource used by RAJA was passed as an argument to the plugin functions. Additionally, the postLaunch function should also receive an event with a wait method that CHAI can call when it needs to be sure the kernel has been completed.
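To make the request concrete, here is a minimal sketch of what stream-aware plugin hooks could look like. Everything in it is hypothetical: `Resource`, `Event`, `StreamAwarePlugin`, and `ArrayManagerPlugin` are stand-ins invented for illustration and are not the camp, RAJA, or CHAI types. The point is only that the hooks receive the resource, and that postLaunch hands over something the plugin can wait on later instead of synchronizing the whole device.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a camp resource (NOT the real type).
struct Resource { int stream_id; };

// Hypothetical stand-in for an event with a wait method.
struct Event {
  std::function<void()> wait;  // blocks until the kernel has completed
};

// Hypothetical plugin interface: each hook receives the resource in use.
struct StreamAwarePlugin {
  virtual ~StreamAwarePlugin() = default;
  virtual void preLaunch(const Resource& res) = 0;
  virtual void postLaunch(const Resource& res, Event done) = 0;
};

// A CHAI-like plugin that keeps the event and waits only on host access.
struct ArrayManagerPlugin : StreamAwarePlugin {
  std::vector<std::string> log;
  Event last_event;

  void preLaunch(const Resource& res) override {
    // Data movement can be enqueued on the kernel's own stream.
    log.push_back("copy to device on stream " + std::to_string(res.stream_id));
  }

  void postLaunch(const Resource& res, Event done) override {
    assert(done.wait && "postLaunch expects a waitable event");
    log.push_back("kernel enqueued on stream " + std::to_string(res.stream_id));
    last_event = std::move(done);  // defer the wait until it is needed
  }

  void hostAccess() {
    if (last_event.wait) {
      last_event.wait();     // wait on this kernel only, not the device
      last_event = Event{};  // nothing pending any more
    }
    log.push_back("host access");
  }
};
```

In this sketch a second host access with no kernel in between does not wait again, which is the behavior the stream 0 copy and the device-wide synchronize cannot provide.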

Describe alternatives you've considered

Instead of modifying the plugin, RAJA could set some global state that is accessible when the plugin methods are called.

Additional context

Umpire is working on camp resource aware allocators (LLNL/Umpire#901), which CHAI will also be using.

Also, note that even if only one stream is being used in an application, this new approach will be more efficient than synchronizing across the whole device.

adayton1 · 2024-09-25T16:36:09Z

Actually, passing an event to postLaunch might need some more design work. For preCapture, we set some global state that is then used by chai::ManagedArray's copy constructor. The global state is then reset in postCapture. I'm imagining analogous preDestroy and postDestroy functions that set and reset global state and chai::ManagedArray's destructor would operate off that global state. That could get tricky to do, though.

I'm also trying to re-evaluate if I can do some kind of registry pattern where chai::ManagedArrays register themselves or some callbacks and then the postLaunch could do the work without having to set some global state.
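A rough sketch of the registry idea, under the assumption that each captured array registers a callback and postLaunch later hands the event to all of them. `CaptureRegistry` and `Event` are hypothetical names, and how an array would reach the registry from its copy constructor is left open here.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for an event (NOT the camp type).
struct Event { int id; };

// Hypothetical registry of callbacks registered by captured arrays.
class CaptureRegistry {
public:
  using Callback = std::function<void(const Event&)>;

  // Register a callback and return a key that can be used to remove it.
  std::size_t add(Callback cb) {
    assert(cb && "callback must be callable");
    std::size_t key = next_key_++;
    callbacks_.emplace(key, std::move(cb));
    return key;
  }

  // Unregister, e.g. if the array is destroyed before the launch.
  void remove(std::size_t key) { callbacks_.erase(key); }

  // Called from postLaunch: pass the event to every registered array,
  // then clear so the next kernel starts with an empty registry.
  void notify(const Event& e) {
    for (auto& entry : callbacks_) {
      entry.second(e);
    }
    callbacks_.clear();
  }

  std::size_t size() const { return callbacks_.size(); }

private:
  std::size_t next_key_ = 0;
  std::unordered_map<std::size_t, Callback> callbacks_;
};
```

Callbacks in this sketch must not add or remove entries while `notify` is running; a real design would have to decide how to handle that, and whether the registry itself counts as global state.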

adayton1 · 2024-09-25T16:47:00Z

I've been operating on the assumption that an event can be waited on more than once. Is that true?

MrBurmark · 2024-09-25T17:35:07Z

I think you can wait multiple times on an event. Note that we will also have to be careful using events as camp never frees/destroys the underlying cuda/hip event currently. I've thought about making events move-only or having a shared pointer so they can automatically clean up after themselves. I would prefer move-only as shared pointer can be expensive, but it would be a breaking change in the API. @long58

rhornung67 · 2024-10-15T16:32:28Z

Target this for next RAJA Suite release.

rhornung67 · 2024-10-15T16:35:22Z

Talk to ALED and Spheral teams about move-only vs. reference counting for event tracking.

adayton1 · 2024-10-22T16:06:33Z

We would need reference counting in events since multiple CHAI ManagedArrays could track the same event.
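As an illustration of why reference counting helps here, the following sketch has several arrays tracking one event, with the underlying handle destroyed when the last tracker lets go. `RawEvent`, `create_raw_event`, `destroy_raw_event`, `SharedEvent`, and `TrackedArray` are all hypothetical; `RawEvent` merely stands in for a CUDA/HIP event handle.

```cpp
#include <cassert>
#include <memory>

// Hypothetical low-level handle standing in for a CUDA/HIP event.
struct RawEvent { int id; };

// Counter used only to observe creation and destruction in this sketch.
inline int& live_raw_events() {
  static int count = 0;
  return count;
}

inline RawEvent* create_raw_event(int id) {
  ++live_raw_events();
  return new RawEvent{id};
}

inline void destroy_raw_event(RawEvent* e) {
  assert(e != nullptr);
  --live_raw_events();
  delete e;
}

// Hypothetical reference-counted event: copies share one raw handle,
// and the handle is destroyed when the last copy goes away.
class SharedEvent {
public:
  SharedEvent() = default;  // empty: tracks nothing
  explicit SharedEvent(int id)
      : handle_(create_raw_event(id), &destroy_raw_event) {}

  bool valid() const { return static_cast<bool>(handle_); }
  void reset() { handle_.reset(); }

private:
  std::shared_ptr<RawEvent> handle_;
};

// Stand-in for an array that remembers the last kernel that touched it.
struct TrackedArray {
  SharedEvent pending;
};
```

Two arrays captured by the same kernel would each hold a copy of the same `SharedEvent`; neither needs to know about the other for cleanup to happen exactly once.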

adayton1 · 2024-10-29T16:18:05Z

If you wanted to avoid passing an event to the postLaunch method, you could have RAJA::forall avoid returning an event and allow the application to create its own event if desired (since it passed in the resource to begin with, it can easily create one).

MrBurmark · 2024-10-29T16:52:58Z

We return something that is convertible to an event from the forall, so if you don't use it then no event is created.
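For readers unfamiliar with the pattern, a minimal sketch of a return value that only creates an event when converted to one. `Event`, `EventProxy`, `launch`, and `events_created` here are hypothetical names for illustration and do not reproduce RAJA's actual implementation.

```cpp
#include <cassert>

// Hypothetical stand-in for an event (NOT the camp type).
struct Event { int stream_id; };

// Counter used only to observe event creation in this sketch.
inline int& events_created() {
  static int count = 0;
  return count;
}

// Hypothetical proxy: remembers where the kernel ran, but records an
// event only if the caller actually converts it to Event.
class EventProxy {
public:
  explicit EventProxy(int stream_id) : stream_id_(stream_id) {}

  operator Event() const {
    ++events_created();  // the cost is paid here, on demand
    return Event{stream_id_};
  }

private:
  int stream_id_;
};

// Hypothetical launch function standing in for a forall call.
inline EventProxy launch(int stream_id) {
  assert(stream_id >= 0);
  return EventProxy{stream_id};
}
```

Calling `launch(0);` and discarding the result creates nothing, while `Event e = launch(2);` triggers the conversion and creates one event.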

adayton1 · 2024-10-29T16:56:22Z

> We return something that is convertible to an event from the forall, so if you don't use it then no event is created.

Nice! I guess we'll still have to be careful about both CHAI and the application creating separate events. Or is the overhead of creating an event pretty low?

MrBurmark · 2024-10-29T17:02:48Z

That is still a concern, we'd have to measure. I don't think many apps use events.

MrBurmark · 2024-10-29T17:07:49Z

Thinking about things like multiple ownership for events. I can imagine having events that contain a shared pointer to their contents or events that are move-only and those wanting multiple ownership semantics wrap them in a shared pointer.

MrBurmark · 2024-10-29T17:14:17Z

Right now generic resources always come with the overhead of a shared pointer. I've been lamenting that for a while. If we wanted to avoid this we could perhaps have something like shared_resource and shared_event classes and unique_resource and unique_event classes? That way you could choose as a user which semantics you want. Though we'd still have to decide what semantics the typed resources and events should have. Currently typed resources and events are basically views to a stream or event, so they can be copied and destroyed without affecting the underlying stream or event. The underlying resources (cuda/hip streams and events) basically have static storage duration, and none of them are ever freed.
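One way the move-only option could look, sketched with hypothetical names (`RawEvent`, `UniqueEvent`, `share`, `live_events`): the event owns its handle and frees it on destruction, and callers who want multiple ownership opt in by wrapping it in a `std::shared_ptr`, so only they pay for the reference count.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical low-level handle standing in for a CUDA/HIP event.
struct RawEvent { int id; };

// Counter used only to observe creation and destruction in this sketch.
inline int& live_events() {
  static int count = 0;
  return count;
}

// Hypothetical move-only event: exactly one owner, freed on destruction.
class UniqueEvent {
public:
  explicit UniqueEvent(int id) : raw_(new RawEvent{id}) { ++live_events(); }
  ~UniqueEvent() { release(); }

  UniqueEvent(const UniqueEvent&) = delete;
  UniqueEvent& operator=(const UniqueEvent&) = delete;

  UniqueEvent(UniqueEvent&& other) noexcept
      : raw_(std::exchange(other.raw_, nullptr)) {}

  UniqueEvent& operator=(UniqueEvent&& other) noexcept {
    if (this != &other) {
      release();
      raw_ = std::exchange(other.raw_, nullptr);
    }
    return *this;
  }

  bool valid() const { return raw_ != nullptr; }

private:
  void release() {
    if (raw_ != nullptr) {
      delete raw_;
      raw_ = nullptr;
      --live_events();
    }
  }

  RawEvent* raw_;
};

// Opt-in multiple ownership: wrap the move-only event in a shared_ptr.
inline std::shared_ptr<UniqueEvent> share(UniqueEvent e) {
  assert(e.valid());
  return std::make_shared<UniqueEvent>(std::move(e));
}
```

This leaves open what the typed, view-like resources and events should do; the sketch only covers the owning case.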

adayton1 · 2024-10-29T17:19:50Z

We'll definitely need events to be cleaned up - in a single run I could easily see tens of thousands of them being created. I'm less worried about streams at the moment, though that's still good to consider.

adayton1 · 2024-10-29T17:32:21Z

Hmm, I'm not sure how to handle typed resources/events. I don't love the idea of a typed resource destroying a stream when it goes out of scope. Or for that matter, a typed event being waited on when the typed event goes out of scope. Though, if it didn't wait on the event, just released it (is that possible to do without waiting on it?), then that seems reasonable.

MrBurmark · 2024-10-31T15:08:58Z

I don't think you have to wait on events to free them in cuda/hip.

rhornung67 added the API/usability, Enhancement, and reviewed labels Oct 15, 2024

rhornung67 assigned johnbowen42 Oct 29, 2024

rhornung67 added this to the FY25 Development milestone Nov 20, 2024
