Several years ago, we considered (see #266) adding a variant of GPU-STUMP that utilized cooperative groups, which would allow us to push the multiple kernel launches onto the device. Earlier work was concerned about:

- Breaking backwards compatibility
- Adding unnecessary complexity to the code

However, cudatoolkit support is much better now, and older GPUs that lack cooperative group support are likely end-of-life (so the above concerns are likely a thing of the past). Additionally, numba has moved ahead many, many versions since our last attempt. Thus, we should reconsider adding this to STUMPY. PR #266 provides some clear code for how to proceed and demonstrated a 12% speedup, which is great! See also the numba docs on cooperative groups.
@joehiggi1758 Do you have access to an NVIDIA GPU for testing? Otherwise, it might be very painful to assess the performance of any code changes. If you do, then please proceed and let me know if you have any questions. We could also reach out to our collaborator at NVIDIA for help (I'm sure there are new features that we may be able to leverage).
Alternatively, you may be interested in this new issue #1031 and in attempting to reproduce that work. It has less baggage than this current issue.
Unfortunately, I don't have access to a GPU, other than maybe a free subscription to Azure. I think starting with the NVIDIA contact is a better plan of attack! If I can help there in any way, let me know; I'd love to learn more about GPUs!
I'll focus on #1031 for now; you're right that it looks like a better next issue for me!