Writing the coordination with openACC #1075
Comments
Regarding the vector and tensor with generic type, I tried to do the same a few years ago, and I remember that with the Intel compiler performance was measurably affected (to my surprise). Maybe you can double-check this. If it's still true, maybe we can duplicate the code. Otherwise I am also happy with a more general version; it would be useful in other parts of the code as well.
If it's limited to this, maybe we can adjust the code. It would be ideal if we could also install nvc++ on one job in GitHub Actions to test for this.
Ok, so I set up the PR as a wip, then I will produce some benchmarks
I'm trying to do it in #1076
As I did with CUDA (#1028) and tried to do with ArrayFire (#1049) and PyTorch, I tried to rewrite the COORDINATION CV with OpenACC as the accelerator.
Here's the result, using the new benchmark tool
It is slower than CUDA, but writing in OpenACC may feel more familiar, because it looks like OpenMP and because you can leave the compiler to work out how to parallelize the loops, so you do not have to use the <<<>>> syntax to launch kernels as in CUDA. It is also far more flexible than the tensor libraries.
On the compilation I have some mixed feelings, as you can read in the spoiler below.
Details about compilation and script used
I ran everything on my workstation (NVIDIA T1000 8GB + AMD Ryzen 5 PRO 5650G).
I used nvhpc24.3, downloaded precompiled from the NVIDIA site.
The environment used is actually slightly complex:
I compiled plumed from master with plain gcc+mpi
Then I compiled the plugin with my wild Makefile that uses nvc++ for the accelerated part and g++ for the main body of the CV.
Then I ran the benchmark without nvhpc in the environment, because it conflicts with the mpi that I used with plumed:
(I still have to try to run everything compiled with plain nvhpc, but since nvhpc does not like the auto keyword for deducing return types (as used in tools/MergeVectorTools.h:54), it needs some massaging of the plumed source, and I did not want to touch src for this project.)
If you look at the code, I also added a few extra headers:
LoopUnroller.h, Tensor.h and Vector.h, which are variants of the original headers with the possibility of declaring Tensors and Vectors of any type, plus some splashes of refactoring to C++17 where I did not manage to convince nvc++ to deduce the template arguments as I wanted
Tools_pow.h, which templatizes the type in the runtime version of fastpow
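As a rough sketch of what "of any type" means for these headers (this is not the content of the actual files): a fixed-size vector templated on its element type, and a runtime integer-power function templated on the base type. The names `VectorTyped` and `Vector3f` are hypothetical; `modulo2` and `fastpow` mirror the PLUMED naming, but the bodies here are assumptions written for illustration.

```cpp
#include <array>
#include <cstddef>

// Hypothetical type-generic fixed-size vector (element type T, dimension N).
template <typename T, std::size_t N>
struct VectorTyped {
  std::array<T, N> d;

  T& operator[](std::size_t i) { return d[i]; }
  const T& operator[](std::size_t i) const { return d[i]; }

  // Squared norm of the vector.
  T modulo2() const {
    T r = T(0);
    for (std::size_t i = 0; i < N; ++i) r += d[i] * d[i];
    return r;
  }
};

// Component-wise difference a - b.
template <typename T, std::size_t N>
VectorTyped<T, N> operator-(VectorTyped<T, N> a, const VectorTyped<T, N>& b) {
  for (std::size_t i = 0; i < N; ++i) a[i] -= b[i];
  return a;
}

// Single-precision 3D vector, e.g. for GPUs where float is much faster.
using Vector3f = VectorTyped<float, 3>;

// Integer power with a runtime exponent (exponentiation by squaring),
// templated on the type of the base. Assumed behaviour: negative
// exponents invert the base first.
template <typename T>
T fastpow(T base, int exp) {
  if (exp < 0) {
    exp = -exp;
    base = T(1) / base;
  }
  T result = T(1);
  while (exp) {
    if (exp & 1) result *= base;
    exp >>= 1;
    base *= base;
  }
  return result;
}
```

With something like this, the same CV code can be instantiated for `float` on the accelerator and `double` on the host, which is also where the performance concern raised in the comments would need to be benchmarked.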
These modifications are a prerequisite for using OpenACC but are completely independent of it, so, if you are OK with this, I would like to open a PR with a patch to the original .h files.