
How to build and run PeleC using GPU? #769

Open
EarlFan opened this issue Mar 17, 2024 · 17 comments

@EarlFan commented Mar 17, 2024

Dear all,

Hi!

I want to build and run PeleC on GPUs; however, I am not able to find any tutorial covering GPU installation or the CUDA environment. Can anyone point me to one? Any help will be appreciated!

Thanks!

Regards,
Fan E

@baperry2 (Contributor) commented Mar 18, 2024

It's hard to provide detailed instructions for GPU use as the details vary from system to system. But if you want to run on a system with Nvidia GPUs using CUDA and your system is set up properly, all you should need to do is compile as normal, but with USE_CUDA = TRUE (and USE_MPI = TRUE, assuming you also want MPI support) in your GNUmakefile. I'd recommend trying this for the PMF case using the pmf-lidryer-cvode.inp input file.
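
For concreteness, a minimal GNUmakefile setup along those lines might look like the sketch below; only USE_CUDA and USE_MPI come from this discussion, while COMP = gnu and the comments are illustrative assumptions that may differ on your system.

# GNUmakefile sketch for a CUDA + MPI build (illustrative; other options left at their defaults)
COMP     = gnu
USE_MPI  = TRUE
USE_CUDA = TRUE

After that, a plain make -j should produce a GPU-enabled executable.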

When running on GPUs, certain simulation input parameters may benefit from being re-optimized for performance. In particular, you may want larger values for amr.blocking_factor and amr.max_grid_size, and you may want to look at different options for cvode.solve_type. Every problem is different so it's usually good to do a little experimentation.
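
As a sketch of the kind of re-tuning meant here, an inputs fragment might look like the following; the specific values and the GMRES choice are illustrative assumptions to experiment with, not recommendations from this thread.

# illustrative GPU-oriented starting points; tune per problem
amr.blocking_factor = 32      # larger boxes help keep the GPU busy
amr.max_grid_size   = 128     # fewer, larger grids per rank
cvode.solve_type    = GMRES   # one of several chemistry linear-solver options worth comparing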

@jrood-nrel (Contributor)

It's useful to know that some sites require site-specific build settings; AMReX ships configurations for its supported sites here: https://github.com/AMReX-Codes/amrex/tree/development/Tools/GNUMake/sites . The machine query logic is here: https://github.com/AMReX-Codes/amrex/blob/development/Tools/GNUMake/Make.machines .
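
If you want to see what AMReX detected for your machine, the GNU Make system has a print-<variable> helper (if I remember correctly); which_site and which_computer are the variable names used in Make.machines, though they may change between AMReX versions.

make print-which_site
make print-which_computer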

@EarlFan (Author) commented Mar 25, 2024

Dear all,

Thank you for your assistance!

I tried to compile PeleC with nvcc on WSL but encountered some challenges, particularly with the SUNDIALS package. Currently, I can run PeleC on CPUs without issues, but I am eager to explore the capabilities of GPU acceleration.

If it is OK, I would like to keep this issue open to share my future experiences regarding the use of PeleC with GPU computing.

Regards,
Fan E

@baperry2 (Contributor)

Yeah that's fine to leave this issue open and add more detail on any issues you have running on GPUs, which we can then try to address.

@SRkumar97

Hello! I have a question. When I first tested the code in CPU parallel mode by running the basic PMF test case, I had not set MPI=TRUE in the example.inp file, but the mpirun -np command still worked to run the PeleC executable. Did I miss anything?

@jrood-nrel (Contributor)

mpirun will launch any application as multiple instances; for example, try mpirun -np 8 echo "hello".

Without MPI enabled in PeleC, mpirun will still launch np instances of the executable, but they won't communicate to solve a single problem; each instance just runs the same problem independently, with no benefit from concurrency.
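
As a minimal illustration of that point (command and output sketched here, not copied from an actual run):

mpirun -np 4 echo "hello"
hello
hello
hello
hello

Each of the four instances prints its own "hello"; none of them coordinates with the others, which is exactly what happens when a non-MPI PeleC executable is launched this way.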

@baperry2 (Contributor)

Note that when you compile for MPI, you should have USE_MPI = TRUE in your GNUmakefile, and MPI should appear in the name of the PeleC executable that gets generated. No changes are needed in the input files to run with MPI. But if the executable doesn't have MPI in the name, you generated a serial executable and it will run independent instances as mentioned by @jrood-nrel.
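
As a rough guide to the naming (the exact pattern depends on dimension, compiler, and profiling options, so treat these as assumed examples rather than exact names):

PeleC3d.gnu.ex           # serial build
PeleC3d.gnu.MPI.ex       # built with USE_MPI = TRUE
PeleC3d.gnu.MPI.CUDA.ex  # built with USE_MPI = TRUE and USE_CUDA = TRUE

A quick ls *.ex in the case directory shows which variant was actually built.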

@SRkumar97

Thanks for your clarifications on this, @jrood-nrel @baperry2!

@RSuryaNarayan

I am trying to get Pele to work with GPUs on Kestrel, and any instructions on the relevant modules to load would be greatly appreciated. So far I've tried using PrgEnv-nvhpc and PrgEnv-nvidia along with openmpi, but I keep getting the following error after I compile TPL:

/scratch/ramac106/PeleC/Submodules/PelePhysics/Submodules/amrex/Src/Base/AMReX_ccse-mpi.H:14:10: fatal error: mpi.h: No such file or directory
 #include <mpi.h>
          ^~~~~~~
compilation terminated.

@baperry2 (Contributor) commented Aug 21, 2024

For Kestrel GPUs, you can use the modules specified here, which should also work for PeleC: https://erf.readthedocs.io/en/latest/GettingStarted.html#kestrel-nrel

Let us know if there are any issues; it's been a while since I tested PeleC on Kestrel GPUs, and they've been periodically reshuffling the modules as they bring the GPUs online.

@RSuryaNarayan

Thank you @baperry2, this is really helpful. Will let you know how it goes.

@RSuryaNarayan

I followed the steps outlined on ERF's website (with the latest branches of PeleC and its submodules), but I run into the following error:

In file included from /scratch/ramac106/PeleC_latest/PeleC/Submodules/PelePhysics/Submodules/amrex/Src/Extern/SUNDIALS/AMReX_SUNMemory.cpp:1:
/scratch/ramac106/PeleC_latest/PeleC/Submodules/PelePhysics/Submodules/amrex/Src/Extern/SUNDIALS/AMReX_Sundials_Core.H:7:10: fatal error: sundials/sundials_config.h: No such file or directory
    7 | #include <sundials/sundials_config.h>
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.

I did try re-making TPL after loading the modules suggested, but still get this error...

@baperry2 (Contributor)

Make sure you've done git submodule update --recursive before make TPLrealclean && make TPL, and double check that the SUNDIALS commit you are using is 2abd63bd6.

However, there may be another issue here: when I try it, the build gets past the step you are seeing but fails to produce the executable at the link stage.
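
Concretely, the check and rebuild would look something like the commands below; the SUNDIALS submodule path is an assumption based on the paths in the error messages above, so adjust it if your layout differs.

cd PeleC
git submodule update --recursive            # add --init if the submodules were never initialized
git -C Submodules/PelePhysics/Submodules/sundials rev-parse --short HEAD   # expect 2abd63bd6
cd Exec/RegTests/PMF
make TPLrealclean && make TPL USE_MPI=TRUE USE_CUDA=TRUE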

@RSuryaNarayan commented Aug 23, 2024

The following procedure appears to work but fails to produce an executable at the end (i.e., it goes all the way up to AMReX_BuildInfo, but the linking seems to be the issue for some reason):

  1. load all the modules here: https://erf.readthedocs.io/en/latest/GettingStarted.html#kestrel-nrel:~:text=For%20compiling%20and%20running%20on%20GPUs%2C%20the%20following%20commands%20can%20be%20used%20to%20set%20up%20your%20environment%3A
  2. make TPLrealclean; make TPL USE_CUDA=TRUE
  3. make realclean; make -j COMP=gnu USE_CUDA=TRUE

This is with MPI+CUDA, i.e. USE_MPI=TRUE and USE_CUDA=TRUE. COMP=nvhpc results in SUNDIALS issues again...

@baperry2 (Contributor)

As I mentioned, the setup of the GPU partition of Kestrel has been frustratingly unstable. It appears they have again changed things in a way that makes the prior instructions no longer functional.

You should be able to use the following module setup:

module purge;
module load PrgEnv-gnu/8.5.0;
module load cuda/12.3;
module load craype-x86-milan;

And then compile with:

make TPLrealclean; make TPL COMP=gnu USE_CUDA=TRUE USE_MPI=TRUE
make realclean; make -j COMP=gnu USE_CUDA=TRUE USE_MPI=TRUE
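
Once that build succeeds, a GPU run on Kestrel would typically go through Slurm; the executable name, GPU count, and flags below are illustrative assumptions, not instructions from this thread.

srun -N 1 -n 4 --gpus-per-node=4 ./PeleC3d.gnu.MPI.CUDA.ex pmf-lidryer-cvode.inp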

@jrood-nrel (Contributor)

I happened to be looking at this as well and I used this:

git clone --recursive git@github.com:AMReX-Combustion/PeleC.git && cd PeleC/Exec/RegTests/PMF && \
module purge && module load PrgEnv-gnu/8.5.0 && module load craype-x86-trento && module load cray-libsci && \
module load cmake && module load cuda && module load cray-mpich/8.1.28 && \
make realclean && \
nice make USE_MPI=TRUE USE_CUDA=TRUE COMP=gnu -j24 TPLrealclean && \
nice make USE_MPI=TRUE USE_CUDA=TRUE COMP=gnu -j24 TPL && \
nice make USE_MPI=TRUE USE_CUDA=TRUE COMP=gnu -j24

@RSuryaNarayan commented Aug 27, 2024

Thanks a lot @jrood-nrel @baperry2, I am able to get a linked executable for the PMF case with CUDA. My specific case still faces the issue, though; I guess it's something to do with the way the PMF functions and data structures have been defined, and I will align everything with the way it's done in the present case folder.
