The smoothed particle hydrodynamics (SPH) technique is a purely Lagrangian method. SPH discretizes a fluid in a series of interpolation points (SPH particles) whose distribution follows the mass density of the fluid and their evolution relies on a weighted interpolation over close neighboring particles.
The parallelization of SPH codes is not trivial due to their boundless nature and the absence of a structured grid.
SPH-EXA is a C++20 simulation code for hydrodynamics simulations (with gravity and other physics), parallelized with MPI, OpenMP, CUDA, and HIP.
SPH-EXA is built with high performance, scalability, portability, and resilience in mind. Its SPH implementation is based on SPHYNX, ChaNGa, and SPH-flow, three SPH codes selected in the PASC SPH-EXA project to act as parent and reference codes to SPH-EXA.
The performance of standard codes is negatively impacted by factors such as imbalanced multi-scale physics, individual time-stepping, halos exchange, and long-range forces. Therefore, the goal is to extrapolate common basic SPH features, and consolidate them in a fully optimized, Exascale-ready, MPI+X, SPH code: SPH-EXA.
Check our wiki for more details
SPH-EXA
├── README.md
├── docs
├── domain - Cornerstone library: octree building and domain decomposition
│ ├── include
│ │ └── cstone
│ │ ├── CMakeLists.txt
│ │ ├── cuda
│ │ ├── domain
│ │ ├── findneighbors.hpp
│ │ ├── halos
│ │ ├── primitives
│ │ ├── sfc
│ │ ├── tree
│ │ └── util
│ └── test - Cornerstone unit- performance-
│ ├── integration_mpi and integration tests
│ ├── performance
│ ├── unit
│ └── unit_cuda
├── ryoanji - Ryoanji: N-body solver for gravity
│ ├─── src
│ └─── test - demonstrator app and unit tests
│
├── sph - SPH implementation
│ ├─── include
│ │ └── sph
│ └─── test - SPH kernel unit tests
│
└── main/src
├── init - initial conditions for test cases
├── io - file output functionality
└── sphexa - SPH main application front-end
The C++ (.cpp) part of the code requires a C++20 compiler, at least GCC 10, clang 12 or cray-clang 14. For CUDA (.cu), the minimum supported CUDA version is CUDA 11.2 with a C++17 host compiler, e.g. GCC 9.3.0.
Note that GCC 10.3.0 does not work as CUDA host compiler due to known compiler bugs. For ease of use, the recommended minimum version of CUDA is 11.4.1 which supports GCC 11, providing both the required C++20 support and bug-free CUDA host compilation. [NOTE: CUDA/11.3.1 seems to have solved the compatibility issues with GCC 10.3.0]
Minimal CMake configuration:
mkdir build
cd build
cmake <GIT_SOURCE_DIR>
Compilation at sciCORE (UniBas):
ml HDF5/1.10.7-gompi-2021a
ml CMake/3.23.1-GCCcore-10.3.0
ml CUDA/11.3.1
mkdir build
cd build
cmake <GIT_SOURCE_DIR>
CMake configuration on Piz Daint for clang: Cray-clang 14 for CPU code (.cpp), CUDA 11.6 + GCC 11.2.0 for GPU code (.cu):
module load daint-gpu
module load CMake/3.22.1
module load PrgEnv-cray
module load cdt/22.05 # will load cce/14.0.0
module load nvhpc-nompi/22.2 # will load nvcc/11.6
module load gcc/11.2.0
module load cray-hdf5-parallel
mkdir build
cd build
# C-compiler is needed for hdf5 detection
CC=cc CXX=CC cmake -DCMAKE_CUDA_ARCHITECTURES=60 -S <GIT_SOURCE_DIR>
Module and CMake configuration on LUMI
module load CrayEnv buildtools/22.08 craype-accel-amd-gfx90a rocm cray-hdf5-parallel
cd <GIT_SOURCE_DIR>; hipify-perl -inplace `find -name *.cu -o -name *.cuh` && find -name *.prehip -delete
cmake -DCMAKE_CXX_COMPILER=CC -DCMAKE_HIP_ARCHITECTURES=gfx90a -DCMAKE_HIP_COMPILER=CC -DCMAKE_HIP_COMPILER_FORCED=ON -DGPU_DIRECT=<ON/OFF> -S <GIT_SOURCE_DIR>
Build everything: make -j
The main sphexa
(and sphexa-cuda
, if GPUs are available) application can either start a simulation by reading initial conditions
from a file or generate an initial configuration for a named test case.
Self-gravity will be activated automatically based on named test-case choice or if the HDF5 initial
configuration file has an HDF5 attribute with a non-zero value for the gravitational constant.
Arguments:
--init CASE/FILE
: use the case name as seen below or provide an HDF5 file with initial conditions-n NUM
: Run the simulation with NUM^3 (NUM to the cube) number of particles (for named test cases). [NOTE: This might vary with the test case]-s NUM
: Run the simulation with NUM of iterations (time-steps) if NUM is integer. Run until the specified physical time if NUM is real.-w NUM
: Dump particle data every NUM iterations (time-steps) if NUM is integer. Dump data at the specified physical time if NUM is real.-f FIELDS
: Comma separated list of particle fields for file output dumps. See a list of common ouput fields below.--quiet
: Don't print any output to stdout
Implemented cases:
--sedov
: spherical blast wave--noh
: spherical implosion--evrard
: gravitational collapse of a isothermal cloud--turbulence
: subsonic turbulence in a box--kelvin-helmholtz
: devlopment of the subsonic Kelvin-Helmholtz instability in a thin slice
Common output fields:
x, y, z
: positionvx, vy, vz
: velocityh
: smoothing lengthrho
: densityc
: speed of soundp
: pressuretemp
: temperatureu
: internal energync
: number of neighborsdivv
: Module of the divergence of the velocity fieldcurlv
: Module of the curl of the velocity field
Example usage:
OMP_NUM_THREADS=4 ./sphexa --init sedov -n 100 -s 1000 -w 10 -f x,y,z,rho,p
Runs Sedov with 100^3 particles for 1000 iterations (time-steps) with 4 OpenMP threads and dumps particle xyz-coordinates, density and pressure data every 10 iterationsOMP_NUM_THREADS=4 ./sphexa-cuda --init -n 100 -s 1000 -w 10 -f x,y,z,rho,p
Runs Sedov with 100^3 particles for 1000 iterations (time-steps) with 4 OpenMP threads. Uses the GPU for most of the compute work.OMP_NUM_THREADS=4 mpiexec -np 2 ./sphexa --init noh -n 100 -s 1000 -w 10
Runs Noh with 100^3 particles for 1000 iterations (time-steps) with 2 MPI ranks of 4 OpenMP threads each. Works when using MPICH. For OpenMPI, usempirun
instead.OMP_NUM_THREADS=12 srun -Cgpu -A<your account> -n<nnodes> -c12 ./sphexa-cuda --init sedov -n 100 -s 1000 -w 10
Optimal runtime configuration on Piz Daint fornnodes
GPU compute nodes. Launches 1 MPI rank with 12 OpenMP threads per node../sphexa-cuda --init evrard.h5 -s 2000 -w 100 -f x,y,z,rho,p,vx,vy,vz
Run SPH-EXA, initializing particle data from an input file (e.g. for the Evrard collapse). Includes gravitational forces between particles. The angle dependent accuracy parameter theta can be specificed with--theta <value>
, the default is0.5
.
If output to file is enabled and if the -f
option is not provided, sphexa will output all conserved particle
fields which allows restoring the simulation to the exact state at the time of writing the output.
This includes the following fields x_m1, y_m1, z_m1, du_m1
.
In order to save diskspace, sphexa can be instructed to omit these fields by setting the -f option
, e.g.
-f x,y,z,m,h,temp,alpha,vx,vy,vz
. If one wants to restart the simulation from an output file containing
these fields, it is necessary to add the _m1
. We provide an example script that can be used to achieve this:
./scripts/add_m1.py <hdf5-output-file>
Cornerstone octree and domain unit tests:
./domain/test/unit/component_units
GPU-enabled unit tests:
./domain/test/unit_cuda/component_units_cuda
MPI-enabled integration and regression tests:
mpiexec -np 2 ./domain/test/integration_mpi/domain_2ranks
mpiexec -np 2 ./domain/test/integration_mpi/exchange_focus
mpiexec -np 2 ./domain/test/integration_mpi/exchange_halos
mpiexec -np 2 ./domain/test/integration_mpi/globaloctree
mpiexec -np 5 ./domain/test/integration_mpi/domain_nranks
mpiexec -np 5 ./domain/test/integration_mpi/exchange_domain
mpiexec -np 5 ./domain/test/integration_mpi/exchange_keys
mpiexec -np 5 ./domain/test/integration_mpi/focus_tree
mpiexec -np 5 ./domain/test/integration_mpi/treedomain
SPH-kernel unit tests:
./include/sph/test/kernel/kernel_tests
Some tests require input data. For example, the Evrard test case will check that a Test3DEvrardRel.bin file exists and can be read at the beginning of the job. This file can be downloaded from zenodo.org.
Ryoanji is a high-performance GPU N-body solver for gravity. It relies on the cornerstone octree framework for tree construction, EXAFMM multipole kernels, and a warp-aware tree-traversal inspired by the Bonsai GPU tree-code.
./test/ryoanji <log2(numParticles)> <computeDirectReference:yes=1,no=0>
./test/ryoanji_unit_tests
- Ruben Cabezon
- Aurelien Cavelan
- Florina Ciorba
- Jean M. Favre
- Michal Grabarczyk
- Danilo Guerrera
- David Imbert
- Sebastian Keller
- Lucio Mayer
- Ali Mohammed
- Jg Piccinali
- Tom Quinn
- Darren Reed
- Osman Seckin Simsek
This project is licensed under the MIT License - see the LICENSE file for details