Home
Author: Matt Norman (Oak Ridge National Laboratory) - mrnorman.github.io
Contributors:
- Matt Norman (Oak Ridge National Laboratory)
- Isaac Lyngaas (Oak Ridge National Laboratory)
- Abhishek Bagusetty (Argonne National Laboratory)
- Mark Berrill (Oak Ridge National Laboratory)
YAKL (like Kokkos and RAJA) is a portable C++ library that lets developers conveniently port code to different hardware backends such as CUDA, HIP, and SYCL from a single source. YAKL, Kokkos, and RAJA are all just C++ libraries, and code written with them is pure C++ without any language extensions. For more information about portable C++ libraries, particularly from the perspective of using directives, please read this article.
The YAKL API is similar to Kokkos in many ways, but it is considerably simplified, and its arrays and parallel loops behave much more like Fortran. YAKL currently has backends for:
- CPUs (serial)
- CPU OpenMP threading
- CUDA
- HIP
- SYCL
- OpenMP offload (in progress)
YAKL provides the following features:
- Multi-dimensional dynamically allocated arrays in Fortran and C styles
- Multi-dimensional statically defined arrays in Fortran and C styles
- Kernel launchers to launch code in parallel over threads on different hardware backends
- Various methods of transferring data between host and device memory spaces
- Basic atomic operations (add, min, and max) using hardware atomics when available
- Efficient reductions via convenient syntax patterned after Fortran's sum(), minval(), and maxval() using vendor libraries
- Synchronization via a fence() function
- Pool allocator that is automatically turned on for all device allocations in separate memory address spaces
- Fortran bindings for YAKL allocators and YAKL init and finalize
- Limited Fortran intrinsics library
- Classes to handle scalars that need to be read after being written to in a parallel kernel
- NetCDF and Parallel NetCDF I/O routines using YAKL's multi-dimensional Arrays
- Automated timers for YAKL's parallel_for calls using the General Purpose Timing Library (GPTL)
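To make the "Fortran and C styles" of the array items above more concrete, here is a minimal sketch in plain C++ of how the two indexing conventions map onto a flat buffer. It assumes that Fortran style means 1-based indices with the left-most index varying fastest, and that C style means 0-based indices with the right-most index varying fastest. The helpers `offsetFortran` and `offsetC` are hypothetical names used only for illustration; they are not part of the YAKL API and do not show how YAKL implements its Array class.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helpers (NOT part of YAKL): linear offset of a 2-D element
// inside a flat, contiguous buffer.

// Fortran style (assumed): indices start at 1 and the LEFT-most index
// varies fastest. n1 is the extent of the first (fast) dimension.
inline std::size_t offsetFortran(int i, int j, int n1) {
  return static_cast<std::size_t>(i - 1) +
         static_cast<std::size_t>(n1) * static_cast<std::size_t>(j - 1);
}

// C style (assumed): indices start at 0 and the RIGHT-most index
// varies fastest. n2 is the extent of the last (fast) dimension.
inline std::size_t offsetC(int j, int i, int n2) {
  return static_cast<std::size_t>(j) * static_cast<std::size_t>(n2) +
         static_cast<std::size_t>(i);
}
```

Under these assumptions, a Fortran-style element (i,j) and a C-style element (j-1,i-1) land at the same offset whenever the fast dimension has the same extent, which is why a Fortran-style array indexed as (i,j,k,l) and a C-style array indexed as (l,k,j,i) can both keep i contiguous in memory.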
The following example shows the same section of code written three ways: Fortran + OpenACC, parallel YAKL C++ in Fortran style, and parallel YAKL C++ in C style:
! Fortran + OpenACC
real :: stateTend      (nx  ,ny,nz,numState)
real :: stateFluxLimits(nx+1,ny,nz,numState)

!$acc parallel loop collapse(4)
do l = 1 , numState
  do k = 1 , nz
    do j = 1 , ny
      do i = 1 , nx
        stateTend(i,j,k,l) = - ( stateFluxLimits(i+1,j,k,l) - &
                                 stateFluxLimits(i  ,j,k,l) ) / dx
      enddo
    enddo
  enddo
enddo
// YAKL C++, Fortran style (1-based indices, left-most index varies fastest)
typedef yakl::Array<float,4,yakl::memDevice,yakl::styleFortran> real4d;
using yakl::fortran::parallel_for;
using yakl::fortran::Bounds;

real4d stateTend      ("stateTend"      ,nx  ,ny,nz,numState);
real4d stateFluxLimits("stateFluxLimits",nx+1,ny,nz,numState);

// do l = 1 , numState
//   do k = 1 , nz
//     do j = 1 , ny
//       do i = 1 , nx
parallel_for( Bounds<4>(numState,nz,ny,nx) ,
              YAKL_LAMBDA(int l, int k, int j, int i) {
  stateTend(i,j,k,l) = - ( stateFluxLimits(i+1,j,k,l) -
                           stateFluxLimits(i  ,j,k,l) ) / dx;
});
// YAKL C++, C style (0-based indices, right-most index varies fastest)
typedef yakl::Array<float,4,yakl::memDevice,yakl::styleC> real4d;
using yakl::c::parallel_for;
using yakl::c::Bounds;

real4d stateTend      ("stateTend"      ,numState,nz,ny,nx  );
real4d stateFluxLimits("stateFluxLimits",numState,nz,ny,nx+1);

// for (int l=0; l < numState; l++) {
//   for (int k=0; k < nz; k++) {
//     for (int j=0; j < ny; j++) {
//       for (int i=0; i < nx; i++) {
parallel_for( Bounds<4>(numState,nz,ny,nx) ,
              YAKL_LAMBDA(int l, int k, int j, int i) {
  stateTend(l,k,j,i) = - ( stateFluxLimits(l,k,j,i+1) -
                           stateFluxLimits(l,k,j,i  ) ) / dx;
});
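For checking results from any of the three versions above, a serial reference can be handy. The following is a minimal sketch in plain C++ (no YAKL) that computes the same tendency on flat, row-major buffers, and visits the cells through a single flattened index that is decomposed into (l,k,j,i). The function name `tendencyReference` and the flat `std::vector<float>` storage are assumptions made for this illustration. The flattened loop is only an analogy for how a 4-D loop nest can be collapsed into one index range; it is not a description of how YAKL's `parallel_for` is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical serial reference (plain C++, NOT YAKL). Computes
//   tend(l,k,j,i) = -( flux(l,k,j,i+1) - flux(l,k,j,i) ) / dx
// where flux has nx+1 entries along i and both buffers are row-major
// (i varies fastest), matching the C-style example.
std::vector<float> tendencyReference(const std::vector<float>& flux,
                                     int numState, int nz, int ny, int nx,
                                     float dx) {
  const std::size_t ns = static_cast<std::size_t>(numState);
  const std::size_t sz = static_cast<std::size_t>(nz);
  const std::size_t sy = static_cast<std::size_t>(ny);
  const std::size_t sx = static_cast<std::size_t>(nx);
  assert(flux.size() == ns * sz * sy * (sx + 1));

  std::vector<float> tend(ns * sz * sy * sx);
  for (std::size_t idx = 0; idx < tend.size(); ++idx) {
    // Decompose the flat index into (l,k,j,i); i varies fastest.
    const std::size_t i = idx % sx;
    const std::size_t j = (idx / sx) % sy;
    const std::size_t k = (idx / (sx * sy)) % sz;
    const std::size_t l = idx / (sx * sy * sz);
    // Offset of flux(l,k,j,i); the fast dimension has extent nx+1.
    const std::size_t f = ((l * sz + k) * sy + j) * (sx + 1) + i;
    tend[idx] = -(flux[f + 1] - flux[f]) / dx;
  }
  return tend;
}
```

Because every cell writes only its own output element, the iterations are independent of one another, which is the property that allows this loop nest to be run in parallel in the OpenACC and YAKL versions.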