Introduction and Overview
Matt Norman edited this page Feb 26, 2022
YAKL (like Kokkos and RAJA) is a portable C++ library that allows developers to conveniently run code on different hardware backends like CUDA, HIP, and SYCL for single-source portability. YAKL, Kokkos, and RAJA are all just C++ libraries, and the code is purely C++ without any language extensions. For more information about portable C++ libraries, particularly from the perspective of using directives, please read this article.
The YAKL API is similar to Kokkos in many ways, but is simplified and has stronger Fortran-like behavior in the arrays and parallel loops. YAKL currently has backends for:
- CPUs (serial)
- CPU OpenMP threading
- CUDA
- HIP
- SYCL
- OpenMP offload (in progress)
YAKL provides the following features:
- Multi-dimensional dynamically allocated arrays in Fortran and C styles
- Multi-dimensional statically defined arrays in Fortran and C styles
- Kernel launchers to launch code in parallel over threads on different hardware backends
- Various methods of transferring data between host and device memory spaces
- Basic atomic operations (add, min, and max) using hardware atomics when available
- Efficient reductions via convenient syntax patterned after Fortran's `sum()`, `minval()`, and `maxval()` using vendor libraries
- Synchronization via a `fence()` function
- Pool allocator that is automatically turned on for all device allocations in separate memory address spaces
- Fortran bindings for YAKL allocators and YAKL init and finalize
- Limited Fortran intrinsics library
- Classes to handle scalars that need to be read after being written to in a parallel kernel.
- NetCDF and Parallel NetCDF I/O routines using YAKL's multi-dimensional Arrays
- Automated timers for YAKL's parallel_for calls using the General Purpose Timing Library (GPTL)
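To make the Fortran and C array styles in the list above concrete, here is a minimal sketch in plain C++ (no YAKL required) of the two index conventions. It assumes Fortran style means a lower bound of 1 with the left-most index varying fastest, and C style means a lower bound of 0 with the right-most index varying fastest. The helpers `offset_fortran` and `offset_c` are hypothetical names used only for illustration; they are not YAKL functions and do not show how YAKL computes offsets internally.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helpers for illustration only; these are not YAKL functions.

// Fortran style (assumed): indices start at 1, left-most index varies fastest.
// dims = {n1, n2, n3, n4}, element accessed as (i1, i2, i3, i4).
inline std::size_t offset_fortran(const std::array<int, 4>& dims,
                                  int i1, int i2, int i3, int i4) {
    assert(i1 >= 1 && i1 <= dims[0]);
    assert(i2 >= 1 && i2 <= dims[1]);
    assert(i3 >= 1 && i3 <= dims[2]);
    assert(i4 >= 1 && i4 <= dims[3]);
    const std::size_t n1 = static_cast<std::size_t>(dims[0]);
    const std::size_t n2 = static_cast<std::size_t>(dims[1]);
    const std::size_t n3 = static_cast<std::size_t>(dims[2]);
    // Shift each index to zero-based, then nest from the slowest index inward.
    return static_cast<std::size_t>(i1 - 1)
         + n1 * (static_cast<std::size_t>(i2 - 1)
         + n2 * (static_cast<std::size_t>(i3 - 1)
         + n3 *  static_cast<std::size_t>(i4 - 1)));
}

// C style (assumed): indices start at 0, right-most index varies fastest.
// dims = {n1, n2, n3, n4}, element accessed as (i1, i2, i3, i4).
inline std::size_t offset_c(const std::array<int, 4>& dims,
                            int i1, int i2, int i3, int i4) {
    assert(i1 >= 0 && i1 < dims[0]);
    assert(i2 >= 0 && i2 < dims[1]);
    assert(i3 >= 0 && i3 < dims[2]);
    assert(i4 >= 0 && i4 < dims[3]);
    const std::size_t n2 = static_cast<std::size_t>(dims[1]);
    const std::size_t n3 = static_cast<std::size_t>(dims[2]);
    const std::size_t n4 = static_cast<std::size_t>(dims[3]);
    return ((static_cast<std::size_t>(i1) * n2
           + static_cast<std::size_t>(i2)) * n3
           + static_cast<std::size_t>(i3)) * n4
           + static_cast<std::size_t>(i4);
}
```

Under these assumptions, a Fortran-style array with dimensions (nx,ny,nz,numState) and a C-style array with dimensions (numState,nz,ny,nx) describe the same memory: `offset_fortran(fdims, i, j, k, l)` equals `offset_c(cdims, l-1, k-1, j-1, i-1)`.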
The following is an example of the same section of code in Fortran + OpenACC, in parallel YAKL C++ in Fortran style, and in parallel YAKL C++ in C style:
```fortran
real :: stateTend      (nx  ,ny,nz,numState)
real :: stateFluxLimits(nx+1,ny,nz,numState)

!$acc parallel loop collapse(4)
do l = 1 , numState
  do k = 1 , nz
    do j = 1 , ny
      do i = 1 , nx
        ! Flux difference across the cell in the x-direction
        stateTend(i,j,k,l) = - ( stateFluxLimits(i+1,j,k,l) - &
                                 stateFluxLimits(i  ,j,k,l) ) / dx
      enddo
    enddo
  enddo
enddo
```
```cpp
typedef yakl::Array<float,4,yakl::memDevice,yakl::styleFortran> real4d;
using yakl::fortran::parallel_for;
using yakl::fortran::Bounds;

real4d stateTend      ("stateTend"      ,nx  ,ny,nz,numState);
real4d stateFluxLimits("stateFluxLimits",nx+1,ny,nz,numState);

// do l = 1 , numState
//   do k = 1 , nz
//     do j = 1 , ny
//       do i = 1 , nx
parallel_for( Bounds<4>(numState,nz,ny,nx) ,
              YAKL_LAMBDA(int l, int k, int j, int i) {
  stateTend(i,j,k,l) = - ( stateFluxLimits(i+1,j,k,l) -
                           stateFluxLimits(i  ,j,k,l) ) / dx;
});
```
```cpp
typedef yakl::Array<float,4,yakl::memDevice,yakl::styleC> real4d;
using yakl::c::parallel_for;
using yakl::c::Bounds;

real4d stateTend      ("stateTend"      ,numState,nz,ny,nx  );
real4d stateFluxLimits("stateFluxLimits",numState,nz,ny,nx+1);

// for (int l=0; l < numState; l++) {
//   for (int k=0; k < nz; k++) {
//     for (int j=0; j < ny; j++) {
//       for (int i=0; i < nx; i++) {
parallel_for( Bounds<4>(numState,nz,ny,nx) ,
              YAKL_LAMBDA(int l, int k, int j, int i) {
  stateTend(l,k,j,i) = - ( stateFluxLimits(l,k,j,i+1) -
                           stateFluxLimits(l,k,j,i  ) ) / dx;
});
```
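For readers who want to see what a four-index kernel launch amounts to on a serial host, the following is a minimal sketch in plain C++ that does not use YAKL. It collapses the four loops into one flat index and unpacks that index with the right-most loop varying fastest, matching the loop order in the comments above. `for_each_index4` and `flux_tendency` are hypothetical names introduced for illustration, and the flat `std::vector<float>` storage with C-style layout is an assumption of this sketch; none of it is taken from YAKL's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical serial stand-in for a 4-D kernel launcher (not a YAKL function).
// Visits every (i0,i1,i2,i3) with 0 <= i_d < n_d, right-most index fastest.
template <class F>
void for_each_index4(int n0, int n1, int n2, int n3, F&& body) {
    assert(n0 >= 0 && n1 >= 0 && n2 >= 0 && n3 >= 0);
    const long long total = 1LL * n0 * n1 * n2 * n3;
    for (long long idx = 0; idx < total; ++idx) {
        // Unpack the flat index into four loop indices.
        const int i3 = static_cast<int>( idx % n3);
        const int i2 = static_cast<int>((idx / n3) % n2);
        const int i1 = static_cast<int>((idx / (1LL * n3 * n2)) % n1);
        const int i0 = static_cast<int>( idx / (1LL * n3 * n2 * n1));
        body(i0, i1, i2, i3);
    }
}

// Same arithmetic as the C-style kernel above, on flat row-major storage.
// flux has extents (numState,nz,ny,nx+1); the result has (numState,nz,ny,nx).
std::vector<float> flux_tendency(const std::vector<float>& flux,
                                 int numState, int nz, int ny, int nx, float dx) {
    const std::size_t nzs = static_cast<std::size_t>(nz);
    const std::size_t nys = static_cast<std::size_t>(ny);
    const std::size_t nxs = static_cast<std::size_t>(nx);
    assert(flux.size() == static_cast<std::size_t>(numState) * nzs * nys * (nxs + 1));
    std::vector<float> tend(static_cast<std::size_t>(numState) * nzs * nys * nxs);
    for_each_index4(numState, nz, ny, nx, [&](int l, int k, int j, int i) {
        // Position of the (l,k,j) row; each row holds nx or nx+1 values.
        const std::size_t row = (static_cast<std::size_t>(l) * nzs
                               + static_cast<std::size_t>(k)) * nys
                               + static_cast<std::size_t>(j);
        const std::size_t is = static_cast<std::size_t>(i);
        tend[row * nxs + is] =
            -(flux[row * (nxs + 1) + is + 1] - flux[row * (nxs + 1) + is]) / dx;
    });
    return tend;
}
```

Because each iteration writes a distinct element of `tend` and only reads from `flux`, the iterations are independent, which is the property that lets a real backend distribute them across threads.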