Introduction and Overview

Matt Norman edited this page Feb 26, 2022 · 3 revisions

Overview

YAKL (like Kokkos and RAJA) is a portable C++ library that lets developers run single-source code on different hardware backends such as CUDA, HIP, and SYCL. YAKL, Kokkos, and RAJA are all plain C++ libraries: user code is standard C++ with no language extensions. For more information about portable C++ libraries, particularly from the perspective of using directives, please read this article.

The YAKL API is similar to Kokkos in many ways, but is simplified and has stronger Fortran-like behavior in the arrays and parallel loops. YAKL currently has backends for:

  • CPUs (serial)
  • CPU OpenMP threading
  • CUDA
  • HIP
  • SYCL
  • OpenMP offload (in progress)

What does YAKL provide?

  • Multi-dimensional dynamically allocated arrays in Fortran and C styles
  • Multi-dimensional statically defined arrays in Fortran and C styles
  • Kernel launchers to launch code in parallel over threads on different hardware backends
  • Various methods of transferring data between host and device memory spaces
  • Basic atomic operations (add, min, and max) using hardware atomics when available
  • Efficient reductions, implemented with vendor libraries, using convenient syntax patterned after Fortran's sum(), minval(), and maxval()
  • Synchronization via a fence() function
  • Pool allocator that is automatically turned on for all device allocations in separate memory address spaces
  • Fortran bindings for YAKL allocators and YAKL init and finalize
  • Limited Fortran intrinsics library
  • Classes to handle scalars that need to be read after being written to in a parallel kernel
  • NetCDF and Parallel NetCDF I/O routines using YAKL's multi-dimensional Arrays
  • Automated timers for YAKL's parallel_for calls using the General Purpose Timing Library (GPTL)
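To make the "kernel launcher" item above more concrete, here is a conceptual sketch in plain C++ of what launching a loop body over a multi-dimensional index space can look like: the bounds are flattened into one index, and each flat index is unpacked back into (l,k,j,i) before the body is called. The names flat_for4 and count_iterations are hypothetical and are not part of YAKL. This serial loop only illustrates the idea; it makes no claim about how YAKL implements its launchers on any backend.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper (NOT a YAKL function): call f(l,k,j,i) for every point
// of a zero-based 4-D index space by walking a single flattened index.
template <class F>
void flat_for4(std::array<int,4> n, F f) {
  const long total = 1L * n[0] * n[1] * n[2] * n[3];
  for (long idx = 0; idx < total; ++idx) {
    // Unpack the flat index; the last dimension varies fastest.
    const int i = static_cast<int>(  idx                        % n[3] );
    const int j = static_cast<int>( (idx /  n[3]              ) % n[2] );
    const int k = static_cast<int>( (idx / (1L * n[3] * n[2]) ) % n[1] );
    const int l = static_cast<int>(  idx / (1L * n[3] * n[2] * n[1]) );
    f(l, k, j, i);
  }
}

// Hypothetical helper: count how many times the loop body runs.
inline long count_iterations(std::array<int,4> n) {
  long count = 0;
  flat_for4(n, [&](int, int, int, int) { ++count; });
  return count;
}
```

Because every flat index is independent of the others, a loop of this shape can in principle be distributed over threads, which is why the body is passed in as a callable rather than written inline.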

Example YAKL Code

The following is an example of the same section of code written three ways: Fortran + OpenACC, parallel YAKL C++ with Fortran-style arrays, and parallel YAKL C++ with C-style arrays:

OpenACC Fortran Code

real :: stateTend      (nx  ,ny,nz,numState)
real :: stateFluxLimits(nx+1,ny,nz,numState)

!$acc parallel loop collapse(4)
do l = 1 , numState
  do k = 1 , nz
    do j = 1 , ny
      do i = 1 , nx
        stateTend(i,j,k,l) = - ( stateFluxLimits(i+1,j,k,l) - &
                                 stateFluxLimits(i  ,j,k,l) ) / dx
      enddo
    enddo
  enddo
enddo

Portable C++ Code (Fortran-style YAKL Arrays)

typedef yakl::Array<float,4,yakl::memDevice,yakl::styleFortran> real4d;
using yakl::fortran::parallel_for;
using yakl::fortran::Bounds;

real4d stateTend      ("stateTend"      ,nx  ,ny,nz,numState);
real4d stateFluxLimits("stateFluxLimits",nx+1,ny,nz,numState);

// do l = 1 , numState
//   do k = 1 , nz
//     do j = 1 , ny
//       do i = 1 , nx
parallel_for( Bounds<4>(numState,nz,ny,nx) ,
              YAKL_LAMBDA(int l, int k, int j, int i) { 
  stateTend(i,j,k,l) = - ( stateFluxLimits(i+1,j,k,l) -
                           stateFluxLimits(i  ,j,k,l) ) / dx;
});

Portable C++ Code (C-style YAKL Arrays)

typedef yakl::Array<float,4,yakl::memDevice,yakl::styleC> real4d;
using yakl::c::parallel_for;
using yakl::c::Bounds;

real4d stateTend      ("stateTend"      ,numState,nz,ny,nx  );
real4d stateFluxLimits("stateFluxLimits",numState,nz,ny,nx+1);

// for (int l=0; l < numState; l++) {
//   for (int k=0; k < nz; k++) {
//     for (int j=0; j < ny; j++) {
//       for (int i=0; i < nx; i++) {
parallel_for( Bounds<4>(numState,nz,ny,nx) ,
              YAKL_LAMBDA(int l, int k, int j, int i) { 
  stateTend(l,k,j,i) = - ( stateFluxLimits(l,k,j,i+1) -
                           stateFluxLimits(l,k,j,i  ) ) / dx;
});  
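
The two YAKL versions above reverse the order of both the dimensions and the indices. The sketch below, in plain C++ with no YAKL dependency, suggests why that keeps the memory access pattern the same. It assumes the usual conventions: Fortran-style arrays are 1-based with the leftmost index varying fastest, and C-style arrays are 0-based with the rightmost index varying fastest. The helper names offset_fortran and offset_c are hypothetical and only illustrate the index arithmetic.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helpers (NOT YAKL functions) that compute the linear memory
// offset of one element under each assumed layout convention.

inline std::size_t u(int v) { return static_cast<std::size_t>(v); }

// Fortran style: array declared as (nx,ny,nz,numState), indexed (i,j,k,l),
// 1-based, leftmost index fastest (column-major).
inline std::size_t offset_fortran(int i, int j, int k, int l,
                                  int nx, int ny, int nz) {
  return u(i-1) + u(nx) * ( u(j-1) + u(ny) * ( u(k-1) + u(nz) * u(l-1) ) );
}

// C style: array declared as (numState,nz,ny,nx), indexed (l,k,j,i),
// 0-based, rightmost index fastest (row-major).
inline std::size_t offset_c(int l, int k, int j, int i,
                            int nx, int ny, int nz) {
  return u(i) + u(nx) * ( u(j) + u(ny) * ( u(k) + u(nz) * u(l) ) );
}
```

Under these assumptions, Fortran-style element (i,j,k,l) and C-style element (l-1,k-1,j-1,i-1) land at the same offset, so in both versions i is the contiguous, fastest-varying index.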