To fully exploit the computational power of the GPU generally a large amount of data parallelism must be expressed. In the specific case of accelerated libraries such as cuBLAS
, cuFFT
, and cuSPARSE
if each operation does not possess a sufficient amount of data parallelism another option is to batch many smaller operations into a single large operation. This tutorial will demonstrate how to take advantage of batched cuBLAS
operations to improve GPU utilization. Additionally this tutorial will expand on the GPU concurrency topics from the first tutorial through the use of streams
and Hyper-Q
. The full source can be viewed or downloaded from the OLCF GitHub. Please direct any questions or comments to [email protected]