Fully exploiting the computational power of a GPU generally requires expressing a large amount of data parallelism. With accelerated libraries such as cuBLAS, cuFFT, and cuSPARSE, an individual operation may not expose enough data parallelism on its own; in that case, another option is to batch many smaller operations into a single large operation. This tutorial demonstrates how to use batched cuBLAS operations to improve GPU utilization. It also expands on the GPU concurrency topics from the first tutorial through the use of streams and Hyper-Q. The full source can be viewed or downloaded from the OLCF GitHub. Please direct any questions or comments to [email protected]
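To make the batching idea concrete, the following is a minimal CPU-only sketch in plain C of what a batched matrix multiply computes: one call walks arrays of matrix pointers and applies the same small GEMM to each entry. It is not the tutorial's code. The names `gemm_batched_ref` and `IDX` are hypothetical and chosen for illustration. The argument order loosely mirrors cuBLAS's `cublasDgemmBatched`, which performs the equivalent work on the GPU, uses column-major storage, and expects the pointer arrays themselves to reside in device memory.

```c
#include <assert.h>
#include <stddef.h>

/* Column-major index of element (row, col) with leading dimension ld.
   Column-major is the storage order cuBLAS assumes. */
#define IDX(row, col, ld) ((size_t)(col) * (size_t)(ld) + (size_t)(row))

/* Hypothetical CPU reference for a batched GEMM (no transposes):
 *   C[i] = alpha * A[i] * B[i] + beta * C[i],  for i in [0, batch_count)
 * Every A[i] is m x k, every B[i] is k x n, every C[i] is m x n.
 * All matrices in the batch share the same sizes and leading dimensions. */
void gemm_batched_ref(int m, int n, int k, double alpha,
                      const double *const a_array[], int lda,
                      const double *const b_array[], int ldb,
                      double beta,
                      double *const c_array[], int ldc,
                      int batch_count)
{
    assert(m >= 0 && n >= 0 && k >= 0 && batch_count >= 0);
    assert(lda >= m && ldb >= k && ldc >= m);

    for (int i = 0; i < batch_count; ++i) {
        const double *a = a_array[i];
        const double *b = b_array[i];
        double *c = c_array[i];

        for (int col = 0; col < n; ++col) {
            for (int row = 0; row < m; ++row) {
                /* Dot product of row `row` of A[i] with column `col` of B[i]. */
                double sum = 0.0;
                for (int p = 0; p < k; ++p) {
                    sum += a[IDX(row, p, lda)] * b[IDX(p, col, ldb)];
                }
                c[IDX(row, col, ldc)] = alpha * sum + beta * c[IDX(row, col, ldc)];
            }
        }
    }
}
```

On the CPU the outer loop over `i` runs serially. The point of the batched GPU routine is that the independent small products can be scheduled together, so the device stays busy even though each individual matrix is too small to fill it.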
forked from olcf/Batched_cuBLAS
zheliu137/Batched_cuBLAS