This repository has been archived by the owner on Feb 27, 2024. It is now read-only.

References

James Newling edited this page Oct 17, 2017 · 3 revisions

Some of the works and projects referenced in this wiki:

  1. Implementing a Code Generator for Fast Matrix Multiplication in OpenCL on the GPU { Matsumoto et al. (2012) }

  2. A Portable and High-Performance General Matrix-Multiply (GEMM) Library for GPUs and Single-Chip CPU/GPU Systems { Garg and Hendren (2014) }

  3. CLTune: A Generic Auto-Tuner for OpenCL Kernels { Nugteren and Codreanu (2015) }

  4. CLBlast: A Tuned OpenCL BLAS Library { git and arXiv }

  5. Accelerating GPU kernels for dense linear algebra { Nath et al. (2011) }

  6. Accelerating cuBLAS/cuDNN using Input-Aware Auto-Tuning: The ISAAC library { code and slides }

  7. A three-dimensional approach to parallel matrix multiplication { Agarwal et al. (1995) }

  8. Matrix multiplication beyond auto-tuning: Rewrite-based GPU code generation { Steuwer et al. (2016) }

More coming soon.
