This repo contains the code and examples for the lecture "CUDA - From Correctness to Performance".
The lecture is available at https://wiki.lcpu.dev/zh/hpc/from-scratch/cuda or here.
Make sure the CUDA Toolkit is installed and a CUDA-capable GPU is available. Then build the repo with:
make all
Usage:
./gemm_test <n> <m> <k> [implementation]
If [implementation] is not specified, all implementations are benchmarked.
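Since the lecture starts from correctness, a GPU result is usually checked against a slow but obviously correct CPU version before any timing is trusted. Below is a minimal sketch of such a reference, not the repo's own checker. It assumes row-major storage and a BLAS-style naming where A is m x k, B is k x n and C = A * B is m x n. This mapping is an assumption and may not match how gemm_test interprets its <n> <m> <k> arguments. The helper names reference_gemm and allclose are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical naive CPU reference: C = A * B, all matrices row-major.
// Assumed shapes: A is m x k, B is k x n, C is m x n.
std::vector<float> reference_gemm(const std::vector<float>& A,
                                  const std::vector<float>& B,
                                  std::size_t m, std::size_t n, std::size_t k) {
    assert(A.size() == m * k && B.size() == k * n);
    std::vector<float> C(m * n, 0.0f);
    // i-p-j loop order: the inner loop walks rows of B and C contiguously.
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t p = 0; p < k; ++p) {
            const float a = A[i * k + p];
            for (std::size_t j = 0; j < n; ++j) {
                C[i * n + j] += a * B[p * n + j];
            }
        }
    }
    return C;
}

// Element-wise comparison with a mixed absolute/relative tolerance,
// since GPU kernels may accumulate in a different order than the CPU.
bool allclose(const std::vector<float>& x, const std::vector<float>& y,
              float rtol = 1e-4f, float atol = 1e-5f) {
    if (x.size() != y.size()) return false;
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (std::fabs(x[i] - y[i]) > atol + rtol * std::fabs(y[i])) return false;
    }
    return true;
}
```

For example, multiplying the 2 x 3 matrix {1, 2, 3, 4, 5, 6} by the 3 x 2 matrix {7, 8, 9, 10, 11, 12} with reference_gemm(A, B, 2, 2, 3) yields {58, 64, 139, 154}. A tolerance-based comparison is used instead of exact equality because reordering floating-point additions, as tiled GPU kernels do, changes the rounding.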