hw3-1

  • Implement with C++ and OpenMP
  • Use the Floyd-Warshall algorithm with multiple threads to speed up the computation

hw3-2

  • Implement with CUDA on a single GPU
  • Use the blocked Floyd-Warshall algorithm
  • Coalesce global memory accesses to reduce the number of memory transactions
  • Load data from global memory into shared memory to reduce access latency
  • Avoid shared memory bank conflicts

hw3-3

  • Apply the same optimization techniques as hw3-2
  • Use multiple GPUs to share the computation
  • Copy only a small amount of data at a time to reduce memory copy time