Steps For Running PPT GPU

Yehia Arafa edited this page Mar 18, 2022 · 5 revisions

High-level steps for modeling a GPU workload/application:

  1. Extract and collect the traces of the application you want to model

    • This step requires a real NVIDIA GPU (SM compute capability >= 3.5)
    • Build the application using nvcc (for SASS) or clang (for PTX)
    • Run the application through the tracing_tool and/or the llvm_tool
    • Traces are written to 3 folders: sass_traces, ptx_traces, and memory_traces
  2. Run PPT-GPU on the collected traces

    • Build the reuse_distance_tool
    • Choose a GPU HW to model, or add new configs inside the [hardware](https://github.com/NMSU-PEARL/PPT-GPU/tree/main/hardware) directory
    • Run PPT-GPU with the appropriate knobs
    • Performance results (per kernel) are written in the application's parent directory.
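
    The steps above can be sketched end to end as a command sequence. The application name and paths here are illustrative assumptions; the exact invocations are covered in the detailed steps below:

      # 1. Build the application for SASS tracing (nvcc); paths are examples
      nvcc -o app.out app.cu

      # 2. Trace it by preloading the tracer (produces sass_traces/ and memory_traces/)
      LD_PRELOAD=~/PPT-GPU/tracing_tool/tracer.so ./app.out

      # 3. Simulate the traced application against a chosen HW config
      mpiexec -n 2 python ppt.py --app /path/to/app/ --sass --config TITANV --granularity 2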

Steps of running:

(A) Using your Linux OS (not using Docker)

Assuming you have all the SW & HW [dependencies](https://github.com/NMSU-PEARL/PPT-GPU/wiki/SW-&-HW-Dependencies) installed

  1. Configure & Update MPI path

    • Update defaultMpichLibName with the correct libmpich.so path in the simian.py file
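
    For example, the edited line in simian.py might look like the following. The library path is an assumption; point it at wherever libmpich.so lives on your system:

      # Path to the MPICH shared library used by Simian.
      # Example path only; adjust to your installation.
      defaultMpichLibName = "/usr/lib/x86_64-linux-gnu/libmpich.so"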
  2. Extract the traces of the application

    a. SASS/memory traces:

    • Navigate to the tracing_tool directory and follow the instructions in the Readme file to build the tool

    • To get the traces for a certain application, you have to preload the tracer.so file as a shared library while running the application:

      LD_PRELOAD=~/PPT-GPU/tracing_tool/tracer.so ./app.out
      

    b. PTX traces:

    • Navigate to the llvm_tool directory and follow the instructions in the Readme file to build the tool
    • (1) recompile the application using the llvm/clang++ compiler, then (2) execute the application normally, as you would run it on the GPU HW
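
    As a sketch, a plain clang CUDA build of the application might look like the following. The architecture and library path are illustrative assumptions, and the llvm_tool Readme gives the exact instrumentation options to add on top of this:

      # Illustrative clang CUDA build (assumed arch/paths; see the
      # llvm_tool Readme for the required instrumentation flags):
      clang++ app.cu -o app.out --cuda-gpu-arch=sm_70 \
          -L/usr/local/cuda/lib64 -lcudart

      # Then execute the application normally, as on the GPU HW:
      ./app.out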
  3. Build the Reuse Distance tool

    • Navigate to the reuse_distance_tool directory and follow the instructions in the Readme file to build the tool
  4. Model the correct GPU configurations

    The [hardware](https://github.com/NMSU-PEARL/PPT-GPU/tree/main/hardware) directory has examples of multiple hardware configurations. You can choose to model one of these or define your own in a new file. You can also define the ISA latency numbers and the compute capability configurations inside hardware/ISA and hardware/compute_capability, respectively

  5. Run the simulations

  • Navigate to the PPT-GPU home directory

    For example, to run the 2mm application on TITAN V with SASS traces, assuming the 2mm path is "/home/test/Workloads/2mm":

    mpiexec -n 2 python ppt.py --app /home/test/Workloads/2mm/ --sass --config TITANV --granularity 2 
    

    To simulate specific kernels only (say, with PTX traces):

    mpiexec -n 1 python ppt.py --app /home/test/Workloads/2mm/ --ptx --config TITANV --granularity 2 --kernel 1
    

    Kernels are ordered in the app_config.py file. Refer to that file for each kernel's details and its order


(B) Using our pre-configured Docker Images

  1. Extract the traces of the application

    See Docker-Images-and-Usage#trace-extraction and follow the instructions there

  2. Build the Reuse Distance tool

    • Navigate to the reuse_distance_tool
      cd PPT-GPU/reuse_distance_tool
      
    • Build the tool with Docker (quote the command so that both make clean and make run inside the container rather than on the host):
      docker run --user $(id -u):$(id -g) --rm -v $(pwd):$(pwd) -w $(pwd) yarafa/ppt-gpu:simulation-latest sh -c "make clean && make"
      
  3. Run the simulations

    See Docker-Images-and-Usage#simulations and follow the instructions there