# Steps For Running PPT-GPU
1. Extract and collect the traces of the application you want to model
   - This step requires a real NVIDIA GPU (with a compute capability >= 3.5)
   - Build the application using nvcc (for SASS) or clang (for PTX)
   - Run the application through the tracing_tool and/or the llvm_tool
   - Traces are written to three folders: sass_traces, ptx_traces, and memory_traces
2. Run PPT-GPU on the collected traces
   - Build the reuse_distance_tool
   - Choose a GPU hardware configuration to model, or add new configs inside the [hardware](https://github.com/NMSU-PEARL/PPT-GPU/tree/main/hardware) directory
   - Run PPT-GPU with the appropriate knobs
   - Performance results (per kernel) are written to the application's parent directory
## (A) Using your own machine

Assuming you have all the SW & HW [dependencies](https://github.com/NMSU-PEARL/PPT-GPU/wiki/SW-&-HW-Dependencies) installed:
### 1. Configure & update the MPI path

- Update `defaultMpichLibName` with the correct `libmpich.so` path in the `simian.py` file
### 2. Extract the traces of the application

**a. SASS/memory traces:**

- Navigate to the `tracing_tool` directory and follow the instructions in the Readme file to build the tool
- To get the traces for a certain application, you have to preload the `tracer.so` file as a shared library while running the application:

  ```shell
  LD_PRELOAD=~/PPT-GPU/tracing_tool/tracer.so ./app.out
  ```
**b. PTX traces:**

- Navigate to the `llvm_tool` directory and follow the instructions in the Readme file to build the tool
- (1) Recompile the application using the llvm and clang++ compiler options, then (2) execute the application normally, as you would run it on the GPU hardware
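Once both tools have run, the three trace folders listed in the overview should exist under the application directory. A small sanity check (the app path is the example used elsewhere in this guide) might look like:

```python
import os

# Folders the tracing tools are expected to produce (per the steps above).
EXPECTED = ["sass_traces", "ptx_traces", "memory_traces"]

def missing_trace_dirs(app_dir):
    """Return the expected trace folders not yet present under app_dir."""
    return [d for d in EXPECTED if not os.path.isdir(os.path.join(app_dir, d))]

# Example path; replace with your application's directory.
print(missing_trace_dirs("/home/test/Workloads/2mm"))
```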
### 3. Build the Reuse Distance tool

- Navigate to the `reuse_distance_tool` directory and follow the instructions in the Readme file to build the tool
### 4. Model the correct GPU configurations

The [hardware](https://github.com/NMSU-PEARL/PPT-GPU/tree/main/hardware) directory has examples of multiple hardware configurations. You can choose to model one of these or define your own in a new file. You can also define the ISA latency numbers and the compute capability configurations inside `hardware/ISA` and `hardware/compute_capability`, respectively.
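The real attribute names and values are defined by the existing files in the `hardware/` directory; the following is only a hypothetical sketch of the kind of parameters such a config captures (every name and number here is invented for illustration, not taken from PPT-GPU):

```python
# Hypothetical sketch only -- consult the existing files in hardware/ for the
# actual attribute names and structure PPT-GPU expects.
my_gpu_config = {
    "gpu_name": "MYGPU",         # name you would pass via --config
    "num_SMs": 80,               # number of streaming multiprocessors
    "l2_cache_size_kb": 4608,    # shared L2 size in KB
    "compute_capability": 7.0,   # ties into hardware/compute_capability
    "isa": "volta",              # ties into hardware/ISA latency numbers
}
print(my_gpu_config["gpu_name"])
```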
### 5. Run the simulations

- Navigate to the PPT-GPU home directory
- For example, to run the 2mm application on TITAN V with SASS traces, assuming the 2mm path is "/home/test/Workloads/2mm":

  ```shell
  mpiexec -n 2 python ppt.py --app /home/test/Workloads/2mm/ --sass --config TITANV --granularity 2
  ```

- To model specific kernels only (say, with PTX traces):

  ```shell
  mpiexec -n 1 python ppt.py --app /home/test/Workloads/2mm/ --ptx --config TITANV --granularity 2 --kernel 1
  ```

- Kernels are ordered in the `app_config.py` file. Please refer to that file for kernel information and ordering.
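To sweep several kernels, the per-kernel invocation above can be wrapped in a loop. A hedged sketch (the app path, config, and kernel IDs are examples; the loop only echoes each command so you can inspect it before launching anything):

```shell
APP=/home/test/Workloads/2mm/
# Kernel IDs as ordered in app_config.py (example IDs).
for k in 1 2 3; do
    cmd="mpiexec -n 1 python ppt.py --app $APP --ptx --config TITANV --granularity 2 --kernel $k"
    echo "$cmd"   # swap echo for eval "$cmd" to actually launch the run
done
```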
## (B) Using our pre-configured Docker Images
### 1. Extract the traces of the application

- See Docker-Images-and-Usage#trace-extraction and follow the instructions there
### 2. Build the Reuse Distance tool

- Navigate to the `reuse_distance_tool` directory:

  ```shell
  cd PPT-GPU/reuse_distance_tool
  ```

- Build the tool with Docker (running both make steps inside the container):

  ```shell
  UUID=$(id -u) GID=$(id -g); docker run --user $UUID:$GID --rm -v $(pwd):$(pwd) -w $(pwd) yarafa/ppt-gpu:simulation-latest sh -c "make clean && make"
  ```
### 3. Run the simulations

- See Docker-Images-and-Usage#simulations and follow the instructions there