# Steps For Running PPT-GPU
1. Extract and collect the traces of the application you want to model
   - This step requires a real NVIDIA GPU (with a compute capability >= 3.5)
   - Build the application using nvcc (for SASS) or clang (for PTX)
   - Run the application through the tracing_tool and/or the llvm_tool
   - Traces are written to three folders: sass_traces, ptx_traces, and memory_traces
2. Run PPT-GPU on the collected traces
   - Build the reuse_distance_tool
   - Choose a GPU hardware configuration to model, or add new configs inside the [hardware](https://github.com/NMSU-PEARL/PPT-GPU/tree/main/hardware) directory
   - Run PPT-GPU with the appropriate knobs
   - Performance results (per kernel) are written to the application's parent directory
## (A) Using your own machine

Assuming you have all the SW & HW [dependencies](https://github.com/NMSU-PEARL/PPT-GPU/wiki/SW-&-HW-Dependencies) installed:
### 1. Configure & update the MPI path

- Update `defaultMpichLibName` with the correct `libmpich.so` path in the `simian.py` file
### 2. Extract the traces of the application

**a. SASS/memory traces:**

- Navigate to the `tracing_tool` directory and follow the instructions in the Readme file to build the tool
- To get the traces for a certain application, you have to preload the `tracer.so` file as a shared library while running the application:

  ```shell
  LD_PRELOAD=~/PPT-GPU/tracing_tool/tracer.so ./app.out
  ```
**b. PTX traces:**

- Navigate to the `llvm_tool` directory and follow the instructions in the Readme file to build the tool
- (1) Recompile the application using the llvm and clang++ compiler options, then (2) execute the application normally, as you would run it on the GPU hardware
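Once both tools have run, the three trace folders listed in the overview should exist under the application directory. A small sanity check (the app path is the example used elsewhere in this guide) might look like:

```python
import os

# Folders the tracing tools are expected to produce (per the steps above).
EXPECTED = ["sass_traces", "ptx_traces", "memory_traces"]

def missing_trace_dirs(app_dir):
    """Return the expected trace folders not yet present under app_dir."""
    return [d for d in EXPECTED if not os.path.isdir(os.path.join(app_dir, d))]

# Example path; replace with your application's directory.
print(missing_trace_dirs("/home/test/Workloads/2mm"))
```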
### 3. Build the Reuse Distance tool

- Navigate to the `reuse_distance_tool` directory and follow the instructions in the Readme file to build the tool
### 4. Model the correct GPU configurations

The [hardware](https://github.com/NMSU-PEARL/PPT-GPU/tree/main/hardware) directory has examples of multiple hardware configurations. You can choose to model one of these or define your own in a new file. You can also define the ISA latency numbers and the compute capability configurations inside `hardware/ISA` and `hardware/compute_capability`, respectively.
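The real attribute names and values are defined by the existing files in the `hardware/` directory; the following is only a hypothetical sketch of the kind of parameters such a config captures (every name and number here is invented for illustration, not taken from PPT-GPU):

```python
# Hypothetical sketch only -- consult the existing files in hardware/ for the
# actual attribute names and structure PPT-GPU expects.
my_gpu_config = {
    "gpu_name": "MYGPU",         # name you would pass via --config
    "num_SMs": 80,               # number of streaming multiprocessors
    "l2_cache_size_kb": 4608,    # shared L2 size in KB
    "compute_capability": 7.0,   # ties into hardware/compute_capability
    "isa": "volta",              # ties into hardware/ISA latency numbers
}
print(my_gpu_config["gpu_name"])
```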
### 5. Run the simulations

- Navigate to the PPT-GPU home directory
- For example, to run the 2mm application on TITAN V with SASS traces, assuming the 2mm path is "/home/test/Workloads/2mm":

  ```shell
  mpiexec -n 2 python ppt.py --app /home/test/Workloads/2mm/ --sass --config TITANV --granularity 2
  ```

- To model specific kernels only (say, with PTX traces):

  ```shell
  mpiexec -n 1 python ppt.py --app /home/test/Workloads/2mm/ --ptx --config TITANV --granularity 2 --kernel 1
  ```

- Kernels are ordered in the `app_config.py` file. Please refer to that file for kernel information and ordering.
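To sweep several kernels, the per-kernel invocation above can be wrapped in a loop. A hedged sketch (the app path, config, and kernel IDs are examples; the loop only echoes each command so you can inspect it before launching anything):

```shell
APP=/home/test/Workloads/2mm/
# Kernel IDs as ordered in app_config.py (example IDs).
for k in 1 2 3; do
    cmd="mpiexec -n 1 python ppt.py --app $APP --ptx --config TITANV --granularity 2 --kernel $k"
    echo "$cmd"   # swap echo for eval "$cmd" to actually launch the run
done
```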
## (B) Using our pre-configured Docker Images
### 1. Extract the traces of the application

- See Docker-Images-and-Usage#trace-extraction and follow the instructions there
### 2. Build the Reuse Distance tool

- Navigate to the `reuse_distance_tool` directory:

  ```shell
  cd PPT-GPU/reuse_distance_tool
  ```

- Build the tool with Docker (running both make steps inside the container):

  ```shell
  UUID=$(id -u) GID=$(id -g); docker run --user $UUID:$GID --rm -v $(pwd):$(pwd) -w $(pwd) yarafa/ppt-gpu:simulation-latest sh -c "make clean && make"
  ```
### 3. Run the simulations

- See Docker-Images-and-Usage#simulations and follow the instructions there