forked from msasongko17/hpctoolkit
-
Notifications
You must be signed in to change notification settings - Fork 0
/
ReuseTracker.HowToRun
executable file
·84 lines (48 loc) · 5.53 KB
/
ReuseTracker.HowToRun
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
Requirements
============
To run perf_event_open system call without having to use sudo access,
set the value of perf_event_paranoid to -1 by typing the following command:
sudo sysctl -w kernel.perf_event_paranoid=-1
Usage on Intel
==============
- To profile the private cache temporal reuse distance of an application, run the following command.
HPCRUN_WP_REUSE_PROFILE_TYPE="TEMPORAL" HPCRUN_PROFILE_L3=false HPCRUN_WP_REUSE_BIN_SCHEME=4000,2 HPCRUN_WP_CACHELINE_INVALIDATION=true HPCRUN_WP_DONT_FIX_IP=true HPCRUN_WP_DONT_DISASSEMBLE_TRIGGER_ADDRESS=true hpcrun -e WP_REUSETRACKER -e MEM_UOPS_RETIRED:ALL_LOADS@100000 -e MEM_UOPS_RETIRED:ALL_STORES@100000 <./your_executable> your_args
- To profile the private cache spatial reuse distance of an application, run the following command.
HPCRUN_WP_REUSE_PROFILE_TYPE="SPATIAL" HPCRUN_PROFILE_L3=false HPCRUN_WP_REUSE_BIN_SCHEME=4000,2 HPCRUN_WP_CACHELINE_INVALIDATION=true HPCRUN_WP_DONT_FIX_IP=true HPCRUN_WP_DONT_DISASSEMBLE_TRIGGER_ADDRESS=true hpcrun -e WP_REUSETRACKER -e MEM_UOPS_RETIRED:ALL_LOADS@100000 -e MEM_UOPS_RETIRED:ALL_STORES@100000 <./your_executable> your_args
- To profile the shared cache temporal reuse distance of an application, run the following command.
HPCRUN_WP_REUSE_PROFILE_TYPE="TEMPORAL" HPCRUN_PROFILE_L3=true HPCRUN_WP_REUSE_BIN_SCHEME=4000,2 HPCRUN_WP_CACHELINE_INVALIDATION=true HPCRUN_WP_DONT_FIX_IP=true HPCRUN_WP_DONT_DISASSEMBLE_TRIGGER_ADDRESS=true hpcrun -e WP_REUSETRACKER -e MEM_UOPS_RETIRED:ALL_LOADS@100000 -e MEM_UOPS_RETIRED:ALL_STORES@100000 <./your_executable> your_args
- To profile the shared cache spatial reuse distance of an application, run the following command.
HPCRUN_WP_REUSE_PROFILE_TYPE="SPATIAL" HPCRUN_PROFILE_L3=true HPCRUN_WP_REUSE_BIN_SCHEME=4000,2 HPCRUN_WP_CACHELINE_INVALIDATION=true HPCRUN_WP_DONT_FIX_IP=true HPCRUN_WP_DONT_DISASSEMBLE_TRIGGER_ADDRESS=true hpcrun -e WP_REUSETRACKER -e MEM_UOPS_RETIRED:ALL_LOADS@100000 -e MEM_UOPS_RETIRED:ALL_STORES@100000 <./your_executable> your_args
Usage on AMD
============
- To profile the private cache temporal reuse distance of an application, run the following command.
HPCRUN_WP_REUSE_PROFILE_TYPE="TEMPORAL" HPCRUN_PROFILE_L3=false HPCRUN_WP_REUSE_BIN_SCHEME=4000,2 HPCRUN_WP_CACHELINE_INVALIDATION=true HPCRUN_WP_DONT_FIX_IP=true HPCRUN_WP_DONT_DISASSEMBLE_TRIGGER_ADDRESS=true hpcrun -e WP_AMD_REUSETRACKER -e IBS_OP@100000 -e AMD_L1_DATA_ACCESS@100000000 <./your_executable> your_args
- To profile the private cache spatial reuse distance of an application, run the following command.
HPCRUN_WP_REUSE_PROFILE_TYPE="SPATIAL" HPCRUN_PROFILE_L3=false HPCRUN_WP_REUSE_BIN_SCHEME=4000,2 HPCRUN_WP_CACHELINE_INVALIDATION=true HPCRUN_WP_DONT_FIX_IP=true HPCRUN_WP_DONT_DISASSEMBLE_TRIGGER_ADDRESS=true hpcrun -e WP_AMD_REUSETRACKER -e IBS_OP@100000 -e AMD_L1_DATA_ACCESS@100000000 <./your_executable> your_args
- To profile the shared cache temporal reuse distance of an application, run the following command.
HPCRUN_WP_REUSE_PROFILE_TYPE="TEMPORAL" HPCRUN_PROFILE_L3=true HPCRUN_WP_REUSE_BIN_SCHEME=4000,2 HPCRUN_WP_CACHELINE_INVALIDATION=true HPCRUN_WP_DONT_FIX_IP=true HPCRUN_WP_DONT_DISASSEMBLE_TRIGGER_ADDRESS=true hpcrun -e WP_AMD_REUSETRACKER -e IBS_OP@100000 -e AMD_L1_DATA_ACCESS@100000000 <./your_executable> your_args
- To profile the shared cache spatial reuse distance of an application, run the following command.
HPCRUN_WP_REUSE_PROFILE_TYPE="SPATIAL" HPCRUN_PROFILE_L3=true HPCRUN_WP_REUSE_BIN_SCHEME=4000,2 HPCRUN_WP_CACHELINE_INVALIDATION=true HPCRUN_WP_DONT_FIX_IP=true HPCRUN_WP_DONT_DISASSEMBLE_TRIGGER_ADDRESS=true hpcrun -e WP_AMD_REUSETRACKER -e IBS_OP@100000 -e AMD_L1_DATA_ACCESS@100000000 <./your_executable> your_args
Attribution to Locations in Source Code
=======================================
1. Compile the code that you want to profile using "-g" flag to allow for debugging.
2. To attribute the detected communications to their locations in source code lines and program stacks,
you need to take the following steps:
a. Download and extract a binary release of hpcviewer from
http://hpctoolkit.org/download/hpcviewer/latest/hpcviewer-linux.gtk.x86_64.tgz
b. Run ReuseTracker on a program to be profiled
In Intel:
HPCRUN_WP_REUSE_PROFILE_TYPE="TEMPORAL" HPCRUN_PROFILE_L3=false HPCRUN_WP_REUSE_BIN_SCHEME=4000,2 HPCRUN_WP_CACHELINE_INVALIDATION=true HPCRUN_WP_DONT_FIX_IP=true HPCRUN_WP_DONT_DISASSEMBLE_TRIGGER_ADDRESS=true hpcrun -e WP_REUSETRACKER -e MEM_UOPS_RETIRED:ALL_LOADS@100000 -e MEM_UOPS_RETIRED:ALL_STORES@100000 <./your_executable> your_args
In AMD:
HPCRUN_WP_REUSE_PROFILE_TYPE="TEMPORAL" HPCRUN_PROFILE_L3=false HPCRUN_WP_REUSE_BIN_SCHEME=4000,2 HPCRUN_WP_CACHELINE_INVALIDATION=true HPCRUN_WP_DONT_FIX_IP=true HPCRUN_WP_DONT_DISASSEMBLE_TRIGGER_ADDRESS=true hpcrun -e WP_AMD_REUSETRACKER -e IBS_OP@100000 -e AMD_L1_DATA_ACCESS@100000000 <./your_executable> your_args
c. Extract the static program structure from the profiled program by using hpcstruct
hpcstruct <./your_executable>
The output of hpcstruct is <./your_executable>.hpcstruct.
d. Generate an experiment result database using hpcprof
hpcprof -S <./your_executable>.hpcstruct -o <name of database> <name of output folder>
The output of hpcprof is a folder named <name of database>.
e. Use hpcviewer to read the content of the experiment result database in a GUI interface
hpcviewer/hpcviewer <name of database>
Information on program stack and source code lines is available in the Scope column,
and the other columns display numbers of detected reuses, total time distance,
and total reuse distance that correspond to the source code lines.