WIICA: Workload ISA-Independent Characterization for Applications
v1.0 Public Release
=================================================================
WIICA is a workload characterization tool that measures the ISA-independent
characteristics of applications in the context of specialized architectures.
If you use WIICA in your research, please cite:
ISA-Independent Workload Characterization and its Implications for Specialized
Architectures,
Yakun Sophia Shao and David Brooks,
International Symposium on Performance Analysis of Systems and Software
(ISPASS), April 2013
==================================================================
0. Build WIICA
1) LLVM 3.4 and Clang 3.4 64-bit
2) LLVM IR Trace Profiler (LLVM-Tracer)
LLVM-Tracer is an LLVM compiler pass that instruments code in LLVM
machine-independent IR. It prints out a dynamic trace of your program, which
is then taken as input by WIICA (and Aladdin).
You can download LLVM-Tracer from here:
[https://github.com/ysshao/LLVM-Tracer]
To build LLVM-Tracer, please follow the instructions in README.md in
LLVM-Tracer.
=================================================================
1. Run WIICA:
After you build LLVM-Tracer, you need to:
1) Set environment variable TRACER_HOME to /path/to/your/LLVM-Tracer:
```
export TRACER_HOME=/path/to/your/LLVM-Tracer/
```
2) Specify the kernels that you want to instrument. For example, for the FFT
benchmark, we want to instrument the functions fft1D_512, step1, step2, ...,
step11. The `compile.py` script already lists the common SHOC functions in its
`kernel` dictionary.
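As a rough illustration of the kernel list described above, the sketch below assumes the dictionary maps a benchmark name to a list of function names, and that the FFT functions step1 through step11 follow a plain numeric pattern. The names `kernels` and `kernels_for` are hypothetical; the actual `kernel` dictionary in `compile.py` may be laid out differently.

```python
# Hypothetical layout: benchmark name -> functions to instrument.
# This is NOT copied from compile.py; check the script for the real structure.
kernels = {
    "fft": ["fft1D_512"] + ["step%d" % i for i in range(1, 12)],
}

def kernels_for(benchmark):
    """Return the list of function names to instrument for one benchmark."""
    return kernels[benchmark]

print(",".join(kernels_for("fft")))
```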
An example WIICA run:
cd scripts
python run_wiica.py --directory /your/path/to/wiica/SHOC/fft/ --source fft --analysis_types memory
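If you want to drive the command above from another Python script, a minimal sketch could look like the following. The helper name `build_wiica_command` is hypothetical. It only assembles the argument list from the flags shown in this README, and it relies on the help text's statement that multiple analysis types are separated by spaces.

```python
# Analysis types listed in the run_wiica.py help text.
ANALYSIS_TYPES = ("opcode", "staticinst", "memory",
                  "branch", "basicblock", "register")

def build_wiica_command(directory, source, analysis_types):
    """Assemble the argument list for run_wiica.py (hypothetical helper)."""
    unknown = [t for t in analysis_types if t not in ANALYSIS_TYPES]
    if unknown:
        raise ValueError("unsupported analysis types: %s" % ", ".join(unknown))
    return (["python", "run_wiica.py",
             "--directory", directory,
             "--source", source,
             "--analysis_types"] + list(analysis_types))

cmd = build_wiica_command("/your/path/to/wiica/SHOC/fft/", "fft",
                          ["memory", "branch"])
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.check_call(cmd, cwd=...)` with `cwd` pointing at the `scripts` directory.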
=================================================================
Related scripts:
1) run_wiica.py
The command-line interface to WIICA.
usage: run_wiica.py [-h] [--directory DIRECTORY]
                    [--source SOURCE]
                    [--analysis_types [{opcode,staticinst,memory,branch,basicblock,register}]]
optional arguments:
  -h, --help            show this help message and exit
  --directory DIRECTORY
                        ABSOLUTE directory of the benchmark
  --source SOURCE       a list of source files with suffixes, e.g. fft.c, md.c, etc.
  --analysis_types [{opcode,staticinst,memory,branch,basicblock,register} ]
                        Type of analysis. Separate multiple values with
                        spaces. The supported analysis types are shown.
2) compile.py
Compiles the program with LLVM-Tracer to generate a dynamic LLVM IR trace.
3) process_trace.py
Handles benchmarks that use the "llvm.memset" intrinsic by replacing "llvm.memset" with several non-intrinsic instructions.
4) analysis.py
Performs the opcode, staticinst, memory, branch, basicblock, and register analyses.
Opcode: Opcode Breakdown into Compute, Memory, and Branch
StaticInst: Number of dynamic executions for each static instruction, sorted
by the dynamic counts
Memory: Memory Footprint, Memory Global/Local Entropy[Shao2013]
Branch: Branch Entropy[Shao2013]
BasicBlock: Size and number of dynamic executions of each basic block
5) mem_analysis.py
Spatial Locality Score, see [Weinberg2005] for more details.
Temporal Locality Score, see [Weinberg2005] for more details.
6) reg_analysis.py
Register Degrees: The average use of registers, equal to the total number of register reads divided by the total number of register writes; see [Franklin1992] for more details.
Register Distribution: The distribution of the register dependency distance.
Register Lifetime: The distribution of the distance between the creation and the last use of registers; see [Franklin1992] for more details.
Register Number: The number of registers required at a given point. We assume the application executes 1 instruction per cycle.
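To make the register degree and register lifetime definitions concrete, here is a small sketch on a toy trace. The trace format (one destination register plus a list of source registers per dynamic instruction) is an assumption made for illustration; it is not the LLVM-Tracer trace format, and `reg_analysis.py` may compute these metrics differently. The sketch also assumes each register name is written only once.

```python
from collections import Counter

# Toy trace (hypothetical format): (destination register, source registers),
# one entry per dynamic instruction, executed one instruction per cycle.
trace = [
    ("r1", []),
    ("r2", []),
    ("r3", ["r1", "r2"]),
    ("r4", ["r3", "r1"]),
    (None, ["r4"]),  # e.g. a store: reads r4 and writes no register
]

def register_degree(trace):
    """Total register reads divided by total register writes."""
    reads = sum(len(srcs) for _, srcs in trace)
    writes = sum(1 for dest, _ in trace if dest is not None)
    return float(reads) / writes

def register_lifetimes(trace):
    """Histogram of (last use index - creation index) per register."""
    created, last_use = {}, {}
    for i, (dest, srcs) in enumerate(trace):
        for reg in srcs:
            if reg in created:
                last_use[reg] = i
        if dest is not None:
            created[dest] = i
    return Counter(last_use[reg] - created[reg] for reg in last_use)

print(register_degree(trace))           # 1.25 (5 reads / 4 writes)
print(dict(register_lifetimes(trace)))
```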
=================================================================
2. WIICA Outputs:
Stats files are generated to store the results, including:
(These files are generated from analysis.py)
[bench name]_opcode_profile
[bench name]_staticinst_profile
[bench name]_footprint Memory footprint
[bench name]_mem_entropy
[bench name]_branch_entropy
[bench name]_basicblock_profile
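For intuition about the memory entropy stats, the sketch below computes a plain Shannon entropy over a stream of addresses. Treating "local" entropy as the same computation with some low-order address bits dropped is our reading of [Shao2013]; the function names and the `skipped_low_bits` parameter are hypothetical, and `analysis.py` may implement this differently.

```python
import math
from collections import Counter

def shannon_entropy(symbols):
    """Shannon entropy in bits of the empirical distribution of symbols."""
    counts = Counter(symbols)
    total = float(sum(counts.values()))
    # p * log2(1/p), summed over every distinct symbol.
    return sum((c / total) * math.log2(total / c) for c in counts.values())

def memory_entropy(addresses, skipped_low_bits=0):
    """Entropy of the address stream after dropping low-order bits.

    skipped_low_bits=0 uses full addresses; larger values group
    neighboring addresses together (an assumed model of local entropy).
    """
    return shannon_entropy(addr >> skipped_low_bits for addr in addresses)

addresses = [0x1000, 0x1004, 0x1008, 0x100c]
print(memory_entropy(addresses))                      # 2.0: four distinct addresses
print(memory_entropy(addresses, skipped_low_bits=4))  # 0.0: all in one 16-byte block
```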
(These files are generated from mem_analysis.py)
[bench name]_spatial_locality
[bench name]_temporal_locality
[bench name]_stride_profile Used to compute spatial locality
[bench name]_reuse_profile Used to compute temporal locality
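As a rough picture of what a stride profile and a reuse profile contain, the sketch below builds both histograms from a short address stream. It uses the simplest definitions: stride is the absolute difference between consecutive addresses, and reuse distance is the number of distinct addresses touched between two accesses to the same address. These definitions and function names are assumptions for illustration; [Weinberg2005] and `mem_analysis.py` may define and score them differently.

```python
from collections import Counter

def stride_profile(addresses):
    """Histogram of absolute strides between consecutive accesses."""
    return Counter(abs(b - a) for a, b in zip(addresses, addresses[1:]))

def reuse_profile(addresses):
    """Histogram of reuse distances (distinct addresses between reuses).

    Quadratic in the worst case; fine for a sketch, too slow for real traces.
    """
    last_seen = {}
    profile = Counter()
    for i, addr in enumerate(addresses):
        if addr in last_seen:
            window = addresses[last_seen[addr] + 1:i]
            profile[len(set(window))] += 1
        last_seen[addr] = i
    return profile

addresses = [0, 4, 8, 0, 4, 64]
print(dict(stride_profile(addresses)))
print(dict(reuse_profile(addresses)))
```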
(These files are generated from reg_analysis.py)
[bench name]_reg_degree total read / total write
[bench name]_reg_distribution dependency distance distribution
[bench name]_reg_lifetime Distribution of the distance between when a register is created and when it is last used
[bench name]_reg_number The number of registers needed at each cycle (assuming 1 cycle per instruction)
[bench name]_reg_maxn The maximum value in [bench name]_reg_number, which is the minimum number of registers needed to run the program
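The relationship between the per-cycle register number and its maximum can be sketched as follows. The toy trace format and the convention that a register is live from the cycle it is written through the cycle of its last read (inclusive) are assumptions for illustration; `reg_analysis.py` may count liveness differently.

```python
# Toy trace (hypothetical format): (destination register, source registers),
# one dynamic instruction per cycle.
trace = [
    ("r1", []),
    ("r2", []),
    ("r3", ["r1", "r2"]),
    ("r4", ["r3", "r1"]),
    (None, ["r4"]),
]

def live_register_counts(trace):
    """Number of live registers at each cycle (1 instruction per cycle)."""
    created, last_use = {}, {}
    for cycle, (dest, srcs) in enumerate(trace):
        for reg in srcs:
            last_use[reg] = cycle
        if dest is not None:
            created[dest] = cycle
            last_use.setdefault(dest, cycle)  # never-read registers live 1 cycle
    counts = [0] * len(trace)
    for reg, start in created.items():
        for cycle in range(start, last_use[reg] + 1):
            counts[cycle] += 1
    return counts

counts = live_register_counts(trace)
print(counts)       # [1, 2, 3, 3, 1]
print(max(counts))  # 3: the fewest registers that can hold all live values
```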
=================================================================
3. Feedback
Feel free to leave us a message on GitHub if you have any questions or
comments.
=================================================================
Yu Emma Wang, Sophia Yakun Shao
VLSI-Arch group
Harvard University
July 26, 2014
=================================================================
References:
[Weinberg2005] J. Weinberg, M.O. McCracken, E. Strohmaier, and A. Snavely.
Quantifying Locality in the Memory Access Patterns of HPC Applications, SC, 2005
[Shao2013] Y.S. Shao and D. Brooks.
ISA-Independent Workload Characterization and its Implications for Specialized
Architectures, ISPASS, 2013
[Franklin1992] M. Franklin and G.S. Sohi.
Register Traffic Analysis for Streamlining Inter-Operation Communication in Fine-Grain Parallel Processors,
ACM SIGMICRO Newsletter, Vol. 23, No. 1-2, pp. 236-245, IEEE Computer Society Press, 1992