a. Committed instructions
- specbzip : 100000001
- spechmmer : 10000001
- speclibm : 10000000
- specmcf : 100000000
- specsjeng : 100000000
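The number of executed instructions is not equal to the number of committed instructions because, after execution, the instructions stay in the reorder buffer. The buffer has to check whether each instruction was executed speculatively. A speculative instruction commits its result (committed instructions) only once the speculation is confirmed; instructions on a mispredicted path are squashed and never commit.
execute --> reorder buffer --> check if speculative --> committed instructions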
Non-speculative: an instruction is non-speculative if its execution does not depend on an unresolved prediction (for example, a branch outcome), so it is certain to commit.
b. REPLACEMENTS (L1 DATA CACHE) :
- specbzip : 681759
- spechmmer : 10275
- speclibm : 147631
- specmcf : 55092
- specsjeng : 5262346
c. NUMBER OF ACCESSES (L2 CACHE) :
system.l2.overall_accesses :: total
- specbzip : 683562
- spechmmer : 13327
- speclibm : 149222
- specmcf : 190604
- specsjeng : 5264008
We can estimate the number of L2 cache accesses by adding these two statistics, since every L1 miss is forwarded to the L2 cache:
system.cpu.dcache.overall_mshr_misses::total + system.cpu.icache.overall_mshr_misses::total
(number of overall MSHR misses in the L1 data cache + number of overall MSHR misses in the L1 instruction cache)
MSHR: miss status holding register
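The estimate above can be sketched in Python. This is a minimal sketch that assumes each stats.txt line has the form "name value # description"; the sample numbers are made up for illustration and are not taken from our runs.

```python
DCACHE_MISSES = "system.cpu.dcache.overall_mshr_misses::total"
ICACHE_MISSES = "system.cpu.icache.overall_mshr_misses::total"
L2_ACCESSES = "system.l2.overall_accesses::total"


def read_stats(text, names):
    """Return {name: value} for the requested stat names found in the text."""
    wanted = set(names)
    found = {}
    for line in text.splitlines():
        parts = line.split()
        # Assumed line layout: "<name> <value> # <description>"
        if len(parts) >= 2 and parts[0] in wanted:
            found[parts[0]] = float(parts[1])
    return found


def estimate_l2_accesses(stats):
    """Sum of L1 data-cache and L1 instruction-cache MSHR misses."""
    return stats[DCACHE_MISSES] + stats[ICACHE_MISSES]


# Made-up sample, for illustration only.
sample = """
system.cpu.dcache.overall_mshr_misses::total   1200   # number of overall MSHR misses
system.cpu.icache.overall_mshr_misses::total    300   # number of overall MSHR misses
system.l2.overall_accesses::total              1500   # number of overall accesses
"""

stats = read_stats(sample, [DCACHE_MISSES, ICACHE_MISSES, L2_ACCESSES])
print("estimated:", estimate_l2_accesses(stats), "reported:", stats[L2_ACCESSES])
```

Running the same two functions on a real stats.txt lets the estimate be compared with system.l2.overall_accesses::total.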
I) SIMULATION SECONDS
- specbzip : 0.083664
- spechmmer : 0.006329
- speclibm : 0.017289
- specmcf : 0.058458
- specsjeng : 0.518333
II) CPI
- specbzip : 1.673271
- spechmmer : 1.265776
- speclibm : 3.457857
- specmcf : 1.16916
- specsjeng : 10.276660
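The CPI is tied to the simulated time and the committed instruction count. A minimal sketch, assuming the default 2 GHz CPU clock (500 ps period) and using the specbzip and specmcf numbers listed above; small differences from the reported CPI come from sim_seconds being rounded:

```python
CPU_CLOCK_HZ = 2e9  # assumption: default CPU clock, 500 ps period


def cpi(sim_seconds, committed_insts, clock_hz=CPU_CLOCK_HZ):
    """Cycles per instruction = (simulated time * clock frequency) / instructions."""
    cycles = sim_seconds * clock_hz
    return cycles / committed_insts


specbzip_cpi = cpi(0.083664, 100000001)
specmcf_cpi = cpi(0.058458, 100000000)
print(round(specbzip_cpi, 4), round(specmcf_cpi, 4))
```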
III) MISS RATES
- L1 ICACHE
- specbzip : 0.000076
- spechmmer : 0.001121
- speclibm : 0.000928
- specmcf : 0.004844
- specsjeng : 0.000015
- L1 DCACHE
- specbzip : 0.014311
- spechmmer : 0.003436
- speclibm : 0.060934
- specmcf : 0.002124
- specsjeng : 0.121831
- L2 CACHE
- specbzip : 0.295248
- spechmmer : 0.348466
- speclibm : 0.999430
- specmcf : 0.209015
- specsjeng : 0.999978
Conclusion:
- After taking the three charts into consideration, we see no clear proportionality between the CPI and the simulation seconds across the benchmarks. Additionally, there is no obvious relationship between the CPI and the miss rates of the L1 icache and dcache, because these miss rates are generally small. In contrast, we observe that the CPI is strongly affected by the miss rate of the L2 cache. An L2 miss has a larger miss penalty than an L1 miss, which explains why it affects the CPI more.
- We can also observe that the specsjeng benchmark has much larger values than the other benchmarks in almost all metrics.
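One way to see the combined effect of the two cache levels is to multiply the L1 dcache miss rate by the L2 miss rate, which approximates the fraction of data accesses that go all the way to main memory. This is only a rough sketch: it assumes the reported L2 miss rate applies equally to data misses, although L2 also serves instruction misses.

```python
# (L1 dcache miss rate, L2 miss rate, CPI) as listed above
results = {
    "specbzip": (0.014311, 0.295248, 1.673271),
    "spechmmer": (0.003436, 0.348466, 1.265776),
    "speclibm": (0.060934, 0.999430, 3.457857),
    "specmcf": (0.002124, 0.209015, 1.16916),
    "specsjeng": (0.121831, 0.999978, 10.276660),
}


def memory_miss_fraction(l1d_miss_rate, l2_miss_rate):
    """Approximate fraction of data accesses that miss in both L1D and L2."""
    return l1d_miss_rate * l2_miss_rate


# Rank benchmarks from most to fewest accesses reaching main memory.
ranked = sorted(
    results,
    key=lambda name: memory_miss_fraction(results[name][0], results[name][1]),
    reverse=True,
)
for name in ranked:
    l1d, l2, cpi_value = results[name]
    print(f"{name:10s} {memory_miss_fraction(l1d, l2):.6f}  CPI={cpi_value}")
```

With these numbers the ranking by this product is the same as the ranking by CPI.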
Default configuration: system.clk_domain.clock = 1000 ticks and system.cpu_clk_domain.clock = 500 ticks (1 tick = 1 ps), so the CPU frequency is 1/500 ps = 1/0.5 ns = 2 GHz.
- specbzip : system clock 1000, CPU clock 500
- spechmmer : system clock 1000, CPU clock 500
- speclibm : system clock 1000, CPU clock 500
- specmcf : system clock 1000, CPU clock 500
- specsjeng : system clock 1000, CPU clock 500
1.5 GHz configuration: system.clk_domain.clock = 1000 ticks and system.cpu_clk_domain.clock = 667 ticks (1 tick = 1 ps), so the CPU frequency is 1/667 ps = 1/0.667 ns = 1.5 GHz.
- specbzip : system clock 1000, CPU clock 667
- spechmmer : system clock 1000, CPU clock 667
- speclibm : system clock 1000, CPU clock 667
- specmcf : system clock 1000, CPU clock 667
- specsjeng : system clock 1000, CPU clock 667
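The conversion from clock period to frequency used above can be written as a short helper. This is a sketch that assumes one tick equals one picosecond, as in the values listed:

```python
def period_ps_to_ghz(period_ps):
    """Convert a clock period in picoseconds to a frequency in GHz."""
    # 1 / (period_ps * 1e-12 s) Hz = 1000 / period_ps GHz
    return 1000.0 / period_ps


for period in (1000, 500, 667):
    print(period, "ps ->", round(period_ps_to_ghz(period), 3), "GHz")
```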
The CPU clock is affected by the frequency change and is set to 1.5 GHz, while the system clock remains the same. The CPU clock domain is the CPU core's clock, while the system clock domain is responsible for synchronizing the rest of the system.
The config.json file shows this difference in the following excerpt:
"cpu_clk_domain": { "type": "SrcClockDomain", "cxx_class": "SrcClockDomain", "name": "cpu_clk_domain", "path": "system.cpu_clk_domain", "clock": [ 667 ]
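If we have N CPUs, then the instructions will be executed in time X.
If we have N+1 CPUs and the work is divided evenly among them, then the instructions will ideally be executed in time (X*N)/(N+1).
So, if we use 2 CPUs, the estimated time per cycle of work will be 667/2 ps, which is equivalent to the throughput of a single 3 GHz CPU. Each individual CPU is still clocked by cpu_clk_domain, which is 1.5 GHz here.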
SCALING
Due to the CPU frequency change from 2 GHz to 1.5 GHz (1.5 = 3/4 x 2), the simulation seconds should ideally become 4/3 x B (B: default sim_seconds). We define N = new sim_seconds / default sim_seconds.
- specbzip : 0.109329 , N=1.307
- spechmmer : 0.008370 , N=1.322
- speclibm : 0.020359, N=1.177
- specmcf : 0.077242 , N=1.321
- specsjeng : 0.582132 , N=1.12
SCALING ORDER spechmmer > specmcf > specbzip > speclibm > specsjeng
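The N values and the scaling order can be recomputed from the two sets of sim_seconds listed above. A short sketch:

```python
default_seconds = {
    "specbzip": 0.083664,
    "spechmmer": 0.006329,
    "speclibm": 0.017289,
    "specmcf": 0.058458,
    "specsjeng": 0.518333,
}
slow_seconds = {  # runs with the CPU clock at 1.5 GHz
    "specbzip": 0.109329,
    "spechmmer": 0.008370,
    "speclibm": 0.020359,
    "specmcf": 0.077242,
    "specsjeng": 0.582132,
}

# N = new sim_seconds / default sim_seconds; ideal scaling would give 4/3.
ratios = {name: slow_seconds[name] / default_seconds[name] for name in default_seconds}
order = sorted(ratios, key=ratios.get, reverse=True)

for name in order:
    print(f"{name:10s} N={ratios[name]:.3f}")
```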
Why is the scaling not perfect?
The measured N values fall short of 4/3. The pipeline has stalls, for example while waiting on main memory, whose duration does not depend on the CPU clock, so only part of the simulated time is stretched by the lower frequency. Some benchmarks are therefore more affected by the frequency change than others: speclibm and specsjeng, which miss in the L2 cache almost every time, scale the least.
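As a rough illustration, assume the simulated time splits into a part that scales with the CPU clock and a part that is fixed in absolute time (for example main-memory latency). Under that assumption N = s x 4/3 + (1 - s), where s is the clock-dependent share of the default run, so s can be estimated from N. This two-part model is our own simplification and is not something gem5 reports.

```python
IDEAL_RATIO = 4 / 3  # 2 GHz -> 1.5 GHz


def clock_dependent_share(n, ideal=IDEAL_RATIO):
    """Solve N = s * ideal + (1 - s) for s (assumed two-part time model)."""
    return (n - 1.0) / (ideal - 1.0)


measured_n = {
    "specbzip": 1.307,
    "spechmmer": 1.322,
    "speclibm": 1.177,
    "specmcf": 1.321,
    "specsjeng": 1.12,
}
shares = {name: clock_dependent_share(n) for name, n in measured_n.items()}
for name, share in shares.items():
    print(f"{name:10s} clock-dependent share ~ {share:.2f}")
```

Under this model a benchmark with perfect scaling would have a share of 1, and a lower share means more of the run time is spent waiting on something that the CPU clock does not speed up.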
How can we achieve a better (smaller) CPI value by changing:
- Associativity
- Block size
- Size of the L1 instruction cache
- Size of the L1 data cache
- Size of the L2 cache
- Cache line size?
Multiple tests were run in order to figure out how these parameters affect the CPI. Let's take SPECBZIP as an example.
We tried different L1 dcache sizes and the CPI was:
- default : 1.673271
- 16kB : 1.706060
- 32kB : 1.672933
- 64kB : 1.639115
So we concluded that as the L1 dcache size increases, the CPI becomes smaller. We also tried different L1 icache sizes:
- default : 1.673271
- 16kB : 1.639097
- 64kB : 1.638997
So we concluded that as the L1 icache size increases, the CPI becomes smaller. Then we tried changing the L2 cache size:
- default : 1.673271
- 540kB : 1.702099
- 2MB : 1.638
So we concluded that as the L2 size increases, the CPI becomes smaller.
Different associativities:
- default : 1.673271
- l1i=1 l1d=1 l2=2 : CPI=1.661490
- l1i=1 l1d=1 l2=1 : CPI=1.673903
- l1i=2 l1d=2 l2=4 : CPI=1.6381
So we concluded that as the associativity increases, the CPI becomes smaller.
Different cache line sizes:
- cacheline_size=64 : CPI=1.61
- cacheline_size=128 : CPI=1.595917
So we found that as we increase the cache line size, the CPI becomes smaller.
Finally, we used the maximum values so as to achieve the smallest CPI:
L1i cache=256kB L1d cache=256kB L2 cache=4MB associativity=2,2,4 cacheline_size=256
With these settings the CPI is 1.539.
We conclude that the larger the caches are, the smaller the CPI value is. The same holds for associativity. This is reasonable, because higher associativity reduces conflict misses, so the miss rates drop and the CPI decreases with them.
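A sweep like the one above can be scripted by generating one gem5 command line per configuration. The sketch below assumes the runs use gem5's se.py example script with its standard cache options (--caches, --l2cache, --l1d_size, --l1i_size, --l2_size, --l1d_assoc, --l1i_assoc, --l2_assoc, --cacheline_size). The gem5 binary path, script path and benchmark path are hypothetical placeholders, and the function only builds the argument list without running anything.

```python
def build_gem5_command(benchmark, l1i_size, l1d_size, l2_size,
                       l1i_assoc, l1d_assoc, l2_assoc, cacheline_size,
                       gem5_bin="./build/ARM/gem5.opt",      # hypothetical path
                       script="configs/example/se.py"):      # hypothetical path
    """Return the argument list for one simulated cache configuration."""
    return [
        gem5_bin, script,
        "--caches", "--l2cache",
        f"--l1i_size={l1i_size}",
        f"--l1d_size={l1d_size}",
        f"--l2_size={l2_size}",
        f"--l1i_assoc={l1i_assoc}",
        f"--l1d_assoc={l1d_assoc}",
        f"--l2_assoc={l2_assoc}",
        f"--cacheline_size={cacheline_size}",
        "-c", benchmark,
    ]


# The largest configuration tried above; the benchmark path is a placeholder.
cmd = build_gem5_command("path/to/specbzip", "256kB", "256kB", "4MB", 2, 2, 4, 256)
print(" ".join(cmd))
```

The resulting list could be passed to subprocess.run once the placeholder paths and the benchmark's own arguments are filled in.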
How does each parameter affect the CPI, the L1 miss rate and the L2 miss rate in every benchmark?
We used the optimal parameters, the ones that produced the best CPU.
(Miss rates of l1 cache are multiplied with )
[Charts of CPI, L1 miss rate and L2 miss rate per parameter for SPECBZIP, SPECHMMER, SPECLIBM, SPECMCF and SPECSJENG]
- SPECBZIP: It was most affected by the L1 dcache size. Specbzip is a compression workload that makes heavy use of memory, which is why we see a big change when we resize the L1 dcache.
- SPECHMMER: It was most affected by the L1 dcache size, for the same reason. Spechmmer is used for sequence analysis, which is why it is strongly affected by the size of the data cache.
- SPECLIBM: It was most affected by the cache line size. It is a fluid-dynamics simulation (Lattice Boltzmann Method) that works through large arrays, so it likely benefits from fetching more data per miss.
- SPECMCF: It was most affected by the cache line size. The metrics we are investigating increase when this parameter is raised.
- SPECSJENG: It was most affected by the cache line size. It is a chess engine, and a large cache line appears to suit it because each miss brings in more neighbouring data.
FINDING A FUNCTION WHICH DESCRIBES THE COST (AFTER TAKING INTO CONSIDERATION THE FACTORS IN PART 2)
Our main purpose is to create a CPU with the smallest possible CPI (best performance) that simultaneously has the smallest possible cost. To achieve that, we will create a function that describes the cost.
Part 2 has shown that the main factor that affects the CPI value is the cache line size. The L1 dcache size also plays a big role in changing the CPI. After that, the L1 icache size slightly changes the CPI (specmcf). Associativity does not seem to play as big a role as the other factors.
Taking into consideration the factors that affect the performance, we can assign multipliers that show the impact of each parameter.
x: cacheline size/ maximum cacheline size
y: l1 dcache size/maximum l1 dcache size
z: l1 icache size/maximum l1 icache size
w: l2 cache size/maximum l2 cache size
q: associativity/maximum associativity
- cacheline = 5x
- l1 dcache = 3y
- l1 icache = 1.5z
- l2 cache = 1w
- associativity = 1q
COST function:
The larger a memory is, the more it costs. The L1 cache also has to be considerably smaller than the L2 cache in order to be fast.
So, if we assign multipliers in terms of cost:
- cacheline = 5x
- l1 dcache = 5y
- l1 icache = 5z
- l2 cache = 3w
- associativity = 1q
COST = 5x + 5y + 5z + 3w + 1q
The cost has to be as low as possible and the performance as high as possible.
EFFECTIVENESS = PERFORMANCE/COST. We want this fraction to be as large as possible (small cost and high performance).
EFFECTIVENESS = (5x + 3y + 1.5z + 1w + 1q) / (5x + 5y + 5z + 3w + 1q)
Some of the values that we tried are:
- cacheline size = 16, l1 dcache size = 32 kB, l1 icache size = 16 kB, l2 cache size = 1 MB, associativity = 1 : EFFECTIVENESS = 0.56944
- cacheline size = 32, l1 dcache size = 64 kB, l1 icache size = 32 kB, l2 cache size = 2 MB, associativity = 2 : EFFECTIVENESS = 0.788461
- cacheline size = 32, l1 dcache size = 256 kB, l1 icache size = 32 kB, l2 cache size = 2 MB, associativity = 4 : EFFECTIVENESS = 0.607142
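The EFFECTIVENESS calculation can be sketched in Python by reading the first set of multipliers as a weighted performance score and dividing it by COST. The normalisation maxima are an assumption: we take the largest values used in Part 2 (256-byte cache line, 256 kB L1 caches, 4 MB L2 cache, associativity 4). With these assumed maxima the sketch reproduces the first and third values listed above; the maxima actually used for the reported figures are not stated, so other entries may differ.

```python
# Assumed maxima, taken from the largest values tried in Part 2.
MAXIMA = {"cacheline": 256, "l1d_kb": 256, "l1i_kb": 256, "l2_kb": 4096, "assoc": 4}
PERFORMANCE_WEIGHTS = {"cacheline": 5, "l1d_kb": 3, "l1i_kb": 1.5, "l2_kb": 1, "assoc": 1}
COST_WEIGHTS = {"cacheline": 5, "l1d_kb": 5, "l1i_kb": 5, "l2_kb": 3, "assoc": 1}


def effectiveness(config):
    """PERFORMANCE / COST with every parameter normalised by its maximum."""
    normalised = {key: config[key] / MAXIMA[key] for key in MAXIMA}
    performance = sum(PERFORMANCE_WEIGHTS[k] * normalised[k] for k in MAXIMA)
    cost = sum(COST_WEIGHTS[k] * normalised[k] for k in MAXIMA)
    return performance / cost


first = {"cacheline": 16, "l1d_kb": 32, "l1i_kb": 16, "l2_kb": 1024, "assoc": 1}
third = {"cacheline": 32, "l1d_kb": 256, "l1i_kb": 32, "l2_kb": 2048, "assoc": 4}
print(round(effectiveness(first), 5), round(effectiveness(third), 5))
```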
According to the formula, the most efficient CPU comes from a small L1 icache such as 32 kB or 64 kB, a small associativity such as 2, a medium-sized L2 cache such as 2 MB and a large L1 dcache such as 128 kB.
OPTIMAL CACHE CONFIGURATION
After taking into consideration the diagrams in Part 2, we realised that each parameter affects the benchmarks differently.
SPECBZIP's CPI shows a big decline when we raise the L1 dcache size. We saw in the EFFECTIVENESS function that the L1 dcache has a big cost. So the optimum size could be 128 kB, which is not the biggest size but still affects the CPI in our favour. The cache line size also plays a big role in both performance and cost. We could use a 64 B cache line, which is neither big nor small. The L2 size affects EFFECTIVENESS less and could be 1 MB. Large associativities complicate the system, so 2 could be optimal.
SPECHMMER shows a similar behavior to specbzip, so the same parameters could be ideal.
SPECLIBM's CPI shows a big decline when we raise the cache line size. This factor also has a high cost, so we do not want to raise it a lot; 64 B could be ideal. The other factors do not affect the CPI as much, so small values could be ideal in order to achieve a small cost. These could be a 64 kB L1 icache, a 64 kB L1 dcache, a 2 MB L2 cache and an associativity of 2.
SPECMCF, on the other hand, shows an increase in CPI as the cache line size rises, so 32 B could be ideal. As the L1 icache size increases, the CPI decreases, so 64 kB could be fine.
SPECSJENG is affected by the cache line size, so a medium size like 64 B could be fine, since larger lines have a big cost.
COMMENTS
Assignment 2 mainly focused on understanding how gem5 behaves on different benchmarks. The only problem was that it took a lot of time to gather all the results because of the long execution times. We learned to search for information more effectively and to compare it with real simulation statistics. The assistants were very helpful as well.