Assignment2

Part 1

Question 1

a. Committed instructions

specbzip : 100000001
spechmmer : 10000001
speclibm : 10000000
specmcf : 100000000
specsjeng : 100000000

The number of executed instructions is not equal to the number of committed instructions because, after execution, an instruction's result waits in the reorder buffer. The buffer has to check whether the instruction was executed speculatively. A speculative instruction commits only if the speculation (for example, a branch prediction) proves correct; otherwise it is squashed and is never counted as committed.

execute --> reorder buffer --> check speculation --> committed instructions

Non-speculative: an instruction is non-speculative when its execution does not depend on an unresolved prediction, so it is guaranteed to commit.

b. REPLACEMENTS (L1 DATA CACHE) :

specbzip : 681759
spechmmer : 10275
speclibm : 147631
specmcf : 55092
specsjeng : 5262346

c. NUMBER OF ACCESSES (L2 CACHE) :

system.l2.overall_accesses :: total

specbzip : 683562
spechmmer : 13327
speclibm : 149222
specmcf : 190604
specsjeng : 5264008

We can estimate the number of L2 cache accesses by adding these two metrics:

system.cpu.dcache.overall_mshr_misses::total + system.cpu.icache.overall_mshr_misses::total

that is, the overall MSHR misses of the L1 data cache plus the overall MSHR misses of the L1 instruction cache, since L1 misses are what reach the L2 cache.

MSHR: miss status holding register
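A minimal sketch of this estimate, assuming the usual `name value # description` layout of gem5's stats.txt. The sample text and its numbers are invented for illustration; only the two statistic names come from the text above.

```python
def parse_stats(text):
    """Parse gem5-style 'name value # comment' lines into a dict of floats."""
    stats = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue
        try:
            stats[fields[0]] = float(fields[1])
        except ValueError:
            continue  # skip banner lines and non-numeric entries
    return stats


def estimate_l2_accesses(stats):
    """Sum the L1 data cache and L1 instruction cache MSHR misses."""
    return (stats["system.cpu.dcache.overall_mshr_misses::total"]
            + stats["system.cpu.icache.overall_mshr_misses::total"])


# Hypothetical excerpt; the values are placeholders, not measured results.
SAMPLE = """
---------- Begin Simulation Statistics ----------
system.cpu.dcache.overall_mshr_misses::total     1200   # number of overall MSHR misses
system.cpu.icache.overall_mshr_misses::total      300   # number of overall MSHR misses
"""

estimate = estimate_l2_accesses(parse_stats(SAMPLE))
print(int(estimate))
```

The estimate can then be compared against system.l2.overall_accesses::total from the same file.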

Question 2

I) SIMULATION SECONDS

specbzip : 0.083664
spechmmer : 0.006329
speclibm : 0.017289
specmcf : 0.058458
specsjeng : 0.518333

II) CPI

specbzip : 1.673271
spechmmer : 1.265776
speclibm : 3.457857
specmcf : 1.16916
specsjeng : 10.276660

III) MISS RATES

L1 ICACHE

specbzip : 0.000076
spechmmer : 0.001121
speclibm : 0.000928
specmcf : 0.004844
specsjeng : 0.000015

L1 DCACHE

specbzip : 0.014311
spechmmer : 0.003436
speclibm : 0.060934
specmcf : 0.002124
specsjeng : 0.121831

L2 CACHE

specbzip : 0.295248
spechmmer : 0.348466
speclibm : 0.999430
specmcf : 0.209015
specsjeng : 0.999978

Conclusion:

After taking into consideration the three charts, we see no clear correlation between the CPI and the simulation seconds across the benchmarks. Additionally, there is no obvious correlation between the CPI and the miss rates of the L1 icache and dcache, because those miss rates are very small. On the contrary, we observe that the CPI is highly affected by the miss rate of the L2 cache. An L2 miss has a bigger miss penalty, which explains why it affects the CPI more than an L1 miss.
We can also observe that the specsjeng benchmark has by far the highest CPI, simulation time, L1 dcache miss rate and L2 miss rate of all the benchmarks; only its L1 icache miss rate is low.
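CPI and simulated time are linked through the instruction count: sim_seconds = instructions x CPI / frequency. The sketch below uses the values reported above and assumes the default 2 GHz CPU clock to recover the approximate instruction count of each run. It is only a consistency check, not part of the original measurements.

```python
CPU_FREQ_HZ = 2e9  # assumed default CPU clock (500 ps period)

# (sim_seconds, CPI) as reported above
results = {
    "specbzip":  (0.083664, 1.673271),
    "spechmmer": (0.006329, 1.265776),
    "speclibm":  (0.017289, 3.457857),
    "specmcf":   (0.058458, 1.16916),
    "specsjeng": (0.518333, 10.276660),
}


def implied_instructions(sim_seconds, cpi, freq_hz=CPU_FREQ_HZ):
    """Instructions implied by the run time, the clock and the CPI."""
    cycles = sim_seconds * freq_hz
    return cycles / cpi


for name, (secs, cpi) in results.items():
    print(f"{name:10s} cycles={secs * CPU_FREQ_HZ:.3e} "
          f"instructions~{implied_instructions(secs, cpi):.3e}")
```

Runs with different instruction counts can have similar CPI but very different simulation times, which is one reason the two metrics do not line up across benchmarks.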

Question 3

Default configuration: system.cpu_clk_domain.clock = 500 ticks (ps), so the CPU frequency is 1/500 ps = 1/0.5 ns = 2 GHz.

benchmark : system.clk_domain.clock ----- system.cpu_clk_domain.clock
specbzip : 1000 ----- 500
spechmmer : 1000 ----- 500
speclibm : 1000 ----- 500
specmcf : 1000 ----- 500
specsjeng : 1000 ----- 500

1.5GHz configuration: system.cpu_clk_domain.clock = 667 ticks (ps), so the CPU frequency is 1/667 ps = 1/0.667 ns ≈ 1.5 GHz.

benchmark : system.clk_domain.clock ----- system.cpu_clk_domain.clock
specbzip : 1000 ----- 667
spechmmer : 1000 ----- 667
speclibm : 1000 ----- 667
specmcf : 1000 ----- 667
specsjeng : 1000 ----- 667

The CPU clock is affected by the frequency change and is set to 1.5 GHz, while the system clock remains the same. The CPU clock domain drives the CPU core, while the system clock domain synchronizes the rest of the system.
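The conversion from a clock period in ticks to a frequency can be sketched as follows. It assumes gem5's default resolution of 1 tick = 1 ps; the helper name is ours.

```python
TICKS_PER_SECOND = 1e12  # assumption: 1 tick = 1 ps


def ticks_to_ghz(period_ticks):
    """Convert a clock period in ticks to a frequency in GHz."""
    return TICKS_PER_SECOND / period_ticks / 1e9


clocks = [
    ("system clock", 1000),
    ("CPU clock, default run", 500),
    ("CPU clock, 1.5GHz run", 667),
]
for label, period in clocks:
    print(f"{label}: {period} ticks -> {ticks_to_ghz(period):.3f} GHz")
```

A period of 667 ticks is the closest whole number of picoseconds to 1.5 GHz, so the resulting frequency is only approximately 1.5 GHz.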

The config.json file shows this difference in the following excerpt.

"cpu_clk_domain": { "type": "SrcClockDomain", "cxx_class": "SrcClockDomain", "name": "cpu_clk_domain", "path": "system.cpu_clk_domain", "clock": [ 667 ]

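If N CPUs execute the instructions in time X, then N+1 CPUs would ideally execute them in (X*N)/(N+1) time, assuming the work is split perfectly among them.

So, with 2 CPUs the estimated effective time per cycle of work would be 667/2 ps, which corresponds to the throughput of a single 3 GHz CPU. The clock of each CPU itself would presumably not change, since an added CPU would use the same CPU clock domain.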

SCALING

The CPU frequency changes from 2 GHz to 1.5 GHz (1.5 = 3/4 x 2), so with perfect scaling the simulation seconds would be 4/3 x B (B: default sim_seconds). We define N = new sim_seconds / default sim_seconds.

specbzip : 0.109329 , N=1.307
spechmmer : 0.008370 , N=1.322
speclibm : 0.020359 , N=1.178
specmcf : 0.077242 , N=1.321
specsjeng : 0.582132 , N=1.12

SCALING ORDER spechmmer > specmcf > specbzip > speclibm > specsjeng

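Why is the scaling not perfect?

The measured N values fall short of 4/3. Only the CPU clock was changed; the system clock stays at 1000 ps, so the time spent waiting on main memory does not stretch with the CPU clock period. Benchmarks that stall on memory more often are therefore less affected by the frequency change: speclibm and specsjeng, whose L2 miss rates are close to 1, scale the least, while spechmmer, specmcf and specbzip come close to 4/3.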
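One way to quantify this is a simple two-component model, which is our assumption rather than something gem5 reports: T(f) = cpu_cycles / f + t_fixed, where t_fixed is the part of the run time that does not depend on the CPU clock. Under that model, N = 4/3 - fraction / 3, where fraction is the share of fixed time in the 2 GHz run. The sketch below recomputes N and the implied fraction from the values above.

```python
# sim_seconds at 2 GHz and at 1.5 GHz, as reported above
runs = {
    "specbzip":  (0.083664, 0.109329),
    "spechmmer": (0.006329, 0.008370),
    "speclibm":  (0.017289, 0.020359),
    "specmcf":   (0.058458, 0.077242),
    "specsjeng": (0.518333, 0.582132),
}

IDEAL = 2.0 / 1.5  # perfect scaling factor, 4/3


def scaling_factor(t_default, t_slow):
    """N = new sim_seconds / default sim_seconds."""
    return t_slow / t_default


def fixed_time_fraction(n, ideal=IDEAL):
    """Share of the 2 GHz run time that does not scale with the CPU clock.

    Assumed model: T(f) = cpu_cycles / f + t_fixed.
    """
    return (ideal - n) / (ideal - 1.0)


ranking = sorted(runs, key=lambda b: scaling_factor(*runs[b]), reverse=True)
for bench in ranking:
    n = scaling_factor(*runs[bench])
    print(f"{bench:10s} N={n:.3f} fixed fraction~{fixed_time_fraction(n):.2f}")
```

A fraction near 0 means the run is almost entirely bound by the CPU clock; a larger fraction points to time spent outside the CPU clock domain.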

Part 2

Question 1

How can we achieve a smaller CPI value by changing:

Associativity
Block size
Size allocation for L1 instruction cache
L1 data cache
L2 cache
Cache line size?

We ran multiple tests to figure out how these parameters affect the CPI. Let's take SPECBZIP as an example:

We tried different sizes in l1_dcache and the CPI is:

default: 1.673271

16kB: 1.706060

32kB: 1.672933

64kB: 1.639115

So we concluded that as the size increases, the CPI becomes smaller. We also tried different sizes in l1_icache:

default : 1.673271

16kB : 1.639097

64kB : 1.638997

So we concluded that as the size increases, the CPI becomes smaller. Then we tried changing the l2 cache.

default : 1.673271

540kB : 1.702099

2MB : 1.638

So we concluded that as the l2 size increases, the CPI becomes smaller.

Different associativities:

default : 1.673271

l1i=1 l1d=1 l2=2 : CPI=1.661490

l1i=1 l1d=1 l2=1 : CPI=1.673903

l1i=2 l1d=2 l2=4 : CPI=1.6381

So we concluded that as the associativity increases, the CPI becomes smaller.

Different cacheline size:

cacheline_size=64 : CPI=1.61

cacheline_size=128 : CPI=1.595917

So we realized that as we increase the cacheline size, the CPI becomes smaller.

Finally, we used the maximum values to achieve the smallest CPI.

L1i cache=256kB L1d cache=256kB L2 cache=4MB associativity=2,2,4 cacheline_size=256

The resulting CPI is 1.539.
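For reference, a configuration like this one maps onto the cache options of gem5's example se.py script roughly as sketched below. The option names are the ones we believe se.py accepts and should be checked against the gem5 version in use; the helper function itself is hypothetical.

```python
def cache_flags(l1i_size, l1d_size, l2_size,
                l1i_assoc, l1d_assoc, l2_assoc, cacheline_size):
    """Build the cache-related command-line options for one configuration."""
    return [
        "--caches",
        "--l2cache",
        f"--l1i_size={l1i_size}",
        f"--l1d_size={l1d_size}",
        f"--l2_size={l2_size}",
        f"--l1i_assoc={l1i_assoc}",
        f"--l1d_assoc={l1d_assoc}",
        f"--l2_assoc={l2_assoc}",
        f"--cacheline_size={cacheline_size}",
    ]


# The maximum configuration listed above
flags = cache_flags("256kB", "256kB", "4MB", 2, 2, 4, 256)
print(" ".join(flags))
```

The resulting list can be appended to the gem5 binary, script path and benchmark arguments, which are omitted here because they depend on the local setup.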

We understand that the larger the caches are, the smaller the CPI value is. The same happens with associativity. This is logical: as we increase associativity, the miss rates become smaller, which in turn lowers the CPI.
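To compare the parameters on a common scale, the best SPECBZIP CPI from each list above can be expressed as a percentage improvement over the default. The sketch treats each list as an independent sweep, which is an assumption; if the runs were cumulative, the percentages overlap.

```python
DEFAULT_CPI = 1.673271

# Lowest CPI reported above for each parameter (SPECBZIP)
best_cpi = {
    "l1d size (64kB)": 1.639115,
    "l1i size (64kB)": 1.638997,
    "l2 size (2MB)": 1.638,
    "associativity (2,2,4)": 1.6381,
    "cacheline size (128)": 1.595917,
}


def improvement_percent(cpi, baseline=DEFAULT_CPI):
    """Relative CPI reduction with respect to the baseline, in percent."""
    return 100.0 * (baseline - cpi) / baseline


for name, cpi in sorted(best_cpi.items(), key=lambda kv: kv[1]):
    print(f"{name:24s} CPI={cpi:.6f} improvement={improvement_percent(cpi):.2f}%")
```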

Question 2

How does each parameter affect the CPI, l1 miss rate and l2 miss rate of each benchmark?

We took the optimal parameters that created the best CPU.

(Miss rates of l1 cache are multiplied with )

[Charts of CPI, l1 miss rate and l2 miss rate for SPECBZIP, SPECHMMER, SPECLIBM, SPECMCF and SPECSJENG]

SPECBZIP: It was most affected by the l1 dcache size change. Specbzip is a compression workload that depends heavily on memory, which is why we see a big change when we resize the l1 dcache.
SPECHMMER: It was most affected by the l1 dcache size change for the same reason. Spechmmer is used for sequence analysis, so it is highly affected by the data cache size.
SPECLIBM: It was most affected by the cacheline size change. It is a fluid dynamics simulation (Lattice Boltzmann Method) that streams through large arrays, so it needs a lot of memory.
SPECMCF: It was most affected by the cacheline size change. The metrics we are investigating increase when this parameter is raised.
SPECSJENG: It was most affected by the cacheline size change. It is a chess engine. A big cacheline size suits it because it needs fast access to a large amount of memory.

Part 3

FINDING A FUNCTION WHICH DESCRIBES THE COST (AFTER TAKING INTO CONSIDERATION THE FACTORS IN PART 2)

Our main purpose is to create a CPU with the smallest possible CPI (best performance) that simultaneously has the smallest possible cost. To achieve that, we will create a function that describes the cost.

Part 2 has shown that the main factor affecting the CPI value is the cacheline size. The L1 dcache size also plays a big role in changing the CPI. After that, the l1i size slightly changes the CPI (specmcf). Associativities don't seem to matter as much as the other factors.

Taking into consideration the factors that affect the performance, we can assign multipliers that show the impact of each parameter.

x: cacheline size/ maximum cacheline size

y: l1 dcache size/maximum l1 dcache size

z: l1 icache size/maximum l1 icache size

w: l2 cache size/maximum l2 cache size

q: associativity/maximum associativity

cacheline=5 x
l1 dcache=3 y
l1 icache=1.5 z
l2 cache=1 w
associativity=1 q

PERFORMANCE = 5x + 3y + 1.5z + 1w + 1q

COST function:

The bigger the memory is, the larger the cost. The L1 cache has to be considerably smaller than the l2 cache in order to be faster.

So, if we put multipliers on the cost terms:

cacheline=5 x
l1 dcache=5 y
l1 icache=5 z
l2 cache=3 w
associativity=1 q

COST = 5x + 5y + 5z + 3w + 1q

The cost has to be as low as possible and the performance has to be as high as possible.

EFFECTIVENESS = PERFORMANCE / COST. We want this fraction to be as big as possible (small cost and high performance).

EFFECTIVENESS = (5x + 3y + 1.5z + 1w + 1q) / (5x + 5y + 5z + 3w + 1q)
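The function can be evaluated with a short script. The multipliers come from the PERFORMANCE and COST functions above; the maxima used for normalization are an assumption, borrowed from the largest configuration tried in Part 2 (256-byte cacheline, 256kB L1 caches, 4MB L2, associativity 4), since they are not stated explicitly here.

```python
# Multipliers from the PERFORMANCE and COST functions
PERF_WEIGHTS = {"cacheline": 5.0, "l1d": 3.0, "l1i": 1.5, "l2": 1.0, "assoc": 1.0}
COST_WEIGHTS = {"cacheline": 5.0, "l1d": 5.0, "l1i": 5.0, "l2": 3.0, "assoc": 1.0}

# Assumed maxima: cacheline in bytes, cache sizes in kB
MAXIMA = {"cacheline": 256, "l1d": 256, "l1i": 256, "l2": 4096, "assoc": 4}


def weighted_sum(config, weights):
    """Sum of weight * (value / maximum) over all parameters."""
    return sum(weights[k] * config[k] / MAXIMA[k] for k in weights)


def effectiveness(config):
    """PERFORMANCE / COST for one cache configuration."""
    return weighted_sum(config, PERF_WEIGHTS) / weighted_sum(config, COST_WEIGHTS)


# Example: 16-byte cacheline, 32kB l1d, 16kB l1i, 1MB l2, associativity 1
example = {"cacheline": 16, "l1d": 32, "l1i": 16, "l2": 1024, "assoc": 1}
print(round(effectiveness(example), 5))
```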

Some of the values that we tried are:

cacheline size = 16

l1 dcache size = 32 kB

l1 icache size = 16 kB

l2 cache size = 1 MB

associativity = 1

EFFECTIVENESS = 0.56944

cacheline size = 32

l1 dcache size = 64 kB

l1 icache size = 32 kB

l2 cache size = 2 MB

associativity = 2

EFFECTIVENESS = 0.788461

cacheline size = 32

l1 dcache size = 256 kB

l1 icache size = 32 kB

l2 cache size = 2 MB

associativity = 4

EFFECTIVENESS = 0.607142

According to the formula, the most efficient CPU comes from a small l1i cache like 32kB or 64kB, small associativities like 2, a medium l2 cache like 2MB and a big l1d cache like 128kB.

OPTIMAL CACHE CONFIGURATION

After taking into consideration the diagrams in Part 2, we realised that each parameter affects the benchmarks differently.

SPECBZIP's CPI shows a big decline when we raise the l1 dcache size. We saw in the EFFECTIVENESS function that the l1 dcache has a big cost. So the optimum size could be 128kB, which is not the biggest size but still lowers the CPI. Cacheline size also plays a big role in both performance and cost. We could use a 64-byte cacheline, which is neither big nor small. L2 size affects EFFECTIVENESS to a smaller degree and could be 1MB. Big associativities complicate the system, so 2 could be optimal.

SPECHMMER shows a similar behavior to specbzip, so the same parameters could be ideal.

SPECLIBM's CPI shows a big decline when we raise the cacheline size. This factor has a high cost too, so we don't want to raise it a lot; 64 bytes could be ideal. The other factors don't affect the CPI as much, so small values could be ideal in order to achieve a small cost. These could be 64kB l1 icache, 64kB l1 dcache, 2MB l2 cache and associativity 2.

SPECMCF, on the other hand, shows an increase in CPI as the cacheline size rises, so 32 bytes could be ideal. As the L1 icache size increases, the CPI decreases, so 64kB could be fine.

SPECSJENG is affected by the cacheline size, so a medium size like 64 bytes could be fine (it has a big cost).

COMMENTS

Assignment 2 mainly focused on understanding how gem5 behaves with different benchmarks. The only problem was that it took a lot of time to gather all the results due to the long execution time. We learned to search for information more effectively and to compare it with real simulation statistics. The assistants were very helpful as well.

