Homework 9 Solutions
H&P 8.3
First, calculate the number of memory accesses that are reads, since
this value is the same for both parts of the question.
90% of all accesses hit in the cache, which means that 10% of all
memory accesses are cache misses; each miss causes a read of
2 words. Cache references occur at a rate of 10,000,000 per second.
10,000,000 * 10% * 2 = 2,000,000 words/second for reads
H&P 8.3(a)
In the write-through scheme, every cache write causes a memory write.
25% of all references are writes, therefore:
10,000,000 * 25% = 2,500,000 words/second for writes
Therefore, (2,000,000 + 2,500,000)/10,000,000 = 45% bandwidth
H&P 8.3(b)
In the write-back scheme, a write occurs whenever a dirty block is
replaced in the cache. Blocks are replaced in the cache on every
miss, and 30% are dirty. Therefore:
misses * 30% * 2 = number of dirty writes
(10,000,000 * 10%) * 30% * 2 = 600,000 words/second for dirty writes
Therefore, (2,000,000 + 600,000)/10,000,000 = 26% bandwidth
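The bandwidth arithmetic for both parts can be checked with a short Python sketch (the constant names are my own; the values are the ones given above):

```python
# H&P 8.3: memory bandwidth used by a cache making 10M references/second.
REFS_PER_SEC = 10_000_000  # cache references per second
MISS_RATE = 0.10           # 90% of accesses hit
BLOCK_WORDS = 2            # words transferred per miss
WRITE_FRAC = 0.25          # fraction of references that are writes
DIRTY_FRAC = 0.30          # fraction of replaced blocks that are dirty

# Reads from memory are the same under both write policies.
read_words = REFS_PER_SEC * MISS_RATE * BLOCK_WORDS  # 2,000,000 words/s

# (a) Write-through: every write sends one word to memory.
wt_write_words = REFS_PER_SEC * WRITE_FRAC                   # 2,500,000 words/s
wt_bandwidth = (read_words + wt_write_words) / REFS_PER_SEC  # 0.45

# (b) Write-back: only dirty replaced blocks are written, a block at a time.
wb_write_words = REFS_PER_SEC * MISS_RATE * DIRTY_FRAC * BLOCK_WORDS  # 600,000
wb_bandwidth = (read_words + wb_write_words) / REFS_PER_SEC           # 0.26
```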
H&P 8.4
- %Hit_Data = 96%
- %Hit_Inst = 98%
- %Dirty = 50%
WriteBack
TA_Inst = (%Hit_Inst)(1) + (%Miss)(10)
TA_Inst = (0.98)(1) + (0.02)(10)
TA_Inst = 1.18
TA_Data = (%Hit_Data)(1) + (%Miss)(10) + (%Miss)(%Dirty)(10)
TA_Data = (0.96)(1) + (0.04)(10) + (0.04)(0.50)(10)
TA_Data = 1.56
Stores take one extra cycle in WriteBack:
Stalls/Inst = (TA_Inst - 1) + (TA_Data - 1)(%Load) + (TA_Data - 1 + 1)(%Store)
Stalls/Inst = (1.18 - 1) + (1.56 - 1)(0.18) + (1.56)(0.08)
Stalls/Inst = 0.4056
CPI = IdealCPI + Stalls/Inst
CPI = 1 + 0.4056
CPI = 1.406
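The write-back numbers above can be reproduced with a small Python sketch (variable names are my own):

```python
# H&P 8.4, write-back cache: average access times and CPI.
HIT_INST, HIT_DATA = 0.98, 0.96
MISS_PENALTY = 10      # cycles
DIRTY = 0.50           # fraction of replaced blocks that are dirty
LOADS, STORES = 0.18, 0.08

ta_inst = HIT_INST * 1 + (1 - HIT_INST) * MISS_PENALTY   # 1.18
ta_data = (HIT_DATA * 1 + (1 - HIT_DATA) * MISS_PENALTY
           + (1 - HIT_DATA) * DIRTY * MISS_PENALTY)      # 1.56

# Stores pay one extra cycle under write-back, hence ta_data rather
# than (ta_data - 1) in the store term.
stalls = (ta_inst - 1) + (ta_data - 1) * LOADS + ta_data * STORES  # 0.4056
cpi_write_back = 1 + stalls                                        # 1.4056
```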
WriteThrough
TA_Inst = same as in WriteBack
TA_Inst = 1.18
TA_DataRead = (%Hit_Data)(1) + (%Miss)(10)
TA_DataRead = (0.96)(1) + (0.04)(10)
TA_DataRead = 1.36
TA_DataWrite = 1
Stalls/Inst = (TA_Inst - 1) + (TA_DataRead - 1)(%Load) + (TA_DataWrite - 1)(%Store)
Stalls/Inst = (1.18 - 1) + (1.36 - 1)(0.18) + (0)(0.08)
Stalls/Inst = 0.245
CPI = IdealCPI + Stalls/Inst
CPI = 1 + 0.245
CPI = 1.245
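The write-through case differs only in the data-access terms; a matching sketch (names again my own):

```python
# H&P 8.4, write-through cache: reads can miss, writes cost one cycle.
HIT_INST, HIT_DATA = 0.98, 0.96
MISS_PENALTY = 10
LOADS, STORES = 0.18, 0.08

ta_inst = HIT_INST * 1 + (1 - HIT_INST) * MISS_PENALTY       # 1.18
ta_data_read = HIT_DATA * 1 + (1 - HIT_DATA) * MISS_PENALTY  # 1.36
ta_data_write = 1  # a write completes in one cycle, so it adds no stall

stalls = ((ta_inst - 1) + (ta_data_read - 1) * LOADS
          + (ta_data_write - 1) * STORES)                    # ~0.245
cpi_write_through = 1 + stalls                               # ~1.245
```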
Compare CPIs
CPI_WriteThrough = 1.24
CPI_WriteBack = 1.41
CPI_WriteThrough is n% faster than CPI_WriteBack, where:
n = (CPI_WriteBack - CPI_WriteThrough) / CPI_WriteThrough
n = (1.41 - 1.24)/1.24 = 14%
WriteThrough is therefore 14% faster than WriteBack.
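Using the rounded CPIs from the text, the relative speedup works out as:

```python
# Relative speedup of write-through over write-back (rounded CPIs above).
cpi_wt, cpi_wb = 1.24, 1.41
speedup = (cpi_wb - cpi_wt) / cpi_wt  # ~0.137, i.e. about 14%
```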
H&P 8.5
4-way-set-associative unified cache of 64KB
- The unified cache incurs a 1 cycle penalty for each load and
store instruction.
- From figure 8.12, it has a miss rate of 2.8%
- From appendix C, 18% of instructions are data reads and 8% are
data writes. Of course, 100% of instructions are instruction reads.
- The miss penalty is 12 cycles
a.
The access times are 1 cycle for a hit and 12 cycles
for a miss. Therefore, the average access time is:
(100% - 2.8%) * 1 + 2.8% * 12 = 1.308 cycles
b.
The CPI is the base CPI (1.5) plus the unified cache
penalty (1 cycle for loads and stores) plus the read-miss penalty. That is
as follows:
1.5 + ((18% + 8%) * 1) + ((18% + 100%) * (1.308 - 1)) = 2.12
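Both parts for the unified 64KB cache can be checked with a brief Python sketch (constant names are my own):

```python
# H&P 8.5: 64KB 4-way set-associative unified cache.
MISS_RATE = 0.028
MISS_PENALTY = 12  # cycles
LOADS, STORES, INST_READS = 0.18, 0.08, 1.00
BASE_CPI = 1.5

# (a) Average access time: 1 cycle on a hit, 12 on a miss.
amat = (1 - MISS_RATE) * 1 + MISS_RATE * MISS_PENALTY  # 1.308 cycles

# (b) One extra cycle per load/store (unified-cache penalty),
# plus the miss stalls on instruction and data reads.
cpi = (BASE_CPI + (LOADS + STORES) * 1
       + (LOADS + INST_READS) * (amat - 1))            # ~2.12
```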
Two 2-way-set-associative caches of 32KB each
- From figure 8.12, each cache has a miss rate of 4.1%
- Again, 18% of instructions are data reads and 100% are
instruction reads.
a.
The access times are 1 cycle for a hit and 12
cycles for a miss. Therefore, the average access
time is:
(100% - 4.1%) * 1 + 4.1% * 12 = 1.451 cycles
b.
The CPI is the base CPI (1.5) plus the instruction read-miss
penalty plus the data read-miss penalty. (With separate caches
there is no extra cycle for loads and stores, since instruction
fetches and data accesses no longer contend for a single cache.)
That is as follows:
1.5 + (100% * (1.451 - 1)) + (18% * (1.451 - 1)) = 2.03
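The split-cache arithmetic, as a matching sketch (names my own):

```python
# H&P 8.5: two 32KB 2-way split caches (no load/store penalty).
MISS_RATE = 0.041
MISS_PENALTY = 12  # cycles
LOADS, INST_READS = 0.18, 1.00
BASE_CPI = 1.5

# (a) Average access time.
amat = (1 - MISS_RATE) * 1 + MISS_RATE * MISS_PENALTY  # 1.451 cycles

# (b) Only instruction reads and data reads pay miss stalls.
cpi = BASE_CPI + (INST_READS + LOADS) * (amat - 1)     # ~2.03
```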
Direct mapped unified cache of 128KB
- From figure 8.12, a unified cache of size 128KB has about the
same miss rate as a 2-way cache of size 64KB, which is 3%.
- The clock is 10% faster. (This is only relevant for comparing
the results in part c)
- The miss penalty is 13 cycles.
a.
The access times are 1 cycle for a hit and 13 cycles
for a miss. Therefore, the average access time is:
(100% - 3%) * 1 + 3% * 13 = 1.36 cycles
b.
The CPI is the base CPI (1.5) plus the unified cache penalty (1 for
both loads and stores) plus the read miss
penalty. That is as follows:
1.5 + ((18% + 8%)*(1)) + ((18% + 100%)*(1.36 - 1)) = 2.18
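The same sketch, updated for the 128KB direct-mapped cache (names my own):

```python
# H&P 8.5: 128KB direct-mapped unified cache (13-cycle miss penalty).
MISS_RATE = 0.03
MISS_PENALTY = 13  # cycles
LOADS, STORES, INST_READS = 0.18, 0.08, 1.00
BASE_CPI = 1.5

# (a) Average access time.
amat = (1 - MISS_RATE) * 1 + MISS_RATE * MISS_PENALTY  # 1.36 cycles

# (b) Unified-cache load/store penalty plus read-miss stalls.
cpi = (BASE_CPI + (LOADS + STORES) * 1
       + (LOADS + INST_READS) * (amat - 1))            # ~2.18
```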
c.
In order to compare the CPI of the 128KB direct-mapped cache to the
others, its faster clock rate must be taken into account. N cycles on
this machine take 90% of the time that they would on either of the
other two machines, so we compare 90% of its CPI (2.18 * 90% = 1.96)
to the CPIs of the other machines.
With a normalized CPI of 1.96, the 128KB direct mapped, unified cache gives the
best performance.
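The final comparison, sketched with the rounded CPIs from the text:

```python
# Normalize the 128KB direct-mapped CPI by its 10% faster clock and
# compare against the other two designs (rounded CPIs from the text).
cpi_unified_64k = 2.12
cpi_split_32k = 2.03
cpi_dm_128k = 2.18 * 0.90  # ~1.96 in equivalent cycles

best = min(cpi_unified_64k, cpi_split_32k, cpi_dm_128k)  # the 128KB cache
```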