Homework 9 Solutions
H&P 8.3
First, calculate the number of memory accesses that are reads, since
this value is the same for both parts of the question.
90% of all accesses hit in the cache, which means that 10% of all
memory accesses are cache misses; each miss causes a read of
2 words. Cache references occur at a rate of 10,000,000 per second.
10,000,000 * 10% * 2 = 2,000,000 words/second for reads
H&P 8.3(a)
In the write-through scheme, every cache write causes a memory write.
25% of all references are writes, therefore:
10,000,000 * 25% = 2,500,000 words/second for writes
Therefore, (2,000,000 + 2,500,000)/10,000,000 = 45% bandwidth
H&P 8.3(b)
In the write-back scheme, a write occurs whenever a dirty block is
replaced in the cache. Blocks are replaced in the cache on every
miss, and 30% are dirty. Therefore:
misses * 30% * 2 = number of dirty writes
(10,000,000 * 10%) * 30% * 2 = 600,000 words/second for dirty writes
Therefore, (2,000,000 + 600,000)/10,000,000 = 26% bandwidth
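The bandwidth arithmetic for both parts can be checked with a short Python sketch (the constant names are my own; the values are the ones given above):

```python
# H&P 8.3: memory bandwidth used by a cache making 10M references/second.
REFS_PER_SEC = 10_000_000  # cache references per second
MISS_RATE = 0.10           # 90% of accesses hit
BLOCK_WORDS = 2            # words transferred per miss
WRITE_FRAC = 0.25          # fraction of references that are writes
DIRTY_FRAC = 0.30          # fraction of replaced blocks that are dirty

# Reads from memory are the same under both write policies.
read_words = REFS_PER_SEC * MISS_RATE * BLOCK_WORDS  # 2,000,000 words/s

# (a) Write-through: every write sends one word to memory.
wt_write_words = REFS_PER_SEC * WRITE_FRAC                   # 2,500,000 words/s
wt_bandwidth = (read_words + wt_write_words) / REFS_PER_SEC  # 0.45

# (b) Write-back: only dirty replaced blocks are written, a block at a time.
wb_write_words = REFS_PER_SEC * MISS_RATE * DIRTY_FRAC * BLOCK_WORDS  # 600,000
wb_bandwidth = (read_words + wb_write_words) / REFS_PER_SEC           # 0.26
```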
H&P 8.4
- %Hit_Data = 96%
- %Hit_Inst = 98%
- %Dirty = 50%
WriteBack
TA_Inst = (%Hit_Inst)(1) + (%Miss)(10)
TA_Inst = (0.98)(1) + (0.02)(10)
TA_Inst = 1.18
TA_Data = (%Hit_Data)(1) + (%Miss)(10) + (%Miss)(%Dirty)(10)
TA_Data = (0.96)(1) + (0.04)(10) + (0.04)(0.50)(10)
TA_Data = 1.56
Stores take one extra cycle in WriteBack:
Stalls/Inst = (TA_Inst - 1) + (TA_Data - 1)(%Load) + (TA_Data - 1 + 1)(%Store)
Stalls/Inst = (1.18 - 1) + (1.56 - 1)(0.18) + (1.56)(0.08)
Stalls/Inst = 0.4056
CPI = IdealCPI + Stalls/Inst
CPI = 1 + 0.4056
CPI = 1.406
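The write-back numbers above can be reproduced with a small Python sketch (variable names are my own):

```python
# H&P 8.4, write-back cache: average access times and CPI.
HIT_INST, HIT_DATA = 0.98, 0.96
MISS_PENALTY = 10      # cycles
DIRTY = 0.50           # fraction of replaced blocks that are dirty
LOADS, STORES = 0.18, 0.08

ta_inst = HIT_INST * 1 + (1 - HIT_INST) * MISS_PENALTY   # 1.18
ta_data = (HIT_DATA * 1 + (1 - HIT_DATA) * MISS_PENALTY
           + (1 - HIT_DATA) * DIRTY * MISS_PENALTY)      # 1.56

# Stores pay one extra cycle under write-back, hence ta_data rather
# than (ta_data - 1) in the store term.
stalls = (ta_inst - 1) + (ta_data - 1) * LOADS + ta_data * STORES  # 0.4056
cpi_write_back = 1 + stalls                                        # 1.4056
```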
WriteThrough
TA_Inst = same as in WriteBack
TA_Inst = 1.18
TA_DataRead = (%Hit_Data)(1) + (%Miss)(10)
TA_DataRead = (0.96)(1) + (0.04)(10)
TA_DataRead = 1.36
TA_DataWrite = 1
Stalls/Inst = (TA_Inst - 1) + (TA_DataRead - 1)(%Load) + (TA_DataWrite - 1)(%Store)
Stalls/Inst = (1.18 - 1) + (1.36 - 1)(0.18) + (0)(0.08)
Stalls/Inst = 0.245
CPI = IdealCPI + Stalls/Inst
CPI = 1 + 0.245
CPI = 1.245
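The write-through case differs only in the data-access terms; a matching sketch (names again my own):

```python
# H&P 8.4, write-through cache: reads can miss, writes cost one cycle.
HIT_INST, HIT_DATA = 0.98, 0.96
MISS_PENALTY = 10
LOADS, STORES = 0.18, 0.08

ta_inst = HIT_INST * 1 + (1 - HIT_INST) * MISS_PENALTY       # 1.18
ta_data_read = HIT_DATA * 1 + (1 - HIT_DATA) * MISS_PENALTY  # 1.36
ta_data_write = 1  # a write completes in one cycle, so it adds no stall

stalls = ((ta_inst - 1) + (ta_data_read - 1) * LOADS
          + (ta_data_write - 1) * STORES)                    # ~0.245
cpi_write_through = 1 + stalls                               # ~1.245
```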
Compare CPIs
CPI_WriteThrough = 1.24
CPI_WriteBack = 1.41
CPI_WriteThrough is n% faster than CPI_WriteBack, where:
n = (CPI_WriteBack - CPI_WriteThrough) / CPI_WriteThrough
n = (1.41 - 1.24)/1.24 = 14%
WriteThrough is therefore 14% faster than WriteBack.
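Using the rounded CPIs from the text, the relative speedup works out as:

```python
# Relative speedup of write-through over write-back (rounded CPIs above).
cpi_wt, cpi_wb = 1.24, 1.41
speedup = (cpi_wb - cpi_wt) / cpi_wt  # ~0.137, i.e. about 14%
```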
H&P 8.5
4-way-set-associative unified cache of 64KB
- The unified cache incurs a 1 cycle penalty for each load and
store instruction.
- From figure 8.12, it has a miss rate of 2.8%
- From appendix C, 18% of instructions are data reads and 8% are
data writes. Of course, 100% of instructions are instruction reads.
- The miss penalty is 12 cycles
a.
The access times are 1 cycle for a hit and 12 cycles
for a miss. Therefore, the average access time is:
(100% - 2.8%) * 1 + 2.8% * 12 = 1.308 cycles
b.
The CPI is the base CPI (1.5) plus the unified cache
penalty (1 cycle for loads and stores) plus the read-miss penalty. That is
as follows:
1.5 + ((18% + 8%) * 1) + ((18% + 100%) * (1.308 - 1)) = 2.12
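Both parts for the unified 64KB cache can be checked with a brief Python sketch (constant names are my own):

```python
# H&P 8.5: 64KB 4-way set-associative unified cache.
MISS_RATE = 0.028
MISS_PENALTY = 12  # cycles
LOADS, STORES, INST_READS = 0.18, 0.08, 1.00
BASE_CPI = 1.5

# (a) Average access time: 1 cycle on a hit, 12 on a miss.
amat = (1 - MISS_RATE) * 1 + MISS_RATE * MISS_PENALTY  # 1.308 cycles

# (b) One extra cycle per load/store (unified-cache penalty),
# plus the miss stalls on instruction and data reads.
cpi = (BASE_CPI + (LOADS + STORES) * 1
       + (LOADS + INST_READS) * (amat - 1))            # ~2.12
```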
Two 2-way-set-associative caches of 32KB each
- From figure 8.12, each cache has a miss rate of 4.1%
- Again, 18% of instructions are data reads and 100% are
instruction reads.
a.
The access times are 1 cycle for a hit and 12
cycles for a miss. Therefore, the average access
time is:
(100% - 4.1%) * 1 + 4.1% * 12 = 1.451 cycles
b.
The CPI is the base CPI (1.5) plus the instruction read-miss
penalty plus the data read-miss penalty. (With separate caches
there is no extra cycle for loads and stores, since instruction
fetches and data accesses no longer contend for a single cache.)
That is as follows:
1.5 + (100% * (1.451 - 1)) + (18% * (1.451 - 1)) = 2.03
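The split-cache arithmetic, as a matching sketch (names my own):

```python
# H&P 8.5: two 32KB 2-way split caches (no load/store penalty).
MISS_RATE = 0.041
MISS_PENALTY = 12  # cycles
LOADS, INST_READS = 0.18, 1.00
BASE_CPI = 1.5

# (a) Average access time.
amat = (1 - MISS_RATE) * 1 + MISS_RATE * MISS_PENALTY  # 1.451 cycles

# (b) Only instruction reads and data reads pay miss stalls.
cpi = BASE_CPI + (INST_READS + LOADS) * (amat - 1)     # ~2.03
```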
Direct mapped unified cache of 128KB
- From figure 8.12, a unified cache of size 128KB has about the
same miss rate as a 2-way cache of size 64KB, which is 3%.
- The clock is 10% faster. (This is only relevant for comparing
the results in part c)
- The miss penalty is 13 cycles.
a.
The access times are 1 cycle for a hit and 13 cycles
for a miss. Therefore, the average access time is:
(100% - 3%) * 1 + 3% * 13 = 1.36 cycles
b.
The CPI is the base CPI (1.5) plus the unified cache penalty (1 for
both loads and stores) plus the read miss
penalty. That is as follows:
1.5 + ((18% + 8%)*(1)) + ((18% + 100%)*(1.36 - 1)) = 2.18
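The same sketch, updated for the 128KB direct-mapped cache (names my own):

```python
# H&P 8.5: 128KB direct-mapped unified cache (13-cycle miss penalty).
MISS_RATE = 0.03
MISS_PENALTY = 13  # cycles
LOADS, STORES, INST_READS = 0.18, 0.08, 1.00
BASE_CPI = 1.5

# (a) Average access time.
amat = (1 - MISS_RATE) * 1 + MISS_RATE * MISS_PENALTY  # 1.36 cycles

# (b) Unified-cache load/store penalty plus read-miss stalls.
cpi = (BASE_CPI + (LOADS + STORES) * 1
       + (LOADS + INST_READS) * (amat - 1))            # ~2.18
```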
c.
In order to compare the CPI of the 128KB direct-mapped cache to the
others, its faster clock rate must be taken into account. N cycles on
this machine take 90% of the time that they would on either of the
other two machines, so we compare 90% of its CPI (2.18 * 90% = 1.96)
to the CPIs of the other machines.
With a normalized CPI of 1.96, the 128KB direct mapped, unified cache gives the
best performance.
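The final comparison, sketched with the rounded CPIs from the text:

```python
# Normalize the 128KB direct-mapped CPI by its 10% faster clock and
# compare against the other two designs (rounded CPIs from the text).
cpi_unified_64k = 2.12
cpi_split_32k = 2.03
cpi_dm_128k = 2.18 * 0.90  # ~1.96 in equivalent cycles

best = min(cpi_unified_64k, cpi_split_32k, cpi_dm_128k)  # the 128KB cache
```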