Parts (a) and (b) both refer to an instruction cache with:
- word addressing
- 64 words
- 4 words per block
- least-recently-used replacement scheme
Use a picture and (words and/or pseudo-code) to describe how the above cache with two-way set associativity detects a cache miss and handles the subsequent fill.
A W-word cache with w words per block (or line) has L = W div w cache lines.
A cache with L lines and A way set-associativity has S = L div A sets with A lines per set.
| Symbol | Meaning | Formula | Example | Value |
|---|---|---|---|---|
| W | words per cache | | | 64 |
| w | words per block | | | 4 |
| A | associativity | | | 2 |
| L | number of cache lines | W div w | 64 div 4 | 16 |
| S | number of sets | L div A | 16 div 2 | 8 |
| AddrSize | total width of address | | | |
| WIdxSize | word index | log2 w bits | log2 4 | 2 bits |
| SIdxSize | set index | log2 S bits | log2 8 | 3 bits |
| TagSize | tag size | AddrSize - (WIdxSize + SIdxSize) bits | AddrSize - (2 + 3) | AddrSize - 5 bits |
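The parameter derivation in the table can be checked with a short sketch. The address width is not given in the problem, so `AddrSize = 32` below is an assumption for illustration:

```python
import math

# Names follow the table above; AddrSize = 32 is an assumed value,
# since the problem leaves the total address width unspecified.
W = 64          # words per cache
w = 4           # words per block
A = 2           # associativity
AddrSize = 32   # assumed total address width

L = W // w                     # number of cache lines
S = L // A                     # number of sets
WIdxSize = int(math.log2(w))   # word-index bits
SIdxSize = int(math.log2(S))   # set-index bits
TagSize = AddrSize - (WIdxSize + SIdxSize)

print(L, S, WIdxSize, SIdxSize, TagSize)  # -> 16 8 2 3 27
```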
If one of the tags in the set matches the address tag and the valid bit for that line is true, then there is a hit, otherwise we get a cache-miss.
When we get a cache miss, we replace one of the cache lines in the set with the cache line needed by the address. With a least-recently-used replacement scheme and 2-way set-associativity, we can determine which line is to be replaced simply by marking the opposite line as least-recently-used (the u bit in the figure) whenever a line is accessed.
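The detection, LRU update, and fill described above can be sketched in Python-as-pseudocode. The data layout (a list of sets, each holding two line dictionaries with `valid`, `tag`, `u`, and `words` fields) is illustrative, not taken from the original figure; the bit-field widths follow the table above (2 word-index bits, 3 set-index bits):

```python
def lookup_2way(cache, addr, memory):
    word_idx = addr & 0x3            # low 2 bits: word within block
    set_idx = (addr >> 2) & 0x7      # next 3 bits: set index
    tag = addr >> 5                  # remaining bits: tag
    lines = cache[set_idx]           # the 2 lines of this set
    for i, line in enumerate(lines):
        if line['valid'] and line['tag'] == tag:     # hit
            lines[1 - i]['u'] = 1    # mark the opposite line as LRU
            line['u'] = 0
            return line['words'][word_idx]
    # Miss: replace the line marked least-recently-used (u == 1),
    # loading the 4-word block containing addr from memory.
    victim = lines[0] if lines[0]['u'] else lines[1]
    block_base = addr & ~0x3
    victim.update(valid=True, tag=tag, u=0,
                  words=[memory[block_base + k] for k in range(4)])
    other = lines[1] if victim is lines[0] else lines[0]
    other['u'] = 1
    return victim['words'][word_idx]
```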
With an instruction cache, we never write to a cache line, so when a cache line is replaced, we just load in the new tag and words.

Use a picture and (words and/or pseudo-code) to describe how the above cache with four-way set associativity detects a cache miss and handles the subsequent fill.
A 4-way set-associative cache differs from a 2-way set-associative cache, in that there are 4 cache lines per set. This means that 4 tag comparisons must be made to detect cache hit/miss.
Determining which cache line in a set was least-recently used is more complicated for 4-way set-associative caches than for 2-way. The LRU bit for each line is replaced with a 2-bit counter. Rather than setting a single bit each time a cache line is accessed, we reset the accessed line's counter to zero and increment the counters for all of the other lines in the set.
The least-recently-used line is the one with the largest value in its counter.
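The counter scheme above can be sketched as follows. The saturating increment (capping at 3, the 2-bit maximum) is an assumption about how the counters avoid overflow; it is not stated in the original text:

```python
def touch(counters, accessed):
    """Update the per-line LRU counters of a 4-line set on an access.

    The accessed line's counter resets to 0; every other line's counter
    is incremented, saturating at 3 (the 2-bit maximum) -- an assumed
    detail, since the original text does not specify overflow handling.
    """
    for i in range(len(counters)):
        if i == accessed:
            counters[i] = 0
        else:
            counters[i] = min(counters[i] + 1, 3)

def victim(counters):
    # The least-recently-used line holds the largest counter value.
    return max(range(len(counters)), key=lambda i: counters[i])
```

For example, accessing lines 0, 1, 2, 3 in order leaves line 0 as the victim, since it was touched longest ago.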
Given a two-level memory system (split instruction/data L1-caches and main memory):

L1 data-cache:
- 95% hit rate
- 30% of cache accesses are writes
- at any point in time, 25% of blocks in cache have been modified
- 8 words per block
- 1 cycle access time
- Transfer rate between registers and L1-cache: 1 word/cycle
- not-last-used replacement scheme

Main memory:
- 20 cycle access time
- Transfer rate between L1-cache and main memory: 8 words/cycle
Calculate the average data access time if the cache uses write-through with no-write-allocate on write miss.

From Handout 12:
Calculate the average access time if the cache uses write-back with write-allocate on write miss.
T_WB1 = T_Acc2 + T_Xfr_1_2
      = 20 + 1
      = 21 cycles

T_Avg = 0.95 * (1 + 0.30 * 1) + 0.05 * (0.25 * 21 + 22)
      ≈ 2.6 cycles
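The arithmetic can be verified directly. The reading of the terms below (21 cycles to move one 8-word block, 22 = block fill plus the 1-cycle L1 access, 0.25 * 21 for writing back a dirty victim) is my interpretation of the handout formula:

```python
# Recomputing the write-back average access time from the figures above.
hit_rate = 0.95
write_frac = 0.30    # fraction of accesses that are writes
dirty_frac = 0.25    # fraction of cache blocks that are modified

T_acc2 = 20                 # main-memory access time (cycles)
T_xfr = 8 // 8              # 8-word block at 8 words/cycle = 1 cycle
T_WB1 = T_acc2 + T_xfr      # 21 cycles to move one block
T_fill = T_WB1 + 1          # block fill plus the 1-cycle L1 access = 22

T_avg = (hit_rate * (1 + write_frac * 1)
         + (1 - hit_rate) * (dirty_frac * T_WB1 + T_fill))
print(round(T_avg, 4))  # -> 2.5975, i.e. about 2.6 cycles
```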
What are the tradeoffs in using virtual addresses or physical addresses for caches?
Would you use virtual or physical addresses for a cache? Justify your answer.

For data-caches, physically addressed caches are generally preferable. A relatively small TLB (64 - 1024 entries) is sufficient for even large caches. In comparison, the aliasing problems associated with virtually addressed caches do not scale well as cache sizes increase, because flushing a cache becomes prohibitively expensive and the number of comparisons that must be made to prevent aliasing grows linearly with the cache size.

For instruction-caches, virtual addressing may be preferable. This is because aliasing is not much of a problem with instruction caches.