CPSC 418, Problem Set #2, Due 10am, February 24, 1998 Problem 1: Precise Exceptions (5 pts) In class, we saw the contortions required to guarantee precise exception handling in a pipelined or superscalar processor. Consider the following simple scheme to handle exceptions. Whenever an exception occurs, stop fetching any more instructions. Then, let all instructions that are already in the processor finish executing. Now, every instruction that has been fetched has completed, and every instruction that hasn't been fetched hasn't been completed, so we can just set the saved PC to point to the last instruction fetched. Does this method guarantee precise exceptions? Why or why not? Problem 2: Branch Prediction Assume floating-point registers f0..f31, integer registers r0..r31. fsub, fdiv, fadd are single-precision floating point instructions. Initially, assume the registers have the following values: f0 = 0.0 f1 = 1.0 f2 = 2.0 f3 = 999999.0 r1 = 999999 Consider this code fragment: loop: L1: andi r2, r1, #3 -- r2 = r1^0x11 (r2 = r1 mod 4) L2: beq r2, #3, minus -- if r2==3 goto minus: L3: fdiv f4, f1, f3 -- f4 = 1.0/f3 L4: fadd f0, f0, f4 -- f0 += f4 L5 br next -- goto next: minus: L6: fdiv f4, f1, f3 -- f4 = 1.0/f3 L7: fsub f0, f0, f4 -- f0 -= f4 next: L8: fsub f3, f3, f2 -- f3 -= 2.0; L9: subi r1, r1, #2 -- r1 -= 2; L10: bge r1, #0, loop -- if r1>0 goto loop: Note that there are three branches: L2, L5, and L10 (a) How many times is each branch executed? (5 pts) (b) If we use BTFN, what percentage do we predict correctly. (Include the correct "predictions" of L5.) (5 pts) (c) If we use 2-bit saturating counters for each branch, initialized to 10 (weakly predict taken) for backward branches and 01 (weakly predict not taken) for forward branches, what will be the counter values for each branch after the loop terminates? (5 pts) (d) What is the prediction rate for part (c)? (5 pts) (e) Suppose we use a 2-level branch predictor, with a separate 3-bit history register for each branch (initialized to 000), and a single, global pattern table for all branches (with all entries initialized to 10 weakly predict taken). Show the history registers and pattern table after one pass through the loop (including executing L10). (10 pts) (f) Show the history registers and pattern table after another pass through the loop. (10 pts) (g) In the preceding two iterations, the two-level scheme correctly predicted 3 out of 5 branches -- not very good. As the loop continues, will the two-level scheme perform well or poorly? Justify your answer. (10 pts) Problem 3: Non-Blocking and Victim Caches For this problem, we will combine the banked, non-blocking caches from the Sohi and Franklin paper, with the victim cache idea presented in the Jouppi and Wilton paper. Assume a machine with 32-bit addresses, byte addressable. We have a 16KByte L1 data cache, divided into 4 banks of 4KBytes each. The L1 cache is direct-mapped. The L1 is also non-blocking, with 1 MSHR per bank. Backing up the L1 is an 8 entry, fully-associative victim cache, with a least-recently used eviction policy. Cache lines are 16 bytes each in both the L1 and L2 caches. Assume that the caches are initialized to garbage values, so all accesses will miss until we put something there. (a) How many bits of the 32-bit address are devoted to tag, cache line index, bank select, and byte offset, respectively, when looking up an address in the L1. For the remainder of this problem, assume the address fields are in that order (i.e., tag bits are high order, byte offset is lowest order). (5 pts) (b) The processor requests a read at memory address 0x1000. (Address is written in hexadecimal. Ask me if you don't know what this means.) This will miss in both the L1 and the victim cache, so the result has to be read from memory. Indicate which line of which bank in the L1 is evicted. Show what information the MSHR for that bank contains. What are the tag bits for that line? (5 pts) (c) Assume that the miss from (b) hasn't been filled from memory yet. Next, we receive a request to read location 0x1020. Which bank and cache line handle this request? Does it block? Why or why not? (5 pts) (d) Assume neither of the preceding misses have been filled from memory yet. We receive a request to read location 0x1040. Which bank and cache line handle this request? Does it block? Why or why not? (5 pts) (e) Now, assume the memory misses in parts (b), (c), and (d) have all been filled from memory, returning values B, C, and D respectively. (Assume that B, C, and D indicate the full 16 bytes of the line.) What are the contents of the caches and MSHRs? (5 pts) (f) Next, a read to location 0x11000 arrives. What are the contents of the caches and MSHRs? (Remember that since the victim cache is fully associative, the tag in the victim cache is the full 32 bits.) (5 pts) (g) Assume the read in (f) completes, filling in data value F into the appropriate line of the appropriate bank. Next comes a request for address 0x21000. After this request is filled from memory (with value G), what are the contents of the caches and MSHRs? (5 pts) (h) Now, another request for location 0x1000 comes in. Show the contents of the caches and MSHRs after servicing this request. (10 pts)