CPSC 418, Problem Set #1, Due 10am, February 3, 1998

Assume floating-point registers f0..f31, integer registers r0..r31.
fmul, fdiv, fadd are single-precision floating point instructions.

The processor has the following functional units:

    1 floating point divider        10 cycle latency
    1 floating point add/multiply   3 cycle latency for add or multiply
    1 integer unit                  1 cycle latency
    1 branch unit                   1 cycle latency

All units are fully pipelined (i.e., you can start a new operation every cycle, but the results aren't available until the indicated latency).

Initially, assume the registers have the following values:

    f0 = 1.0
    f1 = 1.0
    f2 = 0.0
    r1 = 1000

Consider this code fragment:

loop:
L1: fmul f3, f0, f0    -- f3 = f0*f0
L2: fdiv f4, f1, f3    -- f4 = 1.0/f3
L3: fadd f2, f2, f4    -- f2 += f4
L4: fadd f0, f0, f1    -- f0 += 1.0
L5: subi r1, r1, #1    -- r1--
L6: bne  r1, loop

Problem 1: (25 pts)
Show the execution schedule for the functional units for one iteration through the loop. How many cycles does one pass of the loop take to EXECUTE? Ignore time spent in other pipeline stages and count only the time in execution, as in the Fisher and Rau paper.

Problem 2: (5 pts)
If the floating point adder/multiplier were not pipelined (so you could only do one operation at a time), would the code run slower? Why or why not?

Problem 3: (5 pts)
Unroll the loop once (so you have two copies). You do not need to show the prologue to handle when r1 is odd.

Problem 4: (25 pts)
Assuming pipelined functional units again, show the execution schedule for the unrolled (two copies) loop. How many cycles does it take?

Problem 5: (40 pts)
Now, we'll examine how a superscalar processor would execute the given code. Return to the original loop, and now assume that the functional units are unpipelined (can only do one operation at a time) with the given latencies.
Using a pipeline, reservation station, and reorder buffer model as we did in class, show the state of the reservation station, register mapping table, and reorder buffer at the end of the clock cycle when L6 is fetched. (Show earlier cycles for partial credit.)

Extra Credit: (1 pt)
What number is the loop computing (in the limit with infinite iterations)?
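
Note on the latency model: the difference between a pipelined and an unpipelined functional unit, as described at the top of this problem set, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the required solution method. The function name schedule, the unit names, and the example registers (f5..f8, r2) are hypothetical. The sketch assumes cycles are numbered from 0, that an operation issued in cycle c with latency n has its result readable in cycle c+n, and that each unit takes operations in program order. It ignores write-after-read and write-after-write hazards, and the convention used in class may differ.

```python
# Illustrative sketch only; names and conventions here are assumptions,
# not part of the assignment.
LATENCY = {"fdiv": 10, "fmul": 3, "fadd": 3, "subi": 1, "bne": 1}
UNIT = {"fdiv": "fpdiv", "fmul": "fpaddmul", "fadd": "fpaddmul",
        "subi": "int", "bne": "branch"}


def schedule(instrs, pipelined=True):
    """Return (issue_cycle, result_cycle) for each (op, dest, sources) tuple.

    Assumes in-order use of each unit and cycles numbered from 0.
    """
    ready = {}      # register -> first cycle in which its value can be read
    unit_free = {}  # unit -> first cycle in which it can accept a new op
    out = []
    for op, dest, srcs in instrs:
        unit = UNIT[op]
        # Wait for every source operand and for the functional unit.
        issue = max([ready.get(s, 0) for s in srcs] + [unit_free.get(unit, 0)])
        done = issue + LATENCY[op]
        # Pipelined: the unit accepts a new op next cycle.
        # Unpipelined: the unit is busy until the result is produced.
        unit_free[unit] = issue + 1 if pipelined else done
        if dest is not None:
            ready[dest] = done
        out.append((issue, done))
    return out


# Two independent adds competing for the single add/multiply unit.
independent = [("fadd", "f5", ["f6", "f7"]), ("fadd", "f8", ["f6", "f7"])]
print(schedule(independent, pipelined=True))   # [(0, 3), (1, 4)]
print(schedule(independent, pipelined=False))  # [(0, 3), (3, 6)]

# A dependent pair: the multiply must wait for the add's result either way.
dependent = [("fadd", "f5", ["f6", "f7"]), ("fmul", "f8", ["f5", "f5"])]
print(schedule(dependent, pipelined=True))     # [(0, 3), (3, 6)]
print(schedule(dependent, pipelined=False))    # [(0, 3), (3, 6)]
```

In this toy example, pipelining helps only when independent operations compete for the same unit; a chain of dependent operations is limited by latency in both cases.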