CPSC 418, Problem Set #3, Optional

Turn this in by 10am on April 8, 1999 if you want it marked. If marked, it will be averaged in with your other homeworks.

Problem 1: ISA

We've learned in this course the goal of defining an ISA: it provides a clean interface for what the programmer or compiler can assume about the processor, thereby allowing changes to software and hardware implementation without breaking software compatibility. Unfortunately, it's not always clear exactly what's part of the ISA and what's part of the implementation.

For example, AMD's K6 processor line is supposed to be completely compatible with Intel's processors. When the K6 first came out, there was a problem with a Windows device driver that required a patch to fix. It turns out that this device driver was using a busy loop in software to generate a fixed time delay. In the K6, the loop instruction had been optimized, so this delay loop ran too fast on the K6.

(a) Was the way Microsoft wrote the delay loop good programming practice? Were they obeying the ISA, or were they relying on implementation details? Justify your answer.

(b) Was it OK for AMD to change the timing of the loop instruction? Was this a violation of the ISA? Justify your answer.

(c) Recently, AMD has introduced new 3D-Now instructions to perform faster single precision floating point. Similarly, Intel has introduced Streaming SIMD Extensions, which are new numeric instructions for graphics. Are these changes to the ISA? If so, how can one write code that will run on both platforms?

Problem 2: CISC vs. RISC

As senior architect at Foobar Computer Corp., you've been charged with updating the Foobium II processor by adding Intel's new Streaming SIMD Extension instructions. Your design teams have come up with two options.

Option 1 is that they can add functional units for the new instructions. This will allow the new instructions to execute in 1 CPI. Unfortunately, the added complexity reduces the clock frequency by ten percent.
Option 2 is to handle the new instructions in additional microcode, just as the complex x86 instructions are already handled. This will allow your new processor to run at full clock speed, but the average CPI for the new instructions is now 10. The CPI of the old instructions is unchanged.

(a) If in the future, you expect 5 percent of all instructions executed to be the new Streaming SIMD Extension instructions, which option gives you the faster processor?

(b) What percentage of code being new instructions is the crossover point: the point where Options 1 and 2 have the same average performance?

(c) The product manager (the marketing leader of the team) is worried the project is going over budget. He has just read an article about picoJava (the same one you read), which described implementing really complex instructions as traps to the OS software. He recommends scrapping the project and simply selling Foobium II processors as new "Foobium III with Streaming SIMD Extension" processors, and handling the new instructions in software. There are obvious performance problems with this approach, but the product manager argues that no one is using the new instructions yet, so no one will notice the performance problems for a while.

Would the "new" processor (relabeled Foobium II) implement the new Pentium III ISA (with Streaming SIMD Extensions)? Assuming appropriate software to handle the new instructions, would your "new" processor be able to execute application programs that used the new instructions? Would the "new" processor be compatible at the system software level? Explain your answers.

Problem 3: Predication

Assume integer registers r0..r31, and predicate registers p0..p31. You have predicate assignment statements like:

    peq p1, r1, r2   -- set p1 to true iff r1==r2
    pne p2, r1, r2   -- set p2 to true iff r1!=r2
    pge p2, r1, r2   -- set p2 to true iff r1>=r2
    etc.
You also have predicate logical operations like:

    pand p1, p2, p3   -- set p1 to be p2 AND p3
    por  p1, p2, p3   -- set p1 to be p2 OR p3
    pnot p1, p2       -- set p1 to be NOT p2

All instructions can be predicated, e.g.

    fadd f1,f2,f3 (p1)   -- if p1 then f1=f2+f3

Consider this code fragment:

    L1:   bge  r1, 1900, done:   -- if r1>=1900 goto done:
    L2:   blt  r1, 30, y2k:      -- if r1<30 goto y2k:
    L3:   addi r1, r1, 1900      -- r1+=1900
    L4:   br   done              -- goto done:
    y2k:
    L5:   addi r1, r1, 2000      -- r1+=2000
    done:

Convert the above code to use predication.

Problem 4: Power

You are team leader for the new Furby2000 toy. One of the key design goals is full digital audio processing and speech recognition, performed by the single microprocessor in the toy. The primary performance design requirement is that the Furby2000 processor be able to process a minute of audio input within ten seconds. Your engineering team has greatly exceeded the performance goal: the prototype can process a minute of audio input in 1 second. Unfortunately, the prototype has a battery life of only 15 minutes.

(a) Your goal is to reduce processor power consumption. The current prototype uses a processor running at 5 volts and 100 MHz, but you can scale the voltage down to 3.3 or 2.5 volts (forcing a slower clock frequency of 66 and 50 MHz respectively). There is also a lower performance processor available (with the same ISA and CPI) running at 16 MHz and consuming 1/8 of the power, running at 5 volts. This processor can also be scaled down to 3.3 or 2.5 volts, with the expected slow-down in clock speed. For each processor and voltage combination, estimate the performance and power consumption relative to the original prototype. You may ignore leakage current and threshold voltage effects.

(b) If half of the total system power consumption is the microprocessor (the other half is mainly the satellite transponder that relays collected marketing data to company headquarters), what will be the battery life of the improved toy, given your best choice from part (a)? You may assume the battery holds a fixed total amount of energy, regardless of power drain (not true in reality).
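
As a starting point for Problem 4, the relations involved can be sketched in a few lines of Python. This is only a sketch under stated assumptions: it uses the usual first-order CMOS model in which dynamic power scales as V^2 * f (consistent with ignoring leakage and threshold effects), it assumes performance scales linearly with clock frequency because the ISA and CPI are the same, and it assumes that only the processor's share of system power changes. The function names and parameters are made up for illustration and are not part of the problem statement.

```python
def relative_dynamic_power(volts, mhz, ref_volts=5.0, ref_mhz=100.0, base=1.0):
    """Power relative to a reference design, assuming P is proportional to V^2 * f.

    `base` is the relative power of the design at (ref_volts, ref_mhz);
    it is 1.0 for the prototype itself (assumed model, not given in the text).
    """
    return base * (volts / ref_volts) ** 2 * (mhz / ref_mhz)


def relative_performance(mhz, ref_mhz=100.0):
    """Performance relative to the reference, assuming same ISA and CPI."""
    return mhz / ref_mhz


def battery_life(ref_minutes, cpu_fraction, cpu_power_rel):
    """Battery life when only the CPU share of total power is scaled.

    Assumes a fixed total battery energy, so life is inversely
    proportional to total system power.
    """
    total_power_rel = (1.0 - cpu_fraction) + cpu_fraction * cpu_power_rel
    return ref_minutes / total_power_rel


if __name__ == "__main__":
    # One illustrative data point: the prototype processor at 2.5 V / 50 MHz.
    print(relative_performance(50.0))          # 0.5
    print(relative_dynamic_power(2.5, 50.0))   # 0.125
```

The same three functions can be applied to every processor and voltage combination in part (a); remember to check each candidate against the requirement of one minute of audio in at most ten seconds before using it in part (b).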