CPSC 418, Practice Problem Set #3 -- Do Not Turn In (this is just for practice)

This is a practice problem set, since we didn't have any homework in the second half of the course. I'm making a solution set available simultaneously. Obviously, this is not a comprehensive set of study questions, but it should give you a feel for the level of detail I expect you to know, and the sorts of things I expect you to be able to do.

Problem 1: ISA

(I'm told that this is actually a true story, although I don't have documentation, so don't quote me on this. I'm also making up some missing details to make this problem work.)

The Intel Pentium, like most microprocessors, has certain opcode bit patterns that do not designate any instruction. These are labeled "reserved" in the processor documentation. If a program tries to execute such an "instruction" (for example, if you accidentally jumped into data), the processor will take an invalid instruction exception, which hands control to the operating system. Microsoft allegedly used some of these illegal instructions as a convenient way to invoke OS system calls: the application code includes an illegal instruction, which traps to the operating system, which can then examine the instruction and invoke the appropriate system call.

(a) Is what Microsoft allegedly did OK or a violation of the ISA? Why?

I'd argue that Microsoft broke the ISA, since the bit patterns in question were explicitly labeled reserved. This means that the ISA is saying that programmers are not supposed to use those instructions. Now, suppose Intel had simply labeled all of the unused bit patterns as "illegal instructions", which generate an illegal instruction exception. In that case, one *could* argue that Microsoft did not violate the ISA, since the specification said that those instructions are illegal, and Microsoft is taking advantage of the specified behavior. In general, though, relying on illegal instructions is not recommended practice.
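To make the alleged mechanism concrete, here is a toy model in Python (not real x86 behavior): a decoder that raises an invalid-instruction exception for any opcode it does not implement, and an OS trap handler that treats certain reserved opcodes as system-call numbers. The opcode values, the IMPLEMENTED and RESERVED_SYSCALLS tables, and the function names are all hypothetical, made up for illustration.

```python
class InvalidInstruction(Exception):
    """Raised by the toy CPU for any opcode it does not implement."""
    def __init__(self, opcode):
        super().__init__(f"invalid opcode {opcode:#04x}")
        self.opcode = opcode

# Hypothetical opcodes that the toy CPU actually implements.
IMPLEMENTED = {0x01: "add", 0x02: "load"}

# Hypothetical OS convention: these reserved (unimplemented) opcodes
# are reinterpreted as system calls by the trap handler.
RESERVED_SYSCALLS = {0xF1: "open_file", 0xF2: "write_file"}

def cpu_execute(opcode, implemented=IMPLEMENTED):
    """Decode one opcode; anything not implemented traps to the OS."""
    if opcode in implemented:
        return f"cpu:{implemented[opcode]}"
    raise InvalidInstruction(opcode)

def run(opcode, implemented=IMPLEMENTED):
    """Execute one instruction, routing invalid-instruction traps to the OS."""
    try:
        return cpu_execute(opcode, implemented)
    except InvalidInstruction as trap:
        # The OS trap handler examines the faulting opcode...
        if trap.opcode in RESERVED_SYSCALLS:
            # ...and, by this hypothetical convention, performs a system call.
            return f"os:{RESERVED_SYSCALLS[trap.opcode]}"
        return "os:kill_process"  # a genuinely bad instruction

print(run(0x01))  # cpu:add
print(run(0xF1))  # os:open_file
print(run(0x7F))  # os:kill_process

# A newer (hypothetical) processor that assigns 0xF1 to a new instruction:
newer = {**IMPLEMENTED, 0xF1: "mmx_op"}
print(run(0xF1, newer))  # cpu:mmx_op (the "system call" never reaches the OS)
```

The last call shows why the trick is fragile: the application's behavior depends on the processor continuing to trap on opcodes the ISA only promised to keep reserved.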
(b) Suppose Intel wanted to add MMX instructions to the Pentium line. Is this a change to the ISA? Why?

Yes. Adding new instructions is a change to the ISA: they change the programmer's view of the processor, and they change the contract between the processor manufacturer and the programmer (i.e., Intel promises that MMX processors will implement MMX instructions).

(c) Given that Intel reserved those illegal opcodes for future use, it would seem reasonable to add the MMX instructions at the same opcodes that Microsoft is already using for system calls. Would this be a good idea?

Nope. On the new processors those opcodes would execute as MMX instructions instead of trapping to the OS, so every existing program that makes system calls this way would break. Even though I argue that Microsoft is in the wrong, compatibility requires that you compensate for the stupidity of others, if the other party is dominant in the marketplace.

Problem 2: CISC vs. RISC

As senior architect at Foobar Computer Corp., you've been charged with designing the new FA-64 (Foobar Architecture 64-bit) ISA, which will be used by the next generation of Foobar's computers. You've chosen a RISC ISA in general, but with a few CISCy features thrown in for backward compatibility (these are implemented with a small microcode interpreter as part of the instruction decode, similar to how high-performance x86 processors work). However, given that microprocessor vendors compete on the basis of SPEC benchmark results, one of your designers recommends adding "SPECinstructions": a single instruction for each SPEC95 benchmark that implements the entire benchmark program in microcode.

(a) Are the SPECinstructions part of the ISA?

Yup. Again, adding new instructions is a change to the ISA. (This is generally considered a minor change, since adding new instructions shouldn't break previously written code, unless the legacy code used illegal instructions in illegal ways...)

(b) Will this boost the SPECmark rating of your processor?

Probably. At the microcode level, you have much better access to the full resources of the hardware.
Furthermore, you could optimize the microcoded SPEC programs very, very aggressively. Finally, you could avoid any I-cache misses, since the whole program would be in ROM on chip.

(c) How is this likely to affect the performance the typical user sees?

It certainly won't help! The typical user isn't running SPEC benchmarks all day, so this is an extreme case of making an uncommon case fast. Furthermore, the chip area we use for the microcode ROM could have been spent on larger caches or more functional units, things that would really have helped the typical user. And having all that extra hardware around is likely to slow down the clock speed somewhat.

(d) If your compensation package includes an enormous bonus (large enough to retire on) for maximizing SPECmarks, are the SPECinstructions a good idea?

Yes, it'd be a great idea. You'd get your big bonus (for a processor that runs SPEC really fast, and nothing else), and then be unemployable thereafter.

(e) If you add SPECinstructions now, how will future designers of FA-64 processors view you?

They'd hate you. Remember, an ISA typically lasts through several processor designs. Once you add something to the ISA, future designers must support it. Once SPEC changes to a new benchmark suite, as it has in the past, the old SPECinstructions become completely useless baggage.

Problem 3: Predication

Assume floating-point registers f0..f31, integer registers r0..r31, and predicate registers p0..p31. fsub, fdiv, and fadd are floating-point instructions. You have predicate assignment instructions like:

    peq p1, r1, r2    -- set p1 to true iff r1 == r2
    pne p2, r1, r2    -- set p2 to true iff r1 != r2
    pge p2, r1, r2    -- set p2 to true iff r1 >= r2
    etc.

All instructions can be predicated, e.g.
    fadd f1, f2, f3 (p1)     -- if p1 then f1 = f2 + f3
    fadd f1, f2, f3 (!p1)    -- if !p1 then f1 = f2 + f3

Consider this code fragment (the comments assume f1 holds 1.0 and f2 holds 2.0):

    loop:
    L1:    andi  r2, r1, #3       -- r2 = r1 & 3 (r2 = r1 mod 4)
    L2:    beq   r2, #3, minus    -- if r2 == 3 goto minus
    L3:    fdiv  f4, f1, f3       -- f4 = 1.0/f3
    L4:    fadd  f0, f0, f4       -- f0 += f4
    L5:    br    next             -- goto next
    minus:
    L6:    fdiv  f4, f1, f3       -- f4 = 1.0/f3
    L7:    fsub  f0, f0, f4       -- f0 -= f4
    next:
    L8:    fsub  f3, f3, f2       -- f3 -= 2.0
    L9:    subi  r1, r1, #2       -- r1 -= 2
    L10:   bge   r1, #0, loop     -- if r1 >= 0 goto loop

Convert the above code to use predication.

    loop:
    L1:    andi  r2, r1, #3          -- r2 = r1 & 3 (r2 = r1 mod 4)
    L2:    peq   p0, r2, #3          -- p0 = (r2 == 3)
    L3:    fdiv  f4, f1, f3 (!p0)    -- f4 = 1.0/f3
    L4:    fadd  f0, f0, f4 (!p0)    -- f0 += f4
    minus:
    L6:    fdiv  f4, f1, f3 (p0)     -- f4 = 1.0/f3
    L7:    fsub  f0, f0, f4 (p0)     -- f0 -= f4
    next:
    L8:    fsub  f3, f3, f2          -- f3 -= 2.0
    L9:    subi  r1, r1, #2          -- r1 -= 2
    L10:   bge   r1, #0, loop        -- if r1 >= 0 goto loop

Both internal branches (L2 and L5) are gone: L2 now only computes the predicate, both arms of the if/else are issued every iteration, and only the arm whose predicate is true takes effect. You may want to think about how much superscalarity you need in order for the predicated code to run faster than the unpredicated version.

Problem 4: Number Representation

In IEEE double-precision floating point, the exponent is 11 bits, with a bias of 1023. The significand is normalized to lie in the range [1, 2), with the leading 1 hidden.

(a) Convert 3.4 into a double-precision floating-point number.

Let's see. In binary, 3.4 = 11.011001100110011001100110... So the sign bit will be 0 (positive). Normalizing gives 3.4 = 1.1011001100110011... x 2^1, so the exponent will be 1. Adding the bias, we get 1024 = 10000000000. The significand will be 1.101100110011001100...; remove the leading 1 and keep the next 52 bits. The bit after the 52nd is 0, so rounding to nearest simply truncates. So we get:

    0 10000000000 10110011001100110011 00110011001100110011001100110011

or in hexadecimal:

    0x400B3333 0x33333333

(b) The 64 bits 0xC00CAC00 0x00000000 (high-order word first) represent an IEEE double-precision floating-point number. Convert that into decimal.

Unpacking the hexadecimal number we get:

    1 10000000000 11001010110000000000 00000000000000000000000000000000

So the number is negative. The exponent is 1024 - 1023 = 1.
The significand is 1.1100101011 (remember the hidden 1). So the number is -11.100101011 in binary = -(3 + 1/2 + 1/16 + 1/64 + 1/256 + 1/512) = -3.583984375.
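Both hand conversions can be checked mechanically. The sketch below uses Python's standard struct module to reinterpret the 64 bits of a double in big-endian order (so the high-order word comes first, matching the layout used above) and splits them into sign, biased exponent, and 52-bit fraction fields. The helper names are ours, purely for illustration, and decode_normalized only handles normalized numbers (biased exponent not 0 or 2047).

```python
import struct

def double_to_hex(x):
    """Return the IEEE 754 double-precision bit pattern of x as 16 hex digits."""
    return struct.pack(">d", x).hex().upper()

def hex_to_double(hex_digits):
    """Interpret 16 hex digits (high-order first) as an IEEE 754 double."""
    return struct.unpack(">d", bytes.fromhex(hex_digits))[0]

def fields(hex_digits):
    """Split a 64-bit pattern into (sign, biased exponent, fraction) fields."""
    bits = int(hex_digits, 16)
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF        # 11 bits, bias 1023
    fraction = bits & ((1 << 52) - 1)      # 52 bits; the leading 1 is not stored
    return sign, exponent, fraction

def decode_normalized(hex_digits):
    """Rebuild a normalized value by hand: (-1)^s * 1.fraction * 2^(e - 1023)."""
    sign, exponent, fraction = fields(hex_digits)
    significand = 1 + fraction / 2**52     # restore the hidden 1
    return (-1) ** sign * significand * 2.0 ** (exponent - 1023)

print(double_to_hex(3.4))                      # 400B333333333333
print(hex_to_double("C00CAC0000000000"))       # -3.583984375
print(fields("C00CAC0000000000")[:2])          # (1, 1024)
print(decode_normalized("C00CAC0000000000"))   # -3.583984375
```

The struct round trip is the authoritative check; decode_normalized repeats the same steps as the hand solution (sign, unbias the exponent, restore the hidden 1) so the two can be compared field by field.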