CPSC 418, Problem Set #3, Optional

Turn this in by 10am on April 8, 1999 if you want it marked.
If marked, it will be averaged in with your other homeworks.


Problem 1:  ISA

In this course we've learned the goal of defining an ISA:  it provides
a clean interface for what the programmer or compiler can assume about
the processor, thereby allowing the hardware implementation and the
software to change without breaking compatibility.  Unfortunately,
it's not always clear exactly what's part of the ISA and what's part
of the implementation.  For example, AMD's K6 processor line is supposed
to be completely compatible with Intel's processors.  When the K6 first
came out, there was a problem with a Windows device driver that required
a patch to fix.  It turns out that this device driver was using a busy
loop in software to generate a fixed time delay.  In the K6, the loop
instruction had been optimized, so this delay loop ran too fast on the K6.
(a)  Was the way Microsoft wrote the delay loop good programming practice?
	Were they obeying the ISA, or were they relying on implementation
	details?  Justify your answer.
(b)  Was it OK for AMD to change the timing of the loop instruction?
	Was this a violation of the ISA?  Justify your answer.
(c)  Recently, AMD has introduced new 3DNow! instructions to speed up
	single-precision floating-point operations.  Similarly, Intel has
	introduced Streaming SIMD Extensions, which are new numeric
	instructions for graphics.  Are these changes to the ISA?  If so,
	how can one write code that will run on both platforms?


Problem 2:  CISC vs. RISC

As senior architect at Foobar Computer Corp., you've been charged with
updating the Foobium II processor by adding Intel's new Streaming SIMD
Extension instructions.  Your design teams have come up with two
options.  Option 1 is to add functional units for the new
instructions.  This will allow the new instructions to execute with a
CPI of 1.  Unfortunately, the added complexity reduces the clock
frequency by ten percent.  Option 2 is to handle the new instructions
in additional microcode, just as the complex x86 instructions are
already handled.
This will allow your new processor to run at full clock speed, but the
average CPI for the new instructions is now 10.  The CPI of the old
instructions is unchanged.
(a)  If you expect that, in the future, 5 percent of all instructions
	executed will be the new Streaming SIMD Extension instructions,
	which option gives you the faster processor?
(b)  What percentage of code being new instructions is the crossover point:
	the point where Options 1 and 2 have the same average performance?
(c)  The product manager (the marketing leader of the team) is worried
	the project is going over budget.  He has just read an article
	about picoJava (the same one you read), where they described
	implementing really complex instructions as traps to the OS software.
	He recommends scrapping the project and simply selling Foobium II
	processors as new "Foobium III with Streaming SIMD Extension"
	processors, and handling the new instructions in software.
	There are obvious performance problems with this approach, but
	the product manager argues that no one is using the new instructions
	yet, so no one will notice the performance problems for a while.
	Would the "new" processor (relabeled Foobium II) implement the
	new Pentium III ISA (with Streaming SIMD Extensions)?  Assuming
	appropriate software to handle the new instructions, would your
	"new" processor be able to execute application programs that
	used the new instructions?  Would the "new" processor be compatible
	at the system software level?  Explain your answers.
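
For parts (a) and (b), one way to set up the comparison is to compute
the average time per instruction as (average CPI) / (clock rate).  The
sketch below is only an illustration of that setup, not an official
solution.  The problem does not state the CPI of the old instructions,
so old_cpi is an assumed parameter (defaulting to 1.0), and all
function names are hypothetical.

```python
def time_per_instruction(new_fraction, new_cpi, clock_scale, old_cpi=1.0):
    """Average time per instruction, in units of the original clock period.

    old_cpi is an ASSUMED value; the problem statement does not give it.
    clock_scale is the clock frequency relative to the original (1.0 = full).
    """
    avg_cpi = (1.0 - new_fraction) * old_cpi + new_fraction * new_cpi
    return avg_cpi / clock_scale


def option1_time(new_fraction, old_cpi=1.0):
    # Option 1: added functional units, new instructions at CPI 1,
    # clock frequency reduced by ten percent.
    return time_per_instruction(new_fraction, 1.0, 0.9, old_cpi)


def option2_time(new_fraction, old_cpi=1.0):
    # Option 2: microcode, new instructions at CPI 10, full clock speed.
    return time_per_instruction(new_fraction, 10.0, 1.0, old_cpi)


def crossover_fraction(old_cpi=1.0):
    # Setting option1_time(x) == option2_time(x), with c = old_cpi:
    #   (1 - x)*c + x = 0.9 * ((1 - x)*c + 10*x)
    #   0.1*(1 - x)*c = 8*x
    #   x = c / (80 + c)
    return old_cpi / (80.0 + old_cpi)


if __name__ == "__main__":
    print("Option 1 time at 5%:", option1_time(0.05))
    print("Option 2 time at 5%:", option2_time(0.05))
    print("Crossover fraction:", crossover_fraction())
```

Note that the crossover point depends on the assumed old_cpi, so an
answer should state which value was used.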


Problem 3:  Predication

Assume integer registers r0..r31, and predicate registers p0..p31.
You have predicate assignment statements like:
	peq	p1, r1, r2	-- set p1 to true iff r1==r2
	pne	p2, r1, r2	-- set p2 to true iff r1!=r2
	pge	p2, r1, r2	-- set p2 to true iff r1>=r2
	etc.
You also have predicate logical operations like:
	pand	p1, p2, p3	-- set p1 to be p2 AND p3
	por	p1, p2, p3	-- set p1 to be p2 OR p3
	pnot	p1, p2		-- set p1 to be NOT p2
All instructions can be predicated, e.g.
	fadd	f1,f2,f3 (p1)	-- if p1 then f1=f2+f3
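
To illustrate what a predicated instruction does, here is a small
Python model of if-conversion on a different, simpler fragment
(computing the minimum of two registers).  It is a sketch only:  the
function names are hypothetical, and each predicated instruction is
modeled as "write the destination only if the predicate is true."

```python
def branchy_min(r1, r2):
    # Branching version: control flow decides which assignment runs.
    if r1 >= r2:
        r3 = r2
    else:
        r3 = r1
    return r3


def predicated_min(r1, r2):
    # If-converted version: both assignments are issued, each guarded by
    # a predicate.  An instruction whose predicate is false has no effect.
    r3 = 0                   # destination; its initial value does not matter
    p2 = r1 >= r2            # models: pge  p2, r1, r2
    p1 = not p2              # models: pnot p1, p2
    r3 = r2 if p2 else r3    # models: addi r3, r2, 0 (p2)
    r3 = r1 if p1 else r3    # models: addi r3, r1, 0 (p1)
    return r3


if __name__ == "__main__":
    for a, b in [(3, 7), (7, 3), (5, 5)]:
        print(a, b, branchy_min(a, b), predicated_min(a, b))
```

Because p1 and p2 are complementary, exactly one of the two guarded
assignments takes effect, and no branch is needed.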

Consider this code fragment:
	L1:	bge	r1, 1900, done	-- if r1>=1900 goto done
	L2:	blt	r1, 30, y2k	-- if r1<30 goto y2k
	L3:	addi	r1, r1, 1900	-- r1+=1900
	L4:	br	done		-- goto done
	y2k:
	L5:	addi	r1, r1, 2000	-- r1+=2000
	done:

Convert the above code to use predication.


Problem 4:  Power

You are team leader for the new Furby2000 toy.  One of the key design
goals is full digital audio processing and speech recognition, performed
by the single microprocessor in the toy.  The primary performance
design requirement is that the Furby2000 processor be able to process
a minute of audio input within ten seconds.  Your engineering team has
greatly exceeded the performance goal:  the prototype can process a minute
of audio input in 1 second.  Unfortunately, the prototype has a battery
life of only 15 minutes.
(a)  Your goal is to reduce processor power consumption.  The current
	prototype uses a processor running at 5 volts and 100 MHz, but
	you can scale the voltage down to 3.3 or 2.5 volts (forcing a
	slower clock frequency of 66 or 50 MHz, respectively).  There is
	also a lower-performance processor available (with the same ISA
	and CPI) that runs at 16 MHz at 5 volts and consumes 1/8 of the
	power of the prototype's processor.  This processor can also be
	scaled down to 3.3 or 2.5 volts, with the expected slow-down in
	clock speed.  For each processor and voltage combination,
	estimate the performance and power consumption relative to the
	original prototype.  You may ignore leakage current and
	threshold voltage effects.
(b)  Suppose the microprocessor accounts for half of the total system
	power consumption.  (The other half is mainly the satellite
	transponder that relays collected marketing data to company
	headquarters.)  What will be the battery life of the improved
	toy, given your best choice from part (a)?  You may assume the
	battery holds a fixed total amount of energy, regardless of
	power drain (not true in reality).
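
One possible way to organize the estimates for this problem is
sketched below.  It is not an official solution and rests on stated
assumptions:  dynamic power is modeled as proportional to V^2 * f
(consistent with ignoring leakage and threshold effects), performance
is taken as proportional to clock frequency (same ISA and CPI), and
the slower processor's clock is assumed to scale with voltage by the
same 100:66:50 ratios as the prototype's.  All names are hypothetical.

```python
# Clock scale factor at each supply voltage, taken from the prototype's
# 100 / 66 / 50 MHz figures.  ASSUMPTION: the 16 MHz part scales the same way.
CLOCK_SCALE = {5.0: 1.00, 3.3: 0.66, 2.5: 0.50}


def estimate(power_at_5v, mhz_at_5v, volts):
    """Return (relative performance, relative power) versus the prototype.

    power_at_5v: power at 5 V relative to the prototype (1.0 or 1/8).
    mhz_at_5v:   clock frequency at 5 V (100 or 16).
    Model: power ~ V^2 * f, performance ~ f.
    """
    scale = CLOCK_SCALE[volts]
    performance = (mhz_at_5v * scale) / 100.0
    power = power_at_5v * (volts / 5.0) ** 2 * scale
    return performance, power


def battery_life_minutes(cpu_power, base_life=15.0):
    # ASSUMPTION: the processor was half of the original system power, the
    # other half is unchanged, and battery energy is fixed, so battery life
    # is inversely proportional to total system power.
    total_power = 0.5 + 0.5 * cpu_power
    return base_life / total_power


if __name__ == "__main__":
    for name, p5, f5 in (("100 MHz part", 1.0, 100.0), ("16 MHz part", 0.125, 16.0)):
        for volts in (5.0, 3.3, 2.5):
            perf, power = estimate(p5, f5, volts)
            print(name, volts, "V  perf:", perf, " power:", power)
```

The design requirement (a minute of audio within ten seconds, versus
the prototype's one second) corresponds to a relative performance of
at least 0.1 under this model, which can be used to filter the
candidates before comparing their power.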