CPSC 418, Practice Problem Set #3, Do Not Turn In -- This is just for practice.

This is a practice problem set, since we didn't have any homework in the
second half of the course.  It's just for your practice; I'm making a solution
set available simultaneously.  Obviously, this is not a comprehensive set
of study questions, but it should give you a feel for the level of detail
I expect you to know, and the sorts of things I expect you to be able to do.

Problem 1:  ISA

(I'm told that this is actually a true story, although I don't have
documentation, so don't quote me on this.  I'm also making up some missing
details to make this problem work.)  The Intel Pentium, like most
microprocessors, has certain opcode bit patterns that do not designate any
instruction.  These are labeled "reserved" in the processor documentation.
If a program tries to execute such an "instruction" (for example, if you
accidentally jumped into data), the processor will take an invalid instruction
exception, which hands control to the operating system.  Microsoft
allegedly used some of these illegal instructions as a convenient way
to invoke OS system calls -- the application code includes an illegal
instruction, which traps to the operating system, which can then examine
the instruction and invoke the appropriate system call.

(a)  Is what Microsoft allegedly did OK or a violation of the ISA?  Why?

I'd argue that Microsoft broke the ISA, since the bit patterns in
question were explicitly labeled reserved.  This means that the ISA is
saying that programmers are not supposed to use those instructions.

Now, suppose Intel had simply labeled all of the unused bit patterns
as "illegal instructions", which generate an illegal instruction exception.
In that case, one *could* argue that Microsoft did not violate the ISA,
since the specification said that those instructions are illegal, and
Microsoft is taking advantage of the specified behavior.

In general, using illegal instructions is not recommended practice.

(b)  Suppose Intel wanted to add MMX instructions to the Pentium line.
	Is this a change to the ISA?  Why?

Yes.  Obviously, adding new instructions is a change to the ISA, since
they change the programmer's view of the processor, and they change the
contract between the processor manufacturer and the programmer (i.e.,
Intel promises that MMX processors will implement MMX instructions).

(c)  Given that Intel reserved those illegal opcodes for future use,
	it would seem reasonable to add the MMX instructions at the
	same opcodes that Microsoft is already using for system calls.
	Would this be a good idea?

Nope.  Even though I argue that Microsoft is in the wrong, compatibility
requires that you compensate for the stupidity of others, if the other
person is dominant in the marketplace.
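To make the conflict in part (c) concrete, here is a toy model of the
trap-and-dispatch trick.  This is only a sketch: the opcode values, the
table names (PENTIUM, PENTIUM_MMX, SYSCALLS), and the system call names
are all made up for illustration and have nothing to do with real x86
encodings or any real operating system.

```python
class InvalidInstruction(Exception):
    """Models the invalid instruction exception taken on a reserved opcode."""
    def __init__(self, opcode):
        super().__init__(f"invalid opcode {opcode:#04x}")
        self.opcode = opcode

# Hypothetical opcode tables: anything not listed is "reserved".
PENTIUM = {0x01: "nop", 0x02: "add"}
PENTIUM_MMX = {**PENTIUM, 0xF1: "mmx_op"}   # new instruction reuses 0xF1

# Hypothetical OS table that (ab)uses reserved opcodes as system calls.
SYSCALLS = {0xF0: "sys_read", 0xF1: "sys_write"}

def execute(opcode, defined):
    """Processor: run a defined instruction, trap on a reserved one."""
    if opcode not in defined:
        raise InvalidInstruction(opcode)
    return defined[opcode]

def os_trap_handler(exc):
    """OS: examine the faulting opcode and pick the matching system call."""
    return SYSCALLS.get(exc.opcode, "kill process")

def run(opcode, defined):
    try:
        return execute(opcode, defined)
    except InvalidInstruction as exc:
        return os_trap_handler(exc)

print(run(0xF1, PENTIUM))      # traps, so the OS performs sys_write
print(run(0xF1, PENTIUM_MMX))  # no trap any more: the system call silently vanishes
```

In this model the same application binary makes a system call on the old
processor and executes an unrelated instruction on the new one, which is
exactly the compatibility break that part (c) warns about.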

Problem 2:  CISC vs. RISC

As senior architect at Foobar Computer Corp., you've been charged with
designing the new FA-64 (Foobar Architecture 64 bit) ISA, which will be
used by the next generation of Foobar's computers.  You've chosen a RISC
ISA in general, but with a few CISCy features thrown in for backward
compatibility (these are implemented with a small microcode interpreter
as part of the instruction decode, similar to how the high performance
x86 processors work).  However, given that microprocessor vendors compete
on the basis of SPEC benchmark results, one of your designers recommends
adding "SPECinstructions" -- a single instruction for each SPEC 95
benchmark that implements the entire benchmark program in microcode.

(a)  Would adding SPECinstructions be a change to the ISA?

Yup.  Again, adding new instructions is a change to the ISA.  (This
is generally considered a minor change, since adding new instructions
shouldn't break previously written code -- unless the legacy code
used illegal instructions in illegal ways...)

(b)  Will this boost the SPECmark rating of your processor?

Probably.  At the microcode level, you have much better access to the
full resources of the hardware.  Furthermore, you could optimize the
hardwired SPEC programs very very aggressively.  Finally, you could
avoid any I-cache misses, since the whole program would be in ROM on chip.

(c)  How is this likely to affect the performance the typical user sees?

It certainly won't help!  The typical user isn't running SPEC benchmarks
all day.  This is an extreme case of making an uncommon case fast.
Furthermore, the chip area that we use for the microcode ROM could have
been spent on larger caches or more functional units -- things that would
have really helped the typical user.  And having all that extra hardware
around is likely to slow down the clock speed somewhat.

(d)  If your compensation package includes an enormous bonus (large enough
	to retire on) for maximizing SPECmarks, are the SPECinstructions
	a good idea?

Yes, it'd be a great idea.  You'd get your big bonus (for a processor
that runs SPEC really fast, and nothing else), and then be unemployable
thereafter.

(e)  If you add SPECinstructions now, how will future designers of FA-64
	processors view you?

They'd hate you.  Remember, an ISA typically lasts through several processor
designs.  Once you add something to the ISA, future designers must support
it.  Once SPEC changes to a new benchmark suite, as it has in the past,
the old SPECinstructions are completely useless baggage.

Problem 3:  Predication

Assume floating-point registers f0..f31, integer registers r0..r31,
and predicate registers p0..p31.  fsub, fdiv, fadd are floating point
instructions.  You have predicate assignment statements like:
	peq	p1, r1, r2	-- set p1 to true iff r1==r2
	pne	p2, r1, r2	-- set p2 to true iff r1!=r2
	pge	p2, r1, r2	-- set p2 to true iff r1>=r2
	etc.
All instructions can be predicated, e.g.
	fadd	f1,f2,f3 (p1)	-- if p1 then f1=f2+f3
	fadd	f1,f2,f3 (!p1)	-- if !p1 then f1=f2+f3

Consider this code fragment:
	loop:
	L1:	andi	r2, r1, #3	-- r2 = r1 & 0b11 (r2 = r1 mod 4)
	L2:	beq	r2, #3, minus	-- if r2==3 goto minus:
	L3:	fdiv	f4, f1, f3	-- f4 = 1.0/f3
	L4:	fadd	f0, f0, f4	-- f0 += f4
	L5:	br	next		-- goto next:
	minus:
	L6:	fdiv	f4, f1, f3	-- f4 = 1.0/f3
	L7:	fsub	f0, f0, f4	-- f0 -= f4
	next:
	L8:	fsub	f3, f3, f2	-- f3 -= 2.0;
	L9:	subi	r1, r1, #2	-- r1 -= 2;
	L10:	bge	r1, #0, loop	-- if r1>=0 goto loop:

Convert the above code to use predication.
	loop:
	L1:	andi	r2, r1, #3	-- r2 = r1 & 0b11 (r2 = r1 mod 4)
	L2:	peq	p0, r2, #3	-- set p0 to true iff r2==3
	L3:	fdiv	f4, f1, f3 (!p0)	-- f4 = 1.0/f3
	L4:	fadd	f0, f0, f4 (!p0)	-- f0 += f4
	minus:
	L6:	fdiv	f4, f1, f3 (p0)		-- f4 = 1.0/f3
	L7:	fsub	f0, f0, f4 (p0)		-- f0 -= f4
	next:
	L8:	fsub	f3, f3, f2	-- f3 -= 2.0;
	L9:	subi	r1, r1, #2	-- r1 -= 2;
	L10:	bge	r1, #0, loop	-- if r1>=0 goto loop:

You may want to think about how much superscalarity you need in order for
the predicated code to run faster than the unpredicated one.
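If you want to convince yourself that the two versions compute the same
thing, here is a rough Python model of both loops.  It is only a sketch:
registers become ordinary variables, the function names and the commit()
helper are my own invention, and I assume f1 holds 1.0 and f2 holds 2.0
as the comments in the assembly suggest.  The commit() helper mimics
predication by always computing the result and only writing it back
when the predicate is true.

```python
def run_branching(r1, f0, f1, f2, f3):
    """Model of the original loop: one of two paths runs per iteration."""
    while True:
        r2 = r1 & 3                 # L1: andi
        if r2 == 3:                 # L2: beq ... minus
            f4 = f1 / f3            # L6
            f0 = f0 - f4            # L7
        else:
            f4 = f1 / f3            # L3
            f0 = f0 + f4            # L4
        f3 = f3 - f2                # L8
        r1 = r1 - 2                 # L9
        if not r1 >= 0:             # L10: bge falls through
            return f0

def commit(pred, old, new):
    """A predicated write: keep the old value unless pred is true."""
    return new if pred else old

def run_predicated(r1, f0, f1, f2, f3):
    """Model of the predicated loop: every instruction issues every time."""
    f4 = 0.0
    while True:
        r2 = r1 & 3                         # L1: andi
        p0 = (r2 == 3)                      # L2: peq
        f4 = commit(not p0, f4, f1 / f3)    # L3 (!p0)
        f0 = commit(not p0, f0, f0 + f4)    # L4 (!p0)
        f4 = commit(p0, f4, f1 / f3)        # L6 (p0)
        f0 = commit(p0, f0, f0 - f4)        # L7 (p0)
        f3 = f3 - f2                        # L8
        r1 = r1 - 2                         # L9
        if not r1 >= 0:                     # L10: bge falls through
            return f0

args = dict(r1=7, f0=0.0, f1=1.0, f2=2.0, f3=15.0)
print(run_branching(**args) == run_predicated(**args))
```

Note that the predicated model does four floating-point operations per
iteration where the branching one does two, which is why the amount of
superscalarity matters.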

Problem 4:  Number Representation

In IEEE double-precision floating point, the exponent is 11 bits, with
a bias of 1023.  The fraction is normalized to be between 1 and 2, with
the leading 1 hidden.

(a)  Convert 3.4 into a double-precision floating point number.

Let's see.  In binary, 3.4 = 11.011001100110011001100110...
So, the sign bit will be 0 (positive).
The exponent will be 1.  Adding the bias, we get 1024 = 10000000000
The fraction will be 1.101100110011001100...  Remove the leading 1.
So we get:
	0 10000000000 10110011001100110011 00110011001100110011001100110011
or in hexadecimal:
	0x400B3333 0x33333333
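You can check this kind of hand conversion with a few lines of Python.
This is just a sketch using the standard struct module; the function
name encode_double is my own, and it assumes the host uses IEEE
double-precision for Python floats (which is true on essentially every
machine you'll meet).

```python
import struct

def encode_double(x):
    """Return (sign, biased exponent, 52-bit fraction, high word, low word)."""
    # Reinterpret the 8 bytes of the double as one big-endian 64-bit integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF          # 11 bits, still biased
    fraction = bits & ((1 << 52) - 1)        # hidden 1 is not stored
    return sign, exponent, fraction, bits >> 32, bits & 0xFFFFFFFF

sign, exponent, fraction, hi, lo = encode_double(3.4)
print(sign, exponent, f"{fraction:052b}")
print(f"0x{hi:08X} 0x{lo:08X}")
```

The two words printed on the last line should match the hexadecimal
answer worked out by hand.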

(b)  The 64 bits 0xC00CAC00 0x00000000 (high order first) represent
	an IEEE double-precision floating point number.  Convert that
	into decimal.

Unpacking the hexadecimal number we get:
	1 10000000000 11001010110000000000 00000000000000000000000000000000
So the number is negative.
The exponent is 1024 - 1023 = 1.
The fraction is 1.1100101011 (remember the hidden 1).
So the magnitude is 11.100101011 = 3 + 1/2 + 1/16 + 1/64 + 1/256 + 1/512
	= 3.583984375, and since the sign bit is set the number is -3.583984375.
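The same unpacking can be done mechanically.  The sketch below decodes
the fields by hand and cross-checks against the struct module; the name
decode_double is my own, and the by-hand formula only covers normalized
numbers (biased exponent neither 0 nor 2047).

```python
import struct

def decode_double(hi, lo):
    """Decode two 32-bit words (high order first) into a double, two ways."""
    bits = (hi << 32) | lo
    sign = bits >> 63
    exponent = ((bits >> 52) & 0x7FF) - 1023          # remove the bias
    significand = 1 + (bits & ((1 << 52) - 1)) / 2**52  # restore hidden 1
    by_hand = (-1) ** sign * significand * 2.0 ** exponent
    # Cross-check: let the hardware format do the interpretation.
    (by_struct,) = struct.unpack(">d", struct.pack(">Q", bits))
    return by_hand, by_struct

print(decode_double(0xC00CAC00, 0x00000000))
```

Both values come out negative, since the sign bit is part of the decoding.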