CPSC 418 Home Page
CPSC 418: Midterm Solution

112 possible marks                                      Marked out of 100

---------------------------------------------------------------------------
1) Factors Affecting Performance (18 pts)

4 pts for each of the 3 terms in part (a)
2 pts for each of the 3 terms in part (b)

----------------------------------------
a) ISA effects
(machine instr)/(program):
	The more tasks an instruction can perform, the fewer
instructions are needed in a program.  For example: a memory-memory
machine like the VAX can use a single instruction to add the values
stored in two memory locations, and place the result in a third.  A
load-store ISA such as the MIPS would require 4 instructions to
perform a similar feat (2 loads, an add, and a store).

(cycles)/(machine instr)
	The more inherently sequential tasks an instruction can
perform, the more cycles that instruction will take to complete.  For
example, the memory-memory VAX instruction listed above requires two
accesses to memory, an addition, and another access to memory to be
completed sequentially (even if the address calculations are carried
out in parallel and behind the scenes).  It will therefore take at
least 4 cycles to complete.  If average instructions of the ISA
complete many tasks, and thus take many cycles, the average CPI of the
ISA will increase.

(time)/(cycle)
	This term is still affected by ISA choices, although less so
than the previous two.  The maximum time to execute a subtask of an
instruction determines the cycle length.  For example, in many older
processors, the instruction decoding time was a determining factor in
cycle time: CISC instructions took a long time to decode, decoding
could not be spread over multiple cycles, and so it fixed the cycle
time.
	Another ISA effect that could influence cycle length is the
inclusion or exclusion of instructions requiring length execution.
For example, long shifts are very slow.  Some instruction sets chose
to limit the length of shifts to reduce cycle time (a long shift was
the slowest operation in the ALU).  Some early RISC machines chose to
implement multiply or divide as multiple instructions as well, to
reduce the maximum ALU time.

----------------------------------------
b) Most important other effects

(machine instr)/(program)
	a good compiler may be able to reduce the size of a program
(or it may choose to increase the size to reduce execution time --
loop unrolling is an example)

(cycles)/(machine instr)
	processor organization: pipelining, multiple execution units,
out-of-order execution, etc.

(time)/(cycle)
	process technology: smaller feature size gives faster machine,
different process technologies (CMOS, BiCMOS, GaAs, etc.) run faster
or slower.

---------------------------------------------------------------------------
2) Reducing the Instruction Set (24 pts -- 8 each)
	I will use a MIPS R2000 style processor as my example RISC (ie
example pipeline, ISA, etc.).

first two reasons up to 3 marks each
third reason up to 2 marks

----------------------------------------
a) Autoincrement addressing mode

- complicates interrupt handling if the address register is
incremented before all interrupts have been detected -- the
instruction may cause an exception after the processor state is
modified.

- relatively easy to synthesize out of two simple instructions: a load
using the same address register, followed by an appropriate addition
on that address register.

- possible structural hazard writing both the loaded value and the
incremented value back to the register file.

- possible WAW and WAR hazards if the address register is written too
early in the pipeline (RISC pipelines like the MIPS read early and
write late to avoid these hazards).

- possible structural hazard calculating the load address and
incrementing the register in the ALU if autoincrement with
displacement is allowed.

----------------------------------------
b) Absolute addressing for branch targets

- very difficult to encode in a fixed instruction length: an absolute
address which includes the entire address space will be 32 to 64 bits
long, which is as long or longer than a RISC instruction (leaving no
room for the opcode).

- easy to synthesize out of three simple instructions: a load high to
load the 16 high bits of the address into a temporary register, an
addi to load the 16 low bits of the address into the same temporary
register, and a jr to jump to the address pointed to by the register.

- very infrequently needed: most branches use PC relative to address
their target, so that the code can be easily relocated in memory.

----------------------------------------
c) String Move

- interrupt very difficult to handle during the move: what if the
source or target string overlap a page boundary?  How can the current
(partially finished) state of the instruction be saved?

- easily synthesized with a short loop (load byte, store byte,
increment pointers, branch if not done).  The branch is not crippling
because it can be accurately predicted or the loop can be unrolled.

- very difficult to pipeline with other instructions, since each byte
moved will require several passes through address calculation and
memory access stages (structural hazards).  The string move would have
to grab the execution pipeline and use it over and over.

- data hazards may develop if the source and destination strings
overlap in memory.

---------------------------------------------------------------------------
3) RISC and CISC.  What lousy names (12 pts)

8 for the differences (2 each)
4 for the PPC explanation (2 for each reason which supports your argument)


RISC features:
- fixed length instructions with simple formats
- load-store architecture: ALU operates only on registers
- simple addressing modes to memory
- lots of registers
RISC optional features:
- no instruction side-effects: an instruction affects at most 1 register
- aligned accesses
- low latency branches

CISC features:
- variable length instructions, with many complex formats
- memory-memory architecture: ALU operates or registers or memory
- lots of addressing modes
- instruction side-effects: autoincrement addressing, condition codes, etc.

The key objective of RISC machines: easy pipelining
The key objective of CISC machines: minimize instructions

I would support my statement: the PPC has all of the RISC features
(although none of the optional ones).  In particular, it has fixed
length instructions and all ALU operands must be in registers, which
makes it decidedly different than the VAX.

---------------------------------------------------------------------------
4) Faster Processors (25 pts -- 5 each)

2 for the problem, 3 for the reason.

----------------------------------------
a) ROB: Reorder Buffer

- allows instructions to be retired (comitted) in-order despite
out-of-order execution, which in turn allows precise interrupts even
if instructions complete out-of-order.  When an instruction dispatches
(in-order) it reserves the next slot in the ROB.  When an instruction
completes execution, its result is written to the ROB, not the
register file.  Only when all preceeding instructions are finished and
written to the register file, may an instruction write its own result.
When an interrupt occurs, the register file is in a state consistent
with sequential execution up until (but not including) the instruction
at the head of the ROB.

- avoids WAW and WAR hazards.  If instructions can pull their source
operands from the ROB instead of the register file, and results are
written to the ROB first, then instructions issuing out-of-order will
be able to get/put the appropriate value from/to the ROB or register
file without worrying about instructions which should have executed
previously or subsequently.

----------------------------------------
b) multiple integer ALUs

- avoids structural hazards caused by several different instructions,
possibly in different pipeline stages, all needing integer arithmetic
at once.  In the MIPS R2000, for example, three separate ALUs allow a
branch in ID to both determine its condition and its target at the
same time another instruction is performing a general ALU operation in
the ALU stage.  More advanced processors may have many instructions in
execution at once, each needing an ALU op.

----------------------------------------
c) Tomasulo's algorithm and its derivatives: reservation stations

- allows out-of-order instruction issue.  Multiple instructions, all
unable to issue due to RAW hazards, may wait in reservation stations
after dispatch but before issue.  All the stations watch the results
coming from the execution units; any instructions needing a result
will load the result into a field of the reservation station.  Any
instruction in a reservation station with all its operands may issue,
even it that issue will take it out of program order.

----------------------------------------
d) separate ALU and MEM pipeline stages

- allows displacement addressing.  If loads/stores do not have access
to an ALU before making a memory request, the address must be an
absolute address present in the instruction or an address pointed to
directly by a register, since it is impossible to add an offset to the
register's value without an ALU.

----------------------------------------
e) explicitly set instead of implicitly set condition codes

- allows out-of-order execution without worrying about incorrect
branching.  When instructions set various condition codes as a side
effect, and those condition codes are used to determine branch
conditions, then the processor must ensure that the condiiion codes
read by the branch are the codes that would have been set under
sequential execution.  With explicitly set condition codes it is very
easy for the compiler to inform the processor which instruction sets
the condition codes for a particular branch -- the nearest preceeding
instruction which explicitly sets the condition codes.  With
implicitly set condition codes, the processor may be unable to
determine which preceeding instruction set the condition code, and may
have to resort to sequential execution.

---------------------------------------------------------------------------
5) Instruction Level Parallelism (15 pts -- 5 each)

----------------------------------------
a) An ISA in which the execution of each instruction is assumed to be
completed before the next is started.  Sequential execution ISAs give
the illusion that no parallel execution is occurring.

----------------------------------------
b) They both assume that several instructions may be executed in
parallel, and require the programmer or compiler to generate code
which is correct despite parallel execution.  The VLIW architecture
requires an instruction to specify which other instruction(s) may be
executed in parallel with it.  The dataflow architecture requires an
instruction to specify which other instruction(s) must execute before
it.

----------------------------------------

c) The MIPS R2000 fails to support the sequential execution
architecture assumption because it uses delayed branches -- the
instruction directly after a branch will be executed regardless of the
outcome of the branch.
	A delayed branch can be a problem if an exception occurs
during the execution of the instruction in the delay slot.  When an
exception occurs, normally the excepting instruction's PC is saved, so
that execution can resume at that location after handling the
exception.  Upon resumption, instructions are assumed to follow
sequentially from the excepting instruction.  If the excepting
instruction was in the delay slot of a taken branch, however, then the
correct instruction to place directly after the excepting instruction
will not be the next sequential instruction, but rather the target of
the branch.
	The MIPS R2000 solves this problem by saving the PCs of the
next two instructions when an exception occurs.

---------------------------------------------------------------------------
6) All these funky units... (18 pts)

lose 2 for an incorrect or missed stage, lose 1 if why is incorrect

unit  stage(s)	why

 a	1	to fetch the instructions from the physically addressed cache
		  based on the virtually addressed PC

 b	2	for register renaming to deal with WAW and WAR hazards
	5	for keeping the appropriate mapping between logical and
		  physical registers (based on which instructions have been
		  retired) in case of an interrupt

 c	3	to keep track of which instructions are still waiting for
		  operands and may not issue yet

 d	4	to execute the various possible instructions

 e	2	to provide sources for instructions
	5	to provide destinations for instructions

 f	4	to provide the memory port for executing loads and stores
	5	returning the result for loads and determining whether stores
		  should proceed

 g	1	to provide for icache misses
	4	to provide for dcache misses

---------------------------------------------------------------------------
CPSC 418 Home Page
Last modified: Apr 97