CPSC 418: Homework 10

Home

Suggestion Box

CPSC 418 --- Advanced Computer Architecture --- Spring 1996
Homework 10 --- Due: 26 Mar 1996

General Information for This Homework

In this homework we will consider the following fragment of pseudo-code:

/* A: array [0 to 1023] of int (ints are 32 bits) */ for i = 0 to 1023 do A[i] = 3*A[i+1];

You will hand-compile the code for two different architectures and compare the execution times for an implementation of each architecture. The two architectures represent ``Speed Demon'' (e.g. DEC Alpha) and ``Brainiac'' (e.g. IBM Power) approaches to Risc architectures. The pipelines for the two microprocessors are shown in Figures 1 (speed demon) and 2 (brainiac).

Speed Demon Pipeline

Brainiac Pipeline

Information Common to Both Microprocessors

Pipelines

All ``normal'' arithmetic and logical instructions (e.g. ADD, SUB, XOR, MULT, etc) use the Execute stage and then go directly to WriteBack.
Divide (DIV) instructions use Execute, and then spend two cycles in the Divide stage, before going to WriteBack.
Load instructions instructions use Execute and then the Load/Store stage before going to WriteBack.
Store instructions use Execute and then the Load/Store stage. They do not use WriteBack.
Each stage can contain at most one instruction.
``BY1'' is the register for the bypass from the output of Execute to its input.
``BY2'' is the register for the bypass from the output of Load/Store to the input of Execute.
``S1'' and ``S2'' are the registers where the register file puts the values currently stored in the registers corresponding to the operands in the FetchOps stage.
There is no bypass path from the output of Divide to the input of Execute. Divide instructions must be written to the register file before they can be used.
Branch prediction occurs in the Decode stage of the pipeline. Therefore, if a branch is predicted to be taken, the instruction in the Fetch Instruction stage is nullified (killed) and the PC is updated to the branch destination. This is independent of nullifying instructions after a branch mispredict.

Memory

A level one cache hit has a latency of one cycle (instructions spend one cycle in Load/Store).
All of the instructions are in the I cache.
The array A starts at address 0x2000 and is in the level 1 cache.

Registers

For both architectures, assume that the code fragment for this homework can use seven registers. (Other registers are allocated for other variables, function arguments and return values, stack pointer, etc).

``Speed Demon'' Architecture and Microprocessor

Branch instructions test a general register (e.g. JUMPEQ R3, 300)
Branches are resolved in the Execute stage.
Branches with negative offsets (earlier locations in code) are predicted to be taken.
Branches with positive offsets (later locations in code) are predicted not taken.
All Load/Store instructions use displacement addressing mode (e.g. 100(R3) = M[100+R3]).
Processor speed = 300MHz

``Brainiac'' Architecture and Microprocessor

Load/Store instructions use either displacement (e.g. 100(R3) = M[100+R3]) or indexed (e.g. 100(R3, R4) = M[100+R3+R4]) addressing modes. Load/Stores using indexed addressing mode spend two cycles in Execute.
RBR is a special decrement-and-test register, which is used with the JUMPCT instruction. The instruction JUMPCT imm decrements RB and branches to PC+imm if RBR is non-negative after decrementing. RBR is decremented and tested in the Decode stage and the branch is resolved in this stage. Your code fragment may use RBR, and this counts as one of the seven registers available.
There is a special load-rbr-immediate instruction (LOADBR), which is used to load RBR with a loop count at the beginning of a loop. This load is done in the Decode stage.
RBR can be used as a source operand to Execute, but not as a destination.
Branches are statically predicted to be not-taken.
Processor speed = 166MHz

Problem 1 (2 points)

Write the assembly code for both the Speed Demon and Brainiac processors. Be sure to use your registers effectively and schedule the code for optimum performance. Utilize the added features of the Brainiac when they will increase performance.

Problem 2 (8 points)

Draw execution tables for the initiation of the loop (setting up register values), one iteration of the loop, and the conclusion of the loop (testing loop count when it is done with the array) for each of the two code fragments. Assume that all memory accesses in these iterations get level-one cache hits.

NOTE You may draw simplified tables (i.e. don't show how the instructions are decoded, etc) if you wish. But, make sure that there is sufficient information to give you partial credit in case you don't get everything correct.

Problem 3 (4 points)

Calculate the average CPI for this loop (the full 1024 iterations) for each microprocessor.

Problem 4 (3 points)

What are the costs (e.g. where would extra hardware be needed, what would increase the cycle time and why) of each of the Brainiac's features. Be specific.

Problem 5 (2 points)

Is this example a good benchmark for comparing the Brainiac and Speed Demon? Justify your answer.

Problem 6 (3 Points)

Can different branch-prediction techniques be combined? If yes, which techniques can be used together and why? If not, why not?

Problem 7 (3 Points)

	Forward	Backward
%Jump	75%	25%
%Taken	50%	90%

Problem 7(a) (1 Points)

Why does predicting backwards branches to be taken, as opposed to predicting all branches as not-taken, result in a lower total branch penalty? (where total branch penalty = branch penalty + mispredict penalty) Justify your answer in terms of the behaviour of typical programs.

Problem 7(b) (2 Points)

How much smaller is the total branch penalty for predicting a branch based on the sign of the offset as compared to predicting all branches as not-taken?

Problem 8 (1 Point)

How much time did you spend total on the assignment and on each of these problems? Which problems were useful or useless? How well did the lectures and text book prepare you for this homework? Which concepts needed better explanation?

Postscript source for Figures 1, 2

Home

Suggestion Box

Last modified: 18 Mar 96