Homework 7 Solutions

H&P 6.4 Ideal CPI for this machine is 1.

The penalty for branch hazards (the only hazards considered in this problem) depends on how the pipeline is assumed to handle branches. For each assumption, the CPI is the ideal CPI plus the average branch penalty per instruction.

Assuming there is no branch prediction, there is a penalty of 2 cycles for each conditional branch and 1 cycle for each unconditional branch. Therefore, the CPI is:
0.20*2+0.05*1+1.00=1.45
Therefore, the machine without branch hazards would be 45% faster.

Assuming branches are predicted not taken, there is a penalty of 1 cycle for each unconditional branch and 2 cycles for the 60% of conditional branches that are taken. Therefore, the CPI is:
0.20*0.60*2+0.05*1+1.00=1.29
Therefore, the machine without branch hazards would be 29% faster.

Assuming branches are predicted taken, there is a penalty of 2 cycles for the 40% of conditional branches that are not taken, and no penalty is charged for unconditional branches. Therefore, the CPI is:
0.20*0.40*2+1.00=1.16
Therefore, the machine without branch hazards would be 16% faster.
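
The three cases above can be checked with a short script. This is only a sketch of the arithmetic in this solution; the constant and function names are illustrative and do not come from the textbook.

```python
# Branch frequencies and penalties as used in the solution above.
COND_FREQ = 0.20        # conditional branches per instruction
UNCOND_FREQ = 0.05      # unconditional branches per instruction
TAKEN_FRACTION = 0.60   # fraction of conditional branches that are taken
COND_PENALTY = 2        # stall cycles for a penalized conditional branch
UNCOND_PENALTY = 1      # stall cycles for a penalized unconditional branch
IDEAL_CPI = 1.00


def cpi_with_stalls(stalls_per_instruction):
    """CPI = ideal CPI + average branch stall cycles per instruction."""
    return IDEAL_CPI + stalls_per_instruction


cases = {
    # Every branch pays its full penalty.
    "no prediction": cpi_with_stalls(
        COND_FREQ * COND_PENALTY + UNCOND_FREQ * UNCOND_PENALTY
    ),
    # Only taken conditional branches are penalized.
    "predict not taken": cpi_with_stalls(
        COND_FREQ * TAKEN_FRACTION * COND_PENALTY + UNCOND_FREQ * UNCOND_PENALTY
    ),
    # Only not-taken conditional branches are penalized; unconditional
    # branches are assumed free, as in the solution above.
    "predict taken": cpi_with_stalls(
        COND_FREQ * (1 - TAKEN_FRACTION) * COND_PENALTY
    ),
}

for name, cpi in cases.items():
    gain = (cpi / IDEAL_CPI - 1) * 100
    print(f"{name}: CPI = {cpi:.2f}, hazard-free machine is {gain:.0f}% faster")
```

Changing TAKEN_FRACTION or the penalty constants shows how sensitive each scheme is to the branch mix.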

H&P 6.9 This question is quite vague about what data to use. For this solution, the calculations are based on the DLX data from table 6.18, namely:

2% of instructions are unconditional branches
11% of instructions are conditional branches

It is also assumed that the penalty for unconditional branches is 2 cycles in both machines.

------ Begin Important Calculations -----

From the problem statement, we know the following about the machine with the branch-target buffer:

90% * 90% = 81% of the conditional branches hit in the buffer and are predicted correctly (no penalty)
100% - 90% = 10% of the conditional branches miss the buffer (3 cycle penalty)
90% - 81% = 9% of the conditional branches hit the buffer but are mispredicted (4 cycle penalty)

That means that the penalty for conditional branches in the branch-target buffer machine is as follows:

<% cond branch> * (10% * 3 + 9% * 4) = <% cond branch> * 0.66

In the machine with a constant 2 cycle branch penalty, the penalty for conditional branches is:

<% cond branch> * 2

For unconditional branches, if you assigned a penalty for either machine, the penalty is:

<% uncond branch> * uncond branch penalty

To compare performance, add the penalties you computed to 1 (the stated base CPI) for each machine, then divide the CPI of the fixed penalty machine by the CPI of the branch-target buffer machine.

----- End Important Calculations -----

For the particular assumptions that are made here, the CPI for the branch-target buffer machine is computed as follows:

Conditional branch penalty = 11% * 0.66 = 0.0726

Unconditional branch penalty = 2% * 2 = 0.04

CPI = 1.1126

For the fixed penalty machine, the CPI is computed as follows:

Conditional branch penalty = 11% * 2 = 0.22

Unconditional branch penalty = 2% * 2 = 0.04

CPI = 1.26

Therefore, the branch-target buffer machine is 1.26/1.1126 = 1.13 times as fast, or 13% faster.
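
The comparison above can be reproduced with the following sketch. It uses only the figures in this solution (the DLX branch frequencies and the assumed 2 cycle unconditional penalty); the variable names are illustrative.

```python
BASE_CPI = 1.0
COND_FREQ = 0.11        # conditional branches per instruction (DLX data)
UNCOND_FREQ = 0.02      # unconditional branches per instruction (DLX data)
UNCOND_PENALTY = 2      # assumed in this solution for both machines

HIT_RATE = 0.90             # fraction of conditional branches found in the buffer
PREDICTION_ACCURACY = 0.90  # fraction of buffer hits predicted correctly
MISS_PENALTY = 3            # cycles when the branch misses the buffer
MISPREDICT_PENALTY = 4      # cycles when the branch hits but is mispredicted
FIXED_PENALTY = 2           # cycles per conditional branch on the other machine

# Fractions of conditional branches that incur each penalty.
miss_fraction = 1 - HIT_RATE                               # 10%
mispredict_fraction = HIT_RATE * (1 - PREDICTION_ACCURACY)  # 9%

# Average penalty per conditional branch on the buffer machine (0.66 cycles).
btb_cond_penalty = (miss_fraction * MISS_PENALTY
                    + mispredict_fraction * MISPREDICT_PENALTY)

uncond_stalls = UNCOND_FREQ * UNCOND_PENALTY
btb_cpi = BASE_CPI + COND_FREQ * btb_cond_penalty + uncond_stalls
fixed_cpi = BASE_CPI + COND_FREQ * FIXED_PENALTY + uncond_stalls

# Speedup of the buffer machine = fixed-penalty CPI / buffer CPI.
speedup = fixed_cpi / btb_cpi
print(f"buffer CPI = {btb_cpi:.4f}, fixed CPI = {fixed_cpi:.2f}, "
      f"speedup = {speedup:.2f}")
```

Since the unconditional term is the same for both machines, a different assumed unconditional penalty changes the two CPIs by the same amount and moves the ratio only slightly.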

H&P 6.10 Assume a mispredicted unconditional branch takes a 3 cycle penalty.

90% of unconditional branches will take 0 cycles (penalty = -1 cycle, i.e. one cycle saved relative to the base CPI)

10% of unconditional branches will take a penalty of 3 cycles

From a base CPI of 1 cycle, the 0-delay machine has a CPI as follows:

1 - <% uncond branch> * 90% * 1 + <% uncond branch> * 10% * 3

For the DLX, CPI is 1-0.02*90%+0.02*10%*3 = 0.988

Therefore, speedup is 1.1/0.988 = 1.11 or approx 11% faster.
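
The same arithmetic as a sketch in code. The comparison CPI of 1.1 is copied from the ratio above; its derivation is not shown in this solution, so it is treated here as a given input. All names are illustrative.

```python
BASE_CPI = 1.0
UNCOND_FREQ = 0.02         # unconditional branches per instruction (DLX data)
HIT_RATE = 0.90            # unconditional branches that take 0 cycles
CYCLES_SAVED_ON_HIT = 1    # a 0-cycle branch saves one cycle off the base CPI
MISS_PENALTY = 3           # assumed penalty for the remaining 10%
COMPARISON_CPI = 1.1       # taken as given from the solution's final ratio

# Hits lower the CPI below the base; misses raise it.
zero_delay_cpi = (BASE_CPI
                  - UNCOND_FREQ * HIT_RATE * CYCLES_SAVED_ON_HIT
                  + UNCOND_FREQ * (1 - HIT_RATE) * MISS_PENALTY)

speedup = COMPARISON_CPI / zero_delay_cpi
print(f"0-delay CPI = {zero_delay_cpi:.3f}, speedup = {speedup:.2f}")
```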