Homework 7 Solutions

H&P 6.4 Ideal CPI for this machine is 1.

The penalty for branch hazards (the only hazards considered in this problem) depends on how one assumes the pipeline handles branches. For each assumption, the CPI is the average branch penalty per instruction plus the minimum CPI (the same as the ideal CPI).

H&P 6.9 This question is quite vague about what data to use. For this solution, the calculations are based on the DLX data from table 6.18. That is, conditional branches make up 11% of instructions and unconditional branches make up 2%.

It is also assumed that the penalty for unconditional branches is 2 in both machines.

------ Begin Important Calculations -----

From the problem statement, we know the following about the machine with the branch-target buffer: 10% of conditional branches incur a 3-cycle penalty, and 9% incur a 4-cycle penalty.

That means that the penalty for conditional branches in the branch-target buffer machine is as follows:

<% cond branch> * (10% * 3 + 9% * 4) = <% cond branch> * 0.66

In the machine with a constant 2 cycle branch penalty, the penalty for conditional branches is:

<% cond branch> * 2

For unconditional branches, if you assigned a penalty for either machine, the penalty is:

<% uncond branch> * uncond branch penalty

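To compare performance, add the penalties you computed to 1 (the stated base CPI) for each machine, then divide the CPI of the fixed-penalty machine by the CPI of the branch-target buffer machine. The result is the speedup of the branch-target buffer machine.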

----- End Important Calculations -----

For the particular assumptions made here, the CPI for the branch-target buffer machine is computed as follows:

Conditional branch penalty = 11% * 0.66 = 0.0726

Unconditional branch penalty = 2% * 2 = 0.04

CPI = 1.1126

For the fixed penalty machine, the CPI is computed as follows:

Conditional branch penalty = 11% * 2 = 0.22

Unconditional branch penalty = 2% * 2 = 0.04

CPI = 1.26

Therefore, the branch-target buffer machine is 1.26/1.1126 = 1.13 or 13% faster.
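The arithmetic above can be checked with a short Python sketch. The variable and function names are illustrative only; the frequencies (11% conditional, 2% unconditional), the 2-cycle unconditional penalty, and the 0.66-cycle average conditional penalty are the assumptions stated in this solution, not values fixed by the problem.

```python
# Sketch of the H&P 6.9 arithmetic under this solution's assumptions.
BASE_CPI = 1.0          # stated base CPI
COND_FREQ = 0.11        # conditional branch frequency assumed in this solution
UNCOND_FREQ = 0.02      # unconditional branch frequency assumed in this solution
UNCOND_PENALTY = 2      # assumed unconditional branch penalty for both machines


def cpi(cond_penalty_per_branch):
    """CPI = base CPI + conditional branch stalls + unconditional branch stalls."""
    return (BASE_CPI
            + COND_FREQ * cond_penalty_per_branch
            + UNCOND_FREQ * UNCOND_PENALTY)


# Branch-target buffer machine: 10% of conditional branches cost 3 cycles,
# 9% cost 4 cycles, and the rest cost nothing.
btb_cond_penalty = 0.10 * 3 + 0.09 * 4   # 0.66 cycles per conditional branch

cpi_btb = cpi(btb_cond_penalty)          # about 1.1126
cpi_fixed = cpi(2)                       # about 1.26
speedup = cpi_fixed / cpi_btb            # about 1.13

print(f"BTB CPI = {cpi_btb:.4f}, fixed CPI = {cpi_fixed:.2f}, speedup = {speedup:.2f}")
```

Changing COND_FREQ, UNCOND_FREQ, or UNCOND_PENALTY shows how sensitive the 13% figure is to the data one chooses.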

H&P 6.10 Assume a mispredicted unconditional branch takes a 3 cycle penalty.

90% of unconditional branches will take 0 cycles (a penalty of -1 cycle relative to the base CPI of 1)

10% of unconditional branches will take a penalty of 3 cycles

From a base CPI of 1 cycle, the 0-delay machine has a CPI as follows:

1 - <% uncond branch> * 90% * 1 + <% uncond branch> * 10% * 3

For the DLX, CPI is 1-0.02*90%+0.02*10%*3 = 0.988

Therefore, speedup is 1.1/0.988 = 1.11 or approx 11% faster.
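As a quick check, the same calculation can be written as a small Python sketch. The names are illustrative; the 2% unconditional branch frequency, the 3-cycle misprediction penalty, and the reference CPI of 1.1 are taken as given from this solution.

```python
# Sketch of the H&P 6.10 arithmetic under this solution's assumptions.
BASE_CPI = 1.0        # base CPI of the 0-delay machine
UNCOND_FREQ = 0.02    # DLX unconditional branch frequency used in this solution
HIT_RATE = 0.90       # fraction of unconditional branches that take 0 cycles
MISS_PENALTY = 3      # assumed penalty for a mispredicted unconditional branch
REFERENCE_CPI = 1.1   # CPI of the machine being compared against, as used above

# A hit saves one full cycle (penalty of -1); a miss adds MISS_PENALTY cycles.
cpi_zero_delay = (BASE_CPI
                  - UNCOND_FREQ * HIT_RATE * 1
                  + UNCOND_FREQ * (1 - HIT_RATE) * MISS_PENALTY)

speedup = REFERENCE_CPI / cpi_zero_delay

print(f"0-delay CPI = {cpi_zero_delay:.3f}, speedup = {speedup:.2f}")
```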