CPSC 418: Solution 2
Problem 1
(Part 1)
CPI_current = [% LD/ST Inst x LD/ST CPI] + [% Other Inst x CPI Other]
CPI_current = [0.26 x 2.3] + [0.74 x 1.5] = 1.708
Option 1:
CPI_new = [% LD/ST Inst x (LD/ST CPI reduced by 20%)] + [% Other Inst x CPI Other]
CPI_new = [0.26 x (2.3 - 0.2(2.3))] + [0.74 x 1.5] = 1.588
Compare Performance_new with Performance_current:
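CPU_time_current = #inst x CPI_current x speed
= #inst x 1.708 x speed
= 1.708 (normalized so that #inst x speed = 1)
CPU_time_new = #inst x CPI_new x speed
= #inst x 1.588 x speed
= 1.588 (same normalization)
Note that "speed" here stands for the clock cycle time, and that the
#inst and speed parameters are constant in Option 1.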
Option 2:
In this case the speed (clock cycle time) changes, whereas in Option 1
the CPI value changed.
speed_new = 0.9 x speed_current
Compare Performance_new with Performance_current:
CPU_time_current = #inst x CPI_current x speed_current
= #inst x 1.708 x speed_current
= 1.708 (normalized so that #inst x speed_current = 1)
CPU_time_new = #inst x CPI_current x speed_new
= #inst x 1.708 x (0.9 x speed_current)
= 1.537 (same normalization)
From the above calculations,
CPU_time_option2 (1.537) < CPU_time_option1 (1.588);
=> The CPU in Option 2 is faster, so Option 2 will result in a
microprocessor with higher performance.
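The Part 1 arithmetic can be checked with a short script. This is only a
minimal sketch, not part of the original solution: the function and
variable names are illustrative, and CPU time is normalized so that
#inst x speed_current = 1.

```python
# Instruction mix and per-class CPI values taken from the solution above.
LDST_FRACTION = 0.26
OTHER_FRACTION = 0.74
LDST_CPI = 2.3
OTHER_CPI = 1.5


def average_cpi(ldst_cpi, other_cpi):
    """Weighted-average CPI over the two instruction classes."""
    return LDST_FRACTION * ldst_cpi + OTHER_FRACTION * other_cpi


def cpu_time(cpi, cycle_time=1.0, inst_count=1.0):
    """CPU time = #inst x CPI x cycle time, in normalized units."""
    return inst_count * cpi * cycle_time


cpi_current = average_cpi(LDST_CPI, OTHER_CPI)
# Option 1: load/store CPI reduced by 20%, everything else unchanged.
cpi_option1 = average_cpi(LDST_CPI * (1 - 0.2), OTHER_CPI)

time_current = cpu_time(cpi_current)
time_option1 = cpu_time(cpi_option1)
# Option 2: CPI unchanged, "speed" (cycle time) scaled by 0.9.
time_option2 = cpu_time(cpi_current, cycle_time=0.9)

print(f"current:  {time_current:.3f}")
print(f"option 1: {time_option1:.3f}")
print(f"option 2: {time_option2:.3f}")
```

The three printed values should match the 1.708, 1.588, and 1.537 figures
used in the comparison.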
(Part 2)
A is n% faster than B, where
n = [ (Time(B) - Time(A)) / Time(A) ] x 100%
n = [ (1.708 - 1.537) / 1.537 ] x 100 = 11.1%
(without rounding the CPU times, n = (1/0.9 - 1) x 100 = 11.11%)
=> Option 2 (the faster option) is about 11.1% faster than the current
microprocessor.
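The "n% faster" definition translates directly into a small helper. The
sketch below is illustrative only (the function name is made up here); it
uses the unrounded Option 2 time, so it prints the percentage to two
decimals rather than relying on the rounded 1.537.

```python
def percent_faster(time_slow, time_fast):
    """Return n such that the faster machine is n% faster than the slower one."""
    return (time_slow - time_fast) / time_fast * 100.0


# Normalized CPU times from Part 1 (current design vs. Option 2).
time_current = 1.708
time_option2 = 1.708 * 0.9

print(f"{percent_faster(time_current, time_option2):.2f}%")
```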
Problem 2
The following graphs show log2(SPECint92) and log2(SPECfp92) vs time
for four different families of microprocessors: DEC Alpha (axp); Intel
x86 (intel); Mips R3000, 4000, 4400; and Sun Sparc and HyperSparc.
The diagonal lines in order of decreasing steepness correspond to
performance doubling every year, every two years, every three years,
and every four years.
From the graphs we conclude that:
CPU Family    | Approx Period to Double Performance
---------------------------------------------------
DEC Alpha     | 2.0 years
Intel x86     | 1.5 years
MIPS R3/4000  | 2.5 years
Sun Sparc     | 2.0 years
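A doubling period of T years is equivalent to performance growing by a
factor of 2^(1/T) each year, which is a slope of 1/T on the log2 plots.
The sketch below converts the approximate periods in the table into
annual growth rates; the helper name is illustrative and the periods are
the rough values read off the graphs, so the output is only as precise as
those estimates.

```python
# Approximate doubling periods (years) read from the table above.
DOUBLING_PERIOD_YEARS = {
    "DEC Alpha": 2.0,
    "Intel x86": 1.5,
    "MIPS R3/4000": 2.5,
    "Sun Sparc": 2.0,
}


def annual_growth_percent(doubling_period_years):
    """Percent performance increase per year for a given doubling period."""
    return (2.0 ** (1.0 / doubling_period_years) - 1.0) * 100.0


for family, period in DOUBLING_PERIOD_YEARS.items():
    print(f"{family:<13} {annual_growth_percent(period):5.1f}% per year")
```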
One hypothesis that we can consider based on this data is that the
Intel x86, being a low-cost mass-market product, has been able to
increase its performance more rapidly than the other architectures
because the x86 has incorporated existing micro-architectural features,
such as pipelining, superscalar issue, and register renaming, into its
microprocessors. In comparison, the other CPUs have been targeted
at the high performance workstation market from the beginning and so
have always had to break new ground to increase performance. As the
x86 begins to catch up in the sophistication of its design, we may
begin to see a decline in the rate at which its performance increases.
The solutions used SPECint92 and SPECfp92 to evaluate performance.
For most microprocessors, the increases in floating-point and integer
performance have closely paralleled each other. For the Alphas,
floating-point performance is consistently better than integer.
For x86s the inverse is true, but that is beginning to change.
Sparcs and MIPS are more closely balanced between integer and floating point.
SPEC92 is a good judge of performance because it uses a mixture of
real-world programs with real-world data. The programs are large enough
and there are enough of them that it is somewhat difficult for a
compiler to implement SPEC-specific optimizations (which would
mean that the SPEC numbers would not reflect users' experience
when running other programs), but this does happen and so SPEC92
is being replaced with SPEC95.
In looking at the various factors that have changed over time and how much
they have improved performance, it is clear that most of the increase
in performance is due to decreases in feature size.
There is no clear indication that the rate at which performance increases
has slowed down in the last five years. For the x86, the rate has
increased. Over the past 20 years the rate has probably decreased
somewhat, but it is hard to get precise performance data that
can be used to accurately compare performance over that range
of time.
Problem 3
(Part a)
Technology parameters from Handout 5 are: Vdd, Tech, and Metal.
Microprocessors using the same technology are:
=> Alpha 21064a-1, 21064a-2, 21066a, 21164
(Part b)
Design differences between the 21064 and the 21064a:
- The 21064a uses a better semiconductor technology
(improvement from 0.75um to 0.5um, refer to calculations below).
- The chip size has significantly decreased for the 21064a because
of the better technology used, while the number of transistors has
significantly increased mainly due to the improved memory component of
the chip - the cache. The number of transistors added per bit of
additional cache is: [(2.8 - 1.68) x 10^6 / (16x1024x8)], or about 8.5
transistors/bit.
- The 21064a has a higher clock rate (at 275-300 MHz) than the
21064 (at 200 MHz).
- The 21064a has also doubled the cache size, which directly improves
performance.
- Performance numbers of the 21064 are lower than the 21064a
(SPEC-92 values) meaning that the 21064a has performance improvements
(refer to calculations below).
- Note that the tradeoff of the performance improvements made is
shown partly in the power consumption number. The power consumption
of the 21064a has slightly increased.
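The transistors-per-bit estimate in the cache bullet can be evaluated
directly. This is a minimal sketch under two assumptions that are labeled
in the code: the 2.8 and 1.68 figures are transistor counts in millions,
and the doubled cache adds 16 KB of storage. Variable names are
illustrative.

```python
# Assumption: the 2.8 and 1.68 figures are transistor counts in millions.
TRANSISTORS_21064A = 2.8e6
TRANSISTORS_21064 = 1.68e6
# Assumption: doubling the cache adds 16 KB, i.e. 16 x 1024 x 8 bits.
ADDED_CACHE_BITS = 16 * 1024 * 8

added_transistors = TRANSISTORS_21064A - TRANSISTORS_21064
transistors_per_bit = added_transistors / ADDED_CACHE_BITS

print(f"{transistors_per_bit:.2f} transistors per added cache bit")
```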
Note:
- theoretically, clock_speed is directly proportional to performance
- theoretically, clock_speed is inversely proportional to feature
size (um values).
Calculations for 21064 -> 21064a-1:
Clock speed vs. feature size:
(275/200) / (0.75/0.5)
=> 0.92
=> clock speed did not increase as much as feature size decreased.
Probably DEC made a design tradeoff and increased performance in
some other way as well.
Integer performance vs. clock speed:
(194/133) / (275/200)
=> 1.06
=> Small increase in integer performance beyond that contributed by
clock speed.
Integer performance vs. feature size:
(194/133) / (0.75/0.5)
=> 0.97
=> Integer performance did not increase as much as feature size
decreased. This is surprising and it would be interesting to find out
why this happened. Unfortunately, with the information we have, we
can't find out...
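The three ratios for 21064 -> 21064a-1 all follow the same pattern: one
improvement factor divided by another. The sketch below recomputes them
from the figures quoted in the calculations; the helper name is
illustrative, and results are printed rounded to two decimals.

```python
def gain(new, old):
    """Improvement factor when going from old to new (larger is better)."""
    return new / old


# Figures quoted in the calculations above.
clock_gain = gain(275, 200)      # clock rate, MHz
specint_gain = gain(194, 133)    # SPECint92
shrink_factor = 0.75 / 0.5       # feature size in um, old / new

clock_vs_feature = clock_gain / shrink_factor
specint_vs_clock = specint_gain / clock_gain
specint_vs_feature = specint_gain / shrink_factor

print(f"clock speed vs. feature size: {clock_vs_feature:.2f}")
print(f"integer perf. vs. clock speed: {specint_vs_clock:.2f}")
print(f"integer perf. vs. feature size: {specint_vs_feature:.2f}")
```

A ratio below 1 means the numerator improved less than the denominator
did; a ratio above 1 means it improved more.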
(Part c)
                     -------------------------------
                     | clock speed | Major | Minor |
----------------------------------------------------
21064 -> 21064a-1    |             |   X   |       |
----------------------------------------------------
21064a-1 -> 21064a-2 |      X      |       |       |
----------------------------------------------------
21064a-2 -> 21164-1  |             |   X   |       |
----------------------------------------------------
21164-1 -> 21164-2   |             |       |   X   |
----------------------------------------------------
21164-2 -> 21164a    |             |       |   X   |
----------------------------------------------------
Numeric calculations are shown only for the 2nd and 3rd transitions.
21064 -> 21064a-1
Major implementation improvements include doubled cache size, improved
technology, smaller chip size, faster clock speed, and higher
performance numbers according to SPEC-92 figures.
21064a-1 -> 21064a-2
The improvement made in this transition is purely in clock speed.
Clock speed vs. feature size:
(300/275) / (0.5/0.5)
=> 1.09
=> Logic optimization allowed an increase in clock speed without
changing feature size.
Integer performance vs. clock speed:
(220/194) / (300/275)
=> 1.04
=> The increase in integer performance is almost entirely due to the
increase in clock speed.
21064a-2 -> 21164-1
Major implementation improvements include higher performance numbers
according to SPEC-92 figures, quad-issuing (4 instructions/cycle), and
multiple cache levels.
Clock speed vs. feature size:
(300/300) / (0.5/0.5)
=> 1
=> No improvements in logic optimization.
Integer performance vs. clock speed:
(341/220) / (300/300)
=> 1.55
=> Increase in integer performance due to more than just clock speed.
Moved from dual-issue to quad issue and added on-chip L2 cache (see
footnote 1).
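The ratios for the 2nd and 3rd transitions can be produced from a small
table of chip parameters instead of being worked out one at a time. This
is a hedged sketch: the tuple layout and function name are invented for
illustration, and the numbers are simply those quoted in the calculations
in this part.

```python
# (name, clock MHz, feature size um, SPECint92) as quoted in this solution.
CHIPS = [
    ("21064a-1", 275, 0.5, 194),
    ("21064a-2", 300, 0.5, 220),
    ("21164-1", 300, 0.5, 341),
]


def transition_ratios(old, new):
    """Return (clock vs. feature size, SPECint vs. clock) for one transition."""
    _, old_mhz, old_um, old_spec = old
    _, new_mhz, new_um, new_spec = new
    clock_gain = new_mhz / old_mhz
    shrink_factor = old_um / new_um
    spec_gain = new_spec / old_spec
    return clock_gain / shrink_factor, spec_gain / clock_gain


# Walk over consecutive pairs of chips in the list.
for old, new in zip(CHIPS, CHIPS[1:]):
    clock_vs_feature, spec_vs_clock = transition_ratios(old, new)
    print(f"{old[0]} -> {new[0]}: "
          f"clock/feature = {clock_vs_feature:.2f}, "
          f"SPECint/clock = {spec_vs_clock:.2f}")
```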
21164-1 -> 21164-2
Minor implementation improvements include a slightly faster clock speed
and higher performance numbers (SPEC-92 figures).
21164-2 -> 21164a
Minor implementation improvements include a faster clock speed and
higher performance figures. The move to the better technology used in
the 21164a is referred to as a "shrink" and does not necessarily mean
that the design was changed significantly.
(Part d)
No, the 21066a is not a technological disaster. Microprocessors are
designed for a specific purpose or usage. The Alpha 21066a
microprocessor has been designed to be a low-cost, low-power
consumption microprocessor. The tradeoff of designing and
implementing a low-cost machine means having a lower clock speed and
smaller performance numbers. The 21066a was probably designed for
small machines such as laptops or embedded controllers.
Last modified: 17 Jan 96