CPSC 418: Solution 2


Problem 1

(Part 1)

CPI_current = [% LD/ST Inst x LD/ST CPI] + [% Other Inst x CPI Other]
CPI_current = [0.26 x 2.3] + [0.74 x 1.5] = 1.708

Option 1:

CPI_new = [% LD/ST Inst x reduced CPI by 20%] + [% Other Inst x CPI Other]
CPI_new = [0.26 x (2.3 - 0.2(2.3))] + [0.74 x 1.5] = 1.588

Compare Performance_new with Performance_current:

	CPU_time_current = #inst x CPI_current x speed
			 = #inst x 1.708 x speed
			 = 1.708

	CPU_time_new     = #inst x CPI_new x speed
			 = #inst x 1.588 x speed
			 = 1.588

Note that the #inst and speed parameters are constant in Option 1, so both are normalized to 1. Here "speed" denotes the clock cycle time, so a smaller value means a faster processor.

Option 2:

In this case the speed (the clock cycle time) changes, whereas in Option 1 the CPI value changed.

	speed_new = 0.9 x speed_current

Compare Performance_new with Performance_current:

	CPU_time_current = #inst x CPI_current x speed_current
			 = #inst x 1.708 x speed_current
			 = 1.708

	CPU_time_new     = #inst x CPI_current x speed_new
			 = #inst x 1.708 x (0.9 x speed_current)
			 = 1.537

From the above calculations, CPU_time_option2 (1.537) < CPU_time_option1 (1.588), so the CPU in Option 2 is faster.
=> Option 2 will result in a microprocessor with higher performance.
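
The Part 1 arithmetic can be checked with a short script. This is only a sketch: the function names weighted_cpi and cpu_time are made up for illustration, and #inst and the cycle time of the current design are both normalized to 1, as in the calculation above.

```python
def weighted_cpi(ldst_frac, ldst_cpi, other_cpi):
    """Average CPI for a mix of LD/ST instructions and all other instructions."""
    return ldst_frac * ldst_cpi + (1.0 - ldst_frac) * other_cpi


def cpu_time(cpi, cycle_time=1.0, n_inst=1.0):
    """CPU time = #inst x CPI x clock cycle time (normalized units)."""
    return n_inst * cpi * cycle_time


# Current design: 26% LD/ST at CPI 2.3, 74% other at CPI 1.5.
cpi_current = weighted_cpi(0.26, 2.3, 1.5)

# Option 1: LD/ST CPI reduced by 20%, cycle time unchanged.
cpi_option1 = weighted_cpi(0.26, 2.3 * (1 - 0.20), 1.5)

time_current = cpu_time(cpi_current)
time_option1 = cpu_time(cpi_option1)

# Option 2: CPI unchanged, cycle time scaled by 0.9.
time_option2 = cpu_time(cpi_current, cycle_time=0.9)

for name, t in [("current", time_current),
                ("option 1", time_option1),
                ("option 2", time_option2)]:
    print(f"{name:9s} CPU time = {t:.3f}")
```

The smallest CPU time identifies the faster option.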

(Part 2)

A is n% faster than B, where:

	n = [ (Time(B) - Time(A)) / Time(A) ] x 100%

Using the unrounded Option 2 time (1.708 x 0.9 = 1.5372):

	[ (1.708 - 1.5372) / 1.5372 ] x 100% = 11.11%

=> Option 2 (the faster option) is 11.11% faster than the current microprocessor.
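
The "n% faster" definition is easy to get backwards (dividing by the slower time instead of the faster one), so a small sketch may help. The function name percent_faster is hypothetical; the times are the normalized CPU times from Part 1, with the Option 2 time left unrounded.

```python
def percent_faster(time_slow, time_fast):
    """How many percent faster the machine with time_fast is.

    Follows the definition n = (Time(B) - Time(A)) / Time(A) x 100,
    where A is the faster machine, so the denominator is the smaller time.
    """
    return (time_slow - time_fast) / time_fast * 100.0


time_current = 1.708          # normalized CPU time of the current design
time_option2 = 1.708 * 0.9    # Option 2: same CPI, cycle time scaled by 0.9

n = percent_faster(time_current, time_option2)
print(f"Option 2 is {n:.2f}% faster than the current design")
```

Because the CPI cancels out, the result depends only on the 0.9 cycle-time factor: n = (1/0.9 - 1) x 100.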

Problem 2

The following graphs show log2(SPECint92) and log2(SPECfp92) vs. time for four different families of microprocessors: DEC Alpha (axp); Intel x86 (intel); MIPS R3000, 4000, 4400; and Sun Sparc and HyperSparc. The diagonal lines, in order of decreasing steepness, correspond to performance doubling every year, every two years, every three years, and every four years.

From the graphs we conclude that:

CPU Family	Approx Period to Double Performance
DEC Alpha	2.0 years
Intel x86	1.5 years
MIPS R3/4000	2.5 years
Sun Sparc	2.0 years
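
On a log2 plot, the slope of a line is the number of doublings per year, so the doubling period is the reciprocal of the slope. A minimal sketch of that reading follows. The function name doubling_period is made up, and the two data points are hypothetical values chosen for easy arithmetic, not readings from the graphs.

```python
import math


def doubling_period(year0, score0, year1, score1):
    """Years needed to double performance, from two (year, score) points.

    log2(score1 / score0) is the number of doublings between the two
    points; dividing the elapsed years by it gives years per doubling.
    """
    doublings = math.log2(score1 / score0)
    return (year1 - year0) / doublings


# Hypothetical example: a score that goes from 50 to 200 in four years
# has doubled twice, i.e. once every two years.
period = doubling_period(1991, 50.0, 1995, 200.0)
print(period)  # 2.0
```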

One hypothesis that we can consider based on this data is that the Intel x86, being a low-cost mass-market product, has been able to increase its performance more rapidly than the other architectures because the x86 has incorporated existing micro-architectural features, such as pipelining, superscalar issue, and register renaming, into its microprocessors. In comparison, the other CPUs have been targeted at the high-performance workstation market from the beginning and so have always had to break new ground to increase performance. As the x86 begins to catch up in the sophistication of its design, we may begin to see a decline in the rate at which its performance increases.

The solutions used SPECint92 and SPECfp92 to evaluate performance. For most microprocessors, the increases in floating-point and integer performance have closely paralleled each other. For the Alphas, floating-point performance is consistently better than integer. For x86s the inverse is true, but that is beginning to change. Sparcs and MIPS are more closely balanced between integer and floating point.

SPEC92 is a good judge of performance because it uses a mixture of real-world programs with real-world data. The programs are large enough, and there are enough of them, that it is somewhat difficult for a compiler to implement SPEC-specific optimizations (which would mean that the SPEC numbers would not reflect users' experience when running other programs). This does happen, however, and so SPEC92 is being replaced with SPEC95.

In looking at the various factors that have changed over time and how much they have improved performance, it is clear that most of the increase in performance is due to decreases in feature size.

There is no clear indication that the rate at which performance increases has slowed down in the last five years. For the x86, the rate has increased. Over the past 20 years the rate has probably decreased somewhat, but it is hard to get precise performance data that can be used to accurately compare performance over that range of time.


Problem 3

(Part a)

Technology parameters from Handout 5 are: Vdd, Tech, and Metal. Microprocessors using the same technology are:
=> Alpha 21064a-1, 21064a-2, 21066a, 21164

(Part b)

Design differences between the 21064 and the 21064a:

The 21064a uses a better semiconductor technology (improvement from 0.75um to 0.5um, refer to calculations below).
The chip size has significantly decreased for the 21064a because of the better technology used, while the number of transistors has significantly increased, mainly due to the improved memory component of the chip (the cache). The number of transistors per bit that went into the added memory is: [(2.8 - 1.68) x 10^6 / (16 x 1024 x 8)] = approx. 8.5 transistors/bit.
The 21064a has a higher clock rate (275-300 MHz) than the 21064 (200 MHz).
The 21064a has also doubled the cache size, thus directly improving performance.
The SPEC-92 performance numbers of the 21064 are lower than those of the 21064a, meaning that the 21064a has performance improvements (refer to calculations below).
Note that the tradeoff for these performance improvements shows partly in the power consumption number: the power consumption of the 21064a has slightly increased.
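
The transistors-per-bit estimate from the list above can be worked out numerically. This sketch assumes that the transistor counts 1.68 and 2.8 are in millions and that the added cache is 16 KB (16 x 1024 x 8 bits); the variable names are made up for illustration.

```python
# Assumption: transistor counts are in millions of transistors.
transistors_21064 = 1.68e6
transistors_21064a = 2.8e6

# Assumption: the cache grew by 16 KB, expressed here in bits.
added_cache_bits = 16 * 1024 * 8

# Attribute all of the extra transistors to the extra cache bits.
added_transistors = transistors_21064a - transistors_21064
per_bit = added_transistors / added_cache_bits

print(f"{per_bit:.2f} transistors per added cache bit")
```

This is an upper bound on the cost per bit, since some of the added transistors may have gone into logic rather than cache.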

Note:

theoretically, clock_speed is directly proportional to performance
theoretically, clock_speed is inversely proportional to feature size (um values).

Calculations for 21064 -> 21064a-1:

Clock speed vs. feature size:
(275/200) / (0.75/0.5)
=> 0.92
=> clock speed did not increase as much as feature size decreased. Probably DEC made a design tradeoff and increased performance in some other way as well.

Integer performance vs. clock speed:
(194/133) / (275/200)
=> 1.06
=> Small increase in integer performance beyond that contributed by clock speed.

Integer performance vs. feature size
(194/133) / (0.75/0.5)
=> 0.97
=> Integer performance did not increase as much as feature size decreased. This is surprising and it would be interesting to find out why this happened. Unfortunately, with the information we have, we can't find out...
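
The three ratios for 21064 -> 21064a-1 can be recomputed as follows. This is a sketch using only the figures quoted in the calculations above (clock in MHz, feature size in um, integer SPEC-92 score); the variable names are made up. Note that the feature-size gain is old/new, because a smaller feature size is the improvement.

```python
# Figures quoted in the calculations above.
old_mhz, new_mhz = 200, 275      # clock rate, MHz
old_um, new_um = 0.75, 0.5       # feature size, um
old_int, new_int = 133, 194      # integer SPEC-92 score

clock_gain = new_mhz / old_mhz   # how much faster the clock is
shrink_gain = old_um / new_um    # how much smaller the features are
int_gain = new_int / old_int     # how much higher the integer score is

clock_vs_feature = clock_gain / shrink_gain
int_vs_clock = int_gain / clock_gain
int_vs_feature = int_gain / shrink_gain

print(f"clock speed vs. feature size: {clock_vs_feature:.2f}")
print(f"integer perf vs. clock speed: {int_vs_clock:.2f}")
print(f"integer perf vs. feature size: {int_vs_feature:.2f}")
```

A ratio below 1 means the numerator improved less than the denominator; a ratio above 1 means it improved more.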

(Part c)

                     -------------------------------
                     | clock speed | Major | Minor |
----------------------------------------------------
21064 -> 21064a-1    |             |   X   |       |
----------------------------------------------------
21064a-1 -> 21064a-2 |      X      |       |       |
----------------------------------------------------
21064a-2 -> 21164-1  |             |   X   |       |
----------------------------------------------------
21164-1 -> 21164-2   |             |       |   X   |
----------------------------------------------------
21164-2 -> 21164a    |             |       |   X   |
----------------------------------------------------

Numeric calculations are shown only for the 2nd and 3rd transitions.

21064 -> 21064a-1

Major implementation improvements include doubled cache size, improved technology, smaller chip size, faster clock speed, and higher performance numbers according to SPEC-92 figures.

21064a-1 -> 21064a-2

The improvement made in this transition is purely in clock speed.

Clock speed vs. feature size:
(300/275) / (0.5/0.5)
=> 1.09
=> Logic optimization allowed an increase in clock speed without changing feature size.

Integer performance vs. clock speed:
(220/194) / (300/275)
=> 1.04
=> The increase in integer performance is almost entirely due to the increase in clock speed.

21064a-2 -> 21164-1

Major implementation improvements include higher performance numbers according to SPEC-92 figures, quad-issuing (4 instructions/cycle), and multiple cache levels.

Clock speed vs. feature size:
(300/300) / (0.5/0.5)
=> 1
=> No improvements in logic optimization.

Integer performance vs. clock speed:
(341/220) / (300/300)
=> 1.55
=> Increase in integer performance due to more than just clock speed. Moved from dual-issue to quad issue and added on-chip L2 cache (see footnote 1).
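
For several successive transitions it is convenient to keep the chip data in a table and walk over neighbouring pairs. The sketch below does this for the 2nd and 3rd transitions, using only the clock rates, feature sizes, and integer scores quoted in the calculations above. The names chips and transition_ratios are made up for illustration.

```python
# (name, clock in MHz, feature size in um, integer SPEC-92 score),
# taken from the calculations for the 2nd and 3rd transitions.
chips = [
    ("21064a-1", 275, 0.5, 194),
    ("21064a-2", 300, 0.5, 220),
    ("21164-1", 300, 0.5, 341),
]


def transition_ratios(old, new):
    """Return (clock vs. feature size, integer perf vs. clock speed)."""
    _, old_mhz, old_um, old_int = old
    _, new_mhz, new_um, new_int = new
    clock_gain = new_mhz / old_mhz
    shrink_gain = old_um / new_um      # old/new: smaller is better
    int_gain = new_int / old_int
    return clock_gain / shrink_gain, int_gain / clock_gain


results = {}
for old, new in zip(chips, chips[1:]):
    label = f"{old[0]} -> {new[0]}"
    results[label] = transition_ratios(old, new)
    clock_vs_feature, int_vs_clock = results[label]
    print(f"{label}: clock/feature = {clock_vs_feature:.2f}, "
          f"int/clock = {int_vs_clock:.2f}")
```

An int/clock ratio close to 1 points to a clock-speed-only change, while a ratio well above 1 points to an architectural change.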

21164-1 -> 21164-2

Minor implementation improvements include a slightly faster clock speed and higher performance numbers (SPEC-92 figures).

21164-2 -> 21164a

Minor implementation improvements include a faster clock speed and higher performance figures. The better technology used in the 21164a is referred to as a "shrink" and does not necessarily mean that the design was changed significantly.

(Part d)

No, the 21066a is not a technological disaster. Microprocessors are designed for a specific purpose or usage. The Alpha 21066a microprocessor has been designed to be a low-cost, low-power consumption microprocessor. The tradeoff of designing and implementing a low-cost machine means having a lower clock speed and smaller performance numbers. The 21066a was probably designed for small machines such as laptops or embedded controllers.


Last modified: 17 Jan 96