CPSC 418: Solution 5

Home

Problem 1

(Part 1a)

For all machines, CPI is as follows:

CPI = %single_issue + %dual_issue/2 + %triple_issue/3 + %quad_issue/4

Plugging in the numbers, the CPIs are as follows:

current	Micro2/Comp2	0.75
option 1	Micro3/Comp2	0.51
option 2	Micro2/Comp3	0.60

The option Micro3/Comp2 (CPI=0.51) is faster than the Micro2/Comp3 (CPI=0.60) option.

To calculate how much faster Micro3/Comp2 is than the original Micro2/Comp2:

 
    (Slow Time - Fast Time) / Fast Time = (0.75 - 0.508) / 0.508 
                                        = 0.47

The faster option, Micro3/Comp2, is 47% faster than the original Micro2/Comp2.

(Part 1b)

To keep pace with doubling performance every year, you need a speedup (time_new/time_old) of 2^x after x years (for x>=0), or 2^(n/24) after n months.

Comp3 will take six months to develop, so Micro2/Comp3 requires a speedup of:

2^(6/24) = 1.19

or better to keep pace with competition. In other words, (time_new/time_old)=1.19. In terms of n% faster, this is 19% faster.

Micro3 will take two years to develop, so Micro3/Comp2 requires a speedup of:

2^(2/2) = 2

or better to keep pace with competition. In other words, (time_new/time_old)=2, or 100% faster.

Referring to question (1a), the Micro3/Comp2 is 47% faster than Micro2/Comp2, and 47%<100%. Therefore Micro3/Comp2 is not a good choice.

Micro2/Comp3 is:

(0.75 - 0.60)/0.60 = 0.25

25% faster and 25%>19%. Therefore Micro2/Comp3 is a good choice.

(Part 1c)

The assumptions can be found by looking at the formula for time to execute a trace:

Time = #Inst * Clock * CPI

We've examined the effects of a compiler on the percentage of instructions that can multi-issue and some of the effects of micro-architectural changes on CPI. But we ignored the effects of a compiler on #Inst, any changes associated with the clock speed, and other factors affecting CPI.

The most unrealistic assumptions are that:

As was discussed in HW-03 prob 3, moving to a superscalar machine will likely increase the CPI.
In the time that it takes to develop Micro3, clock speeds will likely increase significantly, which will boost performance and could make Micro3 a better option than Comp3.

Problem 2

 

Parameter                | Big=Better? |   Value            
----------------------------------------------------------- 
SPECint92                |   Yes       |     300   
----------------------------------------------------------- 
Fabrication feature size |    No       |  0.35um 
----------------------------------------------------------- 
# Pipe stages            |   Yes       |    5-6  
                         |(but only up | 
                         |  to ~10)    | 
----------------------------------------------------------- 
# Execution units        |   Yes       |      5 
----------------------------------------------------------- 
Clock speed              |   Yes       |  100MHz 
----------------------------------------------------------- 
L1 Cache                 |   Yes       |    16Kb  
----------------------------------------------------------- 
L2 Cache                 |   Yes       |   256Kb or 128Kb 
-----------------------------------------------------------

Problem 3

 
              CTRL                   EXAMPLE 
--------------------------------------------------------- 
ABS     O/S subroutines           Call 1589 RA 
        with hardcoded addrs      RA = PC + 1 
        (typically primitive      PC = 1589 
         funcs) 
--------------------------------------------------------- 
DISP    PC relative.              DJump 135(PC) 
        Compiler knows addr       DJump -10(PC) 
        of functions relative 
        to PC. 
        ------------------------------------------------- 
        Jump through table index
        where table address
        hardcoded      
         e.g. interrupt vector    DCall 1500(R3), RA
              call
        ------------------------------------------------- 
        Indirect (compute addr    DCall (R4), RA
        then do CTRL) 
--------------------------------------------------------- 
INDEX   Indirect jump through     ICall (R2,R3), RA
        table index if base 
        address of table is 
        not hardcoded.
--------------------------------------------------------- 
 
 
 
           LOAD/STORE                EXAMPLE 
--------------------------------------------------------- 
ABS     O/S data                  Load 18954, R3 
                                  R3 = M[18954] 
--------------------------------------------------------- 
DISP    O/S data stored           Load 18324(R4), R5
        in tables (hardcoded
        table addr) 
        ------------------------------------------------- 
        Indirect (compute addr    DLoad 0(R3), R8 
        first)                    R8 = M[R3] 
--------------------------------------------------------- 
INDEX   Array Indexing            ILoad (R3,R4), R5 
---------------------------------------------------------

Problem 4

R5,R6,R7 are the return address (RA), stack pointer (RA), and PC, respectively. Thus, they can not be used for normal variables in the program.

R0, R1 are used to pass parameters and are scratch registers. The other normal registers (R2,R3,R4) must be saved and restored if they are used by a function.

R0 R1 R2 R3 R4
g:entry M -- -- -- --
g:exit 2*M -- -- -- --
f:entry &X Y -- -- --
f:call g Y -- &X -- --
f:exit -- -- -- -- --
main:start -- -- &A &B --
main:call f &A B-2 &A &B --

	R0	R1	R2	R3	R4
g:entry	M	--	--	--	--
g:exit	2*M	--	--	--	--
f:entry	&X	Y	--	--	--
f:call g	Y	--	&X	--	--
f:exit	--	--	--	--	--
main:start	--	--	&A	&B	--
main:call f	&A	B-2	&A	&B	--

_g:     LMultx  R0, 2, R0           ; R0 = M*2
        DJump   (RA)

_f:     LSubx   SP, 2, SP           ; decrement SP for saved regs
        DStore  R2, 0(SP)           ; save R2 on stack
        DStore  RA, 1(SP)           ; save RA on stack
        LAddx   R0, 0, R2           ; R2 = &X
        LAddx   R1, 0, R0           ; R0 = Y
        Call    _g, RA
        DStore  R0, (R2)            ; M[&X] = g(Y)
        DLoad   1(SP), RA           ; restore RA
        DLoad   0(SP), R2           ; restore R2
        LAddx   SP, 2, SP           ; increment SP for restoring regs
        DJump   (RA)

_main:  ...
        DLoad   (R3), R1            ; R1 = B
        LAddx   3, R1, R0           ; R0 = B + 3
        DStore  R0, (R2)            ; M[&A] = R0
        LAddx   R2, 0, R0           ; R0 = &A
        LSubx   R1, 2, R1           ; R1 = B-2
        Call    _f, RA
        DLoad   (R2), R0            ; R0 = A
        DLoad   (R3), R1            ; R1 = B
        Addx    R0, R1, R0          ; R0 = A + B
        DStore  R0, (R2)            ; M[&A] = A + B

Note: In this case, we could have optimized away the second load of B into R1, by keeping a copy of B in R4. But in general, this is not a safe optimization, because we can't tell if f or g modify the data at address &B, possibly by following a pointer, that isn't necessarily related to B in any obvious way.

Note: a student raised a question about the old version of the code for saving and restoring registers:

Old code:

_f:     DStore  R2, -1(SP)          ; save R2 on stack
        DStore  RA, -2(SP)          ; save RA on stack
        LSubx   SP, 2, SP           ; decrement SP for saved regs
        ...
        LAddx   SP, 2, SP           ; increment SP for restoring regs
        DLoad   -2(SP), RA          ; restore RA
        DLoad   -1(SP), R2          ; restore R2

Student's question:

In your code, control over the stack is relinquished before the values are restored. If an interrupt occured after you increment the stack pointer but before either of your two Dloads, the interrupt routine could overwrite the values in the stack before they are restored to RA and R2. Thus your code is non-deterministic.

Very good point. The same thing is true for the way that R2 and RA are saved on the stack. However, in reality on all machines that I'm aware of, different processes and different interrupt levels use different stack areas (hence F&R save SP in hw) and so an interrupt or context switch would not cause a problem.

There is a real problem though in accessing memory beyond where the SP is pointing. If the stack is right at the edge of filling up a page in memory and requiring a new page, DStore R2, -1(SP) could cause a segmentation fault. This would be a very difficult bug to track down, because it would almost never happen. We'll talk about pages and virtual memory starting on 12 Feb.

Problem 5

Vector table base address: 1184
Keyboard vector: 2032

From the above, we can compute the Keyboard vector address:
1184+2032=3216.

This vector indicates that the keyboard routine is located at memory location M[3216] = 891254

Assumptions:

Keyboard interrupt level 1
memory mapped I/O with keyboard address at M[854] (note that 854 is an arbitrary value)
Key hit is [space]

Description of abbreviations:

INTR	signal to request interrupt
INT_ACK	signal to acknowledge interrupt request
IL	internal register to store active interupt levels

Listing of execution of events:

Key pressed on keyboard
INTR[1] = hi
Wait until (INT_ACK[1] = hi and keyboard has control of bus)
Within the 10 cycles of the previous line: IL[1] = hi
Address Bus = 2032
PC <- 891254
During the next (500*CPI) cycles, plus any time at other interrupt level:
1. Execute a few instructions to save registers
2. Execute: Load 854, Rx (Rx some arbitrary register)
3. 32 (ASCII code for [SPACE]) appears on data bus
4. Rest of interrupt routine execute
5. Restore registers
6. Execute RFI
If INT[1] = lo
Then PC = PC[0] (PC for IL0)
Else INT_ACK[1] = hi

Problem 6

Allocate one page for a queue. Both producer and consumer have read/write privileges on page. The page also contains a semaphore variable that controls acess to the queue. Producer and consumer use LODSET base busy-wait before adding or deleting elements from queue.

Problem 7

LODSET is not a good way to lock files because files reside on disk, and LODSET semaphores are in memeory. When two users try to write to the same file, they are often on separate machines and so they cannot access each other's memory. The semaphore must reside on a shared storage medium such as the disk where the file actually resides.

Home

Last modified: 5 Feb 96