Prev | Home | Next |
CPI = %single_issue + %dual_issue/2 + %triple_issue/3 + %quad_issue/4
current | Micro2/Comp2 | 0.75 |
---|---|---|
option 1 | Micro3/Comp2 | 0.51 |
option 2 | Micro2/Comp3 | 0.60 |
The option Micro3/Comp2 (CPI=0.51) is faster than the Micro2/Comp3 (CPI=0.60) option.
To calculate how much faster Micro3/Comp2 is than the original Micro2/Comp2:
(Slow Time - Fast Time) / Fast Time = (0.75 - 0.508) / 0.508 = 0.47The faster option, Micro3/Comp2, is 47% faster than the original Micro2/Comp2.
Comp3 will take six months to develop, so Micro2/Comp3 requires a speedup of:
2^(6/24) = 1.19or better to keep pace with competition. In other words, (time_new/time_old)=1.19. In terms of n% faster, this is 19% faster.
Micro3 will take two years to develop, so Micro3/Comp2 requires a speedup of:
2^(2/2) = 2or better to keep pace with competition. In other words, (time_new/time_old)=2, or 100% faster.
Referring to question (1a), the Micro3/Comp2 is 47% faster than Micro2/Comp2, and 47%<100%. Therefore Micro3/Comp2 is not a good choice.
Micro2/Comp3 is:
(0.75 - 0.60)/0.60 = 0.2525% faster and 25%>19%. Therefore Micro2/Comp3 is a good choice.
Time = #Inst * Clock * CPIWe've examined the effects of a compiler on the percentage of instructions that can multi-issue and some of the effects of micro-architectural changes on CPI. But we ignored the effects of a compiler on #Inst, any changes associated with the clock speed, and other factors affecting CPI.
The most unrealistic assumptions are that:
Parameter | Big=Better? | Value ----------------------------------------------------------- SPECint92 | Yes | 300 ----------------------------------------------------------- Fabrication feature size | No | 0.35um ----------------------------------------------------------- # Pipe stages | Yes | 5-6 |(but only up | | to ~10) | ----------------------------------------------------------- # Execution units | Yes | 5 ----------------------------------------------------------- Clock speed | Yes | 100MHz ----------------------------------------------------------- L1 Cache | Yes | 16Kb ----------------------------------------------------------- L2 Cache | Yes | 256Kb or 128Kb -----------------------------------------------------------
CTRL EXAMPLE --------------------------------------------------------- ABS O/S subroutines Call 1589 RA with hardcoded addrs RA = PC + 1 (typically primitive PC = 1589 funcs) --------------------------------------------------------- DISP PC relative. DJump 135(PC) Compiler knows addr DJump -10(PC) of functions relative to PC. ------------------------------------------------- Jump through table index where table address hardcoded e.g. interrupt vector DCall 1500(R3), RA call ------------------------------------------------- Indirect (compute addr DCall (R4), RA then do CTRL) --------------------------------------------------------- INDEX Indirect jump through ICall (R2,R3), RA table index if base address of table is not hardcoded. --------------------------------------------------------- LOAD/STORE EXAMPLE --------------------------------------------------------- ABS O/S data Load 18954, R3 R3 = M[18954] --------------------------------------------------------- DISP O/S data stored Load 18324(R4), R5 in tables (hardcoded table addr) ------------------------------------------------- Indirect (compute addr DLoad 0(R3), R8 first) R8 = M[R3] --------------------------------------------------------- INDEX Array Indexing ILoad (R3,R4), R5 ---------------------------------------------------------
R5,R6,R7 are the return address (RA), stack pointer (RA), and PC, respectively. Thus, they can not be used for normal variables in the program.
R0, R1 are used to pass parameters and are scratch registers. The other normal registers (R2,R3,R4) must be saved and restored if they are used by a function.
R0 | R1 | R2 | R3 | R4 | |
---|---|---|---|---|---|
g:entry | M | -- | -- | -- | -- |
g:exit | 2*M | -- | -- | -- | -- |
f:entry | &X | Y | -- | -- | -- |
f:call g | Y | -- | &X | -- | -- |
f:exit | -- | -- | -- | -- | -- |
main:start | -- | -- | &A | &B | -- |
main:call f | &A | B-2 | &A | &B | -- |
_g: LMultx R0, 2, R0 ; R0 = M*2 DJump (RA) _f: LSubx SP, 2, SP ; decrement SP for saved regs DStore R2, 0(SP) ; save R2 on stack DStore RA, 1(SP) ; save RA on stack LAddx R0, 0, R2 ; R2 = &X LAddx R1, 0, R0 ; R0 = Y Call _g, RA DStore R0, (R2) ; M[&X] = g(Y) DLoad 1(SP), RA ; restore RA DLoad 0(SP), R2 ; restore R2 LAddx SP, 2, SP ; increment SP for restoring regs DJump (RA) _main: ... DLoad (R3), R1 ; R1 = B LAddx 3, R1, R0 ; R0 = B + 3 DStore R0, (R2) ; M[&A] = R0 LAddx R2, 0, R0 ; R0 = &A LSubx R1, 2, R1 ; R1 = B-2 Call _f, RA DLoad (R2), R0 ; R0 = A DLoad (R3), R1 ; R1 = B Addx R0, R1, R0 ; R0 = A + B DStore R0, (R2) ; M[&A] = A + BNote: In this case, we could have optimized away the second load of B into R1, by keeping a copy of B in R4. But in general, this is not a safe optimization, because we can't tell if f or g modify the data at address &B, possibly by following a pointer, that isn't necessarily related to B in any obvious way.
Note: a student raised a question about the old version of the code for saving and restoring registers:
Old code:
_f: DStore R2, -1(SP) ; save R2 on stack DStore RA, -2(SP) ; save RA on stack LSubx SP, 2, SP ; decrement SP for saved regs ... LAddx SP, 2, SP ; increment SP for restoring regs DLoad -2(SP), RA ; restore RA DLoad -1(SP), R2 ; restore R2Student's question:
In your code, control over the stack is relinquished before the values are restored. If an interrupt occured after you increment the stack pointer but before either of your two Dloads, the interrupt routine could overwrite the values in the stack before they are restored to RA and R2. Thus your code is non-deterministic.Very good point. The same thing is true for the way that R2 and RA are saved on the stack. However, in reality on all machines that I'm aware of, different processes and different interrupt levels use different stack areas (hence F&R save SP in hw) and so an interrupt or context switch would not cause a problem.
There is a real problem though in accessing memory beyond where the SP is pointing. If the stack is right at the edge of filling up a page in memory and requiring a new page, DStore R2, -1(SP) could cause a segmentation fault. This would be a very difficult bug to track down, because it would almost never happen. We'll talk about pages and virtual memory starting on 12 Feb.
This vector indicates that the keyboard routine is located at memory location M[3216] = 891254
Assumptions:
INTR | signal to request interrupt |
---|---|
INT_ACK | signal to acknowledge interrupt request |
IL | internal register to store active interupt levels |
Prev | Home | Next |