1. For architecture, remember that all of the Intex x86 microprocessors are in the same architecture. The 80486 and the Pentium have different numbers of pins, etc, but support the same assembly language instructions. Also, features defined by the architecture affect the correctness of a program, while features defined by an implementation affect the performance of a program. The following features are architecture defined:
2. This question may have caused a bit of confusion, depending upon the interpretation of indexed addressing mode and accessing the array. Each interpretation affects the solution for part (a) and the answers for the rest of the parts. Therefore, there are three solutions listed for each section of the problem and you should follow the same solution number all of the way through.
(a)
Solution #1
Indexed addressing as (R1)(R2) = M[R1+R2]
and
i
as already shifted over two bits to the left
so that it is indexing on 8 byte increments.
Method 1
; comment | cycles ADD R3, (R1)(R2) ; R3 = x + A[i] | 5 total = 5Method 2
; comment | cycles ADD R2, R1 ; R2 = A + i | 1 ADD R3, (R2) ; R3 = x + A[i] | 4 total = 5Solution #2
Indexed addressing as (R1)(R2) = M[R1+R2]
and i
as the array index into A
,
so that it needs to be shifted two bits to the left in
order to index on 8 byte increments.
Method 1
SHL R2, #2 1 cycle ADD R3, (R1)(R2) 5 cycles for a total of 6 cyclesMethod 2
SHL R2, #4 1 cycle ADD R2, R1 1 cycle ADD R3, (R2) 4 cycles for a total of 6 cyclesSolution #3
Indexed addressing as I(R1)[R2] = M[ I + R1 + d*R2]
and i
as the array index into A
.
Method 1
ADD R3, 0(R1)[R2] 5 cyclesMethod 2
SHL R2, #2 1 cycle ADD R2, R1 1 cycle ADD R3, (R2) 4 cycles for a total of 6 cycles
Method 1 uses 5 cycles per reference for a total of 10 cycles for 2 references.
Method 2 uses 5 cycles for the first reference and 4 for each subsequent reference for a total of 9 cycles for 2 references.
Solution #2 -- 2 references
Method 1 uses 6 cycles for the first reference and 5 for each subsequent reference for a total of 11 cycles for 2 references.
Method 2 uses 6 cycles for the first reference and 4 for each subsequent reference for a total of 10 cycles for 2 references.
Solution #3 -- 3 references
Method 1 uses 5 cycles for each reference for a total of 15 cycles for 3 references.
Method 2 uses 6 cycles for the first reference and 4 for each subsequent reference for a total of 14 cycles for 3 references.
(c) A single save/restore of a register to the stack costs 10 cycles. This is independent of the solution to part (a).
Solution #1 -- 12 references. The first reference for both methods costs the same number of cycles. After that, each reference by method 2 saves 1 cycle. Therefore, after 10 more references, the cycle count will be the same, leaving one more reference to make method 2 faster.
Method 1 cycles = 5 * n therefore 12 refs => 60 cycles
Method 2 cycles = 5 + 4 * (n-1) + 10 therefore 12 refs => 59 cycles
Solution #2 -- 12 references. Reasoning is the same as solution #1.
Method 1 cycles = 6 + 5 * (n-1) therefore 12 refs => 61 cycles
Method 2 cycles = 6 + 4 * (n-1) + 10 therefore 12 refs => 60 cycles
Solution #3 -- 13 references. Reasoning is the same as solution #1 except that the first reference with method 2 costs an extra cycle, so one more reference is needed to catch up.
Method 1 cycles = 5 * n therefore 13 refs => 65 cycles
Method 2 cycles = 6 + 4 * (n-1) + 10 therefore 13 refs => 64 cycles
(d) For all three solutions, replace the main ADD instruction with a LD instruction and then calculate A[i]+x using register direct mode. With both methods, all future accesses take the same number of cycles (ie, 1) so neither method is faster than the other.
3. Refer to appendix C.4 in the text for the numbers used in these calculations.
Therefore, 26-1 or 25% of instructions transfer data to/from memory.
100% of instructions load instructions from memory.
Therefore, 25/125 or 20% of all accesses are for data.
LW , LBU and LD are reading from memory and they account for 13%, 1%, and 4% (total of 18%) of the instructions.
Therefore, 18/25 or 72% of all data accesses are reading from memory.
100% of instructions read instructions from memory.
Therefore, (100+18)/(100+25) or 94.4% of all accesses are for reads.
4.(a) Let "n" be the number of instructions in a given program. From table 4.32 in the text, the following formulas represent the number of particular instructions executed by the 8086:
The following formulas represent the number of instructions executed by the modified machine:
All other instruction counts remain the same, and, therefore, may be ignored for the calculations.
Consequently, the new 8086 will execute 42.0% - 40.5% = 1.5% fewer instructions than the original.
Relevant data from table D.5 (time/frequency ratio)
The new machine will use the same number of cycles minus 1/4 of the LES and MOV cycles, minus all of the PUSH and POP cycles and plus 3/4 of the number of PUSH and POP instructions as the cycles used by a MOV and ADD instruction.
Breaking that down into pieces, 1/4 of the LES and MOV instructions use the following number of cycles:
(n*3%*24.0+n*27%*11.3)/4=n*0.943 cycles
All of the PUSH and POP cycles are:
n*5%*8.46+n*7%*14.1=n*1.41 cycles
The cost of a MOV and an ADD instruction are:
11.3+5.64=16.9 cycles
Multiply that by 3/4 of the number of PUSH and POP instruction to get the cycles used by the PUSH/POP replacement instructions:
(n*5%*75%+n*7%*75%)*16.9=(n*0.09)*16.9=n*1.52
Now, putting it all together, the new number of cycles is:
n*14.1-n*0.943-n*1.41+n*1.52=n*13.3
To compute the new number of instructions, we use the results of part (a) which tell us how many instructions the new machine uses. That is, the new machine uses 98.5%*n instructions. This gives is a new CPI as follows:
(n*13.3)/(n*98.5%)=13.3/0.985=13.5 CPI
Therefore, the original machine has a higher CPI by 0.6 cycles per instruction.
14.1/13.5=1.04 (or a 4.44% increase)
1/CPI_old - 1/CPI_new --------------------- 1/CPI_oldmultiplying through by
CPI_old/CPI_old
gives:
1 - CPI_old/CPI_newwhich evaluates to:
1 - 14.1/13.5=0.04 (or a 4.44% increase)