DIVISION EXAMPLES As requested in class, here's some examples of the division algorithms we considered. I find it easiest to keep the math straight when we assume that the dividend is normalized to be less than the divisor, so the result is always a fraction between 0 and 1. When dividing n-bit integers, it's easiest to lay them out as I did in class, with a 2n-bit partial remainder register and an n-bit divisor register lined up with the upper half of the partial remainder register. After n iterations, you will have your n-bit result. (Optimization: you can even reuse the bottom part of the partial remainder register to hold the result bits if you want to get tricky!) This is equivalent to premultiplying the dividend by 2^(-n), and then multiplying the result by 2^n. When dividing floating point numbers, it's often easier to think in terms of fractions, with the radix point at the left of the partial remainder register and divisor registers, and the radix points lined up. In this case, the divisor is usually normalized so that 1 <= D < 2, and the dividend is normalized so that P<1. The quotient bits you get start to the right of the radix point and continues for as much precision as you want (i.e. the first bit is 1/2, the next bit is 1/(2*2), then 1/(2*2*2), etc.) Regardless of how you line up the two, you'll get the correct result bits, unless the partial remainder gets too big and overflows. The only hard part is figuring out where the radix point is (and/or when to stop). The basic idea of the division algorithms is to iteratively select less and less significant bits of the result, keeping the partial remainder in range. Mathematically, we are always preserving the invariant: dividend = P/(2^i) + D*result where P is the partial remainder, i is the iteration counter, D is the divisor, and result is the result accumulated so far. Initially, result is 0, and i is 0, so dividend = P. As the number of iterations increase (i gets larger), we get closer and closer to dividend = D*result. At each iteration, we perform the following steps: Shift P one digit to the left. (P becomes 2*P.) Select next result digit d. Subtract d*D from P. Add d/(2^i) to result. You can work out that the invariant is preserved by this code. Algorithms differ mainly in how the result digit is selected. There are also many optimizations we can perform depending on the specifics of the algorithm. For example, if we get one result digit at a time, we can shift them into the result register, instead of adding d/(2^i). All of the above can be generalized to a different radix r by replacing 2 by r. Radix 4 is quite common, allowing you to get 2 bits of result (2 binary digits equal one radix-4 digit) on each iteration. You should view this text with a fixed-width font (like Courier) to get the examples to line up properly. Let's try dividing 14 by 3. 14 in binary is 1110. 3 in binary is 11. Restoring Division: P: 0000 1110 D: 0011 result: xxxx Iteration 1: Shift P: 0001 1100 Digit Selection: Try subtracting: P-D = 1110 1100. Oops, negative. Add D back in to restore P to previous value. Selected digit d=0. Shift the 0 into result. P: 0001 1100 D: 0011 result: xxx0 Iteration 2: Shift P: 0011 1000 Digit Selection: Try subtracting: P-D = 0000 1000. Positive. Do not add D back into P. Selected digit d=1. Shift the 1 into result. P: 0000 1000 D: 0011 result: xx01 Iteration 3: Shift P: 0001 0000 Digit Selection: Try subtracting: P-D = 1110 0000. Oops, negative. Add D back in to restore P to previous value. Selected digit d=0. Shift the 0 into result. P: 0001 0000 D: 0011 result: x010 Iteration 4: Shift P: 0010 0000 Digit Selection: Try subtracting: P-D = 1111 0000. Oops, negative. Add D back in to restore P to previous value. Selected digit d=0. Shift the 0 into result. P: 0010 0000 D: 0011 result: 0100 So, the result is 0100 or 4 in decimal, and the remainder is 0010 or 2 in decimal. And indeed 14 = 4*3 + 2. Non-Restoring Division: P: 0000 1110 D: 0011 result: 0000 Iteration 1: Shift P: 0001 1100 Digit Selection: P is positive, so we select d=1. Subtract d*D from P: 1110 1100 Shift result and add d. P: 1110 1100 D: 0011 result: 0001 Iteration 2: Shift P: 1101 1000 Digit Selection: P is negative, so we select d=-1. Subtract d*D from P (same as adding D to P): 0000 1000 Shift result (0010) and add d (subtract 1, yielding 0001). P: 0000 1000 D: 0011 result: 0001 Iteration 3: Shift P: 0001 0000 Digit Selection: P is positive, so we select d=1. Subtract d*D from P: 1110 0000 Shift result (0010) and add d (0011). P: 1110 0000 D: 0011 result: 0011 Iteration 4: Shift P: 1100 0000 Digit Selection: P is negative, so we select d=-1. Subtract d*D from P (same as adding D to P): 1111 0000 Shift result (0110) and add d (subtract 1, yielding 0101). So, the result is 0101 or 5 in decimal, and the remainder is 1111 or -1. And indeed 14 = 5*3 - 1. Note that non-restoring division can leave a negative remainder. If that's not acceptable, we need a final restoring step (adding in D and decrementing result). One thing to be careful about with non-restoring division is you need to make sure you have enough bits for P to hold a sign bit. If D had been larger, we would have needed an extra bit on P to avoid overflow. SRT Division: For binary (radix 2) SRT division, we can chose a quotient digit of -1, 0, or +1 at each step: If 0 <= P < 2D, then it's OK to chose digit 1. If -D < P < D, then it's OK to chose digit 0. If -2D < P <= 0, then it's OK to chose digit -1. (These limits are determined by making sure that P stays within bounds on the next iteration (when P becomes 2(P-d*D)).) Note that the regions overlap. This is good -- it means we have freedom to pick a digit even if we don't know exactly what P and D are. For binary SRT, it's particularly easy -- we can get away with just looking at the sign bit of P. If it's 0, we know 0 <= P, so we can pick d=1. If it's 1, we know P < 0, so we can pick d=-1. In other words, non-restoring division is a special case of SRT division. Things get more interesting if we try to make division run faster -- this is where SRT shines with two tricks: higher radix, and inexact computation of P. The higher radix idea is fairly straightforward: for example, if we change to radix 4, we can get 2 bits of quotient per clock cycle. The digit selection ranges from -2 to +2, with tighter requirements for which digit we can pick. Inexact computation of P requires a bit more explanation. Recall that adding two numbers takes a fair amount of time and/or logic, mainly because we need to propagate the carry. Also, recall from class the carry save adder, that very quickly and compactly adds three n-bit numbers (say a, b, and cin) and generates the result as an n-bit sum and an n-bit cout. (The actual sum of the three numbers is sum+2*cout.) So, if instead of keeping P as a single number, we keep it as the sum of *two* numbers, we can use a carry save adder to add in d*D very quickly at each iteration. The problem is that for digit selection, we want to know what the value of P is, but we can't compute P without doing a slow, carry-propagate addition of the two numbers. Aha! With SRT division, we only need an approximation of the value of P, so we can do the slow addition on only enough of the highest-order bits of P as we need to figure out which digit to select. In a real implementation, there is a table indexed by the high-order bits of P and D, which tells which digit to select. Much of the design effort is to determine exactly how many bits we need to check. To keep things simple, I'm going to index only off of the top bits of P. Let's work an example. We'll divide 14 by 3 again. To better illustrate the power of SRT division, though, I'm going normalize the division as a floating-point division. 14 and 3 in decimal are 1110 and 11 in binary. I'll normalize to divide 0.1110 by 1.1000, which is 7/8 divided by 3/2. We'll have to normalize the result by 3 bit positions (or multiply by 8). Since I'm too lazy to figure out exactly how many bits I need to look at, let's just think at an abstract level. Suppose I added the top few bits of P together, and the high-order two bits are 00. I *know* the result is non-negative, regardless of what could have been the carry from the lower bits (which I didn't add). Therefore, I know I can select d=1 as my next digit. Suppose the high-order bits were 01. Now, I don't know whether the result is positive or negative (if there is supposed to be carry to flip the sign bit). However, I do know that -D < P < D, so I can select d=0 as my next digit. Similarly, if the high-order bits were 10, I know the result is negative, so I can select d=-1. P: 000.1110 +: 000.0000 D: 001.1000 result: 0. Iteration 1: Shift both words of P: P: 001.1100 +: 000.0000 Add high order bits for P: 001 Regardless of possible carry, I know P is positive. Therefore, can select d=1. Result 0.1 Subtract dD from P (same as adding -D): P: 001.1100 +: 000.0000 -D: 110.1000 ---------- S: 111.0100 (becomes new P) C: 000.1000 (shifted, becomes new +) P: 111.0100 +: 001.0000 D: 001.1000 result: 0.1 Iteration 2: Shift both words of P: P: 110.1000 +: 010.0000 Add high order bits for P: 000 Regardless of possible carry, I know P is positive. Therefore, can select d=1. Result 0.11 Subtract dD from P (same as adding -D): P: 110.1000 +: 010.0000 -D: 110.1000 ---------- S: 010.0000 (becomes new P) C: 110.1000 (shifted, becomes new +) P: 010.0000 +: 101.0000 D: 001.1000 result: 0.11 Iteration 3: Shift both words of P: P: 100.0000 +: 010.0000 Add high order bits for P: 110 Regardless of possible carry, I know P is negative. Therefore, can select d = -1. Result 0.101 Subtract dD from P (same as adding D): P: 100.0000 +: 010.0000 0: 001.1000 ---------- S: 111.1000 (becomes new P) C: 000.0000 (shifted, becomes new +) P: 111.1000 +: 000.0000 D: 001.1000 result: 0.101 Iteration 4: Shift both words of P: P: 111.0000 +: 000.0000 Add high order bits for P: 111 Depending on carry, P could be positive or negative. However, regardless of possible carry, -D < P < D. Therefore, can select d=0. Result 0.1010 Subtract dD from P (same as adding 0): P: 111.0000 +: 000.0000 0: 000.0000 ---------- S: 111.0000 (becomes new P) C: 000.0000 (shifted, becomes new +) P: 111.0000 +: 000.0000 D: 001.1000 result: 0.1010 Iteration 5: Shift both words of P: P: 110.0000 +: 000.0000 Add high order bits for P: 110 Regardless of possible carry, I know P is negative. Therefore, can select d = -1. Result 0.10011 Subtract dD from P (same as adding D): P: 110.0000 +: 000.0000 D: 001.1000 ---------- S: 111.1000 (becomes new P) C: 000.0000 (shifted, becomes new +) P: 111.1000 +: 000.0000 D: 001.1000 result: 0.10011 Note that we've reached the same state as before Iteration 4. Therefore, from now on, we're going to select d=0 alternating with d=-1. Iteration 6: d = 0 result: 0.100110 Iteration 7: d = -1 result: 0.1001011 Iteration 8: d = 0 result: 0.10010110 Iteration 9: d = -1 result: 0.100101011 ... result = 0.100101010... Recall that we normalize the result by 3 decimal places, which gives 100.101010..., which is 4 plus 2/3, which is exactly the right result.