DIVISION EXAMPLES

As requested in class, here's some examples of the division algorithms
we considered.

I find it easiest to keep the math straight when we assume that the
dividend is normalized to be less than the divisor, so the result is
always a fraction between 0 and 1.

When dividing n-bit integers, it's easiest to lay them out as I did
in class, with a 2n-bit partial remainder register and an n-bit divisor
register lined up with the upper half of the partial remainder register.
After n iterations, you will have your n-bit result.  (Optimization:  you
can even reuse the bottom part of the partial remainder register to hold
the result bits if you want to get tricky!)  This is equivalent to
premultiplying the dividend by 2^(-n), and then multiplying the result
by 2^n.

When dividing floating point numbers, it's often easier to think in terms
of fractions, with the radix point at the left of the partial remainder
register and divisor registers, and the radix points lined up.  In this
case, the divisor is usually normalized so that 1 <= D < 2, and the
dividend is normalized so that P<1.  The quotient bits you get start to
the right of the radix point and continues for as much precision as you
want (i.e. the first bit is 1/2, the next bit is 1/(2*2), then 1/(2*2*2), etc.)

Regardless of how you line up the two, you'll get the correct result
bits, unless the partial remainder gets too big and overflows.  The
only hard part is figuring out where the radix point is (and/or when
to stop).

The basic idea of the division algorithms is to iteratively select
less and less significant bits of the result, keeping the partial
remainder in range.  Mathematically, we are always preserving the
invariant:
	dividend = P/(2^i) + D*result
where P is the partial remainder, i is the iteration counter, D is the
divisor, and result is the result accumulated so far.  Initially, result
is 0, and i is 0, so dividend = P.  As the number of iterations increase
(i gets larger), we get closer and closer to dividend = D*result.
At each iteration, we perform the following steps:
	Shift P one digit to the left.  (P becomes 2*P.)
	Select next result digit d.
	Subtract d*D from P.
	Add d/(2^i) to result.
You can work out that the invariant is preserved by this code.

Algorithms differ mainly in how the result digit is selected.  There
are also many optimizations we can perform depending on the specifics
of the algorithm.  For example, if we get one result digit at a time,
we can shift them into the result register, instead of adding d/(2^i).

All of the above can be generalized to a different radix r by replacing
2 by r.  Radix 4 is quite common, allowing you to get 2 bits of result
(2 binary digits equal one radix-4 digit) on each iteration.

You should view this text with a fixed-width font (like Courier) to
get the examples to line up properly.

Let's try dividing 14 by 3.  14 in binary is 1110.  3 in binary is 11.

Restoring Division:

P: 0000 1110
D: 0011
result: xxxx

	Iteration 1:
	Shift P: 0001 1100
	Digit Selection:
		Try subtracting: P-D = 1110 1100.
		Oops, negative.  Add D back in to restore P to previous value.
		Selected digit d=0.
	Shift the 0 into result.

P: 0001 1100
D: 0011
result: xxx0

	Iteration 2:
	Shift P: 0011 1000
	Digit Selection:
		Try subtracting: P-D = 0000 1000.
		Positive.  Do not add D back into P.
		Selected digit d=1.
	Shift the 1 into result.

P: 0000 1000
D: 0011
result: xx01

	Iteration 3:
	Shift P: 0001 0000
	Digit Selection:
		Try subtracting: P-D = 1110 0000.
		Oops, negative.  Add D back in to restore P to previous value.
		Selected digit d=0.
	Shift the 0 into result.

P: 0001 0000
D: 0011
result: x010

	Iteration 4:
	Shift P: 0010 0000
	Digit Selection:
		Try subtracting: P-D = 1111 0000.
		Oops, negative.  Add D back in to restore P to previous value.
		Selected digit d=0.
	Shift the 0 into result.

P: 0010 0000
D: 0011
result: 0100

So, the result is 0100 or 4 in decimal, and the remainder is 0010 or 2 in
decimal.  And indeed 14 = 4*3 + 2.


Non-Restoring Division:

P: 0000 1110
D: 0011
result: 0000

	Iteration 1:
	Shift P: 0001 1100
	Digit Selection:
		P is positive, so we select d=1.
		Subtract d*D from P:  1110 1100
	Shift result and add d.

P: 1110 1100
D: 0011
result: 0001

	Iteration 2:
	Shift P: 1101 1000
	Digit Selection:
		P is negative, so we select d=-1.
		Subtract d*D from P (same as adding D to P):  0000 1000
	Shift result (0010) and add d (subtract 1, yielding 0001).

P: 0000 1000
D: 0011
result: 0001

	Iteration 3:
	Shift P: 0001 0000
	Digit Selection:
		P is positive, so we select d=1.
		Subtract d*D from P:  1110 0000
	Shift result (0010) and add d (0011).

P: 1110 0000
D: 0011
result: 0011

	Iteration 4:
	Shift P: 1100 0000
	Digit Selection:
		P is negative, so we select d=-1.
		Subtract d*D from P (same as adding D to P):  1111 0000
	Shift result (0110) and add d (subtract 1, yielding 0101).

So, the result is 0101 or 5 in decimal, and the remainder is 1111 or -1.
And indeed 14 = 5*3 - 1.  Note that non-restoring division can leave
a negative remainder.  If that's not acceptable, we need a final restoring
step (adding in D and decrementing result).  One thing to be careful about
with non-restoring division is you need to make sure you have enough
bits for P to hold a sign bit.  If D had been larger, we would have needed
an extra bit on P to avoid overflow.


SRT Division:

For binary (radix 2) SRT division, we can chose a quotient digit of -1, 0,
or +1 at each step:
	If 0 <= P < 2D, then it's OK to chose digit 1.
	If -D < P < D, then it's OK to chose digit 0.
	If -2D < P <= 0, then it's OK to chose digit -1.
(These limits are determined by making sure that P stays within bounds
on the next iteration (when P becomes 2(P-d*D)).)  Note that the regions
overlap.  This is good -- it means we have freedom to pick a digit even if
we don't know exactly what P and D are.  For binary SRT, it's particularly
easy -- we can get away with just looking at the sign bit of P.  If it's 0,
we know 0 <= P, so we can pick d=1.  If it's 1, we know P < 0, so we can
pick d=-1.  In other words, non-restoring division is a special case of
SRT division.

Things get more interesting if we try to make division run faster -- this
is where SRT shines with two tricks:  higher radix, and inexact computation
of P.  The higher radix idea is fairly straightforward:  for example, if we
change to radix 4, we can get 2 bits of quotient per clock cycle.  The digit
selection ranges from -2 to +2, with tighter requirements for which digit
we can pick.  Inexact computation of P requires a bit more explanation.

Recall that adding two numbers takes a fair amount of time and/or logic,
mainly because we need to propagate the carry.  Also, recall from class
the carry save adder, that very quickly and compactly adds three n-bit
numbers (say a, b, and cin) and generates the result as an n-bit sum
and an n-bit cout.  (The actual sum of the three numbers is sum+2*cout.)
So, if instead of keeping P as a single number, we keep it as the sum of
*two* numbers, we can use a carry save adder to add in d*D very quickly
at each iteration.  The problem is that for digit selection, we want to
know what the value of P is, but we can't compute P without doing a slow,
carry-propagate addition of the two numbers.  Aha!  With SRT division,
we only need an approximation of the value of P, so we can do the slow
addition on only enough of the highest-order bits of P as we need to
figure out which digit to select.

In a real implementation, there is a table indexed by the high-order bits
of P and D, which tells which digit to select.  Much of the design effort
is to determine exactly how many bits we need to check.  To keep things simple,
I'm going to index only off of the top bits of P.

Let's work an example.  We'll divide 14 by 3 again.  To better illustrate
the power of SRT division, though, I'm going normalize the division as
a floating-point division.  14 and 3 in decimal are 1110 and 11 in binary.
I'll normalize to divide 0.1110 by 1.1000, which is 7/8 divided by 3/2.
We'll have to normalize the result by 3 bit positions (or multiply by 8).

Since I'm too lazy to figure out exactly how many bits I need to look
at, let's just think at an abstract level.  Suppose I added the top few
bits of P together, and the high-order two bits are 00.  I *know*
the result is non-negative, regardless of what could have been the carry
from the lower bits (which I didn't add).  Therefore, I know I can select
d=1 as my next digit.  Suppose the high-order bits were 01.  Now, I don't
know whether the result is positive or negative (if there is supposed to
be carry to flip the sign bit).  However, I do know that -D < P < D,
so I can select d=0 as my next digit.  Similarly, if the high-order bits
were 10, I know the result is negative, so I can select d=-1.  


P: 000.1110
+: 000.0000
D: 001.1000
result: 0.

	Iteration 1:
	Shift both words of P:
		P: 001.1100
		+: 000.0000
	Add high order bits for P:
		   001
	Regardless of possible carry, I know P is positive.
	Therefore, can select d=1.  Result 0.1
	Subtract dD from P (same as adding -D):
		 P: 001.1100
		 +: 000.0000
		-D: 110.1000
		 ----------
		 S: 111.0100	(becomes new P)
		 C: 000.1000	(shifted, becomes new +)

P: 111.0100
+: 001.0000
D: 001.1000
result: 0.1

	Iteration 2:
	Shift both words of P:
		P: 110.1000
		+: 010.0000
	Add high order bits for P:
		   000
	Regardless of possible carry, I know P is positive.
	Therefore, can select d=1.  Result 0.11
	Subtract dD from P (same as adding -D):
		 P: 110.1000
		 +: 010.0000
		-D: 110.1000
		 ----------
		 S: 010.0000	(becomes new P)
		 C: 110.1000	(shifted, becomes new +)

P: 010.0000
+: 101.0000
D: 001.1000
result: 0.11

	Iteration 3:
	Shift both words of P:
		P: 100.0000
		+: 010.0000
	Add high order bits for P:
		   110
	Regardless of possible carry, I know P is negative.
	Therefore, can select d = -1.  Result 0.101
	Subtract dD from P (same as adding D):
		 P: 100.0000
		 +: 010.0000
		 0: 001.1000
		 ----------
		 S: 111.1000	(becomes new P)
		 C: 000.0000	(shifted, becomes new +)

P: 111.1000
+: 000.0000
D: 001.1000
result: 0.101

	Iteration 4:
	Shift both words of P:
		P: 111.0000
		+: 000.0000
	Add high order bits for P:
		   111
	Depending on carry, P could be positive or negative.
	However, regardless of possible carry, -D < P < D.
	Therefore, can select d=0.  Result 0.1010
	Subtract dD from P (same as adding 0):
		 P: 111.0000
		 +: 000.0000
		 0: 000.0000
		 ----------
		 S: 111.0000	(becomes new P)
		 C: 000.0000	(shifted, becomes new +)

P: 111.0000
+: 000.0000
D: 001.1000
result: 0.1010

	Iteration 5:
	Shift both words of P:
		P: 110.0000
		+: 000.0000
	Add high order bits for P:
		   110
	Regardless of possible carry, I know P is negative.
	Therefore, can select d = -1.  Result 0.10011
	Subtract dD from P (same as adding D):
		 P: 110.0000
		 +: 000.0000
		 D: 001.1000
		 ----------
		 S: 111.1000	(becomes new P)
		 C: 000.0000	(shifted, becomes new +)

P: 111.1000
+: 000.0000
D: 001.1000
result: 0.10011

Note that we've reached the same state as before Iteration 4.  Therefore,
from now on, we're going to select d=0 alternating with d=-1.

Iteration 6: d = 0  result: 0.100110
Iteration 7: d = -1 result: 0.1001011
Iteration 8: d = 0  result: 0.10010110
Iteration 9: d = -1 result: 0.100101011
...
result = 0.100101010...

Recall that we normalize the result by 3 decimal places, which gives
100.101010..., which is 4 plus 2/3, which is exactly the right result.