
There is a faster way to multiply, though,
called the *divide-and-conquer* approach.
The introduction of the technique is attributed to a 1962 paper by
Karatsuba, and indeed it is sometimes called *Karatsuba
multiplication*.

With divide-and-conquer multiplication, we split each of the numbers into two
halves, each with `n`/2 digits. I'll call the two numbers
we're trying to multiply `a` and `b`,
with the two halves of `a` being
`a`_{L} (the left or upper half)
and `a`_{R} (the right or lower half)
and the two halves of `b` being
`b`_{L} and `b`_{R}.

Basically, we can multiply these two numbers as follows.

Pictorially, the multiplication looks like this.

```
                  a_L     a_R
    x             b_L     b_R
    -------------------------
              a_L b_R  a_R b_R
  + a_L b_L   a_R b_L
  -----------------------------------------
    a_L b_L   (a_L b_R + a_R b_L)   a_R b_R
```

That is just a picture of the idea, but more formally, the derivation works as follows.

    ab = (a_{L} 10^{n/2} + a_{R}) (b_{L} 10^{n/2} + b_{R})
       = a_{L}b_{L} 10^{n} + a_{L}b_{R} 10^{n/2} + a_{R}b_{L} 10^{n/2} + a_{R}b_{R}
       = a_{L}b_{L} 10^{n} + (a_{L}b_{R} + a_{R}b_{L}) 10^{n/2} + a_{R}b_{R}

Thus, in order to multiply a pair of `n`-digit numbers, we can perform four
recursive multiplications of `n`/2-digit numbers and then combine the results
using additions and digit shifts.

That's fine as far as it goes, but it turns out that it's not far
enough: if you write down the recurrence for this algorithm and solve it,
the solution is `O`(`n`^{2}), which is exactly what we had
from grade school. And this algorithm is much more complicated. Not a
very encouraging result.
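To spell that out: there are four recursive multiplications of `n`/2-digit numbers, plus `O`(`n`) work for the additions and shifts. Writing `c` for the constant factor on that linear work, the recurrence unrolls as follows (a sketch).

```
T(n) = 4 T(n/2) + c n
     = 16 T(n/4) + 2 c n + c n
     = 64 T(n/8) + 4 c n + 2 c n + c n
     = ...
     = 4^{log2 n} T(1) + c n (1 + 2 + 4 + ... + n/2)
     = n^2 T(1) + c n (n - 1)
     = O(n^2)
```

The key step is that 4^{log2 n} = (2^{log2 n})^{2} = `n`^{2}, so the leaves alone already cost quadratic time.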

But there turns out to be a very clever approach, which permits us to
reduce the number of `n`/2-digit multiplications from four to
*three*! This clever idea yields a better result.

The idea works as follows: We're trying to compute

    a_{L}b_{L} 10^{n} + (a_{L}b_{R} + a_{R}b_{L}) 10^{n/2} + a_{R}b_{R}

What we'll do is compute the following three products using recursive calls.

    x_{1} = a_{L}b_{L}
    x_{2} = a_{R}b_{R}
    x_{3} = (a_{L} + a_{R}) (b_{L} + b_{R})

These have all the information that we want, since the following is true.

    x_{1} 10^{n} + (x_{3} - x_{1} - x_{2}) 10^{n/2} + x_{2}
      = a_{L}b_{L} 10^{n} + ((a_{L}b_{L} + a_{L}b_{R} + a_{R}b_{L} + a_{R}b_{R}) - a_{L}b_{L} - a_{R}b_{R}) 10^{n/2} + a_{R}b_{R}
      = a_{L}b_{L} 10^{n} + (a_{L}b_{R} + a_{R}b_{L}) 10^{n/2} + a_{R}b_{R}

And we already reasoned that this last is equal to the product `ab`.

In pseudocode, I might write down the algorithm this way. (This is just pseudocode, even though it may look like Java. Actual Java would be quite a bit more complex, without really giving you much more insight.)

```
BigInteger multiply(BigInteger a, BigInteger b) {
    int n = max(number of digits in a, number of digits in b)
    if (n == 1) {
        return a.intValue() * b.intValue();
    } else {
        BigInteger aR = bottom n/2 digits of a;
        BigInteger aL = top remaining digits of a;
        BigInteger bR = bottom n/2 digits of b;
        BigInteger bL = top remaining digits of b;
        BigInteger x1 = multiply(aL, bL);
        BigInteger x2 = multiply(aR, bR);
        BigInteger x3 = multiply(aL + aR, bL + bR);
        return x1 * pow(10, n) + (x3 - x1 - x2) * pow(10, n / 2) + x2;
    }
}
```
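Real, runnable Java is actually not that far off. Here is a sketch of my own (unoptimized, and assuming nonnegative inputs; the class name is mine), which splits each number using `BigInteger.divideAndRemainder` at 10^{n/2}.

```java
import java.math.BigInteger;

public class Karatsuba {
    // Divide-and-conquer (Karatsuba) multiplication of nonnegative BigIntegers.
    public static BigInteger multiply(BigInteger a, BigInteger b) {
        int n = Math.max(a.toString().length(), b.toString().length());
        if (n <= 1) {
            // Single-digit numbers: multiply directly.
            return BigInteger.valueOf((long) a.intValue() * b.intValue());
        }
        BigInteger shift = BigInteger.TEN.pow(n / 2);        // 10^(n/2)
        BigInteger[] aHalves = a.divideAndRemainder(shift);  // { a_L, a_R }
        BigInteger[] bHalves = b.divideAndRemainder(shift);  // { b_L, b_R }
        BigInteger x1 = multiply(aHalves[0], bHalves[0]);          // a_L b_L
        BigInteger x2 = multiply(aHalves[1], bHalves[1]);          // a_R b_R
        BigInteger x3 = multiply(aHalves[0].add(aHalves[1]),       // (a_L + a_R)
                                 bHalves[0].add(bHalves[1]));      //   (b_L + b_R)
        // Since a = a_L*shift + a_R and b = b_L*shift + b_R,
        // a*b = x1*shift^2 + (x3 - x1 - x2)*shift + x2 exactly.
        return x1.multiply(shift).multiply(shift)
                 .add(x3.subtract(x1).subtract(x2).multiply(shift))
                 .add(x2);
    }
}
```

Note that this sketch recurses all the way down to one digit; a serious implementation would switch to grade-school multiplication well before that.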

Let's do an actual multiplication to illustrate how this works. I'm going to draw a recursion tree, labeling each node of the tree with the final value it computes.
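The tree itself isn't reproduced here, but as an illustration (with numbers of my choosing), here is the top level of the recursion for 1234 x 5678, where n = 4.

```
a = 1234:  a_L = 12, a_R = 34
b = 5678:  b_L = 56, b_R = 78

x_1 = 12 x 56 = 672
x_2 = 34 x 78 = 2652
x_3 = (12 + 34) (56 + 78) = 46 x 134 = 6164

x_1 10^4 + (x_3 - x_1 - x_2) 10^2 + x_2
  = 6720000 + (6164 - 672 - 2652) 100 + 2652
  = 6720000 + 284000 + 2652
  = 7006652 = 1234 x 5678
```

Each of the three products (672, 2652, 6164) would itself be computed by the same method, one level down in the tree.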

We're going to do the same form of analysis we did for Mergesort: We'll draw a recursion tree, labeling each node with the amount of time consumed outside the recursive calls, and then we'll find the sum of all the nodes of the recursion tree to get the total time.

All of the calculations of divide-and-conquer multiplication take
`O`(`n`) time, except the three recursive
multiplications. The three multiplications are of `n`/2-digit
numbers. (Actually, the last multiplication could involve
(`n`/2+1)-digit numbers, but this extra digit turns out not to
affect the result of the analysis. It complicates the analysis considerably, and so
we'll ignore it.)

So here's the recursion tree. I'll label each node with the number of digits in each number of the pair being multiplied.

As with Mergesort, there are 1 + log2 n levels. The root is an `n`-digit problem; below it are three problems of `n`/2 digits, then nine of `n`/4 digits, then twenty-seven of `n`/8 digits, and so on, down to 3^{log2 n} problems of one digit each.

The times for the first level sum to `n`;
the times for the second sum to (3/2)`n`;
for the third, (9/4)`n`;
for the fourth, (27/8)`n`;
and so on, each time going up by a factor of 3/2. The last level will
sum to (3/2)^{log2 n} `n`.

We're going to simplify the total over all levels.

    n + (3/2) n + (3/2)^{2} n + (3/2)^{3} n + ... + (3/2)^{log2 n} n
      = n (1 + (3/2) + (3/2)^{2} + (3/2)^{3} + ... + (3/2)^{log2 n})

This last is a finite geometric series, of the form 1 + r + r^{2} + ... + r^{k}, with r = 3/2 and k = log2 n.

We're going to apply the following fact to further simplify our expression for the total over the levels.

    1 + r + r^{2} + r^{3} + ... + r^{k} = (r^{k+1} - 1) / (r - 1)

Applying it with r = 3/2 and k = log2 n gives the total.

    n ((3/2)^{1 + log2 n} - 1) / (3/2 - 1)
      = 2 n ((3/2) (3/2)^{log2 n} - 1)
      = 3 n (3/2)^{log2 n} - 2 n
      = 3 n 3^{log2 n} / 2^{log2 n} - 2 n
      = 3 3^{log2 n} - 2 n
      = 3 n^{log2 3} - 2 n
      = O(n^{1.585})

This final answer is quite a bit better than the `O`(`n`^{2}) time of grade-school multiplication.

(The last step, where we replaced
3^{log2 n} with `n`^{log2 3}, comes
from the identity
`x`^{logb y} =
`y`^{logb x}.
It can be proven with the following reasoning.

    (logb x) (logb y) = (logb y) (logb x)
    logb (y^{logb x}) = logb (x^{logb y})
    y^{logb x} = x^{logb y}

The first step follows from the identity that a logb x = logb (x^{a}), applied to each side; the last step follows because logb is one-to-one.)
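A quick numeric spot-check of this identity (the class and method names here are mine, for illustration only):

```java
public class LogIdentity {
    // log_b(v), computed via the change-of-base formula.
    static double logBase(double b, double v) {
        return Math.log(v) / Math.log(b);
    }

    // Check x^(log_b y) == y^(log_b x) numerically, up to rounding error.
    public static boolean holds(double b, double x, double y) {
        double lhs = Math.pow(x, logBase(b, y));
        double rhs = Math.pow(y, logBase(b, x));
        return Math.abs(lhs - rhs) < 1e-6 * Math.max(1.0, Math.abs(lhs));
    }
}
```

For example, with b = 2, x = 8, y = 16: 8^{log2 16} = 8^{4} = 4096 and 16^{log2 8} = 16^{3} = 4096.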

(Incidentally, divide-and-conquer multiplication isn't the fastest known
multiplication algorithm. The best known result is an
`O`(`n` log `n`) algorithm. It's quite a bit
more complicated, and I'm not sure if it's practical for reasonably
sized numbers.)

It happens that I've written one of the faster divide-and-conquer multipliers
out there. Implementing algorithms so that they run *quickly* is a bit of a trick.
There are three tricks that my implementation uses.

- In the recursion, the base case is actually when there are 8 digits,
at which point the code breaks down and does grade-school multiplication.
This is because the grade-school technique actually is faster for small
numbers - the big-O analysis only applies to really big values of
*n*. The choice of 8 for the number of digits is an experimentally determined value; I tried several values, and 8 was fastest on my computer.

- Carries are deferred until after the multiplication is complete.
So, actually, as it's doing the multiplication, some of the ``digits''
are 10 or more. It's still a base-10 representation, but the algorithm
doesn't need the digits to be between 0 and 9. Thus, the implementation
can use the digit pair ``CC'' (two digits of value 12) to represent the number 12x10 + 12 = 132.
This saves in two respects: First, it means that we avoid extra, unnecessary computation while the recursion is going on, saving all the carrying for one quick pass at the end. (It would amount to several passes if we did the carry as we went along.) And second, it means we don't have to worry about the annoying extra digit that can pop up in computing
`x`_{3}.

- A simple implementation of divide-and-conquer multiplication would often allocate new
arrays, basically one for each recursive call. Memory allocation can get
pretty complex, however, and sometimes takes quite a while.
My implementation plays some pretty heavy tricks to avoid allocating
memory. In particular, it allocates 6`n` array entries at the very beginning, before starting the recursion, and it aggressively reuses these array entries throughout the recursion process. This saves the overhead of continually allocating arrays.
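To illustrate the deferred-carry idea (a sketch of mine, not the actual implementation): the digits live in an int array, least-significant first, and may exceed 9 during the computation; a single pass at the end restores proper base-10 digits.

```java
public class DeferredCarry {
    // digits[i] holds the coefficient of 10^i and may temporarily exceed 9.
    // One pass propagates all the carries, leaving each entry between 0 and 9.
    // (Assumes the array is large enough that no carry spills off the end.)
    public static void normalize(int[] digits) {
        int carry = 0;
        for (int i = 0; i < digits.length; i++) {
            int v = digits[i] + carry;
            digits[i] = v % 10;
            carry = v / 10;
        }
    }
}
```

For instance, the array {12, 12, 0} - the ``CC'' representation of 132 from above - normalizes to {2, 3, 1}, which is the proper digit sequence for 132.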
