The quadratic formula and low-precision arithmetic

This article was first published on John D. Cook.

What could be interesting about the lowly quadratic formula? It’s a formula after all. You just stick numbers into it.

Well, there’s an interesting wrinkle. When the linear coefficient b is large relative to the other coefficients, the quadratic formula can give wrong results when implemented in floating point arithmetic.

Quadratic formula and loss of precision

The quadratic formula says that the roots of

ax^2 + bx + c = 0

are given by

x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}

That’s true, but let’s see what happens when we have ac = 1 and b = 108.

    from math import sqrt

    def quadratic(a, b, c):
        r = sqrt(b**2 - 4*a*c)
        return ((-b + r)/(2*a), (-b -r)/(2*a))

    print( quadratic(1, 1e8, 1) )

This returns

    (-7.450580596923828e-09, -100000000.0)

The first root is wrong by about 25%, though the second one is correct.

What happened? The quadratic equation violated the cardinal rule of numerical analysis: avoid subtracting nearly equal numbers. The more similar two numbers are, the more precision you can lose from subtracting them. In this case √(b² – 4ac) is very nearly equal to b.

If we ask Python to evaluate

    1e8 - sqrt(1e16-4)

we get 1.49e-8 when the correct answer would be 2.0e-8.

Improving precision for quadratic formula

The way to fix the problem is to rationalize the numerator of the quadratic formula by multiplying by 1 in the form

\frac{-b \mp \sqrt{b^2 - 4ac}}{-b \mp \sqrt{b^2 - 4ac}}

(The symbol ∓ is much less common than ±. It must means that if you take the the + sign in the quadratic formula, take the – sign above, and vice versa.)

When we multiply by the expression above and simplify we get

\frac{2c}{-b \mp \sqrt{b^2 - 4ac}}

Let’s code this up in Python and try it out.

    def quadratic2(a, b, c):
        r = sqrt(b**2 - 4*a*c)
        return (2*c/(-b - r), 2*c/(-b+r))

    print( quadratic2(1, 1e8, 1) )

This returns

    (-1e-08, -134217728.0)

So is our new quadratic equation better? It gives the right answer for the first root, exact to within machine precision. But now the second root is wrong by 34%. Why is the second root wrong? Same reason as before: we subtracted two nearly equal numbers!

The familiar version of the quadratic formula computes the larger root correctly, and the new version computes the smaller root correctly. Neither version is better overall. We’d be no better off or worse off always using the new quadratic formula than the old one. Each one is better when it avoids subtracting nearly equal numbers.

The solution is to use both quadratic formulas, using the appropriate one for the root you’re trying to calculate.

Low precision arithmetic

Is this a practical concern? Yes, and here’s why: Everything old is new again.

The possible inaccuracy in the quadratic formula was serious before double precision (64-bit floating point) arithmetic became common. And back in the day, engineers were more likely to be familiar with the alternate form of the quadratic formula. You can still run into quadratic equations that give you trouble even in double precision arithmetic, like the example above, but it’s less likely to be a problem when you have more bits at your disposal.

Now we’re interested in low-precision arithmetic again. CPUs have gotten much faster, but moving bits around in memory has not. Relative to CPU speed, memory manipulation has gotten slower. That means we need to be more concerned with memory management and less concerned about floating point speed.

Not only is memory juggling slower relative to CPU, it also takes more energy. According to Gustafson, reading 64 bits from DRAM requires 65 times as much energy as doing a floating point combined multiply-add because it takes place off-chip. The table below, from page 6 of Gustafson’s book, gives the details. Using lower precision floating point saves energy because more numbers can be read in from memory in the same number of bits. (Here pJ = picojoules.)

Operation  Energy consumed Where
Perform a 64-bit floating point multiply-add 64 pJ on-chip
Load or store 64 bits of register data 6 pJ on-chip
Read 64 bits from DRAM 4200 pJ off-chip

So we might be interested in low-precision arithmetic to save energy in a battery powered mobile device, or to save clock cycles in a server application manipulating a lot of data. This means that the numerical tricks that most people have forgotten about are relevant again.

By: John