Drawbacks of Newton-Raphson approximation with approximate numerical derivative

Computational Science Asked by Mark Dominus on May 8, 2021

Suppose I have some function $f$ and I want to find $x$ such that $f(x)\approx 0$. I might use the Newton-Raphson method. But this requires that I know the derivative function $f'(x)$. An analytic expression for $f$ may be unavailable. For example, $f$ may be defined by a complicated piece of computer code that consults a database of experimental values.

But even if $f'$ is complicated, I can approximate $f'(a)$ for any particular $a$ by choosing a small number $\epsilon$ and calculating $f'(a) \approx \frac{f(a+\epsilon) - f(a)}{\epsilon}$.
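For concreteness, here is a minimal sketch of this idea in Python; the function, the step size eps, and the tolerances are illustrative choices rather than anything prescribed above:

```python
# Sketch: Newton-Raphson where f'(x) is replaced by a forward-difference
# approximation (f(x + eps) - f(x)) / eps. All names and defaults here are
# illustrative assumptions, not a reference implementation.
def newton_fd(f, x0, eps=1e-8, tol=1e-12, max_iter=50):
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            return x
        dfdx = (f(x + eps) - fx) / eps   # approximate derivative at x
        x = x - fx / dfdx                # usual Newton update
    return x

# Example: the root of x**2 - 2 starting from x0 = 1 (converges to sqrt(2))
print(newton_fd(lambda x: x**2 - 2, 1.0))
```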

I have heard that there are distinct disadvantages to this approach, but I don’t know what they are. Wikipedia hints that “Using this approximation would result in something like the secant method whose convergence is slower than that of Newton’s method.”

Can someone please elaborate on this, and provide a reference that particularly discusses the problems with this technique?

2 Answers

For the sake of notation, let's suppose that $f: \mathbb{R}^{n} \rightarrow \mathbb{R}^{n}$ (i.e., it's a vector-valued function that takes a vector as input and outputs a vector of the same size). There are two concerns: computational cost and numerical accuracy.

Calculating the derivative $\mathrm{D}f(x)$ (the Jacobian matrix, $J(x)$, or $(\nabla f(x))^{T}$, or whatever you prefer) using finite differences is going to require $n$ function evaluations. If you could calculate the derivative using floating point arithmetic directly from the definition, you would have to calculate the difference quotient

\begin{align} \mathrm{D}f(x)e_{i} = \lim_{\varepsilon \rightarrow 0} \frac{f(x + \varepsilon e_{i}) - f(x)}{\varepsilon} \end{align}

for each $i = 1, \ldots, n$, assuming you don't do any sort of "smart finite differencing" (like Curtis-Powell-Reid) because you know (or can detect) the sparsity pattern of $\mathrm{D}f$. If $n$ is large, that could be a lot of function evaluations. If you have an analytical expression for $\mathrm{D}f$, then calculating it could be cheaper. Automatic (also known as algorithmic) differentiation methods can also be used in some cases to calculate $\mathrm{D}f$ at roughly 3 to 5 times the cost of a function evaluation.
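To make the cost concrete, here is a minimal sketch of a dense forward-difference Jacobian in Python, assuming $f$ maps length-$n$ NumPy arrays to NumPy arrays; the helper name fd_jacobian and the fixed step eps are my own illustrative choices:

```python
import numpy as np

# Sketch of a dense forward-difference Jacobian. It needs n evaluations of f
# in addition to f(x), one per coordinate direction e_i, as described above.
def fd_jacobian(f, x, eps=1e-8):
    x = np.asarray(x, dtype=float)
    fx = np.asarray(f(x), dtype=float)
    J = np.empty((fx.size, x.size))
    for i in range(x.size):
        x_pert = x.copy()
        x_pert[i] += eps                  # x + eps * e_i
        J[:, i] = (np.asarray(f(x_pert)) - fx) / eps
    return J

# Example: f(x, y) = (x**2 + y, x * y); the exact Jacobian at (1, 2) is [[2, 1], [2, 1]]
print(fd_jacobian(lambda v: np.array([v[0]**2 + v[1], v[0] * v[1]]), [1.0, 2.0]))
```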

There are also numerical concerns. Obviously, on a computer, we can't take the limit of a scalar as it goes to zero, so when we approximate $\mathrm{D}f$, we're really picking $\varepsilon$ to be "small" and calculating

\begin{align} \mathrm{D}f(x)e_{i} \approx \frac{f(x + \varepsilon e_{i}) - f(x)}{\varepsilon}, \end{align}

where $\approx$ means it's an approximation, and we hope it's a really good approximation. Calculating this approximation in floating point arithmetic is tough because if you pick $\varepsilon$ too large, your approximation could be bad, but if you pick $\varepsilon$ too small, there could be significant rounding error. These effects are covered only superficially in the Wikipedia article on numerical differentiation; more detailed references can be found within the article.
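A tiny experiment makes the trade-off concrete; the test function $f(x)=e^{x}$ (so the exact derivative at $x=1$ is $e$) and the particular step sizes are illustrative choices of mine:

```python
import math

# Forward-difference error for f(x) = exp(x) at x = 1 over several step sizes.
# Moderate eps (near sqrt(machine epsilon), about 1e-8) minimizes the error;
# much smaller eps is dominated by rounding error.
x, exact = 1.0, math.exp(1.0)
for eps in (1e-1, 1e-4, 1e-8, 1e-12, 1e-16):
    approx = (math.exp(x + eps) - math.exp(x)) / eps
    print(f"eps = {eps:.0e}   error = {abs(approx - exact):.2e}")
```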

If the error in the Jacobian matrix $\mathrm{D}f$ isn't too large, Newton-Raphson iterations will converge. For a detailed theoretical analysis, see Chapter 25 of Accuracy and Stability of Numerical Algorithms by Nick Higham, or the paper by Françoise Tisseur on which it is based.

Libraries generally take care of these algorithmic details for you, and usually, library implementations of the Newton-Raphson algorithm (or variants thereof) will converge quite nicely, but every so often, there will be a problem that causes some trouble due to the drawbacks above. In the scalar case $(n = 1)$, I'd use Brent's method, owing to its robustness and good convergence rate in practice.
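For instance, in Python the scalar case is covered by SciPy's implementation of Brent's method, which needs only a bracketing interval with a sign change and no derivative information at all:

```python
from scipy.optimize import brentq

# Brent's method on f(x) = x**2 - 2 with the bracket [0, 2], which contains sqrt(2).
root = brentq(lambda x: x**2 - 2, 0.0, 2.0)
print(root)
```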

Correct answer by Geoff Oxberry on May 8, 2021

In the 1-dimensional case, there are three main methods similar to what you describe that come to mind.

The secant method uses the derivative approximation

$$f'(x_n) \approx \frac{f(x_n)-f(x_{n-1})}{x_n-x_{n-1}}$$

to compute the next iteration. As André Nicolas points out, the secant method is quite desirable for its fast convergence and cheap iterations, especially when each additional evaluation (of either $f'(x_n)$ or $f(x_n+\epsilon)$) is very slow.
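A minimal sketch of the secant iteration, which needs only one new function evaluation per step (names and tolerances are illustrative):

```python
# Secant method: the slope through the two most recent iterates stands in
# for f'(x_n), so each iteration costs a single evaluation of f.
def secant(f, x0, x1, tol=1e-12, max_iter=50):
    f0, f1 = f(x0), f(x1)
    for _ in range(max_iter):
        if abs(f1) < tol:
            return x1
        slope = (f1 - f0) / (x1 - x0)
        x0, f0 = x1, f1
        x1 = x1 - f1 / slope
        f1 = f(x1)
    return x1

print(secant(lambda x: x**2 - 2, 1.0, 2.0))   # converges to sqrt(2)
```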

Steffensen's method is another method using derivative approximations. It computes $f(x_n+\epsilon)$ with $\epsilon=f(x_n)$ on each iteration to approximate the derivative. Since $f(x_n)\to 0$ is expected (i.e. a root is approached), the accuracy of this derivative approximation improves as the root is approached, leading to the same asymptotic speed as Newton's method (whether or not you account for the multiple function evaluations per iteration).
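A corresponding sketch of Steffensen's method, where the finite-difference step is taken to be $\epsilon=f(x_n)$ (again, names and tolerances are illustrative):

```python
# Steffensen's method: the divided difference uses the step eps = f(x), which
# shrinks automatically as the iterates approach a root, so the derivative
# approximation improves and the iteration matches Newton's asymptotic speed.
def steffensen(f, x0, tol=1e-12, max_iter=50):
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            return x
        g = (f(x + fx) - fx) / fx    # approximate derivative with step fx
        x = x - fx / g
    return x

print(steffensen(lambda x: x**2 - 2, 1.0))
```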

More powerful extensions of the secant method are known. Sidi's methods provide generalizations of the secant method based on the following premise:

The secant method approximates $f'(x_n)$ by the slope of the line going through $x_n$ and $x_{n-1}$. We can extend this by approximating $f'(x_n)$ by the derivative of the quadratic (or degree-$k$ polynomial) going through $x_n$, $x_{n-1}$, and $x_{n-2}$ (or up to $x_{n-k}$).

These methods recover more of the asymptotic speed of Newton's method while still remaining true to the advantages of the secant method, requiring no additional function evaluations per iteration.
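For illustration, here is a sketch of the $k=2$ member of this family, using Newton divided differences to evaluate, at $x_n$, the derivative of the quadratic through the last three iterates (the naming and tolerances are my own):

```python
# Sidi-style iteration with k = 2: approximate f'(x_n) by p'(x_n), where p is the
# quadratic interpolating (x_n, x_{n-1}, x_{n-2}); only one new f evaluation per step.
def sidi_k2(f, x0, x1, x2, tol=1e-12, max_iter=50):
    xs = [x0, x1, x2]
    fs = [f(x0), f(x1), f(x2)]
    for _ in range(max_iter):
        if abs(fs[-1]) < tol:
            return xs[-1]
        d01 = (fs[-1] - fs[-2]) / (xs[-1] - xs[-2])    # f[x_n, x_{n-1}]
        d12 = (fs[-2] - fs[-3]) / (xs[-2] - xs[-3])    # f[x_{n-1}, x_{n-2}]
        d012 = (d01 - d12) / (xs[-1] - xs[-3])         # f[x_n, x_{n-1}, x_{n-2}]
        deriv = d01 + d012 * (xs[-1] - xs[-2])         # p'(x_n)
        xs.append(xs[-1] - fs[-1] / deriv)
        fs.append(f(xs[-1]))
    return xs[-1]

print(sidi_k2(lambda x: x**2 - 2, 0.5, 1.0, 2.0))
```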


Overall, the main disadvantage is a slower rate of convergence per iteration. Whether these methods end up faster than Newton's method depends entirely on how the derivative is provided. If the derivative is significantly cheaper to evaluate than $f$ itself, Newton's method may be more desirable. Otherwise, Newton's method may be much slower than the above methods.

Consider, for example, using Newton's method to find a root of the solution to the differential equation $\dot y=g(t)y$. In this instance, it is easy to see that $y/\dot y=1/g(t)$, which makes Newton's method extremely favorable for this problem.

In most problems, however, I tend to find that the derivative is not provided. Since, at worst, the derivative-approximation methods are only a constant factor slower than Newton's method (e.g. the secant method is roughly 50% slower than Newton's method), I prefer these methods.

Answered by Simply Beautiful Art on May 8, 2021
