Newton's Method¶
In Newton's method, we minimize a quadratic approximation of the function at each iterate. Consider the problem
where f:\R^n\rightarrow \R is a twice continuously differentiable function. Given an iterate \vx_k, define \vx_{k+1} as the minimizer of the quadratic program given by
In the above equation, f_k := f(\vx_k). The minimizer of the quadratic program is well-defined if \nabla^2 f_k ≻ 0. In that case,
where \vd_k solves \nabla^2 f_k d = -\nabla f_k. The pure Newton's method is given by the following algorithm:
Pure Newton's method
Given: \vx_0, tol > 0 for k = 0, 1, 2,\dots
-
\vg_k ← ∇f(\vx_k) (compute gradient)
-
\mH_k ← ∇^2f(\vx_k) (compute Hessian)
-
\vd_k solves \mH_k \vd = -\vg_k (compute Newton step)
-
\vx_{k+1} \leftarrow \vx_k + \vd_k
-
STOP if \|\nabla f(\vx_{k+1})\|_2 \leq tol
Convergence of (pure) Newton's method¶
Convergence of (pure) Newton's method requires the Hessian at every iteration to be positive definite, i.e. \nabla^2f_k \succ 0 for all iterations k. This positive definiteness of the Hessian ensures that the Newton step is a descent direction since
However, positive definiteness is not sufficient for convergence, i.e. the pure Newton method may still diverge even if \nabla^2 f_k \succ 0. For example, consider the function f(x) = \sqrt{1+x^2}. The derivative of f is f'(x) = x(1+x^2)^{-\frac{1}{2}} and the second-derivative is f''(x) = (1+x^2)^{-\frac{3}{2}}. So the Newton iteration is x_{k+1} = x_k - \frac{f'(x_k)}{f''(x_k)} = -x_k^3. Note that
Lemma (Convergence of pure Newton's method) Suppose f:\R^n\rightarrow \R is twice continuously differentiable and
-
there exists an \epsilon >0 such that \nabla^2 f(x)\succeq \epsilon \mI for all \vx,
-
there exists some L>0 such that \|\nabla^2 f(\vx) -\nabla^2 f(\vy)\|_2 \leq L \|\vx-\vy\|_2 for all \vx and \vy.
If the minimizer \vx_* of f is unique, then the pure Newton iterations satisfies
In addition, if \|\vx_0 - \vx_*\|_2 \leq \frac{\epsilon}{L}, then each iterate k satisfies
Rates of convergence¶
Rates of convergence measures how fast a sequence ${\vx_k}_{k=0,1,2,\dots} converges to its limit (assuming limit exists). Suppose \vx_k \rightarrow \vx_*, i.e. \lim_{k\rightarrow \infty} \|\vx_k -\vx_*\| = 0.
-
Linear Convergence: There exists a number \mu \in (0,1) such that \lim_{k\rightarrow \infty} \frac{\|\vx_{k+1}-\vx_*\|}{\|\vx_k-\vx_*\|} = \mu. For example, consider the sequence x_k = 2^{-k}.
-
Sublinear Convergence: A sequence \{\vx_k\}_{k=0,1,2,\dots} converges sublinearly to \vx_* if \lim_{k\rightarrow \infty} \frac{\|\vx_{k+1}-\vx_*\|}{\|\vx_k-\vx_*\|} = 1. For example, consider the sequence x_k = \frac{1}{k+1}.
-
Superlinear Convergence: A sequence \{\vx_k\}_{k=0,1,2,\dots} converges superlinearly with order q to \vx_* if there exists a number M>0 such that \lim_{k\rightarrow \infty} \frac{\|\vx_{k+1}-\vx_*\|}{\|\vx_k-\vx_*\|^q} \leq M. For example, consider the sequence x_k = \left(\frac{1}{2}\right)^{2^k}. If q=2, we say the convergence of quadratic.