Deepayan Sarkar
Abstract problem: compute \(f: X \to Y\)
\(X\) and \(Y\) are normed vector spaces, usually \(\mathbb{R}^k\) for some \(k\)
\(f\) is referred to as the “problem”, and is usually continuous
We are interested in the behaviour of the problem at a particular “instance” \(x \in X\)
The problem \(f\) at a particular instance \(x\) is
well-conditioned if small perturbations in \(x\) lead to only small changes in \(f(x)\)
ill-conditioned if small perturbations in \(x\) can lead to large changes in \(f(x)\)
Depending on context, “small” and “large” may be either absolute or relative change
Consider a small perturbation \(\delta x\) in \(x\)
Define the change in \(f\) to be \(\delta f = f(x + \delta x) - f(x)\)
The absolute condition number \(\hat{\kappa} = \hat{\kappa}(x)\) of the problem \(f\) at \(x\) is
\[ \hat{\kappa}(x) = \lim_{h \to 0} \sup_{\lVert \delta x \rVert \leq h} \frac{\lVert \delta f \rVert}{\lVert \delta x \rVert} \]
Informally, restricting attention to infinitesimal perturbations, we may write this as
\[ \hat{\kappa}(x) = \sup_{\delta x} \frac{\lVert \delta f \rVert}{\lVert \delta x \rVert} \]
If \(f\) is differentiable with Jacobian \(J(x)\), this equals
\[ \hat{\kappa}(x) = \lVert J(x) \rVert \]
Here \(\lVert J(x) \rVert\) represents a “matrix norm” induced by a vector norm (on \(\mathbb{R}^k\))
Definition: For \(A_{m \times n}\), the matrix norm induced by vector norms on \(\mathbb{R}^m\) and \(\mathbb{R}^n\) is
\[ \lVert A \rVert = \sup \left\lbrace \frac{\lVert A x \rVert}{\lVert x \rVert} : x \in \mathbb{R}^n, x \neq 0 \right\rbrace \]
\[ \delta f = f(x + \delta x) - f(x) \approx J(x) \delta x \implies \sup_{\delta x} \frac{\lVert \delta f \rVert}{\lVert \delta x \rVert} \approx \sup_{\delta x} \frac{\lVert J(x) \delta x \rVert}{\lVert \delta x \rVert} \]
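As a concrete check, base R's `norm()` computes induced matrix norms directly; the \(1 \times 2\) matrix here is chosen for illustration (it is the Jacobian of the subtraction example considered later):

```r
## Induced norms of the 1 x 2 matrix A = [1 -1], via base R's norm()
A <- matrix(c(1, -1), nrow = 1)
norm(A, type = "O")  # induced by the L1 vector norm: max absolute column sum = 1
norm(A, type = "I")  # induced by the L-infinity norm: max absolute row sum = 2
norm(A, type = "2")  # induced by the L2 norm: largest singular value = sqrt(2)
```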
The nature of floating point computations makes it more important to study relative changes
The relative condition number \(\kappa = \kappa(x)\) of \(f\) at \(x\) is
\[ \kappa(x) = \lim_{h \to 0} \sup_{\lVert \delta x \rVert\leq h} \left( \frac{\lVert \delta f \rVert}{\lVert f(x) \rVert} \middle/ \frac{\lVert \delta x \rVert}{\lVert x \rVert} \right) = \frac{\lVert x \rVert}{\lVert f(x) \rVert} \lim_{h \to 0} \sup_{\lVert \delta x \rVert \leq h} \frac{\lVert \delta f \rVert}{\lVert \delta x \rVert} \]
\[ \kappa(x) = \frac{\lVert J(x) \rVert \cdot \lVert x \rVert }{\lVert f(x) \rVert}, \] which in the scalar case \((k = 1)\) reduces to \( \kappa(x) = \left\lvert \frac{ f^{\prime}(x) x }{f(x)} \right\rvert \)
\(f(x) = \sqrt{x}, x \geq 0\)
\(f'(x) = \frac{1}{2} x^{-\frac{1}{2}}\)
So the condition of \(f\) at \(x\) is \[ \left\lvert \frac{f'(x) x}{f(x)} \right\rvert = \frac{1}{2} \frac{x^{-\frac{1}{2}}}{x^\frac{1}{2}} x = \frac{1}{2} \]
So \(f\) is well-conditioned for all \(x\).
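A quick finite-difference sanity check (the values of `x` and `dx` are chosen arbitrarily): the relative change in \(\sqrt{x}\) should be about half the relative change in \(x\):

```r
## Estimate the relative condition number of sqrt at x = 2 by finite differences
x <- 2
dx <- 1e-8
kappa_est <- (abs(sqrt(x + dx) - sqrt(x)) / sqrt(x)) / (dx / x)
kappa_est  # close to 1/2
```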
\(f(x) = x^\alpha\): here \(\left\lvert \frac{f'(x) x}{f(x)} \right\rvert = \left\lvert \frac{\alpha x^{\alpha - 1} x}{x^\alpha} \right\rvert = \lvert \alpha \rvert\), so \(f\) is well-conditioned unless \(\lvert \alpha \rvert\) is large
\(f(x) = \frac{1}{1-x^2}\)
\(f'(x) = 2x (1-x^2)^{-2}\)
So condition of \(f\) at \(x\) is \[ \left\lvert \frac{f'(x) x}{f(x)} \right\rvert = \lvert 2x (1-x^2)^{-2} x (1-x^2) \rvert = \frac{2x^2}{|1-x^2|} \]
Can be large for \(x\) close to \(\pm1\).
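For example, a finite-difference check near \(x = 1\) (the specific values of `x` and `dx` are chosen for illustration):

```r
## Condition of f(x) = 1 / (1 - x^2) near x = 1: formula vs finite differences
f <- function(x) 1 / (1 - x^2)
x <- 0.999
dx <- 1e-9
kappa_formula <- 2 * x^2 / abs(1 - x^2)                      # about 998.5
kappa_est <- (abs(f(x + dx) - f(x)) / abs(f(x))) / (dx / x)
c(kappa_formula, kappa_est)
```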
\(f(x_1, x_2) = x_1 - x_2\)
The Jacobian of \(f\) is \(J = \begin{bmatrix} \frac{\partial f}{\partial x_1} & \frac{\partial f}{\partial x_2} \end{bmatrix} = \begin{bmatrix} 1 & -1 \end{bmatrix}\)
So \(\kappa = \frac{\lVert J \rVert \cdot \lVert x \rVert}{\lvert x_1 - x_2 \rvert}\)
What is \(\lVert x \rVert\)? Common choices are
\(L_1\): \(\lvert x_1 \rvert + \lvert x_2 \rvert\)
\(L_2\): \(\sqrt{x_1^2 + x_2^2}\)
\(L_{\infty}\): \(\max \{ \lvert x_1 \rvert, \lvert x_2 \rvert \}\)
What is \(\Vert J \rVert\)? Depends on vector norm, but some constant \(c\) for this \(J\) regardless of choice
So, with the \(L_{\infty}\) norm, \(\kappa = \frac{c \max\{ \lvert x_1 \rvert, \lvert x_2 \rvert \}}{\lvert x_1 - x_2 \rvert}\)
Ill-conditioned when \(x_1 \approx x_2\)
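This is the familiar phenomenon of catastrophic cancellation; for instance (numbers chosen for illustration):

```r
## Subtracting nearly equal numbers: the leading digits cancel, so the
## rounding error already present in the inputs dominates the result
x1 <- 1 + 1e-12   # rounded to the nearest element of F
x2 <- 1
d <- x1 - x2      # the subtraction itself is exact here, but...
d                 # ...d differs noticeably from 1e-12
```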
\[f(a, b, c) = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}\]
Exercise: Show that for \(x^2 - 2x + 1 = (x-1)^2 = 0\), \(f(1, -2, 1)\) has \(\kappa = \infty\)
Hint: try perturbing one coefficient at a time
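The hint can be checked numerically by perturbing only \(c\) (a sketch; the perturbation size is chosen for illustration):

```r
## Perturb c in x^2 - 2x + 1 = 0 by 1e-8: the double root at 1 splits into
## roots near 1 +/- 1e-4, i.e., an O(eps) input change causes an
## O(sqrt(eps)) output change, so the relative condition number is infinite
a <- 1; b <- -2; cc <- 1 - 1e-8
roots <- (-b + c(1, -1) * sqrt(b^2 - 4 * a * cc)) / (2 * a)
roots  # approximately 1 + 1e-4 and 1 - 1e-4
```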
Graphical demonstration:
## qroot() and abc are not defined in these notes; plausible definitions
## (assumed here) are:
qroot <- function(abc)  # both roots of a x^2 + b x + c = 0, possibly complex
    (-abc[2] + c(1, -1) * sqrt(as.complex(abc[2]^2 - 4 * abc[1] * abc[3]))) / (2 * abc[1])
abc <- c(1, -2, 1)      # coefficients of x^2 - 2 x + 1 = (x - 1)^2
par(mfrow = c(1, 3))
for (eps in c(0.1, 0.01, 0.001))
{
    roots <- replicate(1000, qroot(abc + eps * runif(3, -1, 1)))
    plot(roots, pch = ".", cex = 3, las = 1)  # complex values plot as (Re, Im)
    rect(1 - eps, -eps, 1 + eps, eps, col = "#FF000044")
}
A floating point number is represented in the form \[ \text{significand} \times \text{base}^{\text{exponent}} \]
\[ \mathbb{F}= \{ 0 \} \cup \left\{ \pm \frac{m}{2^t} \times 2^e : e \in \mathbb{Z} \text{ and $m$ integer with } 1 \leq m \leq 2^t \right\} \]
Here the integer \(t\) is the precision of the representation (usually 24 or 53)
\(e\) can be an arbitrary integer, so there is no “overflow” or “underflow” (\(\mathbb{F}= 2 \mathbb{F}\))
This is still a useful formal model for the subset of \(\mathbb{R}\) that has a floating point representation
For example, with \(t=53\),
\[\begin{eqnarray*} \mathbb{F}\cap [1,2] &=& \{ 1, 1 + 2^{-52}, 1 + 2 \times 2^{-52}, 1 + 3 \times 2^{-52}, ..., 2 \}, \\ \mathbb{F}\cap [2,4] &=& \{ 2, 2 + 2^{-51}, 2 + 2 \times 2^{-51}, 2 + 3 \times 2^{-51}, ..., 4 \}, \text{etc.} \\ \end{eqnarray*}\]
The resolution of \(\mathbb{F}\) is quantified by a number known as machine epsilon, \(\epsilon_m\)
Let us tentatively define \(\epsilon_m\) to be half the distance between \(1\) and the next larger number in \(\mathbb{F}\)
Clearly, \(\epsilon_m = \frac12 \times 0.000\cdots0001 = \frac12 \times 2^{-(t-1)} = 2^{-t}\), and it has the following property:
For all \(x \in \mathbb{R}\), there exists \(x^* \in \mathbb{F}\) such that \(\lvert x - x^* \rvert \leq \epsilon_m \cdot \lvert x \rvert\)
For \(t = 24\) (Float32), \(\epsilon_m = 2^{-24} \approx 6 \times 10^{-8}\)
For \(t = 53\) (Float64), \(\epsilon_m = 2^{-53} \approx 1.1 \times 10^{-16}\)
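These values are easy to verify in R; note that R reports the spacing \(2^{-52}\) between \(1\) and the next double, i.e. \(2 \epsilon_m\), as `.Machine$double.eps`:

```r
## The gap between 1 and the next larger double is 2^-52 = 2 * eps_m
.Machine$double.eps == 2^-52   # TRUE
## eps_m = 2^-53: adding it to 1 makes no difference (round-to-nearest-even),
## but adding the full gap 2^-52 does
(1 + 2^-53) == 1               # TRUE
(1 + 2^-52) == 1               # FALSE
```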
For any \(x \in \mathbb{R}\), define \(\textrm{fl}(x)\) to be the element in \(\mathbb{F}\) closest to \(x\)
Then, a restatement of the above property is
For all \(x \in \mathbb{R}\), there exists \(\epsilon\) with \(\lvert \epsilon \rvert \leq \epsilon_m\) such that \(\textrm{fl}(x) = x (1 + \epsilon)\)
Consider the elementary arithmetic operations \(+, -, \times, \div\)
How should we expect these to behave on \(\mathbb{F}\)?
Let \(*\) denote one of these elementary operations, and \(\circledast\) denote the corresponding operation on \(\mathbb{F}\)
Then we would ideally want, for \(x, y \in \mathbb{F}\),
\[x \circledast y = \textrm{fl}( x * y)\]
Equivalently: for all \(x, y \in \mathbb{F}\), there exists \(\epsilon\) with \(\lvert \epsilon \rvert \leq \epsilon_m\) such that \(x \circledast y = (x * y) (1 + \epsilon)\)
In practice, this may not hold for the theoretical \(\epsilon_m\), but only for some larger value
The smallest \(\epsilon_m\) for which this is guaranteed (on a given machine) is defined to be the machine epsilon
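The classic example: \(0.1\), \(0.2\), and \(0.3\) are not in \(\mathbb{F}\), so \(\textrm{fl}(0.1) \oplus \textrm{fl}(0.2) \neq \textrm{fl}(0.3)\), although the two sides agree to within a few \(\epsilon_m\):

```r
## Each operand is rounded, and so is the sum, so exact equality fails...
(0.1 + 0.2) == 0.3              # FALSE
## ...but the relative discrepancy is on the order of eps_m
abs((0.1 + 0.2) - 0.3) / 0.3    # about 1.85e-16
```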
Suppose we want to solve a problem \(f: X \to Y\)
There can be multiple algorithms to calculate a candidate solution
Let \(\tilde{f}: X \to Y\) be the actual implementation of an algorithm to solve \(f\)
At a minimum, this will involve the approximation of \(x\) by \(\textrm{fl}(x)\)
In practice, suppose we want to calculate \(f(x)\), and actually compute \(\tilde{f}(x)\)
The relative error is
\[ \frac{ \lVert \tilde{f}(x) - f(x) \rVert}{\lVert f(x) \rVert} \]
Recall that \(\textrm{fl}(x) \approx x (1 + \epsilon_m) \implies \frac{\lVert \textrm{fl}(x) - x\rVert}{\lVert x \rVert} \approx \epsilon_m\)
If \(\kappa= \kappa(x)\) is the relative condition number of \(f(x)\), we expect (note: for \(f\), not \(\tilde{f}\))
\[ \frac{ \lVert f(\textrm{fl}(x)) - f(x) \rVert}{\lVert f(x) \rVert} \approx \kappa \frac{\lVert \textrm{fl}(x) - x\rVert}{\lVert x \rVert} \approx \kappa \epsilon_m \]
The best we can then hope for from the actual implementation \(\tilde{f}\) is
\[ \frac{ \lVert \tilde{f}(x) - f(x) \rVert}{\lVert f(x) \rVert} \approx \kappa \epsilon_m \]
Instability arises due to ill-conditioned intermediate steps in an algorithm \(\tilde{f}\)
The basic idea is to compare the (inherent) condition of \(f(x)\) with the conditions of intermediate steps
Badly conditioned intermediate steps make the process unstable.
To make the idea concrete, consider the problem: \(f(x) = \sqrt{x+1} - \sqrt{x}, x > 0\)
It is easily seen that the condition of \(f\) at \(x\) is \(\frac{1}{2} \frac{x}{\sqrt{x+1} \sqrt{x}} \approx \frac{1}{2}\) when \(x\) is large
A possible algorithm \(\tilde{f}\), directly using the definition, will proceed as follows
\(x_0 = x\)
\(x_1 = x_0 + 1\)
\(x_2 = \sqrt{x_1}\)
\(x_3 = \sqrt{x_0}\)
\(x_4 = x_2 - x_3\)
In general, suppose \(y = \tilde{f}(x)\) is computed in \(n\) steps
Let \(x_i\) be the output of the \(i\)th step (define \(x_0 = x\))
Then \(y = \tilde{f}(x) = x_n\) can also be viewed as a function of each of the intermediate \(x_i\)s
Denote the \(i\)th such function by \(\tilde{f_i}\), such that \(y = \tilde{f_i} (x_i)\)
In particular, \(\tilde{f}_0 = \tilde{f}\)
Then the instability in the total computation is dominated by the most ill-conditioned \(\tilde{f_i}\)
For the \(\tilde{f}\) given above, we have
\(\tilde{f}(t) = \sqrt{t+1} - \sqrt{t}\)
\(x_0 = x \implies \tilde{f_0}(t) = \sqrt{t+1} - \sqrt{t}\)
\(x_1 = x_0 + 1 \implies \tilde{f_1}(t) = \sqrt{t} - \sqrt{x_0}\)
\(x_2 = \sqrt{x_1} \implies \tilde{f_2}(t) = t - \sqrt{x_0}\)
\(x_3 = \sqrt{x_0} \implies \tilde{f_3}(t) = x_2 - t\)
\[ \left\lvert \frac{\tilde{f_3}^\prime(t) \, t}{\tilde{f_3}(t)} \right\rvert = \left\lvert \frac{t}{x_2 - t} \right\rvert \]
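This condition number grows without bound as \(x\) increases; evaluating \(t / \lvert x_2 - t \rvert\) at \(t = \sqrt{x}\), \(x_2 = \sqrt{x+1}\) for \(x = 10, 100, 1000, 10000\) (a sketch; the variable names are assumed):

```r
## Condition of the subtraction step f3(t) = x2 - t, evaluated at t = sqrt(x)
x <- c(10, 100, 1000, 10000)
t <- sqrt(x)
x2 <- sqrt(x + 1)
print(t / abs(x2 - t))   # roughly 2x for large x
```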
Evaluated at \(x = 10, 100, 1000, 10000\) (i.e., \(t = \sqrt{x}\), \(x_2 = \sqrt{x+1}\)), this gives
[1] 20.48809 200.49876 2000.49988 20000.49999
An alternative formula for \(f\) is \(f(x) = \frac{1}{\sqrt{x+1} + \sqrt{x}}\)
An algorithm based on this formula would proceed as
\(x_0 = x \implies \tilde{f_0}(t) = \frac{1}{\sqrt{t+1} + \sqrt{t}}\)
\(x_1 = x_0 + 1 \implies \tilde{f_1}(t) = \frac{1}{\sqrt{t} + \sqrt{x_0}}\)
\(x_2 = \sqrt{x_1} \implies \tilde{f_2}(t) = \frac{1}{t + \sqrt{x_0}}\)
\(x_3 = \sqrt{x_0} \implies \tilde{f_3}(t) = \frac{1}{x_2 + t}\)
\(x_4 = x_2 + x_3 \implies \tilde{f_4}(t) = \frac{1}{t}\)
\(x_5 = 1 / x_4 \implies \tilde{f_5}(t) = t\) (the final output is \(y = x_5\))
Exercise: Verify that all these intermediate maps have good condition (at the relevant \(x_i\)) when \(x\) is large, so this algorithm is stable
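The practical difference between the two algorithms is dramatic; for example at \(x = 10^{16}\), where \(x + 1\) already rounds to \(x\) in \(\mathbb{F}\) (a sketch, assuming both formulas are implemented directly):

```r
## Naive formula: catastrophic cancellation destroys the answer entirely,
## while the rearranged formula avoids the subtraction altogether
x <- 1e16
f_naive  <- sqrt(x + 1) - sqrt(x)         # 0: all digits cancel
f_stable <- 1 / (sqrt(x + 1) + sqrt(x))   # 5e-09: accurate to full precision
c(f_naive, f_stable)
```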