
Setup

Dense direct solvers, LU factorisation (Module 2) and Cholesky (Module 2), cost $O(n^3)$ flops and require $O(n^2)$ storage. For a Crank-Nicolson finite difference grid with $N = 10^4$ spatial nodes, forming and factorising the full (dense) stiffness matrix takes about $10^{12}$ flops and 800 MB of storage in double precision, which is impractical. For a 2-factor interest rate PDE on a $200 \times 200$ grid, the system has $n = 4 \times 10^4$ unknowns, far beyond dense direct methods.

Iterative solvers exploit the structure of the system: they require only matrix-vector products $Ap$, never a factorisation. For sparse matrices (finite difference stencils, sparse covariance models), each $Ap$ costs $O(n)$ rather than $O(n^2)$. The cost of the solve is then $O(n \cdot k_\text{iter})$, where the iteration count $k_\text{iter}$ depends on the condition number.
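To make the $O(n)$ matrix-vector product concrete, here is a minimal sketch for a tridiagonal matrix stored as three diagonals. The function names `tridiag_matvec` and `storage_bytes` and the storage layout are illustrative assumptions, not something the lesson prescribes.

```python
def tridiag_matvec(lower, diag, upper, p):
    """Return A @ p for a tridiagonal A stored as three diagonals.

    lower[i] is A[i+1][i], diag[i] is A[i][i], upper[i] is A[i][i+1].
    Each output entry touches at most three inputs, so the cost is O(n).
    """
    n = len(diag)
    out = [0.0] * n
    for i in range(n):
        acc = diag[i] * p[i]
        if i > 0:
            acc += lower[i - 1] * p[i - 1]
        if i < n - 1:
            acc += upper[i] * p[i + 1]
        out[i] = acc
    return out


def storage_bytes(n, dense):
    """Bytes of float64 storage: n*n entries if dense, 3n - 2 if tridiagonal."""
    entries = n * n if dense else 3 * n - 2
    return 8 * entries


# Hypothetical example: the 1D second-difference stencil (-1, 2, -1).
n = 5
lower = [-1.0] * (n - 1)
diag = [2.0] * n
upper = [-1.0] * (n - 1)
y = tridiag_matvec(lower, diag, upper, [1.0] * n)
```

For $n = 10^4$, `storage_bytes(10**4, True)` reproduces the 800 MB dense figure quoted in the text, while the three-diagonal layout needs roughly 240 kB.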

Where iterative solvers appear in quant finance:

PDE pricing (finite differences). Black-Scholes and Heston PDEs on fine grids produce large tridiagonal or banded systems at each time step. For 1D grids the Thomas algorithm (a direct tridiagonal solver, $O(n)$) is standard; for 2D grids, alternating direction implicit (ADI) splits the problem into 1D tridiagonal solves. For 3D or coupled systems, iterative solvers such as preconditioned CG (PCG) are used.
Yield curve calibration. Bootstrapping a swap curve with $n$ pillars gives an $n \times n$ lower-triangular system (direct, trivial). Fitting a regularised Nelson-Siegel-Svensson model to $m \gg n$ market quotes leads to a normal-equations solve $(A^\top A + \lambda I)x = A^\top b$, handled either directly (via Cholesky) or iteratively (CG) depending on $n$.
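Portfolio optimisation. Mean-variance with $n = 500$ assets and $k$ linear constraints gives a KKT system of dimension $n + k \approx 600$. A direct factorisation is fine here (the KKT matrix is symmetric indefinite, so use $LDL^\top$, or Cholesky on the reduced system). With a factor-model covariance $\Sigma = F F^\top + D$ (low-rank plus diagonal), $\Sigma^{-1}$ follows in closed form from the Woodbury identity, so no large solver is needed.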
Large-scale calibration of stochastic vol models. Iterative refinement is used when the Jacobian is large and only approximately known (numerical differentiation), making direct factorisation unreliable.
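
For reference, the Woodbury identity mentioned in the portfolio optimisation item can be written out as follows. This is a sketch under the assumption (not stated in the text) that $F \in \mathbb{R}^{n \times q}$ holds $q \ll n$ factor loadings and that $D$ is diagonal with positive entries; the symbol $q$ is introduced here only for illustration.

```latex
\Sigma^{-1} = (D + F F^\top)^{-1}
            = D^{-1} - D^{-1} F \left( I_q + F^\top D^{-1} F \right)^{-1} F^\top D^{-1}
```

Under these assumptions only the small $q \times q$ matrix $I_q + F^\top D^{-1} F$ has to be factorised, and applying $D^{-1}$ is an elementwise division.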

Mathematical setting. We focus on three cases:

SPD system $Ax = b$ with $A$ sparse and symmetric positive definite: solved by Conjugate Gradient (CG).
Non-symmetric system: GMRES (Generalised Minimal RESidual).
Overdetermined LS $\min \|Ax - b\|_2$: LSQR.

Conventions. $A \in \mathbb{R}^{n \times n}$ SPD unless stated. $r_k = b - Ax_k$ is the residual at step $k$. $e_k = x^* - x_k$ is the error. $\|v\|_A = \sqrt{v^\top A v}$ is the A-norm (energy norm).

INSIGHT

Financial Insight. In production calibration at a bank, iterative solvers are used not just for speed but for robustness: they can be stopped early (before full convergence) once the residual falls below market-quote precision. Solving to $10^{-10}$ accuracy when quotes are only known to $10^{-4}$ wastes computation. Early stopping also acts as a form of regularisation, which is an important practical insight.

Theory

1. Quadratic Minimisation and the CG Motivation

For $A$ SPD, solving $Ax = b$ is equivalent to minimising the strictly convex quadratic: $f(x) = \frac{1}{2} x^\top A x - b^\top x.$ The gradient is $\nabla f(x) = Ax - b = -r(x)$. The minimum is at $x^* = A^{-1}b$.
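
The equivalence can be checked with a short worked identity. Substituting $b = Ax^*$ and using the symmetry of $A$, the gap between $f(x)$ and its minimum is exactly half the squared energy norm of the error $e = x^* - x$, which is why the A-norm is the natural measure of progress for these methods:

```latex
f(x) - f(x^*) = \tfrac{1}{2} x^\top A x - (x^*)^\top A x + \tfrac{1}{2} (x^*)^\top A x^*
              = \tfrac{1}{2} (x - x^*)^\top A (x - x^*)
              = \tfrac{1}{2} \| e \|_A^2
```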

Steepest descent. The simplest iterative approach: move in the direction of the residual $r_k = b - Ax_k$ by the optimal step size:

DEFINITION

Definition 5.1 (Steepest Descent). Given $x_0$, iterate: $\alpha_k = \frac{r_k^\top r_k}{r_k^\top A r_k}, \qquad x_{k+1} = x_k + \alpha_k r_k.$ The step $\alpha_k$ is chosen to minimise $f(x_k + \alpha r_k)$ over $\alpha \in \mathbb{R}$.
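
The iteration in Definition 5.1 can be sketched in a few lines of plain Python. The helper names (`dot`, `matvec`, `steepest_descent`), the residual-norm stopping rule and the small $2 \times 2$ test system are assumptions made for illustration only.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(A, v):
    """Dense matrix-vector product; A is a list of rows."""
    return [dot(row, v) for row in A]


def steepest_descent(A, b, x0, tol=1e-10, max_iter=10_000):
    """Steepest descent with exact line search for SPD A.

    Stops when the Euclidean residual norm drops to tol (an assumed
    stopping rule). Returns the iterate and the number of steps taken.
    """
    x = list(x0)
    r = [bi - yi for bi, yi in zip(b, matvec(A, x))]
    for k in range(max_iter):
        rr = dot(r, r)
        if rr ** 0.5 <= tol:
            return x, k
        Ar = matvec(A, r)
        alpha = rr / dot(r, Ar)  # minimises f(x + alpha * r) over alpha
        x = [xi + alpha * ri for xi, ri in zip(x, r)]
        r = [ri - alpha * ari for ri, ari in zip(r, Ar)]  # equals b - A x
    return x, max_iter


# Hypothetical well-conditioned test system with solution (1/11, 7/11).
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x, iters = steepest_descent(A, b, [0.0, 0.0])
```

The residual is updated recursively as $r_{k+1} = r_k - \alpha_k A r_k$, so each step needs only one matrix-vector product.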

Convergence of steepest descent:

THEOREM

Theorem 5.1. Let $\kappa = \kappa_2(A) = \lambda_\text{max}/\lambda_\text{min}$. Steepest descent satisfies: $\|e_{k+1}\|_A \leq \frac{\kappa - 1}{\kappa + 1} \|e_k\|_A,$ or equivalently $\|e_{k+1}\|_A^2 \leq \left(\frac{\kappa - 1}{\kappa + 1}\right)^2 \|e_k\|_A^2$ for the squared energy norm. For ill-conditioned problems ($\kappa \gg 1$), the convergence factor approaches 1, so convergence is very slow.

The fundamental problem with steepest descent: consecutive search directions $r_{k+1} \perp r_k$ in the Euclidean inner product, but they are not $A$ -orthogonal. The method revisits the same subspaces, causing "zigzag" convergence along narrow valleys of $f$ when eigenvalues are spread widely.

2. Conjugate Gradient Method

The CG method eliminates the redundant search directions by enforcing A-conjugacy:

DEFINITION

Definition 5.2 (A-conjugate directions). Vectors $p, q$ are A-conjugate (A-orthogonal) if $p^\top A q = 0$. A set $\{p_0, \ldots, p_{k-1}\}$ is A-conjugate if $p_i^\top A p_j = 0$ for all $i \neq j$.

THEOREM

Theorem 5.2 (CG Algorithm; Hestenes-Stiefel, 1952). Starting from $x_0$, $r_0 = b - Ax_0$, $p_0 = r_0$, iterate: $\alpha_k = \frac{r_k^\top r_k}{p_k^\top A p_k}, \quad x_{k+1} = x_k + \alpha_k p_k, \quad r_{k+1} = r_k - \alpha_k A p_k,$ $\beta_k = \frac{r_{k+1}^\top r_{k+1}}{r_k^\top r_k}, \quad p_{k+1} = r_{k+1} + \beta_k p_k.$
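
The recurrences of Theorem 5.2 translate almost line by line into code. The following is a minimal sketch, not a production solver: the names `conjugate_gradient` and `apply_A`, the residual-norm tolerance and the $2 \times 2$ test system are assumptions chosen for illustration. The matrix enters only through the callable `apply_A`, mirroring the point that CG needs matrix-vector products and nothing else.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def conjugate_gradient(apply_A, b, x0, tol=1e-10, max_iter=1000):
    """CG for SPD systems; apply_A(v) must return A @ v as a list.

    Stops when the Euclidean residual norm drops to tol (an assumed
    stopping rule). Returns the iterate and the number of steps taken.
    """
    x = list(x0)
    r = [bi - yi for bi, yi in zip(b, apply_A(x))]
    p = list(r)
    rr = dot(r, r)
    for k in range(max_iter):
        if rr ** 0.5 <= tol:
            return x, k
        Ap = apply_A(p)
        alpha = rr / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rr_next = dot(r, r)
        beta = rr_next / rr
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_next
    return x, max_iter


# Hypothetical 2x2 SPD test system with solution (1/11, 7/11).
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x, iters = conjugate_gradient(lambda v: [dot(row, v) for row in A], b, [2.0, 1.0])
```

Raising `tol` to the precision of the market quotes is one simple way to implement the early stopping described in the Financial Insight above.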

Key properties (proofs by induction on the Krylov subspace structure):

$r_i \perp r_j$ for $i \neq j$ (residuals are mutually orthogonal in the Euclidean inner product).
$p_i^\top A p_j = 0$ for $i \neq j$ (search directions are A-conjugate).
$x_k$ minimises $f(x) = \frac{1}{2}x^\top Ax - b^\top x$ over the affine space $x_0 + \mathcal{K}_k$, where $\mathcal{K}_k = \text{span}\{r_0, Ar_0, \ldots, A^{k-1}r_0\}$ is the Krylov subspace.
In exact arithmetic, CG terminates in at most $n$ steps with the exact solution.
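
The properties listed above can be checked numerically on a small system. The sketch below records the residuals and search directions of CG on a hypothetical $4 \times 4$ second-difference matrix and measures how far they are from being mutually orthogonal and A-conjugate; the function name `cg_history` and the test data are assumptions for illustration.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(A, v):
    """Dense matrix-vector product; A is a list of rows."""
    return [dot(row, v) for row in A]


def cg_history(A, b, steps):
    """Run CG from x0 = 0, recording every residual r_k and direction p_k."""
    x = [0.0] * len(b)
    r = list(b)  # r_0 = b - A x_0 = b
    p = list(r)
    residuals, directions = [list(r)], [list(p)]
    for _ in range(steps):
        rr = dot(r, r)
        if rr == 0.0:  # already exact; avoid dividing by zero
            break
        Ap = matvec(A, p)
        alpha = rr / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        beta = dot(r, r) / rr
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        residuals.append(list(r))
        directions.append(list(p))
    return x, residuals, directions


A = [[2.0, -1.0, 0.0, 0.0],
     [-1.0, 2.0, -1.0, 0.0],
     [0.0, -1.0, 2.0, -1.0],
     [0.0, 0.0, -1.0, 2.0]]
b = [1.0, 2.0, 3.0, 4.0]
n = len(b)
x, residuals, directions = cg_history(A, b, steps=n)

rs, ps = residuals[:n], directions[:n]
# Largest |r_i . r_j| and |p_i^T A p_j| over pairs i != j; both should be
# at rounding-error level if the orthogonality properties hold.
max_r_overlap = max(abs(dot(rs[i], rs[j]))
                    for i in range(len(rs)) for j in range(i))
max_p_conj = max(abs(dot(ps[i], matvec(A, ps[j])))
                 for i in range(len(ps)) for j in range(i))
```

After $n = 4$ steps the iterate `x` should agree with the exact solution $(4, 7, 8, 6)$ up to rounding, illustrating finite termination.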

CG convergence rate:

THEOREM

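Theorem 5.3 (CG convergence). The CG iterates satisfy: $\|e_k\|_A \leq 2\left(\frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}\right)^k \|e_0\|_A.$ For $\kappa = 100$: the CG convergence factor is $(\sqrt{\kappa} - 1)/(\sqrt{\kappa} + 1) = 9/11 \approx 0.82$, while the steepest descent factor in the A-norm is $(\kappa - 1)/(\kappa + 1) = 99/101 \approx 0.98$ ($\approx 0.96$ for the squared A-norm).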

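Comparison: CG depends on $\sqrt{\kappa}$ rather than $\kappa$. For $\kappa = 10^4$ (typical calibration matrix), the number of iterations needed to reduce the A-norm error by a factor of $10^{-4}$ scales as follows:

Steepest descent: on the order of $\kappa = 10^4$ iterations (the A-norm bound $((\kappa - 1)/(\kappa + 1))^k \leq 10^{-4}$ gives $k \approx 4.6 \times 10^4$).
CG: on the order of $\sqrt{\kappa} = 100$ iterations (the bound of Theorem 5.3 gives $k \approx 500$), roughly 100× fewer.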
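
The iteration counts can be made precise by solving the two error bounds for $k$. The sketch below uses the standard A-norm contraction factor $(\kappa - 1)/(\kappa + 1)$ for steepest descent and the bound of Theorem 5.3 for CG; the function names are illustrative. These are worst-case bounds, so they carry an extra $\log(1/\varepsilon)$ factor on top of the $\kappa$ and $\sqrt{\kappa}$ scalings and say nothing about how fast a particular problem actually converges.

```python
import math


def sd_iterations(kappa, eps):
    """Smallest k with ((kappa - 1) / (kappa + 1))**k <= eps."""
    rho = (kappa - 1.0) / (kappa + 1.0)
    return math.ceil(math.log(eps) / math.log(rho))


def cg_iterations(kappa, eps):
    """Smallest k with 2 * rho**k <= eps, rho = (sqrt(kappa)-1)/(sqrt(kappa)+1)."""
    s = math.sqrt(kappa)
    rho = (s - 1.0) / (s + 1.0)
    return math.ceil(math.log(eps / 2.0) / math.log(rho))


kappa, eps = 1e4, 1e-4
k_sd = sd_iterations(kappa, eps)
k_cg = cg_iterations(kappa, eps)
ratio = k_sd / k_cg
```

For $\kappa = 10^4$ and $\varepsilon = 10^{-4}$ the bounds give roughly $4.6 \times 10^4$ steepest descent steps against about 500 CG steps, a ratio close to $\sqrt{\kappa} = 100$.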

EXAMPLE

Example 5.1 (Tridiagonal FD system). A Crank-Nicolson price step on an $n = 500$ node grid gives a tridiagonal SPD matrix $A$ with $\kappa \approx 4n^2/\pi^2 \approx 10^5$ (for the standard 1D heat equation stencil). By Theorem 5.3 the CG iteration count scales with $\sqrt{\kappa} \approx 316$ (and is capped at $n = 500$ in exact arithmetic), with each iteration costing $O(n)$ flops. The Thomas algorithm solves the same system directly in a single $O(n)$ pass, so it is cheaper by a factor of a few hundred and CG is not competitive in 1D. For 2D ADI the subproblems are again 1D tridiagonals, so the Thomas algorithm dominates there as well.
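
For comparison, the Thomas algorithm referred to in this example is a forward elimination followed by a back substitution. The sketch below assumes the same three-diagonal storage as a sparse stencil would use and a matrix for which no pivoting is needed (for instance SPD or diagonally dominant); the name `thomas_solve` and the test system are illustrative.

```python
def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system in O(n) without pivoting.

    lower[i] is A[i+1][i], diag[i] is A[i][i], upper[i] is A[i][i+1].
    Assumes n >= 2 and that no pivot vanishes (e.g. A is SPD).
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):  # forward elimination
        denom = diag[i] - lower[i - 1] * c[i - 1]
        if i < n - 1:
            c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i - 1] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = d[i] - c[i] * x[i + 1]
    return x


# Hypothetical n = 500 system with the (-1, 2, -1) stencil and unit
# right-hand side; the exact solution is x_i = i * (n + 1 - i) / 2
# for i = 1, ..., n.
n = 500
x = thomas_solve([-1.0] * (n - 1), [2.0] * n, [-1.0] * (n - 1), [1.0] * n)
```

The whole solve is two loops over the grid, which is the single $O(n)$ pass that an iterative method has to compete with in 1D.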
