Brownian Bridge™

Setup

The Calibration Problem

Model calibration is the process of finding parameter values $\theta \in \mathbb{R}^p$ such that a pricing model reproduces observed market prices. In quantitative finance, this typically means:

Given: market-implied vols $\hat{\sigma}_i$ for instruments $i = 1, \ldots, N$ (strikes $K_i$, maturities $T_i$, put/call flags).
Find: model parameters $\theta = (\theta_1, \ldots, \theta_p)$ minimising the weighted sum of squared residuals.

For the Heston model, $\theta = (\kappa, \bar{\nu}, \xi, \rho, \nu_0)$ (mean-reversion speed, long-run variance, vol-of-vol, correlation, initial variance). For SABR, $\theta = (\alpha, \beta, \rho, \nu)$ (initial volatility, CEV exponent, correlation, vol-of-vol).

Why implied vols, not prices? Fitting raw prices across strikes and maturities without normalisation gives excessive weight to long-dated and in-the-money options, whose absolute prices (and hence absolute pricing errors) are largest. Fitting implied vols gives roughly equal weight across the surface and matches how traders judge calibration quality.
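The weighting effect can be illustrated with a small sketch. It assumes Black-Scholes prices with zero rates, a spot of 100, and at-the-money calls; the function names are hypothetical and chosen only for this illustration. The same one-vol-point calibration error is converted into a price error at a short and a long maturity.

```python
from math import erf, exp, log, sqrt

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(spot, strike, maturity, vol, rate=0.0):
    """Black-Scholes price of a European call."""
    d1 = (log(spot / strike) + (rate + 0.5 * vol**2) * maturity) / (vol * sqrt(maturity))
    d2 = d1 - vol * sqrt(maturity)
    return spot * norm_cdf(d1) - strike * exp(-rate * maturity) * norm_cdf(d2)

def price_error_for_vol_bump(maturity, vol=0.20, bump=0.01):
    """Price error caused by a one-vol-point error on an ATM call."""
    return bs_call(100.0, 100.0, maturity, vol + bump) - bs_call(100.0, 100.0, maturity, vol)

err_short = price_error_for_vol_bump(0.1)   # about five weeks
err_long = price_error_for_vol_bump(5.0)    # five years
ratio = err_long / err_short                # grows roughly like sqrt(5.0 / 0.1)
```

Because the objective squares the residuals, a price-based fit in this setting would weight the long-dated option by roughly `ratio**2` relative to the short-dated one for the same vol error.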

Assumptions

The pricing function $\sigma_{\mathrm{model}}(K, T; \theta)$ is differentiable in $\theta$ (needed for the Jacobian).
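$N \geq p$: at least as many market instruments as model parameters. This is necessary but not sufficient for the parameters to be identified, since the instruments must also be informative about each parameter. For Heston, $N = 30\text{–}100$ instruments is typical.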
The loss surface has a local minimum near the initial guess $\theta_0$. LM is a local method, and for most models multiple local minima exist, so production calibrations usually add a global optimiser or a multi-start strategy.

The Nonlinear Least-Squares Objective

Define residuals $r: \mathbb{R}^p \to \mathbb{R}^N$ :

$r_i(\theta) = w_i \bigl(\sigma_{\mathrm{model}}(K_i, T_i; \theta) - \hat{\sigma}_i\bigr), \qquad i = 1, \ldots, N,$

where $w_i > 0$ are weights (typically proportional to the liquidity or inverse bid-ask spread of instrument $i$ ). The objective is:

$F(\theta) = \frac{1}{2}\|r(\theta)\|^2 = \frac{1}{2}\sum_{i=1}^N r_i(\theta)^2.$

The factor $1/2$ is conventional and simplifies gradient expressions. The calibration seeks:

$\theta^* = \arg\min_{\theta \in \Theta} F(\theta),$

subject to box constraints $\Theta = \{\theta : \ell \leq \theta \leq u\}$ (e.g., $\kappa > 0$ , $\rho \in (-1, 1)$ ).
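A minimal sketch of the residual vector and objective in NumPy follows. The function `toy_model_vol` is a hypothetical stand-in for $\sigma_{\mathrm{model}}$ (a quadratic smile in log-moneyness, not Heston or SABR), and the strikes, maturities, and parameter values are made up for illustration.

```python
import numpy as np

def toy_model_vol(strikes, maturities, theta):
    """Hypothetical stand-in for sigma_model(K, T; theta): a quadratic smile
    in log-moneyness whose curvature flattens with maturity."""
    level, skew, curvature = theta
    k = np.log(strikes / 100.0)              # log-moneyness against a spot of 100
    return level + skew * k + curvature * k**2 / np.sqrt(maturities)

def residuals(theta, strikes, maturities, market_vols, weights):
    """r_i(theta) = w_i * (sigma_model(K_i, T_i; theta) - sigma_hat_i)."""
    return weights * (toy_model_vol(strikes, maturities, theta) - market_vols)

def objective(theta, *data):
    """F(theta) = 0.5 * ||r(theta)||^2."""
    r = residuals(theta, *data)
    return 0.5 * float(r @ r)

# Synthetic "market" generated from known parameters, with equal weights.
strikes = np.array([80.0, 90.0, 100.0, 110.0, 120.0])
maturities = np.array([0.5, 0.5, 1.0, 1.0, 2.0])
true_theta = np.array([0.20, -0.10, 0.05])
market_vols = toy_model_vol(strikes, maturities, true_theta)
weights = np.ones_like(strikes)
data = (strikes, maturities, market_vols, weights)

f_true = objective(true_theta, *data)                              # perfect fit
f_off = objective(true_theta + np.array([0.01, 0.0, 0.0]), *data)  # level off by one vol point
```

Shifting the level by one vol point moves every residual by 0.01, so with five equally weighted instruments the objective is $\tfrac{1}{2} \cdot 5 \cdot 0.01^2$.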

Theory: Levenberg-Marquardt

Gauss-Newton as the Foundation

The gradient and Hessian of $F$ are:

$\nabla F(\theta) = J(\theta)^\top r(\theta), \qquad \nabla^2 F(\theta) = J(\theta)^\top J(\theta) + \sum_{i=1}^N r_i(\theta) \nabla^2 r_i(\theta),$

where $J \in \mathbb{R}^{N \times p}$ is the Jacobian matrix with $J_{ij} = \partial r_i / \partial \theta_j$ .

The Gauss-Newton (GN) approximation drops the second-order term (the sum of $r_i \nabla^2 r_i$ ), giving:

$\nabla^2 F(\theta) \approx J^\top J.$

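The approximation is exact when every residual is zero or every $r_i$ is linear in $\theta$, and it is accurate when the residuals are small, as is typical near a good fit. The GN update solves: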

$(J^\top J)\, \delta\theta = -J^\top r, \qquad \theta \leftarrow \theta + \delta\theta.$

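GN converges quadratically for zero-residual problems and at a fast linear rate when the residuals at the solution are small, provided the problem is well-conditioned. It fails when $J^\top J$ is singular or ill-conditioned (degenerate directions in parameter space), and a full step can overshoot when the iterate is far from the solution.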
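The GN update can be sketched on a toy problem. The example below assumes a two-parameter exponential model $a e^{bt}$ fitted to synthetic zero-residual data, which is not a pricing model; it is chosen only because its Jacobian is available in closed form. The starting point is deliberately close to the solution, since undamped GN has no safeguard against bad steps.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 8)
y = 2.0 * np.exp(-1.5 * t)          # synthetic data, exact fit at (a, b) = (2, -1.5)

def residual(theta):
    a, b = theta
    return a * np.exp(b * t) - y

def jacobian(theta):
    """Analytic Jacobian: columns are d r / d a and d r / d b."""
    a, b = theta
    e = np.exp(b * t)
    return np.column_stack([e, a * t * e])

def gauss_newton(theta0, n_iter=20):
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iter):
        r, J = residual(theta), jacobian(theta)
        # Normal equations: (J^T J) delta = -J^T r
        delta = np.linalg.solve(J.T @ J, -J.T @ r)
        theta = theta + delta
    return theta

theta_gn = gauss_newton([1.8, -1.2])
```

Solving the normal equations directly mirrors the formula in the text. Production code often solves the equivalent linear least-squares problem $J\,\delta\theta \approx -r$ with a QR or SVD routine instead, which avoids squaring the condition number.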

The LM Damping Term

Levenberg-Marquardt adds a damping term $\lambda D$ to $J^\top J$ :

$(J^\top J + \lambda D)\, \delta\theta = -J^\top r,$

where $\lambda > 0$ is the damping parameter and $D$ is a diagonal matrix.

Two standard choices for $D$ :

Levenberg (1944): $D = I$ (identity). Moves in the steepest-descent direction when $\lambda$ is large.
Marquardt (1963): $D = \mathrm{diag}(J^\top J)$. Scales each parameter direction by its Gauss-Newton curvature estimate $(J^\top J)_{jj}$, making the step invariant to rescaling of individual parameters. This is the standard choice.

Interpolation between GN and gradient descent. When $\lambda \to 0$ : the equation becomes $J^\top J\, \delta\theta = -J^\top r$ , i.e., Gauss-Newton. When $\lambda \to \infty$ : the update becomes $\delta\theta \approx -(\lambda D)^{-1} J^\top r$ , a scaled steepest-descent step. LM adaptively chooses between these extremes.
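Both limits, and the scaling invariance of Marquardt's choice, can be checked numerically. The sketch below uses a small made-up Jacobian and residual vector; `lm_step` is a hypothetical helper name, not a library function.

```python
import numpy as np

def lm_step(J, r, lam, scaling="marquardt"):
    """Solve (J^T J + lam * D) delta = -J^T r for the chosen D."""
    A = J.T @ J
    D = np.diag(np.diag(A)) if scaling == "marquardt" else np.eye(A.shape[0])
    return np.linalg.solve(A + lam * D, -J.T @ r)

# Made-up linearised problem: 4 residuals, 2 parameters.
J = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
r = np.array([1.0, -1.0, 2.0, 0.5])
g = J.T @ r

gn_step = np.linalg.solve(J.T @ J, -g)        # lambda -> 0 limit
small = lm_step(J, r, lam=1e-10)              # nearly Gauss-Newton
large = lm_step(J, r, lam=1e8)                # nearly scaled steepest descent
sd_step = -g / (1e8 * np.diag(J.T @ J))       # -(lambda D)^{-1} J^T r

# Rescale the second parameter by 100: theta' = S theta, so J' = J S^{-1}.
S = np.diag([1.0, 100.0])
J_scaled = J @ np.linalg.inv(S)
# Map the steps computed in rescaled coordinates back to the original ones.
back_marquardt = np.linalg.solve(S, lm_step(J_scaled, r, 1.0, "marquardt"))
back_levenberg = np.linalg.solve(S, lm_step(J_scaled, r, 1.0, "levenberg"))
```

With Marquardt scaling, `back_marquardt` coincides with the step computed in the original coordinates, whereas `back_levenberg` does not. This matters in calibration because parameters such as $\kappa$ and $\nu_0$ typically differ by orders of magnitude.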

Geometric Interpretation

The LM step minimises a quadratic approximation to $F$ within a trust region:

$\delta\theta_{\mathrm{LM}} = \arg\min_{\delta\theta} \left\{\tfrac{1}{2}\|J\delta\theta + r\|^2\right\} \quad \text{subject to} \quad \|\delta\theta\| \leq \Delta,$

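for some trust-region radius $\Delta$. With $D = I$ the norm is Euclidean; with Marquardt scaling the constraint is on the scaled norm $\|D^{1/2}\delta\theta\|$. Whenever the constraint is active, the damping $\lambda$ is its Lagrange multiplier. Large $\lambda$ corresponds to a small trust region (conservative step); small $\lambda$ allows a large step.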
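
The multiplier claim can be checked from the first-order conditions. A brief sketch, assuming $D = I$ and writing $\Phi$ (a symbol introduced here only for this derivation) for the Lagrangian of the constrained problem:

$\Phi(\delta\theta, \lambda) = \tfrac{1}{2}\|J\delta\theta + r\|^2 + \tfrac{\lambda}{2}\bigl(\|\delta\theta\|^2 - \Delta^2\bigr), \qquad \lambda \geq 0.$

Setting the gradient with respect to $\delta\theta$ to zero gives

$J^\top(J\delta\theta + r) + \lambda\,\delta\theta = 0 \quad\Longleftrightarrow\quad (J^\top J + \lambda I)\,\delta\theta = -J^\top r,$

which is the damped system with $D = I$. Complementary slackness, $\lambda\,(\|\delta\theta\| - \Delta) = 0$, says that either the Gauss-Newton step already lies inside the region ($\lambda = 0$) or the step sits on the boundary with $\lambda > 0$.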

Damping Parameter Update Rule

A standard update rule (Marquardt 1963, as refined by Moré 1978):

Evaluate proposed step: $\delta\theta_{\mathrm{LM}} = -(J^\top J + \lambda D)^{-1} J^\top r$ .
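Compute gain ratio: $\rho = \frac{F(\theta) - F(\theta + \delta\theta)}{\mathcal{L}(\theta) - \mathcal{L}(\theta + \delta\theta)},$ where $\mathcal{L}$ is the quadratic model: $\mathcal{L}(\theta + \delta) = F(\theta) + \delta^\top J^\top r + \tfrac{1}{2}\delta^\top J^\top J \delta$. Here $\rho$ denotes the gain ratio, not the correlation parameter of Heston or SABR. The denominator is the reduction predicted by the model and is positive for any nonzero LM step, so $\rho$ close to 1 means the model is trustworthy.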
Accept the step if $\rho > \rho_{\min}$ (e.g., $\rho_{\min} = 0.25$ ): set $\theta \leftarrow \theta + \delta\theta$ and decrease $\lambda$ (e.g., $\lambda \leftarrow \lambda/3$ ).
Reject the step if $\rho \leq \rho_{\min}$ : increase $\lambda$ (e.g., $\lambda \leftarrow \lambda \times 3$ ) and retry.

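This self-tuning makes LM robust: where the quadratic model is unreliable, $\lambda$ grows and the steps shrink toward short scaled steepest-descent steps; near the solution, $\lambda$ decays and the steps approach Gauss-Newton.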
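The accept/reject loop can be sketched as follows. This is an illustrative implementation under stated assumptions, not a production calibrator: it uses Marquardt scaling, the example thresholds from the text ($\rho_{\min} = 0.25$, factor 3), a gradient-based stopping test that the text does not specify, no box constraints, and a toy exponential model in place of a pricing model. The gain ratio is called `gain` in the code to avoid confusion with the correlation parameter.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 8)
y = 2.0 * np.exp(-1.5 * t)          # synthetic data, exact fit at (2, -1.5)

def residual(theta):
    a, b = theta
    return a * np.exp(b * t) - y

def jacobian(theta):
    a, b = theta
    e = np.exp(b * t)
    return np.column_stack([e, a * t * e])

def levenberg_marquardt(residual, jacobian, theta0, lam=1e-2, rho_min=0.25,
                        factor=3.0, tol=1e-10, max_iter=200):
    theta = np.asarray(theta0, dtype=float)
    r, J = residual(theta), jacobian(theta)
    for _ in range(max_iter):
        g = J.T @ r                              # gradient of F
        if np.max(np.abs(g)) < tol:              # assumed stopping rule
            break
        A = J.T @ J
        D = np.diag(np.diag(A))                  # Marquardt scaling
        delta = np.linalg.solve(A + lam * D, -g)
        # Reduction predicted by the quadratic model: L(theta) - L(theta + delta).
        predicted = -(delta @ g) - 0.5 * (delta @ A @ delta)
        r_new = residual(theta + delta)
        actual = 0.5 * (r @ r) - 0.5 * (r_new @ r_new)
        gain = actual / predicted
        if gain > rho_min:                       # accept: move and relax damping
            theta = theta + delta
            r, J = r_new, jacobian(theta)
            lam /= factor
        else:                                    # reject: stay and damp harder
            lam *= factor
    return theta, lam

theta_lm, final_lam = levenberg_marquardt(residual, jacobian, [1.5, -1.0])
```

The Jacobian is recomputed only after an accepted step, since a rejected step leaves $\theta$ unchanged. Because a step is accepted only when the actual reduction is positive, the objective never increases along the accepted iterates.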

Jacobian Computation

The Jacobian $J_{ij} = \partial r_i / \partial \theta_j = w_i \, \partial \sigma_{\mathrm{model}}(K_i, T_i; \theta) / \partial \theta_j$ is the most expensive part of LM calibration.

Finite Differences (FD)

Forward differences (first-order accurate, truncation error $O(h)$):

$J_{ij} \approx \frac{r_i(\theta + h e_j) - r_i(\theta)}{h}.$

Requires $p$ extra evaluations of the full residual vector per iteration (one per parameter direction), in addition to the base evaluation at $\theta$. For Heston with $p = 5$ parameters and $N = 50$ instruments, this is $5 \times 50 = 250$ additional Fourier integrals per LM step.

Central differences (second-order accurate, truncation error $O(h^2)$):

$J_{ij} \approx \frac{r_i(\theta + h e_j) - r_i(\theta - h e_j)}{2h}.$

Twice the cost ($2p$ residual evaluations), but significantly more accurate for the same step size $h$. Step size selection in double precision: $h \approx \sqrt{\varepsilon_{\mathrm{mach}}} \approx 10^{-8}$ for forward, $h \approx \varepsilon_{\mathrm{mach}}^{1/3} \approx 10^{-5}$ for central, in both cases taken relative to the magnitude of $\theta_j$.
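Both schemes can be sketched and compared against an analytic Jacobian. The example assumes a toy exponential model rather than a pricing model, and it scales each step by $\max(1, |\theta_j|)$, which is a common convention adopted here as an assumption rather than something prescribed above. `fd_jacobian` is a hypothetical helper name.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 8)
y = 2.0 * np.exp(-1.5 * t)

def residual(theta):
    a, b = theta
    return a * np.exp(b * t) - y

def analytic_jacobian(theta):
    a, b = theta
    e = np.exp(b * t)
    return np.column_stack([e, a * t * e])

def fd_jacobian(residual, theta, scheme="forward"):
    """Finite-difference Jacobian, one column per parameter direction."""
    theta = np.asarray(theta, dtype=float)
    eps = np.finfo(float).eps
    h_rel = np.sqrt(eps) if scheme == "forward" else eps ** (1.0 / 3.0)
    r0 = residual(theta)
    J = np.empty((r0.size, theta.size))
    for j in range(theta.size):
        h = h_rel * max(1.0, abs(theta[j]))      # assumed relative step scaling
        step = np.zeros_like(theta)
        step[j] = h
        if scheme == "forward":                  # one extra evaluation per column
            J[:, j] = (residual(theta + step) - r0) / h
        else:                                    # two extra evaluations per column
            J[:, j] = (residual(theta + step) - residual(theta - step)) / (2.0 * h)
    return J

theta = np.array([1.5, -1.0])
J_exact = analytic_jacobian(theta)
err_forward = np.max(np.abs(fd_jacobian(residual, theta, "forward") - J_exact))
err_central = np.max(np.abs(fd_jacobian(residual, theta, "central") - J_exact))
```

On a smooth problem like this one, the central-difference error is typically several orders of magnitude smaller than the forward-difference error, at twice the number of residual evaluations.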

Analytic Jacobian for Affine Models

For models with a closed-form characteristic function $\varphi_T(u; \theta)$ (e.g., Heston, Bates), the sensitivity of the model price to each parameter can be computed by differentiating under the Fourier integral:

Levenberg-Marquardt for Model Calibration