Brownian Bridge™

Setup

The Calibration Problem

Model calibration is the process of finding parameter values $\theta \in \mathbb{R}^p$ such that a pricing model reproduces observed market prices. In quantitative finance, this typically means:

Given: market-implied vols $\hat{\sigma}_i$ for instruments $i = 1, \ldots, N$ (strikes $K_i$ , maturities $T_i$ , put/call flags).
Find: model parameters $\theta = (\theta_1, \ldots, \theta_p)$ minimising the weighted sum of squared residuals.

For the Heston model, $\theta = (\kappa, \bar{\nu}, \xi, \rho, \nu_0)$ (mean-reversion speed, long-run variance, vol-of-vol, correlation, initial variance). For SABR, $\theta = (\alpha, \beta, \rho, \nu)$ .

Why implied vols, not prices? Fitting prices across different maturities and strikes without normalisation gives excessive weight to long-dated options (higher absolute prices). Fitting implied vols gives roughly equal weight across the surface and aligns with how traders think about calibration quality.

Assumptions

The pricing function $\sigma_{\mathrm{model}}(K, T; \theta)$ is differentiable in $\theta$ (needed for the Jacobian).
$N \geq p$ : at least as many market instruments as model parameters (identified system). For Heston, $N = 30\text{–}100$ instruments is typical.
The loss surface has a local minimum near the initial guess $\theta_0$ . For most models, multiple local minima exist — global optimisation or multi-start is needed in production.

The Nonlinear Least-Squares Objective

Define residuals $r: \mathbb{R}^p \to \mathbb{R}^N$ :

$r_i(\theta) = w_i \bigl(\sigma_{\mathrm{model}}(K_i, T_i; \theta) - \hat{\sigma}_i\bigr), \qquad i = 1, \ldots, N,$

where $w_i > 0$ are weights (typically proportional to the liquidity or inverse bid-ask spread of instrument $i$ ). The objective is:

$F(\theta) = \frac{1}{2}\|r(\theta)\|^2 = \frac{1}{2}\sum_{i=1}^N r_i(\theta)^2.$

The factor $1/2$ is conventional and simplifies gradient expressions. The calibration seeks:

$\theta^* = \arg\min_{\theta \in \Theta} F(\theta),$

subject to box constraints $\Theta = \{\theta : \ell \leq \theta \leq u\}$ (e.g., $\kappa > 0$ , $\rho \in (-1, 1)$ ).

Theory: Levenberg-Marquardt

Gauss-Newton as the Foundation

The gradient and Hessian of $F$ are:

$\nabla F(\theta) = J(\theta)^\top r(\theta), \qquad \nabla^2 F(\theta) = J(\theta)^\top J(\theta) + \sum_{i=1}^N r_i(\theta) \nabla^2 r_i(\theta),$

where $J \in \mathbb{R}^{N \times p}$ is the Jacobian matrix with $J_{ij} = \partial r_i / \partial \theta_j$ .

The Gauss-Newton (GN) approximation drops the second-order term (the sum of $r_i \nabla^2 r_i$ ), giving:

$\nabla^2 F(\theta) \approx J^\top J.$

This is exact when the residuals $r_i$ are small (near the solution). The GN update solves:

$(J^\top J)\, \delta\theta = -J^\top r, \qquad \theta \leftarrow \theta + \delta\theta.$

GN converges quadratically when the residuals are small and the problem is well-conditioned, but fails when $J^\top J$ is singular or ill-conditioned (degenerate directions in parameter space).

The LM Damping Term

Levenberg-Marquardt adds a damping term $\lambda D$ to $J^\top J$ :

$(J^\top J + \lambda D)\, \delta\theta = -J^\top r,$

where $\lambda > 0$ is the damping parameter and $D$ is a diagonal matrix.

Two standard choices for $D$ :

Levenberg (1944): $D = I$ (identity). Moves in the steepest-descent direction when $\lambda$ is large.
Marquardt (1963): $D = \mathrm{diag}(J^\top J)$ . Scales each parameter direction by its gradient curvature, making the step invariant to parameter rescaling. This is the standard choice.

Interpolation between GN and gradient descent. When $\lambda \to 0$ : the equation becomes $J^\top J\, \delta\theta = -J^\top r$ , i.e., Gauss-Newton. When $\lambda \to \infty$ : the update becomes $\delta\theta \approx -(\lambda D)^{-1} J^\top r$ , a scaled steepest-descent step. LM adaptively chooses between these extremes.

Geometric Interpretation

The LM step minimises a quadratic approximation to $F$ within a trust region:

$\delta\theta_{\mathrm{LM}} = \arg\min_{\delta\theta} \left\{\tfrac{1}{2}\|J\delta\theta + r\|^2\right\} \quad \text{subject to} \quad \|\delta\theta\| \leq \Delta,$

for some trust-region radius $\Delta$ . The damping $\lambda$ is the Lagrange multiplier for the constraint. Large $\lambda$ corresponds to a small trust region (conservative step); small $\lambda$ allows a large step.

Damping Parameter Update Rule

A standard update rule (Marquardt 1963, as refined by Moré 1978):

Evaluate proposed step: $\delta\theta_{\mathrm{LM}} = -(J^\top J + \lambda D)^{-1} J^\top r$ .
Compute gain ratio: $\rho = \frac{F(\theta) - F(\theta + \delta\theta)}{\mathcal{L}(\theta) - \mathcal{L}(\theta + \delta\theta)},$ where $\mathcal{L}$ is the quadratic model: $\mathcal{L}(\theta + \delta) = F(\theta) + \delta^\top J^\top r + \tfrac{1}{2}\delta^\top J^\top J \delta$ .
Accept the step if $\rho > \rho_{\min}$ (e.g., $\rho_{\min} = 0.25$ ): set $\theta \leftarrow \theta + \delta\theta$ and decrease $\lambda$ (e.g., $\lambda \leftarrow \lambda/3$ ).
Reject the step if $\rho \leq \rho_{\min}$ : increase $\lambda$ (e.g., $\lambda \leftarrow \lambda \times 3$ ) and retry.

This self-tuning makes LM robust: it contracts to gradient descent near bad regions and expands to GN near the solution.

Jacobian Computation

The Jacobian $J_{ij} = \partial r_i / \partial \theta_j = w_i \, \partial \sigma_{\mathrm{model}}(K_i, T_i; \theta) / \partial \theta_j$ is the most expensive part of LM calibration.

Finite Differences (FD)

Forward differences (first-order accurate, order $O(h)$ ):

$J_{ij} \approx \frac{r_i(\theta + h e_j) - r_i(\theta)}{h}.$

Requires $p$ extra model evaluations per iteration (one per parameter direction). For Heston with $p = 5$ parameters and $N = 50$ instruments, this is $5 \times 50 = 250$ Fourier integrals per LM step.

Central differences (second-order accurate, order $O(h^2)$ ):

$J_{ij} \approx \frac{r_i(\theta + h e_j) - r_i(\theta - h e_j)}{2h}.$

Twice the cost, but significantly more accurate for the same step size $h$ . Step size selection: $h \approx \sqrt{\varepsilon_{\mathrm{mach}}} \approx 10^{-7}$ for forward, $h \approx \varepsilon_{\mathrm{mach}}^{1/3} \approx 10^{-5}$ for central.

Analytic Jacobian for Affine Models

For models with a closed-form characteristic function $\varphi_T(u; \theta)$ (e.g., Heston, Bates), the sensitivity of the model price to each parameter can be computed by differentiating under the Fourier integral:

$\frac{\partial C}{\partial \theta_j} = \frac{\partial}{\partial \theta_j} \left[ \text{Re} \int_0^\infty \varphi_T(u - i/2; \theta) \frac{e^{-iuk}}{u^2 + 1/4}\, du \right] = \text{Re} \int_0^\infty \frac{\partial \varphi_T}{\partial \theta_j}(u - i/2; \theta) \frac{e^{-iuk}}{u^2 + 1/4}\, du.$

For Heston, $\partial \varphi_T / \partial \theta_j$ can be computed in closed form by differentiating the Riccati solution. This yields the exact Jacobian at the cost of 5 additional Fourier integrals (one per parameter), which is far cheaper than FD (especially when each Fourier integral prices all $N$ strikes simultaneously via FFT).

Convergence Criteria

LM terminates when any of the following conditions is met:

Small gradient: $\|J^\top r\|_\infty < \varepsilon_1$ (first-order optimality; gradient norm small).
Small step: $\|\delta\theta\|_2 < \varepsilon_2 (1 + \|\theta\|_2)$ (parameter change negligible).
Small residual: $F(\theta) < \varepsilon_3$ (objective small enough).
Max iterations: $k > k_{\max}$ (wall-clock budget exceeded).

In production, condition 1 is the cleanest termination test. The gradient norm $\|J^\top r\|$ is zero at any stationary point — including saddle points and local minima — so satisfying criterion 1 only guarantees a stationary point, not a global minimum. Always verify by restarting from alternative initial points.

Implementation

#include <Eigen/Dense>
#include <functional>
#include <vector>
#include <cmath>
#include <stdexcept>

// Calibration result
struct CalibResult {
    Eigen::VectorXd params;    // calibrated parameters
    double          final_rss; // residual sum of squares F(theta*)
    int             iters;     // LM iterations used
    bool            converged;
};

// Levenberg-Marquardt calibrator.
//
// residuals_fn: given theta (p-vector), returns r (N-vector) of weighted residuals
// jacobian_fn:  given theta, returns J (N x p matrix); pass nullptr for FD fallback
// theta0:       initial parameter guess
// lb, ub:       box constraints on parameters
// max_iter, tol: stopping criteria
CalibResult levenberg_marquardt(
    std::function<Eigen::VectorXd(const Eigen::VectorXd&)> residuals_fn,
    std::function<Eigen::MatrixXd(const Eigen::VectorXd&)> jacobian_fn,
    const Eigen::VectorXd& theta0,
    const Eigen::VectorXd& lb,
    const Eigen::VectorXd& ub,
    int    max_iter = 200,
    double tol_grad = 1.0e-8,
    double tol_step = 1.0e-12,
    double lambda0  = 1.0e-3
) {
    const int p = static_cast<int>(theta0.size());

    Eigen::VectorXd theta = theta0;
    Eigen::VectorXd r     = residuals_fn(theta);
    double F              = 0.5 * r.squaredNorm();

    double lambda = lambda0;
    bool converged = false;

    for (int k = 0; k < max_iter; ++k) {
        // Compute Jacobian
        Eigen::MatrixXd J = jacobian_fn
            ? jacobian_fn(theta)
            : [&]() {
                // Central-difference fallback
                Eigen::MatrixXd Jfd(r.size(), p);
                const double h = 1.0e-5;
                for (int j = 0; j < p; ++j) {
                    Eigen::VectorXd th_p = theta, th_m = theta;
                    th_p(j) += h; th_m(j) -= h;
                    Jfd.col(j) = (residuals_fn(th_p) - residuals_fn(th_m)) / (2.0 * h);
                }
                return Jfd;
            }();

        const Eigen::MatrixXd JtJ = J.transpose() * J;
        const Eigen::VectorXd Jtr = J.transpose() * r;

        // Convergence check: gradient norm
        if (Jtr.lpNorm<Eigen::Infinity>() < tol_grad) {
            converged = true;
            break;
        }

        // Marquardt scaling: D = diag(J^T J), clamped to avoid zero
        Eigen::VectorXd diag_D = JtJ.diagonal().cwiseMax(1.0e-16);

        // Solve (J^T J + lambda * diag(D)) * delta = -J^T r
        Eigen::MatrixXd A = JtJ;
        A.diagonal() += lambda * diag_D;
        const Eigen::VectorXd delta = A.ldlt().solve(-Jtr);

        if (delta.norm() < tol_step * (1.0 + theta.norm())) {
            converged = true;
            break;
        }

        // Candidate step, projected onto box constraints
        Eigen::VectorXd theta_new = (theta + delta).cwiseMax(lb).cwiseMin(ub);
        Eigen::VectorXd r_new     = residuals_fn(theta_new);
        double          F_new     = 0.5 * r_new.squaredNorm();

        // Gain ratio rho = (actual reduction) / (predicted reduction)
        double predicted = -Jtr.dot(delta) - 0.5 * delta.dot(JtJ * delta);
        double actual    = F - F_new;
        double rho       = (predicted > 0.0) ? actual / predicted : -1.0;

        if (rho > 0.25) {
            // Accept step
            theta = theta_new;
            r     = r_new;
            F     = F_new;
            // Decrease lambda: expand trust region
            lambda = std::max(lambda / 3.0, 1.0e-16);
        } else {
            // Reject step: contract trust region
            lambda = std::min(lambda * 3.0, 1.0e+16);
        }
    }

    return {theta, F, max_iter, converged};
}

Multi-Start Strategy

The LM algorithm is local. To avoid poor local minima, run multiple independent restarts from diverse initial points and select the solution with the lowest final objective:

// Simple multi-start wrapper
CalibResult multi_start_lm(
    std::function<Eigen::VectorXd(const Eigen::VectorXd&)> residuals_fn,
    std::function<Eigen::MatrixXd(const Eigen::VectorXd&)> jacobian_fn,
    const std::vector<Eigen::VectorXd>& starts,
    const Eigen::VectorXd& lb,
    const Eigen::VectorXd& ub
) {
    CalibResult best;
    best.final_rss = std::numeric_limits<double>::infinity();
    for (const auto& theta0 : starts) {
        auto result = levenberg_marquardt(residuals_fn, jacobian_fn, theta0, lb, ub);
        if (result.final_rss < best.final_rss)
            best = result;
    }
    return best;
}

Validation

Synthetic Calibration Test

Plant known Heston parameters, generate a synthetic vol surface via the Lewis formula, then calibrate and verify recovery. A well-implemented calibrator should recover planted parameters to within $10^{-4}$ in absolute parameter error for a noise-free surface of $N = 50$ instruments.

import numpy as np
from scipy.optimize import least_squares

# Reference: planted Heston parameters
true_params = dict(kappa=2.0, theta=0.04, xi=0.4, rho=-0.6, v0=0.04)

# Generate synthetic surface (assume heston_implied_vol is implemented)
strikes   = np.array([80, 90, 95, 100, 105, 110, 120] * 4, dtype=float)
maturities = np.repeat([0.25, 0.5, 1.0, 2.0], 7)
true_vols  = np.array([
    heston_implied_vol(K, T, **true_params)
    for K, T in zip(strikes, maturities)
])

# Residuals function
def residuals(x):
    kappa, theta, xi, rho, v0 = x
    model_vols = np.array([
        heston_implied_vol(K, T, kappa=kappa, theta=theta, xi=xi, rho=rho, v0=v0)
        for K, T in zip(strikes, maturities)
    ])
    return model_vols - true_vols  # weights = 1 here

# Bounds: kappa, theta, xi > 0; rho in (-1, 1); v0 > 0
bounds = ([1e-4, 1e-4, 1e-4, -0.999, 1e-4],
          [20.0,  1.0,  2.0,  0.999, 1.0])

# Initial guess — intentionally different from true values
x0 = [1.0, 0.06, 0.3, -0.4, 0.06]
result = least_squares(residuals, x0, bounds=bounds, method='trf', ftol=1e-12, xtol=1e-12)
print(f"Calibrated: kappa={result.x[0]:.4f}, theta={result.x[1]:.4f}, "
      f"xi={result.x[2]:.4f}, rho={result.x[3]:.4f}, v0={result.x[4]:.4f}")
print(f"True:       kappa=2.0000, theta=0.0400, xi=0.4000, rho=-0.6000, v0=0.0400")
print(f"Final RMS vol error: {np.sqrt(2 * result.cost / len(strikes)):.2e}")

Limitations

Local minima. The Heston loss surface contains ridges along which many parameter combinations produce nearly identical implied vol surfaces (e.g., the $\kappa\theta = \text{const}$ ridge). LM finds the nearest local minimum, which may be far from the global optimum. Multi-start or global optimisers (differential evolution, simulated annealing) are needed to escape these regions.

Parameter unidentifiability. The Heston model has five parameters but only two distinct effects dominate the implied vol surface: the ATM vol term structure (driven by $\kappa$ , $\theta$ , $v_0$ ) and the smile curvature (driven by $\xi$ and $\rho$ ). The individual values of $\kappa$ and $\theta$ are poorly identified — many pairs with similar $\kappa\theta$ produce indistinguishable surfaces. Regularisation or parameter reduction is required in practice.

Jacobian quality. FD Jacobians introduce truncation error that degrades convergence near the solution. The optimal step size $h^* \approx (\varepsilon_{\mathrm{mach}})^{1/2}$ for forward FD is on the order of $10^{-7}$ , but choosing $h$ too small introduces cancellation error. Central differences are safer but 2× as expensive. Analytic Jacobians are preferred whenever available.

Calibration instability over time. Daily recalibration of all five Heston parameters produces large day-to-day jumps even when the vol surface moves smoothly. This instability is the result of a flat loss surface, not numerical noise. Tikhonov regularisation (penalising deviation from yesterday's calibration) is the standard fix — see the Regularisation module.

Constraint handling. The LM algorithm as described assumes unconstrained optimisation. Box constraints are handled approximately by projecting the step onto $[\ell, u]$ , which can cause the gain ratio $\rho$ to be computed incorrectly. A proper constrained LM (e.g., MINPACK's LMSTR) uses an active-set approach or interior-point scaling.

Interview Angle

L1. Explain the calibration setup for the Heston model. What is the objective function? Why use implied vols rather than prices as the calibration target?

Setup: Find $\theta = (\kappa, \bar{\nu}, \xi, \rho, \nu_0)$ minimising $\sum_i w_i(\sigma_i^{\mathrm{Heston}} - \hat{\sigma}_i)^2$ . Implied vols rather than prices: prices have very different magnitudes across strikes and maturities (deep OTM options are worth cents, ATM options worth dollars). Fitting prices gives enormous weight to expensive options. Implied vols are approximately normalised, giving the calibrator roughly equal influence per market instrument.

L2. Derive the LM update equation. Explain what happens to the step as $\lambda \to 0$ and $\lambda \to \infty$ . What is the Marquardt scaling, and why is it preferable to the identity?

Derivation: LM minimises the second-order Taylor expansion of $F$ augmented by a penalty: $\min_\delta \{\tfrac{1}{2}\|J\delta + r\|^2 + \tfrac{\lambda}{2}\delta^\top D \delta\}.$ Taking the gradient and setting to zero: $(J^\top J + \lambda D)\delta = -J^\top r$ . As $\lambda \to 0$ : $\delta = -(J^\top J)^{-1} J^\top r$ (Gauss-Newton, quadratic convergence near the solution). As $\lambda \to \infty$ : $\delta \approx -(1/\lambda) D^{-1} J^\top r$ (scaled gradient descent, always descends). Marquardt scaling $D = \mathrm{diag}(J^\top J)$ makes the step scale-invariant: multiplying parameter $\theta_j$ by a constant leaves the step unchanged, unlike the identity which biases towards parameters with small natural scale.

L3. Discuss the identifiability problem in Heston calibration. Formally characterise the ridge in the loss surface and propose a regularisation strategy. How would you construct the analytic Jacobian for the Heston model via the Lewis integral?

Identifiability: The ATM implied vol at maturity $T$ is approximately $\sigma_{\mathrm{ATM}}(T) \approx \sqrt{\nu_0 e^{-\kappa T} + \bar{\nu}(1 - e^{-\kappa T})}$ . For $T$ in a finite set, many combinations of $(\kappa, \nu_0, \bar{\nu})$ produce the same sequence $\{\sigma_{\mathrm{ATM}}(T_i)\}$ . The loss surface has a shallow valley along these combinations. Formally, the Hessian $J^\top J$ has small eigenvalues in these directions, producing a large condition number. Tikhonov regularisation adds $\lambda \|\theta - \theta_{\mathrm{prev}}\|^2$ to the objective, effectively constraining the solution to be near the previous day's calibration and bounding the condition number. The analytic Jacobian element $\partial \sigma^{\mathrm{Heston}} / \partial \theta_j$ is obtained by differentiating the Lewis integral: $\partial C / \partial \theta_j = \mathrm{Re} \int_0^\infty (\partial \varphi_T / \partial \theta_j)(u - i/2) \cdot e^{-iuk}/(u^2 + 1/4)\, du$ , where $\partial \varphi_T / \partial \theta_j$ follows from differentiating the Riccati ODE solutions $A(u,T;\theta)$ and $B(u,T;\theta)$ with respect to $\theta_j$ .