Brownian Bridge™

Setup

The Ill-Posed Inverse Problem

Calibration is an inverse problem: given observations (market implied vols), infer the underlying model parameters. Hadamard (1902) defined a well-posed problem as one in which a solution exists, is unique, and depends continuously on the data. Calibration typically violates the third condition: small perturbations in market quotes can produce large changes in calibrated parameters.

This ill-posedness is not a modelling pathology — it is a fundamental property of the calibration problem for any sufficiently flexible model. The Heston model has a flat loss surface along parameter combinations that produce similar implied vol surfaces. A local vol model has infinitely many solutions (the ill-posedness is more severe). Understanding and controlling this instability is what separates a robust production calibrator from an academic prototype.

Conventions

Market data: implied vol vector $\hat{\sigma} \in \mathbb{R}^N$ , perturbed by noise $\delta\hat{\sigma} \sim \mathcal{N}(0, \epsilon^2 I)$ (bid-ask spread, model error).
Model parameters: $\theta \in \mathbb{R}^p$ , constrained to $\Theta \subseteq \mathbb{R}^p$ .
Calibration operator: $\mathcal{F}: \Theta \to \mathbb{R}^N$ , $\theta \mapsto \sigma_{\mathrm{model}}(\cdot; \theta)$ .
Condition number: $\kappa(J^\top J)$ , where $J = D\mathcal{F}(\theta)$ is the Jacobian. Large $\kappa$ indicates ill-conditioning.

Theory: Sources of Instability

Flat Loss Surfaces and Ridges

Define the calibration loss $F(\theta) = \tfrac{1}{2}\|r(\theta)\|^2$ . Its curvature is characterised by the Hessian approximation $H \approx J^\top J$ . If $H$ has small eigenvalues $\lambda_{\min} \ll \lambda_{\max}$ , the loss surface is nearly flat in the corresponding eigendirections.

Formally: if $H v = \lambda_{\min} v$ for a unit vector $v$ , then changing $\theta$ by $\varepsilon v$ changes the objective by only $\tfrac{1}{2}\lambda_{\min}\varepsilon^2$ . But the model output changes by $J v \cdot \varepsilon$ , and $\|Jv\|^2 = v^\top J^\top J v = \lambda_{\min}\varepsilon^2$ . The parameter direction $v$ is weakly identified: it barely affects the implied vol surface.

When calibrating to noisy data $\hat{\sigma} + \delta\hat{\sigma}$ , the perturbed optimum is:

$\delta\theta \approx (J^\top J)^{-1} J^\top \delta\hat{\sigma}.$

The sensitivity is $\|(J^\top J)^{-1} J^\top\| = \sigma_{\max}(J) / \sigma_{\min}(J)^2 = \kappa(J) / \sigma_{\min}(J)$ , where $\sigma_{\min}, \sigma_{\max}$ are singular values of $J$ . For ill-conditioned $J$ , small data noise $\|\delta\hat{\sigma}\|$ causes large parameter perturbation $\|\delta\theta\|$ .

SVD Decomposition of the Calibration Problem

Write the SVD of the Jacobian: $J = U \Sigma V^\top$ , where $U \in \mathbb{R}^{N \times p}$ , $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_p)$ , $V \in \mathbb{R}^{p \times p}$ .

The unregularised least-squares solution is:

$\delta\theta_{\mathrm{LS}} = V \Sigma^{-1} U^\top \delta\hat{\sigma} = \sum_{j=1}^p \frac{u_j^\top \delta\hat{\sigma}}{\sigma_j} v_j.$

Small singular values $\sigma_j \approx 0$ amplify the noise component $u_j^\top \delta\hat{\sigma}$ arbitrarily. This is the mechanism of instability.

Tikhonov Regularisation

The Penalised Objective

Replace the unconstrained calibration objective with:

$F_\lambda(\theta) = \underbrace{\|r(\theta)\|^2}_{\text{data fit}} + \lambda \underbrace{\|\Gamma(\theta - \theta_0)\|^2}_{\text{regularisation penalty}},$

where:

$\lambda > 0$ : regularisation parameter, controls the bias-variance tradeoff.
$\Gamma \in \mathbb{R}^{q \times p}$ : regularisation matrix (often the identity $I$ or a difference operator).
$\theta_0$ : prior (e.g., yesterday's calibrated parameters, or a reference set of parameters).

Interpretation: We seek parameters that fit the market data AND are close to the prior $\theta_0$ . The penalty $\lambda\|\Gamma(\theta - \theta_0)\|^2$ discourages large deviations from the prior.

Modified Normal Equations

The first-order condition for the penalised objective (linearising $r$ around $\theta_0$ ):

$(J^\top J + \lambda \Gamma^\top \Gamma)\, \delta\theta = J^\top (J\theta_0 - r(\theta_0)) - \lambda \Gamma^\top \Gamma \theta_0 + \ldots$

More practically, in the LM setting, the Tikhonov-LM update is:

$(J^\top J + \lambda D + \mu \Gamma^\top \Gamma)\, \delta\theta = -J^\top r - \mu \Gamma^\top \Gamma (\theta - \theta_0),$

where $\lambda$ is the LM damping and $\mu$ is the Tikhonov strength. The regularisation term $\mu\Gamma^\top\Gamma$ adds to the diagonal (or subdiagonal if $\Gamma$ is a difference operator), directly bounding the minimum eigenvalue of the system matrix from below.

Bias-Variance Tradeoff

Small $\lambda$ : The solution minimises the data fit with little constraint. The estimator is approximately unbiased ( $\mathbb{E}[\hat{\theta}] \approx \theta^*$ ) but highly variable: small noise in $\hat{\sigma}$ causes large swings in $\hat{\theta}$ . High variance.

Large $\lambda$ : The solution is heavily pulled towards the prior $\theta_0$ . The estimator has low variance (stable day-to-day calibration) but is biased away from the true parameters. High bias.

Optimal $\lambda$ : Minimises the mean squared error $\mathbb{E}[\|\hat{\theta} - \theta^*\|^2] = \mathrm{Bias}^2 + \mathrm{Variance}$ . Methods to select $\lambda$ are discussed below.

Effect on Singular Values

With $\Gamma = I$ , the regularised singular values become:

$\hat{\theta}_{\lambda} = \sum_{j=1}^p \frac{\sigma_j}{\sigma_j^2 + \lambda}\, u_j^\top r\, v_j.$

The factor $\sigma_j / (\sigma_j^2 + \lambda)$ dampens directions with small $\sigma_j$ : for $\sigma_j \gg \sqrt{\lambda}$ , it is approximately $1/\sigma_j$ (no regularisation effect); for $\sigma_j \ll \sqrt{\lambda}$ , it is approximately $\sigma_j/\lambda$ (strongly suppressed). This is a soft truncation of the SVD.

Parameter Selection Methods

L-Curve Method

Plot $\log \|r(\theta_\lambda)\|^2$ (residual norm) versus $\log \|\Gamma(\theta_\lambda - \theta_0)\|^2$ (regularisation norm) for a range of $\lambda$ values. The curve is typically L-shaped:

Horizontal arm (small $\lambda$ ): residual small but regularisation term large (overfitting, unstable parameters).
Vertical arm (large $\lambda$ ): residual large (underfitting) but regularisation norm small.
Corner: the optimal $\lambda$ that balances fit and stability.

The corner is identified as the point of maximum curvature on the log-log plot:

$\lambda^* = \arg\max_\lambda \kappa_{\mathrm{curve}}(\lambda), \qquad \kappa_{\mathrm{curve}} = \frac{|\rho_{\mathrm{res}}'' \rho_{\mathrm{reg}}' - \rho_{\mathrm{res}}' \rho_{\mathrm{reg}}''|}{((\rho_{\mathrm{res}}')^2 + (\rho_{\mathrm{reg}}')^2)^{3/2}},$

where $\rho_{\mathrm{res}} = \log\|r\|$ and $\rho_{\mathrm{reg}} = \log\|\Gamma\delta\theta\|$ as functions of $\log\lambda$ .

Morozov Discrepancy Principle

If the noise level $\epsilon$ in the data is known (e.g., from the bid-ask spread), choose $\lambda$ such that the residual matches the expected data noise:

$\|r(\theta_\lambda)\| \approx \epsilon\sqrt{N}.$

Interpretation: if the calibrator fits significantly better than the noise floor, it is overfitting to noise and parameters will be unstable. The discrepancy principle says: stop when fit is as good as the data quality warrants.

In practice, $\epsilon$ is estimated from the average bid-ask half-spread in implied vol units.

Generalised Cross-Validation (GCV)

For linear problems, generalised cross-validation selects $\lambda$ by minimising:

$\mathrm{GCV}(\lambda) = \frac{\|r(\theta_\lambda)\|^2 / N}{(1 - \mathrm{tr}(A(\lambda))/N)^2},$

where $A(\lambda) = J(J^\top J + \lambda I)^{-1} J^\top$ is the hat (influence) matrix. GCV approximates leave-one-out cross-validation for linear problems without the computational overhead of fitting $N$ separate models.

Smoothness Regularisation for Vol Surfaces

For full vol surface calibration (e.g., fitting a nonparametric local vol function), regularisation must additionally enforce smoothness. The penalised objective is:

$F_\lambda(\sigma_{\mathrm{loc}}) = \sum_{i=1}^N (C_{\mathrm{model}}(K_i, T_i; \sigma_{\mathrm{loc}}) - C_{\mathrm{mkt}}_i)^2 + \lambda_1 \|\partial_{kk} \sigma_{\mathrm{loc}}\|^2 + \lambda_2 \|\partial_T \sigma_{\mathrm{loc}}\|^2.$

The first penalty $\|\partial_{kk}\sigma_{\mathrm{loc}}\|^2$ penalises curvature in the strike dimension (suppresses oscillations across strikes). The second penalty $\|\partial_T\sigma_{\mathrm{loc}}\|^2$ penalises variation in the maturity dimension (suppresses unsmooth vol surfaces across maturities). Both are discretised on the calibration grid using finite differences.

Implementation

import numpy as np
from scipy.optimize import minimize
from scipy.linalg import svd

def tikhonov_calibrate(
    residuals_fn,           # callable: theta -> r (N-vector)
    jacobian_fn,            # callable: theta -> J (N x p matrix)
    theta0:  np.ndarray,    # prior / initial guess
    lb:      np.ndarray,    # lower bounds on parameters
    ub:      np.ndarray,    # upper bounds on parameters
    mu:      float = 1e-3,  # Tikhonov regularisation strength
    gamma:   np.ndarray | None = None  # regularisation matrix (default: identity)
) -> dict:
    """
    Tikhonov-regularised calibration via L-BFGS-B.

    The penalised objective is:
        F_mu(theta) = ||r(theta)||^2 + mu * ||Gamma(theta - theta0)||^2

    Inputs:
        residuals_fn: callable (p,) -> (N,)
        jacobian_fn:  callable (p,) -> (N, p)
        theta0:       prior parameter vector (p,)
        lb, ub:       box constraints on parameters
        mu:           regularisation strength
        gamma:        regularisation matrix, shape (q, p); default = identity

    Returns:
        dict with keys: params, rms, converged, reg_norm
    """
    p     = len(theta0)
    if gamma is None:
        gamma = np.eye(p)

    def penalised_objective_and_grad(theta: np.ndarray):
        r   = residuals_fn(theta)
        J   = jacobian_fn(theta)
        # Data fit
        obj = float(np.dot(r, r))
        g   = 2.0 * J.T @ r
        # Tikhonov penalty
        diff  = theta - theta0
        reg   = float(np.dot(diff, gamma.T @ gamma @ diff))
        obj  += mu * reg
        g    += 2.0 * mu * (gamma.T @ gamma @ diff)
        return obj, g

    from scipy.optimize import minimize, Bounds
    result = minimize(
        penalised_objective_and_grad,
        theta0,
        method='L-BFGS-B',
        jac=True,
        bounds=Bounds(lb=lb, ub=ub),
        options=dict(ftol=1e-14, gtol=1e-10, maxiter=500),
    )

    r_final  = residuals_fn(result.x)
    rms      = float(np.sqrt(np.mean(r_final**2)))
    reg_norm = float(np.linalg.norm(gamma @ (result.x - theta0)))

    return dict(params=result.x, rms=rms, converged=result.success, reg_norm=reg_norm)


def l_curve_analysis(
    residuals_fn,
    jacobian_fn,
    theta0: np.ndarray,
    lb:     np.ndarray,
    ub:     np.ndarray,
    mu_grid: np.ndarray | None = None
) -> dict:
    """
    Compute the L-curve across a range of regularisation strengths.

    Returns dict with keys: mu_grid, rms_grid, reg_norm_grid,
                            optimal_mu (corner of L-curve).
    """
    if mu_grid is None:
        mu_grid = np.logspace(-6, 1, 30)

    rms_list = []
    reg_list = []

    for mu in mu_grid:
        result = tikhonov_calibrate(residuals_fn, jacobian_fn, theta0, lb, ub, mu=mu)
        rms_list.append(result['rms'])
        reg_list.append(result['reg_norm'])

    rms_arr = np.array(rms_list)
    reg_arr = np.array(reg_list)

    # Identify corner: maximum curvature on log-log L-curve
    log_rms = np.log(rms_arr + 1e-16)
    log_reg = np.log(reg_arr + 1e-16)
    # Numerical curvature via finite differences
    drms  = np.gradient(log_rms, np.log(mu_grid))
    dreg  = np.gradient(log_reg, np.log(mu_grid))
    d2rms = np.gradient(drms,  np.log(mu_grid))
    d2reg = np.gradient(dreg,  np.log(mu_grid))
    kappa = np.abs(d2rms * dreg - drms * d2reg) / (drms**2 + dreg**2)**1.5
    corner_idx = int(np.argmax(kappa))

    return dict(
        mu_grid=mu_grid,
        rms_grid=rms_arr,
        reg_norm_grid=reg_arr,
        optimal_mu=float(mu_grid[corner_idx]),
    )


def condition_number_analysis(jacobian: np.ndarray) -> dict:
    """
    SVD analysis of the Jacobian to assess calibration conditioning.

    Returns singular values, condition number, and relative contributions
    of each parameter direction to the implied vol surface variation.
    """
    U, s, Vt = svd(jacobian, full_matrices=False)
    cond = float(s[0] / s[-1]) if s[-1] > 0 else float('inf')
    return dict(
        singular_values=s,
        condition_number=cond,
        right_singular_vectors=Vt,   # rows are parameter-space directions
        left_singular_vectors=U,     # rows are data-space directions
    )

Practical Calibration Workflow

def daily_recalibration(
    market_vols:   np.ndarray,  # today's market implied vols
    prev_params:   np.ndarray,  # yesterday's calibrated parameters
    model_residuals_fn,         # model residuals function
    model_jacobian_fn,          # model Jacobian function
    lb:            np.ndarray,
    ub:            np.ndarray,
    noise_level:   float = 0.002,  # ~0.2 vol point bid-ask half-spread
    n_instruments: int   = 50,
) -> dict:
    """
    Production daily recalibration:
    1. Attempt unregularised calibration from previous day's parameters.
    2. If condition number > threshold, switch to Tikhonov with mu from discrepancy principle.
    3. Cross-validate against held-out instruments.
    """
    # Step 1: Condition number check
    J_prev = model_jacobian_fn(prev_params)
    cond   = condition_number_analysis(J_prev)['condition_number']
    print(f"Condition number at prior: {cond:.1f}")

    # Step 2: Choose regularisation
    if cond > 1e4:
        # Morozov: target residual ~ noise_level * sqrt(N)
        target_rms = noise_level
        # Sweep mu to find where rms ~= target
        lc = l_curve_analysis(model_residuals_fn, model_jacobian_fn, prev_params, lb, ub)
        mu = float(lc['optimal_mu'])
        print(f"Ill-conditioned: using Tikhonov mu={mu:.2e}")
    else:
        mu = 0.0  # no regularisation needed

    result = tikhonov_calibrate(
        model_residuals_fn, model_jacobian_fn, prev_params, lb, ub, mu=mu
    )
    return result

Limitations

Bias is always present. Tikhonov regularisation introduces bias: the calibrated parameters are pulled towards the prior $\theta_0$ . If $\theta_0$ is wrong (e.g., the market has moved significantly overnight), the bias can be substantial. The regularised calibration will underfit the market at some strikes. This must be monitored via residual analysis.

Prior dependence. The choice of prior $\theta_0$ matters significantly when $\mu$ is large. Starting each day from scratch with $\theta_0 = \theta_{\mathrm{long-run}}$ (a historical average) is more stable than using the previous day's calibration when the market has experienced a large move. A jump-aware prior (detecting vol regime changes) is important in production.

Smoothness regularisation for local vol. Regularising with $\|\partial_{kk}\sigma_{\mathrm{loc}}\|^2$ suppresses oscillations but also smooths genuine sharp features (e.g., the steep ATM skew in single-stock options with near-term earnings). The regularisation parameter must be small enough to preserve these features.

Condition number as a calibration health metric. In production, the condition number of $J^\top J$ should be logged daily. A sudden increase in condition number may indicate that the model is losing its ability to fit the market — a sign that a model change or regime shift has occurred.

GCV and L-curve are approximate. Both methods assume that the calibration problem is (approximately) linear, which is only valid near the solution. For strongly nonlinear problems (e.g., calibrating a rough vol model), these criteria can select poor regularisation strengths.

Interview Angle

L1. What does it mean for a calibration problem to be ill-posed? Give a concrete example using the Heston model and explain what "instability" means in practical terms for a trading desk.

Ill-posedness: The calibrated Heston parameters change significantly from one day to the next despite a nearly unchanged implied vol surface. Concretely: $\kappa$ and $\bar{\nu}$ individually are poorly identified — many $(\kappa, \bar{\nu})$ pairs with the same $\kappa\bar{\nu}$ produce indistinguishable surfaces. On the desk, this means the P&L attribution from theta (time decay, via $\kappa$ and $\bar{\nu}$ ) fluctuates wildly day-to-day without a corresponding market move. Model risk reserves are inflated to cover this instability.

L2. Explain Tikhonov regularisation. What is the penalised objective, and how does the regularisation parameter $\lambda$ control the bias-variance tradeoff? What is the Morozov discrepancy principle, and how do you estimate the noise level from market data?

Penalised objective: $F_\lambda(\theta) = \|r(\theta)\|^2 + \lambda\|\theta - \theta_0\|^2$ . Small $\lambda$ : low bias (fits market well), high variance (parameters jump with market noise). Large $\lambda$ : high bias (pulled to prior), low variance (stable calibration). Morozov principle: set $\lambda$ such that $\|r(\theta_\lambda)\| \approx \epsilon\sqrt{N}$ , where $\epsilon$ is the noise standard deviation per instrument. Estimate $\epsilon$ from bid-ask half-spreads in implied vol space: for liquid equity index options, $\epsilon \approx 0.1\%\text{–}0.3\%$ (1–3 vol points).

L3. Analyse the ill-posedness via the SVD of the Jacobian. How does Tikhonov regularisation modify the singular value decomposition? Derive the bias and variance of the regularised estimator as a function of $\lambda$ and discuss the optimal $\lambda$ in the bias-variance sense.

SVD analysis: Write $J = U\Sigma V^\top$ . The regularised solution is $\hat{\theta}_\lambda = \sum_j f_j(\lambda) (u_j^\top \hat{\sigma}) v_j / \sigma_j$ where $f_j(\lambda) = \sigma_j^2/(\sigma_j^2 + \lambda)$ are the Tikhonov filter factors. For the true parameters $\theta^*$ and noise $\delta\hat{\sigma} \sim \mathcal{N}(0,\epsilon^2 I)$ :

$\text{Bias}(\hat{\theta}_\lambda) = \sum_j (1 - f_j(\lambda))(v_j^\top \theta^*) v_j, \qquad \text{Var}(\hat{\theta}_\lambda) = \epsilon^2 \sum_j \frac{f_j(\lambda)^2}{\sigma_j^2}.$

The mean squared error is $\text{MSE} = \|\text{Bias}\|^2 + \text{trace}(\text{Var})$ . The optimal $\lambda^*$ trades off these two terms: for a specific direction $v_j$ , the optimal balance gives $\lambda^* \approx \epsilon^2 / (v_j^\top \theta^*)^2$ . In practice, the L-curve or GCV approximates this optimum without requiring knowledge of $\theta^*$ .