CalibrationRegularisationTikhonovInverse ProblemsNumerical Stability

Regularisation and Stability in Calibration

25 min readLevel: Hard

Setup

The Ill-Posed Inverse Problem

Calibration is an inverse problem: given observations (market implied vols), infer the underlying model parameters. Hadamard (1902) defined a well-posed problem as one in which a solution exists, is unique, and depends continuously on the data. Calibration typically violates the third condition: small perturbations in market quotes can produce large changes in calibrated parameters.

This ill-posedness is not a modelling pathology — it is a fundamental property of the calibration problem for any sufficiently flexible model. The Heston model has a flat loss surface along parameter combinations that produce similar implied vol surfaces. A local vol model has infinitely many solutions (the ill-posedness is more severe). Understanding and controlling this instability is what separates a robust production calibrator from an academic prototype.

Conventions

  • Market data: implied vol vector σ^RN\hat{\sigma} \in \mathbb{R}^N, perturbed by noise δσ^N(0,ϵ2I)\delta\hat{\sigma} \sim \mathcal{N}(0, \epsilon^2 I) (bid-ask spread, model error).
  • Model parameters: θRp\theta \in \mathbb{R}^p, constrained to ΘRp\Theta \subseteq \mathbb{R}^p.
  • Calibration operator: F:ΘRN\mathcal{F}: \Theta \to \mathbb{R}^N, θσmodel(;θ)\theta \mapsto \sigma_{\mathrm{model}}(\cdot; \theta).
  • Condition number: κ(JJ)\kappa(J^\top J), where J=DF(θ)J = D\mathcal{F}(\theta) is the Jacobian. Large κ\kappa indicates ill-conditioning.

Theory: Sources of Instability

Flat Loss Surfaces and Ridges

Define the calibration loss F(θ)=12r(θ)2F(\theta) = \tfrac{1}{2}\|r(\theta)\|^2. Its curvature is characterised by the Hessian approximation HJJH \approx J^\top J. If HH has small eigenvalues λminλmax\lambda_{\min} \ll \lambda_{\max}, the loss surface is nearly flat in the corresponding eigendirections.

Formally: if Hv=λminvH v = \lambda_{\min} v for a unit vector vv, then changing θ\theta by εv\varepsilon v changes the objective by only 12λminε2\tfrac{1}{2}\lambda_{\min}\varepsilon^2. But the model output changes by JvεJ v \cdot \varepsilon, and Jv2=vJJv=λminε2\|Jv\|^2 = v^\top J^\top J v = \lambda_{\min}\varepsilon^2. The parameter direction vv is weakly identified: it barely affects the implied vol surface.

When calibrating to noisy data σ^+δσ^\hat{\sigma} + \delta\hat{\sigma}, the perturbed optimum is:

δθ(JJ)1Jδσ^.\delta\theta \approx (J^\top J)^{-1} J^\top \delta\hat{\sigma}.

The sensitivity is (JJ)1J=σmax(J)/σmin(J)2=κ(J)/σmin(J)\|(J^\top J)^{-1} J^\top\| = \sigma_{\max}(J) / \sigma_{\min}(J)^2 = \kappa(J) / \sigma_{\min}(J), where σmin,σmax\sigma_{\min}, \sigma_{\max} are singular values of JJ. For ill-conditioned JJ, small data noise δσ^\|\delta\hat{\sigma}\| causes large parameter perturbation δθ\|\delta\theta\|.

SVD Decomposition of the Calibration Problem

Write the SVD of the Jacobian: J=UΣVJ = U \Sigma V^\top, where URN×pU \in \mathbb{R}^{N \times p}, Σ=diag(σ1,,σp)\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_p), VRp×pV \in \mathbb{R}^{p \times p}.

The unregularised least-squares solution is:

δθLS=VΣ1Uδσ^=j=1pujδσ^σjvj.\delta\theta_{\mathrm{LS}} = V \Sigma^{-1} U^\top \delta\hat{\sigma} = \sum_{j=1}^p \frac{u_j^\top \delta\hat{\sigma}}{\sigma_j} v_j.

Small singular values σj0\sigma_j \approx 0 amplify the noise component ujδσ^u_j^\top \delta\hat{\sigma} arbitrarily. This is the mechanism of instability.


Tikhonov Regularisation

The Penalised Objective

Replace the unconstrained calibration objective with:

Fλ(θ)=r(θ)2data fit+λΓ(θθ0)2regularisation penalty,F_\lambda(\theta) = \underbrace{\|r(\theta)\|^2}_{\text{data fit}} + \lambda \underbrace{\|\Gamma(\theta - \theta_0)\|^2}_{\text{regularisation penalty}},

where:

  • λ>0\lambda > 0: regularisation parameter, controls the bias-variance tradeoff.
  • ΓRq×p\Gamma \in \mathbb{R}^{q \times p}: regularisation matrix (often the identity II or a difference operator).
  • θ0\theta_0: prior (e.g., yesterday's calibrated parameters, or a reference set of parameters).

Interpretation: We seek parameters that fit the market data AND are close to the prior θ0\theta_0. The penalty λΓ(θθ0)2\lambda\|\Gamma(\theta - \theta_0)\|^2 discourages large deviations from the prior.

Modified Normal Equations

The first-order condition for the penalised objective (linearising rr around θ0\theta_0):

(JJ+λΓΓ)δθ=J(Jθ0r(θ0))λΓΓθ0+(J^\top J + \lambda \Gamma^\top \Gamma)\, \delta\theta = J^\top (J\theta_0 - r(\theta_0)) - \lambda \Gamma^\top \Gamma \theta_0 + \ldots

More practically, in the LM setting, the Tikhonov-LM update is:

(JJ+λD+μΓΓ)δθ=JrμΓΓ(θθ0),(J^\top J + \lambda D + \mu \Gamma^\top \Gamma)\, \delta\theta = -J^\top r - \mu \Gamma^\top \Gamma (\theta - \theta_0),

where λ\lambda is the LM damping and μ\mu is the Tikhonov strength. The regularisation term μΓΓ\mu\Gamma^\top\Gamma adds to the diagonal (or subdiagonal if Γ\Gamma is a difference operator), directly bounding the minimum eigenvalue of the system matrix from below.

Bias-Variance Tradeoff

Small λ\lambda: The solution minimises the data fit with little constraint. The estimator is approximately unbiased (E[θ^]θ\mathbb{E}[\hat{\theta}] \approx \theta^*) but highly variable: small noise in σ^\hat{\sigma} causes large swings in θ^\hat{\theta}. High variance.

Large λ\lambda: The solution is heavily pulled towards the prior θ0\theta_0. The estimator has low variance (stable day-to-day calibration) but is biased away from the true parameters. High bias.

Optimal λ\lambda: Minimises the mean squared error E[θ^θ2]=Bias2+Variance\mathbb{E}[\|\hat{\theta} - \theta^*\|^2] = \mathrm{Bias}^2 + \mathrm{Variance}. Methods to select λ\lambda are discussed below.

Effect on Singular Values

With Γ=I\Gamma = I, the regularised singular values become:

θ^λ=j=1pσjσj2+λujrvj.\hat{\theta}_{\lambda} = \sum_{j=1}^p \frac{\sigma_j}{\sigma_j^2 + \lambda}\, u_j^\top r\, v_j.

The factor σj/(σj2+λ)\sigma_j / (\sigma_j^2 + \lambda) dampens directions with small σj\sigma_j: for σjλ\sigma_j \gg \sqrt{\lambda}, it is approximately 1/σj1/\sigma_j (no regularisation effect); for σjλ\sigma_j \ll \sqrt{\lambda}, it is approximately σj/λ\sigma_j/\lambda (strongly suppressed). This is a soft truncation of the SVD.


Parameter Selection Methods

L-Curve Method

Plot logr(θλ)2\log \|r(\theta_\lambda)\|^2 (residual norm) versus logΓ(θλθ0)2\log \|\Gamma(\theta_\lambda - \theta_0)\|^2 (regularisation norm) for a range of λ\lambda values. The curve is typically L-shaped:

  • Horizontal arm (small λ\lambda): residual small but regularisation term large (overfitting, unstable parameters).
  • Vertical arm (large λ\lambda): residual large (underfitting) but regularisation norm small.
  • Corner: the optimal λ\lambda that balances fit and stability.

The corner is identified as the point of maximum curvature on the log-log plot:

λ=argmaxλκcurve(λ),κcurve=ρresρregρresρreg((ρres)2+(ρreg)2)3/2,\lambda^* = \arg\max_\lambda \kappa_{\mathrm{curve}}(\lambda), \qquad \kappa_{\mathrm{curve}} = \frac{|\rho_{\mathrm{res}}'' \rho_{\mathrm{reg}}' - \rho_{\mathrm{res}}' \rho_{\mathrm{reg}}''|}{((\rho_{\mathrm{res}}')^2 + (\rho_{\mathrm{reg}}')^2)^{3/2}},

where ρres=logr\rho_{\mathrm{res}} = \log\|r\| and ρreg=logΓδθ\rho_{\mathrm{reg}} = \log\|\Gamma\delta\theta\| as functions of logλ\log\lambda.

Morozov Discrepancy Principle

If the noise level ϵ\epsilon in the data is known (e.g., from the bid-ask spread), choose λ\lambda such that the residual matches the expected data noise:

r(θλ)ϵN.\|r(\theta_\lambda)\| \approx \epsilon\sqrt{N}.

Interpretation: if the calibrator fits significantly better than the noise floor, it is overfitting to noise and parameters will be unstable. The discrepancy principle says: stop when fit is as good as the data quality warrants.

In practice, ϵ\epsilon is estimated from the average bid-ask half-spread in implied vol units.

Generalised Cross-Validation (GCV)

For linear problems, generalised cross-validation selects λ\lambda by minimising:

GCV(λ)=r(θλ)2/N(1tr(A(λ))/N)2,\mathrm{GCV}(\lambda) = \frac{\|r(\theta_\lambda)\|^2 / N}{(1 - \mathrm{tr}(A(\lambda))/N)^2},

where A(λ)=J(JJ+λI)1JA(\lambda) = J(J^\top J + \lambda I)^{-1} J^\top is the hat (influence) matrix. GCV approximates leave-one-out cross-validation for linear problems without the computational overhead of fitting NN separate models.


Smoothness Regularisation for Vol Surfaces

For full vol surface calibration (e.g., fitting a nonparametric local vol function), regularisation must additionally enforce smoothness. The penalised objective is:

F_\lambda(\sigma_{\mathrm{loc}}) = \sum_{i=1}^N (C_{\mathrm{model}}(K_i, T_i; \sigma_{\mathrm{loc}}) - C_{\mathrm{mkt}}_i)^2 + \lambda_1 \|\partial_{kk} \sigma_{\mathrm{loc}}\|^2 + \lambda_2 \|\partial_T \sigma_{\mathrm{loc}}\|^2.

The first penalty kkσloc2\|\partial_{kk}\sigma_{\mathrm{loc}}\|^2 penalises curvature in the strike dimension (suppresses oscillations across strikes). The second penalty Tσloc2\|\partial_T\sigma_{\mathrm{loc}}\|^2 penalises variation in the maturity dimension (suppresses unsmooth vol surfaces across maturities). Both are discretised on the calibration grid using finite differences.


Implementation

import numpy as np
from scipy.optimize import minimize
from scipy.linalg import svd

def tikhonov_calibrate(
    residuals_fn,           # callable: theta -> r (N-vector)
    jacobian_fn,            # callable: theta -> J (N x p matrix)
    theta0:  np.ndarray,    # prior / initial guess
    lb:      np.ndarray,    # lower bounds on parameters
    ub:      np.ndarray,    # upper bounds on parameters
    mu:      float = 1e-3,  # Tikhonov regularisation strength
    gamma:   np.ndarray | None = None  # regularisation matrix (default: identity)
) -> dict:
    """
    Tikhonov-regularised calibration via L-BFGS-B.

    The penalised objective is:
        F_mu(theta) = ||r(theta)||^2 + mu * ||Gamma(theta - theta0)||^2

    Inputs:
        residuals_fn: callable (p,) -> (N,)
        jacobian_fn:  callable (p,) -> (N, p)
        theta0:       prior parameter vector (p,)
        lb, ub:       box constraints on parameters
        mu:           regularisation strength
        gamma:        regularisation matrix, shape (q, p); default = identity

    Returns:
        dict with keys: params, rms, converged, reg_norm
    """
    p     = len(theta0)
    if gamma is None:
        gamma = np.eye(p)

    def penalised_objective_and_grad(theta: np.ndarray):
        r   = residuals_fn(theta)
        J   = jacobian_fn(theta)
        # Data fit
        obj = float(np.dot(r, r))
        g   = 2.0 * J.T @ r
        # Tikhonov penalty
        diff  = theta - theta0
        reg   = float(np.dot(diff, gamma.T @ gamma @ diff))
        obj  += mu * reg
        g    += 2.0 * mu * (gamma.T @ gamma @ diff)
        return obj, g

    from scipy.optimize import minimize, Bounds
    result = minimize(
        penalised_objective_and_grad,
        theta0,
        method='L-BFGS-B',
        jac=True,
        bounds=Bounds(lb=lb, ub=ub),
        options=dict(ftol=1e-14, gtol=1e-10, maxiter=500),
    )

    r_final  = residuals_fn(result.x)
    rms      = float(np.sqrt(np.mean(r_final**2)))
    reg_norm = float(np.linalg.norm(gamma @ (result.x - theta0)))

    return dict(params=result.x, rms=rms, converged=result.success, reg_norm=reg_norm)


def l_curve_analysis(
    residuals_fn,
    jacobian_fn,
    theta0: np.ndarray,
    lb:     np.ndarray,
    ub:     np.ndarray,
    mu_grid: np.ndarray | None = None
) -> dict:
    """
    Compute the L-curve across a range of regularisation strengths.

    Returns dict with keys: mu_grid, rms_grid, reg_norm_grid,
                            optimal_mu (corner of L-curve).
    """
    if mu_grid is None:
        mu_grid = np.logspace(-6, 1, 30)

    rms_list = []
    reg_list = []

    for mu in mu_grid:
        result = tikhonov_calibrate(residuals_fn, jacobian_fn, theta0, lb, ub, mu=mu)
        rms_list.append(result['rms'])
        reg_list.append(result['reg_norm'])

    rms_arr = np.array(rms_list)
    reg_arr = np.array(reg_list)

    # Identify corner: maximum curvature on log-log L-curve
    log_rms = np.log(rms_arr + 1e-16)
    log_reg = np.log(reg_arr + 1e-16)
    # Numerical curvature via finite differences
    drms  = np.gradient(log_rms, np.log(mu_grid))
    dreg  = np.gradient(log_reg, np.log(mu_grid))
    d2rms = np.gradient(drms,  np.log(mu_grid))
    d2reg = np.gradient(dreg,  np.log(mu_grid))
    kappa = np.abs(d2rms * dreg - drms * d2reg) / (drms**2 + dreg**2)**1.5
    corner_idx = int(np.argmax(kappa))

    return dict(
        mu_grid=mu_grid,
        rms_grid=rms_arr,
        reg_norm_grid=reg_arr,
        optimal_mu=float(mu_grid[corner_idx]),
    )


def condition_number_analysis(jacobian: np.ndarray) -> dict:
    """
    SVD analysis of the Jacobian to assess calibration conditioning.

    Returns singular values, condition number, and relative contributions
    of each parameter direction to the implied vol surface variation.
    """
    U, s, Vt = svd(jacobian, full_matrices=False)
    cond = float(s[0] / s[-1]) if s[-1] > 0 else float('inf')
    return dict(
        singular_values=s,
        condition_number=cond,
        right_singular_vectors=Vt,   # rows are parameter-space directions
        left_singular_vectors=U,     # rows are data-space directions
    )

Practical Calibration Workflow

def daily_recalibration(
    market_vols:   np.ndarray,  # today's market implied vols
    prev_params:   np.ndarray,  # yesterday's calibrated parameters
    model_residuals_fn,         # model residuals function
    model_jacobian_fn,          # model Jacobian function
    lb:            np.ndarray,
    ub:            np.ndarray,
    noise_level:   float = 0.002,  # ~0.2 vol point bid-ask half-spread
    n_instruments: int   = 50,
) -> dict:
    """
    Production daily recalibration:
    1. Attempt unregularised calibration from previous day's parameters.
    2. If condition number > threshold, switch to Tikhonov with mu from discrepancy principle.
    3. Cross-validate against held-out instruments.
    """
    # Step 1: Condition number check
    J_prev = model_jacobian_fn(prev_params)
    cond   = condition_number_analysis(J_prev)['condition_number']
    print(f"Condition number at prior: {cond:.1f}")

    # Step 2: Choose regularisation
    if cond > 1e4:
        # Morozov: target residual ~ noise_level * sqrt(N)
        target_rms = noise_level
        # Sweep mu to find where rms ~= target
        lc = l_curve_analysis(model_residuals_fn, model_jacobian_fn, prev_params, lb, ub)
        mu = float(lc['optimal_mu'])
        print(f"Ill-conditioned: using Tikhonov mu={mu:.2e}")
    else:
        mu = 0.0  # no regularisation needed

    result = tikhonov_calibrate(
        model_residuals_fn, model_jacobian_fn, prev_params, lb, ub, mu=mu
    )
    return result

Limitations

Bias is always present. Tikhonov regularisation introduces bias: the calibrated parameters are pulled towards the prior θ0\theta_0. If θ0\theta_0 is wrong (e.g., the market has moved significantly overnight), the bias can be substantial. The regularised calibration will underfit the market at some strikes. This must be monitored via residual analysis.

Prior dependence. The choice of prior θ0\theta_0 matters significantly when μ\mu is large. Starting each day from scratch with θ0=θlongrun\theta_0 = \theta_{\mathrm{long-run}} (a historical average) is more stable than using the previous day's calibration when the market has experienced a large move. A jump-aware prior (detecting vol regime changes) is important in production.

Smoothness regularisation for local vol. Regularising with kkσloc2\|\partial_{kk}\sigma_{\mathrm{loc}}\|^2 suppresses oscillations but also smooths genuine sharp features (e.g., the steep ATM skew in single-stock options with near-term earnings). The regularisation parameter must be small enough to preserve these features.

Condition number as a calibration health metric. In production, the condition number of JJJ^\top J should be logged daily. A sudden increase in condition number may indicate that the model is losing its ability to fit the market — a sign that a model change or regime shift has occurred.

GCV and L-curve are approximate. Both methods assume that the calibration problem is (approximately) linear, which is only valid near the solution. For strongly nonlinear problems (e.g., calibrating a rough vol model), these criteria can select poor regularisation strengths.


Interview Angle

L1. What does it mean for a calibration problem to be ill-posed? Give a concrete example using the Heston model and explain what "instability" means in practical terms for a trading desk.

Ill-posedness: The calibrated Heston parameters change significantly from one day to the next despite a nearly unchanged implied vol surface. Concretely: κ\kappa and νˉ\bar{\nu} individually are poorly identified — many (κ,νˉ)(\kappa, \bar{\nu}) pairs with the same κνˉ\kappa\bar{\nu} produce indistinguishable surfaces. On the desk, this means the P&L attribution from theta (time decay, via κ\kappa and νˉ\bar{\nu}) fluctuates wildly day-to-day without a corresponding market move. Model risk reserves are inflated to cover this instability.

L2. Explain Tikhonov regularisation. What is the penalised objective, and how does the regularisation parameter λ\lambda control the bias-variance tradeoff? What is the Morozov discrepancy principle, and how do you estimate the noise level from market data?

Penalised objective: Fλ(θ)=r(θ)2+λθθ02F_\lambda(\theta) = \|r(\theta)\|^2 + \lambda\|\theta - \theta_0\|^2. Small λ\lambda: low bias (fits market well), high variance (parameters jump with market noise). Large λ\lambda: high bias (pulled to prior), low variance (stable calibration). Morozov principle: set λ\lambda such that r(θλ)ϵN\|r(\theta_\lambda)\| \approx \epsilon\sqrt{N}, where ϵ\epsilon is the noise standard deviation per instrument. Estimate ϵ\epsilon from bid-ask half-spreads in implied vol space: for liquid equity index options, ϵ0.1%0.3%\epsilon \approx 0.1\%\text{–}0.3\% (1–3 vol points).

L3. Analyse the ill-posedness via the SVD of the Jacobian. How does Tikhonov regularisation modify the singular value decomposition? Derive the bias and variance of the regularised estimator as a function of λ\lambda and discuss the optimal λ\lambda in the bias-variance sense.

SVD analysis: Write J=UΣVJ = U\Sigma V^\top. The regularised solution is θ^λ=jfj(λ)(ujσ^)vj/σj\hat{\theta}_\lambda = \sum_j f_j(\lambda) (u_j^\top \hat{\sigma}) v_j / \sigma_j where fj(λ)=σj2/(σj2+λ)f_j(\lambda) = \sigma_j^2/(\sigma_j^2 + \lambda) are the Tikhonov filter factors. For the true parameters θ\theta^* and noise δσ^N(0,ϵ2I)\delta\hat{\sigma} \sim \mathcal{N}(0,\epsilon^2 I):

Bias(θ^λ)=j(1fj(λ))(vjθ)vj,Var(θ^λ)=ϵ2jfj(λ)2σj2.\text{Bias}(\hat{\theta}_\lambda) = \sum_j (1 - f_j(\lambda))(v_j^\top \theta^*) v_j, \qquad \text{Var}(\hat{\theta}_\lambda) = \epsilon^2 \sum_j \frac{f_j(\lambda)^2}{\sigma_j^2}.

The mean squared error is MSE=Bias2+trace(Var)\text{MSE} = \|\text{Bias}\|^2 + \text{trace}(\text{Var}). The optimal λ\lambda^* trades off these two terms: for a specific direction vjv_j, the optimal balance gives λϵ2/(vjθ)2\lambda^* \approx \epsilon^2 / (v_j^\top \theta^*)^2. In practice, the L-curve or GCV approximates this optimum without requiring knowledge of θ\theta^*.

Verify your understanding before moving on.

Start Quiz →