Regularisation and Stability in Calibration

Setup

The Ill-Posed Inverse Problem

Calibration is an inverse problem: given observations (market implied vols), infer the underlying model parameters. Hadamard (1902) defined a well-posed problem as one in which a solution exists, is unique, and depends continuously on the data. Calibration typically violates the third condition: small perturbations in market quotes can produce large changes in calibrated parameters.

This ill-posedness is not a modelling pathology; it is a fundamental property of the calibration problem for any sufficiently flexible model. The Heston model has a nearly flat loss surface along parameter combinations that produce similar implied vol surfaces. A local vol model is more severely ill-posed: a finite set of quotes is consistent with infinitely many local vol surfaces, so uniqueness fails as well as stability. Understanding and controlling this instability is what separates a robust production calibrator from an academic prototype.

Conventions

Market data: implied vol vector $\hat{\sigma} \in \mathbb{R}^N$ , perturbed by noise $\delta\hat{\sigma} \sim \mathcal{N}(0, \epsilon^2 I)$ (bid-ask spread, model error).
Model parameters: $\theta \in \mathbb{R}^p$ , constrained to $\Theta \subseteq \mathbb{R}^p$ .
Calibration operator: $\mathcal{F}: \Theta \to \mathbb{R}^N$ , $\theta \mapsto \sigma_{\mathrm{model}}(\cdot; \theta)$ .
Condition number: $\kappa(J^\top J)$ , where $J = D\mathcal{F}(\theta)$ is the Jacobian. Large $\kappa$ indicates ill-conditioning.

Theory: Sources of Instability

Flat Loss Surfaces and Ridges

Define the residual $r(\theta) = \mathcal{F}(\theta) - \hat{\sigma}$ and the calibration loss $F(\theta) = \tfrac{1}{2}\|r(\theta)\|^2$. Its curvature is characterised by the Gauss-Newton Hessian approximation $H \approx J^\top J$. If $H$ has small eigenvalues $\lambda_{\min} \ll \lambda_{\max}$, the loss surface is nearly flat in the corresponding eigendirections.

Formally: if $H v = \lambda_{\min} v$ for a unit vector $v$, then moving $\theta$ away from the optimum by $\varepsilon v$ changes the objective by only about $\tfrac{1}{2}\lambda_{\min}\varepsilon^2$. Equivalently, the model output changes by $\varepsilon J v$, and $\|\varepsilon J v\|^2 = \varepsilon^2\, v^\top J^\top J v = \lambda_{\min}\varepsilon^2$. The parameter direction $v$ is weakly identified: it barely affects the implied vol surface.

When calibrating to noisy data $\hat{\sigma} + \delta\hat{\sigma}$ , the perturbed optimum is:

$\delta\theta \approx (J^\top J)^{-1} J^\top \delta\hat{\sigma}.$

The sensitivity is $\|(J^\top J)^{-1} J^\top\|_2 = 1/\sigma_{\min}(J) = \kappa(J) / \sigma_{\max}(J)$, where $\sigma_{\min}, \sigma_{\max}$ are the smallest and largest singular values of $J$ and $\kappa(J) = \sigma_{\max}/\sigma_{\min}$ (so that $\kappa(J^\top J) = \kappa(J)^2$). For ill-conditioned $J$, where $\sigma_{\min}$ is small, small data noise $\|\delta\hat{\sigma}\|$ can cause a large parameter perturbation $\|\delta\theta\|$.
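A minimal numerical sketch of this sensitivity, assuming a hypothetical 3×2 Jacobian with nearly collinear columns (the matrix is illustrative and not taken from any model). It compares the spectral norm of the pseudoinverse $(J^\top J)^{-1} J^\top$ with $1/\sigma_{\min}(J)$:

```python
import numpy as np

def noise_gain(J):
    """Worst-case amplification of data noise into parameter noise."""
    s = np.linalg.svd(J, compute_uv=False)   # singular values, descending
    return 1.0 / s[-1], s[0] / s[-1]         # (gain, condition number of J)

# Hypothetical Jacobian: two parameters with almost identical sensitivities.
J = np.array([[1.0, 1.000],
              [1.0, 1.001],
              [1.0, 0.999]])

gain, kappa = noise_gain(J)
# Spectral norm of the pseudoinverse (J^T J)^{-1} J^T.
pinv_norm = np.linalg.norm(np.linalg.pinv(J), 2)

print(f"kappa(J) = {kappa:.1f}, 1/sigma_min = {gain:.1f}, ||pinv(J)||_2 = {pinv_norm:.1f}")
```

In this toy case the gain is of order $10^3$, so quote noise at the $10^{-3}$ level can move the parameters by order one along the weak direction.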

SVD Decomposition of the Calibration Problem

Write the thin SVD of the Jacobian (assuming $N \ge p$): $J = U \Sigma V^\top$, where $U \in \mathbb{R}^{N \times p}$ has orthonormal columns $u_j$, $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_p)$, and $V \in \mathbb{R}^{p \times p}$ is orthogonal with columns $v_j$.

The unregularised least-squares solution is:

$\delta\theta_{\mathrm{LS}} = V \Sigma^{-1} U^\top \delta\hat{\sigma} = \sum_{j=1}^p \frac{u_j^\top \delta\hat{\sigma}}{\sigma_j} v_j.$

Small singular values $\sigma_j \approx 0$ amplify the noise component $u_j^\top \delta\hat{\sigma}$ arbitrarily. This is the mechanism of instability.
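The mode-by-mode expansion can be checked numerically. The sketch below assumes the same kind of hypothetical near-collinear Jacobian and a synthetic noise vector; it builds $\delta\theta_{\mathrm{LS}}$ from the SVD and compares it with NumPy's least-squares solver:

```python
import numpy as np

def ls_noise_response(J, noise):
    """Return delta_theta_LS and the per-mode coefficients (u_j^T noise) / sigma_j."""
    U, s, Vt = np.linalg.svd(J, full_matrices=False)  # thin SVD: J = U diag(s) Vt
    coeffs = (U.T @ noise) / s                        # noise projected on u_j, divided by sigma_j
    return Vt.T @ coeffs, coeffs                      # sum_j coeffs[j] * v_j

rng = np.random.default_rng(0)

# Hypothetical Jacobian with one weakly identified direction.
J = np.array([[1.0, 1.000],
              [1.0, 1.001],
              [1.0, 0.999]])
noise = 1e-3 * rng.standard_normal(3)                 # synthetic quote noise

dtheta, coeffs = ls_noise_response(J, noise)
dtheta_ref = np.linalg.lstsq(J, noise, rcond=None)[0] # reference least-squares solution

print("per-mode coefficients:", coeffs)
print("delta_theta:", dtheta)
```

The coefficient attached to the smallest singular value typically dominates, which is the amplification described above.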

Tikhonov Regularisation

The Penalised Objective

Replace the unregularised calibration objective with:

$F_\lambda(\theta) = \underbrace{\|r(\theta)\|^2}_{\text{data fit}} + \lambda \underbrace{\|\Gamma(\theta - \theta_0)\|^2}_{\text{regularisation penalty}},$

where:

$\lambda > 0$ : regularisation parameter, controls the bias-variance tradeoff.
$\Gamma \in \mathbb{R}^{q \times p}$ : regularisation matrix (often the identity $I$ or a difference operator).
$\theta_0$ : prior (e.g., yesterday's calibrated parameters, or a reference set of parameters).

Interpretation: We seek parameters that fit the market data AND are close to the prior $\theta_0$ . The penalty $\lambda\|\Gamma(\theta - \theta_0)\|^2$ discourages large deviations from the prior.
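As a minimal sketch, the penalised objective can be written as a plain function. The names `tikhonov_objective` and `residual_fn`, and the toy linear residual, are hypothetical and only serve to illustrate the formula:

```python
import numpy as np

def tikhonov_objective(theta, residual_fn, lam, theta0, Gamma=None):
    """F_lambda(theta) = ||r(theta)||^2 + lam * ||Gamma (theta - theta0)||^2."""
    theta = np.asarray(theta, dtype=float)
    theta0 = np.asarray(theta0, dtype=float)
    r = residual_fn(theta)                          # data-fit residual
    d = theta - theta0                              # deviation from the prior
    penalty_vec = d if Gamma is None else Gamma @ d # Gamma = I when omitted
    return float(r @ r + lam * (penalty_vec @ penalty_vec))

# Toy linear "model" (an assumption for illustration, not a pricing model).
J = np.array([[1.0, 0.5],
              [0.2, 1.0],
              [0.7, 0.3]])
sigma_hat = np.array([0.21, 0.19, 0.20])

def residual_fn(theta):
    return J @ theta - sigma_hat

theta0 = np.array([0.10, 0.10])                     # prior, e.g. yesterday's parameters
theta = np.array([0.15, 0.12])

fit_only = tikhonov_objective(theta, residual_fn, 0.0, theta0)
penalised = tikhonov_objective(theta, residual_fn, 10.0, theta0)
print(fit_only, penalised)
```

With $\lambda = 0$ the function reduces to the data-fit term; the difference between the two values is exactly $\lambda\|\theta - \theta_0\|^2$.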

Modified Normal Equations

Linearising $r$ around $\theta_0$, so that $r(\theta) \approx r(\theta_0) + J\,\delta\theta$ with $\delta\theta = \theta - \theta_0$, the first-order condition for the penalised objective is:

$(J^\top J + \lambda \Gamma^\top \Gamma)\, \delta\theta = -J^\top r(\theta_0).$

More practically, in the LM setting, the Tikhonov-LM update is:

$(J^\top J + \lambda D + \mu \Gamma^\top \Gamma)\, \delta\theta = -J^\top r - \mu \Gamma^\top \Gamma (\theta - \theta_0),$

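where $\lambda$ now denotes the LM damping (not the Tikhonov weight of the penalised objective), $D$ is the LM scaling matrix (typically $I$ or $\mathrm{diag}(J^\top J)$), and $\mu$ is the Tikhonov strength. The regularisation term $\mu\Gamma^\top\Gamma$ adds to the diagonal of the system matrix, and also to the first off-diagonals if $\Gamma$ is a first-difference operator. For $\Gamma = I$ it bounds the minimum eigenvalue of the system matrix from below by $\mu$.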
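A minimal sketch of one Tikhonov-LM step follows. The source does not specify $D$, so the code assumes $D = \mathrm{diag}(J^\top J)$ (Marquardt-style scaling); the function name and the toy linear problem are also hypothetical:

```python
import numpy as np

def tikhonov_lm_step(J, r, theta, theta0, lam, mu, Gamma=None):
    """Solve (J^T J + lam*D + mu*G^T G) dtheta = -J^T r - mu*G^T G (theta - theta0)."""
    p = J.shape[1]
    G = np.eye(p) if Gamma is None else np.asarray(Gamma, dtype=float)
    JtJ = J.T @ J
    GtG = G.T @ G
    D = np.diag(np.diag(JtJ))            # ASSUMPTION: Marquardt scaling, D = diag(J^T J)
    A = JtJ + lam * D + mu * GtG         # damped, regularised system matrix
    b = -J.T @ r - mu * GtG @ (theta - theta0)
    return np.linalg.solve(A, b)

# Toy linear problem r(theta) = J theta - y (illustrative only).
J = np.array([[1.0, 0.5],
              [0.2, 1.0],
              [0.7, 0.3]])
y = np.array([0.21, 0.19, 0.20])
theta0 = np.array([0.10, 0.10])
theta = np.array([0.30, -0.20])
mu = 0.5

step = tikhonov_lm_step(J, J @ theta - y, theta, theta0, lam=0.0, mu=mu)
theta_new = theta + step

# For a linear residual and lam = 0, one step lands on the regularised optimum.
theta_star = np.linalg.solve(J.T @ J + mu * np.eye(2), J.T @ y + mu * theta0)
print(theta_new, theta_star)
```

For a nonlinear model the step would be recomputed with a fresh Jacobian and residual at each iterate, with $\lambda$ adapted by the usual LM accept/reject logic.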

Bias-Variance Tradeoff

Small $\lambda$ : The solution minimises the data fit with little constraint. The estimator is approximately unbiased ( $\mathbb{E}[\hat{\theta}] \approx \theta^*$ ) but highly variable: small noise in $\hat{\sigma}$ causes large swings in $\hat{\theta}$ . High variance.

Large $\lambda$ : The solution is heavily pulled towards the prior $\theta_0$ . The estimator has low variance (stable day-to-day calibration) but is biased away from the true parameters. High bias.

Optimal $\lambda$ : Minimises the mean squared error $\mathbb{E}[\|\hat{\theta} - \theta^*\|^2] = \mathrm{Bias}^2 + \mathrm{Variance}$ . Methods to select $\lambda$ are discussed below.
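The tradeoff can be made concrete in a simplified setting. The sketch below assumes a linear model $\hat{\sigma} = J\theta^* + \text{noise}$ with noise $\mathcal{N}(0, \epsilon^2 I)$, $\Gamma = I$, and the objective $\|J\theta - \hat{\sigma}\|^2 + \lambda\|\theta - \theta_0\|^2$. Under these assumptions the bias is $\lambda (J^\top J + \lambda I)^{-1}(\theta_0 - \theta^*)$ and the covariance is $\epsilon^2 (J^\top J + \lambda I)^{-1} J^\top J (J^\top J + \lambda I)^{-1}$, both of which are evaluated through the SVD. All numerical values are hypothetical:

```python
import numpy as np

def tikhonov_bias_variance(J, theta_true, theta0, eps, lam):
    """Closed-form squared bias and total variance for the linear case, Gamma = I."""
    U, s, Vt = np.linalg.svd(J, full_matrices=False)
    prior_error = Vt @ (theta0 - theta_true)            # prior error in the V basis
    bias_sq = np.sum((lam / (s**2 + lam) * prior_error) ** 2)
    variance = eps**2 * np.sum((s / (s**2 + lam)) ** 2)  # trace of the covariance
    return bias_sq, variance

# Hypothetical near-collinear Jacobian, true parameters, and prior.
J = np.array([[1.0, 1.000],
              [1.0, 1.001],
              [1.0, 0.999]])
theta_true = np.array([1.0, 0.5])
theta0 = np.array([0.9, 0.6])
eps = 1e-3

lams = np.array([1e-8, 1e-6, 1e-4, 1e-2])
results = np.array([tikhonov_bias_variance(J, theta_true, theta0, eps, lam)
                    for lam in lams])
bias_sq, variance = results[:, 0], results[:, 1]
mse = bias_sq + variance

for lam, b, v, m in zip(lams, bias_sq, variance, mse):
    print(f"lambda={lam:.0e}  bias^2={b:.3e}  variance={v:.3e}  mse={m:.3e}")
```

As $\lambda$ increases, the squared bias rises and the variance falls; in this toy setup a moderate $\lambda$ gives a much smaller mean squared error than a nearly unregularised fit.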

Effect on Singular Values

With $\Gamma = I$, the regularised counterpart of $\delta\theta_{\mathrm{LS}}$ is:

$\delta\theta_{\lambda} = \sum_{j=1}^p \frac{\sigma_j}{\sigma_j^2 + \lambda}\, (u_j^\top \delta\hat{\sigma})\, v_j.$

The factor $\sigma_j / (\sigma_j^2 + \lambda)$ dampens directions with small $\sigma_j$ : for $\sigma_j \gg \sqrt{\lambda}$ , it is approximately $1/\sigma_j$ (no regularisation effect); for $\sigma_j \ll \sqrt{\lambda}$ , it is approximately $\sigma_j/\lambda$ (strongly suppressed). This is a soft truncation of the SVD.
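A short sketch of this damping, assuming $\Gamma = I$ and a hypothetical near-collinear Jacobian. It applies the factors $\sigma_j / (\sigma_j^2 + \lambda)$ to a synthetic noise vector and compares the result with a direct solve of $(J^\top J + \lambda I)\,\delta\theta = J^\top \delta\hat{\sigma}$:

```python
import numpy as np

def tikhonov_noise_response(J, noise, lam):
    """Regularised noise response via damped inverse singular values (Gamma = I)."""
    U, s, Vt = np.linalg.svd(J, full_matrices=False)
    damping = s / (s**2 + lam)                  # replaces 1 / sigma_j
    return Vt.T @ (damping * (U.T @ noise)), damping

rng = np.random.default_rng(1)

# Hypothetical Jacobian and synthetic quote noise.
J = np.array([[1.0, 1.000],
              [1.0, 1.001],
              [1.0, 0.999]])
noise = 1e-3 * rng.standard_normal(3)
lam = 1e-4

dtheta_reg, damping = tikhonov_noise_response(J, noise, lam)
dtheta_direct = np.linalg.solve(J.T @ J + lam * np.eye(2), J.T @ noise)

s = np.linalg.svd(J, compute_uv=False)
print("unregularised gains 1/sigma_j:", 1.0 / s)
print("regularised gains:            ", damping)
```

The damped gain $\sigma/(\sigma^2 + \lambda)$ is maximised at $\sigma = \sqrt{\lambda}$, so it never exceeds $1/(2\sqrt{\lambda})$ however small the singular value.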

Parameter Selection Methods

L-Curve Method

Plot $\log \|r(\theta_\lambda)\|^2$ (residual norm) versus $\log \|\Gamma(\theta_\lambda - \theta_0)\|^2$ (regularisation norm) for a range of $\lambda$ values. The curve is typically L-shaped:

Horizontal arm (small $\lambda$ ): residual small but regularisation term large (overfitting, unstable parameters).
Vertical arm (large $\lambda$ ): residual large (underfitting) but regularisation norm small.
Corner: the point of maximum curvature, where fit and stability are balanced; the corresponding $\lambda$ is the L-curve choice.
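A minimal sketch of tracing the L-curve, assuming a linear toy problem with $\Gamma = I$ and synthetic data. The corner is located with a simple finite-difference curvature heuristic in log-log coordinates; this is one common choice and is an assumption here, not a method prescribed by the text:

```python
import numpy as np

def l_curve_points(J, y, theta0, lams):
    """Residual norm^2 and penalty norm^2 of the Tikhonov solution for each lambda."""
    p = J.shape[1]
    res, pen = [], []
    for lam in lams:
        theta = np.linalg.solve(J.T @ J + lam * np.eye(p), J.T @ y + lam * theta0)
        res.append(np.sum((J @ theta - y) ** 2))
        pen.append(np.sum((theta - theta0) ** 2))
    return np.array(res), np.array(pen)

def l_curve_corner(res, pen, lams):
    """Heuristic corner: maximum absolute curvature of the log-log curve."""
    t = np.log(lams)
    x, z = np.log(pen), np.log(res)             # penalty horizontal, residual vertical
    dx, dz = np.gradient(x, t), np.gradient(z, t)
    ddx, ddz = np.gradient(dx, t), np.gradient(dz, t)
    curvature = np.abs(dx * ddz - dz * ddx) / ((dx**2 + dz**2) ** 1.5 + 1e-30)
    return lams[np.argmax(curvature)]

rng = np.random.default_rng(2)

# Hypothetical Jacobian, true parameters, prior, and noisy synthetic quotes.
J = np.array([[1.0, 1.000],
              [1.0, 1.001],
              [1.0, 0.999]])
theta_true = np.array([1.0, 0.5])
theta0 = np.array([0.8, 0.4])
y = J @ theta_true + 1e-3 * rng.standard_normal(3)

lams = np.logspace(-10, 2, 49)
res, pen = l_curve_points(J, y, theta0, lams)
lam_corner = l_curve_corner(res, pen, lams)
print("corner lambda (heuristic):", lam_corner)
```

The residual norm grows and the penalty norm shrinks as $\lambda$ increases, which traces the two arms described above. On noisy or short grids the curvature estimate can be unreliable, so inspecting the plotted curve is a sensible cross-check.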
