
Model Calibration as a Non-Linear Least Squares Problem

Module 1 of 5 | 22 min read | Level: Medium

Setup

Model calibration is the process of choosing a parameter vector $\theta \in \mathbb{R}^p$ so that a model's prices (or implied volatilities) best reproduce a set of $N$ observed market quotes. It is the bridge between a theoretical model and a live trading system: a model with uncalibrated parameters is mathematically interesting but commercially useless.

INSIGHT

Financial Insight. Every derivatives desk calibrates models to market prices every morning — and often intraday after large moves. A vol desk calibrating Heston to the S&P 500 surface runs this optimisation on 30–60 strike-expiry pairs in under a second. A rates desk calibrating an LMM to the swaption cube may solve a sequence of smaller optimisations (cascade calibration) over 50–100 swaptions. The mathematical framework is the same in each case: a non-linear least-squares (NLS) problem.

Assumptions for this module:

  • The model $C^{\text{model}}(K, T; \theta)$ is a smooth function of $\theta$ on an open domain $\Theta \subseteq \mathbb{R}^p$ (smoothness ensures the Jacobian exists).
  • Market quotes are point observations: we observe $N$ prices or implied vols $\{y_i\}_{i=1}^N$ at specified strikes and expiries, with no bid-ask structure modelled.
  • The pricing function is deterministic: given $\theta$, $C^{\text{model}}$ is evaluated exactly (no Monte Carlo noise; this assumption is relaxed when we treat calibration under MC pricing).
  • We work in implied volatility space by default. Fitting in price space is discussed under Price Space vs. Volatility Space.
  • Constraints on parameters (positivity, correlation bounds, Feller condition) are handled via bound constraints or interior-point methods. This module focuses on the unconstrained problem; constraints enter naturally in the Jacobian rank and KKT conditions.

Theory

The Calibration Objective Function

DEFINITION

Definition 1.1 (Calibration Residuals and Objective). Let $\theta \in \Theta$ be a parameter vector. For each instrument $i = 1, \ldots, N$, define the residual

$$r_i(\theta) \;=\; y_i - f_i(\theta)$$

where $y_i$ is the market quote (e.g. implied vol) and $f_i(\theta)$ is the model's corresponding output. The weighted NLS objective is

$$\mathcal{L}(\theta) \;=\; \tfrac{1}{2}\sum_{i=1}^N w_i^2\, r_i(\theta)^2 \;=\; \tfrac{1}{2}\| W r(\theta) \|_2^2$$

where $W = \mathrm{diag}(w_1, \ldots, w_N)$ is a diagonal weight matrix and $r(\theta) = (r_1(\theta), \ldots, r_N(\theta))^\top \in \mathbb{R}^N$.

The factor of $\tfrac{1}{2}$ is conventional: it cancels the factor of 2 in gradient expressions. The objective is everywhere non-negative, with global minimum $\mathcal{L}^* = 0$ if and only if a perfect fit exists (an exact solution $r(\theta^*) = 0$).
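As a minimal sketch of Definition 1.1, the snippet below evaluates the residual vector and the weighted objective. The two-parameter smile `toy_smile`, the log-moneyness grid `k` and all numerical values are hypothetical stand-ins chosen for illustration; they are not the model used in the companion notebook.

```python
import numpy as np

def toy_smile(theta, k):
    """Hypothetical two-parameter smile: sigma(k) = alpha * sqrt(1 + nu * k^2)."""
    alpha, nu = theta
    return alpha * np.sqrt(1.0 + nu * k**2)

def residuals(theta, k, y):
    """r_i(theta) = y_i - f_i(theta), as in Definition 1.1."""
    return y - toy_smile(theta, k)

def objective(theta, k, y, w):
    """L(theta) = 0.5 * || W r(theta) ||_2^2 with W = diag(w)."""
    wr = w * residuals(theta, k, y)
    return 0.5 * float(wr @ wr)

k = np.linspace(-0.4, 0.4, 5)        # illustrative log-moneyness grid
theta_true = np.array([0.20, 1.50])  # assumed "true" parameters
y = toy_smile(theta_true, k)         # synthetic, noise-free quotes
w = np.ones_like(k)                  # uniform weights

loss_at_truth = objective(theta_true, k, y, w)
loss_elsewhere = objective(np.array([0.22, 1.00]), k, y, w)
print(loss_at_truth, loss_elsewhere)
```

Because the synthetic quotes are generated by the same function, the objective vanishes at `theta_true` and is strictly positive at the second parameter vector.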

Weighting choices in practice:

| Scheme | $w_i$ | Rationale |
|---|---|---|
| Uniform | $1$ | Equal treatment of all instruments |
| Bid-ask inverse | $1/(\sigma_{\text{ask}} - \sigma_{\text{bid}})$ | Down-weights illiquid wings |
| Vega-weighted | $\mathcal{V}_i / \bar{\mathcal{V}}$ | To first order, equivalent to fitting price errors normalised by the average vega $\bar{\mathcal{V}}$ |
| Relative | $1/y_i$ | Penalises relative (not absolute) misfit |

On a live vol desk, bid-ask weights are standard: wide spreads signal illiquid options whose market quotes are less reliable. Uniform weighting in vol space is the starting point for calibration diagnostics.
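The four weighting schemes can be sketched in a few lines. The bid, ask and vega arrays below are invented for illustration only; a real desk would take them from its market data feed.

```python
import numpy as np

# Hypothetical quotes for five strikes (all numbers are illustrative only).
sigma_bid = np.array([0.240, 0.210, 0.195, 0.210, 0.240])
sigma_ask = np.array([0.260, 0.220, 0.205, 0.220, 0.260])
sigma_mid = 0.5 * (sigma_bid + sigma_ask)
vega = np.array([0.08, 0.25, 0.40, 0.25, 0.08])  # assumed option vegas

w_uniform = np.ones_like(sigma_mid)
w_bid_ask = 1.0 / (sigma_ask - sigma_bid)  # wide spread -> small weight
w_vega = vega / vega.mean()                # normalised so the mean weight is 1
w_relative = 1.0 / sigma_mid               # relative misfit

for name, w in [("uniform", w_uniform), ("bid-ask", w_bid_ask),
                ("vega", w_vega), ("relative", w_relative)]:
    print(f"{name:9s}", np.round(w, 2))
```

In this made-up example the wing quotes have twice the spread of the central strikes, so the bid-ask scheme gives them half the weight.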

The Jacobian Matrix

DEFINITION

Definition 1.2 (Jacobian). The Jacobian of the residual vector is

$$J(\theta) \;=\; \frac{\partial r}{\partial \theta} \;\in\; \mathbb{R}^{N \times p}$$

with entries $J_{ij}(\theta) = \partial r_i / \partial \theta_j = -\partial f_i / \partial \theta_j$.

(The sign follows from $r_i = y_i - f_i$; market quotes $y_i$ are independent of $\theta$.)

The Jacobian encodes the sensitivity of every model output to every parameter. For a 5-parameter Heston model fitted to 40 market vols, $J$ is $40 \times 5$: a tall matrix describing an overdetermined system.
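A quick way to check an analytic Jacobian is to compare it with central finite differences of the residual. The sketch below does this for a hypothetical two-parameter smile (`toy_smile` is an assumption made for illustration) and keeps the sign convention $J = \partial r / \partial \theta = -\partial f / \partial \theta$ from Definition 1.2.

```python
import numpy as np

def toy_smile(theta, k):
    """Hypothetical two-parameter smile: sigma(k) = alpha * sqrt(1 + nu * k^2)."""
    alpha, nu = theta
    return alpha * np.sqrt(1.0 + nu * k**2)

def residuals(theta, k, y):
    return y - toy_smile(theta, k)

def jacobian_analytic(theta, k):
    """J = dr/dtheta = -df/dtheta, one column per parameter."""
    alpha, nu = theta
    root = np.sqrt(1.0 + nu * k**2)
    df_dalpha = root
    df_dnu = alpha * k**2 / (2.0 * root)
    return -np.column_stack([df_dalpha, df_dnu])

def jacobian_fd(theta, k, y, h=1e-6):
    """Central finite differences of the residual vector."""
    theta = np.asarray(theta, dtype=float)
    J = np.empty((k.size, theta.size))
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = h
        J[:, j] = (residuals(theta + step, k, y)
                   - residuals(theta - step, k, y)) / (2.0 * h)
    return J

k = np.linspace(-0.4, 0.4, 5)
theta = np.array([0.25, 1.00])
y = toy_smile(np.array([0.20, 1.50]), k)  # synthetic quotes

J_an = jacobian_analytic(theta, k)
J_fd = jacobian_fd(theta, k, y)
print(np.max(np.abs(J_an - J_fd)))  # discrepancy between the two Jacobians
```

The first column is negative everywhere because raising $\alpha$ raises every model vol and therefore lowers every residual.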

Gradient and Hessian

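Differentiating the objective with the chain rule, and using $J = \partial r / \partial \theta$ from Definition 1.2:

$$\nabla_\theta \mathcal{L}(\theta) \;=\; J(\theta)^\top W^2 r(\theta)$$

Written in terms of the model sensitivities $\partial f / \partial \theta = -J$, the same gradient is $-(\partial f / \partial \theta)^\top W^2 r(\theta)$.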

At a local minimum $\theta^*$, the first-order condition is:

$$J(\theta^*)^\top W^2 r(\theta^*) \;=\; 0$$

This says that $W^2 r$ is orthogonal to the column space of $J$ (equivalently, the weighted residual $W r$ is orthogonal to the column space of $W J$). Geometrically, the residual vector lies in the orthogonal complement of the model's tangent plane at $\theta^*$.

The exact Hessian involves second derivatives of the model:

$$H(\theta) \;=\; J(\theta)^\top W^2 J(\theta) \;-\; \sum_{i=1}^N w_i^2\, r_i(\theta)\, \nabla^2_\theta f_i(\theta)$$
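For readers who want the intermediate steps, the gradient and Hessian follow from the chain rule applied to $\mathcal{L} = \tfrac{1}{2}\sum_i w_i^2 r_i^2$. The sketch below uses only the definitions already given: $r_i = y_i - f_i$ and $J_{ij} = \partial r_i / \partial \theta_j$.

$$\frac{\partial \mathcal{L}}{\partial \theta_j} \;=\; \sum_{i=1}^N w_i^2\, r_i\, \frac{\partial r_i}{\partial \theta_j} \;=\; \big(J^\top W^2 r\big)_j$$

$$\frac{\partial^2 \mathcal{L}}{\partial \theta_j\, \partial \theta_k} \;=\; \sum_{i=1}^N w_i^2\, \frac{\partial r_i}{\partial \theta_j} \frac{\partial r_i}{\partial \theta_k} \;+\; \sum_{i=1}^N w_i^2\, r_i\, \frac{\partial^2 r_i}{\partial \theta_j\, \partial \theta_k} \;=\; \big(J^\top W^2 J\big)_{jk} \;-\; \sum_{i=1}^N w_i^2\, r_i\, \frac{\partial^2 f_i}{\partial \theta_j\, \partial \theta_k}$$

The last equality uses $\partial^2 r_i = -\partial^2 f_i$, because $y_i$ does not depend on $\theta$. The first term contains only first derivatives; the second is the one that vanishes when the residuals are zero or the model is linear in $\theta$.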

DEFINITION

Definition 1.3 (Gauss-Newton Hessian Approximation). The Gauss-Newton approximation discards the second-order term:

$$H^{\text{GN}}(\theta) \;=\; J(\theta)^\top W^2 J(\theta)$$

The Gauss-Newton approximation is accurate when: (i) residuals are small at the solution (the model fits well), or (ii) the model $f_i$ is nearly linear in $\theta$ (second derivatives are negligible). In practice, both conditions are often approximately satisfied near convergence, which is why Gauss-Newton underlies the standard NLS solvers, including Levenberg-Marquardt.

REMARK

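Remark. $H^{\text{GN}} = J^\top W^2 J$ is always positive semi-definite (PSD), and positive definite when $W J$ has full column rank. The exact Hessian may fail to be PSD away from the solution. With a positive definite $H^{\text{GN}}$, the Gauss-Newton step is a descent direction even far from the optimum, although a line search or damping is still needed to guarantee that the objective actually decreases.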

The Normal Equations

Setting the gradient to zero and using the Gauss-Newton approximation:

THEOREM

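Theorem 1.1 (Gauss-Newton Normal Equations). With $J = \partial r / \partial \theta$ as in Definition 1.2, the Gauss-Newton update step $\delta\theta$ satisfies the normal equations:

$$\left(J^\top W^2 J\right) \delta\theta \;=\; -J^\top W^2\, r(\theta)$$

Equivalently, with the model sensitivities $\partial f / \partial \theta = -J$ in place of $J$, the right-hand side is $+(\partial f / \partial \theta)^\top W^2\, r(\theta)$. The Gauss-Newton update is then $\theta^{(k+1)} = \theta^{(k)} + \delta\theta$.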

This is a $p \times p$ linear system. For small $p$ (Heston has 5 parameters), it is cheap to solve via Cholesky factorisation. The system is singular when $J$ has rank less than $p$, which signals parameter identifiability problems.
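The update can be sketched as follows, with $J = \partial r / \partial \theta$ so that the step solves $(J^\top W^2 J)\,\delta\theta = -J^\top W^2 r$. The toy smile, the starting point and the fixed iteration count are assumptions made for illustration. The solve uses `numpy.linalg.lstsq` on the weighted system instead of forming $J^\top W^2 J$ and applying Cholesky; both give the same step when $WJ$ has full column rank.

```python
import numpy as np

def toy_smile(theta, k):
    """Hypothetical two-parameter smile: sigma(k) = alpha * sqrt(1 + nu * k^2)."""
    alpha, nu = theta
    return alpha * np.sqrt(1.0 + nu * k**2)

def residual_jacobian(theta, k):
    """J = dr/dtheta = -df/dtheta for r = y - f."""
    alpha, nu = theta
    root = np.sqrt(1.0 + nu * k**2)
    return -np.column_stack([root, alpha * k**2 / (2.0 * root)])

def gauss_newton_step(J, r, w):
    """Return dtheta solving (J^T W^2 J) dtheta = -J^T W^2 r."""
    WJ = w[:, None] * J
    Wr = w * r
    # lstsq minimises ||WJ dtheta + Wr||_2, which yields the normal-equation
    # solution without explicitly forming J^T W^2 J.
    dtheta, *_ = np.linalg.lstsq(WJ, -Wr, rcond=None)
    return dtheta

k = np.linspace(-0.4, 0.4, 5)
theta_true = np.array([0.20, 1.50])
y = toy_smile(theta_true, k)      # synthetic, noise-free quotes
w = np.ones_like(k)

theta = np.array([0.25, 1.00])    # perturbed starting point
for it in range(10):
    r = y - toy_smile(theta, k)
    theta = theta + gauss_newton_step(residual_jacobian(theta, k), r, w)
    print(it, theta)
```

This plain iteration has no line search or damping, so it is only suitable for well-behaved toy problems such as this one.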

EXAMPLE

Example 1.1. For a 2-parameter model ($\theta = (\alpha, \nu)$) fitted to $N = 5$ vol quotes, $J \in \mathbb{R}^{5 \times 2}$ and $J^\top W^2 J \in \mathbb{R}^{2 \times 2}$. The normal equations reduce to a $2 \times 2$ linear system solvable in closed form. The solution is the least-squares estimate of the parameter update.

Condition Number and Well-Posedness

The condition number $\kappa(J^\top W^2 J)$ determines how sensitive the solution is to perturbations in market data:

THEOREM

Theorem 1.2 (Sensitivity of NLS Solution). If the calibration objective has a unique minimum $\theta^*$ and the Jacobian $J^* = J(\theta^*)$ has full column rank $p$, then the sensitivity of $\theta^*$ to a perturbation $\delta y$ in market quotes satisfies:

$$\frac{\|\delta\theta\|}{\|\theta^*\|} \;\lesssim\; \kappa(J^{*\top} W^2 J^*)\; \frac{\|\delta y\|}{\|y\|}$$

A large condition number indicates that small changes in market quotes produce large changes in calibrated parameters, a signature of ill-posedness.

In Heston calibration, the condition number of $J^\top J$ is often in the range $10^4$ to $10^6$. This is why regularisation (Module: Regularisation & Stability) is necessary before reporting model parameters as "stable".
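To see how instrument choice affects conditioning, the sketch below computes $\kappa(J^\top J)$ for a hypothetical two-parameter smile on two strike grids: one spread across the wings and one clustered at the money. The model, the grids and the parameter values are illustrative assumptions, so the numbers say nothing about Heston itself.

```python
import numpy as np

def residual_jacobian(theta, k):
    """J = dr/dtheta for the toy smile sigma(k) = alpha * sqrt(1 + nu * k^2)."""
    alpha, nu = theta
    root = np.sqrt(1.0 + nu * k**2)
    return -np.column_stack([root, alpha * k**2 / (2.0 * root)])

def cond_normal_matrix(theta, k):
    """2-norm condition number of J^T J (uniform weights)."""
    J = residual_jacobian(theta, k)
    return np.linalg.cond(J.T @ J)

theta = np.array([0.20, 1.50])
k_wide = np.linspace(-0.4, 0.4, 9)         # strikes spread across the wings
k_clustered = np.linspace(-0.05, 0.05, 9)  # strikes clustered at the money

cond_wide = cond_normal_matrix(theta, k_wide)
cond_clustered = cond_normal_matrix(theta, k_clustered)
print(f"wide grid:      {cond_wide:.3e}")
print(f"clustered grid: {cond_clustered:.3e}")
```

Near the money the curvature parameter barely moves the model vols, so its column of $J$ is tiny and almost collinear with the level column; the clustered grid is therefore much worse conditioned. Note also that $\kappa(J^\top J) = \kappa(J)^2$, which is one reason to solve the least-squares problem on $J$ directly when conditioning is poor.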

Price Space vs. Volatility Space

A critical practical choice is whether to calibrate in price space or implied vol space.

WARNING

Warning: Price Space Calibration. If $f_i(\theta)$ represents model prices (in dollars), deeply in-the-money options have prices close to their intrinsic value and carry negligible model sensitivity. Deep OTM options have small prices but carry most of the smile information. Minimising $\sum_i w_i^2 (P_i^{\text{mkt}} - P_i^{\text{model}})^2$ in price space implicitly up-weights expensive (ITM) options and down-weights cheap (OTM) ones, which is the opposite of what the vol desk cares about. Calibrate in implied vol space, or weight price residuals by inverse vega ($w_i \propto 1/\mathcal{V}_i$) to recover approximately uniform sensitivity to vol errors.

Bound Constraints and KKT Conditions

Most calibration problems impose bounds: $\theta_j \in [l_j, u_j]$ for all $j$. The first-order optimality condition at a constrained solution $\theta^*$ is given by the Karush-Kuhn-Tucker (KKT) conditions:

$$\nabla_\theta \mathcal{L}(\theta^*) + \sum_j (\mu_j^u - \mu_j^l)\, e_j = 0, \qquad \mu_j^u \ge 0,\quad \mu_j^l \ge 0, \qquad \mu_j^u (u_j - \theta_j^*) = 0, \qquad \mu_j^l (\theta_j^* - l_j) = 0$$

where $e_j$ is the $j$-th unit vector, so that stationarity reads $\partial \mathcal{L} / \partial \theta_j + \mu_j^u - \mu_j^l = 0$ for each component.

Practically: a parameter at one of its bounds can carry a non-zero multiplier (it is held there by the bound, not by the objective). Parameters strictly interior to their bounds have zero multipliers and satisfy the unconstrained first-order condition $\partial \mathcal{L} / \partial \theta_j = 0$.
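The KKT conditions for bounds can be checked numerically once a candidate solution and its gradient are known. The helper below, `kkt_multipliers`, is a hypothetical name introduced here; it recovers the multipliers implied by stationarity for components sitting on a bound and returns the leftover stationarity residual, which should vanish at a KKT point. The example uses a deliberately trivial linear residual $r(\theta) = c - \theta$ so that the constrained minimiser is known in closed form.

```python
import numpy as np

def kkt_multipliers(grad, theta, lower, upper, tol=1e-10):
    """Multipliers implied by grad_j + mu_u_j - mu_l_j = 0 for box bounds.

    Components strictly inside the bounds get zero multipliers, so a non-zero
    stationarity residual there means theta is not a KKT point.
    """
    at_lower = np.abs(theta - lower) <= tol
    at_upper = np.abs(upper - theta) <= tol
    mu_l = np.where(at_lower, np.maximum(grad, 0.0), 0.0)
    mu_u = np.where(at_upper, np.maximum(-grad, 0.0), 0.0)
    stationarity = grad + mu_u - mu_l
    return mu_l, mu_u, stationarity

# Toy problem: L(theta) = 0.5 * ||c - theta||^2 with bounds [0, 1]^2.
c = np.array([1.5, 0.3])
lower = np.zeros(2)
upper = np.ones(2)
theta_star = np.clip(c, lower, upper)  # constrained minimiser: (1.0, 0.3)
grad = theta_star - c                  # J^T r with r = c - theta and J = -I

mu_l, mu_u, stationarity = kkt_multipliers(grad, theta_star, lower, upper)
print("mu_l:", mu_l, "mu_u:", mu_u, "stationarity:", stationarity)
```

The first parameter is pinned at its upper bound and carries a positive multiplier; the second is interior and satisfies the unconstrained condition.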

INSIGHT

Financial Insight. In Heston calibration, the Feller condition $2\kappa\bar{v} \ge \xi^2$ (here $\kappa$ is the mean-reversion speed, not a condition number) ensures the variance process $v_t$ stays strictly positive. This is an inequality constraint on parameters. In practice, many calibrated Heston surfaces violate Feller: the optimiser does not enforce it unless it is explicitly coded. A quant who does not check this is reporting parameters for which the model's positivity guarantee fails.
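A post-calibration Feller check is a one-line computation. The function name `feller_margin` and the two parameter sets below are hypothetical and chosen only to show one passing and one failing case.

```python
def feller_margin(kappa, v_bar, xi):
    """Return 2*kappa*v_bar - xi**2; a non-negative value satisfies Feller."""
    return 2.0 * kappa * v_bar - xi**2

# Illustrative parameter sets (not calibrated to any market).
candidates = {
    "set A": dict(kappa=2.0, v_bar=0.09, xi=0.30),
    "set B": dict(kappa=1.5, v_bar=0.04, xi=0.60),
}

margins = {name: feller_margin(**p) for name, p in candidates.items()}
for name, margin in margins.items():
    status = "satisfies" if margin >= 0.0 else "violates"
    print(f"{name}: margin = {margin:+.4f} ({status} Feller)")
```

The same margin can be supplied to a constrained optimiser as an inequality constraint if the desk decides to enforce the condition during calibration.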


Validation

The companion notebook constructs a 2-parameter toy model (parameterised normal smile), generates synthetic market quotes, then verifies:

  1. The residual vector $r(\theta)$ is zero at the true parameters.
  2. The Jacobian computed by central finite differences matches the analytic sensitivities to within $O(h^2)$ accuracy.
  3. The Gauss-Newton step from a perturbed initial point moves toward the true solution.
  4. The condition number of $J^\top J$ changes with the choice of instruments (uniform strikes vs. clustered at the money).
PRACTICE

Before opening the notebook: For the function $f(\theta) = \theta_1 e^{-\theta_2 x}$ evaluated at $x = 1, 2, 3$ with observations $y = (2.0, 1.2, 0.7)$: (a) Write down the $3 \times 2$ Jacobian $J$ at $\theta = (2.0, 0.5)$. (b) Compute the Gauss-Newton update $\delta\theta$. (c) Is the solution unique? What would make the system degenerate?


Limitations

WARNING

Warning: Non-Convexity. The NLS objective for realistic models (Heston, SABR, LMM) is generally non-convex in $\theta$. The Gauss-Newton algorithm converges, at best, to a local minimum, which need not be the global one. Multiple local minima can yield very different parameter sets that fit the market almost equally well numerically but extrapolate very differently. A global search (grid initialisation, differential evolution, or basin hopping) is needed before trusting a calibrated parameter set.
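Grid initialisation, the simplest of the global strategies listed, can be sketched as a multi-start loop: run the local solver from several starting points and keep the result with the lowest objective. The one-parameter model $f(x; \theta) = \sin(\theta x)$ is a hypothetical example chosen because its objective has several local minima; clipping the iterates to the search interval is a crude safeguard added here, not part of Gauss-Newton proper.

```python
import numpy as np

def objective(theta, x, y):
    r = y - np.sin(theta * x)
    return 0.5 * float(r @ r)

def gauss_newton_1d(theta0, x, y, lo, hi, n_iter=50):
    """Plain Gauss-Newton for the toy model f(x; theta) = sin(theta * x)."""
    theta = float(theta0)
    for _ in range(n_iter):
        r = y - np.sin(theta * x)
        J = -x * np.cos(theta * x)                   # dr/dtheta
        theta = theta - float(J @ r) / float(J @ J)  # (J^T J) d = -J^T r
        theta = min(max(theta, lo), hi)              # stay inside the search interval
    return theta

x = np.linspace(0.0, 3.0, 20)
theta_true = 2.1
y = np.sin(theta_true * x)           # synthetic, noise-free data

lo, hi = 0.5, 4.0
starts = np.linspace(lo, hi, 15)     # grid of starting points
finals = [gauss_newton_1d(t0, x, y, lo, hi) for t0 in starts]
losses = [objective(t, x, y) for t in finals]
best = finals[int(np.argmin(losses))]

for t0, t, loss in zip(starts, finals, losses):
    print(f"start {t0:4.2f} -> theta {t:7.4f}, loss {loss:.3e}")
print("best:", best)
```

Starts far from the true value may stall at other stationary points or at the interval boundary, which is why the final selection is made on the objective value and not on where a particular run ended.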

WARNING

Warning: Overfitting the Wing. With many parameters (e.g. LMM with 50+ rates), the model has enough degrees of freedom to fit every market quote exactly, including noise. A perfect fit ($\mathcal{L}^* = 0$) is not the goal; a smooth fit that generalises to unobserved strikes is. Regularisation terms penalising parameter roughness are necessary; see the Regularisation & Stability module.

Other limitations:

  • Sparse market data: Far-dated expiries often have only 3 quoted strikes. When the quotes cannot identify all free parameters, $J^\top J$ is rank-deficient and the parameter estimates are unstable. Fix: regularisation or parameter fixing.
  • MC pricing noise: If model prices are computed by Monte Carlo, the objective has stochastic noise. Standard NLS solvers assume deterministic objectives; use of noisy objectives requires specialised methods (SPSA, stochastic approximation).
  • Non-smooth models: Jump models (Merton, Variance Gamma) have less smooth parameter dependence, degrading finite-difference Jacobian accuracy.
  • Numeraire and convention sensitivity: A mismatch in conventions (Bachelier vs. Black vol, forward vs. spot, ACT/360 vs. ACT/365) between the market quotes $y_i$ and the model outputs $f_i(\theta)$ produces a spurious calibration error that the optimiser will try, and partially succeed, to absorb via parameter distortion.

Interview Angle

PRACTICE

L1 (Junior) — Typical questions:

  1. What does it mean to calibrate a model? Expected: fit model parameters to market prices/vols. Minimise sum of squared differences (or similar). Know that this is done fresh each day.

  2. Why calibrate in vol space rather than price space? Expected: price space up-weights expensive (ITM) options. Vol space treats all quoted strikes comparably. Vega weighting is a weighted compromise.

  3. What is the Jacobian in the context of calibration? Expected: matrix of sensitivities of model outputs to parameters. In this module's convention it is the residual Jacobian, $J_{ij} = \partial r_i / \partial \theta_j = -\partial f_i / \partial \theta_j$. Used to compute the Gauss-Newton update step.

PRACTICE

L2 (Senior) — Typical questions:

  1. Derive the gradient of the NLS objective. When is the Gauss-Newton Hessian approximation valid? Expected: gradient $= J^\top W^2 r$ with $J = \partial r / \partial \theta$ (equivalently $-(\partial f / \partial \theta)^\top W^2 r$). GN approximation valid when residuals are small (good fit) or the model is nearly linear. Full derivation expected.

  2. What does a high condition number of $J^\top J$ imply for calibration? Expected: small perturbations in market data produce large parameter changes, i.e. an ill-conditioned problem. Parameters are poorly identified. Regularisation needed.

  3. Why do calibrated parameters often violate the Feller condition? Expected: because the optimiser (typically Levenberg-Marquardt, LM) finds a local minimum of the vol-space objective and ignores model validity constraints unless they are explicitly imposed. Parameters at the boundary of the feasible set require active bound handling.

PRACTICE

L3 (Researcher) — Typical questions:

  1. Under what conditions is the NLS calibration problem locally well-posed, and what can you say about the uniqueness of the global solution? Expected: locally well-posed if $J^*$ has full column rank $p$ (implicit function theorem). Global uniqueness would follow from strict convexity of $\mathcal{L}$, which is not available for stochastic vol models. Comment on the relationship to identifiability (whether the model is distinguishable from data).

  2. Compare L1, L2, Huber, and L∞ loss functions for calibration. When would you prefer a robust loss? Expected: L2 is sensitive to outliers (e.g. stale wing quotes). Huber loss behaves like L2 near the origin and like L1 in the tails, which reduces outlier influence. L∞ (minimax) minimises the worst-case fit. Discussion of the robustness vs. differentiability trade-off.

  3. How does calibrating a model with unobservable state variables (like instantaneous variance $v_t$ in Heston) change the problem structure? Expected: $v_t$ is a latent variable. Can treat as a parameter (joint calibration of $\theta$ and $v_0$), or estimate via a separate filter (particle filter, Kalman). Joint calibration is standard, but $v_0$ is poorly identified from long-dated options alone, since short-dated at-the-money vols carry most of the information about $v_0$.
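The loss functions in the robust-loss question can be compared on a small residual vector containing one outlier. The residual values and the Huber threshold `delta` below are invented for illustration.

```python
import numpy as np

def l2_loss(r):
    return 0.5 * float(np.sum(r**2))

def l1_loss(r):
    return float(np.sum(np.abs(r)))

def linf_loss(r):
    return float(np.max(np.abs(r)))

def huber_loss(r, delta):
    """Quadratic for |r| <= delta, linear beyond it."""
    a = np.abs(r)
    return float(np.sum(np.where(a <= delta, 0.5 * r**2, delta * (a - 0.5 * delta))))

# Three small vol residuals plus one hypothetical stale wing quote (outlier).
r = np.array([0.001, -0.002, 0.001, 0.050])
clean = r[:3]

losses = {
    "L2": (l2_loss(r), l2_loss(clean)),
    "L1": (l1_loss(r), l1_loss(clean)),
    "Huber": (huber_loss(r, 0.005), huber_loss(clean, 0.005)),
    "Linf": (linf_loss(r), linf_loss(clean)),
}
for name, (total, without_outlier) in losses.items():
    share = 1.0 - without_outlier / total
    print(f"{name:5s} total = {total:.3e}, outlier share = {share:.1%}")
```

The outlier share printed for each loss indicates how strongly a single bad quote would steer the fit under that choice.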