Setup
The Calibration Problem
Model calibration is the process of finding parameter values such that a pricing model reproduces observed market prices. In quantitative finance, this typically means:
- Given: market-implied vols for instruments (strikes , maturities , put/call flags).
- Find: model parameters minimising the weighted sum of squared residuals.
For the Heston model, (mean-reversion speed, long-run variance, vol-of-vol, correlation, initial variance). For SABR, .
Why implied vols, not prices? Fitting prices across different maturities and strikes without normalisation gives excessive weight to long-dated options (higher absolute prices). Fitting implied vols gives roughly equal weight across the surface and aligns with how traders think about calibration quality.
Assumptions
- The pricing function is differentiable in (needed for the Jacobian).
- : at least as many market instruments as model parameters (identified system). For Heston, instruments is typical.
- The loss surface has a local minimum near the initial guess . For most models, multiple local minima exist — global optimisation or multi-start is needed in production.
The Nonlinear Least-Squares Objective
Define residuals :
where are weights (typically proportional to the liquidity or inverse bid-ask spread of instrument ). The objective is:
The factor is conventional and simplifies gradient expressions. The calibration seeks:
subject to box constraints (e.g., , ).
Theory: Levenberg-Marquardt
Gauss-Newton as the Foundation
The gradient and Hessian of are:
where is the Jacobian matrix with .
The Gauss-Newton (GN) approximation drops the second-order term (the sum of ), giving:
This is exact when the residuals are small (near the solution). The GN update solves:
GN converges quadratically when the residuals are small and the problem is well-conditioned, but fails when is singular or ill-conditioned (degenerate directions in parameter space).
The LM Damping Term
Levenberg-Marquardt adds a damping term to :
where is the damping parameter and is a diagonal matrix.
Two standard choices for :
- Levenberg (1944): (identity). Moves in the steepest-descent direction when is large.
- Marquardt (1963): . Scales each parameter direction by its gradient curvature, making the step invariant to parameter rescaling. This is the standard choice.
Interpolation between GN and gradient descent. When : the equation becomes , i.e., Gauss-Newton. When : the update becomes , a scaled steepest-descent step. LM adaptively chooses between these extremes.
Geometric Interpretation
The LM step minimises a quadratic approximation to within a trust region:
for some trust-region radius . The damping is the Lagrange multiplier for the constraint. Large corresponds to a small trust region (conservative step); small allows a large step.
Damping Parameter Update Rule
A standard update rule (Marquardt 1963, as refined by Moré 1978):
- Evaluate proposed step: .
- Compute gain ratio: where is the quadratic model: .
- Accept the step if (e.g., ): set and decrease (e.g., ).
- Reject the step if : increase (e.g., ) and retry.
This self-tuning makes LM robust: it contracts to gradient descent near bad regions and expands to GN near the solution.
Jacobian Computation
The Jacobian is the most expensive part of LM calibration.
Finite Differences (FD)
Forward differences (first-order accurate, order ):
Requires extra model evaluations per iteration (one per parameter direction). For Heston with parameters and instruments, this is Fourier integrals per LM step.
Central differences (second-order accurate, order ):
Twice the cost, but significantly more accurate for the same step size . Step size selection: for forward, for central.
Analytic Jacobian for Affine Models
For models with a closed-form characteristic function (e.g., Heston, Bates), the sensitivity of the model price to each parameter can be computed by differentiating under the Fourier integral:
For Heston, can be computed in closed form by differentiating the Riccati solution. This yields the exact Jacobian at the cost of 5 additional Fourier integrals (one per parameter), which is far cheaper than FD (especially when each Fourier integral prices all strikes simultaneously via FFT).
Convergence Criteria
LM terminates when any of the following conditions is met:
- Small gradient: (first-order optimality; gradient norm small).
- Small step: (parameter change negligible).
- Small residual: (objective small enough).
- Max iterations: (wall-clock budget exceeded).
In production, condition 1 is the cleanest termination test. The gradient norm is zero at any stationary point — including saddle points and local minima — so satisfying criterion 1 only guarantees a stationary point, not a global minimum. Always verify by restarting from alternative initial points.
Implementation
#include <Eigen/Dense>
#include <functional>
#include <vector>
#include <cmath>
#include <stdexcept>
// Calibration result
struct CalibResult {
Eigen::VectorXd params; // calibrated parameters
double final_rss; // residual sum of squares F(theta*)
int iters; // LM iterations used
bool converged;
};
// Levenberg-Marquardt calibrator.
//
// residuals_fn: given theta (p-vector), returns r (N-vector) of weighted residuals
// jacobian_fn: given theta, returns J (N x p matrix); pass nullptr for FD fallback
// theta0: initial parameter guess
// lb, ub: box constraints on parameters
// max_iter, tol: stopping criteria
CalibResult levenberg_marquardt(
std::function<Eigen::VectorXd(const Eigen::VectorXd&)> residuals_fn,
std::function<Eigen::MatrixXd(const Eigen::VectorXd&)> jacobian_fn,
const Eigen::VectorXd& theta0,
const Eigen::VectorXd& lb,
const Eigen::VectorXd& ub,
int max_iter = 200,
double tol_grad = 1.0e-8,
double tol_step = 1.0e-12,
double lambda0 = 1.0e-3
) {
const int p = static_cast<int>(theta0.size());
Eigen::VectorXd theta = theta0;
Eigen::VectorXd r = residuals_fn(theta);
double F = 0.5 * r.squaredNorm();
double lambda = lambda0;
bool converged = false;
for (int k = 0; k < max_iter; ++k) {
// Compute Jacobian
Eigen::MatrixXd J = jacobian_fn
? jacobian_fn(theta)
: [&]() {
// Central-difference fallback
Eigen::MatrixXd Jfd(r.size(), p);
const double h = 1.0e-5;
for (int j = 0; j < p; ++j) {
Eigen::VectorXd th_p = theta, th_m = theta;
th_p(j) += h; th_m(j) -= h;
Jfd.col(j) = (residuals_fn(th_p) - residuals_fn(th_m)) / (2.0 * h);
}
return Jfd;
}();
const Eigen::MatrixXd JtJ = J.transpose() * J;
const Eigen::VectorXd Jtr = J.transpose() * r;
// Convergence check: gradient norm
if (Jtr.lpNorm<Eigen::Infinity>() < tol_grad) {
converged = true;
break;
}
// Marquardt scaling: D = diag(J^T J), clamped to avoid zero
Eigen::VectorXd diag_D = JtJ.diagonal().cwiseMax(1.0e-16);
// Solve (J^T J + lambda * diag(D)) * delta = -J^T r
Eigen::MatrixXd A = JtJ;
A.diagonal() += lambda * diag_D;
const Eigen::VectorXd delta = A.ldlt().solve(-Jtr);
if (delta.norm() < tol_step * (1.0 + theta.norm())) {
converged = true;
break;
}
// Candidate step, projected onto box constraints
Eigen::VectorXd theta_new = (theta + delta).cwiseMax(lb).cwiseMin(ub);
Eigen::VectorXd r_new = residuals_fn(theta_new);
double F_new = 0.5 * r_new.squaredNorm();
// Gain ratio rho = (actual reduction) / (predicted reduction)
double predicted = -Jtr.dot(delta) - 0.5 * delta.dot(JtJ * delta);
double actual = F - F_new;
double rho = (predicted > 0.0) ? actual / predicted : -1.0;
if (rho > 0.25) {
// Accept step
theta = theta_new;
r = r_new;
F = F_new;
// Decrease lambda: expand trust region
lambda = std::max(lambda / 3.0, 1.0e-16);
} else {
// Reject step: contract trust region
lambda = std::min(lambda * 3.0, 1.0e+16);
}
}
return {theta, F, max_iter, converged};
}
Multi-Start Strategy
The LM algorithm is local. To avoid poor local minima, run multiple independent restarts from diverse initial points and select the solution with the lowest final objective:
// Simple multi-start wrapper
CalibResult multi_start_lm(
std::function<Eigen::VectorXd(const Eigen::VectorXd&)> residuals_fn,
std::function<Eigen::MatrixXd(const Eigen::VectorXd&)> jacobian_fn,
const std::vector<Eigen::VectorXd>& starts,
const Eigen::VectorXd& lb,
const Eigen::VectorXd& ub
) {
CalibResult best;
best.final_rss = std::numeric_limits<double>::infinity();
for (const auto& theta0 : starts) {
auto result = levenberg_marquardt(residuals_fn, jacobian_fn, theta0, lb, ub);
if (result.final_rss < best.final_rss)
best = result;
}
return best;
}
Validation
Synthetic Calibration Test
Plant known Heston parameters, generate a synthetic vol surface via the Lewis formula, then calibrate and verify recovery. A well-implemented calibrator should recover planted parameters to within in absolute parameter error for a noise-free surface of instruments.
import numpy as np
from scipy.optimize import least_squares
# Reference: planted Heston parameters
true_params = dict(kappa=2.0, theta=0.04, xi=0.4, rho=-0.6, v0=0.04)
# Generate synthetic surface (assume heston_implied_vol is implemented)
strikes = np.array([80, 90, 95, 100, 105, 110, 120] * 4, dtype=float)
maturities = np.repeat([0.25, 0.5, 1.0, 2.0], 7)
true_vols = np.array([
heston_implied_vol(K, T, **true_params)
for K, T in zip(strikes, maturities)
])
# Residuals function
def residuals(x):
kappa, theta, xi, rho, v0 = x
model_vols = np.array([
heston_implied_vol(K, T, kappa=kappa, theta=theta, xi=xi, rho=rho, v0=v0)
for K, T in zip(strikes, maturities)
])
return model_vols - true_vols # weights = 1 here
# Bounds: kappa, theta, xi > 0; rho in (-1, 1); v0 > 0
bounds = ([1e-4, 1e-4, 1e-4, -0.999, 1e-4],
[20.0, 1.0, 2.0, 0.999, 1.0])
# Initial guess — intentionally different from true values
x0 = [1.0, 0.06, 0.3, -0.4, 0.06]
result = least_squares(residuals, x0, bounds=bounds, method='trf', ftol=1e-12, xtol=1e-12)
print(f"Calibrated: kappa={result.x[0]:.4f}, theta={result.x[1]:.4f}, "
f"xi={result.x[2]:.4f}, rho={result.x[3]:.4f}, v0={result.x[4]:.4f}")
print(f"True: kappa=2.0000, theta=0.0400, xi=0.4000, rho=-0.6000, v0=0.0400")
print(f"Final RMS vol error: {np.sqrt(2 * result.cost / len(strikes)):.2e}")
Limitations
Local minima. The Heston loss surface contains ridges along which many parameter combinations produce nearly identical implied vol surfaces (e.g., the ridge). LM finds the nearest local minimum, which may be far from the global optimum. Multi-start or global optimisers (differential evolution, simulated annealing) are needed to escape these regions.
Parameter unidentifiability. The Heston model has five parameters but only two distinct effects dominate the implied vol surface: the ATM vol term structure (driven by , , ) and the smile curvature (driven by and ). The individual values of and are poorly identified — many pairs with similar produce indistinguishable surfaces. Regularisation or parameter reduction is required in practice.
Jacobian quality. FD Jacobians introduce truncation error that degrades convergence near the solution. The optimal step size for forward FD is on the order of , but choosing too small introduces cancellation error. Central differences are safer but 2× as expensive. Analytic Jacobians are preferred whenever available.
Calibration instability over time. Daily recalibration of all five Heston parameters produces large day-to-day jumps even when the vol surface moves smoothly. This instability is the result of a flat loss surface, not numerical noise. Tikhonov regularisation (penalising deviation from yesterday's calibration) is the standard fix — see the Regularisation module.
Constraint handling. The LM algorithm as described assumes unconstrained optimisation. Box constraints are handled approximately by projecting the step onto , which can cause the gain ratio to be computed incorrectly. A proper constrained LM (e.g., MINPACK's LMSTR) uses an active-set approach or interior-point scaling.
Interview Angle
L1. Explain the calibration setup for the Heston model. What is the objective function? Why use implied vols rather than prices as the calibration target?
Setup: Find minimising . Implied vols rather than prices: prices have very different magnitudes across strikes and maturities (deep OTM options are worth cents, ATM options worth dollars). Fitting prices gives enormous weight to expensive options. Implied vols are approximately normalised, giving the calibrator roughly equal influence per market instrument.
L2. Derive the LM update equation. Explain what happens to the step as and . What is the Marquardt scaling, and why is it preferable to the identity?
Derivation: LM minimises the second-order Taylor expansion of augmented by a penalty: Taking the gradient and setting to zero: . As : (Gauss-Newton, quadratic convergence near the solution). As : (scaled gradient descent, always descends). Marquardt scaling makes the step scale-invariant: multiplying parameter by a constant leaves the step unchanged, unlike the identity which biases towards parameters with small natural scale.
L3. Discuss the identifiability problem in Heston calibration. Formally characterise the ridge in the loss surface and propose a regularisation strategy. How would you construct the analytic Jacobian for the Heston model via the Lewis integral?
Identifiability: The ATM implied vol at maturity is approximately . For in a finite set, many combinations of produce the same sequence . The loss surface has a shallow valley along these combinations. Formally, the Hessian has small eigenvalues in these directions, producing a large condition number. Tikhonov regularisation adds to the objective, effectively constraining the solution to be near the previous day's calibration and bounding the condition number. The analytic Jacobian element is obtained by differentiating the Lewis integral: , where follows from differentiating the Riccati ODE solutions and with respect to .