Brownian Bridge™

1. The LM damped normal equations are $(J^\top J + \lambda D)\delta_\lambda = -J^\top r$. At $\lambda = 0$, this gives the Gauss-Newton step. As $\lambda \to \infty$, the step approaches:

   - A steepest-descent step: δ → −(1/λ) Jᵀr of length O(1/λ)
   - A Newton step using the full Hessian
   - Zero (the algorithm stops moving)
   - The conjugate gradient direction
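The two limits can be checked numerically. The following is a minimal sketch, assuming $D = I$, the sign convention $(J^\top J + \lambda D)\delta_\lambda = -J^\top r$ (so that $\delta_\lambda$ is added to the parameters), and a randomly generated Jacobian and residual vector that are purely illustrative. The helper name `lm_step` is hypothetical.

```python
import numpy as np

def lm_step(J, r, lam, D=None):
    """Solve (J^T J + lam * D) delta = -J^T r for delta (D defaults to I)."""
    n = J.shape[1]
    D = np.eye(n) if D is None else D
    return np.linalg.solve(J.T @ J + lam * D, -J.T @ r)

rng = np.random.default_rng(0)
J = rng.normal(size=(8, 3))   # illustrative: 8 residuals, 3 parameters
r = rng.normal(size=8)

grad = J.T @ r                                        # gradient of 0.5 * ||r||^2
gauss_newton = np.linalg.lstsq(J, -r, rcond=None)[0]  # least-squares GN step

delta_small = lm_step(J, r, 0.0)   # lambda = 0: Gauss-Newton step
lam = 1e8
delta_large = lm_step(J, r, lam)   # large lambda: close to -(1/lambda) * grad
```

Multiplying `delta_large` by `lam` should recover `-grad` up to a small relative error, which is the scaled steepest-descent behaviour described in the first option.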

2. The gain ratio is $\rho = (\mathcal{L}(\theta) - \mathcal{L}(\theta+\delta)) / (m(0) - m(\delta))$, the actual reduction in the objective divided by the reduction predicted by the local model $m$. A gain ratio of $\rho = 0.95$ means:

   - The actual reduction in objective matches the predicted reduction well → decrease λ
   - The actual reduction is much less than predicted → increase λ
   - The step was rejected because ρ > 0.75 is too high
   - The algorithm has converged to a local minimum
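As an illustration of how the gain ratio can drive the damping parameter, here is a minimal sketch. It assumes the objective $\mathcal{L} = \tfrac12\|r\|^2$ and the linearized model $m(\delta) = \tfrac12\|r + J\delta\|^2$. The thresholds 0.25 and 0.75 and the shrink and grow factors are common illustrative choices, not values prescribed by the quiz, and both function names are hypothetical.

```python
import numpy as np

def gain_ratio(r_old, r_new, J, delta):
    """Actual reduction divided by the reduction predicted by the linear model."""
    actual = 0.5 * (r_old @ r_old - r_new @ r_new)
    r_lin = r_old + J @ delta                      # linearized residuals
    predicted = 0.5 * (r_old @ r_old - r_lin @ r_lin)
    return actual / predicted

def update_damping(rho, lam, lo=0.25, hi=0.75, shrink=1 / 3, grow=2.0):
    """Return (accept_step, new_lambda); all constants are assumptions."""
    accept = rho > 0.0          # accept any step that reduces the objective
    if rho > hi:                # model predicted well: trust it more
        lam *= shrink
    elif rho < lo:              # model predicted poorly: damp harder
        lam *= grow
    return accept, lam
```

With these assumed constants, `update_damping(0.95, 1.0)` accepts the step and shrinks λ, while a small or negative ρ grows it.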

3. The LM step $\delta_\lambda$ is always a descent direction for $\lambda > 0$. The key ingredient of the proof is:

   - The matrix (JᵀJ + λD) is positive definite for λ > 0, so ∇ℒ · δ_λ < 0
   - The residuals r are always non-zero
   - The Jacobian J has full rank
   - The objective function ℒ is convex
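A sketch of the argument, assuming the objective is $\mathcal{L}(\theta) = \tfrac12\|r(\theta)\|^2$ (so that $\nabla\mathcal{L} = J^\top r$), that $D$ is diagonal with strictly positive entries, and the sign convention $(J^\top J + \lambda D)\delta_\lambda = -J^\top r$:

$$
\nabla\mathcal{L}^\top \delta_\lambda
= -(J^\top r)^\top (J^\top J + \lambda D)^{-1} (J^\top r) < 0
\quad \text{whenever } J^\top r \ne 0 .
$$

The inequality holds because $J^\top J$ is positive semidefinite and $\lambda D$ is positive definite for $\lambda > 0$, so their sum and its inverse are positive definite. No rank condition on $J$ is needed. With $D = \mathrm{diag}(J^\top J)$, the positivity assumption on $D$ requires that no column of $J$ is identically zero.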

4. LM converges quadratically when the residuals vanish at the solution ($r(\theta^*) = 0$) but only linearly when $r(\theta^*) \ne 0$. What does slow convergence near the solution indicate about the model?

   - Model misspecification: the model cannot fit the market data exactly
   - Wrong initial parameters: try a different starting point
   - Jacobian is ill-conditioned: increase λ
   - Convergence tolerance is set too tight

5. A Heston calibration starts from a cold start and converges in 45 iterations. The next day, starting from the previous day's calibrated parameters, it converges in 7 iterations. The reason is:

   - The warm start is inside the basin of attraction, where the linear model is already accurate
   - The optimizer uses a different algorithm on the second run
   - The market has not moved, so no real calibration is needed
   - The Jacobian is computed more accurately on the second run

6. The LM algorithm with $D = I$ is applied to a problem where one parameter $\kappa \approx 2.0$ and another $\xi \approx 0.3$. What problem arises, and how does Marquardt's scaling $D = \mathrm{diag}(J^\top J)$ fix it?

   - D = I treats both parameters equally; diag(JᵀJ) scales each direction by its natural curvature, making the step scale-invariant
   - D = I causes the algorithm to converge to the wrong local minimum
   - D = I makes the normal equations singular for parameters of different magnitude
   - D = diag(JᵀJ) is only valid when all parameters have the same physical units
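The scale-invariance claim can be illustrated numerically. The sketch below assumes the sign convention $(J^\top J + \lambda D)\delta_\lambda = -J^\top r$ and uses a random Jacobian and an arbitrary rescaling matrix `S`, both purely illustrative. Rescaling the parameters as $\tilde\theta = S\theta$ turns the Jacobian into $J S^{-1}$; a scale-invariant method should then return the step $S\delta$.

```python
import numpy as np

def lm_step(J, r, lam, marquardt):
    """LM step with D = diag(J^T J) if marquardt is True, else D = I."""
    JtJ = J.T @ J
    D = np.diag(np.diag(JtJ)) if marquardt else np.eye(J.shape[1])
    return np.linalg.solve(JtJ + lam * D, -J.T @ r)

rng = np.random.default_rng(1)
J = rng.normal(size=(10, 2))       # illustrative Jacobian: 10 residuals, 2 parameters
r = rng.normal(size=10)
S = np.diag([1.0, 100.0])          # rescale the second parameter by 100
J_scaled = J @ np.linalg.inv(S)    # Jacobian with respect to S @ theta
lam = 1.0

invariant = {}
for marquardt in (False, True):
    delta = lm_step(J, r, lam, marquardt)
    delta_scaled = lm_step(J_scaled, r, lam, marquardt)
    # Scale invariance means the rescaled problem yields the rescaled step.
    invariant[marquardt] = bool(np.allclose(S @ delta, delta_scaled))
```

In this sketch only the Marquardt choice passes the invariance check; with $D = I$ the step depends on the units chosen for each parameter.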

7. During a calibration run, $\lambda$ grows from $10^{-4}$ to $10^{10}$ without the gradient norm decreasing below tolerance. The most likely root cause is:

   - The Jacobian is computed incorrectly (numerical derivatives with wrong step size)
   - The convergence tolerance is set too tight
   - The model has converged to the global minimum
   - N < p (fewer instruments than parameters)
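One way to test a suspect Jacobian is to compare it against central finite differences at the current parameters. This is a minimal sketch: the function name `fd_jacobian`, the relative step `1e-6`, and the toy residual function are all assumptions chosen for illustration, not part of any calibration library.

```python
import numpy as np

def fd_jacobian(residual_fn, theta, rel_step=1e-6):
    """Central-difference Jacobian of residual_fn at theta."""
    theta = np.asarray(theta, dtype=float)
    r0 = residual_fn(theta)
    J = np.empty((r0.size, theta.size))
    for j in range(theta.size):
        h = rel_step * max(1.0, abs(theta[j]))   # step scaled to parameter size
        e = np.zeros_like(theta)
        e[j] = h
        J[:, j] = (residual_fn(theta + e) - residual_fn(theta - e)) / (2.0 * h)
    return J

# Toy residuals with a known analytic Jacobian (illustrative only).
def toy_residuals(theta):
    return np.array([theta[0] ** 2, theta[0] * theta[1], np.sin(theta[1])])

def toy_jacobian(theta):
    return np.array([[2.0 * theta[0], 0.0],
                     [theta[1], theta[0]],
                     [0.0, np.cos(theta[1])]])

theta = np.array([2.0, 0.3])
max_err = np.max(np.abs(fd_jacobian(toy_residuals, theta) - toy_jacobian(theta)))
```

A large `max_err` for the Jacobian actually used in calibration would point to the derivative code or its step size rather than to the optimizer.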

8. A calibration algorithm reports "converged" after 3 iterations, but the final residuals are large ($\text{RMSE} = 5\%$ in vol space). What likely happened?

   - False convergence on the step-size criterion: λ was large and the step δ was tiny, but the gradient is still large, so this is not a true minimum
   - The model is a perfect fit and residuals are expected to be large
   - The algorithm used too many instruments and overfit the surface
   - The initial parameters were already at the global minimum
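A stopping test can guard against this failure mode by reporting a small step separately from a small gradient. The sketch below assumes the objective $\tfrac12\|r\|^2$ with gradient $J^\top r$. The function name, the tolerances, and the status strings are hypothetical.

```python
import numpy as np

def convergence_status(J, r, delta, theta, grad_tol=1e-8, step_tol=1e-10):
    """Distinguish true convergence (small gradient) from a stalled iteration."""
    grad_norm = np.linalg.norm(J.T @ r, ord=np.inf)
    small_step = np.linalg.norm(delta) <= step_tol * (np.linalg.norm(theta) + step_tol)
    if grad_norm <= grad_tol:
        return "converged: gradient"
    if small_step:
        # Tiny step with a large gradient: typically lambda has blown up.
        return "stalled: small step, gradient still large"
    return "continue"
```

Treating the second status as a warning rather than as success keeps a run with large λ and large residuals from being reported as converged.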

Quiz: The Levenberg-Marquardt Algorithm — Theory and Implementation
