Setup
The Levenberg-Marquardt algorithm requires the Jacobian at every iteration. For a Heston calibration with instruments and parameters, has 200 entries. For an LMM calibration with swaptions and parameters, has 5000 entries — each requiring a pricing function evaluation or a derivative computation.
How is computed determines the speed and accuracy of the entire calibration pipeline. There are four approaches: (i) analytic differentiation, (ii) finite differences, (iii) complex step differentiation, and (iv) automatic differentiation (AAD). Each involves a different trade-off between implementation cost, accuracy, and computational load.
Financial Insight. On a production vol desk, the Jacobian must be computed in milliseconds. A Heston calibration running every minute with 40 instruments and 5 parameters requires additional function evaluations per iteration (central FD) on top of the base evaluation — a overhead. AAD reduces this to approximately (one additional backward pass), making real-time calibration feasible. Understanding this trade-off is the difference between a research-grade implementation and a production system.
Assumptions:
- The pricing function (or ) is smooth in on the domain of interest.
- Floating-point arithmetic: all computations are in IEEE 754 double precision, machine epsilon .
- For AAD: the pricing function is implemented as a sequence of elementary operations (addition, multiplication, , , etc.) — the standard assumption for any AD tool.
Theory
Finite Differences: Forward and Central
Definition 3.1 (Finite Difference Approximations). For scalar :
Forward difference (first order):
Backward difference (first order):
Central difference (second order):
Error analysis: By Taylor's theorem:
For the forward difference:
Truncation error: . Roundoff error from floating-point evaluation of : .
For the central difference:
Truncation error: . Roundoff error: (same structure, since cancels but floating-point errors in persist).
Theorem 3.1 (Optimal Step Size for Finite Differences). The total error (truncation + roundoff) is minimised at:
-
Forward FD:
-
Central FD:
At the optimal , the forward FD achieves accuracy and the central FD achieves .
Central FD is times more accurate than forward FD at their respective optimal step sizes, at the cost of twice as many function evaluations. In calibration, the reduced error is almost always worth the extra evaluation.
Cost for the Jacobian:
| Method | Evaluations per row of | Total for |
|---|---|---|
| Forward FD | (includes base) | (one base, perturbed) |
| Central FD | (no base reuse) | |
| Complex step | (same cost as forward, but no base cancel) |
For the Jacobian, all rows share the same perturbations: compute for , compute all residuals at each perturbed point. Total: (forward) or (central) calls to the full pricing engine (all instruments).
Remark. For Heston (, central FD): 10 full pricing engine calls per LM iteration. For LMM (, central FD): 100 full pricing engine calls per iteration. With 50 LM iterations and 1 ms per pricing call, LMM calibration takes — acceptable for end-of-day but not for real-time.
The Complex Step Method
Theorem 3.2 (Complex Step Derivative). If extends to an analytic function on a strip of the complex plane containing the real axis, then for any : where the error is purely from the Taylor remainder, with no cancellation error from floating-point subtraction.
Derivation: By the Taylor expansion of in :
Taking the imaginary part:
Dividing by : .
The key advantage: there is no subtraction of nearly-equal quantities. Catastrophic cancellation — the main source of roundoff error in FD — does not occur. One can use and achieve machine-precision accuracy.
Warning — Implementation Requirement. The complex step method requires evaluating
with complex-number arguments. The entire pricing chain — characteristic functions, numerical
integrations, norm distributions — must support complex input. This is straightforward for
Black-Scholes (the NormCdf can be extended to complex arguments via the error function),
but requires careful implementation for models involving branching logic (if x > 0).
A naive implementation that uses fabs(x) (real absolute value) will give wrong derivatives
when applied to a complex number.
Automatic Differentiation (AAD)
Finite differences and complex step compute one column of per perturbation (forward mode in the sense of perturbation direction). When (few parameters, many instruments), this is efficient. When or when computing a scalar objective gradient, the reverse mode of AAD computes all partial derivatives in a single backward pass.
Definition 3.2 (Computation Graph and Tape). Any differentiable computation can be written as a sequence of elementary operations (a computation graph or tape). Each depends on previous nodes; the output is .
Forward mode (tangent): propagate alongside values. Cost per derivative: one forward pass. Total for all derivatives: passes.
Reverse mode (adjoint): propagate adjoints backward from output to inputs. Cost: one forward pass (to record the tape) plus one backward pass. Computes the full gradient in passes — independent of .
Theorem 3.3 (Cost of Reverse-Mode AAD). For a scalar function , reverse-mode AAD computes the full gradient in at most times the cost of a single function evaluation, independent of .
For the calibration Jacobian , reverse mode computes one row per backward pass — backward passes in total. Forward mode computes one column per pass — forward passes in total. The efficient choice depends on the ratio :
| Mode | Passes | Prefer when |
|---|---|---|
| Forward (tangent) | (few parameters, many instruments) | |
| Reverse (adjoint) | (many parameters, few instruments) | |
| Central FD | No AAD available; small |
For Heston (, ): forward mode (5 passes) vs. reverse mode (40 passes) — forward wins. For LMM (, ): reverse mode (100 passes) vs. forward mode (50 passes) — forward mode still wins on pass count, but each pass is cheaper with AAD than with FD.
Financial Insight. The true benefit of reverse-mode AAD is not in the Jacobian count — it is in computing the gradient of the scalar objective in one backward pass. This is what makes AAD critical for gradient-based calibration: is scalar, and is computed in one backward pass regardless of . For LMM with : one backward pass vs. 100 forward passes with FD. This is the speedup that motivates AAD implementation on production desks.
Application: Analytic Vega in Black-Scholes
For a European call under Black-Scholes:
where .
The analytic derivative with respect to (vega) is:
where is the standard normal PDF. This is the benchmark for validating FD and AAD implementations: the complex step or reverse-mode AAD result must match this to within .
Example 3.1. For , , , , :
-
Analytic vega:
-
Forward FD with : gives approximately with error .
-
Central FD with : matches to 10 significant figures.
-
Complex step with : matches to 14 significant figures.
The notebook verifies these numerically and plots the FD error as a function of .
Validation
The companion notebook verifies the four derivative methods on the Black-Scholes vega and on a 2-parameter nonlinear function, confirming:
- Forward FD error decreases as for and increases as for .
- Central FD error decreases as until the roundoff floor.
- Complex step achieves machine-precision accuracy for all tested values.
- A simple reverse-mode AAD implementation produces the correct gradient on a small model.
- The optimal step size predictions of Theorem 3.1 are verified empirically.
Before opening the notebook: For at : (a) Compute and analytically. (b) Estimate the forward FD error at . (c) What step size would you choose for central FD, and why?
Limitations
Warning — FD Step Size Selection. Using a single default for all parameters is wrong when parameters have different scales. For , gives a relative perturbation of — fine. For , gives a relative perturbation of — also fine. But for a correlation bounded in , a step of is fine, whereas a step of would give a wrong estimate. Scale parameters to before applying FD.
Warning — AAD Through Branching Code. Reverse-mode AAD requires the computation graph
to be differentiable. If the pricing function contains branches (if K > S) or discontinuous
operations (e.g., digital payoffs), the derivative may not exist at the branch point.
Smoothing or indicator relaxation is required. The complex step method has the same issue:
a real-valued branch if real(x) > 0 evaluated at a complex must be handled carefully.
Other limitations:
- Parallelism: Column-wise FD (one column of per perturbation) parallelises trivially over threads. Reverse-mode AAD is sequential per backward pass; parallelism requires batching.
- Memory cost of AAD: Reverse mode requires storing the tape (the entire computation graph) between the forward and backward passes. For complex pricing functions, this tape can be large. Checkpointing techniques trade recomputation for memory.
- Implicit functions: If the pricing function involves solving an implicit equation (e.g., Newton's method for implied vol inversion), the Jacobian requires the implicit function theorem: where defines the implied vol. This must be handled explicitly in AAD — naively differentiating through the Newton iterations works but is less efficient.
Interview Angle
L1 (Junior) — Typical questions:
-
What is a finite difference and when would you use it to compute a derivative? Expected: (forward) or (central). Use when analytic derivatives are unavailable. Know that must be chosen carefully.
-
What is the difference between forward and central finite differences? Expected: forward is accurate, central is . Central is more accurate at the cost of one extra function evaluation. Know the respective optimal step sizes.
-
What is vega? How would you compute it numerically? Expected: sensitivity of option price to implied vol. Forward or central FD on . Better: analytic formula for Black-Scholes.
L2 (Senior) — Typical questions:
-
Derive the optimal step size for central finite differences. What limits the accuracy? Expected: minimise truncation plus roundoff → . At , achieve accuracy. The limit is machine epsilon, not the Taylor truncation.
-
Explain the complex step method. Why is it immune to cancellation error? Expected: evaluate at complex argument ; take imaginary part divided by . The imaginary part is proportional to by the Taylor expansion; no real-minus-real cancellation occurs. Step can be arbitrarily small.
-
When is reverse-mode AAD preferred over forward-mode or finite differences? Expected: when the function is scalar (computing gradient of ) or when (fewer outputs than parameters). Derive the cost comparison.
L3 (Researcher) — Typical questions:
-
Explain the forward and reverse modes of automatic differentiation. What is the cost complexity of each? Expected: full derivation. Forward mode propagates alongside ; cost passes for full Jacobian. Reverse mode propagates adjoints backward; cost passes. For scalar objective: reverse mode is passes regardless of .
-
How do you differentiate through an implicit function (e.g., the Newton iteration for implied vol inversion)? Expected: implicit function theorem. If , then . This avoids differentiating through the Newton loop. Naive AAD through the loop works but is less numerically stable and more expensive.
-
Compare complex step and dual-number forward-mode AAD. Are they mathematically equivalent? Expected: both extend the real numbers to a two-component system. Dual numbers: with ; complex: with . Dual-number forward mode gives — exact to machine precision. Complex step uses so higher-order terms appear in the real part (but not the imaginary part if we take only the imaginary term). For derivatives: equivalent in practice. Dual numbers are exact; complex step has error from the imaginary part being