Setup
Motivation
Black-Scholes assumes volatility is constant. The empirical reality is that volatility is stochastic: it clusters, mean-reverts, and correlates with the spot. The simplest tractable model that captures these effects is the Heston (1993) model, in which the instantaneous variance follows a mean-reverting square-root (CIR) process correlated with the spot.
The Heston model is analytically semi-tractable: European option prices are available via Fourier inversion of an explicit characteristic function. This makes calibration computationally feasible.
Assumptions and Parameters
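Let $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \ge 0}, \mathbb{Q})$ be the risk-neutral filtered probability space. The Heston dynamics are:

$$
\begin{aligned}
dS_t &= r S_t\,dt + \sqrt{v_t}\,S_t\,dW_t^S, \\
dv_t &= \kappa(\theta - v_t)\,dt + \xi\sqrt{v_t}\,dW_t^v, \\
d\langle W^S, W^v\rangle_t &= \rho\,dt.
\end{aligned}
$$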
Parameters (all must be stated before use):
- $r$: risk-free rate (continuously compounded, constant).
- $\kappa > 0$: mean-reversion speed of variance.
- $\theta > 0$: long-run variance (mean level to which $v_t$ reverts).
- $\xi > 0$: volatility of variance (vol-of-vol).
- $\rho \in [-1, 1]$: correlation between spot and variance Brownian motions.
- $v_0 > 0$: initial instantaneous variance.
The instantaneous volatility is $\sqrt{v_t}$, and the ATM implied vol at time 0 is approximately $\sqrt{v_0}$ for short maturities.
The Feller Condition
The variance process is a Cox-Ingersoll-Ross (CIR) process. Its boundary behaviour at zero depends on the ratio $2\kappa\theta/\xi^2$. The Feller condition is

$$2\kappa\theta \ge \xi^2.$$

If Feller holds, the boundary is inaccessible: the variance process remains strictly positive almost surely. This ensures $\sqrt{v_t}$ is well-defined at all times.
If Feller fails ($2\kappa\theta < \xi^2$), the variance process reaches zero with positive probability. For $0 < 2\kappa\theta < \xi^2$, zero is a regular boundary at which the process is instantaneously reflected (it can touch zero and immediately leaves); in the degenerate case $\kappa\theta = 0$, zero is absorbing.
In practice, calibrated Heston parameters frequently violate the Feller condition — market-implied vol-of-vol is often large relative to mean reversion. This is not a catastrophic failure: the SDE remains well-posed (with reflecting boundary), but numerical simulation requires care. The full-truncation Euler scheme (described below) handles this robustly.
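As a quick illustration, the Feller ratio can be checked directly from a parameter set. The sketch below is minimal; the function names and the two parameter sets are hypothetical and chosen only for illustration (they are not calibrated values), and `xi` denotes the vol-of-vol.

```python
def feller_ratio(kappa: float, theta: float, xi: float) -> float:
    """Return 2*kappa*theta / xi**2; values >= 1 satisfy the Feller condition."""
    return 2.0 * kappa * theta / (xi * xi)


def satisfies_feller(kappa: float, theta: float, xi: float) -> bool:
    """True when zero is inaccessible for the variance process."""
    return feller_ratio(kappa, theta, xi) >= 1.0


# Illustrative (made-up) parameter sets.
calm = dict(kappa=2.0, theta=0.04, xi=0.3)      # ratio = 0.16 / 0.09
stressed = dict(kappa=1.0, theta=0.04, xi=0.6)  # ratio = 0.08 / 0.36

print(satisfies_feller(**calm), satisfies_feller(**stressed))
```

A ratio below 1 does not invalidate the model; it only signals that the simulation scheme must cope with variance paths that touch zero.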
Characteristic Function and the Lewis Formula
Characteristic Function
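The key to semi-analytic pricing is that the log-price $x_T = \ln S_T$ has a known conditional characteristic function under $\mathbb{Q}$:

$$\phi(u; T) = \mathbb{E}^{\mathbb{Q}}\!\left[e^{iu x_T} \mid \mathcal{F}_0\right] = \exp\!\big(iu(x_0 + rT) + C(u, T) + D(u, T)\,v_0\big), \qquad x_0 = \ln S_0,$$

where the functions $C$ and $D$ are given by the Albrecher et al. (2007) stable formulation:

$$
\begin{aligned}
\beta &= \kappa - i\rho\xi u, \qquad d = \sqrt{\beta^2 + \xi^2(iu + u^2)}, \qquad g = \frac{\beta - d}{\beta + d}, \\
D(u, T) &= \frac{\beta - d}{\xi^2}\,\frac{1 - e^{-dT}}{1 - g\,e^{-dT}}, \\
C(u, T) &= \frac{\kappa\theta}{\xi^2}\left[(\beta - d)\,T - 2\ln\frac{1 - g\,e^{-dT}}{1 - g}\right].
\end{aligned}
$$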
Why the Albrecher Formulation?
The original Heston (1993) characteristic function uses a different branch of the square root for $d$. For long maturities or large $|u|$, the argument of the complex logarithm in the original formulation can cross the branch cut, so the characteristic function is evaluated on the wrong Riemann sheet and options are mispriced. The Albrecher formulation avoids this by choosing a consistent branch throughout. Always use the Albrecher version in production.
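A minimal sketch of the stable form in Python is given below, using only the standard `cmath` module. It assumes the commonly quoted Albrecher et al. (2007) expressions with $\beta = \kappa - i\rho\xi u$ and the principal square root (so $\operatorname{Re} d \ge 0$); the function name `heston_cf` and its argument order are hypothetical, and `xi` denotes the vol-of-vol.

```python
import cmath


def heston_cf(u, T, kappa, theta, xi, rho, v0, x0=0.0, r=0.0):
    """E[exp(i*u*ln S_T)] in the stable (Albrecher-style) form; u may be complex.

    x0 is ln(S_0). cmath.sqrt returns the principal root, so Re(d) >= 0 and
    exp(-d*T) stays bounded for long maturities.
    """
    iu = 1j * u
    beta = kappa - rho * xi * iu
    d = cmath.sqrt(beta * beta + xi * xi * (iu + u * u))
    g = (beta - d) / (beta + d)
    e = cmath.exp(-d * T)
    D = (beta - d) / (xi * xi) * (1.0 - e) / (1.0 - g * e)
    C = kappa * theta / (xi * xi) * (
        (beta - d) * T - 2.0 * cmath.log((1.0 - g * e) / (1.0 - g))
    )
    return cmath.exp(iu * (x0 + r * T) + C + D * v0)
```

Two sanity checks are cheap to run on any implementation: $\phi(0) = 1$, and $\phi(-i) = \mathbb{E}[S_T] = S_0 e^{rT}$ (the martingale condition).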
Lewis Formula for European Options
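Lewis (2001) derived a Fourier pricing formula that avoids the dampening-parameter sensitivity of Carr-Madan. For a European call with log-strike $k = \ln(K/F)$ where $F = S_0 e^{rT}$ is the forward:

$$C(S_0, K, T) = S_0 - \frac{\sqrt{S_0 K}\,e^{-rT/2}}{\pi}\int_0^\infty \operatorname{Re}\!\left[e^{-iuk}\,\phi_X\!\left(u - \tfrac{i}{2}\right)\right]\frac{du}{u^2 + \tfrac14},$$

where $\phi_X(u) = \mathbb{E}^{\mathbb{Q}}[e^{iuX_T}]$ is the characteristic function of the forward log-return $X_T = \ln(S_T/F)$, that is, the characteristic function above with the drift factor $e^{iu(\ln S_0 + rT)}$ removed.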
The integrand is real-valued (after taking the real part) and rapidly decaying. Numerical integration via adaptive Gauss-Legendre or Gauss-Kronrod is standard. This integral does not require a dampening parameter and is numerically stable for .
Derivation sketch. Express the call price as $C = e^{-rT}\,\mathbb{E}^{\mathbb{Q}}[(S_T - K)^+]$. Write the payoff in terms of the log-price, apply Parseval's identity in Fourier space, and use the characteristic function of the log-price. The denominator $u^2 + \tfrac14$ arises from the Fourier transform of the call payoff along the contour $\operatorname{Im} z = \tfrac12$.
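The pricing integral can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions rather than a production pricer: it uses the Lewis formula in the form $C = S_0 - \frac{\sqrt{S_0 K}e^{-rT/2}}{\pi}\int_0^\infty \operatorname{Re}[e^{-iuk}\phi_X(u - i/2)]\,\frac{du}{u^2 + 1/4}$ with $k = \ln(K/S_0) - rT$, a fixed-grid composite Simpson rule instead of an adaptive quadrature, and a hard truncation at `u_max`. The function names, `u_max`, and `n` are hypothetical choices.

```python
import cmath
import math


def heston_logreturn_cf(z, T, kappa, theta, xi, rho, v0):
    """Characteristic function of X_T = ln(S_T / F), F = S0*exp(r*T); z may be complex."""
    iz = 1j * z
    beta = kappa - rho * xi * iz
    d = cmath.sqrt(beta * beta + xi * xi * (iz + z * z))
    g = (beta - d) / (beta + d)
    e = cmath.exp(-d * T)
    D = (beta - d) / (xi * xi) * (1.0 - e) / (1.0 - g * e)
    C = kappa * theta / (xi * xi) * (
        (beta - d) * T - 2.0 * cmath.log((1.0 - g * e) / (1.0 - g))
    )
    return cmath.exp(C + D * v0)


def lewis_call(S0, K, T, r, kappa, theta, xi, rho, v0, u_max=200.0, n=4000):
    """European call via the Lewis integral, composite Simpson on [0, u_max]."""
    k = math.log(K / S0) - r * T  # log-strike relative to the forward

    def integrand(u):
        phi = heston_logreturn_cf(u - 0.5j, T, kappa, theta, xi, rho, v0)
        return (cmath.exp(-1j * u * k) * phi).real / (u * u + 0.25)

    h = u_max / n  # n must be even for Simpson's rule
    total = integrand(0.0) + integrand(u_max)
    for j in range(1, n):
        total += (4.0 if j % 2 else 2.0) * integrand(j * h)
    integral = total * h / 3.0
    return S0 - math.sqrt(S0 * K) * math.exp(-0.5 * r * T) / math.pi * integral
```

A useful check is the near-deterministic-variance limit: with a very small `xi` and `v0 == theta`, the result should approach the Black-Scholes price with volatility `sqrt(theta)`.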
Simulation: Full-Truncation Euler Scheme
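Monte Carlo simulation of the Heston SDE requires a discretisation of the CIR process for $v_t$. Naive Euler:

$$v_{t+\Delta t} = v_t + \kappa(\theta - v_t)\,\Delta t + \xi\sqrt{v_t}\,\Delta W^v$$

can produce negative values of $v$, making $\sqrt{v}$ undefined.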
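The full-truncation Euler scheme (Lord et al., 2010):

$$v_{t+\Delta t} = v_t + \kappa(\theta - v_t^+)\,\Delta t + \xi\sqrt{v_t^+}\,\Delta W^v,$$

where $v_t^+ = \max(v_t, 0)$. The truncated value $v_t^+$ is used in the drift and diffusion coefficient, but $v_t$ itself can go negative and is carried forward (the truncation only applies at the point of use). This scheme is consistent and weakly convergent of order 1.
For the log-price:

$$\ln S_{t+\Delta t} = \ln S_t + \left(r - \tfrac12 v_t^+\right)\Delta t + \sqrt{v_t^+}\left(\rho\,\Delta W^v + \sqrt{1 - \rho^2}\,\Delta W^\perp\right),$$

where $\Delta W^v$ and $\Delta W^\perp$ are independent standard normals scaled by $\sqrt{\Delta t}$.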
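The scheme translates almost line for line into code. The sketch below is a plain-Python Monte Carlo pricer for a European call, written for clarity rather than speed; the function name, default step and path counts, and the seed handling are hypothetical choices, and `xi` denotes the vol-of-vol.

```python
import math
import random


def heston_call_ft_euler(S0, K, T, r, kappa, theta, xi, rho, v0,
                         n_steps=50, n_paths=4000, seed=0):
    """European call by Monte Carlo with full-truncation Euler for the variance."""
    rng = random.Random(seed)
    dt = T / n_steps
    sqrt_dt = math.sqrt(dt)
    rho_perp = math.sqrt(1.0 - rho * rho)
    payoff_sum = 0.0
    for _ in range(n_paths):
        log_s, v = math.log(S0), v0
        for _ in range(n_steps):
            dw_v = sqrt_dt * rng.gauss(0.0, 1.0)
            dw_perp = sqrt_dt * rng.gauss(0.0, 1.0)
            v_plus = max(v, 0.0)          # truncate only at the point of use
            sqrt_v = math.sqrt(v_plus)
            log_s += (r - 0.5 * v_plus) * dt + sqrt_v * (rho * dw_v + rho_perp * dw_perp)
            v += kappa * (theta - v_plus) * dt + xi * sqrt_v * dw_v  # v may go negative
        payoff_sum += max(math.exp(log_s) - K, 0.0)
    return math.exp(-r * T) * payoff_sum / n_paths
```

Setting `xi=0` with `v0 == theta` freezes the variance, so the estimate should agree with the Black-Scholes price up to Monte Carlo error; this is a convenient regression test before moving to Feller-violating parameters.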
Antithetic Variates
Variance reduction via antithetic paths: for each path of random normals $(Z_1, \dots, Z_n)$, generate a paired path $(-Z_1, \dots, -Z_n)$. The payoff estimate is the average of the two. Under Heston, antithetic variates reduce variance significantly for nearly-ATM options.
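The pairing logic is independent of the model, so it can be sketched with a generic helper. In the sketch below, `payoff` is any function mapping a list of standard normals (one simulated path's draws) to a discounted payoff; the helper name and signature are hypothetical.

```python
import random


def antithetic_mean(payoff, n_normals, n_pairs, seed=0):
    """Average payoff(z) and payoff(-z) over n_pairs draws of n_normals standard normals."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_pairs):
        z = [rng.gauss(0.0, 1.0) for _ in range(n_normals)]
        z_anti = [-x for x in z]          # mirrored path, same distribution
        total += 0.5 * (payoff(z) + payoff(z_anti))
    return total / n_pairs


# For a payoff that is odd in the normals, the paired average cancels exactly.
print(antithetic_mean(lambda z: sum(z), n_normals=10, n_pairs=100))
```

The cancellation in the odd-payoff case shows where the benefit comes from: the closer the payoff is to a monotone (near-linear) function of the driving normals, the more negatively correlated the pair and the larger the variance reduction. For payoffs that are even in the normals, the pairing gives no benefit.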
Quadratic-Exponential Scheme (Advanced)
Andersen's (2008) Quadratic-Exponential (QE) scheme is the benchmark for Heston simulation. It matches the first two conditional moments of the CIR transition distribution exactly (rather than approximating them via Euler). The scheme switches between:
- A quadratic approximation of the CIR conditional distribution for high $v_t$.
- An exponential approximation for low $v_t$ (near the boundary).
The QE scheme eliminates the "Feller violation bias" that affects Euler schemes when the Feller condition fails, at the cost of slightly more complex implementation.
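A single variance step of the QE scheme might look like the sketch below. It assumes the standard moment-matching recipe from Andersen (2008): compute the exact conditional mean $m$ and variance $s^2$ of the CIR transition, form $\psi = s^2/m^2$, and switch on a threshold $\psi_c$ (Andersen suggests 1.5). The function name and argument order are hypothetical, `xi` denotes the vol-of-vol, and the sketch requires `xi > 0` and `v >= 0`.

```python
import math
import random


def qe_variance_step(v, dt, kappa, theta, xi, rng, psi_c=1.5):
    """Draw v_{t+dt} given v_t = v using a quadratic-exponential step."""
    ex = math.exp(-kappa * dt)
    m = theta + (v - theta) * ex                                  # conditional mean
    s2 = (v * xi * xi * ex * (1.0 - ex) / kappa
          + theta * xi * xi * (1.0 - ex) ** 2 / (2.0 * kappa))    # conditional variance
    psi = s2 / (m * m)
    if psi <= psi_c:
        # Quadratic branch: v_next = a * (b + Z)^2 matches m and s2.
        inv = 2.0 / psi
        b2 = inv - 1.0 + math.sqrt(inv) * math.sqrt(inv - 1.0)
        a = m / (1.0 + b2)
        z = rng.gauss(0.0, 1.0)
        return a * (math.sqrt(b2) + z) ** 2
    # Exponential branch: point mass at zero plus an exponential tail.
    p = (psi - 1.0) / (psi + 1.0)
    beta = (1.0 - p) / m
    u = rng.random()
    return 0.0 if u <= p else math.log((1.0 - p) / (1.0 - u)) / beta
```

Both branches return non-negative values by construction, so no truncation is needed. The log-price update that accompanies this step in Andersen's scheme is not shown here.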
Calibration via Levenberg-Marquardt
Calibration Setup
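Given market-observed call prices $C_i^{\mathrm{mkt}}$ for $i = 1, \dots, N$ (or equivalently, implied vols $\sigma_i^{\mathrm{mkt}}$), find the Heston parameters $\Theta = (\kappa, \theta, \xi, \rho, v_0)$ minimising:

$$L(\Theta) = \sum_{i=1}^{N} w_i\left(C_i^{\mathrm{model}}(\Theta) - C_i^{\mathrm{mkt}}\right)^2$$

subject to: $\kappa, \theta, \xi > 0$, $v_0 > 0$, $-1 < \rho < 1$.
Weights $w_i$ typically down-weight deep OTM options (low liquidity, high bid-ask spread) and up-weight near-the-money options.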
Levenberg-Marquardt Algorithm
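Given residuals $r_i(\Theta) = \sqrt{w_i}\,\big(C_i^{\mathrm{model}}(\Theta) - C_i^{\mathrm{mkt}}\big)$, define the Jacobian $J_{ij} = \partial r_i / \partial \Theta_j$. The LM update is:

$$\Theta_{k+1} = \Theta_k - \left(J^\top J + \lambda D\right)^{-1} J^\top r,$$

where $\lambda \ge 0$ is the damping parameter and $D = \operatorname{diag}(J^\top J)$ (Marquardt scaling). When $\lambda \to 0$, this becomes Gauss-Newton; when $\lambda \to \infty$, it becomes a gradient descent step. $\lambda$ is increased on failure and decreased on success.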
Jacobian computation. The Jacobian is computed either by finite differences (forward differences are first-order accurate and cost one extra pricing pass per parameter) or via differentiation of the Lewis integral (analytic, preferred). For the characteristic function parametrisation, analytic sensitivities of $\phi$ with respect to each parameter exist in closed form.
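The update rule and the accept/reject logic for the damping parameter can be sketched without any numerical library. The example below is deliberately a toy: it fits a two-parameter exponential rather than Heston prices, uses a forward-difference Jacobian, and multiplies or divides $\lambda$ by 10, which is one common but arbitrary choice. All function names are hypothetical.

```python
import math


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda i: abs(M[i][c]))
        M[c], M[piv] = M[piv], M[c]
        for i in range(c + 1, n):
            f = M[i][c] / M[c][c]
            for j in range(c, n + 1):
                M[i][j] -= f * M[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def levenberg_marquardt(residuals, x0, lam=1e-2, n_iter=50, h=1e-6):
    """Minimise sum(residuals(x)**2) with Marquardt-scaled damping."""
    x = list(x0)
    r = residuals(x)
    cost = sum(ri * ri for ri in r)
    n, m = len(x), len(r)
    for _ in range(n_iter):
        # Forward-difference Jacobian J[i][j] = d r_i / d x_j.
        J = [[0.0] * n for _ in range(m)]
        for j in range(n):
            xh = x[:]
            xh[j] += h
            rh = residuals(xh)
            for i in range(m):
                J[i][j] = (rh[i] - r[i]) / h
        JtJ = [[sum(J[i][a] * J[i][b] for i in range(m)) for b in range(n)] for a in range(n)]
        Jtr = [sum(J[i][a] * r[i] for i in range(m)) for a in range(n)]
        # Damped normal equations: (J^T J + lam * diag(J^T J)) step = -J^T r.
        A = [[JtJ[a][b] + (lam * JtJ[a][a] if a == b else 0.0) for b in range(n)]
             for a in range(n)]
        step = solve_linear(A, [-g for g in Jtr])
        x_new = [xv + sv for xv, sv in zip(x, step)]
        r_new = residuals(x_new)
        cost_new = sum(ri * ri for ri in r_new)
        if cost_new < cost:      # success: accept and move towards Gauss-Newton
            x, r, cost, lam = x_new, r_new, cost_new, lam / 10.0
        else:                    # failure: reject and increase damping
            lam *= 10.0
    return x


# Toy usage: recover (a, b) = (2.0, 0.5) in y = a * exp(-b * t).
data = [(t, 2.0 * math.exp(-0.5 * t)) for t in range(5)]
fit = levenberg_marquardt(lambda p: [p[0] * math.exp(-p[1] * t) - y for t, y in data],
                          [1.8, 0.6])
```

For an actual Heston calibration, `residuals` would return the weighted price (or implied-vol) differences, and the parameter bounds would need to be enforced, for example by reparametrisation; neither is shown here.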
Calibration Pitfalls
Flat loss surface. The Heston loss surface has a shallow valley: many combinations of $(\kappa, \theta)$ with similar product $\kappa\theta$ (which determines the long-run variance contribution) fit the surface similarly. Regularisation, such as a Tikhonov penalty on the parameter vector, is often needed to avoid degenerate solutions.
Parameter instability. Daily recalibration of Heston parameters often produces large day-to-day changes in $\kappa$ and $\theta$ despite stable implied vol surfaces. This is a calibration instability issue: the surface only weakly identifies these parameters. Practitioners often fix $\kappa$ or $\theta$ and calibrate fewer free parameters.
Local minima. The LM algorithm is a local method. Initialisation matters significantly. A common practice is to run multiple restarts from diverse initial points and select the global minimum. Differential evolution or particle swarm algorithms are sometimes used for global initialisation.
Limitations
Feller violation in practice. Calibrated parameters frequently violate the Feller condition ($2\kappa\theta < \xi^2$). This means the variance process touches zero with positive probability. The Heston SDE remains well-posed (reflecting boundary), but certain moments of the distribution diverge. Specifically, for $\xi$ large, the moment generating function of $\ln S_T$ may fail to exist for large argument, leading to explosions in FFT-based pricing that requires dampening.
Tail behaviour. Heston generates heavier tails than Black-Scholes but still sub-exponential tail decay. Models with jumps (Bates 1996 = Heston + Merton jumps; Barndorff-Nielsen-Shephard) are needed to match short-maturity skew and OTM put prices.
Rough volatility. Empirical studies (Gatheral, Jaisson, Rosenbaum 2018) show that realised volatility has a rough Hölder regularity consistent with fractional Brownian motion with $H \approx 0.1$, not the smooth CIR dynamics of Heston ($H = 1/2$). Rough Heston (El Euch and Rosenbaum 2019) extends the characteristic function to fractional Riccati equations at significant computational cost.
Single-factor variance. One-factor variance fails to capture the full term structure of vol-of-vol. Heston with two variance factors (double Heston) is used for more accurate term structure fitting.
Smile at short maturities. Heston produces insufficient skew at short maturities (weeks to one month) because the vol-of-vol impact on the smile is proportional to $\xi\sqrt{T}$. Jump models are better suited here.
Interview Angle
L1. Write down the Heston SDE system. State the Feller condition and explain what happens when it is violated. What does each parameter control intuitively?
- $\kappa$: speed at which variance mean-reverts. High $\kappa$ → fast reversion → variance stays near $\theta$, less impact of initial $v_0$ at long maturities.
- $\theta$: long-run variance; $\sqrt{\theta}$ is the long-run implied vol.
- $\xi$: vol of variance; controls smile curvature.
- $\rho$: spot-variance correlation; controls skew (negative $\rho$ → left skew).
- $v_0$: initial variance; controls ATM vol at short maturities.
Feller: If $2\kappa\theta \ge \xi^2$, $v_t > 0$ almost surely. If violated, variance can hit zero and the SDE requires a boundary condition at zero. The process remains well-posed (absorbing or reflecting boundary depending on parameter range), but simulation requires a scheme (full-truncation Euler, QE) that handles $v_t \le 0$ correctly.
L2. Derive the Heston characteristic function structure. You do not need to compute $C$ and $D$ in closed form, but explain how the Feynman-Kac approach converts the pricing PDE into a Riccati ODE system. State the Lewis formula and explain why the Albrecher branch is preferred over the original Heston formulation.
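Riccati ODE approach. The Heston model is affine: $\ln\phi$ is linear in the state $v$. This follows because the generator of $(x_t, v_t)$ applied to $e^{iu(x + r\tau) + C(u,\tau) + D(u,\tau)v}$ separates into terms of order 0 and 1 in $v$, yielding coupled first-order ODEs for $C$ and $D$ in the time to maturity $\tau$:

$$
\begin{aligned}
\frac{\partial D}{\partial \tau} &= \tfrac12\xi^2 D^2 - (\kappa - i\rho\xi u)\,D - \tfrac12\,(u^2 + iu), & D(u, 0) &= 0, \\
\frac{\partial C}{\partial \tau} &= \kappa\theta\,D, & C(u, 0) &= 0.
\end{aligned}
$$

This Riccati ODE has an explicit solution (quadratic formula). The branch choice in taking the square root of the Riccati discriminant determines whether one is on the correct Riemann sheet. The Albrecher formulation selects the branch that is continuous in $u$ for all maturities.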
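One way to build confidence in a closed-form $D$ is to integrate the Riccati ODE numerically and compare. The sketch below assumes the ODE $\partial_\tau D = \tfrac12\xi^2 D^2 - (\kappa - i\rho\xi u)D - \tfrac12(u^2 + iu)$ with $D(u, 0) = 0$ and the stable closed form $D = \frac{\beta - d}{\xi^2}\frac{1 - e^{-d\tau}}{1 - g e^{-d\tau}}$; the function names and the step count are hypothetical choices.

```python
import cmath


def D_closed(u, tau, kappa, xi, rho):
    """Closed-form Riccati solution using the decaying exponential exp(-d*tau)."""
    iu = 1j * u
    beta = kappa - rho * xi * iu
    d = cmath.sqrt(beta * beta + xi * xi * (iu + u * u))
    g = (beta - d) / (beta + d)
    e = cmath.exp(-d * tau)
    return (beta - d) / (xi * xi) * (1.0 - e) / (1.0 - g * e)


def D_rk4(u, tau, kappa, xi, rho, n=2000):
    """Integrate the Riccati ODE for D from 0 to tau with classical RK4."""
    beta = kappa - 1j * rho * xi * u
    c = -0.5 * (u * u + 1j * u)

    def f(D):
        return 0.5 * xi * xi * D * D - beta * D + c

    h, D = tau / n, 0j
    for _ in range(n):
        k1 = f(D)
        k2 = f(D + 0.5 * h * k1)
        k3 = f(D + 0.5 * h * k2)
        k4 = f(D + h * k3)
        D += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return D


# Illustrative parameters (not calibrated values).
gap = abs(D_closed(1.5, 1.0, 2.0, 0.3, -0.7) - D_rk4(1.5, 1.0, 2.0, 0.3, -0.7))
```

The same comparison can be extended to $C$ by accumulating $\kappa\theta D$ along the integration; it is omitted here to keep the sketch short.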
L3. Explain the calibration instability of the Heston model: why are $\kappa$ and $\theta$ poorly identified from market implied vols? How would you regularise the calibration? Discuss the QE scheme: what does it improve over Euler, and what is the computational cost?
Identification. The ATM vol term structure is primarily driven by $\theta$ and the initial variance $v_0$. The smile curvature is driven by $\xi$. But $\kappa$ and $\theta$ individually have similar effects at short and medium maturities: a model with high $\kappa$ and high $\theta$ can produce the same term structure as one with low $\kappa$ and lower $\theta$. The loss surface has a ridge along the $(\kappa, \theta)$ locus. Adding a Tikhonov penalty (penalising deviation from the prior-day calibration) stabilises the solution and produces smoother daily recalibration.
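In a least-squares setting, a Tikhonov penalty towards the prior-day parameters can be implemented by appending extra residuals, so that an unmodified Levenberg-Marquardt solver minimises the penalised objective. The sketch below is illustrative only: the helper name, the single scalar weight `alpha`, and the example numbers are hypothetical, and in practice each parameter would usually get its own scale.

```python
import math


def tikhonov_residuals(pricing_residuals, params, prior_params, alpha):
    """Append sqrt(alpha) * (params - prior) to the pricing residuals.

    The sum of squares of the result equals
    sum(pricing_residuals**2) + alpha * ||params - prior||^2.
    """
    penalty = [math.sqrt(alpha) * (p - q) for p, q in zip(params, prior_params)]
    return list(pricing_residuals) + penalty


# Illustrative numbers: two pricing residuals, parameters (kappa, theta).
aug = tikhonov_residuals([0.1, -0.2], params=[2.5, 0.05], prior_params=[2.0, 0.04], alpha=0.01)
penalised_cost = sum(x * x for x in aug)
```

A larger `alpha` pulls the solution towards yesterday's parameters and smooths the daily series, at the cost of a slightly worse fit to today's surface.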
QE vs Euler. Euler truncates at zero and introduces a bias proportional to $\Delta t$ near the boundary. QE matches the conditional mean and variance of the CIR distribution (first and second moments of the transition law), eliminating the bias. The cost is one inversion of the conditional CDF (via a quadratic or exponential approximation), roughly 2–3× the cost per step of Euler, but QE requires far fewer steps for equivalent accuracy.