Derivatives PricingStochastic VolatilityHeston ModelFourier PricingCalibration

Heston Model: Calibration and Simulation

Module 3 of 428 min readLevel: Hard

Setup

Motivation

Black-Scholes assumes volatility is constant. The empirical reality is that volatility is stochastic: it clusters, mean-reverts, and correlates with the spot. The simplest tractable model that captures these effects is the Heston (1993) model, in which the instantaneous variance follows a mean-reverting square-root (CIR) process correlated with the spot.

The Heston model is analytically semi-tractable: European option prices are available via Fourier inversion of an explicit characteristic function. This makes calibration computationally feasible.

Assumptions and Parameters

Let (Ω,F,Q)(\Omega, \mathcal{F}, \mathbb{Q}) be the risk-neutral filtered probability space. The Heston dynamics are:

dSt=rStdt+vtStdWt(1),S0>0,dS_t = r S_t \, dt + \sqrt{v_t}\, S_t \, dW_t^{(1)}, \qquad S_0 > 0,

dvt=κ(θvt)dt+ξvtdWt(2),v0>0,dv_t = \kappa(\theta - v_t)\, dt + \xi \sqrt{v_t}\, dW_t^{(2)}, \qquad v_0 > 0,

dW(1),W(2)t=ρdt.d\langle W^{(1)}, W^{(2)} \rangle_t = \rho \, dt.

Parameters — all must be stated before use:

  • rr: risk-free rate (continuously compounded, constant).
  • κ>0\kappa > 0: mean-reversion speed of variance.
  • θ>0\theta > 0: long-run variance (mean level to which vtv_t reverts).
  • ξ>0\xi > 0: volatility of variance (vol-of-vol).
  • ρ(1,1)\rho \in (-1, 1): correlation between spot and variance Brownian motions.
  • v0>0v_0 > 0: initial instantaneous variance.

The instantaneous volatility is vt\sqrt{v_t} and the implied vol at ATM at time 0 is approximately v0\sqrt{v_0} for short maturities.


The Feller Condition

The variance process vtv_t is a Cox-Ingersoll-Ross (CIR) process. Its boundary behaviour at zero depends on the ratio 2κθ/ξ22\kappa\theta / \xi^2:

Feller condition:2κθ>ξ2.\text{Feller condition}: \quad 2\kappa\theta > \xi^2.

If Feller holds, the boundary vt=0v_t = 0 is inaccessible: the variance process remains strictly positive almost surely. This ensures vt\sqrt{v_t} is well-defined at all times.

If Feller fails (2κθξ22\kappa\theta \leq \xi^2), the variance process can reach zero with positive probability and reflect. For 2κθ<ξ22\kappa\theta < \xi^2, the boundary is reflecting (the process can touch zero and immediately leave); for 2κθ=ξ22\kappa\theta = \xi^2, the boundary is regular.

In practice, calibrated Heston parameters frequently violate the Feller condition — market-implied vol-of-vol ξ\xi is often large relative to mean reversion. This is not a catastrophic failure: the SDE remains well-posed (with reflecting boundary), but numerical simulation requires care. The full-truncation Euler scheme (described below) handles this robustly.


Characteristic Function and the Lewis Formula

Characteristic Function

The key to semi-analytic pricing is that the log-price xT=ln(ST/S0)x_T = \ln(S_T/S_0) has a known conditional characteristic function under Q\mathbb{Q}:

φT(u)=EQ ⁣[eiuxT]=exp ⁣(iurT+A(u,T)+B(u,T)v0),\varphi_T(u) = \mathbb{E}^{\mathbb{Q}}\!\left[e^{iu x_T}\right] = \exp\!\left(iurT + A(u,T) + B(u,T)\, v_0\right),

where the functions AA and BB are given by the Albrecher et al. (2007) stable formulation:

d(u)=(ρξiuκ)2+ξ2(iu+u2),d(u) = \sqrt{(\rho\xi iu - \kappa)^2 + \xi^2(iu + u^2)},

g(u)=κρξiud(u)κρξiu+d(u),g(u) = \frac{\kappa - \rho\xi iu - d(u)}{\kappa - \rho\xi iu + d(u)},

B(u,T)=κρξiud(u)ξ21ed(u)T1g(u)ed(u)T,B(u, T) = \frac{\kappa - \rho\xi iu - d(u)}{\xi^2} \cdot \frac{1 - e^{-d(u)T}}{1 - g(u) e^{-d(u)T}},

A(u,T)=κθξ2[(κρξiud(u))T2ln ⁣(1g(u)ed(u)T1g(u))].A(u, T) = \frac{\kappa\theta}{\xi^2}\left[(\kappa - \rho\xi iu - d(u))\, T - 2\ln\!\left(\frac{1 - g(u) e^{-d(u)T}}{1 - g(u)}\right)\right].

Why the Albrecher Formulation?

The original Heston (1993) characteristic function uses a different branch of the square root for d(u)d(u). For long maturities or large u|u|, the original formulation can produce branch-cut discontinuities that cause the characteristic function to be evaluated on the wrong Riemann sheet, leading to mispriced options. The Albrecher formulation avoids this by choosing a consistent branch throughout. Always use the Albrecher version in production.

Lewis Formula for European Options

Lewis (2001) derived a Fourier pricing formula that avoids the dampening-parameter sensitivity of Carr-Madan. For a European call with log-strike k=ln(K/F)k = \ln(K/F) where F=S0erTF = S_0 e^{rT}:

C=S0FKerTπ0Re ⁣[φT ⁣(ui2)eiuku2+14]du.C = S_0 - \frac{\sqrt{FK} \cdot e^{-rT}}{\pi} \int_0^\infty \mathrm{Re}\!\left[\varphi_T\!\left(u - \frac{i}{2}\right) \frac{e^{-iu k}}{u^2 + \frac{1}{4}}\right] du.

The integrand is real-valued (after taking the real part) and rapidly decaying. Numerical integration via adaptive Gauss-Legendre or Gauss-Kronrod is standard. This integral does not require a dampening parameter α\alpha and is numerically stable for u(0,)u \in (0, \infty).

Derivation sketch. Express the call price as C=erTEQ[(STK)+]C = e^{-rT}\mathbb{E}^{\mathbb{Q}}[(S_T - K)^+]. Write the payoff in terms of the log-price, apply Parseval's identity in Fourier space, and use the characteristic function of the log-price. The 1/(u2+1/4)1/(u^2 + 1/4) denominator arises from the Fourier transform of the call payoff along a contour Im(u)=1/2\mathrm{Im}(u) = -1/2.


Simulation: Full-Truncation Euler Scheme

Monte Carlo simulation of the Heston SDE requires a discretisation of the CIR process for vtv_t. Naive Euler:

vt+Δt=vt+κ(θvt)Δt+ξvtΔWt(2)v_{t+\Delta t} = v_t + \kappa(\theta - v_t)\Delta t + \xi\sqrt{v_t}\, \Delta W_t^{(2)}

can produce negative values of vt+Δtv_{t+\Delta t}, making vt+Δt\sqrt{v_{t+\Delta t}} undefined.

The full-truncation Euler scheme (Lord et al., 2010):

vt+Δt=vt+κ(θvt+)Δt+ξvt+ΔWt(2),v_{t+\Delta t} = v_t + \kappa(\theta - v_t^+)\Delta t + \xi\sqrt{v_t^+}\, \Delta W_t^{(2)},

where vt+=max(vt,0)v_t^+ = \max(v_t, 0). The truncated value vt+v_t^+ is used in the drift and diffusion coefficient, but vt+Δtv_{t+\Delta t} itself can go negative and is carried forward (the truncation only applies at the point of use). This scheme is consistent and weakly convergent of order 1.

For the log-price:

xt+Δt=xt+(r12vt+)Δt+vt+(ρΔWt(2)+1ρ2ΔWt),x_{t+\Delta t} = x_t + \left(r - \frac{1}{2}v_t^+\right)\Delta t + \sqrt{v_t^+} \left(\rho \, \Delta W_t^{(2)} + \sqrt{1-\rho^2}\, \Delta W_t^\perp\right),

where ΔWt(2)\Delta W_t^{(2)} and ΔWt\Delta W_t^\perp are independent standard normals scaled by Δt\sqrt{\Delta t}.

Antithetic Variates

Variance reduction via antithetic paths: for each path {Zi(2),Zi}i=1n\{Z^{(2)}_i, Z^\perp_i\}_{i=1}^n of random normals, generate a paired path {Zi(2),Zi}i=1n\{-Z^{(2)}_i, -Z^\perp_i\}_{i=1}^n. The payoff estimate is the average of the two. Under Heston, antithetic variates reduce variance significantly for nearly-ATM options.

Quadratic-Exponential Scheme (Advanced)

Andersen's (2008) Quadratic-Exponential (QE) scheme is the benchmark for Heston simulation. It matches the conditional moments of the CIR distribution exactly (rather than approximating them via Euler). The scheme switches between:

  • A quadratic approximation of the CIR conditional distribution for high vtv_t.
  • An exponential approximation for low vtv_t (near the boundary).

The QE scheme eliminates the "Feller violation bias" that affects Euler schemes when the Feller condition fails, at the cost of slightly more complex implementation.


Calibration via Levenberg-Marquardt

Calibration Setup

Given market-observed call prices Cmkt(Ki,Ti)C_{\mathrm{mkt}}(K_i, T_i) for i=1,,Ni = 1, \ldots, N (or equivalently, implied vols σ^i\hat{\sigma}_i), find the Heston parameters θcal=(κ,θ,ξ,ρ,v0)\theta_{\mathrm{cal}} = (\kappa, \theta, \xi, \rho, v_0) minimising:

minκ,θ,ξ,ρ,v0i=1Nwi(σ^Heston(Ki,Ti;κ,θ,ξ,ρ,v0)σ^i)2,\min_{\kappa, \theta, \xi, \rho, v_0} \sum_{i=1}^{N} w_i \left(\hat{\sigma}_{\mathrm{Heston}}(K_i, T_i; \kappa, \theta, \xi, \rho, v_0) - \hat{\sigma}_i\right)^2,

subject to: κ,θ,ξ>0\kappa, \theta, \xi > 0, ρ(1,1)\rho \in (-1, 1), v0>0v_0 > 0.

Weights wiw_i typically down-weight deep OTM options (low liquidity, high bid-ask spread) and up-weight near-the-money options.

Levenberg-Marquardt Algorithm

Given residuals ri(θ)=σ^Heston(Ki,Ti;θ)σ^ir_i(\theta) = \hat{\sigma}_{\mathrm{Heston}}(K_i, T_i; \theta) - \hat{\sigma}_i, define the Jacobian Jij=ri/θjJ_{ij} = \partial r_i / \partial \theta_j. The LM update is:

δθ=(JJ+λD)1Jr,\delta\theta = -(J^\top J + \lambda D)^{-1} J^\top r,

where λ>0\lambda > 0 is the damping parameter and D=diag(JJ)D = \mathrm{diag}(J^\top J) (Marquardt scaling). When λ0\lambda \to 0, this becomes Gauss-Newton; when λ\lambda \to \infty, it becomes a gradient descent step. λ\lambda is increased on failure and decreased on success.

Jacobian computation. The Jacobian σ^Heston/θj\partial \hat{\sigma}_{\mathrm{Heston}} / \partial \theta_j is computed either by finite differences (first-order accurate, expensive) or via differentiation of the Lewis integral (analytic, preferred). For the characteristic function parametrisation, analytic sensitivities of φT\varphi_T with respect to each parameter exist in closed form.

Calibration Pitfalls

Flat loss surface. The Heston loss surface has a shallow valley: many combinations of (κ,θ)(\kappa, \theta) with similar product κθ\kappa\theta (which determines the long-run variance contribution) fit the surface similarly. Regularisation or Tikhonov penalty on θcalθprior2\|\theta_{\mathrm{cal}} - \theta_{\mathrm{prior}}\|^2 is often needed to avoid degenerate solutions.

Parameter instability. Daily recalibration of Heston parameters often produces large day-to-day changes in κ\kappa and ξ\xi despite stable implied vol surfaces. This is a calibration instability issue — the surface is only weakly identified in these parameters. Practitioners often fix κ\kappa or β\beta and calibrate fewer free parameters.

Local minima. The LM algorithm is a local method. Initialisation matters significantly. A common practice is to run multiple restarts from diverse initial points and select the global minimum. Differential evolution or particle swarm algorithms are sometimes used for global initialisation.


Limitations

Feller violation in practice. Calibrated parameters frequently violate the Feller condition (2κθ<ξ22\kappa\theta < \xi^2). This means the variance process touches zero with positive probability. The Heston SDE remains well-posed (reflecting boundary), but certain moments of the distribution diverge. Specifically, for ξ\xi large, the moment generating function of lnST\ln S_T may fail to exist for large argument — leading to explosions in FFT-based pricing that requires dampening.

Tail behaviour. Heston generates heavier tails than Black-Scholes but still sub-exponential tail decay. Models with jumps (Bates 1996 = Heston + Merton jumps; Barndorff-Nielsen-Shephard) are needed to match short-maturity skew and OTM put prices.

Rough volatility. Empirical studies (Gatheral, Jaisson, Rosenbaum 2018) show that realised volatility has a rough Hölder regularity consistent with fractional Brownian motion with H0.1H \approx 0.1, not the smooth CIR dynamics of Heston (H=0.5H = 0.5). Rough Heston (El Euch and Rosenbaum 2019) extends the characteristic function to fractional Riccati equations at significant computational cost.

Single-factor variance. One-factor variance fails to capture the full term structure of vol-of-vol. Heston with two variance factors (double Heston) is used for more accurate term structure fitting.

Smile at short maturities. Heston produces insufficient skew at short maturities (weeks to one month) because the vol-of-vol impact on the smile is proportional to ξ2T\xi^2 T. Jump models are better suited here.


Interview Angle

L1. Write down the Heston SDE system. State the Feller condition and explain what happens when it is violated. What does each parameter control intuitively?

κ\kappa: speed at which variance mean-reverts. High κ\kappa → fast reversion → variance stays near θ\theta, less impact of initial v0v_0 at long maturities. θ\theta: long-run variance; θ\sqrt{\theta} is the long-run implied vol. ξ\xi: vol of variance — controls smile curvature. ρ\rho: spot-variance correlation — controls skew (negative ρ\rho → left skew). v0v_0: initial variance — controls ATM vol at short maturities.

Feller: If 2κθ>ξ22\kappa\theta > \xi^2, vt>0v_t > 0 almost surely. If violated, variance can hit zero and the SDE requires a boundary condition at zero. The process remains well-posed (absorbing or reflecting boundary depending on parameter range), but simulation requires a scheme (full-truncation Euler, QE) that handles vt=0v_t = 0 correctly.

L2. Derive the Heston characteristic function structure — you do not need to compute AA and BB in closed form, but explain how the Feynman-Kac approach converts the pricing PDE into a Riccati ODE system. State the Lewis formula and explain why the Albrecher branch is preferred over the original Heston formulation.

Riccati ODE approach. The Heston model is affine: lnφT(u)\ln \varphi_T(u) is linear in the state vtv_t. This follows because the generator of (xt,vt)(x_t, v_t) applied to eA+Bve^{A + Bv} separates into terms of order 0 and 1 in vv, yielding coupled first-order ODEs for A(u,τ)A(u, \tau) and B(u,τ)B(u, \tau):

dBdτ=12ξ2B2+(ρξiuκ)B+12(u2+iu),B(u,0)=0.\frac{dB}{d\tau} = \frac{1}{2}\xi^2 B^2 + (\rho\xi iu - \kappa)B + \frac{1}{2}(-u^2 + iu), \qquad B(u,0) = 0.

This Riccati ODE has an explicit solution (quadratic formula). The branch choice in taking the square root of the Riccati discriminant determines whether one is on the correct Riemann sheet. The Albrecher formulation selects the branch that is continuous in uu for all maturities.

L3. Explain the calibration instability of the Heston model: why are κ\kappa and ξ\xi poorly identified from market implied vols? How would you regularise the calibration? Discuss the QE scheme: what does it improve over Euler, and what is the computational cost?

Identification. The ATM vol term structure is primarily driven by κθ\kappa\theta and the initial variance v0v_0. The smile curvature is driven by ξ\xi. But κ\kappa and θ\theta individually have similar effects at short and medium maturities: a model with high κ\kappa and high θ\theta can produce the same term structure as one with low κ\kappa and lower θ\theta. The loss surface has a ridge along the κθ=const\kappa\theta = \text{const} locus. Adding a Tikhonov penalty λregθθprev2\lambda_{\mathrm{reg}}\|\theta - \theta_{\mathrm{prev}}\|^2 (penalising deviation from the prior-day calibration) stabilises the solution and produces smoother daily recalibration.

QE vs Euler. Euler truncates at zero and introduces a bias proportional to ξ2Δt\xi^2 \Delta t near the boundary. QE matches the conditional mean and variance of the CIR distribution (first and second moments of the transition law), eliminating the bias. The cost is one inversion of the conditional CDF (via a quadratic or exponential approximation), roughly 2–3× the cost per step of Euler, but requires far fewer steps for equivalent accuracy.