
The Heston Stochastic Volatility Model

Module 4 of 6 · 30 min read · Level: Hard

Setup

Market Context

Local volatility is a calibration tool, not a dynamical model of volatility. It reproduces any arbitrage-free vanilla surface by construction, but it makes a specific and empirically poor prediction about smile dynamics. When the spot moves, the smile does not move with it: it stays broadly anchored to strike (close to sticky strike). Observed equity markets are closer to sticky delta, where the ATM vol moves with the spot and the smile translates with the forward. More fundamentally, local vol has no independent vol-of-vol. Instantaneous volatility is a deterministic function of spot and time, so it carries no randomness beyond what the spot supplies. This contradicts the persistent empirical observation that volatility is itself stochastic.

Heston (1993) introduced the first widely adopted stochastic volatility model with a semi-closed-form pricing formula. Volatility is driven by a separate mean-reverting process that is correlated with the spot, and European prices follow in semi-closed form from the characteristic function. Heston remains the most widely used stochastic vol model in equity and FX derivatives. It is the benchmark for calibration comparisons, the natural starting point for extensions (Bates jumps, rough vol), and the pedagogical foundation for understanding how stochastic vol generates the smile.

INSIGHT

Financial insight. On an equity derivatives desk, Heston is calibrated to the vanilla surface daily. It is used directly to price vanilla exotics where the smile dynamics matter more than a perfect vanilla fit. It also serves as the stochastic vol component of LSV models, where it is paired with a Dupire leverage function to achieve exact vanilla calibration. Every equity quant needs to understand what the five parameters control, and what they cannot control.

Assumptions

  • The underlying $S_t$ and its instantaneous variance $V_t = \sigma_t^2$ follow the Heston SDE system under the risk-neutral measure $\mathbb{Q}$.
  • $V_t$ follows a CIR (Cox-Ingersoll-Ross) process. It is mean-reverting and non-negative (strictly positive under the Feller condition), and it is driven by a Brownian motion correlated with $W_t^S$.
  • The correlation between the spot and variance Brownian motions is $\rho \in (-1, 1)$. For equities, $\rho < 0$ (leverage effect: vol spikes when spot falls).
  • The interest rate $r$ and dividend yield $q$ are deterministic (constant in the formulas below), so rates contribute no stochastic factor.
  • No jumps. Heston is a pure diffusion model; adding jumps (Bates 1996) is an extension.
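  • The market price of variance risk (the drift adjustment between the physical and risk-neutral measures) is proportional to $V_t$: $\lambda(V_t) = \lambda V_t$. Under this specification the risk-neutral variance is again a CIR process, with $\kappa^{\mathbb{Q}} = \kappa + \lambda$ and $\theta^{\mathbb{Q}} = \kappa\theta/(\kappa + \lambda)$ in Heston's sign convention. The parameters used below are the risk-neutral ones.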

Theory

1. The Heston SDE System

Under the risk-neutral measure $\mathbb{Q}$:

DEFINITION

Definition 4.1 (Heston Model). The Heston (1993) model specifies the joint dynamics of spot $S_t$ and variance $V_t$ as:

$$dS_t = (r - q)\,S_t\,dt + \sqrt{V_t}\,S_t\,dW_t^S$$

$$dV_t = \kappa(\theta - V_t)\,dt + \xi\sqrt{V_t}\,dW_t^V$$

$$d\langle W^S, W^V \rangle_t = \rho\,dt$$

Parameters:

  • $\kappa > 0$: speed of mean reversion of variance (units: $\text{yr}^{-1}$)
  • $\theta > 0$: long-run mean of variance ($\theta = \bar{\sigma}^2$, where $\bar{\sigma}$ is the long-run vol)
  • $\xi > 0$: vol of vol, the diffusion coefficient of $V_t$ (units: $\text{yr}^{-1}$, so that $\xi^2$ and $2\kappa\theta$ in the Feller condition have the same units)
  • $\rho \in (-1, 1)$: correlation between spot returns and variance increments
  • $V_0 > 0$: initial variance (often written $v_0 = \sigma_0^2$)

The log-price $X_t = \ln(S_t/S_0)$ satisfies:

$$dX_t = \left(r - q - \frac{V_t}{2}\right)dt + \sqrt{V_t}\,dW_t^S$$

The variance process $V_t$ is a CIR process, the same process used to model short rates in Cox-Ingersoll-Ross (1985). Its key properties are as follows. $V_t \geq 0$ almost surely for all parameter values, and $V_t > 0$ under the Feller condition. The process mean-reverts to $\theta$ at rate $\kappa$. The diffusion coefficient $\xi\sqrt{V_t}$ vanishes at $V_t = 0$, where the drift $\kappa\theta$ is positive, so variance cannot go negative.

2. The Feller Condition

THEOREM

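Theorem 4.1 (Feller Condition). For $V_0 > 0$, the CIR process $V_t$ stays strictly positive ($V_t > 0$ a.s. for all $t > 0$) if and only if:

$$2\kappa\theta \geq \xi^2$$

When $2\kappa\theta < \xi^2$, the process hits zero with positive probability. For the SDE as written, zero is then instantaneously reflecting: the drift at zero is $\kappa\theta > 0$, so the process immediately returns to positive values.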

Interpretation. The Feller condition compares the drift pushing $V_t$ away from zero (proportional to $\kappa\theta$) with the diffusion that can carry it down to zero (proportional to $\xi^2$). When $2\kappa\theta$ is large relative to $\xi^2$, the process stays safely positive. In practice, calibrated Heston parameters often violate the Feller condition, particularly when strong short-dated smiles require a large $\xi$. Monte Carlo implementations must then treat the neighbourhood of zero carefully, because discretised variance paths can step below zero. Full-truncation Euler and the Andersen QE scheme are the usual remedies.
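The Feller condition has a compact reading in terms of the stationary law of $V_t$. This law is a Gamma distribution with shape $2\kappa\theta/\xi^2$ and scale $\xi^2/(2\kappa)$, and the condition holds exactly when the shape is at least 1, i.e. when the stationary density does not blow up at zero. The minimal Python sketch below (the function name is hypothetical) reports the Feller ratio together with the stationary mean and variance.

```python
import math


def feller_diagnostics(kappa: float, theta: float, xi: float) -> dict:
    """Feller ratio and stationary moments of dV = kappa*(theta - V)dt + xi*sqrt(V)dW.

    The stationary law of this CIR process is Gamma(shape=2*kappa*theta/xi**2,
    scale=xi**2/(2*kappa)); the Feller condition 2*kappa*theta >= xi**2 is
    exactly the statement that the Gamma shape parameter is at least 1.
    """
    if min(kappa, theta, xi) <= 0.0:
        raise ValueError("kappa, theta and xi must all be positive")
    shape = 2.0 * kappa * theta / xi**2   # the Feller ratio
    scale = xi**2 / (2.0 * kappa)
    return {
        "feller_ratio": shape,
        "feller_satisfied": shape >= 1.0,
        "stationary_mean": shape * scale,         # equals theta
        "stationary_variance": shape * scale**2,  # equals xi**2 * theta / (2 * kappa)
    }


# Parameters from the L2 interview question later in this module.
diag = feller_diagnostics(kappa=3.0, theta=0.04, xi=0.8)
print(diag["feller_ratio"], diag["feller_satisfied"])   # about 0.375, False
```

With those parameters the ratio is $0.24/0.64 = 0.375$. The stationary density is therefore unbounded near zero, and simulated variance paths spend a noticeable share of time close to it.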

REMARK

Remark. In Monte Carlo simulation under Heston, the naive Euler scheme for $V_t$ can produce negative values. The full-truncation Euler scheme replaces $V_t$ by $V_t^+ = \max(V_t, 0)$ in both the drift and the diffusion coefficient, $V_{t+\Delta t} = V_t + \kappa(\theta - V_t^+)\,\Delta t + \xi\sqrt{V_t^+}\,\Delta W_t^V$, while the untruncated value is kept as the state variable. It is a widely used industry scheme and, among the simple Euler fixes, typically has the smallest discretisation bias. It is nevertheless still a biased approximation near zero.
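A minimal NumPy sketch of full-truncation Euler follows. It assumes a log-Euler step for the spot, driven by the same truncated variance. The function name, default step count and path count are illustrative choices, not the companion notebook's code. The state `v` is left untruncated, while `v_plus` enters both the drift and the diffusion.

```python
import numpy as np


def simulate_heston_full_truncation(s0, v0, kappa, theta, xi, rho, r, q, T,
                                    n_steps=100, n_paths=20_000, seed=0):
    """Simulate (S_T, V_T) with full-truncation Euler for V and log-Euler for S."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    sqrt_dt = np.sqrt(dt)
    log_s = np.full(n_paths, np.log(s0))
    v = np.full(n_paths, float(v0))      # untruncated state: may dip below zero
    for _ in range(n_steps):
        z_s = rng.standard_normal(n_paths)
        # Correlated variance shock: corr(z_s, z_v) = rho.
        z_v = rho * z_s + np.sqrt(1.0 - rho**2) * rng.standard_normal(n_paths)
        v_plus = np.maximum(v, 0.0)      # truncated value, used in drift AND diffusion
        log_s += (r - q - 0.5 * v_plus) * dt + np.sqrt(v_plus) * sqrt_dt * z_s
        v += kappa * (theta - v_plus) * dt + xi * np.sqrt(v_plus) * sqrt_dt * z_v
    return np.exp(log_s), np.maximum(v, 0.0)


s_T, v_T = simulate_heston_full_truncation(
    s0=100.0, v0=0.04, kappa=2.0, theta=0.04, xi=0.5, rho=-0.7, r=0.0, q=0.0, T=1.0)
call_price = np.maximum(s_T - 100.0, 0.0).mean()   # r = 0, so no discounting
```

Given $V_t^+$, the factor $\exp(\sqrt{V_t^+}\,\Delta W_t^S - \tfrac12 V_t^+\Delta t)$ has conditional mean one, so the simulated forward is unbiased and the discretisation error sits in the variance path. Swapping in the Andersen QE scheme would change only the `v` update.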

3. Financial Interpretation of Parameters

Each parameter controls a specific feature of the implied vol surface:

| Parameter | Controls | Effect on smile |
|---|---|---|
| $V_0 = \sigma_0^2$ | Short-dated ATM vol | Level of short-end ATM implied vol |
| $\theta = \bar{\sigma}^2$ | Long-run vol | ATM vol at long maturities (term-structure level) |
| $\kappa$ | Mean-reversion speed | How quickly the ATM vol term structure moves from $\sigma_0$ to $\bar{\sigma}$; also affects smile curvature |
| $\xi$ | Vol of vol | Curvature of the smile (butterfly / kurtosis) |
| $\rho$ | Spot-vol correlation | Skew of the smile (25d risk reversal); $\rho < 0$ gives a downward-sloping (left-skewed) smile |

INSIGHT

Financial insight. A practitioner's first sanity check after calibration: (1) Does $\sqrt{\theta}$ correspond to the long-run ATM vol observed in the vol surface? (2) Is $\rho$ in the range $[-0.8, -0.3]$ for equity indices? (3) Is $\xi$ large enough to fit the curvature, but not so large that the Feller condition is violated by more than a factor of 4 (i.e. $\xi^2 > 4 \cdot 2\kappa\theta$)? If any of these checks fails, the calibration has likely found a degenerate solution.
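The three checks can be written as a small helper. This is a sketch: the function name and the 2-vol-point tolerance for check (1) are hypothetical, and check (3) is read as $\xi^2/(2\kappa\theta) > 4$.

```python
import math


def heston_sanity_checks(kappa, theta, xi, rho, long_dated_atm_vol,
                         vol_tolerance=0.02):
    """Apply the three rule-of-thumb checks from the text; return warning strings.

    vol_tolerance is a hypothetical absolute threshold (0.02 = 2 vol points)."""
    warnings = []
    if abs(math.sqrt(theta) - long_dated_atm_vol) > vol_tolerance:
        warnings.append(
            f"sqrt(theta)={math.sqrt(theta):.3f} far from long-dated ATM vol "
            f"{long_dated_atm_vol:.3f}")
    if not -0.8 <= rho <= -0.3:
        warnings.append(f"rho={rho:.2f} outside [-0.8, -0.3]")
    feller_excess = xi**2 / (2.0 * kappa * theta)   # > 1 means Feller is violated
    if feller_excess > 4.0:
        warnings.append(f"xi^2 / (2 kappa theta) = {feller_excess:.2f} > 4")
    return warnings


print(heston_sanity_checks(3.0, 0.04, 0.8, -0.75, long_dated_atm_vol=0.20))
```

For the L2 interview parameters ($\kappa = 3$, $\theta = 0.04$, $\xi = 0.8$, $\rho = -0.75$) with a 20% long-dated ATM vol, the helper returns an empty list. Feller is violated, but only by a factor of about 2.7.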

4. The Characteristic Function

The key analytical result is that the Heston model is affine: the log-price characteristic function has an exponential-affine form in $V_0$.

THEOREM

Theorem 4.2 (Heston Characteristic Function, Albrecher et al. 2007, stable form). The characteristic function of $X_T = \ln(S_T/S_0)$ under $\mathbb{Q}$ is:

$$\phi_T(u) = \mathbb{E}^{\mathbb{Q}}\!\left[e^{i u X_T}\right] = \exp\!\left(A(u, T) + B(u, T)\,V_0\right)$$

where, defining $d = \sqrt{(\kappa - i\rho\xi u)^2 + \xi^2(iu + u^2)}$ and $g = \dfrac{\kappa - i\rho\xi u - d}{\kappa - i\rho\xi u + d}$:

$$B(u, T) = \frac{\kappa - i\rho\xi u - d}{\xi^2}\cdot\frac{1 - e^{-dT}}{1 - g\,e^{-dT}}$$

$$A(u, T) = (r - q)\,iuT + \frac{\kappa\theta}{\xi^2}\!\left[\left(\kappa - i\rho\xi u - d\right)T - 2\ln\!\frac{1 - g\,e^{-dT}}{1 - g}\right]$$

REMARK

Remark: Two formulations of the characteristic function. Heston's original (1993) paper used the reciprocal $g_1 = 1/g$ together with $e^{+dT}$, so that $A$ contains $\ln\frac{1 - g_1 e^{dT}}{1 - g_1}$. As $u$ or $T$ grows, the argument of this logarithm can wind around the origin, and the principal branch of the complex logarithm then jumps by $2\pi i$. This is the so-called "rotation" problem, and it makes the pricing integrand discontinuous. The Albrecher et al. (2007) formulation above uses $g$ and $e^{-dT}$ instead, expressing $A$ in terms of $\ln\frac{1 - ge^{-dT}}{1 - g}$, which stays on the principal branch for the parameter regimes encountered in calibration. Always use the Albrecher form in production code.
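Below is a minimal NumPy sketch of Theorem 4.2. It relies on NumPy's principal-branch complex `sqrt` and `log`, which is what the Albrecher form requires, and the function names are hypothetical. As a self-check, it integrates the Riccati ODEs $\partial_T B = \tfrac{\xi^2}{2}B^2 + (i\rho\xi u - \kappa)B - \tfrac12(u^2 + iu)$ and $\partial_T A = (r-q)iu + \kappa\theta B$ by classical RK4, starting from $A = B = 0$ at $T = 0$ (here $T$ is the time to maturity). These integrals should reproduce the closed form. The sketch also checks $\phi_T(0) = 1$ and $\phi_T(-i) = e^{(r-q)T}$.

```python
import numpy as np


def heston_AB(u, T, kappa, theta, xi, rho, r=0.0, q=0.0):
    """A(u,T) and B(u,T) of Theorem 4.2 in the Albrecher et al. (2007) form."""
    u = np.asarray(u, dtype=complex)
    beta = kappa - 1j * rho * xi * u
    d = np.sqrt(beta**2 + xi**2 * (1j * u + u**2))   # principal branch: Re(d) >= 0
    g = (beta - d) / (beta + d)
    e = np.exp(-d * T)
    B = (beta - d) / xi**2 * (1.0 - e) / (1.0 - g * e)
    A = (r - q) * 1j * u * T + kappa * theta / xi**2 * (
        (beta - d) * T - 2.0 * np.log((1.0 - g * e) / (1.0 - g)))
    return A, B


def heston_cf(u, T, kappa, theta, xi, rho, v0, r=0.0, q=0.0):
    """phi_T(u) = E[exp(i u ln(S_T/S_0))]; u may be a real or complex scalar/array."""
    A, B = heston_AB(u, T, kappa, theta, xi, rho, r, q)
    return np.exp(A + B * v0)


def heston_AB_rk4(u, T, kappa, theta, xi, rho, r=0.0, q=0.0, n_steps=2000):
    """Cross-check: integrate the Riccati ODE for B and the linear ODE for A
    from A = B = 0 with classical RK4 (the right-hand side depends on B only)."""
    def rhs(B):
        dB = 0.5 * xi**2 * B**2 + (1j * rho * xi * u - kappa) * B - 0.5 * (u**2 + 1j * u)
        dA = (r - q) * 1j * u + kappa * theta * B
        return dA, dB
    A, B, h = 0j, 0j, T / n_steps
    for _ in range(n_steps):
        a1, b1 = rhs(B)
        a2, b2 = rhs(B + 0.5 * h * b1)
        a3, b3 = rhs(B + 0.5 * h * b2)
        a4, b4 = rhs(B + h * b3)
        A += h / 6.0 * (a1 + 2.0 * a2 + 2.0 * a3 + a4)
        B += h / 6.0 * (b1 + 2.0 * b2 + 2.0 * b3 + b4)
    return A, B


params = dict(kappa=2.0, theta=0.04, xi=0.5, rho=-0.7, r=0.03, q=0.01)
print(heston_cf(0.0, 1.0, v0=0.04, **params))    # phi(0) = 1
print(heston_cf(-1j, 1.0, v0=0.04, **params))    # phi(-i) = E[S_T/S_0] = exp((r - q) T)
print(heston_AB(2.0, 1.0, **params), heston_AB_rk4(2.0, 1.0, **params))
```

The ODE integration is only a test. In calibration the closed form is used directly, because it costs a handful of complex operations per $u$.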

Given the characteristic function, European option prices follow via the Lewis (2001) formula (covered in the Fourier & FFT Pricing course):

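$$C(K, T) = S_0 e^{-qT} - \frac{\sqrt{S_0 K}\,e^{-rT}}{\pi}\int_0^\infty \text{Re}\!\left[e^{-iuk}\,\phi_T\!\left(u - \tfrac{i}{2}\right)\right]\frac{du}{u^2 + \tfrac{1}{4}}$$

where $k = \ln(K/S_0)$ and $\phi_T$ is the characteristic function of $\ln(S_T/S_0)$ from Theorem 4.2, which already contains the $(r - q)$ drift. If the formula is instead written with the forward-normalised characteristic function of $\ln(S_T/F_T)$ and $k = \ln(K/F_T)$, the discount factor in front of the integral becomes $e^{-(r+q)T/2}$. The integral is evaluated by numerical quadrature (Gauss-Laguerre or adaptive Simpson) for a single strike-maturity pair, or by FFT for a full grid of strikes at a fixed maturity.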
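The Lewis integral can be sketched as a pricer that accepts any characteristic function. The sketch evaluates $C = S_0e^{-qT} - \frac{\sqrt{S_0K}\,e^{-rT}}{\pi}\int_0^\infty \text{Re}[e^{-iuk}\phi_T(u - i/2)]\frac{du}{u^2+1/4}$, with $k = \ln(K/S_0)$ and $\phi_T$ the characteristic function of $\ln(S_T/S_0)$. Two simplifying assumptions apply: the integral is truncated at `u_max` and evaluated with a plain trapezoid rule in place of Gauss-Laguerre or adaptive Simpson. The function names are hypothetical. To check the pricer, the Black-Scholes characteristic function is fed through it and the result is compared with the closed-form Black-Scholes price.

```python
import math
import numpy as np


def lewis_call(phi, s0, K, T, r, q, u_max=200.0, n_points=20_001):
    """European call via the Lewis (2001) single-integral formula.

    phi(u) must return E[exp(i u ln(S_T/S_0))] for complex array arguments,
    including the (r - q) drift. Truncation at u_max is an assumption:
    short maturities or large vol of vol need a larger u_max."""
    k = math.log(K / s0)
    u = np.linspace(0.0, u_max, n_points)
    integrand = (np.exp(-1j * u * k) * phi(u - 0.5j)).real / (u**2 + 0.25)
    h = u[1] - u[0]
    integral = h * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))  # trapezoid
    return s0 * math.exp(-q * T) - math.sqrt(s0 * K) * math.exp(-r * T) / math.pi * integral


def bs_call(s0, K, T, r, q, sigma):
    """Black-Scholes call, used here only to test lewis_call."""
    n = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    d1 = (math.log(s0 / K) + (r - q + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return s0 * math.exp(-q * T) * n(d1) - K * math.exp(-r * T) * n(d2)


s0, T, r, q, sigma = 100.0, 1.0, 0.03, 0.01, 0.2
# Black-Scholes characteristic function of ln(S_T/S_0), used as a test case.
bs_phi = lambda u: np.exp(1j * u * (r - q - 0.5 * sigma**2) * T - 0.5 * sigma**2 * u**2 * T)
for K in (80.0, 100.0, 120.0):
    print(K, lewis_call(bs_phi, s0, K, T, r, q), bs_call(s0, K, T, r, q, sigma))
```

To price under Heston, pass a function that returns the Theorem 4.2 characteristic function for complex arguments. The Heston integrand can decay more slowly than the Gaussian one, so `u_max` should be checked by increasing it until the price stops moving.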

5. The Smile Generated by Heston

The Heston model generates a non-flat implied vol surface with both skew and curvature. Understanding the smile shape qualitatively is as important as computing it numerically.

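Short-maturity behaviour. As $T \to 0$, the ATM implied vol tends to $\sqrt{V_0}$, and the ATM skew $\partial\sigma_{impl}/\partial k$ (with $k$ the log-moneyness) tends to the finite limit $\rho\xi/(4\sqrt{V_0})$. At short maturities the relevant strikes lie within a log-moneyness range of order $\sqrt{T}$, so the smile across them is close to flat. The vol of vol $\xi$ and correlation $\rho$ need time to inflate the wings. Short-dated skew is therefore limited in Heston, a known failure mode for index options at short maturities.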

Long-maturity behaviour. As $T \to \infty$, the variance mean-reverts to $\theta$ and the smile becomes approximately flat at $\sqrt{\theta}$. The skew and butterfly flatten at a rate determined by $\kappa$: faster mean reversion → faster smile decay.

Skew. Correlation $\rho$ drives the skew: when $\rho < 0$, spot-down moves are accompanied by variance-up moves, inflating the left wing and suppressing the right. The risk reversal at 25 delta increases (in absolute value) as $|\rho|$ increases.

Curvature. Vol of vol $\xi$ drives the curvature (butterfly). A large $\xi$ fattens both tails of the return distribution and inflates both wings, symmetrically when $\rho = 0$. The butterfly spread increases with $\xi$.

Smile dynamics. Unlike local vol, Heston generates sticky-delta behaviour to leading order. The model is homogeneous of degree one in $(S, K)$, so the smile expressed in moneyness $K/S_t$ depends only on $V_t$ and the maturity. For a given level of variance, the smile therefore translates with the spot, which is more consistent with observed market behaviour. Heston is still imperfect (empirically, its forward smile decays too fast), but it is a substantial improvement over local vol for forward-starting products.

REMARK

Remark: Skew-butterfly decomposition. In Heston, skew and curvature are not independently controlled: $\rho$ affects both the skew and, through its interaction with $\xi$, the curvature. This limited parameterisation is a structural constraint of the two-factor affine architecture. Models with more free parameters (SABR, rough vol) achieve better decoupling.

6. Calibration

Heston is calibrated by minimising the weighted squared error between model and market implied vols:

$$\min_{\kappa, \theta, \xi, \rho, V_0} \sum_{i,j} w_{ij}\left(\sigma_{impl}^{model}(K_i, T_j) - \sigma_{impl}^{mkt}(K_i, T_j)\right)^2$$

subject to: $\kappa > 0$, $\theta > 0$, $\xi > 0$, $|\rho| < 1$, $V_0 > 0$.

WARNING

Warning: Non-convexity. The Heston calibration objective is non-convex, with multiple local minima, and gradient-based solvers (L-BFGS-B, trust-region) are sensitive to initialisation. Standard practice: (1) initialise with a moment-matching estimate; (2) run multiple restarts from randomly perturbed initial conditions; (3) add a soft penalty for violations of the Feller condition $2\kappa\theta - \xi^2 \geq 0$ to avoid degenerate solutions.
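A sketch of the objective with a soft Feller penalty follows. It assumes a hypothetical `model_iv` callable that prices under Heston and inverts the prices to implied vols, which could be built from the Lewis formula and a root finder. Vega or bid-offer weights enter through `weights`. The quadratic form of the penalty and the name `feller_weight` are illustrative choices.

```python
import numpy as np


def calibration_objective(params, strikes, maturities, market_iv, weights,
                          model_iv, feller_weight=0.0):
    """Weighted squared implied-vol error plus an optional soft Feller penalty.

    params = (kappa, theta, xi, rho, v0). model_iv(params, strikes, maturities) is a
    hypothetical user-supplied pricer + implied-vol inverter returning an array."""
    kappa, theta, xi, rho, v0 = params
    if kappa <= 0 or theta <= 0 or xi <= 0 or v0 <= 0 or abs(rho) >= 1:
        return np.inf                                   # outside the admissible set
    residuals = np.asarray(model_iv(params, strikes, maturities)) - market_iv
    fit_error = float(np.sum(weights * residuals**2))
    feller_gap = max(0.0, xi**2 - 2.0 * kappa * theta)  # zero when Feller holds
    return fit_error + feller_weight * feller_gap**2


# Toy stand-in for model_iv (hypothetical): a flat smile at sqrt(v0).
flat_model_iv = lambda p, K, T: np.full(np.shape(K), np.sqrt(p[4]))
strikes = np.array([90.0, 100.0, 110.0])
maturities = np.ones(3)
market_iv = np.array([0.22, 0.20, 0.19])
print(calibration_objective((2.0, 0.04, 0.5, -0.7, 0.04), strikes, maturities,
                            market_iv, np.ones(3), flat_model_iv, feller_weight=10.0))
```

Returning `np.inf` outside the admissible region suits derivative-free restarts. With L-BFGS-B one would instead pass the same bounds to the solver and drop that branch.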

A practical calibration sequence:

  1. Estimate $V_0$ from the short-dated ATM implied vol: $V_0 \approx \sigma_{ATM}(T_{min})^2$.
  2. Estimate $\theta$ from the long-dated ATM implied vol: $\theta \approx \sigma_{ATM}(T_{max})^2$.
  3. Estimate $\rho$ from the short-dated ATM skew. The short-maturity Heston approximation $\partial\sigma_{impl}/\partial k \approx \rho\xi/(4\sqrt{V_0})$ at $k = \ln(K/F) = 0$ gives $\rho \approx 4\sqrt{V_0}\,(\partial\sigma_{impl}/\partial k)/\xi$, with $\xi$ set to an initial guess (rough approximation).
  4. Run the full optimisation with these as initial parameters.
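A sketch of such a starting point follows. It takes the $V_0$ and $\theta$ estimates literally. For $\rho$, it uses the short-maturity Heston skew approximation $\partial\sigma_{impl}/\partial k \approx \rho\xi/(4\sqrt{V_0})$ with a guessed $\xi$. The function name and the default $\kappa$ and $\xi$ guesses are hypothetical. The output is meant to seed the multi-start optimisation, not to replace it.

```python
import math


def heston_initial_guess(atm_vol_short, atm_vol_long, atm_skew_short,
                         xi_guess=0.5, kappa_guess=2.0):
    """Starting point (kappa, theta, xi, rho, v0) for the optimiser.

    atm_skew_short: d(sigma_impl)/dk at k = ln(K/F) = 0 for the shortest maturity.
    xi_guess and kappa_guess are hypothetical defaults, not market-implied values."""
    v0 = atm_vol_short**2                    # short-dated ATM vol squared
    theta = atm_vol_long**2                  # long-dated ATM vol squared
    # Short-maturity Heston skew: d sigma / dk ~ rho * xi / (4 sqrt(v0)).
    rho = 4.0 * math.sqrt(v0) * atm_skew_short / xi_guess
    rho = max(-0.95, min(0.95, rho))         # keep the guess strictly inside (-1, 1)
    return kappa_guess, theta, xi_guess, rho, v0


print(heston_initial_guess(atm_vol_short=0.20, atm_vol_long=0.25, atm_skew_short=-0.4375))
```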
REMARK

Remark: Weights. Standard weighting schemes: (a) equal weights; (b) vega weighting — up-weights near-ATM options which are most liquid; (c) inverse bid-offer weighting — downweights illiquid far-wing quotes. Vega weighting is the most common in practice.


Validation

The companion notebook verifies the following:

  1. CIR path simulation: simulate $V_t$ under Heston and verify that, at horizons long compared with $1/\kappa$, the empirical mean is $\approx \theta$ and the variance is $\approx \xi^2\theta/(2\kappa)$ (the CIR stationary variance).
  2. Characteristic function moment check: verify $\mathbb{E}[S_T] = S_0 e^{(r-q)T}$ from the characteristic function at $u = -i$, since $\phi_T(-i) = \mathbb{E}[S_T/S_0]$.
  3. MC vs analytic pricing: compare Heston Monte Carlo call prices to Lewis formula prices at several strikes and maturities.
  4. Parameter sensitivity: display how the implied vol smile changes as each parameter ($\kappa$, $\theta$, $\xi$, $\rho$) varies, with the others fixed.
PRACTICE

Before opening the notebook. Consider the Heston model with $S_0 = 100$, $r = q = 0$, $V_0 = 0.04$ ($\sigma_0 = 20\%$), $\theta = 0.04$, $\kappa = 2$, $\xi = 0.5$, $\rho = -0.7$, $T = 1$.

(a) Is the Feller condition satisfied? Compute $2\kappa\theta$ and $\xi^2$.

(b) What is the stationary variance of $V_t$? What is the stationary variance of $\sqrt{V_t}$?

(c) Qualitatively, does this parametrisation generate a left-skewed or right-skewed smile? Does it generate positive or negative curvature?


Limitations

WARNING

Short-dated smile calibration. Heston cannot fit steep short-dated smiles (maturities under 1 month for equity indices). Because the variance is a diffusion, the Heston ATM skew tends to a finite constant, $\rho\xi/(4\sqrt{V_0})$, as $T \to 0$. Observed equity index skews instead keep steepening at the short end, roughly like $T^{-1/2}$ (more precisely like $T^{H - 1/2}$ with a small Hurst exponent $H$). Rough volatility models (rough Bergomi, rough Heston) were developed specifically to address this.

WARNING

Feller violation. Calibrated Heston parameters frequently violate the Feller condition $2\kappa\theta \geq \xi^2$. When the condition is violated, $V_t$ hits zero with positive probability. Numerical schemes that do not handle this correctly (e.g., reflection schemes) can introduce systematic pricing bias. The Andersen (2008) QE scheme is the gold standard for zero-touching variance processes; full-truncation Euler is the standard approximation.

WARNING

Forward smile decay. Heston's forward smile flattens at a rate $\exp(-\kappa T)$. For mean-reversion speeds $\kappa \approx 2$, the forward smile at 1 year is already significantly flatter than the spot smile. This underestimates the premium of cliquets and forward-starting options relative to the market. Bergomi (2005) and subsequent models were designed to control the forward smile decay directly.

WARNING

Correlation instability. For extreme correlations ($|\rho| > 0.8$), evaluating the Heston characteristic function can become numerically unstable. The radicand inside the complex square root defining $d$ can have a small modulus, causing branch-cut issues even in the Albrecher formulation. In calibration, constraining $\rho \in (-0.95, 0)$ for equities is advisable.

Appropriate use cases:

  • Vanilla exotics where the smile dynamics (forward vol, skew persistence) matter: cliquets, forward-starting options, timer options.
  • Pricing barriers and digitals under stochastic vol to capture the smile impact on digital risk.
  • As the stochastic vol component in LSV models (leveraged Heston).
  • Research baseline for stochastic vol model comparisons.

Inappropriate use cases (where extensions are needed):

  • Very short-dated smile fitting (< 1 month): use rough vol or jump models.
  • Precise long-dated skew fitting (> 5 years): use multi-factor models (Bergomi, ZABR).
  • Variance swap vol-of-vol: Heston over-predicts vol-of-vol for long maturities.

Interview Angle

PRACTICE

L1 — Junior Quant / Quant Developer.

  1. "Write down the Heston model SDEs. What does each parameter control?" Expected: the two SDEs for $S_t$ and $V_t$, plus the correlation $\rho$. For each of the five parameters, the candidate should identify which surface feature it controls (see the table in §3). A common mistake: confusing $\xi$ (vol of vol, in units of $\text{yr}^{-1}$) with a vol level.

  2. "What is the Feller condition and why does it matter for Monte Carlo?" Expected: $2\kappa\theta \geq \xi^2$ ensures $V_t > 0$ a.s. For MC, a discretised Euler step can produce negative variance values, and a violated Feller condition makes this far more frequent. The full-truncation scheme prevents this by using $V_t^+ = \max(V_t, 0)$ in both the drift and diffusion coefficients.

  3. "Why is Heston preferred over local vol for cliquets?" Expected: Heston has stochastic vol (positive vol-of-vol), which generates richer forward smile dynamics. Local vol's forward smile collapses; Heston's does not collapse as fast. The forward smile in Heston is controlled by $\kappa$ and $\xi$.

PRACTICE

L2 — Senior Quant / Structurer.

  1. "How is the Heston call price computed analytically, and what is the role of the characteristic function?" Expected: the Lewis (2001) formula expresses the call price as a single Fourier integral of the characteristic function. The characteristic function is available in closed form (Theorem 4.2) because the Heston model is affine. The exponential-affine structure $\phi_T(u) = e^{A(u,T) + B(u,T)V_0}$ follows from the Riccati ODEs for $A$ and $B$, which for Heston are solvable in closed form. The candidate should mention the branch-cut issue in the original Heston (1993) formulation and the Albrecher et al. (2007) fix.

  2. "How does the Heston smile decay with maturity, and how does this differ from what markets show?" Expected: a standard approximation for the ATM skew of implied variance is $\partial_k \sigma_{BS}^2\big|_{k=0} \approx \frac{\rho\xi}{\kappa' T}\left[1 - \frac{1 - e^{-\kappa' T}}{\kappa' T}\right]$ with $\kappa' = \kappa - \rho\xi/2$. It is bounded, tends to $\rho\xi/2$ as $T \to 0$, and decays like $1/T$ as $T \to \infty$, at a rate controlled by $\kappa'$ (essentially $\kappa$). Observed equity index skews have a different term structure: they steepen at the short end and decay more slowly. Rough vol models (rough Bergomi, RFSV, rough Heston) produce an ATM skew scaling as $T^{H - 1/2}$ with $H < 1/2$ (the Hurst exponent), which better matches data.

  3. "If calibrated Heston gives $\kappa=3$, $\theta=0.04$, $\xi=0.8$, $\rho=-0.75$, $V_0=0.05$, is the Feller condition satisfied? What are the implications?" Expected: $2\kappa\theta = 0.24$ and $\xi^2 = 0.64$, so Feller is violated ($0.24 < 0.64$) and $V_t$ hits zero with positive probability. For pricing, use full-truncation Euler or the Andersen QE scheme. For calibration, either add a Feller penalty to the objective, or accept the violation and verify that the discretisation bias is small for the products being priced.
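The maturity dependence of the Heston skew can be made concrete with the standard approximation for the ATM implied-variance skew, $\partial_k \sigma_{BS}^2\big|_{k=0} \approx \frac{\rho\xi}{\kappa' T}\left[1 - \frac{1 - e^{-\kappa' T}}{\kappa' T}\right]$ with $\kappa' = \kappa - \rho\xi/2$. The sketch below evaluates it and converts it to a vol skew by dividing by $2\sigma_{ATM}$, assuming a given ATM vol. The function names are hypothetical.

```python
import math


def heston_atm_variance_skew(T, kappa, xi, rho):
    """Approximate d(sigma_BS^2)/dk at k = 0 for maturity T."""
    kp = kappa - 0.5 * rho * xi              # kappa' = kappa - rho * xi / 2
    x = kp * T
    if x < 1e-4:
        bracket = x / 2.0 - x * x / 6.0      # Taylor expansion of 1 - (1 - e^{-x}) / x
    else:
        bracket = 1.0 - (1.0 - math.exp(-x)) / x
    return rho * xi / x * bracket


def heston_atm_vol_skew(T, kappa, xi, rho, atm_vol):
    """Vol skew d(sigma)/dk = d(sigma^2)/dk / (2 sigma), assuming sigma = atm_vol."""
    return heston_atm_variance_skew(T, kappa, xi, rho) / (2.0 * atm_vol)


for T in (0.01, 0.25, 1.0, 5.0):
    print(T, heston_atm_vol_skew(T, kappa=2.0, xi=0.5, rho=-0.7, atm_vol=0.2))
```

The short end flattens out to the finite limit $\rho\xi/(4\sigma_{ATM})$, and the long end falls off like $1/T$. A rough-vol skew $\propto T^{H - 1/2}$ would instead keep growing as $T \to 0$.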

PRACTICE

L3 — Quant Researcher.

  1. "Derive the Riccati ODEs that the functions $A(u,T)$ and $B(u,T)$ in the Heston characteristic function satisfy. What property of the Heston model makes them solvable in closed form?" Expected: the Heston model is affine, meaning that the drift and the instantaneous covariance matrix of $(X_t, V_t)$ are affine functions of the state $(X_t, V_t)$. By the Duffie-Pan-Singleton (2000) theorem, the characteristic function of an affine process is exponential-affine in the state. Substitute $\exp\big(iux + A(u,T) + B(u,T)\,v\big)$, with $T$ now the time remaining to maturity, into the Feynman-Kac PDE for the conditional characteristic function. Matching coefficients then yields:

    • $\partial_T B = \frac{\xi^2}{2}B^2 + (i\rho\xi u - \kappa)B - \frac{1}{2}(u^2 + iu)$, with $B(u, 0) = 0$ (Riccati ODE for $B$)
    • $\partial_T A = (r-q)iu + \kappa\theta B$, with $A(u, 0) = 0$ (linear ODE for $A$, given $B$)

    Here $T$ denotes time to maturity. The Riccati ODE for $B$ is solvable in closed form because its coefficients are constant (independent of $T$), giving the explicit formula in Theorem 4.2.
  2. "How does the Gyöngy theorem connect Heston to Dupire local vol? What does the Heston local vol surface look like?" Expected: by Gyöngy, the Dupire local vol of the Heston model is given by $\sigma_{loc}^2(K,T) = \mathbb{E}^{\mathbb{Q}}[V_T \mid S_T = K]$, a conditional expectation that can be computed numerically. For $\rho < 0$, the Heston local vol surface is steeper (more skewed) than the Heston implied vol surface. This is consistent with the half-skew rule, under which the implied skew is roughly half the local vol skew. In an LSV model, the leverage function $L(S,t) = \sigma_{loc}^{Dupire}(S,t)\big/\sqrt{\mathbb{E}^{\mathbb{Q}}[V_t \mid S_t = S]}$ adjusts the stochastic vol to match the market surface. The conditional expectation is taken under the LSV dynamics themselves, so $L$ solves a fixed-point problem rather than being a simple ratio of two precomputed surfaces.

  3. "What is the variance term structure in Heston, and how is it related to the VIX?" Expected: under Heston, the expected integrated variance from 0 to $T$ is $\mathbb{E}\big[\int_0^T V_t\,dt\big] = \theta T + (V_0 - \theta)(1 - e^{-\kappa T})/\kappa$. The instantaneous forward variance is its derivative in $T$, $\theta + (V_0 - \theta)e^{-\kappa T}$. The VIX is a model-free measure of 30-day expected variance. For a pure diffusion such as Heston, matching the VIX requires the model's annualised 30-day expected average variance, $\frac{1}{\tau}\mathbb{E}\big[\int_0^\tau V_t\,dt\big]$ with $\tau = 30$ days, to equal $\text{VIX}^2$ (with the VIX expressed as a decimal). Multi-factor variance models (Bergomi, 2-factor CIR) allow independent control of the short-end and long-end variance term structure, which single-factor Heston cannot achieve.
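The variance term structure and the model-implied VIX take only a few lines. This sketch treats VIX² as the annualised expected average variance over 30 calendar days, which is the pure-diffusion idealisation. The 30/365 day count is an assumption, and the function names are hypothetical.

```python
import math


def expected_integrated_variance(T, v0, kappa, theta):
    """E[int_0^T V_t dt] under the CIR variance dynamics."""
    return theta * T + (v0 - theta) * (1.0 - math.exp(-kappa * T)) / kappa


def forward_variance(T, v0, kappa, theta):
    """Instantaneous forward variance E[V_T] = d/dT E[int_0^T V_t dt]."""
    return theta + (v0 - theta) * math.exp(-kappa * T)


def heston_model_vix(v0, kappa, theta, horizon=30.0 / 365.0):
    """sqrt of the annualised expected average variance over `horizon` years.

    Returned as a decimal (0.20 corresponds to a VIX of 20). The horizon uses an
    assumed 30/365 day count; the official index methodology has further details."""
    return math.sqrt(expected_integrated_variance(horizon, v0, kappa, theta) / horizon)


print(heston_model_vix(v0=0.09, kappa=2.0, theta=0.04))   # between 0.20 and 0.30
```

With $V_0$ above $\theta$, the model VIX sits between $\sqrt{\theta}$ and $\sqrt{V_0}$, closer to $\sqrt{V_0}$ because 30 days is short relative to $1/\kappa$.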