Numerical MethodsMonte CarloVariance ReductionQuasi-Monte Carlo

Monte Carlo: Antithetic Variates, Control Variates, Quasi-MC

22 min readLevel: Medium

Setup

What Monte Carlo Computes

Monte Carlo (MC) methods estimate expectations of the form

μ=E[f(X)],\mu = \mathbb{E}[f(X)],

where XX is a random variable (or vector) with a known distribution and ff is a payoff or functional. In derivatives pricing, XX is typically a path of the risk-neutral process and ff is the discounted payoff.

The standard MC estimator with NN i.i.d. samples X1,,XNX_1, \ldots, X_N is

μ^N=1Ni=1Nf(Xi).\hat{\mu}_N = \frac{1}{N}\sum_{i=1}^N f(X_i).

By the law of large numbers, μ^Nμ\hat{\mu}_N \to \mu almost surely. By the central limit theorem,

N(μ^Nμ)dN(0,σf2),σf2=Var(f(X)).\sqrt{N}\left(\hat{\mu}_N - \mu\right) \xrightarrow{d} \mathcal{N}(0, \sigma_f^2), \qquad \sigma_f^2 = \mathrm{Var}(f(X)).

The standard error is SE=σf/N\mathrm{SE} = \sigma_f / \sqrt{N}, converging at rate O(N1/2)O(N^{-1/2}) regardless of the dimension of XX. This is the central virtue of MC: convergence rate is dimension-independent.

Notation

  • σf2=Var(f(X))\sigma_f^2 = \mathrm{Var}(f(X)): variance of the estimator (per sample).
  • NN: number of independent sample paths.
  • σ^f2\hat{\sigma}_f^2: sample variance estimator, used to construct confidence intervals.
  • The 95%95\% confidence interval for μ\mu is μ^N±1.96σ^f/N\hat{\mu}_N \pm 1.96 \cdot \hat{\sigma}_f / \sqrt{N}.

A key principle: reducing σf\sigma_f by a factor kk is equivalent to increasing NN by k2k^2. Variance reduction techniques achieve this without generating more samples.


Antithetic Variates

Principle

For each sample XiX_i, generate a paired sample XiX_i' from the antithetic distribution — typically by negating the underlying normal draw. The antithetic estimator is:

μ^NAV=1Ni=1Nf(Xi)+f(Xi)2.\hat{\mu}_N^{\mathrm{AV}} = \frac{1}{N}\sum_{i=1}^N \frac{f(X_i) + f(X_i')}{2}.

This is unbiased: E ⁣[f(X)+f(X)2]=μ\mathbb{E}\!\left[\frac{f(X) + f(X')}{2}\right] = \mu since XX' has the same marginal distribution as XX.

Variance Analysis

Var ⁣(f(X)+f(X)2)=Var(f(X))+Var(f(X))+2Cov(f(X),f(X))4.\mathrm{Var}\!\left(\frac{f(X) + f(X')}{2}\right) = \frac{\mathrm{Var}(f(X)) + \mathrm{Var}(f(X')) + 2\,\mathrm{Cov}(f(X), f(X'))}{4}.

Since XX and XX' are identically distributed, Var(f(X))=Var(f(X))=σf2\mathrm{Var}(f(X)) = \mathrm{Var}(f(X')) = \sigma_f^2. Therefore:

Var ⁣(f(X)+f(X)2)=σf2+Cov(f(X),f(X))2.\mathrm{Var}\!\left(\frac{f(X) + f(X')}{2}\right) = \frac{\sigma_f^2 + \mathrm{Cov}(f(X), f(X'))}{2}.

Variance is reduced whenever Cov(f(X),f(X))<0\mathrm{Cov}(f(X), f(X')) < 0. Compared to the standard estimator variance σf2/N\sigma_f^2 / N using 2N2N samples (matching the computational cost — two paths per antithetic pair), the antithetic estimator variance is:

1Nσf2+Cov(f(X),f(X))2vs.σf22N.\frac{1}{N} \cdot \frac{\sigma_f^2 + \mathrm{Cov}(f(X), f(X'))}{2} \quad \text{vs.} \quad \frac{\sigma_f^2}{2N}.

The efficiency gain is:

Efficiency gain=σf2/(2N)(σf2+Cov)/2N=σf2σf2+Cov(f(X),f(X))=11+ρf,f,\text{Efficiency gain} = \frac{\sigma_f^2 / (2N)}{(\sigma_f^2 + \mathrm{Cov})/2N} = \frac{\sigma_f^2}{\sigma_f^2 + \mathrm{Cov}(f(X), f(X'))} = \frac{1}{1 + \rho_{f,f'}},

where ρf,f\rho_{f,f'} is the correlation between f(X)f(X) and f(X)f(X'). For this to be a gain, we need ρf,f<0\rho_{f,f'} < 0.

Application to GBM Call Option

Under GBM, one path realises WT=TZW_T = \sqrt{T} Z for ZN(0,1)Z \sim \mathcal{N}(0,1). The antithetic path uses Z-Z:

ST=S0exp ⁣((r12σ2)T+σTZ),ST=S0exp ⁣((r12σ2)TσTZ).S_T = S_0 \exp\!\left((r - \tfrac{1}{2}\sigma^2)T + \sigma\sqrt{T}\, Z\right), \qquad S_T' = S_0 \exp\!\left((r - \tfrac{1}{2}\sigma^2)T - \sigma\sqrt{T}\, Z\right).

The payoff f(Z)=erT(STK)+f(Z) = e^{-rT}(S_T - K)^+ is an increasing function of ZZ. When ZZ is large (call in the money), Z-Z is large and negative (call out of the money). So Cov(f(Z),f(Z))<0\mathrm{Cov}(f(Z), f(-Z)) < 0 for a convex, monotone payoff — antithetic variates are effective.

For path-dependent options, antithetic variates require generating the full antithetic path {Z1,,Zn}n=1Nsteps\{-Z_1, \ldots, -Z_n\}_{n=1}^{N_\text{steps}}.


Control Variates

Principle

Let YY be a random variable correlated with f(X)f(X) such that E[Y]=μY\mathbb{E}[Y] = \mu_Y is known analytically. The control variate estimator is:

μ^NCV(c)=1Ni=1N[f(Xi)c(YiμY)].\hat{\mu}_N^{\mathrm{CV}}(c) = \frac{1}{N}\sum_{i=1}^N \left[f(X_i) - c(Y_i - \mu_Y)\right].

This is unbiased for any constant cc: the correction term c(YiμY)c(Y_i - \mu_Y) has mean zero.

Optimal Control Coefficient

The variance of f(X)c(YμY)f(X) - c(Y - \mu_Y) is:

Var(f(X)cY)=σf22cCov(f,Y)+c2σY2.\mathrm{Var}(f(X) - c Y) = \sigma_f^2 - 2c\,\mathrm{Cov}(f,Y) + c^2 \sigma_Y^2.

Minimising over cc:

c=Cov(f(X),Y)Var(Y).c^* = \frac{\mathrm{Cov}(f(X), Y)}{\mathrm{Var}(Y)}.

The variance after control variate at c=cc = c^*:

Var=σf2(1ρf,Y2),\mathrm{Var}^* = \sigma_f^2\left(1 - \rho_{f,Y}^2\right),

where ρf,Y\rho_{f,Y} is the correlation between f(X)f(X) and YY. The variance reduction factor is 1ρf,Y21 - \rho_{f,Y}^2; a correlation of ρ=0.9|\rho| = 0.9 gives 81%81\% variance reduction.

In practice, cc^* is estimated from a pilot sample or computed jointly with the main estimator using the sample covariance. The bias from estimating cc^* is O(1/N)O(1/N) — negligible for large NN.

Common Control Variates in Option Pricing

Underlying as control. Under Q\mathbb{Q}, EQ[ST]=S0erT\mathbb{E}^{\mathbb{Q}}[S_T] = S_0 e^{rT}. Set Y=erTSTY = e^{-rT} S_T; then μY=S0\mu_Y = S_0. For a call option, Corr(f(ST),ST)\mathrm{Corr}(f(S_T), S_T) is high (near 1 for at-the-money options), giving substantial reduction.

European as control for path-dependent. For an Asian option on an arithmetic average, the geometric-average Asian has a known closed form. The correlation between arithmetic and geometric payoffs is high (0.99\approx 0.99 in typical parameters), yielding near-perfect variance reduction.

Delta-hedged portfolio. For any smooth payoff, the discounted stock price increment 0TertSCdSt\int_0^T e^{-rt}\partial_S C \, dS_t has mean zero (it is a martingale under Q\mathbb{Q}). Using this as a control variate is known as martingale control variates and can achieve variance reduction close to 1ρ20.991 - \rho^2 \approx 0.99 for smooth payoffs.


Quasi-Monte Carlo

Motivation

Standard MC uses pseudo-random numbers, which are i.i.d. uniform on [0,1]d[0,1]^d. The expected error is O(N1/2)O(N^{-1/2}). Can we do better by replacing pseudo-random sequences with deterministic low-discrepancy sequences that fill the hypercube more uniformly?

Discrepancy

The star discrepancy of a sequence {x1,,xN}[0,1]d\{x_1, \ldots, x_N\} \subset [0,1]^d is:

DN=sup[a,b][0,1]d#{i:xi[a,b]}NVol([a,b]).D_N^* = \sup_{[a,b] \subseteq [0,1]^d} \left|\frac{\#\{i : x_i \in [a,b]\}}{N} - \mathrm{Vol}([a,b])\right|.

It measures the worst-case deviation between the empirical distribution of the sequence and the uniform distribution. A sequence is low-discrepancy (LD) if DN=O ⁣((logN)d/N)D_N^* = O\!\left((\log N)^d / N\right) — much smaller than the O(N1/2)O(N^{-1/2}) discrepancy of a random sequence.

Koksma-Hlawka Inequality

For a function ff of bounded Hardy-Krause variation V(f)V(f):

1Ni=1Nf(xi)[0,1]df(u)duV(f)DN.\left|\frac{1}{N}\sum_{i=1}^N f(x_i) - \int_{[0,1]^d} f(u)\, du\right| \leq V(f) \cdot D_N^*.

For a low-discrepancy sequence: error V(f)O ⁣((logN)d/N)\leq V(f) \cdot O\!\left((\log N)^d / N\right), which beats O(N1/2)O(N^{-1/2}) for fixed dd and large NN. In practice, the improvement is dramatic for d10d \leq 10 and smooth ff.

Sobol Sequences

Sobol sequences are the most widely used LD sequences in finance. They are digital nets constructed in base 2 using generating matrices that ensure equidistribution across dyadic subintervals. Key properties:

  • Each dimension is independently constructed (with cross-dimension properties ensured via scrambling).
  • The first 2k2^k points of a Sobol sequence cover [0,1]d[0,1]^d in a maximally stratified manner.
  • Scrambled Sobol (Owen 1995) randomises the sequence while preserving the LD property, enabling unbiased estimation and confidence intervals.

Effective Dimension

The Koksma-Hlawka bound grows with dd. For high-dimensional problems (e.g., simulating a 252-step path = d=252d = 252), the theoretical advantage of QMC may be lost. In practice, what matters is the effective dimension: for a GBM path, the variation in the payoff is dominated by a few directions (early time steps, large moves). The Brownian bridge construction assigns Sobol dimensions to the most influential time points first, concentrating the low-discrepancy property where it matters most.


Convergence Comparison

MethodExpected errorNotes
Standard MCO(N1/2)O(N^{-1/2})Dimension-independent, easy to implement
Antithetic variatesO(N1/2)O(N^{-1/2})Reduces variance by (1+ρ)/2(1+\rho)/2, no extra paths
Control variatesO(N1/2)O(N^{-1/2})Reduces variance by 1ρ21 - \rho^2, needs known E[Y]\mathbb{E}[Y]
Quasi-MC (Sobol)O((logN)d/N)O((\log N)^d / N)Pre-asymptotic gains for d20d \leq 20, smooth integrands
Randomised QMCO(N1/21/d)O(N^{-1/2-1/d}) (theoretical)Confidence intervals available; best for smooth functions

All methods have the same asymptotic O(N1/2)O(N^{-1/2}) rate in the worst case for non-smooth payoffs. The gains are problem-dependent.


Limitations

Non-smooth payoffs. Digital options have a payoff with a jump discontinuity. The Koksma-Hlawka bound requires bounded variation — a digital payoff has V(f)=V(f) = \infty in the Hardy-Krause sense (discontinuous). QMC loses its advantage; standard MC performs comparably. Smoothing the payoff (e.g., replace the digital with a call spread) restores QMC efficiency.

Curse of dimensionality. The (logN)d(\log N)^d factor in QMC becomes prohibitive for d>50d > 50 at typical sample sizes. The effective-dimension reduction via Brownian bridge or principal component analysis (PCA) construction is essential.

Antithetic for discontinuous payoffs. Antithetic variates can increase variance if the payoff is not monotone (e.g., a butterfly spread). Always verify the sign of Cov(f(X),f(X))\mathrm{Cov}(f(X), f(X')) before applying.

Control variate estimation. If cc^* is estimated from the same sample used for the main estimate, the combined estimator is biased (though at rate O(1/N)O(1/N)). For small NN or high-stakes applications, use a separate pilot sample for cc^* estimation.


Interview Angle

L1. State the Monte Carlo convergence rate. How many additional paths are needed to halve the standard error? What is the standard error of a MC estimate of a Black-Scholes call price?

Rate: SE=σf/N\mathrm{SE} = \sigma_f / \sqrt{N}. To halve SE\mathrm{SE}, multiply NN by 4. For a BS call: simulate ST(i)=S0exp((rσ2/2)T+σTZi)S_T^{(i)} = S_0 \exp((r - \sigma^2/2)T + \sigma\sqrt{T} Z_i), compute fi=erT(ST(i)K)+f_i = e^{-rT}(S_T^{(i)} - K)^+. Then μ^N=fˉN\hat{\mu}_N = \bar{f}_N and SE=σ^f/N\mathrm{SE} = \hat{\sigma}_f / \sqrt{N} where σ^f2=1N1(fifˉN)2\hat{\sigma}_f^2 = \frac{1}{N-1}\sum(f_i - \bar{f}_N)^2.

L2. Derive the optimal control variate coefficient cc^* and the resulting variance reduction formula 1ρf,Y21 - \rho_{f,Y}^2. Explain why EQ[ST]=S0erT\mathbb{E}^{\mathbb{Q}}[S_T] = S_0 e^{rT} makes the discounted stock a valid control variate.

Derivation: Var(fcY)=σf22cCov(f,Y)+c2σY2\mathrm{Var}(f - cY) = \sigma_f^2 - 2c\,\mathrm{Cov}(f,Y) + c^2\sigma_Y^2. Differentiate in cc and set to zero: c=Cov(f,Y)/σY2c^* = \mathrm{Cov}(f,Y)/\sigma_Y^2. Substituting back: Var=σf2[Cov(f,Y)]2/σY2=σf2(1ρf,Y2)\mathrm{Var}^* = \sigma_f^2 - [\mathrm{Cov}(f,Y)]^2/\sigma_Y^2 = \sigma_f^2(1 - \rho_{f,Y}^2).

Why STS_T is valid: Under Q\mathbb{Q}, the discounted stock erTSTe^{-rT}S_T is a martingale, so EQ[erTST]=S0\mathbb{E}^{\mathbb{Q}}[e^{-rT}S_T] = S_0 — this is known analytically. Setting Yi=erTST(i)Y_i = e^{-rT}S_T^{(i)} with μY=S0\mu_Y = S_0, the correction term c(YiS0)c^*(Y_i - S_0) has zero mean, making the control variate estimator unbiased.

L3. State the Koksma-Hlawka inequality. Under what conditions does quasi-MC outperform standard MC? Explain the role of the Brownian bridge construction in making Sobol sequences effective for path-dependent options.

Koksma-Hlawka: 1Nf(xi)fV(f)DN|\frac{1}{N}\sum f(x_i) - \int f| \leq V(f) \cdot D_N^*. For LD sequences DN=O((logN)d/N)D_N^* = O((\log N)^d/N), so the error is O(V(f)(logN)d/N)O(V(f)(\log N)^d/N) vs. O(σf/N)O(\sigma_f/\sqrt{N}) for standard MC. QMC wins when: (1) ff has bounded variation V(f)<V(f) < \infty (smooth or Lipschitz payoffs), (2) dd is moderate (the (logN)d(\log N)^d factor is manageable), and (3) NN is large enough for the pre-asymptotic advantage to materialise.

Brownian bridge: For a path {Wt1,,Wtn}\{W_{t_1}, \ldots, W_{t_n}\}, the standard construction maps Sobol dimension jj to time step tjt_j in order. But the payoff variance is dominated by the terminal value WTW_T and large-scale moves, not fine-scale increments. The Brownian bridge construction uses Sobol dimension 1 for WTW_T, dimension 2 for WT/2W_{T/2} (interpolated given W0W_0 and WTW_T), then fills in midpoints recursively. This assigns the lowest-discrepancy dimensions to the highest-variance components of the path, making the effective dimension of the integral much smaller than nn and restoring QMC efficiency.

Verify your understanding before moving on.

Start Quiz →