
Lp Spaces and Modes of Convergence

Module 5 of 5 · 20 min read · Level: Medium

Setup

Mathematical context

The theory built in Modules 1–4 requires a precise language for comparing random variables and for describing sequences that converge to a limit. Four distinct notions of convergence appear naturally in quantitative finance: almost sure convergence, convergence in probability, $L^p$ convergence, and convergence in distribution. They are not equivalent, and confusing them produces incorrect results in Monte Carlo error bounds, in applications of the Central Limit Theorem, and in the conditions required for the Optional Stopping Theorem (OST).

This module establishes the $L^p$ function spaces that are the natural home for random variables, derives the fundamental inequalities (Markov, Jensen, Hölder, Minkowski), characterises the four modes of convergence, and introduces uniform integrability. Uniform integrability is the condition that upgrades almost sure convergence (or convergence in probability) to $L^1$ convergence, and it appears explicitly in the OST conditions of Module 4.

Stated assumptions

  • $(\Omega, \mathcal{F}, \mathbb{P})$ is a complete probability space (Module 1).
  • Random variables are real-valued measurable functions $X : \Omega \to \mathbb{R}$ (or $\mathbb{R} \cup \{+\infty\}$ where noted).
  • Lebesgue integration is used throughout (Module 2). Markov, Hölder, Minkowski and the completeness of $L^p$ hold on any measure space. Jensen's inequality needs a probability measure, and the inclusions of Theorem 5.2 need a finite measure.
  • Conventions: $p \in [1, \infty)$ unless otherwise stated; the case $p = \infty$ is treated separately.
INSIGHT

Financial Insight. On a trading desk, the choice of $L^p$ space is a modelling assumption, not an abstraction. An option payoff in $L^2(\mathbb{Q})$ has finite variance under the risk-neutral measure, a natural requirement when the error of a Black-Scholes delta hedge is measured in mean square. A payoff that is only in $L^1$ has a finite price but not necessarily a finite mean-square hedging error. Under log-normal dynamics every power $S_T^{\alpha}$ has finite moments of all orders. Exotic payoffs such as power options ($S_T^{1.5}$) can nonetheless fail to be in $L^2$ under heavier-tailed dynamics, for example stochastic volatility models whose moments explode at long maturities, and standard variance and Greeks formulas then break down. Uniform integrability is the condition that justifies passing a limit through an expectation, a step that Monte Carlo convergence arguments take repeatedly.


Theory

1. Lp spaces

DEFINITION

Definition 5.1 ($L^p$ space). For $p \in [1, \infty)$, define

$$\|X\|_p := \left(\mathbb{E}[|X|^p]\right)^{1/p}, \qquad L^p(\Omega, \mathcal{F}, \mathbb{P}) := \{X : \|X\|_p < \infty\}.$$

For $p = \infty$: $\|X\|_\infty := \mathrm{ess\,sup}|X|$, the essential supremum (the smallest $M \geq 0$ such that $\mathbb{P}(|X| > M) = 0$).

Elements of $L^p$ are equivalence classes of random variables that agree $\mathbb{P}$-almost everywhere.

The $L^p$ norm measures the average magnitude of $X$ raised to the $p$-th power. For $p = 1$, $\|X\|_1 = \mathbb{E}[|X|]$ is the mean absolute value. For $p = 2$, $\|X\|_2 = \sqrt{\mathbb{E}[X^2]}$ is the root mean square, which is the natural norm for variance and hedging error.

THEOREM

Theorem 5.1 (Riesz-Fischer: completeness of $L^p$). For $p \in [1, \infty]$, $L^p(\Omega, \mathcal{F}, \mathbb{P})$ is a Banach space (a complete normed vector space). In particular, every Cauchy sequence in $L^p$ has a limit in $L^p$.

Completeness guarantees that Itô integrals, which are defined as $L^2$ limits of integrals of simple integrands, actually exist in $L^2$.

THEOREM

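Theorem 5.2 ($L^p$ inclusions on probability spaces). For a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ with $1 \leq p \leq q \leq \infty$:

$$L^q(\Omega, \mathcal{F}, \mathbb{P}) \subseteq L^p(\Omega, \mathcal{F}, \mathbb{P}), \qquad \|X\|_p \leq \|X\|_q.$$

For finite $q$, the norm inequality follows from Jensen's inequality (Theorem 5.4) applied to the convex map $x \mapsto x^{q/p}$ and the variable $|X|^p$. The inclusion relies on $\mathbb{P}(\Omega) = 1$; it survives, with a constant, on any finite measure space. On a general measure space it can fail. With counting measure on $\mathbb{N}$ the inclusion reverses, and on $\mathbb{R}$ with Lebesgue measure neither inclusion holds. On a probability space the practical consequence is that $L^2$ is a strictly smaller, better-behaved space than $L^1$: a finite second moment implies a finite mean, but not conversely.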
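Theorem 5.2 can be illustrated numerically on a three-point distribution chosen only for this purpose. On a probability space the map $p \mapsto \|X\|_p$ is nondecreasing and approaches $\|X\|_\infty$. The probabilities are kept exact with `Fraction`. The norms are computed in floating point, because fractional powers are generally irrational.

```python
from fractions import Fraction

# Illustrative three-point distribution: (value of X, probability).
OUTCOMES = [(-3.0, Fraction(1, 10)), (0.5, Fraction(3, 5)), (2.0, Fraction(3, 10))]
assert sum(prob for _, prob in OUTCOMES) == 1

def lp_norm(outcomes, p):
    """||X||_p = (E|X|^p)^(1/p); p = float('inf') returns the essential supremum."""
    if p == float("inf"):
        # On a finite space, ess sup |X| is the largest |x| carrying positive probability.
        return max(abs(x) for x, prob in outcomes if prob > 0)
    return sum(float(prob) * abs(x) ** p for x, prob in outcomes) ** (1.0 / p)

ps = [1, 1.5, 2, 3, 4, 8, float("inf")]
norms = [lp_norm(OUTCOMES, p) for p in ps]
for p, value in zip(ps, norms):
    print(f"p = {p:>4}: ||X||_p = {value:.6f}")
```

The printed norms start at $1.2$ for $p = 1$, pass through $1.5$ for $p = 2$, and increase towards $\|X\|_\infty = 3$.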

2. Fundamental inequalities

THEOREM

Theorem 5.3 (Markov's inequality). For $X \geq 0$ and $\lambda > 0$:

$$\mathbb{P}(X \geq \lambda) \leq \frac{\mathbb{E}[X]}{\lambda}.$$

Applying it to $|Y|^p$ at level $\lambda^p$ gives $\mathbb{P}(|Y| \geq \lambda) \leq \mathbb{E}[|Y|^p]/\lambda^p$. The case $p = 2$, with $Y$ replaced by $Y - \mathbb{E}[Y]$, is Chebyshev's inequality: $\mathbb{P}(|Y - \mathbb{E}[Y]| \geq \lambda) \leq \mathrm{Var}(Y)/\lambda^2$.

Markov is the simplest probabilistic bound. Its proof is a one-line application of monotonicity of the integral: $\mathbb{E}[X] \geq \mathbb{E}[X \cdot \mathbf{1}_{\{X \geq \lambda\}}] \geq \lambda \, \mathbb{P}(X \geq \lambda)$. The bound is tight. For $\lambda \geq 1$, take $\mathbb{P}(X = \lambda) = 1/\lambda$ and $\mathbb{P}(X = 0) = 1 - 1/\lambda$; then both sides equal $1/\lambda$.
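This sketch evaluates both sides of Markov's inequality exactly, using rational arithmetic. It covers one generic nonnegative distribution (an illustrative choice) and the extremal two-point distribution described above. The helper `markov_sides` is hypothetical and written only for this illustration.

```python
from fractions import Fraction

def markov_sides(dist, lam):
    """Return (P(X >= lam), E[X] / lam) for a nonnegative discrete X given as {value: probability}."""
    tail = sum(p for x, p in dist.items() if x >= lam)
    bound = sum(x * p for x, p in dist.items()) / lam
    return tail, bound

# Illustrative nonnegative distribution: the bound holds with room to spare.
generic = {Fraction(0): Fraction(1, 2), Fraction(1): Fraction(1, 4), Fraction(4): Fraction(1, 4)}
print(markov_sides(generic, Fraction(2)))   # P(X >= 2) = 1/4, E[X]/2 = 5/8

# Extremal two-point law for lam >= 1: P(X = lam) = 1/lam, P(X = 0) = 1 - 1/lam.
lam = Fraction(5)
extremal = {lam: 1 / lam, Fraction(0): 1 - 1 / lam}
tail, bound = markov_sides(extremal, lam)
print(tail, bound)                          # equality: both sides are 1/5
```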

THEOREM

Theorem 5.4 (Jensen's inequality). Let $\varphi : \mathbb{R} \to \mathbb{R}$ be convex and $X \in L^1$. Then:

$$\varphi(\mathbb{E}[X]) \leq \mathbb{E}[\varphi(X)].$$

For concave $\varphi$, the inequality reverses.

PROOF

Proof. Since $\varphi$ is convex, every $a \in \mathbb{R}$ has a supporting line: there is some $c \in \mathbb{R}$ (a subgradient at $a$) with $\varphi(x) \geq \varphi(a) + c(x - a)$ for all $x$. Set $a = \mathbb{E}[X]$, substitute $x = X$, and take expectations on both sides:

$$\mathbb{E}[\varphi(X)] \geq \varphi(\mathbb{E}[X]) + c(\mathbb{E}[X] - \mathbb{E}[X]) = \varphi(\mathbb{E}[X]). \quad \square$$

Jensen is omnipresent in finance. Convexity of the call payoff means a call on the average price is worth no more than the average of the corresponding calls. With $\bar{S} = \frac{1}{n}\sum_i S_{t_i}$, we have $(\bar{S} - K)^+ \leq \frac{1}{n}\sum_i (S_{t_i} - K)^+$ path by path; this is Jensen's inequality in Asian option pricing. For log-normal expectations, convexity of $\exp$ gives $\mathbb{E}[e^X] \geq e^{\mathbb{E}[X]}$. Convexity of $x \mapsto x^2$ (equivalently, concavity of $\sqrt{\cdot}$) gives $\sqrt{\mathbb{E}[X^2]} \geq \mathbb{E}[|X|]$, i.e., $\|X\|_2 \geq \|X\|_1$.
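A short sketch can check the two scalar consequences and the Asian-option comparison. The discrete distribution and the GBM parameters are illustrative assumptions: drift 0.02, volatility 0.3, monthly monitoring over one year, strike 100. The helper `gbm_path` is hypothetical. The Asian inequality holds path by path, so it also holds for the sample averages up to floating-point rounding. No Monte Carlo error is involved in the comparison itself.

```python
import math
import random
from fractions import Fraction

# 1) Scalar consequences on an illustrative discrete law: X in {-1, 0, 2}.
dist = {-1: Fraction(1, 4), 0: Fraction(1, 4), 2: Fraction(1, 2)}
mean = sum(x * p for x, p in dist.items())                    # E[X] = 3/4
exp_of_mean = math.exp(mean)                                   # e^{E[X]}
mean_of_exp = sum(float(p) * math.exp(x) for x, p in dist.items())   # E[e^X]
l1 = sum(abs(x) * p for x, p in dist.items())                  # ||X||_1 = 5/4
l2 = math.sqrt(sum(x * x * p for x, p in dist.items()))        # ||X||_2 = 3/2
print(exp_of_mean, mean_of_exp, l1, l2)

# 2) Asian call vs. the average of European calls on the monitoring dates.
def gbm_path(s0, mu, sigma, dt, steps, rng):
    """One geometric Brownian motion path on a uniform grid (illustrative dynamics)."""
    path, s = [], s0
    for _ in range(steps):
        z = rng.gauss(0.0, 1.0)
        s *= math.exp((mu - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * z)
        path.append(s)
    return path

rng = random.Random(7)
K, n_paths = 100.0, 10_000
asian_total, strip_total = 0.0, 0.0
for _ in range(n_paths):
    path = gbm_path(100.0, 0.02, 0.3, 1 / 12, 12, rng)
    average = sum(path) / len(path)
    asian_total += max(average - K, 0.0)                             # payoff on the average
    strip_total += sum(max(s - K, 0.0) for s in path) / len(path)    # average of payoffs
asian, strip = asian_total / n_paths, strip_total / n_paths
print(asian, strip)
```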

THEOREM

Theorem 5.5 (Hölder's inequality). For $p, q \in (1, \infty)$ with $1/p + 1/q = 1$ (Hölder conjugates):

$$\mathbb{E}[|XY|] \leq \|X\|_p \|Y\|_q.$$

The case $p = q = 2$ is the Cauchy-Schwarz inequality: $\mathbb{E}[|XY|] \leq \|X\|_2 \|Y\|_2$.

Hölder's inequality is used to bound covariance terms in option pricing, e.g., to show that $\mathbb{E}[\Delta \cdot S] \leq \|\Delta\|_2 \|S\|_2$ for a self-financing portfolio with square-integrable delta. It is also used in the theory of stochastic integration, where it controls the cross-terms.
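The empirical distribution of a sample is itself a probability measure. Hölder's inequality therefore holds exactly for sample averages, up to rounding. The sketch below checks it for several conjugate pairs on simulated, deliberately dependent data. The distributions, seed and sample size are illustrative choices, and `holder_sides` is a hypothetical helper.

```python
import random

def expect(values):
    """Expectation under the empirical (uniform) measure on the sample."""
    return sum(values) / len(values)

def holder_sides(xs, ys, p):
    """(E|XY|, ||X||_p ||Y||_q) under the empirical measure, with q = p/(p-1) so 1/p + 1/q = 1."""
    q = p / (p - 1.0)
    lhs = expect([abs(x * y) for x, y in zip(xs, ys)])
    norm_x = expect([abs(x) ** p for x in xs]) ** (1.0 / p)
    norm_y = expect([abs(y) ** q for y in ys]) ** (1.0 / q)
    return lhs, norm_x * norm_y

rng = random.Random(42)
xs = [rng.gauss(0.0, 1.0) for _ in range(5_000)]
ys = [x + rng.expovariate(1.0) for x in xs]        # correlated with X, skewed to the right

for p in (1.5, 2.0, 3.0):                           # p = 2 is Cauchy-Schwarz
    lhs, rhs = holder_sides(xs, ys, p)
    print(f"p = {p}: E|XY| = {lhs:.4f} <= {rhs:.4f}")
```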

THEOREM

Theorem 5.6 (Minkowski's inequality). For $p \in [1, \infty)$:

$$\|X + Y\|_p \leq \|X\|_p + \|Y\|_p.$$

This is the triangle inequality for the $L^p$ norm. It shows that $L^p$ is closed under addition, so it is a vector space. It also shows that $\|\cdot\|_p$ is a norm on it, on equivalence classes of a.e.-equal variables.

3. Modes of convergence

We consider a sequence $(X_n)_{n \geq 1}$ and a limit $X$, all defined on $(\Omega, \mathcal{F}, \mathbb{P})$.

DEFINITION

Definition 5.2 (Four modes of convergence).

(AS) Almost sure convergence: $X_n \xrightarrow{\mathrm{a.s.}} X$ if $\mathbb{P}(\{\omega : X_n(\omega) \to X(\omega)\}) = 1$.

(P) Convergence in probability: $X_n \xrightarrow{\mathbb{P}} X$ if $\mathbb{P}(|X_n - X| > \varepsilon) \to 0$ for every $\varepsilon > 0$.

(Lp) $L^p$ convergence: $X_n \xrightarrow{L^p} X$ if $\|X_n - X\|_p \to 0$.

(D) Convergence in distribution: $X_n \xrightarrow{D} X$ if $\mathbb{E}[f(X_n)] \to \mathbb{E}[f(X)]$ for all bounded continuous $f$.

The four modes are not totally ordered. Almost sure and $L^p$ convergence are incomparable, but each implies convergence in probability. The implication diagram is:

$$L^p \Rightarrow \mathbb{P} \Leftarrow \mathrm{a.s.}, \qquad \mathbb{P} \Rightarrow D.$$

In words, both $L^p$ and a.s. convergence imply convergence in probability, which in turn implies convergence in distribution. Two further facts hold: $L^q \Rightarrow L^p$ for $q \geq p$ (Theorem 5.2), and there are partial converses (along a subsequence, under uniform integrability, or when the distributional limit is a constant). Apart from these, no other general implication holds.

EXAMPLE

Example 5.1 (Implication failures — the standard counterexamples).

(a) A.s. does not imply $L^1$. Let $\Omega = [0,1]$ with Lebesgue measure and set $X_n = n \cdot \mathbf{1}_{[0, 1/n]}$. Then $X_n(\omega) \to 0$ for every $\omega > 0$ (i.e., a.s.), but $\mathbb{E}[X_n] = n \cdot (1/n) = 1 \not\to 0$. So $X_n \to 0$ a.s. but $X_n \not\to 0$ in $L^1$.

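(b) $L^1$ does not imply a.s. The typewriter sequence: for each level $m = 0, 1, 2, \ldots$, split $[0,1)$ into the $2^m$ intervals $[k/2^m, (k+1)/2^m)$, $0 \leq k < 2^m$. List all of them in order of $m$ and then $k$ as $I_1, I_2, \ldots$, so that $I_n$ has length $2^{-m(n)}$ with $m(n) \to \infty$. Set $X_n = \mathbf{1}_{I_n}$. Then $\mathbb{E}[X_n] = 2^{-m(n)} \to 0$, so $X_n \to 0$ in $L^1$, and indeed in every $L^p$, since $\mathbb{E}[|X_n|^p] = 2^{-m(n)}$. However, every $\omega \in [0,1)$ lies in exactly one interval of each level. So $X_n(\omega) = 1$ infinitely often and $X_n(\omega) = 0$ infinitely often, and the sequence converges at no point of $[0,1)$, hence not a.s.

(c) Convergence in probability does not imply a.s. The typewriter sequence also serves here: $\mathbb{P}(|X_n| > \varepsilon) \leq 2^{-m(n)} \to 0$, so $X_n \to 0$ in probability, but not a.s.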
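The typewriter sequence is easy to get wrong when indexing. The following minimal sketch assumes the enumeration $n = 2^m + k$ with $0 \leq k < 2^m$, which is one concrete way to index the intervals "sequentially". The helper names are hypothetical. It computes $\mathbb{E}[|X_n|^p]$ exactly and counts how often a fixed $\omega$ is hit.

```python
from fractions import Fraction

def typewriter_interval(n):
    """I_n = [k/2^m, (k+1)/2^m) for n = 2^m + k with 0 <= k < 2^m (n >= 1)."""
    m = n.bit_length() - 1          # level m: 2^m <= n < 2^(m+1)
    k = n - 2 ** m                  # position within the level
    return Fraction(k, 2 ** m), Fraction(k + 1, 2 ** m)

def X(n, omega):
    """X_n(omega) = indicator that omega lies in I_n."""
    a, b = typewriter_interval(n)
    return 1 if a <= omega < b else 0

def pth_moment(n):
    """E[|X_n|^p] = length of I_n = 2^(-m), the same for every p >= 1."""
    a, b = typewriter_interval(n)
    return b - a

print([pth_moment(n) for n in (1, 2, 4, 8, 1024)])   # 1, 1/2, 1/4, 1/8, 1/1024

# A fixed omega lies in exactly one interval per level, so X_n(omega) = 1 infinitely often.
omega = Fraction(1, 3)
hits = [n for n in range(1, 2 ** 12) if X(n, omega) == 1]
print(len(hits), [h.bit_length() - 1 for h in hits])  # one hit on each level 0, ..., 11
```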

REMARK

Remark (Subsequence criterion). $X_n \xrightarrow{\mathbb{P}} X$ if and only if every subsequence $(X_{n_k})$ has a further subsequence $(X_{n_{k_j}})$ with $X_{n_{k_j}} \xrightarrow{\mathrm{a.s.}} X$. This is one of the most useful tools in stochastic analysis for promoting convergence in probability to almost sure convergence along a subsequence.
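For the typewriter sequence, the criterion can be seen directly. Assume the indexing $n = 2^m + k$ with $0 \leq k < 2^m$. The subsequence $n_j = 2^j$ then picks the interval $[0, 2^{-j})$ at each level, and $X_{n_j}(\omega) \to 0$ for every $\omega > 0$. Only $\omega = 0$, a null set, stays at 1. The sketch below (the helper `X` is hypothetical) finds the last index $j$ at which a given $\omega$ is still hit.

```python
from fractions import Fraction

def X(n, omega):
    """Typewriter sequence with n = 2^m + k: indicator of [k/2^m, (k+1)/2^m)."""
    m = n.bit_length() - 1
    k = n - 2 ** m
    return 1 if Fraction(k, 2 ** m) <= omega < Fraction(k + 1, 2 ** m) else 0

# Subsequence n_j = 2^j (k = 0): X_{n_j} is the indicator of [0, 2^-j).
J = 20
for omega in (Fraction(1, 3), Fraction(1, 1000), Fraction(7, 10 ** 6)):
    values = [X(2 ** j, omega) for j in range(J)]
    # Once 2^-j <= omega the indicator stays 0, so X_{n_j}(omega) -> 0 for every omega > 0.
    last_hit = max(j for j, v in enumerate(values) if v == 1)
    print(omega, last_hit, values[-5:])
```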

4. Uniform integrability

DEFINITION

Definition 5.3 (Uniform integrability). A family $\{X_\alpha\}$ of random variables is uniformly integrable (UI) if

$$\lim_{M \to \infty} \sup_\alpha \, \mathbb{E}\!\left[|X_\alpha| \cdot \mathbf{1}_{\{|X_\alpha| > M\}}\right] = 0.$$

Intuitively, the tail contributions of all the $X_\alpha$ become negligible simultaneously as the truncation level $M$ grows. Convergence in probability controls where $X_n$ is far from $X$, but not how large it is there. Uniform integrability supplies exactly that missing tail control, and the two together give $L^1$ convergence.
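The definition can be evaluated exactly for the spike family $X_n = n \cdot \mathbf{1}_{[0,1/n]}$ of Example 5.1(a). For contrast, the sketch also uses a hypothetical bounded family $Y_n = \min(n, 5) \cdot \mathbf{1}_{[0,1/2]}$, introduced only here. For the spike family, the supremum over $n$ of the tail term equals 1 at every truncation level, so the family is not UI. The bounded family's tail terms vanish once $M \geq 5$.

```python
from fractions import Fraction

def tail_spike(n, M):
    """E[X_n 1{X_n > M}] for X_n = n * 1_[0,1/n] under Lebesgue measure on [0,1]."""
    # X_n equals n on a set of measure 1/n and 0 elsewhere, so the term is n * (1/n) = 1 or 0.
    return Fraction(1) if n > M else Fraction(0)

def tail_capped(n, M, cap=5):
    """Same quantity for the bounded family Y_n = min(n, cap) * 1_[0,1/2] (illustrative)."""
    level = min(n, cap)
    return Fraction(level, 2) if level > M else Fraction(0)

for M in (1, 2, 10, 100, 1000):
    # The range contains n = M + 1 and n = cap, so the finite max equals the sup over all n.
    ns = range(1, 10 * M + 2)
    sup_spike = max(tail_spike(n, M) for n in ns)
    sup_capped = max(tail_capped(n, M) for n in ns)
    print(M, sup_spike, sup_capped)   # spike: stays 1 (not UI); capped: 0 once M >= cap
```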

THEOREM

Theorem 5.7 (Vitali convergence theorem). Let $(X_n)$ be a UI family with $X_n \xrightarrow{\mathbb{P}} X$. Then $X \in L^1$ and $X_n \to X$ in $L^1$.

This is the definitive statement bridging convergence in probability and $L^1$ convergence. The Dominated Convergence Theorem (Module 2) is a special case. If $|X_n| \leq Y$ with $Y \in L^1$, then $\{X_n\}$ is UI, because $\mathbb{E}[|X_n| \cdot \mathbf{1}_{\{|X_n| > M\}}] \leq \mathbb{E}[Y \cdot \mathbf{1}_{\{Y > M\}}] \to 0$.

REMARK

Remark (UI and martingale theory). A martingale $(M_t)$ is UI if and only if it converges a.s. and in $L^1$ to a terminal variable $M_\infty$ with $M_t = \mathbb{E}[M_\infty \mid \mathcal{F}_t]$. This is Doob's $L^1$ martingale convergence theorem. In terms of Module 4, the OST condition "the stopped martingale is UI" says precisely that the stopped process has a well-behaved $L^1$ limit.

EXAMPLE

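Example 5.2 (UI in option pricing). Work under the risk-neutral measure $\mathbb{Q}$ with $r \geq 0$. For a bounded payoff $g$ (e.g., a call with a notional cap), the family $\{e^{-rT} g(S_T) : T \geq 0\}$ is UI, because it is dominated by the constant $\sup |g|$. For an unbounded payoff such as $g(S_T) = S_T^2$ (a power option), UI depends both on the tails of $S_T$ and on the range of maturities. Under log-normal dynamics all moments of $S_T$ are finite and bounded over $T \in [0, \bar{T}]$. The family restricted to a bounded horizon is therefore bounded in $L^{1+\varepsilon}$, and hence UI. Over all $T \geq 0$, however, its mean $S_0^2 e^{(r + \sigma^2)T}$ (no dividends) grows without bound, so the family is not UI. Interchanging a limit and an expectation without checking UI is one way to introduce a Monte Carlo bias that does not diminish with sample size.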
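The sketch below assumes geometric Brownian motion under $\mathbb{Q}$ with constant rate $r$, volatility $\sigma$ and no dividends; the parameter values are illustrative. It compares the closed form $\mathbb{E}^\mathbb{Q}[e^{-rT} S_T^2] = S_0^2 e^{(r + \sigma^2)T}$ with a Monte Carlo estimate at $T = 1$. It then shows how the quantity grows with $T$, which is why the family cannot be UI over an unbounded horizon. Both function names are hypothetical.

```python
import math
import random

def discounted_second_moment(s0, r, sigma, T):
    """Closed form E^Q[e^{-rT} S_T^2] = s0^2 exp((r + sigma^2) T) for GBM under Q (no dividends)."""
    return s0 ** 2 * math.exp((r + sigma ** 2) * T)

def mc_discounted_second_moment(s0, r, sigma, T, n, rng):
    """Monte Carlo estimate of the same quantity, sampling S_T exactly from its log-normal law."""
    total = 0.0
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        sT = s0 * math.exp((r - 0.5 * sigma ** 2) * T + sigma * math.sqrt(T) * z)
        total += math.exp(-r * T) * sT ** 2
    return total / n

s0, r, sigma = 100.0, 0.03, 0.2
for T in (1, 10, 50, 100):
    print(T, discounted_second_moment(s0, r, sigma, T))   # grows without bound in T

rng = random.Random(2024)
est = mc_discounted_second_moment(s0, r, sigma, 1.0, 100_000, rng)
exact = discounted_second_moment(s0, r, sigma, 1.0)
print(est, exact)
```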


Validation

The companion notebook verifies:

  1. $L^p$ norms on a finite discrete probability space using exact rational arithmetic: computes $\|X\|_1$, $\|X\|_2$, $\|X\|_\infty$ and confirms the $L^q \subseteq L^p$ inclusion numerically.
  2. Jensen's inequality: for $\varphi(x) = e^x$ (convex) and $\varphi(x) = \log x$ (concave), verifies $\varphi(\mathbb{E}[X]) \leq \mathbb{E}[\varphi(X)]$ and the reverse inequality, respectively.
  3. Hölder's and Cauchy-Schwarz inequalities: verified on explicit numerical examples.
  4. Convergence counterexamples: simulates the sequence $X_n = n \cdot \mathbf{1}_{[0,1/n]}$, confirming a.s. convergence to 0 while $\mathbb{E}[X_n] = 1$ for all $n$.
  5. Uniform integrability: empirically verifies that a bounded family is UI while an unbounded family fails the UI criterion at each truncation level.
PRACTICE

Hand exercise.

(a) Let $X$ be uniform on $\{1, 2, 3, 4\}$ (each value with probability $1/4$). Compute $\|X\|_1$, $\|X\|_2$, and $\|X\|_\infty$ exactly. Confirm $\|X\|_1 \leq \|X\|_2 \leq \|X\|_\infty$.

(b) Let $\varphi(x) = x^2$ (convex). With $X$ as above, verify Jensen's inequality $\varphi(\mathbb{E}[X]) \leq \mathbb{E}[\varphi(X)]$ by direct computation.

(c) Give an explicit example of a sequence $(X_n)$ on $[0,1]$ that converges to 0 in $L^2$ but not almost surely.
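To check the hand computations in (a) and (b) afterwards, here is a short sketch with exact rational arithmetic. $\|X\|_2 = \sqrt{15/2}$ is irrational, so it is computed in floating point.

```python
import math
from fractions import Fraction

# X uniform on {1, 2, 3, 4}, each value with probability 1/4.
values = [1, 2, 3, 4]
prob = Fraction(1, 4)

norm1 = sum(prob * abs(x) for x in values)                  # ||X||_1 = E|X|
second_moment = sum(prob * x ** 2 for x in values)          # E[X^2]
norm2 = math.sqrt(second_moment)                            # ||X||_2
norm_inf = max(abs(x) for x in values)                      # every value has positive mass

print(norm1, second_moment, norm2, norm_inf)
assert norm1 <= norm2 <= norm_inf

# (b) Jensen with phi(x) = x^2: compare phi(E[X]) with E[phi(X)].
mean = sum(prob * x for x in values)
print(mean ** 2, second_moment)     # 25/4 <= 15/2
assert mean ** 2 <= second_moment
```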


Limitations

$L^p$ inclusions fail on infinite measure spaces. On $(\mathbb{R}, \mathcal{B}(\mathbb{R}), \lambda)$ with Lebesgue measure, $L^2 \not\subseteq L^1$: the function $f(x) = 1/(1 + |x|)$ is in $L^2(\mathbb{R})$ but not in $L^1(\mathbb{R})$. The reverse inclusion fails too: $|x|^{-1/2}\mathbf{1}_{(0,1]}(x)$ is in $L^1(\mathbb{R})$ but not in $L^2(\mathbb{R})$. The inclusion $L^q \subseteq L^p$ for $q \geq p$ is specific to finite measure spaces. Applying the probability-space inclusion argument to general measure spaces is a common error when working with unnormalised Monte Carlo weights.
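The claim about $f(x) = 1/(1+|x|)$ can be checked with truncated integrals over $[-R, R]$. The closed forms $2\log(1+R)$ and $2(1 - 1/(1+R))$ follow from elementary antiderivatives and are cross-checked here with a midpoint rule. The $L^1$ mass grows without bound, while the squared $L^2$ norm $\int f^2$ tends to 2. All function names are hypothetical.

```python
import math

def f(x):
    """f(x) = 1 / (1 + |x|) on the real line."""
    return 1.0 / (1.0 + abs(x))

def integral_f(R):
    """Integral of f over [-R, R] in closed form: 2 log(1 + R)."""
    return 2.0 * math.log1p(R)

def integral_f_squared(R):
    """Integral of f^2 over [-R, R] in closed form: 2 (1 - 1/(1 + R))."""
    return 2.0 * (1.0 - 1.0 / (1.0 + R))

def midpoint(g, a, b, n=200_000):
    """Midpoint-rule approximation of the integral of g over [a, b], as a cross-check."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

print(midpoint(f, -10.0, 10.0), integral_f(10.0))
for R in (10.0, 1e3, 1e6, 1e12):
    # The L^1 mass keeps growing like 2 log R; the squared L^2 norm converges to 2.
    print(R, integral_f(R), integral_f_squared(R))
```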

WARNING

Warning (confusing convergence modes in Monte Carlo). The Strong Law of Large Numbers gives a.s. convergence of sample means: $\bar{X}_n \xrightarrow{\mathrm{a.s.}} \mu$. The Central Limit Theorem gives convergence in distribution: $\sqrt{n}(\bar{X}_n - \mu) \xrightarrow{D} N(0, \sigma^2)$. The Monte Carlo standard error $\sigma/\sqrt{n}$ is an $L^2$ statement: for i.i.d. samples with variance $\sigma^2$, $\|\bar{X}_n - \mu\|_2 = \sigma/\sqrt{n}$ exactly. These three statements concern different modes of convergence and cannot be combined without justification. In particular, the CLT rate does not imply a.s. convergence at rate $1/\sqrt{n}$. By the law of the iterated logarithm, the a.s. fluctuations are of order $\sqrt{2\sigma^2 \log\log n / n}$, in the sense that $\limsup_n |\bar{X}_n - \mu| / \sqrt{2\sigma^2 \log\log n / n} = 1$ a.s.
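The $L^2$ reading of the standard error can be verified exactly in the simplest case. Assume i.i.d. Bernoulli($p$) samples (an illustrative choice). Summing over the Binomial($n, p$) law with rational arithmetic gives $\mathbb{E}[(\bar{X}_n - p)^2] = p(1-p)/n$, i.e., $\|\bar{X}_n - \mu\|_2 = \sigma/\sqrt{n}$. This is a statement about the mean-square error across samples. It says nothing about the a.s. behaviour of any single sequence of sample means.

```python
from fractions import Fraction
from math import comb

def mean_square_error(n, p=Fraction(1, 2)):
    """Exact E[(Xbar_n - p)^2] for i.i.d. Bernoulli(p), by summing over the Binomial(n, p) law."""
    return sum(
        comb(n, k) * p ** k * (1 - p) ** (n - k) * (Fraction(k, n) - p) ** 2
        for k in range(n + 1)
    )

for n in (1, 4, 16, 64):
    mse = mean_square_error(n)
    # With p = 1/2, sigma^2 = 1/4, so ||Xbar_n - mu||_2^2 = sigma^2 / n = 1/(4n).
    print(n, mse, mse == Fraction(1, 4 * n))
```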

Uniform integrability requires verification, not assumption. UI is often claimed without proof. A sufficient condition is $\sup_\alpha \mathbb{E}[|X_\alpha|^{1+\varepsilon}] < \infty$ for some $\varepsilon > 0$: a family bounded in $L^{1+\varepsilon}$ is UI. Checking this requires knowing the tail behaviour of $X_\alpha$. For path-dependent payoffs under stochastic volatility models, that behaviour is not always available in closed form and may have to be assessed numerically.
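The bound behind the $L^{1+\varepsilon}$ condition takes one line; it is a special case of the de la Vallée-Poussin criterion. On the event $\{|X_\alpha| > M\}$ we have $(|X_\alpha|/M)^{\varepsilon} > 1$, so:

```latex
\mathbb{E}\!\left[|X_\alpha| \, \mathbf{1}_{\{|X_\alpha| > M\}}\right]
  \le \mathbb{E}\!\left[|X_\alpha| \left(\frac{|X_\alpha|}{M}\right)^{\varepsilon} \mathbf{1}_{\{|X_\alpha| > M\}}\right]
  \le \frac{\sup_\beta \mathbb{E}\!\left[|X_\beta|^{1+\varepsilon}\right]}{M^{\varepsilon}}
  \xrightarrow[M \to \infty]{} 0 .
```

The right-hand side does not depend on $\alpha$, so the convergence is uniform over the family, which is exactly what Definition 5.3 requires.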

Jensen's inequality is not a pricing shortcut. The inequality $\varphi(\mathbb{E}[X]) \leq \mathbb{E}[\varphi(X)]$ shows that the expected value of a convex payoff is at least the payoff evaluated at the expected value. It does not compute the price: the gap $\mathbb{E}[\varphi(X)] - \varphi(\mathbb{E}[X])$ depends on the full distribution of $X$, not just its mean.

Convergence in distribution says nothing about the joint behaviour of the random variables. Two sequences $(X_n)$ and $(Y_n)$ can converge in distribution to the same limit while being defined on completely different probability spaces. Even on a common space, $X_n \xrightarrow{D} X$ and $Y_n \xrightarrow{D} X$ do not imply $X_n - Y_n \xrightarrow{D} 0$. For example, with $X \sim N(0,1)$, both $X_n = X$ and $Y_n = -X$ converge in distribution to $N(0,1)$, yet $X_n - Y_n = 2X$. The Skorokhod representation theorem addresses this gap: to recover pathwise properties from distributional convergence, one must construct a coupling on a common probability space.


Interview Angle

PRACTICE

L1 — Junior. Expected: definitions and basic implications.

  1. "What is the difference between almost sure convergence and convergence in probability?" A.s.: the event $\{X_n \to X\}$ has probability 1, so every individual path converges except on a null set. In probability: for each $\varepsilon > 0$, the probability of being more than $\varepsilon$ away from the limit goes to 0. A.s. is a pathwise condition. Convergence in probability only constrains the probability of a deviation at each fixed $n$; it depends on the joint law of $(X_n, X)$ for each $n$, not on whole paths. A.s. implies in probability; the converse fails (typewriter sequence counterexample).

  2. "State Jensen's inequality and give one application in option pricing." For convex $\varphi$: $\varphi(\mathbb{E}[X]) \leq \mathbb{E}[\varphi(X)]$. Application: let $F = \mathbb{E}^\mathbb{Q}[S_T] = S_0 e^{(r-q)T}$ be the forward price. Convexity of the call payoff gives $\mathbb{E}^\mathbb{Q}[(S_T - K)^+] \geq (\mathbb{E}^\mathbb{Q}[S_T] - K)^+ = (F - K)^+$, so the call price satisfies $C \geq e^{-rT}(F - K)^+$. This is a model-free lower bound on call prices.

  3. "Which $L^p$ space is standard for defining the Itô integral, and why?" $L^2$. The Itô integral $\int_0^T H_s \, dB_s$ is defined as the $L^2$ limit of integrals of simple integrands. The Itô isometry $\mathbb{E}\!\left[\left(\int_0^T H_s \, dB_s\right)^2\right] = \mathbb{E}\!\left[\int_0^T H_s^2 \, ds\right]$ requires $H$ to be progressively measurable with $H \in L^2([0,T] \times \Omega)$.
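The call-price bound from answer 2 can be checked numerically. This sketch assumes Black-Scholes-Merton dynamics with a continuous dividend yield $q$, and the parameter values are illustrative. The bound $e^{-rT}(F - K)^+$ itself uses only $F = S_0 e^{(r-q)T}$ and convexity, so the Black-Scholes price serves here only as one model that must respect it. As $\sigma \to 0$ the call price collapses onto the bound. The function names are hypothetical.

```python
import math

def norm_cdf(x):
    """Standard normal CDF via math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(s0, K, r, q, sigma, T):
    """Black-Scholes-Merton call price with continuous dividend yield q, written via the forward."""
    F = s0 * math.exp((r - q) * T)                   # forward price E^Q[S_T]
    d1 = (math.log(F / K) + 0.5 * sigma ** 2 * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return math.exp(-r * T) * (F * norm_cdf(d1) - K * norm_cdf(d2))

def jensen_lower_bound(s0, K, r, q, T):
    """Model-free bound e^{-rT} (F - K)^+ from Jensen applied to the convex call payoff."""
    F = s0 * math.exp((r - q) * T)
    return math.exp(-r * T) * max(F - K, 0.0)

s0, r, q, sigma, T = 100.0, 0.04, 0.01, 0.25, 1.0
for K in (80.0, 100.0, 103.0, 120.0):
    c, lb = bs_call(s0, K, r, q, sigma, T), jensen_lower_bound(s0, K, r, q, T)
    print(f"K={K:6.1f}  call={c:8.4f}  bound={lb:8.4f}")
```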

PRACTICE

L2 — Senior. Expected: proofs and counterexamples.

  1. "Give an example of a sequence that converges almost surely but not in $L^1$. What property is it missing?" Take $X_n = n \cdot \mathbf{1}_{[0,1/n]}$ on $[0,1]$. It converges to 0 a.s. (for every $\omega > 0$), but $\mathbb{E}[X_n] = 1$ for all $n$. Missing property: uniform integrability. The tail terms satisfy $\mathbb{E}[X_n \cdot \mathbf{1}_{\{X_n > M\}}] = 1$ for all $n > M$, so their supremum over $n$ does not go to 0 as $M \to \infty$.

  2. "Prove the Markov inequality in one line." For $X \geq 0$: $\mathbb{E}[X] = \mathbb{E}[X \cdot \mathbf{1}_{\{X < \lambda\}}] + \mathbb{E}[X \cdot \mathbf{1}_{\{X \geq \lambda\}}] \geq 0 + \lambda \cdot \mathbb{P}(X \geq \lambda)$. Divide by $\lambda$. $\square$

  3. "What is uniform integrability and why does it appear in the Optional Stopping Theorem?" UI means $\sup_\alpha \mathbb{E}[|X_\alpha| \cdot \mathbf{1}_{\{|X_\alpha| > M\}}] \to 0$ as $M \to \infty$. In the OST, take an unbounded stopping time $\tau$ with $\tau < \infty$ a.s. The stopped process $M_{\tau \wedge n}$ converges a.s. to $M_\tau$, and $\mathbb{E}[M_{\tau \wedge n}] = \mathbb{E}[M_0]$ for every $n$. To conclude $\mathbb{E}[M_\tau] = \mathbb{E}[M_0]$, we need to pass the expectation through the limit, which requires UI of $(M_{\tau \wedge n})$. Without UI, a.s. convergence does not imply $L^1$ convergence, as the $X_n = n \cdot \mathbf{1}_{[0,1/n]}$ example shows.
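A standard example of this failure (assumed here; it is not worked in the text above) is a simple symmetric random walk $M$ started at 0 and stopped at $\tau = \inf\{n : M_n = 1\}$. By recurrence $\tau < \infty$ a.s., so $M_{\tau \wedge n} \to M_\tau = 1$ a.s. Yet $\mathbb{E}[M_{\tau \wedge n}] = 0$ for every $n$. The family is therefore not UI, and the OST conclusion fails. The sketch propagates the exact law of $M_{\tau \wedge n}$ with rational arithmetic; the helper `stopped_walk_law` is hypothetical.

```python
from fractions import Fraction

def stopped_walk_law(n):
    """Exact law of M_{tau ^ n} for a simple symmetric random walk M started at 0,
    with tau the first hitting time of +1. Returns {value: probability}."""
    law = {0: Fraction(1)}
    for _ in range(n):
        nxt = {}
        for x, p in law.items():
            if x == 1:                      # already stopped at level 1
                nxt[1] = nxt.get(1, 0) + p
            else:                           # take a +1 or -1 step with probability 1/2 each
                for step in (-1, 1):
                    nxt[x + step] = nxt.get(x + step, 0) + p / 2
        law = nxt
    return law

for n in (1, 10, 50, 200):
    law = stopped_walk_law(n)
    mean = sum(x * p for x, p in law.items())   # stays exactly 0 (martingale property)
    p_stopped = law.get(1, 0)                   # P(tau <= n), increasing towards 1
    print(n, mean, float(p_stopped))
```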

PRACTICE

L3 — Researcher. Expected: model critique and deeper structure.

  1. "The Dominated Convergence Theorem is a special case of which more general theorem? State the generalisation." The Vitali convergence theorem: if $X_n \xrightarrow{\mathbb{P}} X$ and $\{X_n\}$ is UI, then $X_n \to X$ in $L^1$. DCT is the case where UI follows from the domination $|X_n| \leq Y \in L^1$, since $\mathbb{E}[|X_n| \cdot \mathbf{1}_{\{|X_n| > M\}}] \leq \mathbb{E}[Y \cdot \mathbf{1}_{\{Y > M\}}] \to 0$. The Vitali theorem is strictly more general: it applies when no integrable dominating variable exists but UI holds by other means (e.g., a bounded $L^{1+\varepsilon}$ norm).

  2. "In what sense is $L^2(\Omega, \mathcal{F}_T, \mathbb{Q})$ the natural space for contingent claims? When does a claim fail to be in $L^2$?" $L^2(\mathbb{Q})$ is a Hilbert space, with inner product $\langle X, Y \rangle = \mathbb{E}^\mathbb{Q}[XY]$. This structure is what makes the Itô isometry, the projection characterisation of conditional expectation, and the martingale representation theorem work cleanly. A claim fails to be in $L^2$ when $\mathbb{E}^\mathbb{Q}[g(S_T)^2] = \infty$. Consider log-normal dynamics with drift $\mu$ (equal to $r - q$ under $\mathbb{Q}$) and volatility $\sigma$. There, a power option $S_T^\alpha$ with $\alpha > 1$ is always in $L^2$, since $\mathbb{E}[S_T^{2\alpha}] = S_0^{2\alpha} \exp\big(2\alpha\mu T + \alpha(2\alpha - 1)\sigma^2 T\big) < \infty$ for every finite $T$. This second moment does, however, grow rapidly with $\alpha$ and $T$. Failure of $L^2$ requires heavier tails, for example moment explosions in stochastic volatility models. In rough volatility models (Hurst parameter $H < 1/2$), the volatility is driven by a fractional process that is not a semimartingale, so parts of the standard Itô toolkit need extension.

  3. "State the Skorokhod representation theorem and explain why convergence in distribution does not imply convergence in probability on the original probability space." Skorokhod: suppose $X_n \xrightarrow{D} X$ for real-valued (or, more generally, separable-metric-space-valued) variables. Then there exist random variables $\tilde{X}_n, \tilde{X}$ on a common probability space such that $\tilde{X}_n \overset{d}{=} X_n$, $\tilde{X} \overset{d}{=} X$, and $\tilde{X}_n \xrightarrow{\mathrm{a.s.}} \tilde{X}$. For real-valued variables, that space can be taken to be $([0,1], \mathcal{B}, \lambda)$. Distributional convergence is a property of the laws, not of the random variables themselves. $X_n$ and $X$ may even live on different probability spaces, and on a common space they need not be close. For example, with $X \sim N(0,1)$ and $X_n = -X$, $X_n \xrightarrow{D} X$ but $|X_n - X| = 2|X|$ does not go to 0 in probability. The Skorokhod construction realises the laws on the same space via quantile functions, $\tilde{X}_n = F_n^{-1}(U)$ with $U$ uniform. The marginal laws are kept, and only the joint law (the coupling) is changed.
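The quantile construction in answer 3 can be sketched for real-valued laws with Python's `statistics.NormalDist`. The sketch assumes illustrative Gaussian laws $X_n \sim N(1/n, (1 + 1/n)^2)$, which converge in distribution to $N(0,1)$. Setting $\tilde{X}_n(u) = F_n^{-1}(u)$ on $([0,1], \mathcal{B}, \lambda)$ gives each $\tilde{X}_n$ exactly the law of $X_n$. Here, for every fixed $u \in (0,1)$, the values converge to $\tilde{X}(u) = \Phi^{-1}(u)$. In general, convergence holds at continuity points of the limiting quantile function, hence Lebesgue-a.e.

```python
from statistics import NormalDist

def law(n):
    """Illustrative law of X_n: N(1/n, (1 + 1/n)^2), converging in distribution to N(0, 1)."""
    return NormalDist(mu=1.0 / n, sigma=1.0 + 1.0 / n)

limit = NormalDist(mu=0.0, sigma=1.0)

# Coupling on ([0,1], Borel, Lebesgue): Xtilde_n(u) = F_n^{-1}(u), Xtilde(u) = Phi^{-1}(u).
# Each Xtilde_n has exactly the law of X_n, and Xtilde_n(u) -> Xtilde(u) for every fixed u.
for u in (0.01, 0.3, 0.5, 0.9):
    path = [law(n).inv_cdf(u) for n in (1, 10, 100, 10_000)]
    print(u, [round(v, 4) for v in path], round(limit.inv_cdf(u), 4))
```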