Setup
Mathematical context
The theory built in Modules 1–4 requires a precise language for comparing random variables and for describing how a sequence converges to a limit. Four distinct notions of convergence appear naturally in quantitative finance: almost sure convergence, convergence in probability, $L^p$ convergence, and convergence in distribution. These are not equivalent, and confusing them produces incorrect results in Monte Carlo error bounds, in Central Limit Theorem applications, and in the conditions required for the Optional Stopping Theorem.
This module establishes the $L^p$ function spaces that are the natural home for random variables, derives the fundamental inequalities (Markov, Jensen, Hölder, Minkowski), characterises the four modes of convergence, and introduces uniform integrability, the concept that bridges $L^1$ and almost sure convergence and appears explicitly in the OST conditions of Module 4.
Stated assumptions
- $(\Omega, \mathcal{F}, \mathbb{P})$ is a complete probability space (Module 1).
- Random variables are real-valued measurable functions $X : \Omega \to \mathbb{R}$ (or $\mathbb{R}^d$-valued where noted).
- Lebesgue integration is used throughout (Module 2). Most results hold under any $\sigma$-finite measure (the $L^p$ inclusions of Theorem 5.2 are the exception); on a probability space the proofs simplify.
- Conventions: $1 \le p < \infty$ unless otherwise stated; the case $p = \infty$ is treated separately.
Financial Insight. On a trading desk, the choice of $L^p$ space is a modelling assumption, not an abstraction. An option payoff in $L^2$ has finite variance under the risk-neutral measure, a necessary condition for the Black-Scholes delta hedge to be well-defined. A payoff that is only in $L^1$ has a finite price but not necessarily a finite hedging error. Exotic payoffs (e.g., power options $S_T^\alpha$) can fail to be in $L^2$ under heavy-tailed dynamics, invalidating standard Greeks formulas; under log-normal dynamics every moment of $S_T$ is finite, so the problem there is a rapidly growing norm rather than an infinite one. Uniform integrability appears every time you pass a limit through an expectation, which Monte Carlo methods do at every step.
Theory
1. Lp spaces
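Definition 5.1 ($L^p$ space). For $1 \le p < \infty$, define
$$L^p(\Omega, \mathcal{F}, \mathbb{P}) = \{ X \text{ measurable} : \|X\|_p < \infty \}, \qquad \|X\|_p = \left( \mathbb{E}\left[ |X|^p \right] \right)^{1/p}.$$
For $p = \infty$: $\|X\|_\infty = \operatorname{ess\,sup} |X|$, the essential supremum (smallest $M$ such that $\mathbb{P}(|X| \le M) = 1$).
Elements of $L^p$ are equivalence classes of random variables that agree $\mathbb{P}$-almost everywhere.
The $L^p$ norm is the $p$-th root of the average of $|X|$ raised to the $p$-th power. For $p = 1$: $\|X\|_1 = \mathbb{E}[|X|]$, the mean absolute value. For $p = 2$: $\|X\|_2 = \sqrt{\mathbb{E}[X^2]}$, the root mean square, the natural norm for variance and hedging error.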
Theorem 5.1 (Riesz-Fischer: completeness of $L^p$). For $1 \le p \le \infty$, $L^p(\Omega, \mathcal{F}, \mathbb{P})$ is a Banach space (complete normed vector space). In particular, every Cauchy sequence in $L^p$ has a limit in $L^p$.
Completeness is what guarantees that Itô integrals (defined as $L^2$ limits of integrals of simple integrands) actually exist in $L^2$.
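Theorem 5.2 ($L^p$ inclusions on probability spaces). For a probability space with $1 \le p \le q \le \infty$:
$$L^q \subseteq L^p, \qquad \|X\|_p \le \|X\|_q.$$
This inclusion is specific to probability spaces (i.e., $\mathbb{P}(\Omega) = 1$); it extends to finite measure spaces up to a constant. On a general measure space it can fail, and on sequence spaces (counting measure) it reverses. The implication: $L^2$ is a strictly better-behaved space than $L^1$, since finite variance implies finite mean, but not vice versa.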
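A minimal Python sketch of Definition 5.1 and Theorem 5.2 on a finite probability space. The values and probabilities are hypothetical, chosen only so that the moments are simple fractions; `moment` and `lp_norm` are illustrative helper names, not part of any library.

```python
from fractions import Fraction
import math

# Hypothetical example: X takes these four values, each with probability 1/4.
values = [Fraction(-2), Fraction(0), Fraction(1), Fraction(3)]
probs = [Fraction(1, 4)] * 4

def moment(p):
    """E[|X|^p] as an exact rational (p a positive integer)."""
    return sum(q * abs(v) ** p for v, q in zip(values, probs))

def lp_norm(p):
    """||X||_p as a float; p = math.inf gives the essential supremum."""
    if p == math.inf:
        # On a finite space the ess sup is the largest |value| with positive probability.
        return float(max(abs(v) for v, q in zip(values, probs) if q > 0))
    return float(moment(p)) ** (1.0 / p)

norm_1, norm_2, norm_4, norm_inf = (lp_norm(p) for p in (1, 2, 4, math.inf))
print(moment(1), moment(2))                  # exact moments E|X| and E[X^2]
print(norm_1, norm_2, norm_4, norm_inf)      # non-decreasing in p
```

The norms come out non-decreasing in $p$, which is the inequality $\|X\|_p \le \|X\|_q$ for $p \le q$; the total mass being 1 is what makes this work.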
2. Fundamental inequalities
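Theorem 5.3 (Markov's inequality). For a random variable $X \ge 0$ and $a > 0$:
$$\mathbb{P}(X \ge a) \le \frac{\mathbb{E}[X]}{a}.$$
Setting $X = (Y - \mathbb{E}[Y])^2$ for $Y \in L^2$ and replacing $a$ by $a^2$ gives Chebyshev's inequality: $\mathbb{P}(|Y - \mathbb{E}[Y]| \ge a) \le \operatorname{Var}(Y) / a^2$.
Markov is the simplest probabilistic bound. Its proof is a one-line application of monotonicity of integration: $a \, \mathbf{1}_{\{X \ge a\}} \le X$, so $a \, \mathbb{P}(X \ge a) \le \mathbb{E}[X]$. It is tight: take $X = a \, \mathbf{1}_A$ with $\mathbb{P}(A) = p$, for which both sides equal $p$.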
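The bounds of Theorem 5.3 can be checked exactly on a small discrete distribution. The sketch below assumes a hypothetical non-negative $X$ and threshold $a = 2$; `expectation` and `prob` are illustrative helpers. It also reproduces the tightness case $X = a \, \mathbf{1}_A$.

```python
from fractions import Fraction

def expectation(dist, f=lambda x: x):
    """E[f(X)] for a finite distribution given as {value: probability}."""
    return sum(p * f(x) for x, p in dist.items())

def prob(dist, event):
    """P(event(X)) for a finite distribution."""
    return sum(p for x, p in dist.items() if event(x))

# Hypothetical non-negative X, used only for illustration.
X = {0: Fraction(1, 2), 1: Fraction(1, 4), 4: Fraction(1, 4)}
a = 2

markov_lhs = prob(X, lambda x: x >= a)              # P(X >= a)
markov_rhs = expectation(X) / a                     # E[X] / a

mean = expectation(X)
var = expectation(X, lambda x: (x - mean) ** 2)
cheb_lhs = prob(X, lambda x: abs(x - mean) >= a)    # P(|X - E[X]| >= a)
cheb_rhs = var / a ** 2                             # Var(X) / a^2

# Tightness: X = a * 1_A with P(A) = p gives equality in Markov.
p = Fraction(1, 3)
tight = {0: 1 - p, a: p}
tight_lhs = prob(tight, lambda x: x >= a)
tight_rhs = expectation(tight) / a

print(markov_lhs, markov_rhs, cheb_lhs, cheb_rhs, tight_lhs, tight_rhs)
```

Exact rationals are used so that the equality in the tight case is not blurred by floating-point rounding.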
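Theorem 5.4 (Jensen's inequality). Let $\varphi : \mathbb{R} \to \mathbb{R}$ be convex and $X \in L^1$ with $\varphi(X) \in L^1$. Then:
$$\varphi(\mathbb{E}[X]) \le \mathbb{E}[\varphi(X)].$$
For concave $\varphi$, the inequality reverses.
Proof. Since $\varphi$ is convex, for any $x_0 \in \mathbb{R}$ there exists a supporting line: $\varphi(x) \ge \varphi(x_0) + c \, (x - x_0)$ for all $x$, for some $c \in \mathbb{R}$ (a subgradient at $x_0$). Set $x_0 = \mathbb{E}[X]$, substitute $x = X$, and take expectations on both sides:
$$\mathbb{E}[\varphi(X)] \ge \varphi(\mathbb{E}[X]) + c \, (\mathbb{E}[X] - \mathbb{E}[X]) = \varphi(\mathbb{E}[X]).$$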
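Jensen is omnipresent in finance. Applications: convexity of the option payoff $x \mapsto (x - K)^+$ implies that the value of an option on the average is less than the average option value (Jensen's inequality in Asian pricing). The log-normal expected value: $\mathbb{E}[e^X] \ge e^{\mathbb{E}[X]}$ (convexity of $\exp$). The sub-additivity (hence convexity) of $|\cdot|$ gives $|\mathbb{E}[X]| \le \mathbb{E}[|X|]$ (i.e., $|\mathbb{E}[X]| \le \|X\|_1$).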
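A small numerical check of Jensen's inequality for the call payoff (convex) and the logarithm (concave). The four equally likely terminal prices and the strike are hypothetical values picked for illustration.

```python
import math
from fractions import Fraction

# Hypothetical terminal prices, each with probability 1/4, and a strike K.
prices = [Fraction(80), Fraction(95), Fraction(105), Fraction(120)]
K = Fraction(100)

def mean(xs):
    """Average under the uniform measure on the listed outcomes."""
    return sum(xs) / len(xs)

def call(x):
    """Convex payoff (x - K)^+."""
    return max(x - K, Fraction(0))

payoff_of_mean = call(mean(prices))                 # phi(E[X])
mean_of_payoff = mean([call(s) for s in prices])    # E[phi(X)]

# Concave case: the logarithm reverses the inequality.
log_of_mean = math.log(float(mean(prices)))
mean_of_log = mean([math.log(float(s)) for s in prices])

print(payoff_of_mean, mean_of_payoff)
print(mean_of_log, log_of_mean)
```

Here the mean price equals the strike, so the payoff at the mean is zero while the mean payoff is strictly positive: the gap is entirely due to dispersion.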
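Theorem 5.5 (Hölder's inequality). For $p, q \in [1, \infty]$ with $1/p + 1/q = 1$ (Hölder conjugates), $X \in L^p$ and $Y \in L^q$:
$$\mathbb{E}[|XY|] \le \|X\|_p \, \|Y\|_q.$$
The case $p = q = 2$ is the Cauchy-Schwarz inequality: $\mathbb{E}[|XY|] \le \sqrt{\mathbb{E}[X^2] \, \mathbb{E}[Y^2]}$.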
Hölder's inequality is used to bound covariance terms in option pricing (e.g., proving that the product of delta and price increment is integrable for a self-financing portfolio with square-integrable delta and square-integrable increments), and in the theory of stochastic integration, where it controls the cross-terms.
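Theorem 5.6 (Minkowski's inequality). For $1 \le p \le \infty$ and $X, Y \in L^p$:
$$\|X + Y\|_p \le \|X\|_p + \|Y\|_p.$$
This is the triangle inequality for the $L^p$ norm, which is what makes $L^p$ a normed space. It is the key step in verifying that $L^p$ is a vector space under the $L^p$ norm.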
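Theorems 5.5 and 5.6 can be sanity-checked on a finite sample space. The sketch below assumes four hypothetical equally likely outcomes and the conjugate pair $p = 3$, $q = 3/2$; `norm` is an illustrative helper.

```python
# Hypothetical joint outcomes (x, y), each with probability 1/4.
outcomes = [(1.0, -2.0), (-3.0, 0.5), (2.0, 4.0), (0.0, -1.0)]
weight = 1.0 / len(outcomes)

def norm(zs, p):
    """||Z||_p under the uniform measure on the outcomes."""
    return sum(weight * abs(z) ** p for z in zs) ** (1.0 / p)

xs = [x for x, _ in outcomes]
ys = [y for _, y in outcomes]

p, q = 3.0, 1.5                      # conjugate exponents: 1/3 + 2/3 = 1
e_abs_xy = sum(weight * abs(x * y) for x, y in outcomes)
holder_bound = norm(xs, p) * norm(ys, q)
cauchy_schwarz_bound = norm(xs, 2) * norm(ys, 2)

sums = [x + y for x, y in outcomes]
minkowski_lhs = norm(sums, p)        # ||X + Y||_p
minkowski_rhs = norm(xs, p) + norm(ys, p)

print(e_abs_xy, holder_bound, cauchy_schwarz_bound)
print(minkowski_lhs, minkowski_rhs)
```

A check on one example proves nothing, but it is a quick guard against mixing up the exponents when the inequalities are used in code.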
3. Modes of convergence
We consider a sequence $(X_n)_{n \ge 1}$ and a limit $X$, all defined on $(\Omega, \mathcal{F}, \mathbb{P})$.
Definition 5.2 (Four modes of convergence).
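(AS) Almost sure convergence: $X_n \xrightarrow{\text{a.s.}} X$ if $\mathbb{P}\left( \lim_{n \to \infty} X_n = X \right) = 1$.
(P) Convergence in probability: $X_n \xrightarrow{\mathbb{P}} X$ if $\mathbb{P}(|X_n - X| > \varepsilon) \to 0$ for every $\varepsilon > 0$.
(Lp) $L^p$ convergence: $X_n \xrightarrow{L^p} X$ if $\mathbb{E}[|X_n - X|^p] \to 0$.
(D) Convergence in distribution: $X_n \xrightarrow{d} X$ if $\mathbb{E}[f(X_n)] \to \mathbb{E}[f(X)]$ for all bounded continuous $f$.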
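The four modes are partially ordered in strength. The implication diagram is:
$$\text{a.s.} \;\Longrightarrow\; \text{in probability} \;\Longrightarrow\; \text{in distribution}, \qquad L^p \;\Longrightarrow\; \text{in probability},$$
that is, both $L^p$ and a.s. convergence imply convergence in probability, which implies convergence in distribution. No other general implication holds among these four modes.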
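The implication from $L^p$ convergence to convergence in probability is a one-line consequence of Markov's inequality (Theorem 5.3) applied to $|X_n - X|^p$. A sketch of the step, for fixed $\varepsilon > 0$ and $1 \le p < \infty$:

```latex
\mathbb{P}\left( |X_n - X| > \varepsilon \right)
  \le \mathbb{P}\left( |X_n - X|^p \ge \varepsilon^p \right)
  \le \frac{\mathbb{E}\left[ |X_n - X|^p \right]}{\varepsilon^p}
  \xrightarrow[n \to \infty]{} 0 .
```

The bound only goes one way: smallness of the probability says nothing about the size of $|X_n - X|$ on the exceptional set, which is why the converse needs an extra tail condition.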
Example 5.1 (Implication failures — the standard counterexamples).
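(a) A.S. does not imply $L^1$. Let $\Omega = [0, 1]$ with Lebesgue measure. Set $X_n = n \, \mathbf{1}_{[0, 1/n]}$. Then $X_n(\omega) \to 0$ for every $\omega \in (0, 1]$ (i.e., a.s.), but $\mathbb{E}[|X_n|] = 1$ for all $n$. So $X_n \to 0$ a.s. but $X_n \not\to 0$ in $L^1$.
(b) $L^p$ does not imply a.s. The typewriter sequence: cover $[0, 1]$ by the intervals $I_{k,j} = [j 2^{-k}, (j+1) 2^{-k}]$ for $k \ge 0$ and $0 \le j < 2^k$, indexed sequentially by $n = 2^k + j$. Set $X_n = \mathbf{1}_{I_{k,j}}$. Then $\mathbb{E}[|X_n|^p] = 2^{-k} \to 0$ (so $X_n \to 0$ in $L^p$), but for every $\omega \in [0, 1]$, $X_n(\omega) = 1$ infinitely often and $X_n(\omega) = 0$ infinitely often, so the sequence does not converge a.s.
(c) Convergence in probability does not imply a.s. The typewriter sequence also serves here: $X_n \to 0$ in probability (same argument as for $L^p$, since $\mathbb{P}(|X_n| > \varepsilon) = 2^{-k}$ for $0 < \varepsilon < 1$) but not a.s.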
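Both counterexamples are easy to inspect along a single path. The sketch below assumes the indexing $n = 2^k + j$ for the typewriter intervals $[j 2^{-k}, (j+1) 2^{-k}]$ and the spike sequence $X_n = n \, \mathbf{1}_{[0, 1/n]}$; the sample point `omega = 0.3` and the function names are arbitrary illustrative choices.

```python
def typewriter_interval(n):
    """Interval [j/2^k, (j+1)/2^k] for n = 2^k + j, 0 <= j < 2^k, n >= 1."""
    k = n.bit_length() - 1          # largest k with 2^k <= n
    j = n - (1 << k)
    width = 1.0 / (1 << k)
    return j * width, (j + 1) * width

def typewriter(n, omega):
    """X_n(omega) = 1 if omega lies in the n-th interval, else 0."""
    lo, hi = typewriter_interval(n)
    return 1 if lo <= omega <= hi else 0

def spike(n, omega):
    """X_n(omega) = n on [0, 1/n], else 0 (counterexample (a))."""
    return n if 0.0 <= omega <= 1.0 / n else 0

omega = 0.3                          # an arbitrary sample point in (0, 1]
N = 1 << 12                          # inspect indices 1 .. 4095 (levels k = 0 .. 11)

# (b): the L^1 norms shrink, yet the path at omega returns to 1 once per level.
l1_norms = [hi - lo for lo, hi in map(typewriter_interval, range(1, N))]
hits = [n for n in range(1, N) if typewriter(n, omega) == 1]

# (a): the path at omega is eventually 0, although E[X_n] = n * (1/n) = 1 for every n.
spike_path = [spike(n, omega) for n in range(1, 50)]

print(l1_norms[-1], len(hits), spike_path[:6])
```

The list `hits` contains one index per dyadic level, so it keeps growing with `N`: that is the "infinitely often" in the argument, seen on a finite horizon.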
Remark (Subsequence criterion). $X_n \xrightarrow{\mathbb{P}} X$ if and only if every subsequence $(X_{n_k})$ has a further subsequence $(X_{n_{k_j}})$ with $X_{n_{k_j}} \to X$ a.s. This is one of the most useful tools in stochastic analysis for promoting convergence in probability to almost sure convergence along a subsequence.
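The typewriter sequence illustrates the criterion: the full sequence does not converge at any point, but the subsequence $n_k = 2^k$ (the leftmost interval $[0, 2^{-k}]$ at each level) converges to 0 at every $\omega > 0$. A sketch, assuming the indexing $n = 2^k + j$ and an arbitrary sample point:

```python
def typewriter(n, omega):
    """Typewriter sequence: indicator of [j/2^k, (j+1)/2^k] with n = 2^k + j."""
    k = n.bit_length() - 1
    j = n - (1 << k)
    return 1 if j <= omega * (1 << k) <= j + 1 else 0

omega = 0.3                                              # arbitrary point in (0, 1]
full_path = [typewriter(n, omega) for n in range(1, 1 << 10)]
# Subsequence n_k = 2^k picks the leftmost interval [0, 2^-k] at each level k.
sub_path = [typewriter(1 << k, omega) for k in range(10)]

print(sum(full_path), sub_path)
```

Along the subsequence the path is 0 as soon as $2^{-k} < \omega$, while the full path still contains a 1 at every level.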
4. Uniform integrability
Definition 5.3 (Uniform integrability). A family $\{X_i\}_{i \in I}$ of random variables is uniformly integrable (UI) if
$$\lim_{K \to \infty} \sup_{i \in I} \mathbb{E}\left[ |X_i| \, \mathbf{1}_{\{|X_i| > K\}} \right] = 0.$$
Intuitively: the tails of all $X_i$ simultaneously become negligible as the truncation level $K$ grows. Uniform integrability ensures that convergence in probability "controls the tails" well enough to imply $L^1$ convergence.
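Theorem 5.7 (Vitali convergence theorem). Let $(X_n)$ be a UI family and $X_n \xrightarrow{\mathbb{P}} X$. Then $X \in L^1$ and $X_n \to X$ in $L^1$.
This is the definitive statement bridging convergence in probability and $L^1$ convergence. The Dominated Convergence Theorem (Module 2) is a special case: if $|X_n| \le Y$ with $Y \in L^1$, then $(X_n)$ is UI by $\sup_n \mathbb{E}[|X_n| \, \mathbf{1}_{\{|X_n| > K\}}] \le \mathbb{E}[Y \, \mathbf{1}_{\{Y > K\}}] \to 0$ as $K \to \infty$.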
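The UI criterion can be evaluated in closed form for two simple families: the spike sequence $X_n = n \, \mathbf{1}_{[0, 1/n]}$ on $[0, 1]$, and a hypothetical family with $|Y_n| = 5$ almost surely. The sketch takes the supremum over $n \le 10\,000$ only, which is an approximation that is adequate while the truncation level stays below that cap.

```python
from fractions import Fraction

def spike_tail(n, K):
    """E[X_n 1{X_n > K}] for X_n = n * 1_[0, 1/n] under Lebesgue measure on [0, 1]."""
    # X_n equals n on a set of measure 1/n, so the tail is n * (1/n) = 1 when n > K.
    return Fraction(n) * Fraction(1, n) if n > K else Fraction(0)

def bounded_tail(n, K, bound=5):
    """Tail of a hypothetical family with |Y_n| = bound a.s. for every n."""
    return Fraction(bound) if bound > K else Fraction(0)

def sup_tail(tail, K, n_max=10_000):
    """Sup over n <= n_max of the tail expectation at truncation level K."""
    return max(tail(n, K) for n in range(1, n_max + 1))

levels = [1, 10, 100, 1000]
spike_sup = [sup_tail(spike_tail, K) for K in levels]        # does not decay: not UI
bounded_sup = [sup_tail(bounded_tail, K) for K in levels]    # vanishes once K >= bound: UI

print(spike_sup)
print(bounded_sup)
```

For the spike family the supremum stays at 1 however large $K$ is, which is exactly the mass that escapes when one tries to exchange limit and expectation.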
Remark (UI and martingale theory). A martingale $(M_n)$ is UI if and only if it converges a.s. and in $L^1$ to a terminal variable $M_\infty$ with $M_n = \mathbb{E}[M_\infty \mid \mathcal{F}_n]$. This is the $L^1$ form of the Doob martingale convergence theorem. The connection to Module 4: the OST condition "the stopped martingale is UI" is precisely saying that the stopped process has a well-behaved limit.
Example 5.2 (UI in option pricing). Under the risk-neutral measure $\mathbb{Q}$, the family $\{g(S_t)\}_{0 \le t \le T}$ for a bounded payoff $g$ (e.g., a call with notional cap) is UI, since the payoff is dominated by a constant. For an unbounded payoff such as $g(s) = s^\alpha$ (power option), the family is UI only if $S_t$ has sufficiently thin tails under $\mathbb{Q}$ (e.g., finite higher moments under log-normal dynamics). Failing to verify UI when applying DCT in a simulation loop is one cause of Monte Carlo bias that does not diminish with sample size.
Validation
The companion notebook verifies:
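- $L^p$ norms on a finite discrete probability space using exact rational arithmetic: $\|X\|_1$, $\|X\|_2$, $\|X\|_\infty$, and confirms the inclusion $\|X\|_1 \le \|X\|_2 \le \|X\|_\infty$ numerically.
- Jensen's inequality: for a convex and a concave test function $\varphi$, verifies $\varphi(\mathbb{E}[X]) \le \mathbb{E}[\varphi(X)]$ and the reverse.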
- Hölder's and Cauchy-Schwarz inequalities: verified on explicit numerical examples.
- Convergence counterexamples: simulates the sequence $X_n = n \, \mathbf{1}_{[0, 1/n]}$, confirms a.s. convergence to 0 while $\mathbb{E}[X_n] = 1$ for all $n$.
- Uniform integrability: empirically verifies that a bounded family is UI while an unbounded family fails the UI criterion at each truncation level.
Hand exercise.
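(a) Let $X$ be uniform on a finite set of real values, each taken with equal probability. Compute $\|X\|_1$, $\|X\|_2$, and $\|X\|_\infty$ exactly. Confirm $\|X\|_1 \le \|X\|_2 \le \|X\|_\infty$.
(b) Let $\varphi$ be a convex function. With $X$ as above, verify Jensen's inequality $\varphi(\mathbb{E}[X]) \le \mathbb{E}[\varphi(X)]$ by direct computation.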
(c) Give an explicit example of a sequence $(X_n)$ on $[0, 1]$ that converges to 0 in $L^1$ but not almost surely.
Limitations
$L^p$ inclusions fail on infinite measure spaces. On $\mathbb{R}$ (Lebesgue measure), $L^2 \not\subseteq L^1$: the function $f(x) = 1 / (1 + |x|)$ is in $L^2$ but not in $L^1$. The inclusion $L^q \subseteq L^p$ for $p \le q$ is specific to finite measure spaces. Applying the probability-space inclusion argument to general measure spaces is a common error when working with unnormalised Monte Carlo weights.
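Warning (confusing convergence modes in Monte Carlo). Let $\bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i$ for i.i.d. $X_i$ with mean $\mu$ and variance $\sigma^2$. The Strong Law of Large Numbers gives a.s. convergence of sample means: $\bar{X}_n \to \mu$ a.s. The Central Limit Theorem gives convergence in distribution: $\sqrt{n} \, (\bar{X}_n - \mu) \xrightarrow{d} \mathcal{N}(0, \sigma^2)$. The Monte Carlo standard error $\|\bar{X}_n - \mu\|_2 = \sigma / \sqrt{n}$ is an $L^2$ convergence rate. These three statements are about different modes of convergence and cannot be combined without justification. In particular, the CLT rate $n^{-1/2}$ does not imply a.s. convergence at rate $n^{-1/2}$; the a.s. rate is of order $\sqrt{\log \log n / n}$ (law of the iterated logarithm).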
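The $L^2$ statement is the one a Monte Carlo run actually reports. A minimal sketch, assuming the toy target $\mathbb{E}[U^2] = 1/3$ for $U$ uniform on $(0, 1)$ and a fixed seed for reproducibility; `mc_estimate` is an illustrative helper, and the standard error is estimated from the sample.

```python
import math
import random

def mc_estimate(n, rng):
    """Sample mean and estimated standard error for E[U^2], U ~ Uniform(0, 1)."""
    draws = [rng.random() ** 2 for _ in range(n)]
    mean = math.fsum(draws) / n
    sample_var = math.fsum((d - mean) ** 2 for d in draws) / (n - 1)
    return mean, math.sqrt(sample_var / n)

rng = random.Random(0)    # fixed seed, for reproducibility only
results = {n: mc_estimate(n, rng) for n in (100, 10_000, 250_000)}
for n, (mean, std_err) in results.items():
    print(f"n={n:>7}  mean={mean:.5f}  std_err={std_err:.5f}  |error|={abs(mean - 1/3):.5f}")
```

The reported standard error shrinks by roughly a factor of 10 when $n$ grows by a factor of 100. The realised error on a single run fluctuates around that scale; it is the CLT, not the standard error alone, that turns the number into a confidence interval.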
Uniform integrability requires verification, not assumption. UI is often claimed without proof. A sufficient condition: $\sup_{i} \mathbb{E}[|X_i|^p] < \infty$ for some $p > 1$ (a family bounded in $L^p$ is UI). Checking this requires knowing the tail behaviour of the $X_i$; for path-dependent payoffs under stochastic volatility models, this is not always available in closed form and must be verified numerically.
Jensen's inequality is not a pricing shortcut. The inequality $\mathbb{E}[\varphi(X)] \ge \varphi(\mathbb{E}[X])$ shows that the price of a convex payoff exceeds the payoff at the expected value. It does not compute the price. The gap depends on the full distribution of $X$, not just its mean.
Convergence in distribution does not imply anything about the random variables themselves. Two sequences $(X_n)$ and $(Y_n)$ can converge in distribution to the same limit while being defined on completely different probability spaces. In particular, $X_n \xrightarrow{d} X$ and $Y_n \xrightarrow{d} X$ does not imply $X_n - Y_n \to 0$ in probability, even when both sequences live on the same space. This is the motivation for the Skorokhod representation theorem: to recover path properties from distributional convergence, one must construct a coupling on a common probability space.
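Two short experiments make the point concrete. The first uses $X = Z$ and $Y = -Z$ with $Z$ standard normal: same law, far apart pathwise. The second sketches a quantile coupling for a hypothetical family of laws, $\text{Exp}(1 + 1/n) \to \text{Exp}(1)$, in which every variable is built from the same uniform point $u$. Seeds, sample sizes, and the choice of exponential laws are assumptions made for illustration.

```python
import math
import random

rng = random.Random(1)
n_samples = 20_000

# Part 1: X = Z and Y = -Z share the N(0, 1) law but are far apart pathwise.
z = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
frac_below_zero_x = sum(v <= 0 for v in z) / n_samples       # estimates P(X <= 0)
frac_below_zero_y = sum(-v <= 0 for v in z) / n_samples      # estimates P(Y <= 0)
frac_far = sum(abs(v - (-v)) > 0.5 for v in z) / n_samples   # |X - Y| = 2|Z|

# Part 2: quantile coupling for the hypothetical laws Exp(1 + 1/n) -> Exp(1).
def quantile(n, u):
    """Inverse CDF of Exp(rate = 1 + 1/n); n = math.inf gives the Exp(1) limit."""
    rate = 1.0 if n == math.inf else 1.0 + 1.0 / n
    return -math.log1p(-u) / rate

us = [rng.random() for _ in range(5)]    # sample points of the common space (0, 1)
gaps = {n: max(abs(quantile(n, u) - quantile(math.inf, u)) for u in us)
        for n in (1, 10, 100, 1000)}

print(frac_below_zero_x, frac_below_zero_y, frac_far)
print(gaps)
```

In the second part the marginal laws are untouched; only the way the variables are tied together changes, and that is what produces pointwise convergence.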
Interview Angle
L1 — Junior. Expected: definitions and basic implications.
- "What is the difference between almost sure convergence and convergence in probability?" A.s.: the event $\{X_n \to X\}$ has probability 1, so every individual path converges except on a null set. In probability: for each $\varepsilon > 0$, the probability of being more than $\varepsilon$ away from the limit goes to 0. A.s. is a pathwise condition; convergence in probability constrains, for each fixed $n$, only the law of $|X_n - X|$. A.s. implies in probability; the converse fails (typewriter sequence counterexample).
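- "State Jensen's inequality and give one application in option pricing." For convex $\varphi$: $\varphi(\mathbb{E}[X]) \le \mathbb{E}[\varphi(X)]$. Application: the undiscounted price of a call option exceeds the payoff at the forward price $F = \mathbb{E}^{\mathbb{Q}}[S_T]$, i.e. $\mathbb{E}^{\mathbb{Q}}[(S_T - K)^+] \ge (F - K)^+$. This gives a model-free lower bound on call prices.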
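- "Which $L^p$ space is standard for defining the Itô integral, and why?" $L^2$. The Itô integral is defined as the $L^2$ limit of integrals of simple integrands, and the Itô isometry $\mathbb{E}\left[ \left( \int_0^T H_t \, dW_t \right)^2 \right] = \mathbb{E}\left[ \int_0^T H_t^2 \, dt \right]$ requires $\mathbb{E}\left[ \int_0^T H_t^2 \, dt \right] < \infty$.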
L2 — Senior. Expected: proofs and counterexamples.
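- "Give an example of a sequence that converges almost surely but not in $L^1$. What property is it missing?" $X_n = n \, \mathbf{1}_{[0, 1/n]}$ on $[0, 1]$ with Lebesgue measure. Converges to 0 a.s. (for every $\omega > 0$) but $\mathbb{E}[X_n] = 1$ for all $n$. Missing property: uniform integrability. The tails $\mathbb{E}[X_n \, \mathbf{1}_{\{X_n > K\}}] = 1$ for $n > K$ do not go to 0 uniformly.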
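- "Prove the Markov inequality in one line." For $X \ge 0$ and $a > 0$: $a \, \mathbf{1}_{\{X \ge a\}} \le X$, so taking expectations gives $a \, \mathbb{P}(X \ge a) \le \mathbb{E}[X]$. Divide by $a$.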
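- "What is uniform integrability and why does it appear in the Optional Stopping Theorem?" UI means $\sup_i \mathbb{E}[|X_i| \, \mathbf{1}_{\{|X_i| > K\}}] \to 0$ as $K \to \infty$. In the OST: for an unbounded but a.s. finite stopping time $\tau$, the stopped process $M_{\tau \wedge n}$ converges a.s. to $M_\tau$. To conclude $\mathbb{E}[M_\tau] = \mathbb{E}[M_0]$, we need to pass the expectation through the limit, which requires UI of $(M_{\tau \wedge n})_{n \ge 0}$. Without UI, a.s. convergence does not imply $L^1$ convergence (as the $X_n = n \, \mathbf{1}_{[0, 1/n]}$ example shows).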
L3 — Researcher. Expected: model critique and deeper structure.
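- "The Dominated Convergence Theorem is a special case of which more general theorem? State the generalisation." The Vitali convergence theorem: if $X_n \xrightarrow{\mathbb{P}} X$ and $(X_n)$ is UI, then $X_n \to X$ in $L^1$. DCT is the case where UI follows from the domination $|X_n| \le Y \in L^1$ (since $\sup_n \mathbb{E}[|X_n| \, \mathbf{1}_{\{|X_n| > K\}}] \le \mathbb{E}[Y \, \mathbf{1}_{\{Y > K\}}] \to 0$). The Vitali theorem is strictly more general: it applies when no dominating function exists but UI holds by other means (e.g., bounded $L^p$ norm for some $p > 1$).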
- "In what sense is $L^2$ the natural space for contingent claims? When does a claim fail to be in $L^2$?" $L^2$ is a Hilbert space: it has an inner product $\langle X, Y \rangle = \mathbb{E}[XY]$. This structure is what makes the Itô isometry, the projection theorem for conditional expectation, and the martingale representation theorem all work cleanly. A claim $X$ fails when $\mathbb{E}[X^2] = \infty$. For example, a power option $S_T^\alpha$ with $\alpha > 1$ requires $\mathbb{E}[S_T^{2\alpha}] < \infty$; under log-normal dynamics this holds for every finite $\alpha$ (though the $L^2$ norm grows rapidly with $\alpha$), so failure requires heavier tails. In rough volatility models with Hurst parameter $H < 1/2$, the volatility driver is a fractional Brownian motion rather than a standard Brownian motion (it is not a semimartingale), and the theory requires extension.
- "State the Skorokhod representation theorem and explain why convergence in distribution does not imply convergence in probability on the original probability space." Skorokhod: if $X_n \xrightarrow{d} X$, there exist random variables $\tilde{X}_n, \tilde{X}$ defined on a common probability space such that $\tilde{X}_n \overset{d}{=} X_n$, $\tilde{X} \overset{d}{=} X$, and $\tilde{X}_n \to \tilde{X}$ a.s. Distributional convergence is a property of the laws, not of the random variables themselves: $X_n$ and $X$ may live on different probability spaces. Two sequences with the same distributional limit need not be close pathwise. The Skorokhod construction works by embedding the laws into the same space via quantile functions, so the joint law (coupling) is changed while the marginals are preserved.