Setup
Why the naive definition breaks
You likely learned conditional probability as $\mathbb{P}(A \mid B) = \mathbb{P}(A \cap B)/\mathbb{P}(B)$, valid when $\mathbb{P}(B) > 0$. This is perfectly adequate for discrete problems. It fails the moment you work with continuous random variables.
Fix a continuous random variable $X$ on $(\Omega, \mathcal{F}, \mathbb{P})$ (Module 1) and ask: what is $\mathbb{E}[Y \mid X = x]$ for a second random variable $Y$ and a specific real value $x$? The event $\{X = x\}$ has probability zero for any $x$. The ratio is $0/0$: undefined, not merely small. Yet "the expected value of $Y$ given $X = x$" is a perfectly natural and computationally important quantity.
The same issue arises throughout stochastic analysis. Each member $\mathcal{F}_t$ of a Brownian filtration is a $\sigma$-algebra, not an event. Conditioning on $\mathcal{F}_t$ means "conditioning on the information available at time $t$", and that information is a sub-$\sigma$-algebra of $\mathcal{F}$. There is no event to plug into the ratio formula.
The modern resolution, due to Kolmogorov (1933) and built on the Radon-Nikodym theorem, defines conditional expectation as a random variable characterised by an integral identity, not a ratio.
Conventions
Throughout this module:
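- $(\Omega, \mathcal{F}, \mathbb{P})$ is the probability space from Module 1: $\Omega$ the sample space, $\mathcal{F}$ a $\sigma$-algebra on $\Omega$, $\mathbb{P}$ a probability measure.
- $L^1(\Omega, \mathcal{F}, \mathbb{P})$ denotes the space of $\mathcal{F}$-measurable functions $X$ with $\mathbb{E}[|X|] < \infty$. $L^2(\Omega, \mathcal{F}, \mathbb{P})$ adds $\mathbb{E}[X^2] < \infty$.
- $\mathcal{G} \subseteq \mathcal{F}$ denotes a sub-$\sigma$-algebra: a $\sigma$-algebra in its own right, but coarser than $\mathcal{F}$.
- "a.s." means $\mathbb{P}$-almost surely: except on a set of probability zero.
- $(\mathcal{F}_t)_{t \ge 0}$ denotes a filtration: an increasing family of sub-$\sigma$-algebras, $\mathcal{F}_s \subseteq \mathcal{F}_t$ for $s \le t$, representing the accumulation of market information over time.
- $\mathbb{Q}$ denotes the risk-neutral measure; $r$ the continuously compounded risk-free rate.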
The pricing motivation
The risk-neutral price $V_t$ of a derivative with payoff $V_T$ at time $T$ is:
$$V_t = e^{-r(T-t)}\,\mathbb{E}^{\mathbb{Q}}[V_T \mid \mathcal{F}_t].$$
This formula appears in Black-Scholes, Heston, Hull-White, and every other risk-neutral pricing model. The object $\mathbb{E}^{\mathbb{Q}}[V_T \mid \mathcal{F}_t]$ is not a number: it is a random variable, one that depends on which path the market has taken up to time $t$, i.e. on $\omega$. To manipulate it (to argue that $e^{-rt}V_t$ is a martingale, to apply Itô's lemma to it, to use it in hedging arguments) you need the measure-theoretic definition.
Why this matters on a desk. Valuations computed inside a risk engine are conditional expectations: "given today's market state (the $\mathcal{F}_t$ information), what is the expected discounted payoff?" XVA desks compute $\mathbb{E}^{\mathbb{Q}}[\,\cdot \mid \mathcal{F}_t]$ across thousands of Monte Carlo paths. The tower property (proved below) is the mathematical identity that guarantees consistency when you aggregate over nested simulation steps. When a model produces inconsistent valuations across time steps, a common root cause is a violation of the tower property, typically from approximating the conditional expectation with the wrong conditioning set.
Theory
1. Definition via Radon-Nikodym
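Definition 1.1 (Conditional Expectation). Let $X \in L^1(\Omega, \mathcal{F}, \mathbb{P})$ and let $\mathcal{G} \subseteq \mathcal{F}$ be a sub-$\sigma$-algebra. The conditional expectation of $X$ given $\mathcal{G}$, written $\mathbb{E}[X \mid \mathcal{G}]$, is defined as any integrable, $\mathcal{G}$-measurable random variable $Z$ satisfying:
$$\int_G Z \, d\mathbb{P} = \int_G X \, d\mathbb{P} \quad \text{for all } G \in \mathcal{G}.$$
Such a $Z$ exists and is unique $\mathbb{P}$-a.s. We call this the defining property of conditional expectation.
Two conditions are imposed: (i) $Z$ must be $\mathcal{G}$-measurable, meaning it must be determinable from the information in $\mathcal{G}$ alone; (ii) $Z$ must integrate to the same value as $X$ over every set $G \in \mathcal{G}$. Together these say: $Z$ is the best guess of $X$ given the information in $\mathcal{G}$, calibrated so that its integral always matches that of $X$.
Existence and uniqueness via Radon-Nikodym. Define a signed measure $\nu$ on $(\Omega, \mathcal{G})$ by $\nu(G) = \int_G X \, d\mathbb{P}$. Since $X \in L^1$, $\nu$ is a finite signed measure on $\mathcal{G}$, and $\nu \ll \mathbb{P}|_{\mathcal{G}}$ (absolutely continuous with respect to $\mathbb{P}$ restricted to $\mathcal{G}$). By the Radon-Nikodym theorem, there exists a $\mathcal{G}$-measurable function $Z = d\nu / d\mathbb{P}|_{\mathcal{G}}$ satisfying $\nu(G) = \int_G Z \, d\mathbb{P}$ for all $G \in \mathcal{G}$. This is exactly the defining property. Any two such functions agree $\mathbb{P}$-a.s. by the uniqueness clause of the Radon-Nikodym theorem.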
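On a finite sample space the definition can be checked by hand: a sub-σ-algebra generated by a partition has the partition blocks as atoms, and the conditional expectation is the probability-weighted average over each atom. The following is a minimal sketch under that assumption; the fair-die example, the function names `cond_exp` and `integral`, and the choice of the random variable are illustrative and not taken from the companion notebook.

```python
from fractions import Fraction

def cond_exp(X, prob, partition):
    """Return E[X | G] as a dict omega -> value, G generated by `partition`."""
    Z = {}
    for atom in partition:
        mass = sum(prob[w] for w in atom)
        average = sum(X[w] * prob[w] for w in atom) / mass
        for w in atom:
            Z[w] = average  # constant on each atom, hence G-measurable
    return Z

def integral(f, prob, event):
    """Integral of f over `event` with respect to the measure `prob`."""
    return sum(f[w] * prob[w] for w in event)

# Hypothetical example: a fair die, G generated by {low, high}, X = (face)^2.
omega = range(1, 7)
prob = {w: Fraction(1, 6) for w in omega}
X = {w: Fraction(w * w) for w in omega}
partition = [{1, 2, 3}, {4, 5, 6}]

Z = cond_exp(X, prob, partition)

# G consists of the empty set, the two atoms, and the whole space.
events_in_G = [set(), {1, 2, 3}, {4, 5, 6}, set(omega)]
defining_property_holds = all(
    integral(Z, prob, G) == integral(X, prob, G) for G in events_in_G
)
print(Z[1], Z[6], defining_property_holds)
```

Exact rational arithmetic is used so that the integral identity is tested with `==` rather than a floating-point tolerance.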
2. Geometric interpretation (L² case)
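Orthogonal projection in $L^2$. When $X \in L^2(\Omega, \mathcal{F}, \mathbb{P})$, the conditional expectation $\mathbb{E}[X \mid \mathcal{G}]$ is the orthogonal projection of $X$ onto the closed subspace $L^2(\Omega, \mathcal{G}, \mathbb{P})$.
Orthogonality here means: the residual $X - \mathbb{E}[X \mid \mathcal{G}]$ is orthogonal to every $\mathcal{G}$-measurable $W \in L^2$:
$$\mathbb{E}\big[(X - \mathbb{E}[X \mid \mathcal{G}])\,W\big] = 0.$$
This is equivalent to the defining property: set $W = \mathbf{1}_G$ (or use a linearity/density argument). The projection interpretation implies that $\mathbb{E}[X \mid \mathcal{G}]$ is the best $\mathcal{G}$-measurable predictor of $X$ in the mean-square sense, minimising $\mathbb{E}[(X - W)^2]$ over all $\mathcal{G}$-measurable $W \in L^2$.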
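The geometric picture makes several properties obvious. Projecting onto a subspace and then onto a smaller subspace nested inside it gives the same result as projecting directly onto the smaller one: this is the tower property. A vector already lying in the subspace projects to itself: this is $\mathbb{E}[X \mid \mathcal{G}] = X$ for $\mathcal{G}$-measurable $X$. A vector orthogonal to the subspace projects to zero: this corresponds to the independence case, where the centred variable $X - \mathbb{E}[X]$ projects to zero and so $\mathbb{E}[X \mid \mathcal{G}] = \mathbb{E}[X]$.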
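The projection picture can be illustrated on a four-point space, where every function measurable with respect to a two-atom partition is described by two constants. The sketch below is illustrative only: the values of `X`, the partition, and the search grid are assumptions chosen so the arithmetic is easy to follow. It checks that the atom-wise average attains the smallest mean-square error among the grid candidates and that the residual is orthogonal to the indicator of each atom.

```python
from fractions import Fraction
from itertools import product

# Hypothetical setup: four equally likely outcomes, G generated by two atoms.
omega = [1, 2, 3, 4]
prob = {w: Fraction(1, 4) for w in omega}
X = {1: Fraction(1), 2: Fraction(3), 3: Fraction(2), 4: Fraction(8)}
partition = [(1, 2), (3, 4)]

def cond_exp(X, prob, partition):
    """Atom-wise weighted average: the candidate projection onto L^2(G)."""
    Z = {}
    for atom in partition:
        mass = sum(prob[w] for w in atom)
        average = sum(X[w] * prob[w] for w in atom) / mass
        for w in atom:
            Z[w] = average
    return Z

def mse(W):
    """Mean-square distance E[(X - W)^2]."""
    return sum((X[w] - W[w]) ** 2 * prob[w] for w in omega)

def g_measurable(a, b):
    """A G-measurable function: constant a on the first atom, b on the second."""
    return {1: a, 2: a, 3: b, 4: b}

Z = cond_exp(X, prob, partition)
best = mse(Z)

# Compare against a grid of other G-measurable candidates (steps of 1/2).
grid = [Fraction(k, 2) for k in range(0, 21)]
smallest_on_grid = min(mse(g_measurable(a, b)) for a, b in product(grid, grid))

# Orthogonality of the residual to the indicator of each atom.
residual_inner_products = [
    sum((X[w] - Z[w]) * prob[w] for w in atom) for atom in partition
]
print(best, smallest_on_grid, residual_inner_products)
```

A grid search is of course not a proof of optimality; it only shows that no candidate on the grid beats the conditional expectation.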
3. Key properties
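Theorem 3.1 (Properties of Conditional Expectation). Let $X, Y \in L^1(\Omega, \mathcal{F}, \mathbb{P})$, $a, b \in \mathbb{R}$, and $\mathcal{H}, \mathcal{G} \subseteq \mathcal{F}$ sub-$\sigma$-algebras. Then:
(i) Linearity: $\mathbb{E}[aX + bY \mid \mathcal{G}] = a\,\mathbb{E}[X \mid \mathcal{G}] + b\,\mathbb{E}[Y \mid \mathcal{G}]$ a.s.
(ii) Tower property: If $\mathcal{H} \subseteq \mathcal{G}$, then $\mathbb{E}[\mathbb{E}[X \mid \mathcal{G}] \mid \mathcal{H}] = \mathbb{E}[X \mid \mathcal{H}]$ a.s.
(iii) Pulling out known factors: If $Y$ is $\mathcal{G}$-measurable and $XY \in L^1$, then $\mathbb{E}[XY \mid \mathcal{G}] = Y\,\mathbb{E}[X \mid \mathcal{G}]$ a.s.
(iv) Trivial conditioning: $\mathbb{E}[X \mid \{\emptyset, \Omega\}] = \mathbb{E}[X]$ a.s. (constant random variable).
(v) Full conditioning: $\mathbb{E}[X \mid \mathcal{F}] = X$ a.s.
(vi) Independence: If $X$ is independent of $\mathcal{G}$ (i.e. $X$ is independent of every $G \in \mathcal{G}$), then $\mathbb{E}[X \mid \mathcal{G}] = \mathbb{E}[X]$ a.s.
(vii) Jensen's inequality: If $\varphi : \mathbb{R} \to \mathbb{R}$ is convex and $\varphi(X) \in L^1$, then $\varphi(\mathbb{E}[X \mid \mathcal{G}]) \le \mathbb{E}[\varphi(X) \mid \mathcal{G}]$ a.s.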
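Proof of (ii): tower property. Assume $\mathcal{H} \subseteq \mathcal{G}$. We must show that $\mathbb{E}[X \mid \mathcal{H}]$ satisfies the defining property of $\mathbb{E}[\mathbb{E}[X \mid \mathcal{G}] \mid \mathcal{H}]$. The candidate is $\mathbb{E}[X \mid \mathcal{H}]$, which is already $\mathcal{H}$-measurable (by definition of conditional expectation given $\mathcal{H}$). It remains to check the integral condition: for every $A \in \mathcal{H}$:
$$\int_A \mathbb{E}[X \mid \mathcal{H}] \, d\mathbb{P} = \int_A \mathbb{E}[X \mid \mathcal{G}] \, d\mathbb{P}.$$
Since $\mathcal{H} \subseteq \mathcal{G}$, every $A \in \mathcal{H}$ is also in $\mathcal{G}$. By the defining property applied to $\mathbb{E}[X \mid \mathcal{G}]$ with $G = A$:
$$\int_A \mathbb{E}[X \mid \mathcal{G}] \, d\mathbb{P} = \int_A X \, d\mathbb{P}.$$
By the defining property applied to $\mathbb{E}[X \mid \mathcal{H}]$ with the same $A$:
$$\int_A \mathbb{E}[X \mid \mathcal{H}] \, d\mathbb{P} = \int_A X \, d\mathbb{P}.$$
Both sides equal $\int_A X \, d\mathbb{P}$, so $\mathbb{E}[\mathbb{E}[X \mid \mathcal{G}] \mid \mathcal{H}] = \mathbb{E}[X \mid \mathcal{H}]$ a.s.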
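Proof sketch of (iii): pulling out known factors. Verify that $Y\,\mathbb{E}[X \mid \mathcal{G}]$ satisfies both conditions of Definition 1.1. (a) It is $\mathcal{G}$-measurable since $Y$ is $\mathcal{G}$-measurable and $\mathbb{E}[X \mid \mathcal{G}]$ is $\mathcal{G}$-measurable. (b) For any $G \in \mathcal{G}$:
$$\int_G Y\,\mathbb{E}[X \mid \mathcal{G}] \, d\mathbb{P} = \int_G XY \, d\mathbb{P}$$
by the key identity $\mathbb{E}\big[W\,\mathbb{E}[X \mid \mathcal{G}]\big] = \mathbb{E}[WX]$ when $W$ is $\mathcal{G}$-measurable, applied to $W = Y\mathbf{1}_G$. The identity follows from approximating $W$ by $\mathcal{G}$-measurable simple functions and using linearity and the defining property.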
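The tower property: qualitative reading. Conditioning on less information than you already have can only reduce information. If $\mathcal{H} \subseteq \mathcal{G}$ (so $\mathcal{H}$ is the coarser $\sigma$-algebra), then conditioning on $\mathcal{H}$ after already conditioning on $\mathcal{G}$ collapses back to what the coarser $\mathcal{H}$ would have given you directly. Iterated conditioning always loses information down to the smallest $\sigma$-algebra.
Common mistake. Candidates confuse the direction: $\mathbb{E}[\mathbb{E}[X \mid \mathcal{G}] \mid \mathcal{H}] = \mathbb{E}[X \mid \mathcal{H}]$ requires $\mathcal{H} \subseteq \mathcal{G}$. The other order, $\mathbb{E}[\mathbb{E}[X \mid \mathcal{H}] \mid \mathcal{G}]$ when $\mathcal{H} \subseteq \mathcal{G}$, also gives $\mathbb{E}[X \mid \mathcal{H}]$, but for a different reason: $\mathbb{E}[X \mid \mathcal{H}]$ is already $\mathcal{G}$-measurable, so further conditioning on $\mathcal{G}$ returns it unchanged (property (v) applied to the sub-space $(\Omega, \mathcal{G}, \mathbb{P})$). In either case, the result is conditioning on the coarser $\sigma$-algebra.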
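Both orders of iterated conditioning can be checked exactly on a finite space with nested partitions. The sketch below is an illustration under assumed data: an eight-point space with non-uniform weights, a fine partition generating the larger σ-algebra and a coarse partition generating the smaller one. None of these specifics come from the companion notebook.

```python
from fractions import Fraction

def cond_exp(X, prob, partition):
    """E[X | sigma(partition)] as a dict omega -> value."""
    Z = {}
    for atom in partition:
        mass = sum(prob[w] for w in atom)
        average = sum(X[w] * prob[w] for w in atom) / mass
        for w in atom:
            Z[w] = average
    return Z

# Hypothetical setup: eight outcomes with weights proportional to w + 1.
omega = range(8)
prob = {w: Fraction(w + 1, 36) for w in omega}  # 1 + 2 + ... + 8 = 36
X = {w: Fraction(w * w) for w in omega}

fine = [{0, 1}, {2, 3}, {4, 5}, {6, 7}]    # generates the larger sigma-algebra
coarse = [{0, 1, 2, 3}, {4, 5, 6, 7}]      # generates the smaller one

given_fine = cond_exp(X, prob, fine)
given_coarse = cond_exp(X, prob, coarse)

# Tower property: condition on fine first, then on coarse.
tower_holds = cond_exp(given_fine, prob, coarse) == given_coarse

# Reverse order: E[X | coarse] is already fine-measurable, so it is unchanged.
reverse_holds = cond_exp(given_coarse, prob, fine) == given_coarse

# Sanity check that the two conditional expectations really are different.
fine_differs_from_coarse = given_fine != given_coarse
print(tower_holds, reverse_holds, fine_differs_from_coarse)
```

The non-uniform weights matter: with them, averaging the fine-level averages without probability weights would give the wrong answer, which is a common implementation slip.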
4. Conditioning on a random variable
Definition 4.1. For a random variable $Y$, define
$$\mathbb{E}[X \mid Y] := \mathbb{E}[X \mid \sigma(Y)],$$
where $\sigma(Y)$ is the $\sigma$-algebra generated by $Y$ (Module 1).
Since $\mathbb{E}[X \mid Y]$ is $\sigma(Y)$-measurable, the Doob-Dynkin lemma guarantees the existence of a Borel function $g$ such that $\mathbb{E}[X \mid Y] = g(Y)$. The function $g$ evaluated at $y$ gives the conditional expectation $\mathbb{E}[X \mid Y = y] := g(y)$ in the sense of regular conditional distributions.
Regular conditional distributions: for most practical spaces (Polish spaces, which include $\mathbb{R}$ and $\mathbb{R}^n$), there exists a probability kernel $\kappa(y, dx)$ such that $\mathbb{E}[X \mid Y = y] = \int x \, \kappa(y, dx)$. This is the rigorous version of "the conditional distribution of $X$ given $Y = y$." Existence on general measurable spaces is not guaranteed but holds in all practically relevant cases.
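Example 4.2 (Bivariate Gaussian). Let $(X, Y)$ be jointly Gaussian with means $\mu_X, \mu_Y$, standard deviations $\sigma_X, \sigma_Y$, and correlation $\rho$. Then:
$$\mathbb{E}[X \mid Y] = \mu_X + \rho\,\frac{\sigma_X}{\sigma_Y}\,(Y - \mu_Y).$$
Derivation. Write $X = \beta Y + \varepsilon$ where we choose $\beta$ to make $\varepsilon$ uncorrelated with $Y$ (and hence, since $(X, Y)$ is jointly Gaussian, independent of $Y$). Setting $\mathrm{Cov}(\varepsilon, Y) = \mathrm{Cov}(X, Y) - \beta\,\mathrm{Var}(Y) = 0$:
$$\beta = \frac{\mathrm{Cov}(X, Y)}{\mathrm{Var}(Y)} = \rho\,\frac{\sigma_X}{\sigma_Y}.$$
Then $\varepsilon = X - \beta Y$ is Gaussian and independent of $Y$, so $\mathbb{E}[\varepsilon \mid Y] = \mathbb{E}[\varepsilon] = \mu_X - \beta\mu_Y$ by property (vi). Applying linearity and property (iii):
$$\mathbb{E}[X \mid Y] = \beta Y + \mathbb{E}[\varepsilon] = \mu_X + \rho\,\frac{\sigma_X}{\sigma_Y}\,(Y - \mu_Y).$$
In finance: if $X$ and $Y$ are jointly Gaussian log-returns, this formula is the best linear predictor of $X$ from $Y$ (and, by Gaussianity, the best predictor outright). It is the foundation of factor models and regression-based hedging.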
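The Gaussian conditional-mean formula says that regressing one jointly Gaussian variable on another should recover a slope of correlation times the ratio of standard deviations. The following simulation is a rough sketch of that check using only the standard library; the parameter values, sample size, seed, and function names are assumptions made for illustration, and the comparison is statistical, so it holds only up to Monte Carlo noise.

```python
import math
import random

def simulate_pairs(n, mu_x, mu_y, sigma_x, sigma_y, rho, seed=0):
    """Draw n samples of a jointly Gaussian pair (X, Y) with the given moments."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        y = mu_y + sigma_y * z1
        # rho*z1 + sqrt(1-rho^2)*z2 is standard normal with correlation rho to z1
        x = mu_x + sigma_x * (rho * z1 + math.sqrt(1.0 - rho**2) * z2)
        pairs.append((x, y))
    return pairs

def regress_x_on_y(pairs):
    """Ordinary least squares of X on Y; returns (intercept, slope)."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in pairs) / n
    var_y = sum((y - mean_y) ** 2 for _, y in pairs) / n
    slope = cov_xy / var_y
    return mean_x - slope * mean_y, slope

# Hypothetical parameter values, chosen only for the illustration.
mu_x, mu_y, sigma_x, sigma_y, rho = 0.01, 0.02, 0.2, 0.3, 0.6

pairs = simulate_pairs(50_000, mu_x, mu_y, sigma_x, sigma_y, rho)
alpha_hat, beta_hat = regress_x_on_y(pairs)

beta_theory = rho * sigma_x / sigma_y          # slope in the closed-form formula
alpha_theory = mu_x - beta_theory * mu_y       # intercept in the same formula
print(beta_hat, beta_theory, alpha_hat, alpha_theory)
```

With these inputs the theoretical slope is 0.4; the fitted values should land close to the theoretical ones, with the gap shrinking roughly like one over the square root of the sample size.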
5. Martingales via conditional expectation
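Theorem 5.1 (Doob Martingale). Let $X \in L^1(\Omega, \mathcal{F}, \mathbb{P})$ and $(\mathcal{F}_t)_{t \ge 0}$ a filtration with $\mathcal{F}_t \subseteq \mathcal{F}$. Define $M_t := \mathbb{E}[X \mid \mathcal{F}_t]$. Then $(M_t)_{t \ge 0}$ is a martingale with respect to $(\mathcal{F}_t)_{t \ge 0}$.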
Proof. We check the three martingale conditions.
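- Adaptedness: $M_t$ is $\mathcal{F}_t$-measurable by definition. ✓
- Integrability: $\mathbb{E}[|M_t|] = \mathbb{E}\big[|\mathbb{E}[X \mid \mathcal{F}_t]|\big] \le \mathbb{E}\big[\mathbb{E}[|X| \mid \mathcal{F}_t]\big] = \mathbb{E}[|X|] < \infty$, using Jensen's inequality (property (vii)) and the tower property. ✓
- Martingale property: For $s \le t$, since $\mathcal{F}_s \subseteq \mathcal{F}_t$ (filtration is increasing), the tower property gives:
$$\mathbb{E}[M_t \mid \mathcal{F}_s] = \mathbb{E}\big[\mathbb{E}[X \mid \mathcal{F}_t] \mid \mathcal{F}_s\big] = \mathbb{E}[X \mid \mathcal{F}_s] = M_s. \quad ✓$$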
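The Doob construction can be made concrete on a coin-tossing space, where the information at step t is the first t tosses and each atom is the set of paths sharing that prefix. The sketch below is an assumed toy model, not the notebook's: three fair tosses, a call-like terminal variable on the number of heads, and hypothetical helper names `M` and `cond_exp_of_next`. It verifies the one-step martingale identity on every path.

```python
from fractions import Fraction
from itertools import product

N = 3
paths = list(product((0, 1), repeat=N))          # 1 = heads, 0 = tails
prob = {w: Fraction(1, 2**N) for w in paths}     # fair, independent tosses

def X(w):
    """Terminal variable (hypothetical): call-like payoff on the head count."""
    return Fraction(max(sum(w) - 1, 0))

def atom(t, w):
    """Atom of F_t containing w: all paths that agree with w on the first t tosses."""
    return [v for v in paths if v[:t] == w[:t]]

def M(t, w):
    """Doob martingale M_t(w) = E[X | F_t](w), an average over the atom."""
    block = atom(t, w)
    return sum(X(v) * prob[v] for v in block) / sum(prob[v] for v in block)

def cond_exp_of_next(t, w):
    """E[M_{t+1} | F_t](w), computed directly from the definition."""
    block = atom(t, w)
    return sum(M(t + 1, v) * prob[v] for v in block) / sum(prob[v] for v in block)

martingale_ok = all(
    cond_exp_of_next(t, w) == M(t, w) for t in range(N) for w in paths
)
terminal_ok = all(M(N, w) == X(w) for w in paths)
print(M(0, paths[0]), martingale_ok, terminal_ok)
```

At step zero the atom is the whole space, so the martingale starts at the unconditional mean, and at the final step each atom is a single path, so it ends at the terminal variable itself.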
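Martingales and pricing. A Doob martingale is the canonical example showing that "best predictions of a terminal value" form a martingale. In risk-neutral pricing, with $V_t$ the time-$t$ price and $V_T$ the payoff: $e^{-rt}V_t = \mathbb{E}^{\mathbb{Q}}[e^{-rT}V_T \mid \mathcal{F}_t]$, so the discounted price process is exactly a Doob martingale under $\mathbb{Q}$ with terminal value $e^{-rT}V_T$. Absence of arbitrage is equivalent, under technical conditions, to the existence of an equivalent measure $\mathbb{Q}$ under which discounted prices are martingales; this is the First Fundamental Theorem of Asset Pricing. The tower property is the mathematical engine that makes this time-consistency work.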
The converse question (which martingales arise as Doob martingales?) has a clean answer: on a finite horizon $[0, T]$ every martingale does, since $M_t = \mathbb{E}[M_T \mid \mathcal{F}_t]$, while on an infinite horizon exactly the uniformly integrable martingales do. A related result is the Martingale Representation Theorem: under a Brownian filtration, every square-integrable martingale can be written as a constant plus a stochastic integral against the Brownian motion. This will be covered in the Stochastic Calculus course.
Validation
The companion notebook at /notebooks/probability-theory-conditional-expectation.html verifies every claim in this module using pure Python with exact rational arithmetic (cells 0–4) and float arithmetic (cell 5).
The notebook checks:
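- Discrete CE from scratch: uniform measure on a finite sample space, $\mathcal{G}$ generated by a two-atom partition: compute $\mathbb{E}[X \mid \mathcal{G}]$ and verify the defining property holds on both atoms.
- Tower property: 3-level filtration on a finite sample space: verify $\mathbb{E}[\mathbb{E}[X \mid \mathcal{G}] \mid \mathcal{H}] = \mathbb{E}[X \mid \mathcal{H}]$ exactly.
- Pulling-out-known-factors: verify $\mathbb{E}[XY \mid \mathcal{G}] = Y\,\mathbb{E}[X \mid \mathcal{G}]$ on the same discrete setup.
- Jensen's inequality: verify $\varphi(\mathbb{E}[X \mid \mathcal{G}]) \le \mathbb{E}[\varphi(X) \mid \mathcal{G}]$ for a convex $\varphi$.
- Bivariate Gaussian CE: implement $\mathbb{E}[X \mid Y] = \mu_X + \rho\,(\sigma_X/\sigma_Y)(Y - \mu_Y)$ and verify numerically for known parameter values; summarise all checks.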
Hand exercise before opening the notebook. Let with uniform (each outcome has probability ). Let , so . Define .
- Compute . Since is -measurable, it is constant on each atom. On : the average of over under the uniform measure is . On : .
- Verify the defining property: and ✓. Check similarly.
- Verify using the notebook — cell 1 replicates this calculation exactly.
Limitations
Almost-sure uniqueness, not pointwise. The defining property determines $\mathbb{E}[X \mid \mathcal{G}]$ only up to sets of probability zero. Two functions $Z$ and $Z'$ satisfying the defining property may disagree on a $\mathbb{P}$-null set. Always say "a version of $\mathbb{E}[X \mid \mathcal{G}]$" when precision matters. When concatenating conditional expectations (e.g., computing $\mathbb{E}[X \mid \mathcal{F}_t]$ for each $t$ in a simulation), a poor choice of version can produce measurability problems.
Simulation trap: the null-set ambiguity. In Monte Carlo, you approximate $\mathbb{E}[Y \mid \mathcal{F}_t]$ by regressing $Y$ on the state variables generating $\mathcal{F}_t$ (this is the Longstaff-Schwartz idea for American options). The regression produces one specific version of the conditional expectation. If you then use this version in a subsequent time step (for example, comparing it against a threshold to decide early exercise) you are implicitly assuming this version is "correct" on every simulated path. In practice this is fine for $\mathbb{P}$-almost all paths, but it can fail on null sets. More subtly: the approximation error from regression and the null-set issue are separate sources of error. Conflating them leads to incorrect convergence analysis.
The $L^1$ vs. $L^2$ gap. The geometric Hilbert-space interpretation (orthogonal projection) requires $X \in L^2$. For $X \in L^1 \setminus L^2$, conditional expectation still exists by the Radon-Nikodym argument, but the "best predictor in mean-square" interpretation breaks down because the mean-square error may be infinite. In practice, all bounded payoffs are in $L^p$ for all $p$, so this distinction rarely bites in pricing, but it does matter for heavy-tailed distributions (e.g., stable processes, power-law returns).
Regular conditional distributions. On general measurable spaces, regular conditional distributions (the kernels $\kappa(y, dx)$) need not exist. They do exist when $X$ takes values in a Polish space (complete separable metric space), which covers $\mathbb{R}^n$, $C[0, T]$, and $D[0, T]$ (càdlàg paths, with the Skorokhod topology). For the stochastic processes used in finance this is satisfied. But on exotic path spaces (non-separable function spaces, abstract probability spaces without metric structure) the existence of regular conditional distributions is not automatic.
Monte Carlo regression error. In numerical computation (LSMC, nested simulation), $\mathbb{E}[Y \mid \mathcal{F}_t]$ is approximated by regression: fit $\mathbb{E}[Y \mid \mathcal{F}_t] \approx \sum_{k=1}^{K} \beta_k \phi_k(S_t)$ where $\phi_k$ are basis functions of the state variables $S_t$. This introduces: (1) approximation error from truncating the basis expansion; (2) statistical error from estimating coefficients on finitely many paths; (3) model error from choosing the wrong state variables to condition on. The tower property holds exactly in theory; in a simulation, small violations of it are a direct measure of regression quality.
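The first two error sources can be seen in a toy regression where the true conditional expectation is known in closed form. The sketch below is an illustration under assumed data, not an LSMC implementation: a standard normal state variable, a target equal to its square plus noise (so the true conditional mean is the square), and a hand-rolled least-squares solver with the hypothetical name `fit_least_squares`. A linear basis cannot represent the truth, while a quadratic basis can, leaving only statistical error.

```python
import random

def fit_least_squares(basis, s_vals, y_vals):
    """Solve the normal equations (A^T A) beta = A^T y by Gaussian elimination."""
    k = len(basis)
    rows = [[phi(s) for phi in basis] for s in s_vals]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * y for r, y in zip(rows, y_vals)) for i in range(k)]
    aug = [ata[i] + [aty[i]] for i in range(k)]   # augmented matrix
    for col in range(k):
        # Partial pivoting, then eliminate entries below the pivot.
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, k):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[r][c] -= factor * aug[col][c]
    beta = [0.0] * k
    for i in reversed(range(k)):                  # back substitution
        tail = sum(aug[i][j] * beta[j] for j in range(i + 1, k))
        beta[i] = (aug[i][k] - tail) / aug[i][i]
    return beta

def predict(basis, beta, s):
    return sum(b * phi(s) for b, phi in zip(beta, basis))

# Assumed toy model: S ~ N(0,1), Y = S^2 + noise, so E[Y | S] = S^2.
rng = random.Random(1)
s_train = [rng.gauss(0.0, 1.0) for _ in range(20_000)]
y_train = [s * s + rng.gauss(0.0, 0.5) for s in s_train]
s_test = [rng.gauss(0.0, 1.0) for _ in range(5_000)]

linear = [lambda s: 1.0, lambda s: s]
quadratic = linear + [lambda s: s * s]

def error_vs_truth(basis):
    """Mean-square gap between the fitted regression and the true E[Y | S]."""
    beta = fit_least_squares(basis, s_train, y_train)
    return sum((predict(basis, beta, s) - s * s) ** 2 for s in s_test) / len(s_test)

err_linear = error_vs_truth(linear)        # dominated by basis error
err_quadratic = error_vs_truth(quadratic)  # only statistical error remains
print(err_linear, err_quadratic)
```

In this setup the linear-basis error does not shrink as the number of paths grows, whereas the quadratic-basis error does; that contrast is one practical way to tell basis error from statistical error.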
Interview Angle
L1 — Junior quant / quant developer
Expected depth: State the defining property, identify CE as a random variable not a number, state the tower property correctly, apply independence.
Q1. "What is $\mathbb{E}^{\mathbb{Q}}[V_T \mid \mathcal{F}_t]$ in the context of option pricing?"
Expected: it is the time-$t$ risk-neutral value (up to discounting) of the payoff $V_T$, a random variable representing the expected payoff given the market information available at time $t$. It is not a single number; it depends on the realised path up to $t$.
Common mistake: saying "it is the conditional probability" (it is an expectation) or "it is a number" (it is a random variable indexed by $\omega$).
Q2. "State the tower property."
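Expected: if $\mathcal{H} \subseteq \mathcal{G}$, then $\mathbb{E}[\mathbb{E}[X \mid \mathcal{G}] \mid \mathcal{H}] = \mathbb{E}[X \mid \mathcal{H}]$ a.s. Strong answer adds: "The smaller (coarser) $\sigma$-algebra wins."
Common mistake: reversing the direction, i.e. stating the outer conditioning is on $\mathcal{G}$ and the inner on $\mathcal{H}$ without checking the inclusion order.
Q3. "If $X$ and $Y$ are independent, what is $\mathbb{E}[X \mid Y]$?"
Expected: $\mathbb{E}[X \mid Y] = \mathbb{E}[X]$ a.s., the constant random variable equal to the unconditional mean. Knowing $Y$ gives no information about $X$.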
L2 — Senior quant
Expected depth: Prove the tower property from the defining property, explain why CE is not a number, prove the pulling-out-known-factors property.
Q1. "Prove the tower property from the defining property of conditional expectation."
Expected: the proof in §3 above. Verify that $\mathbb{E}[X \mid \mathcal{H}]$ satisfies the defining integral identity for $\mathbb{E}[\mathbb{E}[X \mid \mathcal{G}] \mid \mathcal{H}]$ by using $\mathcal{H} \subseteq \mathcal{G}$ and applying the defining property twice.
Q2. "Why is $\mathbb{E}[X \mid \mathcal{G}]$ a random variable and not a single number?"
Expected: it is a function $\Omega \to \mathbb{R}$, $\mathcal{G}$-measurable. Its value at $\omega$ depends on which $\mathcal{G}$-atom contains $\omega$. On a continuous space, it varies measurably (not necessarily continuously) with the conditioning information. Saying "it is a number" is the mistake of treating $\mathcal{G}$ as a single event rather than a $\sigma$-algebra.
Q3. "State and prove: if $Y$ is $\mathcal{G}$-measurable, then $\mathbb{E}[XY \mid \mathcal{G}] = Y\,\mathbb{E}[X \mid \mathcal{G}]$ a.s."
Expected: verify the two conditions of Definition 1.1: $\mathcal{G}$-measurability (product of $\mathcal{G}$-measurable functions) and the integral identity. For the integral identity: approximate $Y$ by simple functions $Y_n$, use linearity and the MCT; or use the functional form argument for bounded $Y$ and extend by density. Strong answers note that bounded $Y$ or $X, Y \in L^2$ are sufficient conditions; the general case requires $XY \in L^1$.
L3 — Quant researcher
Expected depth: Radon-Nikodym theorem statement and role in CE existence, Monte Carlo regression error decomposition, failure conditions for CE.
Q1. "What is the Radon-Nikodym theorem and why does it guarantee the existence of conditional expectation?"
Expected: RN theorem states that if $\nu$ is a finite signed measure and $\mu$ a $\sigma$-finite measure on $(\Omega, \mathcal{G})$ with $\nu \ll \mu$ (absolute continuity), there exists a $\mathcal{G}$-measurable $f$ with $\nu(G) = \int_G f \, d\mu$ for all $G \in \mathcal{G}$. Apply with $\nu(G) = \int_G X \, d\mathbb{P}$ and $\mu = \mathbb{P}|_{\mathcal{G}}$: the density is $\mathbb{E}[X \mid \mathcal{G}]$.
Q2. "In Monte Carlo simulation, you approximate $\mathbb{E}[Y \mid \mathcal{F}_t]$ by regressing $Y$ on the state variables that generate $\mathcal{F}_t$. What are the sources of error in this approximation?"
Expected: (1) Approximation/basis error: the regression function space (e.g., polynomials of degree $d$) may not contain the true conditional expectation; (2) Statistical/variance error: finite sample size $N$ means the regression coefficients are estimated with noise, creating $O(1/\sqrt{N})$ error; (3) State variable misspecification: if the chosen state variables do not generate $\mathcal{F}_t$ exactly (e.g., omitting a dimension of the Markov state), the regression targets the wrong conditional expectation entirely; (4) Nested simulation error: if $Y$ itself is estimated by an inner simulation, its noise feeds into the outer regression.
Q3. "Can conditional expectation fail to exist? Under what conditions?"
Expected: CE always exists for $X \in L^1$ on any probability space (by Radon-Nikodym, which holds for any $\sigma$-finite reference measure). The subtlety is regular conditional distributions: these can fail to exist on non-separable measurable spaces. On Polish spaces they always exist. A deeper failure mode: if the reference measure is not $\sigma$-finite, the Radon-Nikodym theorem may not apply in full generality; this cannot happen in probability, where $\mathbb{P}(\Omega) = 1$ makes the measure finite. In all standard finance applications, existence is guaranteed.