
Conditional Expectation as an L² Projection


Context

Let $(\Omega,\mathcal F,\mathbb P)$ be a probability space and
$\mathcal G \subseteq \mathcal F$ a sub-$\sigma$-algebra.

We work in the Hilbert space $L^2(\Omega,\mathcal F,\mathbb P)$ with inner product $\langle X,Y\rangle = \mathbb E[XY]$.

Define the closed subspace $L^2(\mathcal G) = \{Y \in L^2(\mathcal F) : Y \text{ is } \mathcal G\text{-measurable}\}$.


Theorem (Projection Characterization)

For any $X \in L^2(\mathcal F)$, the conditional expectation
$\mathbb E[X \mid \mathcal G]$ is the unique orthogonal projection of $X$ onto $L^2(\mathcal G)$.

Equivalently: $\mathbb E[X \mid \mathcal G] = \arg\min_{Y \in L^2(\mathcal G)} \mathbb E[(X-Y)^2]$.


Orthogonality Property

For all $Z \in L^2(\mathcal G)$, $\mathbb E\big[(X - \mathbb E[X\mid\mathcal G])Z\big] = 0$.

Because simple functions are dense in $L^2(\mathcal G)$ and the inner product is linear and continuous, it is sufficient to check this for
$Z = \mathbf 1_A$, $A \in \mathcal G$.
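
The link between orthogonality and the minimisation in the theorem can be made explicit. A short derivation, writing $\hat X = \mathbb E[X\mid\mathcal G]$ (a shorthand introduced only for this display) and taking any $Y \in L^2(\mathcal G)$:

```latex
\begin{align*}
\mathbb E\big[(X-Y)^2\big]
  &= \mathbb E\big[(X-\hat X)^2\big]
   + 2\,\mathbb E\big[(X-\hat X)(\hat X - Y)\big]
   + \mathbb E\big[(\hat X - Y)^2\big] \\
  &= \mathbb E\big[(X-\hat X)^2\big] + \mathbb E\big[(\hat X - Y)^2\big]
  \;\ge\; \mathbb E\big[(X-\hat X)^2\big].
\end{align*}
```

The cross term vanishes by the orthogonality property applied to $Z = \hat X - Y \in L^2(\mathcal G)$, and equality holds only when $Y = \hat X$ almost surely.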


Proof Sketch

  1. $L^2(\mathcal F)$ is a Hilbert space.
  2. $L^2(\mathcal G)$ is a closed subspace.
  3. Every element of a Hilbert space admits a unique orthogonal projection onto a closed subspace.
  4. The projection is $\mathcal G$-measurable and has the same integral as $X$ over every $A \in \mathcal G$, which is the defining property of conditional expectation.

Hence: $\mathbb E[X\mid\mathcal G] = \Pi_{L^2(\mathcal G)}(X)$.
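
Step 4 of the sketch can be spelled out in one line. Writing $\Pi X$ for the projection (shorthand for $\Pi_{L^2(\mathcal G)}(X)$) and taking $Z = \mathbf 1_A$ in the orthogonality relation $\langle X - \Pi X, Z\rangle = 0$:

```latex
\[
\mathbb E\big[(X - \Pi X)\,\mathbf 1_A\big] = 0
\quad\Longleftrightarrow\quad
\mathbb E\big[X\,\mathbf 1_A\big] = \mathbb E\big[(\Pi X)\,\mathbf 1_A\big]
\qquad \text{for all } A \in \mathcal G .
\]
```

Together with the $\mathcal G$-measurability of $\Pi X$, this is the usual defining property of $\mathbb E[X\mid\mathcal G]$, and almost-sure uniqueness of conditional expectation identifies the two.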


Interpretation

  • $X$: future payoff or random outcome
  • $\mathcal G$: available information
  • $\mathbb E[X\mid\mathcal G]$: best mean-square predictor using only $\mathcal G$

This underlies:

  • filtering
  • least-squares Monte Carlo
  • regression-based pricing
  • risk-neutral valuation

Finite Partition Example

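If $\mathcal G = \sigma(A_1,\dots,A_n)$, where $A_1,\dots,A_n$ partition $\Omega$ and each $\mathbb P(A_i) > 0$, then $\mathbb E[X\mid\mathcal G] = \sum_{i=1}^n \mathbb E[X\mid A_i]\,\mathbf 1_{A_i}$, with $\mathbb E[X\mid A_i] = \mathbb E[X\mathbf 1_{A_i}]/\mathbb P(A_i)$.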

The projection is piecewise constant on the partition.
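
A small exact computation makes the partition formula and the orthogonality property concrete. The sketch below assumes a toy setup that is not from the text: a fair six-sided die, $X(\omega) = \omega^2$, and the partition $\{1,2\}, \{3,4\}, \{5,6\}$. All names are illustrative.

```python
from fractions import Fraction

# Hypothetical toy setup: a fair six-sided die, X(w) = w^2,
# and G generated by the partition {1,2}, {3,4}, {5,6}.
omega = [1, 2, 3, 4, 5, 6]
prob = {w: Fraction(1, 6) for w in omega}
X = {w: Fraction(w * w) for w in omega}
partition = [{1, 2}, {3, 4}, {5, 6}]


def expectation(f):
    """E[f] for a random variable given as a dict over omega."""
    return sum(f[w] * prob[w] for w in omega)


def conditional_expectation(f, cells):
    """E[f | G] as a dict: on each cell A_i it equals E[f 1_{A_i}] / P(A_i)."""
    out = {}
    for cell in cells:
        p_cell = sum(prob[w] for w in cell)
        avg = sum(f[w] * prob[w] for w in cell) / p_cell
        for w in cell:
            out[w] = avg
    return out


def mse(pred):
    """Mean squared error E[(X - pred)^2]."""
    return expectation({w: (X[w] - pred[w]) ** 2 for w in omega})


cond = conditional_expectation(X, partition)

# Orthogonality: the residual X - E[X|G] is orthogonal to each indicator 1_{A_i}.
residual = {w: X[w] - cond[w] for w in omega}
inner_products = [
    expectation({w: residual[w] * (1 if w in cell else 0) for w in omega})
    for cell in partition
]

# Optimality: shifting the predictor on one cell increases the squared error.
perturbed = {w: cond[w] + (1 if w <= 2 else 0) for w in omega}

print([cond[w] for w in omega])
print(inner_products)
print(mse(cond), mse(perturbed))
```

Exact rational arithmetic is used so that the inner products come out as exactly zero rather than as floating-point noise. The gap between the two errors equals the expected squared size of the perturbation, as the Pythagorean identity predicts.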


Quant Finance Example

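Let $X=(S_T-K)^+$ and $\mathcal G=\sigma(S_t)$ for $t < T$, where $S$ is a geometric Brownian motion under the risk-neutral measure $\mathbb Q$ with constant interest rate $r$.

Then: $e^{-r(T-t)}\,\mathbb E^{\mathbb Q}[X\mid S_t]$ is the Black–Scholes call price function evaluated at $S_t$.

Up to the discount factor, this is the best $L^2(\mathbb Q)$ estimator of the payoff given today’s spot.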
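
As a numerical sanity check, the discounted conditional expectation can be estimated by simulation and compared with the closed-form price. This is a minimal sketch assuming geometric Brownian motion under the risk-neutral measure with constant rate and volatility; the parameter values and function names are illustrative, not taken from the text.

```python
import math
import random


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(s, k, r, sigma, tau):
    """Black-Scholes call price for spot s and time to maturity tau."""
    d1 = (math.log(s / k) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return s * norm_cdf(d1) - k * math.exp(-r * tau) * norm_cdf(d2)


def mc_conditional_price(s, k, r, sigma, tau, n_paths, seed=0):
    """Monte Carlo estimate of exp(-r tau) E^Q[(S_T - K)^+ | S_t = s] under GBM."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma**2) * tau
    vol = sigma * math.sqrt(tau)
    total = 0.0
    for _ in range(n_paths):
        # Given S_t = s, S_T is lognormal; simulate it in one step.
        s_T = s * math.exp(drift + vol * rng.gauss(0.0, 1.0))
        total += max(s_T - k, 0.0)
    return math.exp(-r * tau) * total / n_paths


# Illustrative parameters (assumptions, not from the text).
spot, strike, rate, vol, tau = 100.0, 100.0, 0.05, 0.2, 1.0
closed_form = bs_call(spot, strike, rate, vol, tau)
mc_estimate = mc_conditional_price(spot, strike, rate, vol, tau, n_paths=100_000)
print(closed_form, mc_estimate)
```

Conditioning on $S_t = s$ is implemented by starting every path at $s$, which is legitimate here because GBM is Markov. The two printed numbers should agree up to Monte Carlo error.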


Key Takeaways

  • Conditional expectation = orthogonal projection in $L^2$
  • Optimal under squared loss
  • Foundational for pricing, filtering, and regression methods

Interview Angle

L1: What does it mean that conditional expectation is the best $L^2$ predictor? Why do we use squared loss rather than absolute loss in this context?

$\mathbb{E}[X \mid \mathcal{G}]$ is the unique (up to almost-sure equality) $\mathcal{G}$-measurable random variable minimising $\mathbb{E}[(X - Y)^2]$ over all square-integrable $\mathcal{G}$-measurable $Y$. "Best" means no other predictor using only the information in $\mathcal{G}$ achieves smaller expected squared error. The optimality also holds conditionally, not just on average: $\mathbb{E}[(X - \mathbb{E}[X \mid \mathcal{G}])^2 \mid \mathcal{G}] \leq \mathbb{E}[(X - Y)^2 \mid \mathcal{G}]$ almost surely for every such $Y$.

Squared loss is preferred in this Hilbert space framework for structural reasons. $L^2$ is a Hilbert space: it has an inner product $\langle X, Y\rangle = \mathbb{E}[XY]$, an induced norm $\|X\|_2 = \mathbb{E}[X^2]^{1/2}$, and, crucially, the projection theorem holds. Every element has a unique orthogonal projection onto any closed subspace, and this projection is exactly the conditional expectation. Under absolute loss (the $L^1$ norm), the optimal predictor is the conditional median, not the conditional mean; $L^1$ is a Banach space but not a Hilbert space, so the orthogonal projection structure does not apply and the mathematics is considerably more complex.
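
The mean-versus-median distinction is easy to see on a skewed sample. The sketch below uses a made-up five-point sample and a brute-force grid search over constant predictors; it illustrates the claim and is not a general-purpose optimiser.

```python
import statistics

# Hypothetical skewed sample standing in for draws of X within one information cell.
sample = [1.0, 2.0, 3.0, 4.0, 100.0]


def squared_loss(c):
    """Average squared error of the constant predictor c."""
    return sum((x - c) ** 2 for x in sample) / len(sample)


def absolute_loss(c):
    """Average absolute error of the constant predictor c."""
    return sum(abs(x - c) for x in sample) / len(sample)


# Search a grid of constant predictors c in [0, 100] with step 0.5.
grid = [i / 2 for i in range(201)]
best_squared = min(grid, key=squared_loss)
best_absolute = min(grid, key=absolute_loss)

print(best_squared, statistics.mean(sample))      # squared loss selects the mean
print(best_absolute, statistics.median(sample))   # absolute loss selects the median
```

The single outlier pulls the squared-loss minimiser far from the bulk of the data, while the absolute-loss minimiser stays at the middle observation.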

In quant finance, squared loss also has a natural economic interpretation: it penalises large hedging errors more than small ones, matching the quadratic P&L structure of a delta-hedged position.

L2: Prove that $L^2(\mathcal{G})$ is a closed subspace of $L^2(\mathcal{F})$. Why does closedness matter for the projection theorem?

$L^2(\mathcal{G})$ is a subspace. It is clearly non-empty (contains 0). If $Y_1, Y_2 \in L^2(\mathcal{G})$ and $\alpha, \beta \in \mathbb{R}$, then $\alpha Y_1 + \beta Y_2$ is $\mathcal{G}$-measurable (measurability is preserved under linear combinations) and square-integrable (by the triangle inequality in $L^2$), so it belongs to $L^2(\mathcal{G})$.

Closedness. Let $(Y_n) \subset L^2(\mathcal{G})$ be a sequence converging to $Y \in L^2(\mathcal{F})$ in $L^2$-norm: $\mathbb{E}[(Y_n - Y)^2] \to 0$. We need $Y \in L^2(\mathcal{G})$, i.e., $Y$ has a $\mathcal{G}$-measurable version. $L^2$-convergence implies that a subsequence $Y_{n_k} \to Y$ almost surely. Define $\tilde{Y} = \limsup_k Y_{n_k}$ where this is finite and $\tilde{Y} = 0$ elsewhere. Then $\tilde{Y}$ is $\mathcal{G}$-measurable, because each $Y_{n_k}$ is, and $\tilde{Y} = Y$ almost surely. Since elements of $L^2$ are equivalence classes under almost-sure equality, we conclude $Y \in L^2(\mathcal{G})$. (If one insists that $Y$ itself be pointwise $\mathcal{G}$-measurable, $\mathcal{G}$ must contain the $\mathbb{P}$-null sets of $\mathcal{F}$.)

Why closedness matters. The projection theorem in Hilbert spaces states: for every $X \in \mathcal{H}$ and every closed subspace $\mathcal{K} \subseteq \mathcal{H}$, there exists a unique $\hat{X} \in \mathcal{K}$ with $\|X - \hat{X}\| \leq \|X - Y\|$ for all $Y \in \mathcal{K}$. Without closedness, the infimum $\inf_{Y \in \mathcal{K}} \|X - Y\|$ may not be attained: a minimising sequence converges to a limit outside $\mathcal{K}$, and no projection exists. Closedness ensures the minimiser is achieved inside the subspace.

L3: How does the $L^2$ projection interpretation of conditional expectation underpin least-squares Monte Carlo (Longstaff-Schwartz)? What is the connection to the martingale representation theorem?

Least-squares Monte Carlo (Longstaff-Schwartz). For an American option, the continuation value at time $t_k$ along path $i$ is $\mathbb{E}^{\mathbb{Q}}[C_{k+1} \mid S_{t_k} = S_{t_k}^{(i)}]$, where $C_{k+1}$ is the discounted value of the option at the next step (in Longstaff-Schwartz, the discounted cash flow realised along the path). This conditional expectation is a function of $S_{t_k}$ alone (by the Markov property of GBM), so it lies in $L^2(\sigma(S_{t_k}))$. Longstaff-Schwartz approximates this function by projecting $C_{k+1}$ onto a finite-dimensional subspace of $L^2(\sigma(S_{t_k}))$, spanned by a chosen basis of functions $\psi_1(S_{t_k}), \ldots, \psi_m(S_{t_k})$ (e.g., Laguerre polynomials). The $L^2$-projection coefficient vector $\beta^* = \arg\min_\beta \mathbb{E}[(C_{k+1} - \beta^\top\psi(S_{t_k}))^2]$ is estimated by OLS regression across Monte Carlo paths. The regression step is the sample-path approximation of the $L^2$ projection.

The key structural insight: OLS is the empirical counterpart of the $L^2$ projection, so under suitable conditions the fitted value $\hat{\beta}^\top\psi(S_{t_k})$ converges to the true conditional expectation as the number of paths and the basis dimension both tend to infinity at appropriate rates. Convergence is measured in $L^2$, matching the norm of the underlying Hilbert space.
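
A single regression step of this kind can be sketched in a few lines. The example below is an illustration under stated assumptions, not the full Longstaff-Schwartz algorithm: it uses a put with one remaining exercise date, GBM dynamics, and a simple monomial basis $(1, x, x^2)$ with $x = S/K$ in place of Laguerre polynomials. All parameter values and helper names are hypothetical.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(aug[row][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[row][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def basis(s, strike):
    """Hypothetical basis psi(S) = (1, x, x^2) with x = S / K."""
    x = s / strike
    return [1.0, x, x * x]


# Illustrative parameters (assumptions, not from the text).
rng = random.Random(1)
s0, strike, r, sigma, dt, n_paths = 100.0, 100.0, 0.05, 0.2, 0.5, 5000


def gbm_step(s):
    """One exact GBM step of length dt under the risk-neutral measure."""
    z = rng.gauss(0.0, 1.0)
    return s * math.exp((r - 0.5 * sigma**2) * dt + sigma * math.sqrt(dt) * z)


s_k = [gbm_step(s0) for _ in range(n_paths)]    # S at t_k
s_next = [gbm_step(s) for s in s_k]             # S at t_{k+1}
cash = [math.exp(-r * dt) * max(strike - s, 0.0) for s in s_next]  # C_{k+1}

# Normal equations (Psi^T Psi) beta = Psi^T C: the sample version of the projection.
psi = [basis(s, strike) for s in s_k]
n_basis = len(psi[0])
gram = [[sum(p[i] * p[j] for p in psi) for j in range(n_basis)] for i in range(n_basis)]
rhs = [sum(p[i] * c for p, c in zip(psi, cash)) for i in range(n_basis)]
beta = solve_linear_system(gram, rhs)

# Fitted continuation values and residuals along each path.
continuation = [sum(b * f for b, f in zip(beta, p)) for p in psi]
residual = [c - v for c, v in zip(cash, continuation)]

# Sample orthogonality: the residual is orthogonal to every basis function.
orth = [sum(e * p[i] for e, p in zip(residual, psi)) / n_paths for i in range(n_basis)]
print(beta)
print(orth)
```

The printed `orth` values are the sample analogue of $\mathbb{E}[(C_{k+1} - \beta^\top\psi)\,\psi_j] = 0$ and should be zero up to rounding. Scaling the state by the strike is a design choice to keep the normal equations reasonably well conditioned.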

Connection to the martingale representation theorem (MRT). The MRT states that in a Brownian filtration, any square-integrable $\mathbb{Q}$-martingale $M_t$ can be written as $M_t = M_0 + \int_0^t \phi_s \, d\widetilde{W}_s$ for an adapted, square-integrable process $\phi$ that is unique up to null sets. This is a representation theorem in $L^2$: by the Itô isometry, $\mathbb{E}^{\mathbb{Q}}[(M_T - M_0)^2] = \mathbb{E}^{\mathbb{Q}}\big[\int_0^T \phi_s^2 \, ds\big]$, so the map $\phi \mapsto \int_0^T \phi_s \, d\widetilde{W}_s$ is an isometry from square-integrable adapted processes into $L^2$. The MRT says this map is surjective onto the mean-zero elements: the subspace of constants plus stochastic integrals is the whole space of square-integrable functionals of the Brownian path, so projecting onto it leaves no residual. In pricing terms: the discounted option payoff $\varphi(S_T)$ decomposes into its price plus a stochastic integral that represents a self-financing portfolio, and the integrand $\phi_t$ determines the delta hedge. The $L^2$ projection that is conditional expectation produces the option price; the $L^2$ decomposition given by the MRT produces the hedge. Both are faces of the same Hilbert space geometry.
