Brownian Bridge™

Context

Let $(\Omega,\mathcal F,\mathbb P)$ be a probability space and
$\mathcal G \subseteq \mathcal F$ a sub-$\sigma$-algebra.

We work in the Hilbert space $L^2(\Omega,\mathcal F,\mathbb P)$, abbreviated $L^2(\mathcal F)$, with inner product $\langle X,Y\rangle = \mathbb E[XY]$. Random variables that agree almost surely are identified.

Define the closed subspace $L^2(\mathcal G) = \{Y \in L^2(\mathcal F) : Y \text{ is } \mathcal G\text{-measurable}\}.$

Theorem (Projection Characterization)

For any $X \in L^2(\mathcal F)$, the conditional expectation
$\mathbb E[X \mid \mathcal G]$ is the orthogonal projection of $X$ onto $L^2(\mathcal G)$; it is unique up to almost-sure equality.

Equivalently: $\mathbb E[X \mid \mathcal G] = \arg\min_{Y \in L^2(\mathcal G)} \mathbb E[(X-Y)^2].$

Orthogonality Property

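For all $Z \in L^2(\mathcal G)$, $\mathbb E\big[(X - \mathbb E[X\mid\mathcal G])Z\big] = 0.$

It is sufficient to check this for indicators
$Z = \mathbf 1_A$, $A \in \mathcal G$, because their linear combinations (simple functions) are dense in $L^2(\mathcal G)$ and the left-hand side is continuous in $Z$.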

Proof Sketch

$L^2(\mathcal F)$ is a Hilbert space.
$L^2(\mathcal G)$ is a closed subspace.
Every element of a Hilbert space admits a unique orthogonal projection onto a closed subspace.
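The projection is $\mathcal G$-measurable and, by orthogonality to every $\mathbf 1_A$ with $A \in \mathcal G$, satisfies $\mathbb E[\Pi_{L^2(\mathcal G)}(X)\,\mathbf 1_A] = \mathbb E[X\,\mathbf 1_A]$, which is the defining property of conditional expectation.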

Hence: $\mathbb E[X\mid\mathcal G] = \Pi_{L^2(\mathcal G)}(X).$
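
The link between the orthogonality property and the minimisation in the theorem is a short Pythagoras argument, sketched here. The shorthand $\hat X$ for $\mathbb E[X\mid\mathcal G]$ is introduced only for this display, and $Y$ is an arbitrary element of $L^2(\mathcal G)$.

```latex
\begin{align*}
\mathbb E[(X-Y)^2]
  &= \mathbb E[(X-\hat X)^2]
     + 2\,\mathbb E[(X-\hat X)(\hat X-Y)]
     + \mathbb E[(\hat X-Y)^2] \\
  &= \mathbb E[(X-\hat X)^2] + \mathbb E[(\hat X-Y)^2]
  \;\ge\; \mathbb E[(X-\hat X)^2].
\end{align*}
```

The cross term vanishes by orthogonality applied to $Z = \hat X - Y \in L^2(\mathcal G)$, and equality holds only when $Y = \hat X$ almost surely, which gives both the minimising property and uniqueness.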

Interpretation

$X$ : future payoff or random outcome
$\mathcal G$ : available information
$\mathbb E[X\mid\mathcal G]$ : best mean-square predictor using only $\mathcal G$

This underlies:

filtering
least-squares Monte Carlo
regression-based pricing
risk-neutral valuation

Finite Partition Example

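If $\mathcal G = \sigma(A_1,\dots,A_n)$ for a partition $A_1,\dots,A_n$ of $\Omega$ with $\mathbb P(A_i) > 0$, then $\mathbb E[X\mid\mathcal G] = \sum_{i=1}^n \mathbb E[X\mid A_i]\mathbf 1_{A_i}$, where $\mathbb E[X\mid A_i] = \mathbb E[X\mathbf 1_{A_i}]/\mathbb P(A_i)$.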

The projection is piecewise constant on the partition.
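
The partition formula can be checked numerically. The following is a minimal sketch, not a definitive implementation: the sample space $[0,1)$, the choice of $X$, and the four-cell partition are all assumptions made for illustration, and expectations are taken under the empirical measure of the simulated sample.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Sample points omega ~ Uniform[0, 1) and a square-integrable X (hypothetical choice).
omega = rng.uniform(size=n)
x = omega**2 + 0.1 * rng.standard_normal(n)

# Partition of [0, 1) into quarters; labels[j] is the 0-based index of the cell containing omega[j].
n_cells = 4
labels = (omega * n_cells).astype(int)

# E[X | A_i] under the empirical measure is the average of X over cell i.
cell_means = np.array([x[labels == i].mean() for i in range(n_cells)])
cond_exp = cell_means[labels]  # piecewise constant on the partition

# Orthogonality: E[(X - E[X|G]) 1_{A_i}] should vanish for every cell.
residual = x - cond_exp
inner = np.array([np.mean(residual * (labels == i)) for i in range(n_cells)])

# Another G-measurable predictor (different constants per cell) has a larger MSE.
other = (cell_means + 0.05)[labels]
mse_proj = np.mean(residual**2)
mse_other = np.mean((x - other) ** 2)

print("inner products with indicators:", inner)
print("MSE of projection:", mse_proj, " MSE of perturbed predictor:", mse_other)
```

The inner products are zero up to floating-point error, and the gap between the two mean-squared errors equals the squared size of the perturbation, as the Pythagoras identity predicts.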

Quant Finance Example

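Let $X=(S_T-K)^+$ and $\mathcal G=\sigma(S_t)$, with $S$ a geometric Brownian motion under the risk-neutral measure $\mathbb Q$ and constant rate $r$.

Then: $e^{-r(T-t)}\,\mathbb E^{\mathbb Q}[X\mid S_t]$, viewed as a function of $S_t$, is the Black–Scholes call price function.

Up to the deterministic discount factor, this is the best $L^2(\mathbb Q)$ estimator of the payoff given today’s spot.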
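
The identification with the Black–Scholes formula can be checked numerically. The sketch below assumes GBM under the risk-neutral measure with constant rate and volatility, and includes the discount factor $e^{-r(T-t)}$. The function names and parameter values are hypothetical, chosen only for this illustration. By the Markov property, conditioning on $S_t = s$ amounts to simulating $S_T$ started from $s$.

```python
import numpy as np
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def bs_call(s, k, r, sigma, tau):
    """Black-Scholes call price for spot s, strike k, time to maturity tau."""
    d1 = (log(s / k) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    return s * norm_cdf(d1) - k * exp(-r * tau) * norm_cdf(d2)


def mc_conditional_call(s, k, r, sigma, tau, n_paths=400_000, seed=0):
    """Monte Carlo estimate of exp(-r*tau) * E^Q[(S_T - K)^+ | S_t = s]."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_paths)
    # Exact GBM transition from S_t = s to S_T under Q.
    s_T = s * np.exp((r - 0.5 * sigma**2) * tau + sigma * np.sqrt(tau) * z)
    return float(exp(-r * tau) * np.maximum(s_T - k, 0.0).mean())


# Illustrative parameters (assumptions, not taken from the text).
spot, strike, rate, vol, tau = 100.0, 100.0, 0.05, 0.2, 1.0
print("closed form:", bs_call(spot, strike, rate, vol, tau))
print("Monte Carlo:", mc_conditional_call(spot, strike, rate, vol, tau))
```

The two numbers agree up to Monte Carlo error, which shrinks at rate $1/\sqrt{n}$ in the number of simulated paths.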

Key Takeaways

Conditional expectation = orthogonal projection in $L^2$
Optimal under squared loss
Foundational for pricing, filtering, and regression methods

Interview Angle

L1: What does it mean that conditional expectation is the best $L^2$ predictor? Why do we use squared loss rather than absolute loss in this context?

$\mathbb{E}[X \mid \mathcal{G}]$ is the unique (up to almost-sure equality) $\mathcal{G}$-measurable random variable minimising $\mathbb{E}[(X - Y)^2]$ over all square-integrable $\mathcal{G}$-measurable $Y$. "Best" means no other predictor using only the information in $\mathcal{G}$ achieves smaller expected squared error. The optimality holds against the whole class of competing predictors, but it is a statement about average squared error: on an individual realisation another predictor may happen to land closer to $X$.

Squared loss is preferred in this Hilbert space framework for structural reasons. $L^2$ is a Hilbert space: it has an inner product $\langle X, Y\rangle = \mathbb{E}[XY]$, an induced norm $\|X\|_2 = \mathbb{E}[X^2]^{1/2}$, and, crucially, the projection theorem holds. Every element has a unique orthogonal projection onto any closed subspace, and for the subspace $L^2(\mathcal{G})$ this projection is exactly the conditional expectation. Under absolute loss ($L^1$), the optimal predictor is a conditional median, not the conditional mean; $L^1$ is a Banach space but not a Hilbert space, so the orthogonal projection structure does not apply: the minimiser need not be unique and is not linear in $X$.

In quant finance, squared loss also has a natural economic interpretation: it penalises large hedging errors more than small ones, matching the quadratic P&L structure of a delta-hedged position.
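
The contrast between squared and absolute loss can be seen in the simplest case, where $\mathcal{G}$ is trivial and the predictor is a constant. The sketch below assumes a lognormal sample (a hypothetical choice, picked because its mean and median differ) and searches a grid of constants for the minimiser of each loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Skewed sample, so that mean and median are clearly different (assumed distribution).
sample = rng.lognormal(mean=0.0, sigma=1.0, size=20_000)

# Candidate constant predictors c on a grid with spacing 0.005.
grid = np.linspace(0.0, 5.0, 1001)

# Empirical expected loss of each constant under squared and absolute loss.
sq_loss = np.array([np.mean((sample - c) ** 2) for c in grid])
abs_loss = np.array([np.mean(np.abs(sample - c)) for c in grid])

best_sq = grid[np.argmin(sq_loss)]
best_abs = grid[np.argmin(abs_loss)]

print("squared-loss minimiser:", best_sq, " sample mean:", sample.mean())
print("absolute-loss minimiser:", best_abs, " sample median:", np.median(sample))
```

Up to the grid resolution, the squared-loss minimiser coincides with the sample mean and the absolute-loss minimiser with the sample median.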

L2: Prove that $L^2(\mathcal{G})$ is a closed subspace of $L^2(\mathcal{F})$ . Why does closedness matter for the projection theorem?

$L^2(\mathcal{G})$ is a subspace. It is clearly non-empty (contains 0). If $Y_1, Y_2 \in L^2(\mathcal{G})$ and $\alpha, \beta \in \mathbb{R}$ , then $\alpha Y_1 + \beta Y_2$ is $\mathcal{G}$ -measurable (measurability is preserved under linear combinations) and square-integrable (by the triangle inequality in $L^2$ ), so it belongs to $L^2(\mathcal{G})$ .

Closedness. Let $(Y_n) \subset L^2(\mathcal{G})$ be a sequence converging to $Y \in L^2(\mathcal{F})$ in $L^2$-norm: $\mathbb{E}[(Y_n - Y)^2] \to 0$. We need $Y \in L^2(\mathcal{G})$, i.e., $Y$ is $\mathcal{G}$-measurable. $L^2$-convergence implies a subsequence $Y_{n_k} \to Y$ almost surely. Set $\tilde Y = \limsup_k Y_{n_k}$ where this is finite and $\tilde Y = 0$ elsewhere. As a pointwise limit superior of $\mathcal{G}$-measurable functions, $\tilde Y$ is $\mathcal{G}$-measurable, and $\tilde Y = Y$ almost surely. Since elements of $L^2$ are identified up to almost-sure equality, $Y$ has a $\mathcal{G}$-measurable representative, so $Y \in L^2(\mathcal{G})$. (If one insists on working with individual functions rather than equivalence classes, $\mathcal{G}$ must contain the $\mathbb{P}$-null sets of $\mathcal{F}$.)

Why closedness matters. The projection theorem in Hilbert spaces states: for every $X \in \mathcal{H}$ and every closed subspace $\mathcal{K} \subseteq \mathcal{H}$ , there exists a unique $\hat{X} \in \mathcal{K}$ with $\|X - \hat{X}\| \leq \|X - Y\|$ for all $Y \in \mathcal{K}$ . Without closedness, the infimum $\inf_{Y \in \mathcal{K}} \|X - Y\|$ may not be attained: a minimising sequence converges to a limit outside $\mathcal{K}$ , and no projection exists. Closedness ensures the minimiser is achieved inside the subspace.
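
A concrete instance of the failure, offered as an illustration: take $\mathcal{K}$ to be the bounded random variables, which form a subspace of $L^2(\mathcal{F})$ that is not closed, and let $X \in L^2(\mathcal{F})$ be unbounded. This assumes the probability space supports such an $X$, for example a Gaussian; the truncations $X_n$ are notation introduced only for this example.

```latex
\begin{align*}
X_n &:= X\,\mathbf 1_{\{|X|\le n\}} \in \mathcal K, \\
\|X - X_n\|_2^2 &= \mathbb E\big[X^2\,\mathbf 1_{\{|X|>n\}}\big]
  \xrightarrow[n\to\infty]{} 0
  \quad\text{(dominated convergence)}, \\
\inf_{Y\in\mathcal K}\|X-Y\|_2 &= 0,
  \quad\text{but } \|X-Y\|_2 = 0 \text{ forces } Y = X \text{ a.s.},\ X \notin \mathcal K.
\end{align*}
```

The infimum is zero but no bounded $Y$ attains it, so $X$ has no projection onto $\mathcal{K}$.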

L3: How does the $L^2$ projection interpretation of conditional expectation underpin least-squares Monte Carlo (Longstaff-Schwartz)? What is the connection to the martingale representation theorem?

Least-squares Monte Carlo (Longstaff-Schwartz). For an American option, the continuation value at time $t_k$ is $\mathbb{E}^{\mathbb{Q}}[C_{k+1} \mid S_{t_k}]$, evaluated along path $i$ at $S_{t_k}^{(i)}$, where $C_{k+1}$ is the discounted cash flow received from holding the option past $t_k$ and following the estimated exercise rule at later dates. This conditional expectation is a function of $S_{t_k}$ alone (by the Markov property of GBM), so it lies in $L^2(\sigma(S_{t_k}))$. Longstaff-Schwartz approximates this function by projecting $C_{k+1}$ onto a finite-dimensional subspace of $L^2(\sigma(S_{t_k}))$ spanned by a chosen basis of functions $\psi_1(S_{t_k}), \ldots, \psi_m(S_{t_k})$ (e.g., Laguerre polynomials). The $L^2$-projection coefficient vector $\beta^* = \arg\min_\beta \mathbb{E}[(C_{k+1} - \beta^\top\psi(S_{t_k}))^2]$ is estimated by OLS regression across Monte Carlo paths. The algorithm is thus a sample-based, finite-dimensional approximation of the $L^2$ projection.

The key structural insight: because the target is an $L^2$ projection, the regression estimate $\hat{\beta}^\top\psi(S_{t_k})$ converges to the true conditional expectation under suitable conditions (number of paths $\to \infty$ and basis dimension $\to \infty$ at appropriate rates, with a basis whose span is dense in $L^2(\sigma(S_{t_k}))$). Convergence is measured in $L^2$, matching the norm of the underlying Hilbert space.
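
The regression step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a reference implementation: it prices a Bermudan-style put on GBM, uses a plain cubic polynomial in $S/K$ as the basis (instead of Laguerre polynomials), regresses on in-the-money paths only, and reuses the same paths for regression and valuation. The function name and all parameter values are hypothetical.

```python
import numpy as np


def lsm_american_put(s0=36.0, strike=40.0, r=0.06, sigma=0.2, maturity=1.0,
                     n_steps=50, n_paths=50_000, degree=3, seed=0):
    """Least-squares Monte Carlo estimate of a put with n_steps exercise dates."""
    rng = np.random.default_rng(seed)
    dt = maturity / n_steps
    disc = np.exp(-r * dt)

    # Simulate GBM under Q; column k holds S at time (k + 1) * dt.
    z = rng.standard_normal((n_paths, n_steps))
    log_inc = (r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    paths = s0 * np.exp(np.cumsum(log_inc, axis=1))

    # Cash flow realised on each path, initially the payoff at maturity.
    cashflow = np.maximum(strike - paths[:, -1], 0.0)

    # Backward induction over the earlier exercise dates.
    for k in range(n_steps - 2, -1, -1):
        cashflow *= disc  # discount future cash flows back to the current date
        s = paths[:, k]
        exercise = np.maximum(strike - s, 0.0)
        itm = exercise > 0.0
        if itm.sum() > degree + 1:
            # OLS projection of discounted cash flows onto polynomials in S/K.
            x = s[itm] / strike
            coeffs = np.polyfit(x, cashflow[itm], degree)
            continuation = np.polyval(coeffs, x)
            # Exercise where the immediate payoff beats the estimated continuation value.
            stop = exercise[itm] > continuation
            idx = np.flatnonzero(itm)[stop]
            cashflow[idx] = exercise[idx]

    # Discount from the first exercise date back to time 0.
    return float(disc * cashflow.mean())


price = lsm_american_put()
print("LSM put estimate:", price)
```

The regression never values the option directly; it only decides where to exercise, and the price is the average of the realised discounted cash flows. That is why a crude basis can still give a usable estimate.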

Connection to the martingale representation theorem (MRT). The MRT states that in a Brownian filtration, any square-integrable $\mathbb{Q}$-martingale $M_t$ can be written as $M_t = M_0 + \int_0^t \phi_s \, d\widetilde{W}_s$ for an adapted (predictable), square-integrable process $\phi$, unique up to $dt \otimes d\mathbb{Q}$-null sets. This is a representation theorem in $L^2$: by the Itô isometry, the map $\phi \mapsto \int_0^T \phi_s \, d\widetilde{W}_s$ is an isometry from the square-integrable adapted processes into $L^2(\mathcal{F}_T)$, and the MRT says its range is all of the mean-zero elements. In projection language, constants plus stochastic integrals exhaust $L^2(\mathcal{F}_T)$: projecting any $L^2$ functional of the Brownian path onto this subspace leaves no residual, so every such functional is reachable. In pricing terms: the discounted option payoff $e^{-rT}\varphi(S_T)$ equals its $\mathbb{Q}$-expectation plus a stochastic integral, i.e. the terminal value of a self-financing portfolio, and the integrand determines the delta hedge (for GBM, $\phi_t = \Delta_t \sigma e^{-rt} S_t$ with $\Delta_t$ the option delta). The $L^2$ projection that is conditional expectation produces the option price; the $L^2$ representation that is the MRT produces the hedge. Both are faces of the same Hilbert space geometry.