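We work on $(\Omega, \mathcal{F}, \mathbb{P})$ with filtration $\mathbb{F} = (\mathcal{F}_t)_{t \ge 0}$ satisfying the usual conditions. Let $(W_t)_{t \ge 0}$ be a standard $\mathbb{P}$-Brownian motion.
An Itô process is a continuous adapted process of the form:
$$X_t = X_0 + \int_0^t \mu_s \, ds + \int_0^t \sigma_s \, dW_s,$$
where $\mu$ (drift) and $\sigma$ (diffusion) are $\mathbb{F}$-progressively measurable and satisfy:
$$\int_0^T |\mu_s| \, ds < \infty \quad \text{and} \quad \int_0^T \sigma_s^2 \, ds < \infty \quad \text{a.s.}$$
In differential notation: $dX_t = \mu_t \, dt + \sigma_t \, dW_t$.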
The Itô Integral
The stochastic integral $\int_0^T \sigma_s \, dW_s$ is defined as the $L^2$ limit of non-anticipating Riemann sums over partitions $\pi = \{0 = t_0 < t_1 < \dots < t_n = T\}$:
$$\int_0^T \sigma_s \, dW_s = L^2\text{-}\lim_{|\pi| \to 0} \sum_i \sigma_{t_{i-1}} (W_{t_i} - W_{t_{i-1}}).$$
Evaluating $\sigma$ at the left endpoint $t_{i-1}$, not the midpoint, is what makes the integral Itô. Left-endpoint evaluation ensures $\sigma_{t_{i-1}}$ is $\mathcal{F}_{t_{i-1}}$-measurable (non-anticipating). The choice of evaluation point matters: different conventions yield different integrals (see Itô vs Stratonovich below).
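Itô isometry. For square-integrable adapted $\sigma$:
$$\mathbb{E}\left[\left(\int_0^T \sigma_s \, dW_s\right)^2\right] = \mathbb{E}\left[\int_0^T \sigma_s^2 \, ds\right].$$
This is the key $L^2$ norm identity. It follows from expanding the square and using the fact that each Brownian increment has mean zero and is independent of the past: the cross terms $\mathbb{E}[\sigma_{t_{i-1}} (W_{t_i} - W_{t_{i-1}}) \cdot \sigma_{t_{j-1}} (W_{t_j} - W_{t_{j-1}})]$ vanish for $i \neq j$.
A square-integrable adapted integrand produces a martingale:
$$\mathbb{E}\left[\int_0^t \sigma_s \, dW_s \,\middle|\, \mathcal{F}_r\right] = \int_0^r \sigma_s \, dW_s, \quad r \le t.$$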
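Both identities can be checked numerically. The sketch below is a minimal Monte Carlo illustration, assuming the specific integrand $\sigma_s = W_s$ on $[0, 1]$ (a choice made here for illustration, not taken from the text), for which $\mathbb{E}[\int_0^1 W_s^2 \, ds] = 1/2$. The function name, path count, step count and seed are all hypothetical.

```python
import math
import random

def simulate_ito_integral_of_w(n_paths=4000, n_steps=100, T=1.0, seed=1):
    """Return per-path samples of (int_0^T W dW, int_0^T W^2 ds).

    Both integrals are approximated by left-endpoint sums on a uniform grid.
    """
    rng = random.Random(seed)
    dt = T / n_steps
    sqrt_dt = math.sqrt(dt)
    stochastic, riemann = [], []
    for _path in range(n_paths):
        w = 0.0
        m = 0.0  # running Ito sum of W dW
        q = 0.0  # running Riemann sum of W^2 ds
        for _step in range(n_steps):
            dw = rng.gauss(0.0, sqrt_dt)
            m += w * dw      # integrand taken at the left endpoint
            q += w * w * dt
            w += dw
        stochastic.append(m)
        riemann.append(q)
    return stochastic, riemann

M, Q = simulate_ito_integral_of_w()
lhs = sum(m * m for m in M) / len(M)  # estimates E[(int W dW)^2]
rhs = sum(Q) / len(Q)                 # estimates E[int W^2 ds]
mean_M = sum(M) / len(M)              # martingale started at 0, so mean is near 0
print(f"E[M_T^2] ~ {lhs:.3f}, E[int W^2 ds] ~ {rhs:.3f}, E[M_T] ~ {mean_M:.3f}")
```

The two second-moment estimates should both sit near $0.5$ up to sampling noise and a small discretisation bias, and the sample mean of the integral should be close to zero.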
Itô's Lemma: Statement
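Let $f: \mathbb{R}_+ \times \mathbb{R} \to \mathbb{R}$ be $C^{1,2}$ (once continuously differentiable in $t$, twice in $x$). Define $Y_t = f(t, X_t)$. Then $Y$ is again an Itô process and:
$$dY_t = \frac{\partial f}{\partial t}(t, X_t) \, dt + \frac{\partial f}{\partial x}(t, X_t) \, dX_t + \frac{1}{2} \frac{\partial^2 f}{\partial x^2}(t, X_t) \, (dX_t)^2,$$
where $(dX_t)^2$ is evaluated using the Itô multiplication table:
$$dt \cdot dt = 0, \quad dt \cdot dW_t = 0, \quad dW_t \cdot dW_t = dt.$$
Substituting $dX_t = \mu_t \, dt + \sigma_t \, dW_t$ and $(dX_t)^2 = \sigma_t^2 \, dt$:
$$dY_t = \left(\frac{\partial f}{\partial t} + \mu_t \frac{\partial f}{\partial x} + \frac{1}{2} \sigma_t^2 \frac{\partial^2 f}{\partial x^2}\right) dt + \sigma_t \frac{\partial f}{\partial x} \, dW_t.$$
The term $\frac{1}{2} \sigma_t^2 f_{xx}$ is the Itô correction. It has no analogue in classical calculus and arises solely from the non-zero quadratic variation of Brownian motion.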
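A short worked case makes the correction visible. The example below assumes $X_t = W_t$ (so $\mu_t = 0$, $\sigma_t = 1$) and $f(t, x) = x^2$; this choice is illustrative and not taken from the text above.

```latex
\begin{align*}
f_t &= 0, \qquad f_x = 2x, \qquad f_{xx} = 2, \\
d(W_t^2) &= \Big(0 + 0 \cdot 2W_t + \tfrac{1}{2} \cdot 1^2 \cdot 2\Big)\, dt + 1 \cdot 2W_t \, dW_t
          = dt + 2W_t \, dW_t, \\
W_T^2 &= T + 2\int_0^T W_s \, dW_s
\quad\Longrightarrow\quad
\int_0^T W_s \, dW_s = \tfrac{1}{2}\big(W_T^2 - T\big).
\end{align*}
```

Classical calculus would give $\int_0^T W_s \, dW_s = \tfrac{1}{2} W_T^2$; the extra $-T/2$ is exactly the Itô correction integrated over $[0, T]$.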
Derivation
Apply Taylor's theorem to $f(t + dt, X_{t+dt})$ around $(t, X_t)$:
$$df = f_t \, dt + f_x \, dX + \frac{1}{2} f_{xx} (dX)^2 + \frac{1}{2} f_{tt} (dt)^2 + f_{xt} \, dt \, dX + \cdots$$
Substitute $dX = \mu \, dt + \sigma \, dW$ and expand $(dX)^2$:
$$(dX)^2 = \mu^2 (dt)^2 + 2\mu\sigma \, dt \, dW + \sigma^2 (dW)^2.$$
Apply the Itô multiplication table. The key step: $(dW)^2 = dt$ is the quadratic variation result $[W]_t = t$ in differential form. The other products vanish as $o(dt)$:
$$(dX)^2 = \sigma^2 \, dt + O((dt)^{3/2}).$$
Similarly, $(dt)^2 = 0$ and $dt \cdot dW = 0$ in the $L^2$ sense. Retaining only terms of order $dt$:
$$df = f_t \, dt + f_x (\mu \, dt + \sigma \, dW) + \frac{1}{2} f_{xx} \sigma^2 \, dt.$$
Collecting drift and diffusion terms:
$$df = \left(f_t + \mu f_x + \frac{1}{2} \sigma^2 f_{xx}\right) dt + \sigma f_x \, dW.$$
This is Itô's lemma. The heuristic argument is exact in identifying the correct terms; the rigorous version replaces the Taylor remainder analysis with an $L^2$ convergence argument using the Itô isometry.
Application: Solving the GBM SDE
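Let $S_t$ satisfy the geometric Brownian motion SDE:
$$dS_t = \mu S_t \, dt + \sigma S_t \, dW_t.$$
Apply Itô's lemma to $f(S_t) = \ln S_t$, with $f_x = 1/x$, $f_{xx} = -1/x^2$, $f_t = 0$:
$$d(\ln S_t) = \frac{1}{S_t} \, dS_t - \frac{1}{2} \cdot \frac{1}{S_t^2} \cdot \sigma^2 S_t^2 \, dt = \left(\mu - \frac{\sigma^2}{2}\right) dt + \sigma \, dW_t.$$
Integrating from $0$ to $T$:
$$\ln S_T - \ln S_0 = \left(\mu - \frac{\sigma^2}{2}\right) T + \sigma W_T.$$
Therefore:
$$S_T = S_0 \exp\left(\left(\mu - \frac{\sigma^2}{2}\right) T + \sigma W_T\right).$$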
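One way to see that the $-\sigma^2/2$ term comes out of the SDE itself is to discretise the SDE directly and compare with the closed form on the same Brownian path. The sketch below uses a plain Euler-Maruyama step (a standard scheme chosen here for illustration, not a method from the text); the function name and all parameter values are hypothetical.

```python
import math
import random

def gbm_euler_vs_exact(s0=100.0, mu=0.05, sigma=0.2, T=1.0, n_steps=10_000, seed=7):
    """Run Euler-Maruyama on dS = mu*S dt + sigma*S dW and compare, on the
    same Brownian path, with the closed form and with the naive formula."""
    rng = random.Random(seed)
    dt = T / n_steps
    sqrt_dt = math.sqrt(dt)
    s = s0
    w = 0.0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, sqrt_dt)
        s += mu * s * dt + sigma * s * dw  # discretised SDE, no correction added by hand
        w += dw
    exact = s0 * math.exp((mu - 0.5 * sigma**2) * T + sigma * w)
    naive = s0 * math.exp(mu * T + sigma * w)  # drops the -sigma^2/2 term
    return s, exact, naive

euler, exact, naive = gbm_euler_vs_exact()
print(f"Euler: {euler:.4f}  exact: {exact:.4f}  naive: {naive:.4f}")
```

With a fine grid the Euler value should track the closed form closely, while the naive formula overshoots by the factor $e^{\sigma^2 T / 2}$ on every path.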
The Role of the Itô Correction
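The term $-\sigma^2/2$ is not a typo. It is the Itô correction from $f_{xx} = -1/S^2$. Without it, that is, if one naively wrote $\ln S_T = \ln S_0 + \mu T + \sigma W_T$, the expectation would be wrong. The correct value follows from the SDE:
$$\mathbb{E}[S_T] = S_0 e^{\mu T}, \quad \text{since } d\,\mathbb{E}[S_t] = \mu \, \mathbb{E}[S_t] \, dt.$$
But the naive formula gives $\mathbb{E}[e^{\mu T + \sigma W_T}] = e^{\mu T} \cdot e^{\sigma^2 T/2} = e^{(\mu + \sigma^2/2) T} \neq e^{\mu T}$. The Itô correction $-\sigma^2/2$ is what reconciles the arithmetic mean growth $\mu$ of the SDE with the geometric mean growth $\mu - \sigma^2/2$ of the log process.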
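The comparison can be made in closed form with the lognormal mean $\mathbb{E}[e^{a + bZ}] = e^{a + b^2/2}$ for $Z \sim N(0, 1)$. The snippet below is a small sketch of that arithmetic; the helper name and the parameter values are illustrative assumptions.

```python
import math

def lognormal_mean(a, b):
    """E[exp(a + b*Z)] for Z ~ N(0, 1), i.e. exp(a + b^2 / 2)."""
    return math.exp(a + 0.5 * b * b)

mu, sigma, T = 0.05, 0.2, 1.0  # illustrative values, not from the text
b = sigma * math.sqrt(T)       # sigma * W_T has the same law as b * Z

with_correction = lognormal_mean((mu - 0.5 * sigma**2) * T, b)
without_correction = lognormal_mean(mu * T, b)

print(with_correction, math.exp(mu * T))                        # these agree
print(without_correction, math.exp((mu + 0.5 * sigma**2) * T))  # inflated by exp(sigma^2 T / 2)
```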
Itô vs Stratonovich
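The Stratonovich integral evaluates the integrand at the midpoint of each interval; for continuous semimartingale integrands the same limit is obtained by averaging the two endpoint values:
$$\int_0^T \sigma_s \circ dW_s = L^2\text{-}\lim_{|\pi| \to 0} \sum_i \frac{\sigma_{t_i} + \sigma_{t_{i-1}}}{2} (W_{t_i} - W_{t_{i-1}}).$$
The Stratonovich integral obeys the classical chain rule, with no correction term:
$$df(W_t) = f'(W_t) \circ dW_t.$$
When $\sigma_s = \sigma(X_s)$ for a smooth function $\sigma$ and $X$ has diffusion coefficient $\sigma(X_t)$, the two integrals are related by:
$$\int_0^T \sigma(X_s) \circ dW_s = \int_0^T \sigma(X_s) \, dW_s + \frac{1}{2} \int_0^T \frac{\partial \sigma}{\partial x}(X_s) \, \sigma(X_s) \, ds.$$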
The Stratonovich convention is preferred in physics and differential geometry (because it obeys the classical chain rule and behaves well under coordinate changes). The Itô convention is standard in financial mathematics because:
The Itô integral of an adapted square-integrable process is a martingale — essential for risk-neutral pricing.
The Stratonovich integral anticipates future information through the midpoint evaluation, which is economically meaningless.
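The difference between the two conventions shows up on a single simulated path. The sketch below assumes the integrand $\sigma_s = W_s$ on $[0, 1]$ (an illustrative choice), for which the Itô integral is $\tfrac{1}{2}(W_T^2 - T)$ and the Stratonovich integral is $\tfrac{1}{2} W_T^2$; the function name, grid size and seed are hypothetical.

```python
import math
import random

def ito_and_stratonovich_sums(n_steps=20_000, T=1.0, seed=3):
    """Left-endpoint (Ito) and endpoint-average (Stratonovich) sums for the
    integral of W against dW on one simulated path."""
    rng = random.Random(seed)
    sqrt_dt = math.sqrt(T / n_steps)
    w = 0.0
    ito = 0.0
    strat = 0.0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, sqrt_dt)
        w_next = w + dw
        ito += w * dw                     # integrand at the left endpoint
        strat += 0.5 * (w + w_next) * dw  # average of the two endpoints
        w = w_next
    return ito, strat, w

ito, strat, w_T = ito_and_stratonovich_sums()
print(f"Ito sum:          {ito:.4f}   (W_T^2 - T)/2 = {0.5 * (w_T**2 - 1.0):.4f}")
print(f"Stratonovich sum: {strat:.4f}   W_T^2 / 2     = {0.5 * w_T**2:.4f}")
print(f"difference:       {strat - ito:.4f}   T / 2         = 0.5000")
```

The endpoint-average sum telescopes to $\tfrac{1}{2} W_T^2$ exactly (up to floating-point rounding), so the gap between the two sums is half the realised quadratic variation, which is close to $T/2$.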
Multidimensional Itô Lemma
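For a vector of Itô processes $X = (X^1, \dots, X^d)$ driven by correlated Brownian motions $W^1, \dots, W^m$ with $d\langle W^i, W^j \rangle_t = \rho_{ij} \, dt$, and a function $f \in C^{1,2}$:
$$df(t, X_t) = f_t \, dt + \sum_{i=1}^d f_{x_i} \, dX_t^i + \frac{1}{2} \sum_{i,j=1}^d f_{x_i x_j} \, d\langle X^i, X^j \rangle_t.$$
Here $d\langle X^i, X^j \rangle_t = \sum_{k,l=1}^m \sigma^{ik} \sigma^{jl} \rho_{kl} \, dt$ is the quadratic covariation, where $\sigma^{ik}$ is the diffusion coefficient of $X^i$ with respect to $W^k$. When the Brownian motions are independent ($\rho_{kl} = 1$ if $k = l$ and $0$ otherwise), this reduces to $\sum_{k=1}^m \sigma^{ik} \sigma^{jk} \, dt$.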
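With correlated drivers, the covariation rates form the matrix $\sigma \rho \sigma^\top$, which reduces to $\sigma \sigma^\top$ when $\rho$ is the identity. The helper below is a minimal sketch of that computation using plain lists; the function name and the numbers in the example are made up for illustration.

```python
def covariation_matrix(sigma, rho):
    """Instantaneous covariation rates c[i][j], with d<X^i, X^j>_t = c[i][j] dt.

    sigma: d x m list of lists, sigma[i][k] = loading of X^i on W^k.
    rho:   m x m correlation matrix of the driving Brownian motions.
    Computes sigma @ rho @ sigma^T with plain loops.
    """
    d, m = len(sigma), len(rho)
    return [
        [
            sum(sigma[i][k] * rho[k][l] * sigma[j][l]
                for k in range(m) for l in range(m))
            for j in range(d)
        ]
        for i in range(d)
    ]

# Illustrative two-factor example (numbers are made up):
# X^1 loads only on W^1, X^2 only on W^2, and corr(W^1, W^2) = -0.5.
sigma = [[0.2, 0.0],
         [0.0, 0.3]]
rho = [[1.0, -0.5],
       [-0.5, 1.0]]
c = covariation_matrix(sigma, rho)
print(c)
```

In this example the off-diagonal rate is $0.2 \times (-0.5) \times 0.3 = -0.03$; it would be zero if the correlation between the drivers were ignored.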
Limitations
Regularity. Itô's lemma requires $f \in C^{1,2}$. For payoffs with kinks, the formula does not apply as stated: the call payoff $(x - K)^+$ has $f_{xx} = \delta_{x=K}$ (a Dirac delta at $K$) only in the distributional sense. The correct generalization uses Tanaka's formula and local time.
Pathwise inapplicability. The Itô integral cannot be defined sample-path by sample-path as a Riemann-Stieltjes integral: the paths of $W$ have infinite total variation on every interval. The $L^2$ construction is inherently probabilistic. All Itô calculus identities hold in the almost-sure or $L^2$ sense, not pointwise in $\omega$.
Itô correction in parameter estimation. If one observes a log-price process and estimates $\mu$ from the empirical mean of log-returns, the estimate targets $\mu - \sigma^2/2$, not $\mu$. The distinction matters for maximum likelihood estimation of drift in GBM models.
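The estimation point can be illustrated with simulated data. The sketch below draws exact GBM log-returns, which are i.i.d. $N((\mu - \sigma^2/2)\,\Delta t, \sigma^2 \Delta t)$, and compares the raw mean-based estimate with one that adds $\hat\sigma^2/2$ back. The function name, the monthly sampling interval, the very long sample and the parameter values are all assumptions chosen so the effect is visible above sampling noise.

```python
import math
import random

def estimate_gbm_drift(mu=0.10, sigma=0.40, dt=1.0 / 12.0, n_obs=30_000, seed=11):
    """Simulate GBM log-returns and estimate the drift two ways."""
    rng = random.Random(seed)
    mean_r = (mu - 0.5 * sigma**2) * dt  # exact mean of a GBM log-return over dt
    sd_r = sigma * math.sqrt(dt)
    returns = [rng.gauss(mean_r, sd_r) for _ in range(n_obs)]

    m_hat = sum(returns) / n_obs
    var_hat = sum((r - m_hat) ** 2 for r in returns) / (n_obs - 1)

    log_drift_hat = m_hat / dt                 # targets mu - sigma^2 / 2
    sigma2_hat = var_hat / dt                  # targets sigma^2
    mu_hat = log_drift_hat + 0.5 * sigma2_hat  # add the Ito correction back
    return log_drift_hat, sigma2_hat, mu_hat

log_drift_hat, sigma2_hat, mu_hat = estimate_gbm_drift()
print(f"mean log-return / dt: {log_drift_hat:.4f}  (targets mu - sigma^2/2 = 0.02)")
print(f"corrected estimate:   {mu_hat:.4f}  (targets mu = 0.10)")
```

Even with 2,500 years of monthly data the drift estimate has a standard error of roughly $\sigma / \sqrt{T} = 0.008$, which is why drift is hard to estimate in practice regardless of which target is intended.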
Interview Angle
L1: State Itô's lemma. Apply it to $f(S_t) = \ln S_t$ for $dS_t = \mu S_t \, dt + \sigma S_t \, dW_t$. What is the economic meaning of the $-\sigma^2/2$ term?
Statement. For $f \in C^{1,2}$ and $dX_t = \mu_t \, dt + \sigma_t \, dW_t$:
$$df(t, X_t) = \left(f_t + \mu_t f_x + \frac{1}{2} \sigma_t^2 f_{xx}\right) dt + \sigma_t f_x \, dW_t.$$
Application to $\ln S_t$. With $f(x) = \ln x$: $f_x = 1/x$, $f_{xx} = -1/x^2$, $f_t = 0$, $\mu_t = \mu S_t$, $\sigma_t = \sigma S_t$:
$$d(\ln S_t) = \frac{1}{S_t} \cdot \mu S_t \, dt - \frac{1}{2} \cdot \frac{1}{S_t^2} \cdot \sigma^2 S_t^2 \, dt + \frac{\sigma S_t}{S_t} \, dW_t = \left(\mu - \frac{\sigma^2}{2}\right) dt + \sigma \, dW_t.$$
Integrating: $S_T = S_0 \exp((\mu - \sigma^2/2) T + \sigma W_T)$.
Economic meaning of $-\sigma^2/2$. This is the volatility drag: the gap between arithmetic and geometric growth. The SDE $dS = \mu S \, dt + \sigma S \, dW$ implies $\mathbb{E}[S_T] = S_0 e^{\mu T}$: the expected price grows at rate $\mu$. But $\mathbb{E}[\ln(S_T/S_0)] = (\mu - \sigma^2/2) T$: log-returns grow at the lower rate $\mu - \sigma^2/2$. The gap $\sigma^2/2$ arises from the convexity of $\exp$ (Jensen's inequality): because log-returns are normally distributed, the log of the expected gross return exceeds the expected log-return by exactly $\sigma^2 T/2$. In practice: a fund with annualised vol $\sigma = 20\%$ suffers $\sigma^2/2 = 2\%$ drag annually, so its compound growth rate is 2 percentage points below its average return. Practitioners who confuse arithmetic and geometric returns misprice long-horizon options and misreport expected performance.
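The drag arithmetic is short enough to spell out. In the sketch below only $\sigma = 20\%$ comes from the text; the drift $\mu = 8\%$, the ten-year horizon and the helper name are illustrative assumptions.

```python
import math

def volatility_drag(sigma):
    """Gap between the arithmetic drift mu and the log-growth rate mu - sigma^2/2."""
    return 0.5 * sigma**2

mu, sigma, years = 0.08, 0.20, 10  # mu and the horizon are illustrative assumptions
drag = volatility_drag(sigma)
expected_growth = math.exp(mu * years)         # E[S_T] / S_0
median_growth = math.exp((mu - drag) * years)  # exp(E[ln(S_T / S_0)]), the median outcome

print(f"drag = {drag:.2%} per year")
print(f"mean growth factor over {years}y:   {expected_growth:.3f}")
print(f"median growth factor over {years}y: {median_growth:.3f}")
```

The mean growth factor exceeds the median by $e^{\sigma^2 T / 2}$, so the gap between what is expected on average and what a typical path delivers widens with the horizon.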
L2: Derive Itô's lemma from a Taylor expansion. Why does $(dW)^2 = dt$ in the Itô table? What is the Itô isometry, and why does it imply the stochastic integral is a martingale?
Derivation. Apply the second-order Taylor expansion to $f(t + dt, X_{t+dt})$ around $(t, X_t)$:
$$df = f_t \, dt + f_x \, dX + \frac{1}{2} f_{xx} (dX)^2 + O(dt^{3/2}).$$
Expand $(dX)^2 = (\mu \, dt + \sigma \, dW)^2 = \mu^2 (dt)^2 + 2\mu\sigma \, dt \cdot dW + \sigma^2 (dW)^2$. Apply the multiplication table: $(dt)^2 = 0$, $dt \cdot dW = 0$, and $(dW)^2 = dt$. Only the last term survives, giving $(dX)^2 = \sigma^2 \, dt$. Substituting:
$$df = f_t \, dt + f_x (\mu \, dt + \sigma \, dW) + \frac{1}{2} f_{xx} \sigma^2 \, dt = \left(f_t + \mu f_x + \frac{1}{2} \sigma^2 f_{xx}\right) dt + \sigma f_x \, dW. \qquad \square$$
Why $(dW)^2 = dt$. This is the differential form of $[W]_t = t$. On a uniform partition of mesh $h$, the squared increment $(W_{t+h} - W_t)^2$ has mean $h$ and variance $2h^2$. Summing over $n = T/h$ intervals, the total has mean $T$ and variance $2T^2/n \to 0$ as $n \to \infty$. The sum concentrates on $T$ in $L^2$ (not just in expectation): the fluctuations vanish, and the quadratic variation is deterministically equal to $t$. In the Taylor expansion this means $(dW)^2$ contributes a deterministic correction of size $dt$, not a random term: it is absorbed into the drift, not the martingale part.
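The concentration of the sum of squared increments is easy to observe numerically. The sketch below assumes $T = 1$ and a uniform grid; the function name, grid sizes and seed are illustrative.

```python
import math
import random

def realised_quadratic_variation(n_steps, T=1.0, seed=5):
    """Sum of squared Brownian increments over a uniform grid on [0, T]."""
    rng = random.Random(seed)
    sqrt_dt = math.sqrt(T / n_steps)
    return sum(rng.gauss(0.0, sqrt_dt) ** 2 for _ in range(n_steps))

for n in (10, 1_000, 100_000):
    qv = realised_quadratic_variation(n)
    theoretical_sd = math.sqrt(2.0 / n)  # sqrt(2 T^2 / n) with T = 1
    print(f"n = {n:>7}: sum of squares = {qv:.4f}, theoretical sd = {theoretical_sd:.4f}")

qv_fine = realised_quadratic_variation(100_000)
```

As $n$ grows the printed sums should settle near $T = 1$, with deviations on the scale of the theoretical standard deviation $\sqrt{2T^2/n}$.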
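Itô isometry. For square-integrable adapted $\sigma$:
$$\mathbb{E}\left[\left(\int_0^T \sigma_s \, dW_s\right)^2\right] = \mathbb{E}\left[\int_0^T \sigma_s^2 \, ds\right].$$
Proof. Expand the square of the Riemann-sum approximant $\sum_i \sigma_{t_{i-1}} \Delta W_i$:
$$\mathbb{E}\left[\left(\sum_i \sigma_{t_{i-1}} \Delta W_i\right)^2\right] = \sum_{i,j} \mathbb{E}[\sigma_{t_{i-1}} \sigma_{t_{j-1}} \Delta W_i \Delta W_j].$$
For $i \neq j$ (say $i < j$): $\Delta W_j$ is independent of $\mathcal{F}_{t_{j-1}}$, with respect to which $\sigma_{t_{i-1}}$, $\sigma_{t_{j-1}}$ and $\Delta W_i$ are all measurable. So $\mathbb{E}[\sigma_{t_{i-1}} \sigma_{t_{j-1}} \Delta W_i \Delta W_j] = \mathbb{E}[\sigma_{t_{i-1}} \sigma_{t_{j-1}} \Delta W_i] \cdot \mathbb{E}[\Delta W_j] = 0$. Only diagonal terms contribute. On a uniform partition of mesh $h$, independence of $\Delta W_i$ from $\mathcal{F}_{t_{i-1}}$ gives $\sum_i \mathbb{E}[\sigma_{t_{i-1}}^2 (\Delta W_i)^2] = \sum_i \mathbb{E}[\sigma_{t_{i-1}}^2] \cdot h \to \mathbb{E}\left[\int_0^T \sigma_s^2 \, ds\right]$.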
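Why the stochastic integral is a martingale. For an adapted square-integrable integrand, $M_t = \int_0^t \sigma_s \, dW_s$ is a martingale:
$$\mathbb{E}[M_t \mid \mathcal{F}_r] = M_r, \quad r \le t.$$
The key is the non-anticipating (left-endpoint) evaluation: $\sigma_{t_{i-1}}$ is $\mathcal{F}_{t_{i-1}}$-measurable, while $\Delta W_i = W_{t_i} - W_{t_{i-1}}$ is independent of $\mathcal{F}_{t_{i-1}}$ with mean zero. So $\mathbb{E}[\sigma_{t_{i-1}} \Delta W_i \mid \mathcal{F}_{t_{i-1}}] = \sigma_{t_{i-1}} \mathbb{E}[\Delta W_i \mid \mathcal{F}_{t_{i-1}}] = 0$. Summing over the steps after $r$ and applying the tower property: $\mathbb{E}[M_t - M_r \mid \mathcal{F}_r] = 0$. The isometry supplies the $L^2$ control needed to carry this property from the Riemann sums to the limit. This is why the Itô convention (left-endpoint) is essential for finance: the resulting stochastic integral represents the gains from a non-anticipating trading strategy, and its martingale property under the pricing measure is what underpins no-arbitrage pricing.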
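L3: Compare Itô and Stratonovich conventions. Why is the Stratonovich integral not a martingale? State the multidimensional Itô lemma and apply it to a two-factor stochastic volatility model where $dS$ and $d\nu$ are correlated.
Itô vs Stratonovich. The Itô integral evaluates the integrand at the left endpoint of each interval; the Stratonovich integral at the midpoint, written here as the average of the two endpoint values:
$$\int_0^T \sigma_s \circ dW_s = L^2\text{-}\lim \sum_i \frac{\sigma_{t_{i-1}} + \sigma_{t_i}}{2} (W_{t_i} - W_{t_{i-1}}).$$
The midpoint value $\frac{1}{2}(\sigma_{t_{i-1}} + \sigma_{t_i})$ partially anticipates the future: $\sigma_{t_i}$ is $\mathcal{F}_{t_i}$-measurable, and there is a non-trivial correlation $\mathbb{E}[(\sigma_{t_i} - \sigma_{t_{i-1}})(W_{t_i} - W_{t_{i-1}})] \neq 0$ when $\sigma$ is itself driven by $W$. Concretely, if $d\sigma = a \, dt + b \, dW$, then over a step of length $h$:
$$\mathbb{E}\left[\frac{\sigma_{t_{i-1}} + \sigma_{t_i}}{2} (W_{t_i} - W_{t_{i-1}})\right] \approx \frac{1}{2} b \cdot h \neq 0.$$
So $\mathbb{E}[M_t^{\text{Strat}} - M_r^{\text{Strat}} \mid \mathcal{F}_r] \neq 0$ in general: the conditional increment has a non-zero drift, and the Stratonovich integral is not a martingale. The conversion formula makes this precise when $\sigma_s = \sigma(X_s)$ and $X$ has diffusion coefficient $\sigma(X_t)$, so that $b = \frac{\partial \sigma}{\partial x} \sigma$:
$$\int_0^T \sigma(X_s) \circ dW_s = \int_0^T \sigma(X_s) \, dW_s + \frac{1}{2} \int_0^T \frac{\partial \sigma}{\partial x}(X_s) \, \sigma(X_s) \, ds.$$
The correction term $\frac{1}{2} \int \frac{\partial \sigma}{\partial x} \sigma \, ds$ is a finite-variation process: it is the martingale-killing drift added by midpoint evaluation. Stratonovich is standard in physics (where it obeys the classical chain rule and is appropriate for ODEs perturbed by smooth noise) and differential geometry, but it is not the natural convention for financial modelling, where trading strategies must be non-anticipating.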
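Multidimensional Itô lemma. For Itô processes $X^1, \dots, X^d$ driven by correlated Brownian motions $W^1, \dots, W^m$ with $d\langle W^i, W^j \rangle_t = \rho_{ij} \, dt$, and $f \in C^{1,2}$:
$$df(t, X_t) = f_t \, dt + \sum_i f_{x_i} \, dX_t^i + \frac{1}{2} \sum_{i,j} f_{x_i x_j} \, d\langle X^i, X^j \rangle_t.$$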
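Application: Heston-type model. Let $S_t$ (spot) and $\nu_t$ (instantaneous variance) satisfy:
$$dS_t = \mu S_t \, dt + \sqrt{\nu_t} \, S_t \, dW_t^1, \quad d\nu_t = \kappa(\bar{\nu} - \nu_t) \, dt + \xi \sqrt{\nu_t} \, dW_t^2, \quad d\langle W^1, W^2 \rangle_t = \rho \, dt.$$
The quadratic covariations are:
$$d\langle S, S \rangle_t = \nu_t S_t^2 \, dt, \quad d\langle \nu, \nu \rangle_t = \xi^2 \nu_t \, dt, \quad d\langle S, \nu \rangle_t = \rho \xi \nu_t S_t \, dt.$$
Applying the multidimensional Itô lemma to $V(t, S_t, \nu_t)$:
$$dV = \underbrace{\left[V_t + \mu S V_S + \kappa(\bar{\nu} - \nu) V_\nu + \frac{1}{2} \nu S^2 V_{SS} + \rho \xi \nu S V_{S\nu} + \frac{1}{2} \xi^2 \nu V_{\nu\nu}\right]}_{\text{drift}} dt + \sqrt{\nu} \, S V_S \, dW^1 + \xi \sqrt{\nu} \, V_\nu \, dW^2.$$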
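Under the risk-neutral measure $\mathbb{Q}$, we replace $\mu \to r$ (taking any variance risk premium to be zero or absorbed into $\kappa$ and $\bar{\nu}$), and the two Brownian terms are the spot and variance exposures that a hedge portfolio must offset. Requiring the discounted option price to be a $\mathbb{Q}$-martingale forces the drift of $V$ to equal $rV$, giving the Heston PDE:
$$V_t + r S V_S + \kappa(\bar{\nu} - \nu) V_\nu + \frac{1}{2} \nu S^2 V_{SS} + \rho \xi \nu S V_{S\nu} + \frac{1}{2} \xi^2 \nu V_{\nu\nu} - rV = 0.$$
Note the cross-derivative term $\rho \xi \nu S V_{S\nu}$: it is absent in Black-Scholes and couples the vol-of-vol $\xi$ with the spot-vol correlation $\rho$, which together generate the implied vol skew.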
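The covariation structure behind the cross term can be checked on a simulated path. The sketch below uses a full-truncation Euler scheme (a common discretisation swapped in here for illustration, not a method from the text) and works with $\ln S$ rather than $S$ to keep the spot positive; the realised correlation between increments of $\ln S$ and $\nu$ should then be close to $\rho$. All parameter values, function names and the seed are hypothetical.

```python
import math
import random

def simulate_heston_path(s0=100.0, v0=0.04, mu=0.05, kappa=2.0, v_bar=0.04,
                         xi=0.3, rho=-0.7, T=1.0, n_steps=20_000, seed=13):
    """Full-truncation Euler scheme for the two-factor model above.

    Returns lists of log-spot and variance values along one path.
    """
    rng = random.Random(seed)
    dt = T / n_steps
    sqrt_dt = math.sqrt(dt)
    log_s, v = math.log(s0), v0
    log_path, var_path = [log_s], [v]
    for _ in range(n_steps):
        z1 = rng.gauss(0.0, 1.0)
        z_indep = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * z_indep  # corr(z1, z2) = rho
        v_pos = max(v, 0.0)  # truncate so the square root is defined
        root = math.sqrt(v_pos)
        # Ito's lemma on ln S gives drift mu - v/2 and diffusion sqrt(v).
        log_s += (mu - 0.5 * v_pos) * dt + root * sqrt_dt * z1
        v += kappa * (v_bar - v_pos) * dt + xi * root * sqrt_dt * z2
        log_path.append(log_s)
        var_path.append(v)
    return log_path, var_path

def realised_correlation(xs, ys):
    """Correlation of increments: sum(dx*dy) / sqrt(sum(dx^2) * sum(dy^2))."""
    dx = [b - a for a, b in zip(xs, xs[1:])]
    dy = [b - a for a, b in zip(ys, ys[1:])]
    sxy = sum(p * q for p, q in zip(dx, dy))
    sxx = sum(p * p for p in dx)
    syy = sum(q * q for q in dy)
    return sxy / math.sqrt(sxx * syy)

log_path, var_path = simulate_heston_path()
rc = realised_correlation(log_path, var_path)
print(f"realised corr of d(ln S) and d(nu): {rc:.3f}  (rho = -0.7)")
```

A negative realised correlation means variance tends to rise when spot falls, which is the mechanism the $\rho \xi \nu S V_{S\nu}$ term carries into the PDE.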