Linear AlgebraMatrix FactorisationNumerical MethodsMonte Carlo

LU and Cholesky Factorisation

Module 2 of 522 min readLevel: Medium

Setup

The central operation: solving Ax=bAx = b

Almost every numerical routine in quantitative finance reduces to solving a linear system. Yield curve bootstrapping solves Ax=bAx = b for discount factors. Greeks computed by bump-and-reval solve for a perturbed price. Calibration via Gauss-Newton solves the normal equations JJδ=JrJ^\top J \delta = -J^\top r. Even generating correlated random variables for Monte Carlo requires factorising a covariance matrix.

The naïve approach — computing A1A^{-1} explicitly and then forming A1bA^{-1}b — is both slower and less numerically stable than direct factorisation. The standard alternatives are:

  • LU factorisation with partial pivoting: general square AA, O(n3/3)O(n^3/3) flops.
  • Cholesky factorisation: symmetric positive definite AA, O(n3/6)O(n^3/6) flops — twice as fast as LU, always stable without pivoting.
INSIGHT

Why this matters on a derivatives desk. Cholesky factorisation of the covariance matrix Σ=LL\Sigma = LL^\top is the backbone of correlated Monte Carlo simulation. If you call np.linalg.cholesky or LAPACK dpotrf in a pricing library, you are using Cholesky. If it throws an error ("matrix is not positive definite"), you have a broken covariance matrix — and you need to know why to fix it.

Assumptions

  • ARn×nA \in \mathbb{R}^{n \times n} is square. For LU: assume AA is non-singular (otherwise partial pivoting detects the singularity). For Cholesky: assume AA is symmetric positive definite (SPD).
  • All arithmetic is in exact real arithmetic unless otherwise stated; floating-point effects are discussed in Limitations.
  • Conventions: LU factorisation produces PA=LUPA = LU where PP is a permutation matrix, LL is unit lower triangular (diagonal entries all 1), and UU is upper triangular. Cholesky produces A=LLA = LL^\top where LL is lower triangular with positive diagonal.

Theory

1. LU Factorisation

The idea is systematic Gaussian elimination: reduce AA to upper triangular form UU by applying elementary row operations, then record those operations as a lower triangular matrix LL.

DEFINITION

Definition 1.1 (LU factorisation). An LU factorisation of ARn×nA \in \mathbb{R}^{n \times n} is a decomposition A=LUA = LU where LL is unit lower triangular (lower triangular with 1s on the diagonal) and UU is upper triangular.

With partial pivoting: PA=LUPA = LU, where PP is a permutation matrix that reorders the rows of AA to place the largest element in each column on the diagonal before eliminating — improving numerical stability.

Algorithm (Gaussian elimination with partial pivoting):

At step kk (for k=1,,n1k = 1, \ldots, n-1):

  1. Find the row pkp \geq k with the largest absolute value in column kk: p=argmaxjkAjkp = \arg\max_{j \geq k} |A_{jk}| (partial pivot).
  2. Swap rows kk and pp in AA.
  3. For each row i>ki > k: compute the multiplier ik=Aik/Akk\ell_{ik} = A_{ik} / A_{kk} and subtract ik\ell_{ik} times row kk from row ii.
  4. Store ik\ell_{ik} in the lower triangular part of LL.

After n1n-1 steps, AA has been transformed to UU and the multipliers fill LL.

THEOREM

Theorem 1.2 (Existence of LU with partial pivoting). For any non-singular ARn×nA \in \mathbb{R}^{n \times n}, there exists a permutation matrix PP such that PA=LUPA = LU with LL unit lower triangular, UU upper triangular, and Ukk0U_{kk} \neq 0 for all kk. The factorisation is unique given PP.

Solving Ax=bAx = b using LU:

Given PA=LUPA = LU, the system Ax=bAx = b becomes LUx=PbLUx = Pb. Solve in two triangular passes:

  1. Forward substitution: solve Ly=PbLy = Pb for yyO(n2)O(n^2) operations.
  2. Backward substitution: solve Ux=yUx = y for xxO(n2)O(n^2) operations.

Total cost: O(n3/3)O(n^3/3) for the factorisation (done once), then O(n2)O(n^2) per right-hand side. For bootstrapping with multiple right-hand sides (e.g., different rate scenarios), reusing the factorisation is a significant efficiency gain.

EXAMPLE

Example 1.3 (2×2 LU by hand). Let A=(2143)A = \begin{pmatrix} 2 & 1 \\ 4 & 3 \end{pmatrix}.

Step 1: pivot is row 2 (|4| > |2|), so swap: A(4321)A \to \begin{pmatrix} 4 & 3 \\ 2 & 1 \end{pmatrix}.

Step 2: multiplier 21=2/4=1/2\ell_{21} = 2/4 = 1/2. Subtract 12×\frac{1}{2} \times row 1 from row 2: (2,1)12(4,3)=(0,1/2)(2, 1) - \frac{1}{2}(4, 3) = (0, -1/2).

Result: U=(4301/2)U = \begin{pmatrix} 4 & 3 \\ 0 & -1/2 \end{pmatrix}, L=(101/21)L = \begin{pmatrix} 1 & 0 \\ 1/2 & 1 \end{pmatrix}, P=(0110)P = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.

Verify: LU=(4323/2+(1/2))LU = \begin{pmatrix} 4 & 3 \\ 2 & 3/2 + (-1/2) \cdot \ldots \end{pmatrix}... or more directly: PA=(4321)=LUPA = \begin{pmatrix} 4 & 3 \\ 2 & 1 \end{pmatrix} = LU. ✓

2. Cholesky Factorisation

When AA is symmetric and positive definite, a more efficient and always-stable factorisation exists.

DEFINITION

Definition 2.1 (Cholesky factorisation). The Cholesky factorisation of a symmetric positive definite matrix ARn×nA \in \mathbb{R}^{n \times n} is A=LLA = LL^\top where LL is lower triangular with strictly positive diagonal entries Lii>0L_{ii} > 0.

The factorisation is unique (given the positivity convention for the diagonal).

Algorithm (Cholesky-Banachiewicz):

For j=1,,nj = 1, \ldots, n: Ljj=Ajjk=1j1Ljk2L_{jj} = \sqrt{A_{jj} - \sum_{k=1}^{j-1} L_{jk}^2} Lij=1Ljj(Aijk=1j1LikLjk)for i=j+1,,n.L_{ij} = \frac{1}{L_{jj}} \left( A_{ij} - \sum_{k=1}^{j-1} L_{ik} L_{jk} \right) \quad \text{for } i = j+1, \ldots, n.

The square root Ajjk=1j1Ljk2\sqrt{A_{jj} - \sum_{k=1}^{j-1} L_{jk}^2} is real and positive iff AA is SPD — this is both the algorithm and the constructive proof of existence.

THEOREM

Theorem 2.2 (Cholesky existence and uniqueness). A real symmetric matrix AA has a Cholesky factorisation A=LLA = LL^\top with Lii>0L_{ii} > 0 if and only if AA is positive definite. When it exists, the factorisation is unique.

Proof sketch. (\Rightarrow) If A=LLA = LL^\top and LL has positive diagonal, then for any x0x \neq 0: xAx=xLLx=Lx2>0x^\top A x = x^\top L L^\top x = \|L^\top x\|^2 > 0 (since LL invertible Lx0\Rightarrow L^\top x \neq 0). (\Leftarrow) If A0A \succ 0, the algorithm runs to completion without encountering a square root of a non-positive number — this follows from the Schur complement characterisation of positive definiteness. \square

REMARK

Cholesky vs. LU: key differences.

FeatureLU (partial pivoting)Cholesky
RequirementNon-singularSymmetric positive definite
Flop count2n3/3\sim 2n^3/3n3/3\sim n^3/3
PivotingRequired for stabilityNot needed — SPD guarantees stability
Diagonal entriesCan be negativeStrictly positive
Failure modeZero pivot (singular)Complex square root (not SPD)

3. Application: Correlated Monte Carlo Sampling

Given a covariance matrix ΣRn×n\Sigma \in \mathbb{R}^{n \times n} (SPD), to generate mm samples from N(0,Σ)\mathcal{N}(0, \Sigma):

  1. Compute Σ=LL\Sigma = LL^\top via Cholesky.
  2. Draw ZRn×mZ \in \mathbb{R}^{n \times m} with i.i.d. N(0,1)\mathcal{N}(0,1) entries.
  3. Set X=LZX = LZ.

Then E[XX/m]LIL=Σ\mathbb{E}[XX^\top / m] \approx L \cdot I \cdot L^\top = \Sigma for large mm. Each column of XX is a sample from N(0,Σ)\mathcal{N}(0, \Sigma).

EXAMPLE

Example 3.1 (Bivariate correlated returns). Let assets 1 and 2 have equal volatility σ=0.20\sigma = 0.20 and correlation ρ=0.60\rho = 0.60. The covariance matrix is Σ=(0.040.0240.0240.04).\Sigma = \begin{pmatrix} 0.04 & 0.024 \\ 0.024 & 0.04 \end{pmatrix}.

Cholesky: L11=0.04=0.2L_{11} = \sqrt{0.04} = 0.2, L21=0.024/0.2=0.12L_{21} = 0.024/0.2 = 0.12, L22=0.040.122=0.0256=0.16L_{22} = \sqrt{0.04 - 0.12^2} = \sqrt{0.0256} = 0.16.

So L=(0.200.120.16)L = \begin{pmatrix} 0.2 & 0 \\ 0.12 & 0.16 \end{pmatrix}, and correlated returns (r1,r2)=L(ε1,ε2)(r_1, r_2)^\top = L(\varepsilon_1, \varepsilon_2)^\top where εiN(0,1)\varepsilon_i \sim \mathcal{N}(0,1) i.i.d.

4. Solving Ax=bAx = b Using Cholesky

For SPD systems, Cholesky gives the same two-pass strategy as LU:

Given A=LLA = LL^\top:

  1. Forward substitution: solve Ly=bLy = bO(n2)O(n^2).
  2. Backward substitution: solve Lx=yL^\top x = yO(n2)O(n^2).

This is the fastest general method for SPD systems, used in all production-grade libraries (LAPACK dpotrs, Eigen's LLT solver, Armadillo's solve).


Validation

The companion notebook verifies:

  1. LU correctness — factorises a 4×44 \times 4 matrix, checks PA=LUPA = LU to machine precision, solves Ax=bAx = b and verifies the residual.
  2. Cholesky correctness — factorises a 3×33 \times 3 SPD matrix by the explicit formula, verifies LL=ALL^\top = A.
  3. Correlated sampling — generates 100,000 bivariate normal samples from Example 3.1, verifies the empirical covariance converges to Σ\Sigma.
  4. Failure detection — confirms that a near-singular (rank-deficient) matrix causes Cholesky to fail, and shows the smallest eigenvalue is the signal.
PRACTICE

By hand before opening the notebook. Let Σ=(4223)\Sigma = \begin{pmatrix} 4 & 2 \\ 2 & 3 \end{pmatrix}.

  1. Verify Σ\Sigma is positive definite: check all leading principal minors are positive. (det(Σ)=4×32×2=8>0\det(\Sigma) = 4 \times 3 - 2 \times 2 = 8 > 0. ✓)
  2. Compute the Cholesky factor LL by hand.
  3. Verify LL=ΣLL^\top = \Sigma.
  4. Use LL to generate a correlated bivariate sample: if ε=(1,1)\varepsilon = (1, -1)^\top, what is x=Lεx = L\varepsilon?

(Answer: L11=2L_{11} = 2, L21=1L_{21} = 1, L22=31=2L_{22} = \sqrt{3 - 1} = \sqrt{2}. So L=(2012)L = \begin{pmatrix} 2 & 0 \\ 1 & \sqrt{2} \end{pmatrix} and x=(2,12)x = (2, 1 - \sqrt{2})^\top.)


Limitations

WARNING

Cholesky failure signals a broken covariance matrix, not a software bug. If cholesky() throws "matrix is not positive definite", the covariance matrix has a non-positive eigenvalue. Common causes: (i) T<nT < n (too few observations — rank-deficient), (ii) perfect correlation between two assets, (iii) floating-point accumulation of small negative eigenvalues in a near-SPD matrix. Fix: compute the eigendecomposition, set negative eigenvalues to a small positive threshold ε>0\varepsilon > 0, and reconstruct (covered in Module 3).

WARNING

LU with partial pivoting is not always stable. Partial pivoting bounds the growth factor at 2n12^{n-1} in the worst case (Wilkinson's bound), which is catastrophically large for n=100n = 100. In practice, growth factors are much smaller, and partial pivoting is reliable for well-conditioned problems. For ill-conditioned problems, condition number estimation (using LAPACK dgecon after factorisation) should precede any solve — if κ(A)1016\kappa(A) \approx 10^{16} (double precision machine epsilon), the solution is meaningless.

WARNING

Never compute A1A^{-1} explicitly. The operation x = inv(A) @ b requires O(n3)O(n^3) flops for the inversion plus O(n2)O(n^2) for the multiply, is less stable than a direct solve, and accumulates more rounding errors. Always use solve(A, b) which calls LU/Cholesky internally. Forming A1A^{-1} is justified only when you genuinely need all n2n^2 entries of the inverse — which is rare.

Scope limitations:

  • Cholesky assumes exact SPD. For SPSD matrices (rank-deficient covariance), use the modified Cholesky or the pivoted Cholesky factorisation (which stops when the diagonal would become zero, producing a low-rank factor).
  • These direct methods are O(n3)O(n^3) and impractical for n>104n > 10^4. For large sparse systems (e.g., PDE grids), iterative methods are required (Module 5).

Interview Angle

PRACTICE

L1 (Junior quant / developer).

  1. "What is Cholesky factorisation and when is it used?" — Expected: A=LLA = LL^\top for SPD matrices; used for correlated Monte Carlo sampling and solving SPD linear systems (bootstrapping, normal equations). Mention it is twice as fast as LU.

  2. "How do you generate samples from N(0,Σ)\mathcal{N}(0, \Sigma)?" — Expected: Cholesky Σ=LL\Sigma = LL^\top, draw zN(0,I)z \sim \mathcal{N}(0, I), return LzLz. Must understand why: Cov(Lz)=LIL=Σ\text{Cov}(Lz) = L \cdot I \cdot L^\top = \Sigma.

  3. "Why should you never compute the matrix inverse explicitly?" — Expected: slower (O(n3)O(n^3) flops), less stable, more rounding error than a direct solve. Use solve(A, b) instead.

PRACTICE

L2 (Senior quant).

  1. "You receive a covariance matrix from a data team and cholesky() fails. What are the possible causes and how do you fix each?" — Expected: (i) T<nT < n (rank-deficient — need regularisation or more data); (ii) perfect correlation (redundant asset — remove it); (iii) numerical noise producing tiny negative eigenvalues (project to SPD via eigendecomposition). Always diagnose by checking the spectrum first.

  2. "Explain the cost advantage of Cholesky over LU for solving the normal equations JJδ=JrJ^\top J \delta = J^\top r in Gauss-Newton calibration." — Expected: JJJ^\top J is symmetric PSD by construction. If it is SPD (full column rank of JJ), Cholesky costs n3/3n^3/3 vs. 2n3/32n^3/3 for LU — a factor of 2 saving. Additionally, Cholesky requires no pivoting, eliminating the overhead and numerical variability of partial pivoting.

  3. "How does LU factorisation change when you have multiple right-hand sides b1,,bkb_1, \ldots, b_k?" — Expected: factorise once PA=LUPA = LU (cost O(n3/3)O(n^3/3)), then for each bib_i do two triangular solves (cost O(n2)O(n^2) each). Total: O(n3/3)+kO(n2)O(n^3/3) + k \cdot O(n^2), vs. O(kn3/3)O(k \cdot n^3/3) for kk independent solves. This is the correct pattern for multi-scenario risk calculations.

PRACTICE

L3 (Researcher).

  1. "In a risk model with n=500n = 500 assets and a factor structure Σ=BFB+D\Sigma = BFB^\top + D, how do you solve Σw=μ\Sigma w = \mu efficiently without forming or factorising the full 500×500500 \times 500 Σ\Sigma?" — Expected: use the Woodbury matrix identity: (Σ)1=D1D1B(F1+BD1B)1BD1(\Sigma)^{-1} = D^{-1} - D^{-1}B(F^{-1} + B^\top D^{-1} B)^{-1} B^\top D^{-1}. Since DD is diagonal (O(n)O(n) to invert) and FF is k×kk \times k with knk \ll n (O(k3)O(k^3) to factorise), the dominant cost is O(nk2)O(nk^2) rather than O(n3)O(n^3).

  2. "What does the Cholesky factorisation reveal about the structure of a covariance matrix, beyond what the eigendecomposition shows?" — Expected: Cholesky gives a triangular factor, which reveals a causal ordering: LijL_{ij} captures the residual dependence of asset ii on asset jj after conditioning on assets 1,,j11, \ldots, j-1. This is exactly the Gram-Schmidt orthogonalisation of the assets' return vectors — the columns of LL^\top are the orthogonalised risk factors. The eigendecomposition gives an orthogonal basis (principal components) instead, which mixes all assets symmetrically.