Setup
The central operation: solving
Almost every numerical routine in quantitative finance reduces to solving a linear system. Yield curve bootstrapping solves for discount factors. Greeks computed by bump-and-reval solve for a perturbed price. Calibration via Gauss-Newton solves the normal equations . Even generating correlated random variables for Monte Carlo requires factorising a covariance matrix.
The naïve approach — computing explicitly and then forming — is both slower and less numerically stable than direct factorisation. The standard alternatives are:
- LU factorisation with partial pivoting: general square , flops.
- Cholesky factorisation: symmetric positive definite , flops — twice as fast as LU, always stable without pivoting.
Why this matters on a derivatives desk. Cholesky factorisation of the covariance matrix is the backbone of correlated Monte Carlo simulation. If you call np.linalg.cholesky or LAPACK dpotrf in a pricing library, you are using Cholesky. If it throws an error ("matrix is not positive definite"), you have a broken covariance matrix — and you need to know why to fix it.
Assumptions
- is square. For LU: assume is non-singular (otherwise partial pivoting detects the singularity). For Cholesky: assume is symmetric positive definite (SPD).
- All arithmetic is in exact real arithmetic unless otherwise stated; floating-point effects are discussed in Limitations.
- Conventions: LU factorisation produces where is a permutation matrix, is unit lower triangular (diagonal entries all 1), and is upper triangular. Cholesky produces where is lower triangular with positive diagonal.
Theory
1. LU Factorisation
The idea is systematic Gaussian elimination: reduce to upper triangular form by applying elementary row operations, then record those operations as a lower triangular matrix .
Definition 1.1 (LU factorisation). An LU factorisation of is a decomposition where is unit lower triangular (lower triangular with 1s on the diagonal) and is upper triangular.
With partial pivoting: , where is a permutation matrix that reorders the rows of to place the largest element in each column on the diagonal before eliminating — improving numerical stability.
Algorithm (Gaussian elimination with partial pivoting):
At step (for ):
- Find the row with the largest absolute value in column : (partial pivot).
- Swap rows and in .
- For each row : compute the multiplier and subtract times row from row .
- Store in the lower triangular part of .
After steps, has been transformed to and the multipliers fill .
Theorem 1.2 (Existence of LU with partial pivoting). For any non-singular , there exists a permutation matrix such that with unit lower triangular, upper triangular, and for all . The factorisation is unique given .
Solving using LU:
Given , the system becomes . Solve in two triangular passes:
- Forward substitution: solve for — operations.
- Backward substitution: solve for — operations.
Total cost: for the factorisation (done once), then per right-hand side. For bootstrapping with multiple right-hand sides (e.g., different rate scenarios), reusing the factorisation is a significant efficiency gain.
Example 1.3 (2×2 LU by hand). Let .
Step 1: pivot is row 2 (|4| > |2|), so swap: .
Step 2: multiplier . Subtract row 1 from row 2: .
Result: , , .
Verify: ... or more directly: . ✓
2. Cholesky Factorisation
When is symmetric and positive definite, a more efficient and always-stable factorisation exists.
Definition 2.1 (Cholesky factorisation). The Cholesky factorisation of a symmetric positive definite matrix is where is lower triangular with strictly positive diagonal entries .
The factorisation is unique (given the positivity convention for the diagonal).
Algorithm (Cholesky-Banachiewicz):
For :
The square root is real and positive iff is SPD — this is both the algorithm and the constructive proof of existence.
Theorem 2.2 (Cholesky existence and uniqueness). A real symmetric matrix has a Cholesky factorisation with if and only if is positive definite. When it exists, the factorisation is unique.
Proof sketch. () If and has positive diagonal, then for any : (since invertible ). () If , the algorithm runs to completion without encountering a square root of a non-positive number — this follows from the Schur complement characterisation of positive definiteness.
Cholesky vs. LU: key differences.
| Feature | LU (partial pivoting) | Cholesky |
|---|---|---|
| Requirement | Non-singular | Symmetric positive definite |
| Flop count | ||
| Pivoting | Required for stability | Not needed — SPD guarantees stability |
| Diagonal entries | Can be negative | Strictly positive |
| Failure mode | Zero pivot (singular) | Complex square root (not SPD) |
3. Application: Correlated Monte Carlo Sampling
Given a covariance matrix (SPD), to generate samples from :
- Compute via Cholesky.
- Draw with i.i.d. entries.
- Set .
Then for large . Each column of is a sample from .
Example 3.1 (Bivariate correlated returns). Let assets 1 and 2 have equal volatility and correlation . The covariance matrix is
Cholesky: , , .
So , and correlated returns where i.i.d.
4. Solving Using Cholesky
For SPD systems, Cholesky gives the same two-pass strategy as LU:
Given :
- Forward substitution: solve — .
- Backward substitution: solve — .
This is the fastest general method for SPD systems, used in all production-grade libraries (LAPACK dpotrs, Eigen's LLT solver, Armadillo's solve).
Validation
The companion notebook verifies:
- LU correctness — factorises a matrix, checks to machine precision, solves and verifies the residual.
- Cholesky correctness — factorises a SPD matrix by the explicit formula, verifies .
- Correlated sampling — generates 100,000 bivariate normal samples from Example 3.1, verifies the empirical covariance converges to .
- Failure detection — confirms that a near-singular (rank-deficient) matrix causes Cholesky to fail, and shows the smallest eigenvalue is the signal.
By hand before opening the notebook. Let .
- Verify is positive definite: check all leading principal minors are positive. (. ✓)
- Compute the Cholesky factor by hand.
- Verify .
- Use to generate a correlated bivariate sample: if , what is ?
(Answer: , , . So and .)
Limitations
Cholesky failure signals a broken covariance matrix, not a software bug. If cholesky() throws "matrix is not positive definite", the covariance matrix has a non-positive eigenvalue. Common causes: (i) (too few observations — rank-deficient), (ii) perfect correlation between two assets, (iii) floating-point accumulation of small negative eigenvalues in a near-SPD matrix. Fix: compute the eigendecomposition, set negative eigenvalues to a small positive threshold , and reconstruct (covered in Module 3).
LU with partial pivoting is not always stable. Partial pivoting bounds the growth factor at in the worst case (Wilkinson's bound), which is catastrophically large for . In practice, growth factors are much smaller, and partial pivoting is reliable for well-conditioned problems. For ill-conditioned problems, condition number estimation (using LAPACK dgecon after factorisation) should precede any solve — if (double precision machine epsilon), the solution is meaningless.
Never compute explicitly. The operation x = inv(A) @ b requires flops for the inversion plus for the multiply, is less stable than a direct solve, and accumulates more rounding errors. Always use solve(A, b) which calls LU/Cholesky internally. Forming is justified only when you genuinely need all entries of the inverse — which is rare.
Scope limitations:
- Cholesky assumes exact SPD. For SPSD matrices (rank-deficient covariance), use the modified Cholesky or the pivoted Cholesky factorisation (which stops when the diagonal would become zero, producing a low-rank factor).
- These direct methods are and impractical for . For large sparse systems (e.g., PDE grids), iterative methods are required (Module 5).
Interview Angle
L1 (Junior quant / developer).
-
"What is Cholesky factorisation and when is it used?" — Expected: for SPD matrices; used for correlated Monte Carlo sampling and solving SPD linear systems (bootstrapping, normal equations). Mention it is twice as fast as LU.
-
"How do you generate samples from ?" — Expected: Cholesky , draw , return . Must understand why: .
-
"Why should you never compute the matrix inverse explicitly?" — Expected: slower ( flops), less stable, more rounding error than a direct solve. Use
solve(A, b)instead.
L2 (Senior quant).
-
"You receive a covariance matrix from a data team and
cholesky()fails. What are the possible causes and how do you fix each?" — Expected: (i) (rank-deficient — need regularisation or more data); (ii) perfect correlation (redundant asset — remove it); (iii) numerical noise producing tiny negative eigenvalues (project to SPD via eigendecomposition). Always diagnose by checking the spectrum first. -
"Explain the cost advantage of Cholesky over LU for solving the normal equations in Gauss-Newton calibration." — Expected: is symmetric PSD by construction. If it is SPD (full column rank of ), Cholesky costs vs. for LU — a factor of 2 saving. Additionally, Cholesky requires no pivoting, eliminating the overhead and numerical variability of partial pivoting.
-
"How does LU factorisation change when you have multiple right-hand sides ?" — Expected: factorise once (cost ), then for each do two triangular solves (cost each). Total: , vs. for independent solves. This is the correct pattern for multi-scenario risk calculations.
L3 (Researcher).
-
"In a risk model with assets and a factor structure , how do you solve efficiently without forming or factorising the full ?" — Expected: use the Woodbury matrix identity: . Since is diagonal ( to invert) and is with ( to factorise), the dominant cost is rather than .
-
"What does the Cholesky factorisation reveal about the structure of a covariance matrix, beyond what the eigendecomposition shows?" — Expected: Cholesky gives a triangular factor, which reveals a causal ordering: captures the residual dependence of asset on asset after conditioning on assets . This is exactly the Gram-Schmidt orthogonalisation of the assets' return vectors — the columns of are the orthogonalised risk factors. The eigendecomposition gives an orthogonal basis (principal components) instead, which mixes all assets symmetrically.