Singular Value Decomposition and Condition Numbers

Module 4 of 5 | 28 min read | Level: Hard

Setup

Every matrix, whether square or rectangular, rank-deficient or full-rank, admits a singular value decomposition (SVD). This is a strictly stronger result than eigendecomposition, which applies only to square matrices and fails for non-diagonalisable ones. The SVD is the canonical tool for understanding the geometry of a linear map and the sensitivity of the linear system $Ax = b$.

Where this lives on a desk. The SVD appears in three distinct quant workflows:

  1. Calibration and least-squares fitting. When you fit a volatility surface or calibrate a yield curve model, you solve an overdetermined least-squares problem $\min_\theta \|F(\theta) - \text{market}\|^2$. For a linear (or linearised) model with design matrix $A$ and target vector $b$, the normal equations $A^\top A\theta = A^\top b$ are solved via the pseudoinverse $A^+ = V\Sigma^+ U^\top$: the SVD-based answer that minimises the residual and, among all minimisers, has minimum-norm $\theta$.

  2. Risk factor models. Returns matrices $X \in \mathbb{R}^{T \times n}$ are decomposed into orthogonal factors. The SVD of $X$ gives the principal components directly without forming $X^\top X$, which is numerically more stable when $T$ is small relative to $n$.

  3. Numerical stability diagnostics. The condition number $\kappa_2(A) = \sigma_1 / \sigma_n$ quantifies how much the solution $x$ of $Ax = b$ amplifies errors in $b$. An ill-conditioned calibration problem signals that the model has redundant parameters or that the data are insufficient to identify them separately.

Mathematical setting. Let $A \in \mathbb{R}^{m \times n}$ with $m \geq n$ (the overdetermined case common in calibration). No symmetry is assumed. The rank is $r = \text{rank}(A) \leq \min(m, n)$.

Notation. $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r > 0 = \sigma_{r+1} = \cdots = \sigma_n$ are the singular values of $A$. Subscripts follow the usual convention: $\sigma_1$ is the largest (the spectral norm of $A$).

INSIGHT

Financial Insight. SVD makes the geometry of the calibration problem explicit: $U$ rotates the output space (market observables), $V$ rotates the input space (model parameters), and $\Sigma$ stretches or compresses each independent direction by its singular value. Directions with $\sigma_i \approx 0$ are directions in parameter space that barely affect market observables: the model is near-unidentifiable in those directions.


Theory

1. The Singular Value Decomposition

THEOREM

Theorem 4.1 (SVD Existence). For any $A \in \mathbb{R}^{m \times n}$ there exist orthogonal matrices $U \in \mathbb{R}^{m \times m}$ and $V \in \mathbb{R}^{n \times n}$, and a matrix $\Sigma \in \mathbb{R}^{m \times n}$ with $\Sigma_{ii} = \sigma_i \geq 0$ and $\Sigma_{ij} = 0$ for $i \neq j$, such that

$$A = U \Sigma V^\top.$$

The diagonal entries of $\Sigma$, taken in non-increasing order $\sigma_1 \geq \sigma_2 \geq \cdots \geq 0$, are unique and called the singular values of $A$.

Derivation. The key is to relate singular values to eigenvalues of the symmetric matrices $A^\top A$ and $A A^\top$.

Since $A^\top A \in \mathbb{R}^{n \times n}$ is symmetric positive semi-definite (SPSD), the Spectral Theorem (Module 3) guarantees

$$A^\top A = V \Lambda V^\top, \quad \Lambda = \text{diag}(\lambda_1, \ldots, \lambda_n), \quad \lambda_i \geq 0, \quad V \text{ orthogonal,}$$

with the eigenvalues ordered so that $\lambda_1 \geq \cdots \geq \lambda_n$.

Define $\sigma_i = \sqrt{\lambda_i}$ for $i = 1, \ldots, r$ (the square roots of the non-zero eigenvalues). For $i \leq r$, set

$$u_i = \frac{1}{\sigma_i} A v_i \in \mathbb{R}^m.$$

These are orthonormal:

$$\langle u_i, u_j \rangle = \frac{1}{\sigma_i \sigma_j} v_i^\top A^\top A v_j = \frac{1}{\sigma_i \sigma_j} v_i^\top V \Lambda V^\top v_j = \frac{\lambda_j}{\sigma_i \sigma_j} \delta_{ij},$$

which is $0$ for $i \neq j$ and $\lambda_i / \sigma_i^2 = 1$ for $i = j$.

Extend $\{u_1, \ldots, u_r\}$ to an orthonormal basis $\{u_1, \ldots, u_m\}$ for $\mathbb{R}^m$. For $i \leq r$ we have $A v_i = \sigma_i u_i$ by definition, and for $i > r$ we have $\|A v_i\|^2 = v_i^\top A^\top A v_i = \lambda_i = 0$, so $A v_i = 0$. Hence $AV = U\Sigma$, and multiplying on the right by $V^\top$ gives $A = U \Sigma V^\top$.

Note: $\sigma_i^2 = \lambda_i(A^\top A) = \lambda_i(A A^\top)$ for $i \leq \min(m,n)$, because the non-zero eigenvalues of $A^\top A$ and $A A^\top$ coincide.
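The derivation can be replayed numerically. The following is a minimal NumPy sketch, assuming a small random test matrix with full column rank (the matrix, seed and variable names are illustrative and not taken from the companion notebook). It builds $V$ and $\sigma_i$ from the eigendecomposition of $A^\top A$, forms $u_i = A v_i / \sigma_i$, and checks the result against `np.linalg.svd`.

```python
import numpy as np

rng = np.random.default_rng(0)            # arbitrary seed, only for reproducibility
A = rng.standard_normal((5, 3))           # hypothetical test matrix, full column rank

# Spectral decomposition of the SPSD matrix A^T A (eigh returns ascending eigenvalues).
lam, V = np.linalg.eigh(A.T @ A)
order = np.argsort(lam)[::-1]             # reorder so that lambda_1 >= ... >= lambda_n
lam, V = lam[order], V[:, order]

sigma = np.sqrt(np.clip(lam, 0.0, None))  # sigma_i = sqrt(lambda_i); clip guards tiny negatives
U_r = (A @ V) / sigma                     # column i is u_i = A v_i / sigma_i

print("U_r orthonormal:      ", np.allclose(U_r.T @ U_r, np.eye(3)))
print("A = U_r Sigma V^T:    ", np.allclose(U_r @ np.diag(sigma) @ V.T, A))
print("matches np.linalg.svd:", np.allclose(sigma, np.linalg.svd(A, compute_uv=False)))
```

This route is for illustration only: forming $A^\top A$ squares the condition number, which is why production code calls an SVD routine on $A$ directly.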

DEFINITION

Definition 4.1 (Thin SVD). The thin (economy) SVD retains only the $r$ non-zero singular values:

$$A = U_r \Sigma_r V_r^\top,$$

where $U_r \in \mathbb{R}^{m \times r}$, $\Sigma_r = \text{diag}(\sigma_1, \ldots, \sigma_r) \in \mathbb{R}^{r \times r}$, $V_r \in \mathbb{R}^{n \times r}$. For $m \gg r$ this is far cheaper to compute and store than the full SVD. (Some references call this rank-$r$ form the compact SVD and reserve "thin" for the variant that keeps $\min(m, n)$ singular values, zeros included; the two coincide when $A$ has full rank.)
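As a quick illustration of the storage difference, here is a sketch using `np.linalg.svd` with its `full_matrices` flag on an arbitrary $6 \times 3$ random matrix (an assumed example, not one from the module). Note that NumPy's economy mode keeps $\min(m, n)$ singular values, which equals $r$ here because the matrix has full column rank.

```python
import numpy as np

rng = np.random.default_rng(1)            # arbitrary seed
A = rng.standard_normal((6, 3))           # hypothetical tall matrix, rank 3

U_full, s_full, Vt_full = np.linalg.svd(A, full_matrices=True)
U_thin, s_thin, Vt_thin = np.linalg.svd(A, full_matrices=False)

print("full U:", U_full.shape, " thin U:", U_thin.shape)

# The thin factors already reproduce A; the extra columns of U_full
# multiply the zero rows of Sigma and contribute nothing.
A_rebuilt = (U_thin * s_thin) @ Vt_thin   # same as U_thin @ diag(s_thin) @ Vt_thin
print("reconstruction ok:", np.allclose(A_rebuilt, A))
```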

2. Geometric Interpretation

The decomposition $A = U \Sigma V^\top$ factors the action of $A$ into three steps:

  1. $V^\top$: rotate the input vector $x$ into the right singular vector basis.
  2. $\Sigma$: independently scale each coordinate by $\sigma_i$ (and discard the null space).
  3. $U$: rotate the stretched vector into the output (column) space.
INSIGHT

Financial Insight. Think of $V^\top$ as transforming raw model parameters into uncorrelated "eigen-parameters". $\Sigma$ tells you how sensitively observable prices respond to each eigen-parameter. Eigen-parameters with near-zero singular values are unobservable from the data: the calibration is degenerate along those directions.
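The three-step reading of $A = U \Sigma V^\top$ can be checked on a small example. The matrix and vector below are arbitrary illustrative choices; the sketch applies $V^\top$, then $\Sigma$, then $U$, and compares the result with the direct product $Ax$.

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [0.0, 2.0]])                # hypothetical 3x2 map
x = np.array([1.0, -2.0])                 # hypothetical input vector

U, s, Vt = np.linalg.svd(A, full_matrices=False)

y = Vt @ x                                # step 1: coordinates of x in the right singular basis
z = s * y                                 # step 2: scale coordinate i by sigma_i
Ax_steps = U @ z                          # step 3: map into the column space

print("three steps equal A @ x:", np.allclose(Ax_steps, A @ x))

# The first right singular vector is stretched by exactly sigma_1.
v1 = Vt[0]
print("||A v1|| vs sigma_1:", np.linalg.norm(A @ v1), s[0])
```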

3. The Four Fundamental Subspaces via SVD

The SVD gives explicit orthonormal bases for all four subspaces first seen in Module 1.

THEOREM

Theorem 4.2 (Fundamental Subspaces). Let $A = U \Sigma V^\top$ with rank $r$. Then:

  • Column space (image): $\text{col}(A) = \text{span}(u_1, \ldots, u_r)$, the first $r$ left singular vectors.
  • Left null space: $\text{null}(A^\top) = \text{span}(u_{r+1}, \ldots, u_m)$, the last $m - r$ left singular vectors.
  • Row space: $\text{row}(A) = \text{span}(v_1, \ldots, v_r)$, the first $r$ right singular vectors.
  • Null space: $\text{null}(A) = \text{span}(v_{r+1}, \ldots, v_n)$, the last $n - r$ right singular vectors.

Proof sketch. $Av_i = \sigma_i u_i$ for $i \leq r$, so $u_i \in \text{col}(A)$; likewise $A^\top u_i = \sigma_i v_i$, so $v_i \in \text{row}(A)$. For $i > r$, $Av_i = 0$, so $v_i \in \text{null}(A)$, and $A^\top u_i = 0$, so $u_i \in \text{null}(A^\top)$. The orthogonality of $U$ and $V$ gives the subspace dimensions. $\square$
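Theorem 4.2 translates directly into slicing the SVD factors. The sketch below assumes a hypothetical $5 \times 4$ matrix built as a product of integer factors so that its rank is exactly 2, and uses a relative threshold on the singular values (an assumed choice) to decide the numerical rank.

```python
import numpy as np

# Hypothetical rank-2 matrix: A = B @ C with B (5x2) and C (2x4) of full rank.
B = np.array([[1, 0], [0, 1], [1, 1], [2, -1], [1, 2]], dtype=float)
C = np.array([[1, 2, 0, 1], [0, 1, 1, -1]], dtype=float)
A = B @ C

U, s, Vt = np.linalg.svd(A)               # full SVD: U is 5x5, Vt is 4x4
tol = max(A.shape) * np.finfo(float).eps * s[0]
r = int(np.sum(s > tol))                  # numerical rank

col_basis = U[:, :r]                      # col(A)
left_null = U[:, r:]                      # null(A^T)
row_basis = Vt[:r].T                      # row(A)
null_basis = Vt[r:].T                     # null(A)

print("rank:", r)
print("A @ null(A) ~ 0:    ", np.allclose(A @ null_basis, 0.0, atol=1e-10))
print("A^T @ null(A^T) ~ 0:", np.allclose(A.T @ left_null, 0.0, atol=1e-10))
```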

4. The Moore-Penrose Pseudoinverse

When $Ax = b$ is overdetermined ($m > n$, more equations than unknowns), the system generally has no exact solution. The least-squares solution minimising $\|Ax - b\|_2$ is obtained from the pseudoinverse:

DEFINITION

Definition 4.2 (Pseudoinverse). The Moore-Penrose pseudoinverse of $A$ is

$$A^+ = V \Sigma^+ U^\top,$$

where $\Sigma^+ \in \mathbb{R}^{n \times m}$ is obtained by transposing $\Sigma$ and replacing each non-zero $\sigma_i$ by $1/\sigma_i$, leaving zero entries as zero.

Why this solves the LS problem. For $A = U\Sigma V^\top$ with full column rank ($r = n$):

$$A^+ = (A^\top A)^{-1} A^\top = V \Sigma_n^{-1} U_n^\top,$$

where $\Sigma_n = \text{diag}(\sigma_1, \ldots, \sigma_n)$ and $U_n$ holds the first $n$ columns of $U$. This is the same solution that the normal equations give, but computed via the SVD, which stays numerically stable even when $A^\top A$ is ill-conditioned.

Among all minimisers, $x^* = A^+ b$ is the one with smallest norm $\|x\|_2$. This matters when the system has a null space and you want the minimum-norm parameter vector.
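A short sketch of the full-column-rank case, assuming a random, well-conditioned $8 \times 3$ system (illustrative data only): the SVD formula, `np.linalg.pinv`, `np.linalg.lstsq` and the normal equations should all return the same least-squares solution, and the residual should be orthogonal to the columns of $A$.

```python
import numpy as np

rng = np.random.default_rng(3)            # arbitrary seed
A = rng.standard_normal((8, 3))           # hypothetical overdetermined system
b = rng.standard_normal(8)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_pinv_svd = Vt.T @ np.diag(1.0 / s) @ U.T    # V Sigma_n^{-1} U_n^T, valid because all s > 0

x_svd = A_pinv_svd @ b
x_pinv = np.linalg.pinv(A) @ b
x_lstsq = np.linalg.lstsq(A, b, rcond=None)[0]
x_normal = np.linalg.solve(A.T @ A, A.T @ b)  # acceptable here only because A is well-conditioned

print(np.allclose(x_svd, x_pinv), np.allclose(x_svd, x_lstsq), np.allclose(x_svd, x_normal))

# First-order optimality: the residual is orthogonal to col(A).
residual = A @ x_svd - b
print("A^T r ~ 0:", np.allclose(A.T @ residual, 0.0, atol=1e-10))
```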

EXAMPLE

Example 4.1 (Surface calibration as LS). Fitting $n = 15$ SABR parameters to $m = 60$ market option prices gives an overdetermined system. The pseudoinverse $\hat\theta = A^+ b_\text{market}$ finds the minimum-residual parameter set. If the condition number is $\kappa_2(A) \approx 10^8$, then a $10^{-4}$ relative error in market quotes can translate, in the worst case, into a relative error of up to $10^8 \times 10^{-4} = 10^4$ in the inferred parameters: the calibration is numerically singular and requires regularisation.

5. Condition Number and Perturbation Theory

DEFINITION

Definition 4.3 (Condition number). The 2-norm condition number of an invertible $A \in \mathbb{R}^{n \times n}$ is

$$\kappa_2(A) = \|A\|_2 \|A^{-1}\|_2 = \frac{\sigma_1}{\sigma_n}.$$

For a rectangular or rank-deficient $A$, use $\kappa_2(A) = \sigma_1 / \sigma_r$, where $\sigma_r$ is the smallest non-zero singular value.

Why it matters. Consider $Ax = b$. If $b$ is perturbed by $\delta b$ (market quote error, floating-point rounding), the perturbed solution $\hat x$ satisfies:

THEOREM

Theorem 4.3 (Perturbation bound). Let $A$ be invertible with $Ax = b$ and $A\hat x = b + \delta b$, and write $\delta x = \hat x - x$. Then

$$\frac{\|\delta x\|_2}{\|x\|_2} \leq \kappa_2(A) \cdot \frac{\|\delta b\|_2}{\|b\|_2}.$$

Proof. $\delta x = A^{-1} \delta b$, so $\|\delta x\| \leq \|A^{-1}\| \|\delta b\| = \sigma_n^{-1} \|\delta b\|$. Combine with $\|b\| \leq \|A\| \|x\| = \sigma_1 \|x\|$, i.e. $1/\|x\| \leq \sigma_1 / \|b\|$, and multiply the two inequalities. $\square$

REMARK

Remark. The condition number is a worst-case amplification factor. In practice, errors in $b$ are not aligned with the worst-case direction (the left singular vector $u_n$), so the actual amplification is often much smaller. However, $\kappa_2(A)$ is the right diagnostic for "is this problem well-posed?"
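The bound and the remark can be made concrete with a small experiment. This sketch assumes a hypothetical $4 \times 4$ matrix constructed with prescribed singular values $10, 5, 1, 0.1$ (so $\kappa_2 = 100$), and compares the error amplification for a random perturbation with that for a perturbation along $u_n$. The helper name `amplification` is illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)            # arbitrary seed
# Hypothetical matrix with prescribed singular values, so kappa_2 = 10 / 0.1 = 100.
Q1, _ = np.linalg.qr(rng.standard_normal((4, 4)))
Q2, _ = np.linalg.qr(rng.standard_normal((4, 4)))
A = Q1 @ np.diag([10.0, 5.0, 1.0, 0.1]) @ Q2.T

x = rng.standard_normal(4)
b = A @ x
U, s, Vt = np.linalg.svd(A)
kappa = s[0] / s[-1]

def amplification(db):
    """(relative error in x) / (relative error in b) for a perturbation db."""
    dx = np.linalg.solve(A, db)           # by linearity, delta x = A^{-1} delta b
    return (np.linalg.norm(dx) / np.linalg.norm(x)) / (np.linalg.norm(db) / np.linalg.norm(b))

db_random = 1e-6 * rng.standard_normal(4)
db_worst = 1e-6 * U[:, -1]                # aligned with u_n, the most amplified direction

print("kappa_2:              ", kappa)
print("random perturbation:  ", amplification(db_random))
print("perturbation along u_n:", amplification(db_worst))
```

Both amplification factors stay below $\kappa_2(A)$, and the perturbation along $u_n$ is amplified at least as much as any other direction for this fixed $b$.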

Intuition for $\kappa_2$ values:

| $\kappa_2(A)$ | Interpretation |
|---|---|
| $\approx 1$ | Well-conditioned; errors not amplified |
| $10^3$ | Machine precision $\approx 10^{-16}$ → solution accurate to about $10^{-13}$ |
| $10^8$ | Solution may have only 8 significant figures |
| $\geq 10^{16}$ | Numerically singular at double precision |

6. Low-Rank Approximation (Eckart-Young Theorem)

The spectral truncation from Module 3 extends naturally to non-symmetric matrices via SVD.

THEOREM

Theorem 4.4 (Eckart-Young, 1936). Among all matrices of rank at most $k$, the best approximation to $A$ in both the spectral norm $\|\cdot\|_2$ and the Frobenius norm $\|\cdot\|_F$ is

$$A_k = \sum_{i=1}^k \sigma_i u_i v_i^\top.$$

The approximation errors are:

$$\|A - A_k\|_2 = \sigma_{k+1}, \qquad \|A - A_k\|_F = \sqrt{\sum_{i=k+1}^r \sigma_i^2}.$$

Comparison with eigendecomposition (Module 3). For symmetric $A$, singular values equal absolute values of eigenvalues: $\sigma_i = |\lambda_i|$. The Eckart-Young theorem for the symmetric case used $\sqrt{\sum_{i>k} \lambda_i^2}$, identical to the SVD formula when eigenvalues are non-negative.
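The two error formulas of Theorem 4.4 are easy to confirm numerically. The sketch below assumes an arbitrary random $7 \times 5$ matrix and $k = 2$; it builds $A_k$ from the leading singular triplets and compares the measured errors with $\sigma_{k+1}$ and the tail sum.

```python
import numpy as np

rng = np.random.default_rng(5)            # arbitrary seed
A = rng.standard_normal((7, 5))           # hypothetical test matrix
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
A_k = (U[:, :k] * s[:k]) @ Vt[:k]         # sum_{i<=k} sigma_i u_i v_i^T

err_2 = np.linalg.norm(A - A_k, 2)        # spectral-norm error
err_F = np.linalg.norm(A - A_k, "fro")    # Frobenius-norm error

print("spectral error vs sigma_{k+1}:", err_2, s[k])
print("Frobenius error vs tail sum:  ", err_F, np.sqrt(np.sum(s[k:] ** 2)))

# Any other rank-k matrix does no better; here is one random competitor.
B = rng.standard_normal((7, k)) @ rng.standard_normal((k, 5))
print("competitor is worse:", np.linalg.norm(A - B, "fro") >= err_F)
```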

EXAMPLE

Example 4.2 (Returns matrix compression). A returns matrix $X \in \mathbb{R}^{252 \times 100}$ (252 daily returns, 100 assets) typically has rank 100. The rank-$k$ SVD approximation $X_k$ captures the $k$ dominant risk factors. The ratio $\sum_{i=1}^k \sigma_i^2 / \sum_{i=1}^{100} \sigma_i^2$ measures the fraction of the total squared Frobenius norm (proportional to total variance when each column is demeaned) explained by the first $k$ factors. This is equivalent to the PCA variance explained ratio from Module 3.
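To make Example 4.2 tangible, here is a sketch on synthetic data. The returns are simulated from three hypothetical common factors plus noise, not taken from any market; only the $252 \times 100$ shape comes from the example. It computes the explained-variance ratio from the singular values and confirms that $\sigma_i^2 / (T - 1)$ matches the eigenvalues of the sample covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(7)            # arbitrary seed
T, n, n_factors = 252, 100, 3             # shape from Example 4.2; factor count is an assumption

# Synthetic returns: a few common factors plus idiosyncratic noise (not real market data).
factors = rng.standard_normal((T, n_factors))
loadings = rng.standard_normal((n_factors, n))
X = factors @ loadings + 0.5 * rng.standard_normal((T, n))
X = X - X.mean(axis=0)                    # demean each asset so s**2 relates to variance

s = np.linalg.svd(X, compute_uv=False)
explained = np.cumsum(s ** 2) / np.sum(s ** 2)   # share of ||X||_F^2 in the first k factors

# Same spectrum as the sample covariance, obtained without forming X^T X on the SVD side.
cov_eigs = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]

print("explained by first 5 factors:", explained[:5])
print("s^2/(T-1) equals covariance eigenvalues:", np.allclose(s ** 2 / (T - 1), cov_eigs))
```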

7. Regularisation and Truncated SVD

When $\kappa_2(A)$ is large, the naive pseudoinverse $A^+ = V\Sigma^+ U^\top$ amplifies noise in $b$. Two standard remedies:

Truncated SVD (TSVD). Retain only the $k$ largest singular values; set $\sigma_{k+1}^+ = \cdots = \sigma_n^+ = 0$:

$$x_k^{\text{TSVD}} = \sum_{i=1}^k \frac{u_i^\top b}{\sigma_i} v_i.$$

Bias increases but variance (noise amplification) decreases as $k$ decreases.

Tikhonov regularisation. Solve $\min_x \|Ax - b\|_2^2 + \lambda \|x\|_2^2$, which has the closed-form solution:

$$x_\lambda^{\text{Tikh}} = (A^\top A + \lambda I)^{-1} A^\top b = \sum_{i=1}^n \frac{\sigma_i}{\sigma_i^2 + \lambda} (u_i^\top b) v_i.$$

This smoothly down-weights directions with $\sigma_i \ll \sqrt{\lambda}$ rather than hard-truncating them.
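Both remedies amount to replacing $1/\sigma_i$ with a filtered coefficient. The following sketch assumes a hypothetical $20 \times 6$ matrix with singular values spanning $1$ to $10^{-8}$ and a noisy right-hand side; the values of $k$ and $\lambda$ are arbitrary illustrations, not recommendations, and the function names are made up for this example.

```python
import numpy as np

rng = np.random.default_rng(6)            # arbitrary seed
m, n = 20, 6
# Hypothetical ill-conditioned matrix with prescribed singular values.
Q1, _ = np.linalg.qr(rng.standard_normal((m, n)))   # m x n, orthonormal columns
Q2, _ = np.linalg.qr(rng.standard_normal((n, n)))
s_true = np.array([1.0, 1e-1, 1e-2, 1e-4, 1e-6, 1e-8])
A = (Q1 * s_true) @ Q2.T

x_true = rng.standard_normal(n)
b = A @ x_true + 1e-6 * rng.standard_normal(m)      # noisy observations

U, s, Vt = np.linalg.svd(A, full_matrices=False)
beta = U.T @ b                            # coefficients u_i^T b

def tsvd_solve(k):
    """Keep the k largest singular values, drop the rest."""
    return Vt[:k].T @ (beta[:k] / s[:k])

def tikhonov_solve(lam):
    """Filter factor sigma_i / (sigma_i^2 + lam) applied to each direction."""
    return Vt.T @ (s / (s ** 2 + lam) * beta)

x_naive = Vt.T @ (beta / s)               # plain pseudoinverse solution
x_tsvd = tsvd_solve(3)
x_tikh = tikhonov_solve(1e-8)

for name, x in [("naive", x_naive), ("TSVD k=3", x_tsvd), ("Tikhonov", x_tikh)]:
    print(f"{name:10s} ||x|| = {np.linalg.norm(x):.3e}  error = {np.linalg.norm(x - x_true):.3e}")
```

The printed norms show how the unregularised solution is affected by the $1/\sigma_i$ factors on the smallest singular values, while the filtered solutions trade some bias for stability.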

WARNING

Warning. Choosing $\lambda$ (or $k$) is a model selection problem, not a linear algebra problem. Too small: noise amplified. Too large: the solution is biased toward zero. Cross-validation or L-curve methods are required. On a calibration desk, the regularisation parameter often encodes prior information about parameter magnitude, which is equivalent to a Bayesian prior.


Validation

The companion notebook verifies:

  1. SVD factorisation: $\|A - U\Sigma V^\top\|_F$ is a small multiple of $\varepsilon_\text{machine} \|A\|_F$ for a random $5 \times 3$ matrix.
  2. Orthogonality: $\|U^\top U - I\|$, $\|V^\top V - I\|$ both near zero.
  3. Four subspaces: $Av_i = \sigma_i u_i$ for $i \leq r$; $Av_i \approx 0$ for $i > r$.
  4. Pseudoinverse: $A^+ b$ minimises $\|Ax - b\|$ over all $x$; verified by perturbing from $A^+ b$ and checking that the residual increases.
  5. Condition number: the Hilbert matrix $H_n$ with $H_{ij} = 1/(i+j-1)$ is notoriously ill-conditioned; $\kappa_2(H_n)$ grows exponentially with $n$.
  6. Eckart-Young error: $\|A - A_k\|_F = \sqrt{\sum_{i>k} \sigma_i^2}$ checked against direct computation.
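For check 5, a minimal sketch of how the Hilbert matrix condition numbers could be tabulated. This is not the notebook's own code; the helper names `hilbert` and `cond2` are assumptions made for illustration.

```python
import numpy as np

def hilbert(n):
    """Hilbert matrix with entries H_ij = 1 / (i + j - 1), indices starting at 1."""
    i = np.arange(1, n + 1)
    return 1.0 / (i[:, None] + i[None, :] - 1)

def cond2(M):
    """2-norm condition number sigma_1 / sigma_n from the singular values."""
    s = np.linalg.svd(M, compute_uv=False)
    return s[0] / s[-1]

for n in (3, 6, 9, 12):
    print(f"n = {n:2d}   kappa_2(H_n) = {cond2(hilbert(n)):.3e}")
```

For the largest sizes the computed value is itself unreliable, since the smallest singular value falls below the resolution of double precision.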
PRACTICE

Hand exercise (before running the notebook). Let $A = \begin{pmatrix} 3 & 0 \\ 0 & 2 \\ 0 & 0 \end{pmatrix}$. By inspection: (a) What are the singular values of $A$? (b) What is $A^\top A$? What are its eigenvalues? (c) Write down $U$, $\Sigma$, $V^\top$ explicitly. (d) What is $A^+$? Verify $A^+ A = I_2$. (Answer: $\sigma_1 = 3$, $\sigma_2 = 2$; $A^\top A = \text{diag}(9, 4)$; $U = [e_1 \, e_2 \, e_3]$, $\Sigma = \begin{pmatrix} 3 & 0 \\ 0 & 2 \\ 0 & 0 \end{pmatrix}$, $V = I_2$; $A^+ = \begin{pmatrix} 1/3 & 0 & 0 \\ 0 & 1/2 & 0 \end{pmatrix}$.)
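Once the exercise has been done by hand, the answer can be cross-checked in a few lines. This is a sketch using `np.linalg.svd` and `np.linalg.pinv`; the variable names are illustrative.

```python
import numpy as np

A = np.array([[3.0, 0.0],
              [0.0, 2.0],
              [0.0, 0.0]])

s = np.linalg.svd(A, compute_uv=False)    # singular values, largest first
AtA = A.T @ A                             # should be diag(9, 4)
A_pinv = np.linalg.pinv(A)                # 2x3 pseudoinverse

print("singular values:", s)
print("A^T A:\n", AtA)
print("A^+:\n", A_pinv)
print("A^+ A = I_2:", np.allclose(A_pinv @ A, np.eye(2)))
```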


Limitations

WARNING

Warning: SVD cost vs. eigendecomposition. Computing the (thin) SVD of $A \in \mathbb{R}^{m \times n}$ costs $O(\min(m,n)^2 \max(m,n))$ flops. For large, square symmetric matrices (covariance matrices in risk), `eigh` (which exploits symmetry) is faster. Use SVD when $A$ is rectangular, not symmetric, or when you need the pseudoinverse explicitly.

WARNING

Warning: condition number is a worst-case bound. $\kappa_2(A) = 10^8$ does not mean that relative errors are amplified by a factor of $10^8$, only that they could be. If the perturbation $\delta b$ is nearly orthogonal to the worst-case left singular vector $u_n$, the actual error is much smaller. But in calibration with many market quotes, the error vector can project onto any direction, so the worst case is relevant.

WARNING

Warning: truncated SVD vs. Tikhonov at the boundary. TSVD is optimal when the signal and noise occupy disjoint singular value subspaces (rare in practice). Tikhonov is optimal when noise is Gaussian and the prior on $x$ is Gaussian. In the common case where neither holds exactly, both are approximations and the choice of threshold or $\lambda$ dominates the solution quality.

Model failure modes:

  • Near-degenerate singular values: like near-equal eigenvalues (Module 3), the singular vector basis is unstable. A small perturbation in $A$ can rotate $v_i$ and $v_j$ arbitrarily within their span if $\sigma_i \approx \sigma_j$.
  • Rank determination: floating-point $\sigma_i$ are never exactly zero. A threshold $\tau$ must be chosen to determine effective rank. The common choice $\tau = \max(m,n) \cdot \varepsilon_\text{machine} \cdot \sigma_1$ is the `numpy.linalg.matrix_rank` default.
  • Double-precision overflow: for matrices with $\sigma_1 \approx 10^{150}$, entries of $A^\top A$ scale like $\sigma_1^2 \approx 10^{300}$, close to the double-precision limit of about $1.8 \times 10^{308}$, so intermediate computations can overflow. Work with the matrix $A$ directly via `scipy.linalg.lstsq` rather than forming $A^\top A$ explicitly.
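The rank-determination point can be illustrated with a small sketch. It assumes a hypothetical $4 \times 3$ matrix whose exact rank is 2 (the second row is twice the first, and the first row is a combination of the last two), and applies the threshold $\tau$ described above before comparing with `np.linalg.matrix_rank`.

```python
import numpy as np

# Hypothetical matrix of exact rank 2:
# row 2 = 2 * row 1, and row 1 = row 3 + 2 * row 4.
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])

s = np.linalg.svd(A, compute_uv=False)
tau = max(A.shape) * np.finfo(float).eps * s[0]    # threshold from the text
effective_rank = int(np.sum(s > tau))

print("singular values:", s)                       # the last one is tiny but generally not exactly 0
print("threshold tau:  ", tau)
print("effective rank: ", effective_rank)
print("matrix_rank:    ", np.linalg.matrix_rank(A))
```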

Interview Angle

PRACTICE

L1 (Junior Quant Developer / Junior Quant).

Expected depth: definitions, the formula, one worked example.

  1. "What is the singular value decomposition? How does it differ from eigendecomposition?" Answer: SVD works for any matrix, $A = U\Sigma V^\top$ (rectangular, non-symmetric). Eigendecomposition requires a square matrix and may fail (no full eigenbasis). For symmetric $A = Q\Lambda Q^\top$, singular values = |eigenvalues| (ordered by $|\lambda_i|$); one can take $V = Q$ and $U$ equal to $Q$ with the columns belonging to negative eigenvalues sign-flipped, so $U = V = Q$ exactly when $A$ is positive semi-definite.

  2. "What does the condition number tell you?" Answer: $\kappa_2(A) = \sigma_1/\sigma_n$; it bounds how much relative errors in $b$ are amplified in the solution to $Ax = b$. For calibration, a large condition number signals near-unidentifiability of some parameters.

  3. "How do you compute the pseudoinverse? When do you need it?" Answer: $A^+ = V\Sigma^+ U^\top$. Needed for overdetermined LS (calibration, regression), rank-deficient systems (degenerate covariance), and minimum-norm solutions.

PRACTICE

L2 (Senior Quant / Quant Researcher).

Expected depth: derivation, connection to LS, condition number perturbation theory.

  1. "Derive the pseudoinverse from first principles and prove it solves the LS problem." Answer: Start from $A = U\Sigma V^\top$. Because orthogonal $U$ preserves the 2-norm, LS minimises $\|Ax - b\|^2 = \|U\Sigma V^\top x - b\|^2 = \|\Sigma y - \tilde b\|^2$ where $y = V^\top x$, $\tilde b = U^\top b$. Minimise: set $y_i = \tilde b_i / \sigma_i$ for $i \leq r$; the components $y_i$ for $i > r$ do not affect the residual, so set $y_i = 0$ for $i > r$ (minimum norm). Back-substitute: $x^* = Vy = V\Sigma^+ U^\top b = A^+ b$.

  2. "A calibration of 20 SABR parameters to 80 market prices gives condition number $10^{10}$. What is the implication and how do you fix it?" Answer: A $10^{-6}$ relative bid-ask spread can translate into a relative error of up to $10^{10} \times 10^{-6} = 10^4$ in some parameter directions, which is numerically meaningless. Fix: (i) identify near-zero singular values (those below threshold $\tau$); (ii) either truncate (TSVD) or regularise (Tikhonov with $\lambda$ chosen by L-curve or cross-validation); (iii) consider reducing model complexity, since the ill-conditioning signals over-parameterisation.

PRACTICE

L3 (Quant Researcher / Model Risk).

Expected depth: original analysis, regularisation theory, model risk implications.

  1. "Compare TSVD and Tikhonov regularisation. Under what data-generating processes is each optimal, and how would you choose between them for vol surface calibration?" Answer: TSVD is optimal when the signal lives in a low-dimensional subspace (top $k$ singular directions) and noise is in the complement. Tikhonov is the posterior mean (equivalently the MAP estimate) under a zero-mean isotropic Gaussian prior on $x$ and Gaussian noise on $b$; it is biased by construction and trades that bias for lower variance. For vol surfaces, neither assumption holds strictly: skew structure creates a soft separation. Practical choice: use Tikhonov with $\lambda$ determined by GCV (generalised cross-validation) or the L-curve. TSVD is preferable when interpretability matters (clear separation of priced vs. unpriced risk factors).

  2. "How does the SVD of the forward sensitivity matrix $\partial F / \partial \theta$ inform model risk for a structured product?" Answer: Singular values of the Jacobian measure how sensitively each market observable responds to each parameter direction. Directions with $\sigma_i \approx 0$ are parameter combinations that cannot be identified from the hedging instruments. These directions represent model risk: their contribution to the price is not hedged. Monitoring these via daily re-calibration SVD is a model risk control. If a previously identifiable singular direction becomes near-zero, it signals either a regime change or a model breakdown.