Setup
Context and Assumptions
Factor models decompose asset returns into systematic exposures to common risk factors and an idiosyncratic residual. They are widely used on quant equity desks for risk attribution, portfolio construction, and alpha signal design.
The central question factor models answer is: why do different assets earn different expected returns? The answer, in the risk-based reading shared by the models below, is compensation for bearing systematic risk that cannot be diversified away.
Notation throughout. Let:
- $r_{i,t}$ = excess return of asset $i$ at time $t$ (return minus risk-free rate $r_{f,t}$)
- $\mu_i = E[r_{i,t}]$ = unconditional expected excess return of asset $i$
- $\beta_{i,k}$ = factor loading (sensitivity) of asset $i$ to factor $k$
- $\lambda_k$ = risk premium for factor $k$ (expected excess return per unit of factor exposure)
- $\varepsilon_{i,t}$ = idiosyncratic return; $E[\varepsilon_{i,t}] = 0$, uncorrelated with factors
Key assumptions that vary by model are stated in each section.
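To make the systematic/idiosyncratic split concrete, the sketch below builds the asset covariance matrix implied by a linear factor structure and splits a portfolio's variance into the two parts. The loadings, factor covariance, and idiosyncratic variances are hypothetical numbers chosen for illustration, and the names `B`, `sigma_f`, and `idio_var` are assumptions of this example.

```python
import numpy as np

# Hypothetical inputs: 3 assets, 2 factors (all numbers are illustrative).
B = np.array([[1.0, 0.5],
              [0.8, -0.2],
              [1.2, 0.0]])                   # factor loadings beta_{i,k}
sigma_f = np.array([[0.04, 0.01],
                    [0.01, 0.02]])           # factor covariance matrix
idio_var = np.array([0.010, 0.020, 0.015])   # idiosyncratic variances

# Asset covariance implied by the factor structure (residuals uncorrelated).
cov_assets = B @ sigma_f @ B.T + np.diag(idio_var)

# Equal-weighted portfolio: split its variance into the two components.
w = np.full(3, 1.0 / 3.0)
port_beta = B.T @ w                          # portfolio factor exposures
systematic_var = port_beta @ sigma_f @ port_beta
idiosyncratic_var = w @ (idio_var * w)
total_var = w @ cov_assets @ w

print(f"systematic={systematic_var:.4f} idiosyncratic={idiosyncratic_var:.4f}")
```

The idiosyncratic term shrinks as more assets are added with small weights, while the systematic term does not, which is the sense in which only factor risk is undiversifiable.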
Theory
1. CAPM: Capital Asset Pricing Model
Assumptions.
- Investors are mean-variance optimisers (Markowitz 1952) with identical beliefs.
- All assets are tradeable; no transaction costs, taxes, or short-selling constraints.
- Returns are jointly normally distributed (or investors have quadratic utility).
- A risk-free asset exists, lendable and borrowable at rate $r_f$.
- All investors have the same investment horizon.
Under these assumptions, every investor holds the same risky portfolio: the market portfolio $M$, which in equilibrium is the value-weighted portfolio of all risky assets.
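Derivation of the SML. Consider any asset $i$. Form a portfolio $p$ with weight $w$ in asset $i$ and $1 - w$ in the market portfolio. Expected excess return and variance:
$$\mu_p = w\mu_i + (1-w)\mu_M, \qquad \sigma_p^2 = w^2\sigma_i^2 + (1-w)^2\sigma_M^2 + 2w(1-w)\sigma_{iM}$$
In equilibrium, asset $i$ is already in the market portfolio, so the efficient portfolio locus through $M$ must be tangent to the Capital Market Line at $w = 0$. Computing $d\mu_p/d\sigma_p$ at $w = 0$ and equating to the Sharpe ratio of the market, $\mu_M/\sigma_M$, gives:
$$\mu_i = \beta_i\,\mu_M, \qquad \beta_i = \frac{\sigma_{iM}}{\sigma_M^2}$$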
This is the Security Market Line (SML). Assets plot on the SML in equilibrium; deviations from it are alphas — excess returns not explained by market beta.
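The tangency step behind the SML can be written out explicitly. The following is a sketch of the standard argument, with $w$ denoting the weight placed on asset $i$ (and $1-w$ on the market), $\sigma_{iM}$ the covariance of asset $i$ with the market, and all means taken as excess returns; these symbols are assumed here for concreteness.

```latex
\begin{aligned}
\mu_p(w) &= w\,\mu_i + (1-w)\,\mu_M, \\
\sigma_p^2(w) &= w^2\sigma_i^2 + (1-w)^2\sigma_M^2 + 2w(1-w)\,\sigma_{iM}, \\
\left.\frac{d\mu_p}{dw}\right|_{w=0} &= \mu_i - \mu_M,
\qquad
\left.\frac{d\sigma_p}{dw}\right|_{w=0} = \frac{\sigma_{iM} - \sigma_M^2}{\sigma_M}, \\
\left.\frac{d\mu_p}{d\sigma_p}\right|_{w=0}
  &= \frac{(\mu_i - \mu_M)\,\sigma_M}{\sigma_{iM} - \sigma_M^2}
   = \frac{\mu_M}{\sigma_M}
\;\Longrightarrow\;
\mu_i = \frac{\sigma_{iM}}{\sigma_M^2}\,\mu_M = \beta_i\,\mu_M .
\end{aligned}
```

The last line rearranges $\mu_i - \mu_M = \mu_M(\sigma_{iM} - \sigma_M^2)/\sigma_M^2$, so the $\mu_M$ terms cancel and only the covariance ratio remains.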
Time-series regression form (Jensen 1968):
$$r_{i,t} = \alpha_i + \beta_i\, r_{M,t} + \varepsilon_{i,t}$$
Under CAPM, $\alpha_i = 0$ for all assets in equilibrium.
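As a quick illustration of how beta and Jensen's alpha are estimated, the sketch below checks on synthetic data that the covariance-over-variance definition of beta coincides with the OLS slope of the time-series regression. The return parameters are made up for the example and are not calibrated to any market.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic monthly excess returns (illustrative parameters only).
n_months = 240
true_beta = 1.3
market = rng.normal(0.005, 0.04, n_months)
asset = true_beta * market + rng.normal(0.0, 0.02, n_months)  # true alpha = 0

# Beta as covariance over variance (the SML definition).
beta_cov = np.cov(asset, market, ddof=1)[0, 1] / np.var(market, ddof=1)

# Alpha and beta from the regression r_i = alpha + beta * r_M + eps.
X = np.column_stack([np.ones(n_months), market])
alpha_ols, beta_ols = np.linalg.lstsq(X, asset, rcond=None)[0]

# Jensen's alpha is the mean return left after the beta-scaled market return.
alpha_check = asset.mean() - beta_ols * market.mean()

print(f"beta (cov/var)={beta_cov:.4f}  beta (OLS)={beta_ols:.4f}")
```

The two beta estimates agree up to floating-point error because the OLS slope in a single-regressor model with an intercept is exactly the sample covariance divided by the sample variance.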
2. APT: Arbitrage Pricing Theory
Assumptions (Ross 1976).
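- Returns are generated by a K-factor linear model: $r_{i,t} = \mu_i + \sum_{k=1}^{K} \beta_{i,k} f_{k,t} + \varepsilon_{i,t}$, where $f_{k,t}$ are zero-mean factor realisations and $\varepsilon_{i,t}$ are idiosyncratic, mutually uncorrelated, with bounded variance.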
- There are sufficiently many assets to form well-diversified portfolios.
- No-arbitrage: no portfolio with zero cost, zero systematic risk, and positive expected return.
Result. In a no-arbitrage economy, expected returns satisfy (approximately):
$$\mu_i \approx \lambda_0 + \sum_{k=1}^{K} \beta_{i,k}\,\lambda_k$$
where $\lambda_0$ is the zero-beta return and $\lambda_k$ is the risk premium for factor $k$. The APT does not specify which factors matter; it only says that if factors explain covariance structure, their premia must exist to preclude arbitrage.
CAPM is a special case of APT with $K = 1$ and $f_{1,t} = r_{M,t} - \mu_M$, plus the equilibrium assumptions that fix $\lambda_1 = \mu_M$.
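The no-arbitrage logic can be illustrated numerically. The sketch below assumes exact factor pricing with hypothetical loadings and premia, builds a zero-cost portfolio with zero exposure to every factor, and shows that its expected return is zero unless some asset is mispriced. The names `B`, `lam`, and `w_arb` are assumptions of this example.

```python
import numpy as np

# Hypothetical loadings for 5 assets on 2 factors, and hypothetical premia.
B = np.array([[1.0, 0.2],
              [0.9, -0.3],
              [1.1, 0.5],
              [0.7, 0.0],
              [1.3, -0.1]])
lambda_0 = 0.0                      # zero-beta excess return
lam = np.array([0.005, 0.002])      # factor risk premia per month

# Exact APT pricing: mu_i = lambda_0 + sum_k beta_{i,k} * lambda_k.
mu = lambda_0 + B @ lam

# Zero-cost, zero-beta portfolio: weights orthogonal to the columns of [1, B].
A = np.column_stack([np.ones(len(B)), B])   # 5 x 3 constraint matrix
_, _, vt = np.linalg.svd(A.T)               # trailing rows of vt span null(A.T)
w_arb = vt[-1]                              # satisfies A.T @ w_arb = 0

# Under exact pricing this portfolio earns zero expected return.
expected_return = w_arb @ mu

# Misprice asset 0 by 10 bp: the same riskless, costless portfolio now has
# expected return 0.001 * w_arb[0] (the sign depends on the SVD's orientation).
mu_mispriced = mu.copy()
mu_mispriced[0] += 0.001
arb_return = w_arb @ mu_mispriced
```

In a finite sample the portfolio still carries idiosyncratic risk, which is why the APT pricing relation holds only approximately and relies on there being enough assets to diversify that risk away.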
3. Fama-French Three-Factor Model
Motivation. Fama and French (1992, 1993) documented that CAPM beta does not fully explain the cross-section of expected returns. Two anomalies survive controlling for market beta:
- Size effect: small-cap stocks earn higher average returns than large-cap.
- Value effect: stocks with high book-to-market (B/M) ratio earn higher average returns than growth stocks.
Factor construction (Fama-French 1993).
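Let $SMB_t$ (Small Minus Big) and $HML_t$ (High Minus Low) be zero-cost long-short factor portfolios with monthly returns; the underlying sorts are refreshed once a year, at the end of June:
- SMB: long stocks below the median market cap, short stocks above it (the breakpoint is the NYSE median).
- HML: long the top 30% of stocks by B/M, short the bottom 30% (NYSE breakpoints); the sort is done within each size bucket and averaged across buckets to control for size.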
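The 2x3 sort behind these two factors can be sketched for a single period as follows. This is a simplified illustration, not the exact Fama-French procedure: breakpoints are computed over all stocks in the sample rather than NYSE stocks only, every bucket is assumed to be non-empty, and the function name `smb_hml_one_period` and the toy data are hypothetical.

```python
import numpy as np

def smb_hml_one_period(mktcap, book_to_market, next_return):
    """One-period SMB and HML from a 2x3 size/B-M sort (simplified breakpoints)."""
    mktcap = np.asarray(mktcap, dtype=float)
    bm = np.asarray(book_to_market, dtype=float)
    ret = np.asarray(next_return, dtype=float)

    small = mktcap <= np.median(mktcap)              # size split at the median
    lo, hi = np.quantile(bm, [0.3, 0.7])             # 30th / 70th B/M percentiles
    bm_bucket = np.where(bm <= lo, "L", np.where(bm > hi, "H", "M"))

    def vw(mask):
        # Value-weighted return of the stocks selected by mask.
        return np.average(ret[mask], weights=mktcap[mask])

    port = {(s, b): vw((small == (s == "S")) & (bm_bucket == b))
            for s in "SB" for b in "LMH"}

    # SMB: average of the three small portfolios minus the three big ones.
    smb = np.mean([port["S", b] for b in "LMH"]) - np.mean([port["B", b] for b in "LMH"])
    # HML: average of the two high-B/M portfolios minus the two low-B/M ones.
    hml = np.mean([port[s, "H"] for s in "SB"]) - np.mean([port[s, "L"] for s in "SB"])
    return smb, hml

# Toy universe: 6 small and 6 big stocks, two per B/M bucket in each size group.
mktcap = [1, 2, 3, 4, 5, 6, 10, 20, 30, 40, 50, 60]
bm = [0.10, 0.20, 0.50, 0.60, 0.90, 1.00, 0.15, 0.25, 0.55, 0.65, 0.95, 1.05]
size_part = [0.02] * 6 + [0.01] * 6                  # small stocks earn 1% more
value_part = [-0.005, -0.005, 0.0, 0.0, 0.005, 0.005] * 2
ret = np.array(size_part) + np.array(value_part)

smb, hml = smb_hml_one_period(mktcap, bm, ret)
```

Averaging across the B/M buckets when forming SMB, and across the size buckets when forming HML, is what keeps each factor roughly neutral to the other characteristic.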
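The three-factor model:
$$r_{i,t} = \alpha_i + \beta_{i,MKT}\,MKT_t + \beta_{i,SMB}\,SMB_t + \beta_{i,HML}\,HML_t + \varepsilon_{i,t}$$
Under the model, $\alpha_i = 0$ in equilibrium. Empirically, $\alpha_i$ is close to zero for most equity portfolios but non-zero for momentum strategies, which motivated Carhart (1997) to add a momentum factor $UMD_t$.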
Fama-MacBeth two-pass regression. The canonical approach to estimating factor premia in the cross-section:
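- First pass (time-series): For each asset $i$, regress $r_{i,t}$ on factors to estimate $\hat\beta_i$.
- Second pass (cross-section): At each time $t$, regress $r_{i,t}$ on estimated betas: $r_{i,t} = \gamma_{0,t} + \hat\beta_i^{\top}\gamma_t + \eta_{i,t}$. Average the cross-sectional estimates: $\hat\lambda = \frac{1}{T}\sum_{t=1}^{T}\hat\gamma_t$. Standard errors use the time-series standard deviation of $\hat\gamma_t$, which is robust to cross-sectional correlation and heteroskedasticity (but not to estimation error in betas, for which the Shanken correction applies).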
Implementation
"""
Factor model estimation: CAPM and Fama-French three-factor.
Assumptions:
- Returns are in excess of the risk-free rate (monthly frequency)
- Factor data follows Ken French's construction conventions
- All returns and factors are in decimal form (not percent)
"""
from __future__ import annotations

from typing import NamedTuple

import numpy as np
import pandas as pd
import statsmodels.api as sm


class FactorRegressionResult(NamedTuple):
    alpha: float            # annualised intercept
    alpha_tstat: float      # t-statistic on alpha
    betas: dict[str, float]
    r_squared: float
    residual_std: float     # annualised idiosyncratic vol


def capm_regression(
    excess_returns: pd.Series,
    market_excess_return: pd.Series,
) -> FactorRegressionResult:
    """
    Estimate CAPM beta via OLS. Both series must be monthly excess returns.
    Alpha is annualised (multiplied by 12).
    """
    X = sm.add_constant(market_excess_return.rename("MKT"))
    model = sm.OLS(excess_returns, X).fit()
    alpha_monthly = model.params["const"]
    alpha_tstat = model.tvalues["const"]
    return FactorRegressionResult(
        alpha=alpha_monthly * 12,
        alpha_tstat=alpha_tstat,
        betas={"MKT": model.params["MKT"]},
        r_squared=model.rsquared,
        residual_std=model.resid.std() * np.sqrt(12),
    )


def fama_french_regression(
    excess_returns: pd.Series,
    factors: pd.DataFrame,  # columns: MKT, SMB, HML (monthly excess returns)
) -> FactorRegressionResult:
    """
    Estimate Fama-French three-factor model via OLS.
    factors must contain columns ['MKT', 'SMB', 'HML'].
    """
    required = {"MKT", "SMB", "HML"}
    if not required.issubset(factors.columns):
        raise ValueError(f"factors must contain columns {required}")
    X = sm.add_constant(factors[["MKT", "SMB", "HML"]])
    model = sm.OLS(excess_returns, X).fit()
    alpha_monthly = model.params["const"]
    alpha_tstat = model.tvalues["const"]
    return FactorRegressionResult(
        alpha=alpha_monthly * 12,
        alpha_tstat=alpha_tstat,
        betas={k: model.params[k] for k in ["MKT", "SMB", "HML"]},
        r_squared=model.rsquared,
        residual_std=model.resid.std() * np.sqrt(12),
    )


def fama_macbeth(
    returns: pd.DataFrame,   # T x N matrix of monthly excess returns
    factors: pd.DataFrame,   # T x K matrix of factor returns
    shanken_correction: bool = True,
) -> pd.DataFrame:
    """
    Fama-MacBeth two-pass cross-sectional regression.
    Pass 1: time-series OLS per asset → beta estimates.
    Pass 2: cross-sectional OLS at each t → gamma_t estimates.
    Returns DataFrame with columns: lambda (mean gamma), t_stat, std_error.
    Shanken (1992) correction adjusts SE for errors-in-variables from beta estimation.
    """
    T, N = returns.shape
    K = factors.shape[1]
    factor_names = list(factors.columns)

    # --- Pass 1: estimate betas ---
    X_ts = sm.add_constant(factors)
    betas = np.zeros((N, K))  # N assets x K factors
    for i, asset in enumerate(returns.columns):
        res = sm.OLS(returns[asset], X_ts).fit()
        betas[i] = [res.params[k] for k in factor_names]

    # --- Pass 2: cross-sectional regression at each t ---
    # Betas are full-sample constants, so the design matrix is built once.
    X_cs = sm.add_constant(betas, has_constant="add")
    gammas = np.zeros((T, K + 1))  # intercept + K factor premia
    for t in range(T):
        y_t = returns.iloc[t].values
        gammas[t] = sm.OLS(y_t, X_cs).fit().params

    lambda_hat = gammas.mean(axis=0)
    se_raw = gammas.std(axis=0, ddof=1) / np.sqrt(T)

    if shanken_correction:
        # Simplified multiplicative form of Shanken (1992): the variance is
        # inflated by c = 1 + lambda' Sigma_F^{-1} lambda, so the SE by sqrt(c).
        # atleast_2d keeps the K = 1 case working (np.cov returns a scalar there).
        sigma_f = np.atleast_2d(np.cov(factors.values, rowvar=False, ddof=1))
        lam_k = lambda_hat[1:]  # factor premia only
        c = 1.0 + lam_k @ np.linalg.solve(sigma_f, lam_k)
        se_raw[1:] *= np.sqrt(c)

    t_stats = lambda_hat / se_raw
    index = ["intercept"] + factor_names
    return pd.DataFrame({
        "lambda": lambda_hat,
        "t_stat": t_stats,
        "std_error": se_raw,
    }, index=index)
Validation
CAPM. A correct implementation satisfies:
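- For the market portfolio itself: $\beta = 1$, $\alpha = 0$, $R^2 = 1$ (by construction).
- For the risk-free asset: $\beta = 0$, $\alpha = 0$, $R^2 = 0$ (by convention; its excess return is identically zero, so $R^2$ is strictly undefined).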
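These two sanity checks are easy to run on synthetic data. The sketch below uses a small NumPy-only helper, `ols_alpha_beta_r2`, which is a hypothetical stand-in for the regression functions above rather than part of them; it returns NaN for $R^2$ when the dependent series has zero variance, as is the case for the risk-free asset's excess return.

```python
import numpy as np

def ols_alpha_beta_r2(y, x):
    """Single-factor OLS with intercept; returns (alpha, beta, R^2)."""
    X = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    tss = np.sum((y - y.mean()) ** 2)
    # R^2 is undefined when the dependent series has no variance.
    r2 = 1.0 - np.sum(resid ** 2) / tss if tss > 0 else float("nan")
    return coef[0], coef[1], r2

rng = np.random.default_rng(1)
market = rng.normal(0.005, 0.04, 120)        # synthetic market excess returns

# Market regressed on itself: beta = 1, alpha = 0, R^2 = 1.
alpha_m, beta_m, r2_m = ols_alpha_beta_r2(market, market)

# Risk-free asset: excess return identically zero, so beta = alpha = 0.
alpha_f, beta_f, r2_f = ols_alpha_beta_r2(np.zeros(120), market)
```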
Fama-French. Using monthly data from Ken French's data library (publicly available at mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html):
- SMB average monthly return ≈ +0.20% (1963–2022); HML ≈ +0.37%.
- Small-cap value portfolios load positively on both SMB and HML; large-cap growth loads negatively.
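- The three-factor model attains a higher time-series $R^2$ than CAPM alone for most size/B-M sorted portfolios; Fama and French (1993) report three-factor $R^2$ values above 0.9 for most of their 25 portfolios.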
Fama-MacBeth. Applying to 25 Fama-French size/B-M portfolios over 1963–2022 should yield:
- Market premium $\lambda_{MKT} > 0$ per month (t-stat ≈ 2.0).
- SMB premium $\lambda_{SMB}$ (t-stat ≈ 1.8).
- HML premium $\lambda_{HML}$ (t-stat ≈ 2.3).
Limitations
CAPM: Empirically Rejected
CAPM assumes homogeneous beliefs, no frictions, and a mean-variance world. In practice:
- The SML is too flat: low-beta stocks earn higher risk-adjusted returns than CAPM predicts (Black, Jensen, Scholes 1972); high-beta stocks underperform. This underlies the low-volatility anomaly.
- The true market portfolio is unobservable (Roll's critique, 1977): a value-weighted equity index is only a proxy, since it excludes human capital, real estate, private equity, and foreign assets. Any test of CAPM is therefore a joint test of the model and of the proxy's mean-variance efficiency, so CAPM is not testable in principle.
Fama-French: The Factor Zoo
Harvey, Liu, and Zhu (2016) documented over 300 claimed cross-sectional return predictors. Most are likely false discoveries:
- Data snooping: factors found in-sample lose significance out-of-sample.
- Non-stationarity: the HML value premium has been weak since 2007; the size premium has been disputed.
- Structural explanation: Fama and French view the factors as risk premia; others (behavioural finance) attribute them to mispricing. The distinction matters for whether premia persist.
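One simple way to see why data snooping calls for a higher significance hurdle is a Bonferroni adjustment: if many candidate factors have been tested, the critical t-statistic for any single one rises. The sketch below uses a normal approximation and only the Python standard library; the function name `bonferroni_t_threshold` is hypothetical, and Bonferroni is just one (conservative) choice among several multiple-testing corrections.

```python
from statistics import NormalDist

def bonferroni_t_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Two-sided critical |t| (normal approximation) after Bonferroni adjustment."""
    # Each of the n_tests gets a per-test significance level of alpha / n_tests.
    return NormalDist().inv_cdf(1.0 - alpha / (2.0 * n_tests))

single = bonferroni_t_threshold(1)     # the familiar single-test hurdle near 1.96
many = bonferroni_t_threshold(300)     # hurdle if roughly 300 factors were tried

print(f"single test: {single:.2f}, 300 tests: {many:.2f}")
```

With a few hundred candidates the hurdle moves well above 3, which is in the spirit of the argument that a t-statistic of 2 is too lenient for a newly proposed factor.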
Estimation Errors in Betas
OLS beta estimates are noisy, especially from short time-series. The Shanken (1992) errors-in-variables correction adjusts second-pass standard errors for this noise but does not reduce the noise itself; shrinkage estimators (James-Stein, Ledoit-Wolf for covariance matrices) or Bayesian shrinkage of betas (Vasicek 1973) often outperform raw OLS estimates in practice for portfolio construction.
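A minimal sketch of Vasicek-style shrinkage follows: each OLS beta is pulled toward a prior mean, with more weight on the prior when the beta's standard error is large. The function name `vasicek_shrink` is hypothetical, and using the cross-sectional mean and variance of the OLS betas as the prior is a simplifying assumption (that variance includes estimation noise, so it somewhat understates the shrinkage).

```python
import numpy as np

def vasicek_shrink(beta_ols, se_ols, prior_mean=None, prior_var=None):
    """Precision-weighted average of each OLS beta and a cross-sectional prior."""
    beta_ols = np.asarray(beta_ols, dtype=float)
    se2 = np.asarray(se_ols, dtype=float) ** 2
    if prior_mean is None:
        prior_mean = beta_ols.mean()          # assumption: prior = sample mean
    if prior_var is None:
        prior_var = beta_ols.var(ddof=1)      # assumption: prior variance = sample variance
    # Noisier estimates (larger se2) put more weight on the prior.
    weight_on_prior = se2 / (se2 + prior_var)
    return weight_on_prior * prior_mean + (1.0 - weight_on_prior) * beta_ols

# Illustrative inputs: the third beta is both extreme and imprecisely estimated.
beta_ols = np.array([0.6, 1.0, 1.8])
se_ols = np.array([0.1, 0.1, 0.5])
beta_shrunk = vasicek_shrink(beta_ols, se_ols)
```

The precisely estimated betas barely move, while the noisy one is pulled substantially toward the cross-sectional mean.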
Changing Factor Premia
Factor premia are not stationary. Post-publication decay (McLean and Pontiff 2016) is well-documented: anomaly returns are roughly 26% lower out-of-sample and 58% lower after academic publication, consistent with capital arbitraging them away once they are known. Factor models trained on historical data should be validated on out-of-sample periods that do not overlap the discovery sample.
Interview Angle
L1. Derive the CAPM Security Market Line. What is beta, and how do you estimate it? Why does drift not appear in Black-Scholes but expected return does appear in the CAPM? (Answer: BS prices by no-arbitrage replication, so the option value does not depend on the stock's drift and can be computed under the risk-neutral measure; CAPM is an equilibrium theory about required returns under the physical measure.)
L2. Explain Fama-MacBeth and why it's preferred over a single pooled panel regression for estimating factor premia. What is the Shanken correction and why is it needed? Compare the economic interpretation of SMB and HML under the rational risk story versus the behavioural story.
Fama-MacBeth vs pooled OLS. A pooled OLS of $r_{i,t}$ on betas treats all observations as independent, which is false: errors are cross-sectionally correlated at each $t$. Fama-MacBeth separates the time dimension (averaging across periods) from the cross-section, giving standard errors that are robust to cross-sectional correlation. The cost: it ignores time-series correlation of the $\hat\gamma_t$ series (a Newey-West adjustment extends this).
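The Newey-West extension mentioned above can be sketched as a standard error for the mean of a serially correlated series, using Bartlett kernel weights. The function name `newey_west_se_of_mean`, the lag choice, and the AR(1) test series are assumptions made for illustration only.

```python
import numpy as np

def newey_west_se_of_mean(x, lags):
    """Standard error of the sample mean with Newey-West (Bartlett) weights."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    d = x - x.mean()
    long_run_var = d @ d / T                      # lag-0 autocovariance
    for j in range(1, lags + 1):
        weight = 1.0 - j / (lags + 1.0)           # Bartlett kernel weight
        gamma_j = d[j:] @ d[:-j] / T              # lag-j autocovariance
        long_run_var += 2.0 * weight * gamma_j
    return np.sqrt(long_run_var / T)

# Synthetic, positively autocorrelated series of cross-sectional estimates.
rng = np.random.default_rng(2)
T = 600
shocks = np.empty(T)
shocks[0] = rng.normal()
for t in range(1, T):
    shocks[t] = 0.5 * shocks[t - 1] + rng.normal()
gamma_series = 0.004 + 0.01 * shocks

se_naive = newey_west_se_of_mean(gamma_series, lags=0)   # plain Fama-MacBeth SE
se_nw = newey_west_se_of_mean(gamma_series, lags=6)      # autocorrelation-adjusted
```

With zero lags the function reduces to the usual standard deviation over root-T (using a 1/T variance normalisation); with positive autocorrelation the adjusted standard error is larger, so the unadjusted Fama-MacBeth t-statistics would be overstated.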
L3. Critique Roll's critique of CAPM. How does the APT avoid Roll's critique? What is the distinction between a statistical factor model (PCA on returns covariance) and a fundamental factor model (sector, style exposures), and when would you use each? How would you test whether a new factor is priced versus a statistical artefact — what criteria (economic intuition, out-of-sample, Bayesian t-statistic threshold) would you apply?