ECON2515 · Intermediate Applied Econometrics II
Heteroskedasticity: Detection and Correction
Heteroskedasticity means the error variance is not constant across observations — Var(u | x) = σᵢ² ≠ σ² — which breaks the homoskedasticity assumption MLR.5. The result you must state precisely: OLS coefficients stay unbiased and linear, but they are no longer BLUE, and the usual standard errors are biased, so every t-test, F-test and confidence interval built on them becomes unreliable. This ECON 2515 Week 9 topic (and Quiz 4) teaches you to detect it — residual/û² plots plus the Breusch-Pagan and White LM tests, where LM = n·R²_aux ~ χ² — and to correct it with robust (White) standard errors or WLS/GLS. The one thing you never do is claim the coefficients are biased or 're-estimate the betas'.
What this chapter covers
- 1. What it is: Var(u | x) = σᵢ² varies with the regressors, violating MLR.5 (homoskedasticity)
- 2. Consequences: OLS stays unbiased and linear but is no longer BLUE; the usual SEs are biased
- 3. What breaks vs what survives: coefficients and R² look fine; the usual t-tests, F-tests and CIs are invalid
- 4. Detection by eye: residual/û² plots; a funnel or trumpet shape signals non-constant variance
- 5. Breusch-Pagan test: regress û² on the x's; LM = n·R² ~ χ²(df = number of auxiliary regressors, excluding the intercept)
- 6. White test: add squares and cross-products (or use ŷ, ŷ²); catches non-linear variance forms
- 7. Robust (White) standard errors: the easy fix; corrects the SEs and leaves the coefficients unchanged
- 8. WLS / GLS / FGLS: divide every term by √hᵢ (weights 1/hᵢ) to restore efficiency (BLUE); interpret in original units
Run a Breusch-Pagan test and prescribe the fix
- (a) [2 marks] Heteroskedasticity means the error variance is not constant across observations; here it rises with fitted price, so Var(u | x) = σᵢ² rather than a single σ². The fanning residual plot is evidence that the spread of the errors grows with the regressors, which violates MLR.5, the homoskedasticity assumption Var(u | x) = σ².
- (b) [4 marks] Breusch-Pagan regresses the squared OLS residuals on the model's regressors: û² = δ₀ + δ₁size + δ₂beds + δ₃age + v. The hypotheses are H₀: δ₁ = δ₂ = δ₃ = 0 (the variance does not depend on the x's, i.e. homoskedasticity) versus H₁: at least one δⱼ ≠ 0 (the variance depends on at least one regressor, i.e. heteroskedasticity).
- (c) [4 marks] The test statistic is LM = n·R² = 250 × 0.048 = 12.0, which is asymptotically χ²(3) under H₀: the df equal the number of regressors in the auxiliary regression, excluding the intercept. Since 12.0 > χ²₀.₀₅,₃ = 7.81, the statistic falls in the upper-tail rejection region, so reject H₀: there is significant evidence of heteroskedasticity at the 5% level.
- (d) [2 marks] Report heteroskedasticity-robust (White) standard errors. The coefficients are unchanged; only the standard errors, and therefore the t-statistics, confidence intervals and p-values, are corrected so that inference is valid again. (An alternative fix is WLS/GLS, which re-weights the data to restore efficiency.)
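The arithmetic in part (c) can be checked with a short Python sketch. It uses only the standard library; `chi2_sf` and `chi2_crit` are hypothetical helper names, the upper-tail probability comes from the standard recursion for integer df, and the critical value is found by bisection instead of being read from a printed table.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for X ~ chi-square(df), integer df >= 1.

    Recursion: Q(k + 2) = Q(k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1),
    starting from Q(1) = erfc(sqrt(x/2)) or Q(2) = exp(-x/2).
    """
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(half)), 1
    else:
        q, k = math.exp(-half), 2
    while k < df:
        # add (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1), computed on the log scale
        q += math.exp((k / 2) * math.log(half) - half - math.lgamma(k / 2 + 1))
        k += 2
    return q


def chi2_crit(alpha: float, df: int) -> float:
    """Upper-tail critical value c with P(X > c) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:  # widen the bracket until it contains c
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid  # tail still too heavy: c lies above mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Numbers from part (c): n = 250, R² of the auxiliary regression = 0.048, 3 regressors
n, r2_aux, q = 250, 0.048, 3
lm = n * r2_aux
crit = chi2_crit(0.05, q)
p_value = chi2_sf(lm, q)
print(f"LM = {lm:.2f}, 5% critical value = {crit:.2f}, p-value = {p_value:.4f}")
print("reject H0" if lm > crit else "do not reject H0")
```

This reproduces LM = 12.0 against a 5% critical value of about 7.81, with an upper-tail p-value of roughly 0.007, so the decision matches part (c).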
Key terms
- Heteroskedasticity
- A non-constant error variance, Var(u | x) = σᵢ² ≠ σ², so the typical size of the error changes with the regressors. It violates MLR.5 and makes the usual OLS standard errors wrong.
- Homoskedasticity (MLR.5)
- The Gauss-Markov assumption that the conditional error variance is a single constant, Var(u | x) = σ², for every observation. It is needed for OLS efficiency (BLUE) and for correct standard errors.
- Consequence for OLS
- Under heteroskedasticity OLS stays unbiased, linear and consistent, but is no longer BLUE (efficient), and the conventional standard errors are biased — so t-tests, F-tests, CIs and p-values are unreliable.
- Breusch-Pagan (LM) test
- Regress the squared residuals û² on the original regressors; the statistic LM = n·R²_aux follows χ²(df = number of auxiliary regressors). A large LM rejects homoskedasticity. It assumes a linear variance form.
- White test
- A more flexible test that regresses û² on the regressors, their squares and cross-products (or, to save df, on the fitted values ŷ and ŷ²). Same LM = n·R² ~ χ² rule, but it catches non-linear variance forms.
- Robust (White) standard errors
- Also called Eicker-Huber-White or 'sandwich' SEs. They give a variance estimate that is valid in large samples under heteroskedasticity of arbitrary, unknown form. The coefficients are unchanged; only the standard errors, and hence t, F, CI and p, are corrected.
- Weighted least squares (WLS) / GLS
- If Var(u | x) = σ²h(x) with h known, divide every term (including the intercept) by √hᵢ so the transformed error is homoskedastic. OLS on the transformed model is GLS, which is BLUE; it is equivalent to WLS with weights 1/hᵢ, so high-variance observations get less weight.
- Feasible GLS (FGLS)
- GLS when the variance form is unknown: run OLS, form ln(û²), regress it on the x's to get fitted ĝ, set ĥ = exp(ĝ), then run WLS with weights 1/ĥ. Inference is on the transformed model; interpret in original units.
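The FGLS recipe in the last entry can be sketched in Python. This is a minimal illustration assuming NumPy is available; `ols` and `fgls` are hypothetical helper names, the data are simulated with an error variance proportional to x², and the regression of ln(û²) uses the same x's (plus an intercept) as the main model, as the definition describes.

```python
import numpy as np


def ols(X, y):
    """OLS coefficients; X already contains a column of ones."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b


def fgls(X, y):
    """Feasible GLS following the recipe above (illustrative sketch)."""
    b_ols = ols(X, y)
    u_hat = y - X @ b_ols
    # Regress ln(û²) on the x's and keep the fitted values ĝ
    g_hat = X @ ols(X, np.log(u_hat ** 2))
    # ĥ = exp(ĝ) is strictly positive by construction
    h_hat = np.exp(g_hat)
    # WLS with weights 1/ĥ, done as OLS after dividing every column
    # (including the ones) and y by sqrt(ĥ)
    s = np.sqrt(h_hat)
    b_fgls = ols(X / s[:, None], y / s)
    return b_ols, b_fgls, h_hat


# Hypothetical simulated data with Var(u | x) proportional to x²
rng = np.random.default_rng(0)
n = 1000
x = rng.uniform(1, 10, n)
y = 1.0 + 2.0 * x + x * rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
b_ols, b_fgls, h_hat = fgls(X, y)
print("OLS: ", np.round(b_ols, 3))
print("FGLS:", np.round(b_fgls, 3))
```

Exponentiating the fitted ĝ is what guarantees positive weights; regressing û² directly could produce negative fitted variances.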
Heteroskedasticity: Detection and Correction FAQ
Does heteroskedasticity bias the OLS coefficients?
No — this is the single most common wrong answer. The point estimates stay unbiased and consistent; what breaks are the standard errors, and therefore all inference (t-tests, F-tests, confidence intervals, p-values). Contrast this with omitted-variable bias, which does bias the coefficients. Confusing the two diagnostics loses easy marks in both the MCQ and Part B.
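A small Monte Carlo makes the distinction concrete. In this sketch (assuming NumPy; `simulate` is a hypothetical helper and the design is made up for illustration) the errors satisfy Var(u | x) = x² with x standard normal. Across many samples the OLS slope should average close to its true value of 2, while the conventional standard error understates the actual sampling spread of the slope; for this design theory puts the ratio near √3 ≈ 1.7.

```python
import numpy as np


def simulate(reps=2000, n=200, beta0=1.0, beta1=2.0, seed=0):
    """Monte Carlo: OLS slope under Var(u | x) = x² (a made-up design)."""
    rng = np.random.default_rng(seed)
    slopes, usual_se = [], []
    for _ in range(reps):
        x = rng.normal(size=n)
        u = np.abs(x) * rng.normal(size=n)        # error SD grows with |x|
        y = beta0 + beta1 * x + u
        X = np.column_stack([np.ones(n), x])
        b, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ b
        sigma2_hat = resid @ resid / (n - 2)       # SSR / (n - k - 1) with k = 1
        cov_usual = sigma2_hat * np.linalg.inv(X.T @ X)  # assumes homoskedasticity
        slopes.append(b[1])
        usual_se.append(np.sqrt(cov_usual[1, 1]))
    slopes = np.array(slopes)
    # mean slope, actual sampling SD of the slope, average conventional SE
    return slopes.mean(), slopes.std(ddof=1), float(np.mean(usual_se))


mean_b1, sd_b1, avg_se = simulate()
print(f"mean slope {mean_b1:.3f}, true SD {sd_b1:.3f}, average usual SE {avg_se:.3f}")
```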
What is the difference between the Breusch-Pagan and White tests?
Both form LM = n·R² from an auxiliary regression of the squared residuals and compare it to a χ² critical value. Breusch-Pagan regresses û² on the original regressors only, so it assumes a linear variance form. The White test adds the squares and cross-products of the regressors (or uses ŷ and ŷ²), so it can detect arbitrary, non-linear heteroskedasticity — at the cost of many more degrees of freedom.
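Both tests, plus the ŷ, ŷ² special form of White, can be sketched with NumPy. This is an illustration on assumed inputs, not a library routine: `het_lm_tests` and `r_squared` are hypothetical helpers, X holds the regressors without the intercept, and all regressors are assumed continuous (a dummy's square duplicates the dummy and would reduce the effective df).

```python
import numpy as np


def r_squared(Z, v):
    """R² from an OLS regression of v on Z (Z already contains a column of ones)."""
    coef, *_ = np.linalg.lstsq(Z, v, rcond=None)
    resid = v - Z @ coef
    return 1.0 - (resid @ resid) / np.sum((v - v.mean()) ** 2)


def het_lm_tests(X, y):
    """Return {test name: (LM, df)} for Breusch-Pagan, full White and special-form White.

    X is the n-by-k matrix of regressors WITHOUT the intercept column.
    """
    n, k = X.shape
    Xc = np.column_stack([np.ones(n), X])
    b, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    y_hat = Xc @ b
    u2 = (y - y_hat) ** 2                      # squared OLS residuals

    # Breusch-Pagan: û² on the original regressors, df = k
    bp = (n * r_squared(Xc, u2), k)

    # Full White: levels plus every square and cross-product, df = k(k + 3)/2
    extra = np.column_stack([X[:, i] * X[:, j] for i in range(k) for j in range(i, k)])
    Zw = np.column_stack([Xc, extra])
    white = (n * r_squared(Zw, u2), Zw.shape[1] - 1)

    # Special-form White: û² on ŷ and ŷ² (fitted values, never y itself), df = 2
    Zs = np.column_stack([np.ones(n), y_hat, y_hat ** 2])
    white_special = (n * r_squared(Zs, u2), 2)

    return {"BP": bp, "White": white, "White special": white_special}


# Hypothetical simulated data: the error SD grows with the first regressor
rng = np.random.default_rng(1)
n = 500
X = rng.uniform(1, 5, size=(n, 3))
y = 1.0 + X @ np.array([1.0, 0.5, -0.5]) + X[:, 0] * rng.normal(size=n)
for name, (lm, df) in het_lm_tests(X, y).items():
    print(f"{name}: LM = {lm:.1f} on {df} df")
```

Each LM is then compared with the upper-tail χ² critical value at its own df.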
What degrees of freedom does the LM statistic use, and which table?
df equals the number of regressors in the auxiliary regression — not n and not n − k − 1. And LM = n·R² is a chi-square statistic, so you read it against the χ² table in the upper tail, never the F or t table. Getting the df or the table wrong flips the decision on a borderline statistic.
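As a worked count, take the same k = 3 regressors as in the exercise (size, beds, age). The full White test adds every square and cross-product, so its df grow quickly:

```latex
\begin{aligned}
\text{Breusch-Pagan:}\quad & df = k = 3,\\
\text{White (full):}\quad & df = \underbrace{k}_{\text{levels}} + \underbrace{k}_{\text{squares}} + \underbrace{\tbinom{k}{2}}_{\text{cross-products}} = \frac{k(k+3)}{2} = 9,\\
\text{White }(\hat y,\ \hat y^{2}):\quad & df = 2.
\end{aligned}
```

At the 5% level the χ² critical values are about 7.81 (df = 3), 16.92 (df = 9) and 5.99 (df = 2), so a hypothetical statistic of 12.0 would reject at df = 3 but not at df = 9.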
How do I fix heteroskedasticity once I have found it?
The easy default is heteroskedasticity-robust (White) standard errors: keep the OLS coefficients and just replace the SE formula with one valid under unknown heteroskedasticity, then do t/F inference as usual. If you know the variance form you can use WLS/GLS, which divides every term by √hᵢ to restore efficiency (BLUE); when the form is unknown, estimate it first with feasible GLS.
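A minimal sketch of the robust fix, assuming NumPy: `ols_with_ses` is a hypothetical helper that computes the coefficients once and reports two sets of standard errors. The robust ones use the basic HC0 sandwich (X'X)⁻¹(Σ ûᵢ² xᵢxᵢ')(X'X)⁻¹; statistical packages often apply a small-sample rescaling such as HC1, which multiplies the HC0 variance by n/(n − k − 1).

```python
import numpy as np


def ols_with_ses(X, y):
    """OLS coefficients with conventional and HC0 robust (White) standard errors.

    X must include the column of ones.
    """
    n, p = X.shape                              # p = k + 1 parameters
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y                    # the same coefficients either way
    u = y - X @ beta
    # Conventional: sigma²-hat (X'X)^-1, valid only under homoskedasticity
    se_usual = np.sqrt(np.diag((u @ u) / (n - p) * XtX_inv))
    # HC0 sandwich: (X'X)^-1 [sum of û_i² x_i x_i'] (X'X)^-1
    meat = X.T @ (X * (u ** 2)[:, None])
    se_robust = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
    return beta, se_usual, se_robust


# Hypothetical simulated data with Var(u | x) = x²
rng = np.random.default_rng(2)
n = 2000
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + np.abs(x) * rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
beta, se_usual, se_robust = ols_with_ses(X, y)
print("beta     ", np.round(beta, 3))
print("usual SE ", np.round(se_usual, 4))
print("robust SE", np.round(se_robust, 4))
```

The coefficient vector is identical whichever SEs you report; in this simulated design the robust slope SE should come out noticeably larger than the conventional one.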
Does the White test use ŷ or y?
The fitted values ŷ (and ŷ²), never the raw outcome y. The df-saving version of the White test regresses û² on ŷ and ŷ², because the fitted values are a compact summary of all the regressors. Using y or y² instead is a marked error.
After a WLS transformation, how do I interpret the coefficients?
Do the t-tests and F-tests on the transformed (re-weighted) model, because that is where the errors are homoskedastic and the SEs are valid. But interpret the coefficients in the original units of the model — the transformation is only a device to fix the variance, not a change in what the parameters mean. Remember also to divide the intercept term (the 1) by √hᵢ when doing WLS by hand.
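The by-hand transformation can be sketched as follows, assuming NumPy and a known variance function; `wls_by_hand` is a hypothetical helper and h(x) = x is chosen purely for illustration. The column of ones is divided by √hᵢ along with everything else, and the resulting coefficients are read in the original units of y and x.

```python
import numpy as np


def wls_by_hand(X, y, h):
    """WLS when Var(u | x) = sigma² h(x) with h known.

    Divide every column of X (the column of ones too) and y by sqrt(h_i),
    then run plain OLS on the transformed data.
    """
    s = np.sqrt(h)
    X_star = X / s[:, None]        # intercept column becomes 1/sqrt(h_i)
    y_star = y / s
    beta, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)
    return beta                    # read in the ORIGINAL units of y and x


# Hypothetical data where the error variance is proportional to x, so h(x) = x
rng = np.random.default_rng(3)
n = 400
x = rng.uniform(1, 10, n)
h = x
y = 1.0 + 0.5 * x + np.sqrt(h) * rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
b_wls = wls_by_hand(X, y, h)

# Cross-check: weighted normal equations (X'WX) b = X'Wy with W = diag(1/h)
W = np.diag(1.0 / h)
b_direct = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
print(np.round(b_wls, 4), np.allclose(b_wls, b_direct))
```

The cross-check shows why "divide by √hᵢ" and "weights 1/hᵢ" describe the same estimator.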
Exam move
Memorise the one-line consequence verbatim and be able to defend every word: 'under heteroskedasticity OLS is still unbiased and linear but no longer BLUE, and the usual standard errors — and every t, F and CI built on them — are wrong.' Then drill the detect-and-fix routine until it is automatic: describe the funnel/trumpet residual plot and name MLR.5; write the auxiliary regression of û² and state H₀ (all δ = 0) versus H₁; form LM = n·R² and compare it to χ² at df = number of auxiliary regressors in the upper tail; decide; and prescribe the fix (robust SEs, which leave the coefficients unchanged, or WLS/GLS). Keep a one-page reference with the two tests side by side (Breusch-Pagan on the original x's, White adding squares and cross-products or ŷ and ŷ²), the LM = n·R² ~ χ² rule, and the WLS transform (divide every term by √hᵢ). Finally, practise the traps that examiners plant: never say the coefficients are biased, never read LM against the F table, never regress û² on y in the White test, and always separate this diagnostic (SEs break) from omitted-variable bias (coefficients break) and multicollinearity (coefficients fine but imprecise).