ECON2515 · Intermediate Applied Econometrics II
Course Review and Applied Model Critique
Week 10 adds no new theory — it stacks the whole course into the one move the final rewards: read a regression output and critique it end to end. That means interpreting each coefficient in its functional form, separating statistical fit (R², F) from causal validity (is E[u|x] = 0?), running the right test (t, F, or a linear combination), and naming the likely assumption violation. This ECON 2515 revision topic maps directly onto the closed-book final, whose Part B is almost always "here is a fitted model — critique it."
What this chapter covers
- 1. Population vs sample: PRF (true β, error u) vs SRF (estimate β̂, residual û); OLS minimises Σû²
- 2. The CLM assumptions MLR.1–6, and which property each one buys (MLR.4 → unbiased/causal, MLR.5 → valid SEs)
- 3. Causality vs endogeneity: E[u|x] ≠ 0 means biased, not causal; three channels: omitted vars, reverse causality, measurement error
- 4. Omitted-variable bias: Bias = β₂ × δ; sign it with the effect on y AND the correlation with the included x
- 5. Reading R output: recompute t = Estimate/SE, a CI, an F, a marginal effect; the printed p is two-sided
- 6. Interpreting every coefficient type: level-level, log-linear, linear-log, log-log, dummy, quadratic, interaction
- 7. The inference toolkit: t vs F, linear combinations via Var(a ± b) = Var(a) + Var(b) ± 2Cov(a, b), R² vs adjusted R² for model choice
- 8. Diagnostics roundup: OVB biases coefficients; multicollinearity → imprecise; heteroskedasticity → wrong SEs
Critique a housing-price regression — the signature Part-B question
- (a) [+3] The dependent variable is logged, so a one-room increase is associated with about 100 × 0.086 = 8.6% higher price, holding DIST and AGE fixed. It is clearly significant: t = 0.086/0.022 = 3.9 > 2.
- (b) [+4] Not on its own. ROOMS is likely endogenous: unobserved quality, renovation and neighbourhood desirability sit in the error u and correlate with room count, so E[u|ROOMS] ≠ 0 and MLR.4 fails. The positive sign is plausible but possibly biased; you would need a natural experiment, an instrument or panel data to defend a causal claim.
- (c) [+6] Linear combination, testing H₀: β₁ + β₂ = 0. The estimate is β̂₁ + β̂₂ = 0.086 + (−0.048) = 0.038. Var(β̂₁+β̂₂) = 0.000484 + 0.000361 + 2(−0.00012) = 0.000605, so SE = √0.000605 = 0.0246 and t = 0.038/0.0246 ≈ 1.54. With df = 176 the two-sided 5% critical value ≈ 1.97, and 1.54 < 1.97, so fail to reject H₀: there is no evidence the two effects sum to something other than zero. The covariance term is the piece students forget.
- (d) [+3] The model explains only 34% of the variation in log(price), which is normal for cross-sectional micro data, while the highly significant F says the regressors are jointly significant. There is no contradiction: R² measures fit, whereas the F-test asks whether the predictors matter at all.
- (e) [+4] A 10% effect in a log-dependent model means β₁ ≈ 0.10, and the claim is the alternative, so H₀: β₁ ≤ 0.10 vs H₁: β₁ > 0.10 with c = 0.10, not 0. Recompute: t = (0.086 − 0.10)/0.022 = −0.014/0.022 = −0.64. The one-sided 5% critical value ≈ 1.65, and −0.64 < 1.65, so fail to reject: there is no evidence the room effect exceeds 10%.
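The arithmetic in part (c) can be checked with a short Python sketch. The function name `linear_combination_t` is a hypothetical helper introduced for illustration; the numbers are the ones quoted in part (c), and 1.97 is the critical value stated there rather than one computed from tables.

```python
import math

def linear_combination_t(b1, b2, var1, var2, cov12, sign=1):
    """t statistic for H0: b1 + sign*b2 = 0.

    Uses Var(a ± b) = Var(a) + Var(b) ± 2Cov(a, b); sign=+1 tests a sum,
    sign=-1 tests a difference.
    """
    estimate = b1 + sign * b2
    variance = var1 + var2 + sign * 2 * cov12
    se = math.sqrt(variance)
    return estimate, se, estimate / se

# Values from part (c): coefficients 0.086 and -0.048,
# variances 0.000484 and 0.000361, covariance -0.00012.
est, se, t_stat = linear_combination_t(0.086, -0.048, 0.000484, 0.000361, -0.00012)

# Two-sided decision against the critical value quoted in part (c) for df = 176.
reject = abs(t_stat) > 1.97
print(round(est, 3), round(se, 4), round(t_stat, 2), reject)
```

The t statistic lands at roughly 1.5, below 1.97, so the sketch reaches the same fail-to-reject decision as the written answer. Passing `sign=-1` would test a difference instead of a sum.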
Key terms
- PRF vs SRF
- The population regression function holds the true unknown parameters β and the unobservable error u; the sample regression function holds the estimates β̂ and the observable residual û = y − ŷ. OLS estimates the β's by minimising Σû².
- Classical linear model (CLM)
- The Gauss-Markov assumptions MLR.1–5 plus normal errors MLR.6. Under MLR.1–5 OLS is BLUE; adding MLR.6 makes the t and F tests exact in finite samples, while without it the CLT makes them approximately valid in large samples.
- MLR.4 (exogeneity)
- The zero-conditional-mean assumption E[u|x₁…xₖ] = 0 — nothing in the error is correlated with a regressor. It is exactly what makes OLS unbiased and a slope causal, and it is what an omitted variable breaks.
- Endogeneity
- A regressor being correlated with the error, E[u|x] ≠ 0, through omitted variables, reverse causality or measurement error. It biases the coefficient, so regression is not causation.
- Omitted-variable bias (OVB)
- The bias when a relevant variable that correlates with an included regressor is left out. Bias = β₂ × δ, where β₂ is the omitted variable's effect on y and δ is the slope from regressing the omitted variable on the included x, so its direction needs both signs.
- Marginal effect
- The change in y for a one-unit change in x, which is not constant once logs, squares or interactions appear: for β₁x + β₂x² it is β₁ + 2β₂x (with turning point x* = −β₁/(2β₂)), and for an interaction β₂x₂ + β₅x₂·x₃ it is β₂ + β₅·x₃.
- Adjusted R²
- R̄² = 1 − (1 − R²)(n − 1)/(n − k − 1): R² penalised for the number of regressors. Unlike raw R² it can fall when a useless variable is added, so it is the measure used to compare models of different size.
- Variance-covariance matrix (VCE)
- The matrix of estimator variances and covariances. It supplies Var(β̂₁ ± β̂₂) = Var(β̂₁) + Var(β̂₂) ± 2Cov(β̂₁,β̂₂), the standard error needed to test any linear combination of coefficients.
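The marginal-effect formulas in the key terms above can be sketched as three small Python functions. The function names and all coefficient values are hypothetical, chosen only to make the arithmetic easy to follow; they do not come from any fitted model in this chapter.

```python
def quadratic_marginal_effect(b1, b2, x):
    """Marginal effect of x in b1*x + b2*x**2, evaluated at x."""
    return b1 + 2 * b2 * x

def turning_point(b1, b2):
    """Value of x where the quadratic's marginal effect is zero."""
    return -b1 / (2 * b2)

def interaction_marginal_effect(b2, b5, x3):
    """Marginal effect of x2 in b2*x2 + b5*x2*x3, evaluated at x3."""
    return b2 + b5 * x3

# Hypothetical quadratic: b1 = 0.30, b2 = -0.005.
me_at_10 = quadratic_marginal_effect(0.30, -0.005, 10)   # about 0.20
peak = turning_point(0.30, -0.005)                       # about 30

# Hypothetical interaction: b2 = 0.5, b5 = 0.1, evaluated at x3 = 2.
me_x2 = interaction_marginal_effect(0.5, 0.1, 2)         # about 0.7

print(round(me_at_10, 4), round(peak, 4), round(me_x2, 4))
```

With a negative coefficient on the squared term the effect shrinks as x grows and changes sign at the turning point, which is why a marginal effect should always be reported at a stated value of x.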
Course Review and Applied Model Critique FAQ
What does OLS actually estimate — the error or the parameters?
The parameters. OLS estimates the population β's (and the regression function E[y|x]), producing β̂; it never observes the error u. The residual û = y − ŷ is the sample estimate of u, but u itself — the true deviation from the population line — is unobservable. Getting this straight is a recurring short-answer mark.
How do I know which assumption to blame for a problem?
Match the casualty to the cause. MLR.4 (exogeneity) failing biases the coefficients — that is omitted-variable bias or endogeneity. MLR.5 (homoskedasticity) failing leaves the coefficients fine but makes the standard errors wrong — that is heteroskedasticity, fixed with robust SEs. Multicollinearity keeps the coefficients unbiased but inflates their standard errors, so t's look insignificant while the F is significant. Three problems, three different casualties.
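As an illustration of the MLR.4 casualty, the OVB formula Bias = β₂ × δ can be checked on a toy data set. This is a minimal sketch with made-up data and no error term, so the identity holds exactly; `mean` and `slope` are hypothetical helpers, and the coefficients 2 and 3 are assumptions, not course results.

```python
def mean(values):
    return sum(values) / len(values)

def slope(y, x):
    """Simple-regression OLS slope of y on x: Cov(x, y) / Var(x)."""
    x_bar, y_bar = mean(x), mean(y)
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    var_x = sum((xi - x_bar) ** 2 for xi in x)
    return cov_xy / var_x

# Hypothetical regressors: x2 is the variable that will be left out.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]

# Hypothetical true model with no noise: y = 1 + 2*x1 + 3*x2.
beta1, beta2 = 2.0, 3.0
y = [1.0 + beta1 * a + beta2 * b for a, b in zip(x1, x2)]

delta = slope(x2, x1)        # slope from regressing the omitted x2 on the included x1
short_slope = slope(y, x1)   # slope on x1 when x2 is omitted
bias = short_slope - beta1   # how far the short regression is from the true beta1

print(round(delta, 4), round(short_slope, 4), round(bias, 4), round(beta2 * delta, 4))
```

Here both β₂ and δ are positive, so the short regression overstates the effect of x1. Flipping the sign of either one would flip the direction of the bias, which is the logic behind the OVB sign grid.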
Why can't I just compare models using R²?
Because raw R² never falls when you add regressors (even a column of noise nudges it up), it always prefers the bigger model. Use adjusted R², which subtracts a penalty for extra terms and can fall when a variable does not earn its place, or run an F-test between the restricted and unrestricted models.
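The penalty can be seen numerically with the adjusted R² formula from the key terms, R̄² = 1 − (1 − R²)(n − 1)/(n − k − 1). In this sketch R² = 0.34 is the figure from the housing critique; n = 180 and k = 3 are inferred from df = 176 with ROOMS, DIST and AGE as regressors, and the "bigger model" with R² = 0.341 is purely hypothetical.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k slope regressors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Housing model: R^2 = 0.34; n = 180 and k = 3 are inferred from df = n - k - 1 = 176.
base = adjusted_r2(0.34, n=180, k=3)

# Hypothetical extra regressor that lifts raw R^2 only slightly, to 0.341.
bigger = adjusted_r2(0.341, n=180, k=4)

print(round(base, 4), round(bigger, 4), bigger < base)
```

Raw R² rose from 0.340 to 0.341, yet adjusted R² fell, so on this criterion the extra variable did not earn its place.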
The software already prints a t and a p-value — when can't I use them?
Whenever the null value is not zero or the test is one-sided. The printed t and p always test βₖ = 0, so for a 'more than X' claim you must reset c = X and recompute t = (β̂ − c)/se by hand. The printed p is also two-sided: for a one-sided test compare p/2 to α and check the estimate is on the claimed side.
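Both adjustments can be sketched in a few lines of Python. The helper names `t_for_null` and `one_sided_p` are hypothetical; the first example reuses the numbers from part (e) of the housing critique, and the two-sided p of 0.08 in the second is made up. The p/2 conversion applies only when the printed p refers to the same null value being tested, which for software output means c = 0.

```python
def t_for_null(estimate, se, null_value=0.0):
    """t statistic for H0: beta = null_value (printed output assumes 0)."""
    return (estimate - null_value) / se

def one_sided_p(two_sided_p, estimate, null_value, alternative):
    """Convert a two-sided p-value for the same null into a one-sided one.

    alternative is 'greater' (H1: beta > null_value) or 'less'.
    """
    if alternative == "greater":
        on_claimed_side = estimate > null_value
    else:
        on_claimed_side = estimate < null_value
    # Halve p if the estimate supports the claim; otherwise the p-value is large.
    return two_sided_p / 2 if on_claimed_side else 1 - two_sided_p / 2

# Part (e): estimate 0.086, se 0.022, H0: beta <= 0.10 vs H1: beta > 0.10.
t_e = t_for_null(0.086, 0.022, null_value=0.10)
reject_e = t_e > 1.65   # one-sided 5% critical value quoted in part (e)

# Hypothetical printed two-sided p of 0.08 for H0: beta = 0, claim H1: beta > 0.
p_one = one_sided_p(0.08, estimate=0.5, null_value=0.0, alternative="greater")

print(round(t_e, 2), reject_e, p_one)
```

An estimate on the wrong side of the null can never support the claim, which is why the helper returns 1 − p/2 in that case instead of p/2.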
How do I test a sum or difference of two coefficients?
Build the standard error from the variance-covariance matrix: Var(β̂₁ ± β̂₂) = Var(β̂₁) + Var(β̂₂) ± 2Cov(β̂₁,β̂₂), then t = (β̂₁ ± β̂₂)/√Var(·) on df = n − k − 1. The covariance term is the most-forgotten piece and often flips the decision, so it is where the marks are on part (c)-style questions.
What should go on my one A4 cheat sheet?
Because the exam rewards application over recall, load the sheet with interpretation tools rather than derivations: the functional-form interpretation table, the marginal-effect and turning-point formulas, the MLR.1–6 list with what each assumption buys, the OVB sign grid, the linear-combination variance formula, and the six-step test write-up template. You also get a calculator and statistical tables, so you do not need to copy those.
Exam move
Treat revision as rehearsing one move: given an output, critique it. Take past-style questions and run the same skeleton every time — interpret each coefficient in its functional form (level, log, dummy, quadratic, interaction), check significance, then judge economic magnitude, keeping 'statistically significant' separate from 'economically important'. Drill the three tools until choosing between them is automatic: a t-test for one coefficient, an F-test for several restrictions, and a var-cov-based t for a sum or difference where Var(a±b) = Var(a) + Var(b) ± 2Cov(a,b). Rehearse the diagnosis reflex — omitted-variable bias biases coefficients, multicollinearity makes them imprecise, heteroskedasticity makes the SEs wrong — and always ask whether E[u|x] = 0 before calling any slope causal. Build the A4 sheet around the functional-form table, the marginal-effect and turning-point formulas, the MLR.1–6 pay-offs, the OVB sign grid, and the six-step template, then cover it and reconstruct each row from memory. Finish every worked answer with a plain-English economic sentence in the variable's real units — in Part B that closing line is where the conclusion mark lives.