ECON2515 · Intermediate Applied Econometrics II
Multicollinearity, Indicator Variables and Interactions
Week 8 of ECON 2515 pairs two unrelated but frequently examined topics. Multicollinearity is high (but not perfect) correlation among your regressors. It inflates standard errors and can make individual t-tests insignificant even while the overall F-test stays significant and R² stays high, yet it does not bias OLS. Indicator (dummy) variables put 0/1 group membership into a regression: a dummy alone shifts the intercept, while a dummy interacted with a continuous variable shifts the slope. The exam skills are to detect collinearity with the VIF and to interpret dummies and interactions by splitting the equation by group and differentiating to get the marginal effect.
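For a generic continuous regressor x and dummy D (error term suppressed, β labels as used later in the chapter), splitting each model by the value of D shows the intercept shift and the slope shift directly:

```latex
\begin{aligned}
\text{Dummy only:}\quad & y = \beta_0 + \beta_1 x + \beta_2 D
  &&\Rightarrow\; D=0:\ y = \beta_0 + \beta_1 x, \qquad D=1:\ y = (\beta_0 + \beta_2) + \beta_1 x \\
\text{With interaction:}\quad & y = \beta_0 + \beta_1 x + \beta_2 D + \beta_3 (x \cdot D)
  &&\Rightarrow\; D=0:\ y = \beta_0 + \beta_1 x, \qquad D=1:\ y = (\beta_0 + \beta_2) + (\beta_1 + \beta_3)\,x
\end{aligned}
```

In the first model both lines share the slope β₁ (parallel lines); in the second the D = 1 line has its own slope β₁ + β₃ (lines fan out).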
What this chapter covers
- 1. What multicollinearity is: high but imperfect correlation among the x's; not an MLR.3 violation (MLR.3 rules out only perfect collinearity)
- 2. The fingerprint: insignificant individual t's but a significant joint F and high R²; unbiased but imprecise
- 3. Detection: the VIF. VIFⱼ = 1/(1 − R²ⱼ) from an auxiliary regression; flag VIF > 10 (equivalently, auxiliary R²ⱼ > 0.9)
- 4. Consequences and remedies: do nothing, collect more data, combine variables or drop a genuinely redundant one, or impose a theory restriction
- 5. Indicator (dummy) variables: 0/1 group markers that shift the intercept; the coefficient is the gap in mean y vs the base category (other x's held fixed)
- 6. The dummy-variable trap: include m − 1 dummies for m categories; the omitted one is the base
- 7. Interaction terms: the product x·D lets the slope differ; β₃ is the slope GAP, not the slope
- 8. The marginal effect: differentiate, ∂y/∂x = β₁ + β₃·D, evaluated at the group value
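Why the VIF carries that name: under the Gauss-Markov assumptions (MLR.1 to MLR.5, including homoskedasticity), the standard textbook expression for the sampling variance of β̂ⱼ contains the VIF as a multiplier, where SSTⱼ is the total sample variation in xⱼ. Inverting the VIF formula also gives the auxiliary-R² cut-off that matches the VIF > 10 rule of thumb:

```latex
\operatorname{Var}(\hat\beta_j)
  = \frac{\sigma^2}{\mathrm{SST}_j\,(1 - R_j^2)}
  = \frac{\sigma^2}{\mathrm{SST}_j}\cdot \mathrm{VIF}_j,
\qquad
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}
\;\Longleftrightarrow\;
R_j^2 = 1 - \frac{1}{\mathrm{VIF}_j},
\qquad
\mathrm{VIF}_j > 10 \iff R_j^2 > 0.9 .
```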
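Dummy interacted with a continuous variable: split, interpret, differentiate
Fitted model: log(salary) = 3.10 + 0.040·exper + 0.150·MBA + 0.012·(exper·MBA), with MBA = 1 for an MBA holder and exper measured in years.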
- +4(a) Split by the dummy. Non-MBA (MBA = 0): log(salary) = 3.10 + 0.040·exper. MBA (MBA = 1): log(salary) = (3.10 + 0.150) + (0.040 + 0.012)·exper = 3.25 + 0.052·exper.
- +4(b) Interpret. β̂₂ = 0.150: at exper = 0, an MBA holder's salary is about 100×0.150 = 15% higher than a non-MBA holder's (an intercept shift). β̂₃ = 0.012: each extra year of experience raises an MBA holder's salary by about 1.2 percentage points more than it raises a non-MBA holder's (a slope shift).
- +2(c) Return for an MBA holder = ∂log(salary)/∂exper = β̂₁ + β̂₃·MBA = 0.040 + 0.012·1 = 0.052 → about 5.2% per year (vs 4.0% for a non-MBA holder).
- +2(d) Premium at exper = 10 = β̂₂ + β̂₃·exper = 0.150 + 0.012·10 = 0.150 + 0.120 = 0.270 → about 27% higher. The premium widens with experience because β̂₃ > 0.
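The arithmetic in (a) to (d) can be checked with a few lines of Python. This is a minimal sketch: the coefficient values are the estimates from the example, the function names are hypothetical helpers, and the last step adds the exact conversion 100·(exp(b) − 1), which starts to matter once a log-point gap is as large as 0.27.

```python
import math

# Estimates from the worked example: log(salary) = b0 + b1*exper + b2*MBA + b3*(exper*MBA)
b0, b1, b2, b3 = 3.10, 0.040, 0.150, 0.012


def fitted_log_salary(exper, mba):
    """Fitted log(salary) for a given experience level and MBA status (0 or 1)."""
    return b0 + b1 * exper + b2 * mba + b3 * exper * mba


def return_to_exper(mba):
    """Marginal effect of one more year: d log(salary)/d exper = b1 + b3*MBA."""
    return b1 + b3 * mba


def mba_premium(exper):
    """Log-point gap between MBA and non-MBA holders at a given exper (= b2 + b3*exper)."""
    return fitted_log_salary(exper, 1) - fitted_log_salary(exper, 0)


# (a) Split by the dummy: intercept and slope of the MBA line
mba_intercept = b0 + b2
mba_slope = b1 + b3

# (c) Return to experience for each group, in approximate percent
pct_non_mba = 100 * return_to_exper(0)
pct_mba = 100 * return_to_exper(1)

# (d) Premium at exper = 10: approximate (100*b) and exact (100*(exp(b) - 1)) percentages
prem10 = mba_premium(10)
approx_pct = 100 * prem10
exact_pct = 100 * (math.exp(prem10) - 1)

print(f"MBA line: {mba_intercept:.2f} + {mba_slope:.3f}*exper")
print(f"Return to exper: {pct_non_mba:.1f}% (non-MBA) vs {pct_mba:.1f}% (MBA)")
print(f"Premium at exper=10: ~{approx_pct:.1f}% (approx), {exact_pct:.1f}% (exact)")
```

Run as is, it prints the MBA line 3.25 + 0.052*exper, returns of 4.0% and 5.2%, and a premium of about 27.0% by the 100×b approximation versus 31.0% exactly.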
Key terms
- Multicollinearity
- High but imperfect correlation among the explanatory variables. It inflates standard errors and can make individual t's insignificant, but leaves OLS unbiased.
- Variance Inflation Factor (VIF)
- VIFⱼ = 1/(1 − R²ⱼ): the factor by which the variance of β̂ⱼ is inflated relative to the case where xⱼ is uncorrelated with the other regressors. Rule of thumb: VIF > 10 (R²ⱼ > 0.9) is a serious concern.
- Auxiliary regression
- Regressing one regressor xⱼ on an intercept and all the other x's. A high R²ⱼ (above 0.9, i.e. VIF > 10) signals that xⱼ is largely explained by the rest; R²ⱼ is the VIF's input.
- Indicator (dummy) variable
- A binary 0/1 variable marking group membership (e.g. female = 1). Entered alone it shifts the intercept.
- Base (reference) category
- The omitted category absorbed into the intercept. Every dummy's coefficient is a difference relative to this base group.
- Interaction term
- A product of two variables (e.g. exper × MBA) that lets one variable's effect depend on the other — a slope shifter.
- Intercept vs slope shifter
- A dummy alone moves only the intercept (parallel lines); adding its interaction with a continuous variable lets the slope differ (lines fan out).
- Marginal effect
- With an interaction, the effect of x is ∂y/∂x = β₁ + β₃·D — differentiate and evaluate at the group value, never just β₁ or just β₃.
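A small check of the "coefficient = gap vs the base" reading, for the simplest case of one dummy and no other regressors: the OLS intercept equals the base-group mean and the dummy's coefficient equals the difference in group means. The six data points are made up for illustration; with extra regressors the coefficient becomes the gap holding those regressors fixed.

```python
# Hypothetical data: four base-group observations (D = 0) and two group observations (D = 1)
y = [10.0, 12.0, 11.0, 9.0, 14.0, 15.0]
d = [0, 0, 0, 0, 1, 1]

n = len(y)
y_bar = sum(y) / n
d_bar = sum(d) / n

# Simple-regression OLS for y = b0 + b1*D
b1 = sum((di - d_bar) * (yi - y_bar) for di, yi in zip(d, y)) / sum((di - d_bar) ** 2 for di in d)
b0 = y_bar - b1 * d_bar

# Group means computed directly, for comparison
mean_base = sum(yi for yi, di in zip(y, d) if di == 0) / d.count(0)
mean_group = sum(yi for yi, di in zip(y, d) if di == 1) / d.count(1)

print(f"b0 = {b0:.2f}, base-group mean = {mean_base:.2f}")
print(f"b1 = {b1:.2f}, gap in means = {mean_group - mean_base:.2f}")
```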
Multicollinearity, Indicator Variables and Interactions FAQ
Does multicollinearity bias the coefficients?
No. OLS stays unbiased: the coefficients are still right on average. Multicollinearity only inflates the standard errors, so the estimates are imprecise and unstable from sample to sample. Bias is a separate problem, for example omitted-variable bias.
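A minimal Monte Carlo sketch of "unbiased but imprecise", assuming a made-up data-generating process y = 1 + 2·x1 + 3·x2 + u with standard normal regressors and errors (the function names and parameter values are illustrative choices). Raising corr(x1, x2) from 0 to 0.95 should leave the average β̂₁ near the true value 2 while its spread across samples grows; with ρ = 0.95 the population VIF is 1/(1 − 0.95²) ≈ 10.3, so the SD should rise by roughly √10.3 ≈ 3.2 times.

```python
import math
import random
import statistics


def slope_on_x1(x1, x2, y):
    """OLS coefficient on x1 in y = b0 + b1*x1 + b2*x2, from the centred normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    return (s22 * s1y - s12 * s2y) / (s11 * s22 - s12 ** 2)


def simulate(rho, reps=1000, n=50, beta1=2.0, beta2=3.0, seed=0):
    """Draw `reps` samples with corr(x1, x2) = rho; return mean and SD of the b1 estimates."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        x1 = [rng.gauss(0, 1) for _ in range(n)]
        # x2 = rho*x1 + independent noise, scaled so Var(x2) = 1 and corr(x1, x2) = rho
        x2 = [rho * a + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for a in x1]
        y = [1.0 + beta1 * a + beta2 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
        estimates.append(slope_on_x1(x1, x2, y))
    return statistics.mean(estimates), statistics.stdev(estimates)


mean_low, sd_low = simulate(rho=0.0)
mean_high, sd_high = simulate(rho=0.95)
print(f"rho = 0.00: mean b1 = {mean_low:.3f}, SD = {sd_low:.3f}")
print(f"rho = 0.95: mean b1 = {mean_high:.3f}, SD = {sd_high:.3f}")
```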
How do I detect it — is a high pairwise correlation enough?
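A pairwise correlation is a hint, not proof: collinearity can involve three or more regressors even when every pairwise correlation looks moderate. Report VIFⱼ = 1/(1 − R²ⱼ) from an auxiliary regression of xⱼ on the other x's and flag VIF > 10 (equivalently, auxiliary R²ⱼ > 0.9). In R, use vif(model) from the car package.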
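A from-scratch sketch of the auxiliary-regression recipe in plain Python (no statistics package assumed): each VIFⱼ comes from regressing xⱼ on an intercept and the remaining regressors. The helper names and the simulated regressors are hypothetical; in practice you would call a packaged routine such as vif in R's car package.

```python
import random


def ols_coefficients(X, y):
    """OLS coefficients solving the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))  # partial pivoting
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * p for a, p in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def r_squared(X, y):
    """R² of the OLS regression of y on the columns of X (X must include an intercept column)."""
    b = ols_coefficients(X, y)
    fitted = [sum(bi * xi for bi, xi in zip(b, row)) for row in X]
    y_bar = sum(y) / len(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return 1 - ssr / sst


def vif(columns, j):
    """VIF_j = 1/(1 - R²_j), with R²_j from regressing x_j on an intercept and the other x's."""
    others = [c for i, c in enumerate(columns) if i != j]
    X = [[1.0] + [c[t] for c in others] for t in range(len(columns[j]))]
    return 1.0 / (1.0 - r_squared(X, columns[j]))


# Hypothetical regressors: x2 is x1 plus a little noise, x3 is unrelated to both
rng = random.Random(1)
n = 200
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [a + rng.gauss(0, 0.2) for a in x1]
x3 = [rng.gauss(0, 1) for _ in range(n)]

vifs = [vif([x1, x2, x3], j) for j in range(3)]
for name, v in zip(["x1", "x2", "x3"], vifs):
    print(f"{name}: VIF = {v:.1f}", "(> 10: serious)" if v > 10 else "")
```

With x2 built as x1 plus small noise, x1 and x2 should both come out well above 10 while x3 stays close to 1, which is the pattern the rule of thumb is meant to flag.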
What's the tell-tale sign in the output?
Insignificant individual t-stats but a significant overall F and a high R², plus coefficients that flip sign or have huge standard errors. That pattern is multicollinearity, not evidence the variables don't matter.
What is the dummy-variable trap?
Including a dummy for every category plus an intercept. The dummies then sum to 1 in every row, exactly reproducing the intercept column. That is perfect collinearity, so MLR.3 fails and OLS cannot produce unique estimates. Fix: for m categories include m − 1 dummies and leave one out as the base.
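One way to see the trap is to build both design matrices for a made-up three-category variable (the region labels and helper names are hypothetical) and compare each matrix's rank with its number of columns. With an intercept plus all m dummies the rank falls one short of the column count, so X′X cannot be inverted; dropping one dummy restores full column rank.

```python
from fractions import Fraction

# Hypothetical categorical variable with m = 3 categories; "north" serves as the base
categories = ["north", "south", "west"]
regions = ["north", "south", "west", "south", "north", "west"]


def design_row(region, dummies_for):
    """Intercept column followed by one 0/1 dummy for each category in `dummies_for`."""
    return [1] + [1 if region == c else 0 for c in dummies_for]


def matrix_rank(rows):
    """Rank via exact Gaussian elimination on Fractions (no rounding error)."""
    M = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(M[0])):
        pivot = next((r for r in range(rank, len(M)) if M[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot: this column is a combination of earlier columns
        M[rank], M[pivot] = M[pivot], M[rank]
        for r in range(len(M)):
            if r != rank and M[r][col] != 0:
                factor = M[r][col] / M[rank][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[rank])]
        rank += 1
    return rank


trap = [design_row(r, categories) for r in regions]        # intercept + all m dummies
fixed = [design_row(r, categories[1:]) for r in regions]   # intercept + m - 1 dummies

# In the trap design the dummies sum to 1 in every row, duplicating the intercept column
print("dummies sum to 1 in every row:", all(sum(row[1:]) == 1 for row in trap))
print("trap: columns =", len(trap[0]), "rank =", matrix_rank(trap))
print("m - 1 dummies: columns =", len(fixed[0]), "rank =", matrix_rank(fixed))
```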
Dummy alone vs interaction — what's the difference?
A dummy alone shifts the intercept: both groups share one slope and the lines are parallel, gap = β₂. Adding the interaction x·D lets the slope differ: the lines fan out and β₃ is the difference in slopes.
How do I get the marginal effect when there's an interaction?
Differentiate. For y = β₀ + β₁x + β₂D + β₃(x·D), ∂y/∂x = β₁ + β₃D, so the effect is β₁ for the base group (D = 0) and β₁ + β₃ for the D = 1 group. Always keep the main-effect terms x and D in the model when their interaction is present.
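The same derivative written out by group, together with the between-group gap at a given x (the quantity used for the MBA premium in part (d) of the worked example):

```latex
y = \beta_0 + \beta_1 x + \beta_2 D + \beta_3 (x \cdot D)
\;\Rightarrow\;
\frac{\partial y}{\partial x} = \beta_1 + \beta_3 D =
\begin{cases}
\beta_1 & D = 0 \\
\beta_1 + \beta_3 & D = 1
\end{cases}
\qquad
y\big|_{D=1} - y\big|_{D=0} = \beta_2 + \beta_3 x
```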
Exam move
Treat this as three separate exam skills. For multicollinearity, memorise the fingerprint (insignificant t's but a significant F and high R²), be ready to compute VIF = 1/(1 − R²ⱼ), and state the verdict: unbiased but imprecise, so don't drop a relevant variable (that would cause omitted-variable bias). For dummies, always name the base category and read the coefficient as a gap relative to it. For interactions, drill the routine: split the equation by the dummy, read β₂ as the intercept gap and β₃ as the slope gap, then differentiate (∂y/∂x = β₁ + β₃D) and plug in the group value; in a log model, convert to approximate percentages with 100×β. Finally, keep the three diagnostic problems straight: omitted-variable bias biases the coefficients; multicollinearity inflates the SEs (imprecise); heteroskedasticity makes the usual SEs wrong (invalid inference). Same OLS machinery, three different stories about what breaks.