QBUS5001 · Foundation In Data Analytics For Business
Multiple Linear Regression & Dummy Variables
Module 12 generalises regression to several predictors: Ŷ = b₀ + b₁X₁ + … + b_kX_k, where n is the number of observations and k the number of predictors. You compare models with adjusted R² (which penalises adding weak predictors), test the model as a whole with the overall F-test (H₀: all slopes = 0), and test individual coefficients with t-tests on n−k−1 degrees of freedom.
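To make the estimated equation concrete, here is a minimal pure-Python sketch of how b₀, b₁, …, b_k can be obtained by ordinary least squares. It solves the normal equations (X′X)b = X′y with Gaussian elimination. The function name `fit_ols` and the toy data are hypothetical and chosen only for illustration. In practice (and in the exam) the coefficients come from Excel or a statistics package.

```python
# Minimal sketch (hypothetical helper, not a course-supplied tool):
# fit Y-hat = b0 + b1*X1 + ... + bk*Xk by ordinary least squares,
# solving the normal equations (X'X) b = X'y with Gaussian elimination.

def fit_ols(rows, y):
    """rows: list of predictor lists [x1, ..., xk]; y: list of responses.
    Returns the coefficient list [b0, b1, ..., bk]."""
    X = [[1.0] + [float(v) for v in r] for r in rows]  # prepend intercept column
    n, p = len(X), len(X[0])                            # p = k + 1 parameters
    # Build X'X (p x p) and X'y (length p)
    xtx = [[sum(X[i][a] * X[i][j] for i in range(n)) for j in range(p)]
           for a in range(p)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    A = [xtx[a] + [xty[a]] for a in range(p)]           # augmented [X'X | X'y]
    # Forward elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        if abs(A[col][col]) < 1e-12:
            raise ValueError("X'X is singular (perfect collinearity?)")
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution
    b = [0.0] * p
    for a in range(p - 1, -1, -1):
        b[a] = (A[a][p] - sum(A[a][c] * b[c] for c in range(a + 1, p))) / A[a][a]
    return b

# Toy data generated exactly from Y = 1 + 2*X1 + 3*X2 (no noise), so OLS
# should recover b0 = 1, b1 = 2, b2 = 3.
rows = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 4]]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
b = fit_ols(rows, y)
print([round(v, 6) for v in b])
```

With real, noisy data the fitted coefficients will not match any "true" values exactly; the noise-free toy data is only there so the result is easy to check.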
The module also handles categorical predictors through dummy variables: a 2-level category needs one 0/1 dummy, and a c-level category needs c−1 dummies to avoid the dummy-variable trap. Each dummy coefficient shifts the intercept relative to the omitted reference category.
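The "intercept shift" reading of a dummy coefficient can be checked numerically. The sketch below assumes a hypothetical fitted model Ŷ = 50 + 4X + 10D, where D = 1 for an invented "Online" channel and D = 0 for the reference category "Store". These coefficients are made up for illustration, not taken from any course dataset.

```python
# Hypothetical fitted model (coefficients invented for illustration):
#   Y-hat = 50 + 4*X + 10*D, with D = 1 for "Online" and D = 0 for "Store" (reference)
B0, B1, B2 = 50.0, 4.0, 10.0

def predict(x, d):
    """Predicted Y for predictor value x and dummy value d (0 or 1)."""
    return B0 + B1 * x + B2 * d

for x in (0, 5, 10):
    store, online = predict(x, 0), predict(x, 1)
    # The gap between the two groups is the dummy coefficient B2 at every x
    print(f"X={x:>2}  Store={store:6.1f}  Online={online:6.1f}  gap={online - store:.1f}")
```

The two groups share the slope B1 and differ only in intercept (50 versus 60), so the dummy coefficient B2 = 10 is the constant vertical gap between two parallel lines.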
What this chapter covers
- 01 The MLR model and estimated equation Ŷ = b₀ + b₁X₁ + … + b_kX_k
- 02 Interpreting a coefficient “holding other variables fixed”
- 03 Coefficient of multiple determination R²
- 04 Adjusted R² and comparing competing models
- 05 Overall F-test: F = MSR/MSE ~ F(k, n−k−1)
- 06 Individual coefficient t-tests on n−k−1 df
- 07 Dummy variables for categorical predictors
- 08 The c−1 rule and the dummy-variable trap
Adjusted R² and the overall F-test
- (1 mark) Given n = 40 observations, k = 3 predictors, SST = 400, SSR = 320 and SSE = 80: R² = SSR/SST = 320/400 = 0.80 (the model explains 80% of the variation in Y).
- (1 mark) Adjusted R² = 1 − (SSE/(n−k−1))/(SST/(n−1)) = 1 − (80/36)/(400/39).
- (1 mark) Evaluate: 80/36 = 2.2222 and 400/39 = 10.2564, so adjusted R² = 1 − 2.2222/10.2564 = 1 − 0.2167 = 0.7833.
- (1 mark) Overall F-test hypotheses: H₀: β₁ = β₂ = β₃ = 0 versus H₁: at least one slope ≠ 0.
- (1 mark) Mean squares: MSR = SSR/k = 320/3 = 106.67; MSE = SSE/(n−k−1) = 80/36 = 2.2222.
- (1 mark) F = MSR/MSE = 106.67/2.2222 = 48.0 on (3, 36) df.
- (1 mark) Decision: at α = 0.05 the critical value of F(3, 36) is about 2.87. Since 48.0 > 2.87, reject H₀: the model is jointly significant, so at least one predictor has a non-zero slope.
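The worked example can be reproduced in a few lines, which is a quick way to check hand arithmetic. This sketch takes n = 40, k = 3, SST = 400 and SSR = 320 from the example. The critical value 2.87 is hard-coded from F tables (the 5% point of F(3, 36)) rather than computed, and the function name `overall_fit` is hypothetical.

```python
def overall_fit(n, k, sst, ssr):
    """Return R^2, adjusted R^2 and the overall F statistic from ANOVA sums of squares."""
    sse = sst - ssr                                     # SST = SSR + SSE
    r2 = ssr / sst
    adj_r2 = 1 - (sse / (n - k - 1)) / (sst / (n - 1))
    msr = ssr / k                                       # mean square regression
    mse = sse / (n - k - 1)                             # mean square error
    return r2, adj_r2, msr / mse

r2, adj_r2, F = overall_fit(n=40, k=3, sst=400.0, ssr=320.0)
F_CRIT = 2.87  # 5% critical value of F(3, 36), read from tables as in the example
print(round(r2, 4), round(adj_r2, 4), round(F, 2), F > F_CRIT)
```

Keeping SSE as SST − SSR mirrors the ANOVA identity, so only the two sums of squares an Excel printout reports need to be typed in.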
Key terms
- Multiple linear regression
- A model relating Y to several predictors, Ŷ = b₀ + b₁X₁ + … + b_kX_k, where each slope is the effect of its predictor holding the others fixed.
- Adjusted R²
- R² corrected for the number of predictors, 1 − (SSE/(n−k−1))/(SST/(n−1)); it can fall when a weak predictor is added, making it the fair basis for model comparison.
- Overall F-test
- A test of whether the model as a whole is significant, F = MSR/MSE ~ F(k, n−k−1), with H₀ that all slope coefficients are zero.
- Dummy variable
- A 0/1 indicator encoding a category; its coefficient is the shift in the intercept relative to the omitted reference category.
- Dummy-variable trap
- The perfect collinearity that arises from including a dummy for every level of a category alongside the intercept (the dummies then sum to 1 in every row, duplicating the intercept column); avoided by using c−1 dummies for c levels.
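The trap can be seen directly in the design matrix. In this sketch a hypothetical 3-level "Region" variable (North, South, West; the sample is invented) is encoded with all c = 3 dummies plus an intercept column. In every row the dummies add up to exactly the intercept value 1, which is the perfect collinearity that makes X′X singular.

```python
# Hypothetical 3-level category; the sample values are invented for illustration
regions = ["North", "South", "West", "South", "North", "West"]
levels = ["North", "South", "West"]

# Design rows: intercept column (1) followed by one dummy for EVERY level (the trap)
rows = [[1] + [1 if r == lvl else 0 for lvl in levels] for r in regions]

# Each row's dummies sum to the intercept value, so the intercept column is an
# exact linear combination of the dummy columns: perfect collinearity.
trap = all(sum(row[1:]) == row[0] for row in rows)
print(trap)  # True

# Dropping one level (here "North" as reference) leaves c - 1 = 2 dummies;
# reference-category rows now have all-zero dummies, breaking the identity.
reduced = [[1] + [1 if r == lvl else 0 for lvl in levels[1:]] for r in regions]
print(all(sum(row[1:]) == row[0] for row in reduced))  # False
```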
Multiple Linear Regression & Dummy Variables FAQ
How many dummy variables do I need for a categorical predictor?
For a category with c levels you need c−1 dummies. For example, a 4-season variable needs 3 dummies; the omitted level becomes the reference category against which the others are compared. Using all c dummies causes the dummy-variable trap.
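A small encoding helper makes the c−1 rule mechanical. The sketch below assumes Summer as the reference season (any level can be chosen); the helper name `make_dummies` is hypothetical and not part of any library.

```python
def make_dummies(values, levels, reference):
    """Encode a categorical variable as c-1 0/1 dummies, omitting `reference`.
    Returns (dummy_names, rows_of_dummies)."""
    if reference not in levels:
        raise ValueError(f"reference {reference!r} is not one of the levels")
    kept = [lvl for lvl in levels if lvl != reference]
    return kept, [[1 if v == lvl else 0 for lvl in kept] for v in values]

seasons = ["Summer", "Autumn", "Winter", "Spring"]          # c = 4 levels
names, D = make_dummies(["Winter", "Summer", "Spring"], seasons, reference="Summer")
print(names)  # ['Autumn', 'Winter', 'Spring']  -> 3 dummies
print(D)      # [[0, 1, 0], [0, 0, 0], [0, 0, 1]]  (Summer row is all zeros)
```

A Summer observation is coded as all zeros, so each season's dummy coefficient is read as its shift relative to Summer.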
Why use adjusted R² rather than R² to compare models?
Plain R² never decreases when you add a predictor, so it always favours bigger models. Adjusted R² charges each extra predictor for the degree of freedom it uses, so it rises only if the new variable reduces SSE enough to offset that loss (for a single added variable, exactly when its |t| exceeds 1). That makes it the honest comparison metric.
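The penalty is easy to see with numbers. This sketch uses the equivalent form adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1), which follows from the definition because SSE/SST = 1 − R². The base model matches the worked example (n = 40, k = 3, R² = 0.80); the second model's R² of 0.801 after adding a weak fourth predictor is hypothetical.

```python
def adjusted_r2(r2, n, k):
    # Same as 1 - (SSE/(n-k-1)) / (SST/(n-1)), since SSE/SST = 1 - R^2
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

base = adjusted_r2(0.800, n=40, k=3)    # worked-example model
bigger = adjusted_r2(0.801, n=40, k=4)  # hypothetical: weak 4th predictor nudges R^2 up
print(round(base, 4), round(bigger, 4))
print("R^2 rose; adjusted R^2 fell" if bigger < base else "adjusted R^2 rose")
```

Here R² creeps up from 0.800 to 0.801, yet adjusted R² falls (from about 0.7833 to about 0.7783), so the smaller model would be preferred.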
What is the difference between the overall F-test and the individual t-tests?
The F-test asks whether the model collectively explains Y (any slope non-zero). Each t-test asks whether one specific coefficient is non-zero, holding the others fixed. A model can be F-significant overall while some individual coefficients are not.
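An individual t-test just divides a coefficient by its standard error and compares the result with a t critical value on n−k−1 df. The coefficients and standard errors below are hypothetical, chosen to show one significant and one non-significant predictor inside a model that could still be F-significant. The critical value 2.028 (two-sided 5% point of t with 36 df) is taken from tables, not computed, and `coef_t_test` is a hypothetical helper.

```python
def coef_t_test(b, se, n, k, t_crit):
    """Two-sided t-test of H0: beta_j = 0. Returns (t, df, reject)."""
    df = n - k - 1
    t = b / se
    return t, df, abs(t) > t_crit

T_CRIT_36 = 2.028  # two-sided 5% critical value, t with 36 df (from tables)

# Hypothetical printout rows: (name, coefficient, standard error)
for name, b, se in [("X1", 1.8, 0.6), ("X2", 0.3, 0.5)]:
    t, df, reject = coef_t_test(b, se, n=40, k=3, t_crit=T_CRIT_36)
    print(f"{name}: t = {t:.2f} on {df} df -> {'reject' if reject else 'do not reject'} H0")
```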
Exam move
Anchor the degrees of freedom: individual t-tests and the standard error of the regression (SER) use n−k−1, while the F-test uses (k, n−k−1). Practise interpreting a dummy coefficient as a shift relative to the reference category, and rehearse the c−1 rule on a worked example with three or more levels. Since the exam usually supplies an Excel regression output, drill reading R², adjusted R², the F-statistic and the coefficient t-values straight off the printout and translating each into a business sentence.