University of Sydney · S1 2026 · FACULTY OF BUSINESS & ECONOMICS

QBUS5001 · Foundation In Data Analytics For Business

One subject, every graph, every model, every mark.
50% final exam · hurdle · 14 chapters · 8-page Bible
Our own words - no uploaded lecturer files
Built to mirror S1 2026 · updated this semester
Chapter 11 of 11 · QBUS5001

Multiple Linear Regression & Dummy Variables

Module 12 generalises regression to several predictors: Ŷ = b₀ + b₁X₁ + … + b_kX_k. You compare models with adjusted R² (which penalises adding weak predictors), test the model as a whole with the overall F-test (H₀: all slopes = 0), and test individual coefficients with t-tests on n−k−1 degrees of freedom.
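Below is a minimal sketch of how the estimated equation is produced. It assumes Python with numpy; the course itself works from Excel output. The dataset is small and hypothetical, constructed so that y = 3 + 2X₁ − X₂ holds exactly. The sketch puts a leading column of ones in the design matrix for b₀, solves the least-squares problem with `numpy.linalg.lstsq`, and checks the "holding other variables fixed" reading of b₁.

```python
import numpy as np

# Hypothetical data constructed so that y = 3 + 2*x1 - 1*x2 holds exactly.
x1 = np.array([1, 2, 3, 4, 5], dtype=float)
x2 = np.array([2, 1, 4, 3, 5], dtype=float)
y = np.array([3, 6, 5, 8, 8], dtype=float)

# Design matrix: a column of ones (for b0), then one column per predictor.
X = np.column_stack([np.ones_like(x1), x1, x2])

# Least-squares estimates b = (b0, b1, b2).
b, *_ = np.linalg.lstsq(X, y, rcond=None)

def predict(x1_val, x2_val):
    """Evaluate Y-hat = b0 + b1*X1 + b2*X2 at a single point."""
    return b[0] + b[1] * x1_val + b[2] * x2_val

# Raise X1 by one unit while holding X2 fixed at 4: Y-hat moves by b1.
change = predict(7, 4) - predict(6, 4)
print("b =", np.round(b, 4), "| change in Y-hat:", round(float(change), 4))
```

The data fit exactly, so b comes back as (3, 2, −1) up to floating-point rounding. The one-unit change in X₁ then moves Ŷ by b₁ = 2.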

The module also handles categorical predictors through dummy variables: a 2-level category needs one 0/1 dummy, and a c-level category needs c−1 dummies to avoid the dummy-variable trap. Each dummy coefficient shifts the intercept relative to the omitted reference category.
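The c−1 rule and the intercept-shift reading can be checked numerically. This sketch assumes numpy and uses a hypothetical four-level season variable with an invented response. "Summer" is an arbitrary choice of reference level. The sketch compares the column rank of the design matrix with c−1 dummies against the rank with all c dummies. It then fits the c−1 version to show what the dummy coefficients measure.

```python
import numpy as np

# Hypothetical data: a 4-level season category (c = 4) and a response.
seasons = ["Summer", "Autumn", "Winter", "Spring",
           "Summer", "Winter", "Spring", "Autumn"]
y = np.array([10, 8, 5, 9, 12, 7, 11, 10], dtype=float)
levels = ["Summer", "Autumn", "Winter", "Spring"]
reference = "Summer"  # arbitrary choice of omitted (reference) level

def dummies(values, keep):
    """One 0/1 indicator column for each level listed in `keep`."""
    return np.column_stack(
        [[1.0 if v == level else 0.0 for v in values] for level in keep]
    )

ones = np.ones(len(seasons))
non_reference = [lvl for lvl in levels if lvl != reference]

# c - 1 = 3 dummies plus the intercept: full column rank.
X_ok = np.column_stack([ones, dummies(seasons, non_reference)])
# All c = 4 dummies plus the intercept: the dummies sum to the ones column.
X_trap = np.column_stack([ones, dummies(seasons, levels)])

print(np.linalg.matrix_rank(X_ok), X_ok.shape[1])      # 4 4 -> estimable
print(np.linalg.matrix_rank(X_trap), X_trap.shape[1])  # 4 5 -> dummy-variable trap

# With only the dummies as predictors, b0 is the reference-group mean and
# each dummy coefficient is that group's mean minus the reference mean.
b, *_ = np.linalg.lstsq(X_ok, y, rcond=None)
for name, coef in zip(["b0 (Summer)"] + non_reference, b):
    print(f"{name}: {coef:.2f}")
```

On this hypothetical data the fit reports b₀ = 11, which is the Summer mean. The Autumn, Winter and Spring coefficients are −2, −5 and −1: each is that season's shift relative to Summer.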

In this chapter

What this chapter covers

  • 01 The MLR model and estimated equation Ŷ = b₀ + b₁X₁ + … + b_kX_k
  • 02 Interpreting a coefficient “holding other variables fixed”
  • 03 Coefficient of multiple determination R²
  • 04 Adjusted R² and comparing competing models
  • 05 Overall F-test: F = MSR/MSE ~ F(k, n−k−1)
  • 06 Individual coefficient t-tests on n−k−1 df
  • 07 Dummy variables for categorical predictors
  • 08 The c−1 rule and the dummy-variable trap
Worked example

Adjusted R² and the overall F-test

Q [7 marks]. A 3-predictor multiple regression on n = 40 observations gives SSR = 320 and SSE = 80 (so SST = 400). Compute R² and adjusted R², then carry out the overall F-test at α = 0.05. Use F(0.05; 3, 36) ≈ 2.87.
  • 1 mark · R² = SSR/SST = 320/400 = 0.80 (the model explains 80% of the variation in Y).
  • 1 mark · Adjusted R² = 1 − (SSE/(n−k−1))/(SST/(n−1)) = 1 − (80/36)/(400/39).
  • 1 mark · Evaluate: 80/36 = 2.2222 and 400/39 = 10.2564, so adjusted R² = 1 − 2.2222/10.2564 = 1 − 0.2167 = 0.7833.
  • 1 mark · Overall F-test hypotheses: H₀: β₁ = β₂ = β₃ = 0 versus H₁: at least one slope ≠ 0.
  • 1 mark · Mean squares: MSR = SSR/k = 320/3 = 106.67; MSE = SSE/(n−k−1) = 80/36 = 2.2222.
  • 1 mark · F = MSR/MSE = 106.67/2.2222 = 48.0 on (3, 36) df.
  • 1 mark · Decision: 48.0 > 2.87, so reject H₀. The model is jointly significant: at least one predictor has a non-zero slope.
R² = 0.80 and adjusted R² = 0.7833. The overall F = 48.0 far exceeds the critical value of 2.87, so reject H₀: the model is jointly significant at the 5% level.
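The arithmetic above can be checked in a few lines of plain Python. This is a minimal sketch; the helper name `regression_summary` is hypothetical. It starts from the sums of squares given in the question and keeps the two denominators separate: k for MSR and n−k−1 for MSE.

```python
def regression_summary(ssr, sse, n, k):
    """R^2, adjusted R^2 and overall F from the ANOVA sums of squares."""
    sst = ssr + sse
    df_error = n - k - 1          # residual degrees of freedom
    r2 = ssr / sst
    adj_r2 = 1 - (sse / df_error) / (sst / (n - 1))
    msr = ssr / k                 # regression mean square divides by k
    mse = sse / df_error          # error mean square divides by n - k - 1
    return r2, adj_r2, msr / mse

r2, adj_r2, f_stat = regression_summary(ssr=320, sse=80, n=40, k=3)
f_crit = 2.87  # F(0.05; 3, 36) as supplied in the question
print(f"R2 = {r2:.4f}, adj R2 = {adj_r2:.4f}, F = {f_stat:.2f}, reject H0: {f_stat > f_crit}")
# R2 = 0.8000, adj R2 = 0.7833, F = 48.00, reject H0: True
```

The same F also follows from R² alone: F = (R²/k)/((1 − R²)/(n−k−1)) = (0.80/3)/(0.20/36) = 48.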
Sia tip: adjusted R² never exceeds R². With k ≥ 1 it is strictly lower unless R² = 1, and the gap can be large when k is big relative to n. Adjusted R² is the right measure when comparing models with different numbers of predictors. For the F-test, MSR divides SSR by k and MSE divides SSE by n−k−1; mixing these up is the most common slip.
Glossary

Key terms

Multiple linear regression
A model relating Y to several predictors, Ŷ = b₀ + b₁X₁ + … + b_kX_k, where each slope is the estimated change in Ŷ for a one-unit increase in its predictor, holding the others fixed.
Adjusted R²
R² corrected for the number of predictors, 1 − (SSE/(n−k−1))/(SST/(n−1)); it can fall when a weak predictor is added, making it the fair basis for model comparison.
Overall F-test
A test of whether the model as a whole is significant, F = MSR/MSE ~ F(k, n−k−1), with H₀ that all slope coefficients are zero.
Dummy variable
A 0/1 indicator encoding a category; its coefficient is the shift in the intercept relative to the omitted reference category.
Dummy-variable trap
The perfect collinearity that arises from including a dummy for every level of a category; avoided by using c−1 dummies for c levels.
FAQ

Multiple Linear Regression & Dummy Variables FAQ

How many dummy variables do I need for a categorical predictor?

For a category with c levels you need c−1 dummies. For example, a 4-season variable needs 3 dummies; the omitted level becomes the reference category against which the others are compared. Using all c dummies causes the dummy-variable trap.

Why use adjusted R² rather than R² to compare models?

Plain R² never decreases when you add a predictor, so it always favours bigger models. Adjusted R² charges for each extra predictor through the n−k−1 degrees of freedom. It therefore rises only when the new variable cuts SSE by enough to offset the lost degree of freedom (equivalently, when that variable's |t| exceeds 1). That makes it the fairer comparison metric.
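Here is a short illustration of the penalty. It starts from the worked example's model (k = 3, SSE = 80, SST = 400, n = 40). It then adds a hypothetical fourth predictor that trims SSE only from 80 to 79; that figure is invented purely for illustration. The helper name `r2_and_adjusted` is also hypothetical.

```python
def r2_and_adjusted(sse, sst, n, k):
    """Return (R^2, adjusted R^2) for a model with k predictors."""
    r2 = 1 - sse / sst
    adj = 1 - (sse / (n - k - 1)) / (sst / (n - 1))
    return r2, adj

# Worked-example model: k = 3 predictors.
base = r2_and_adjusted(sse=80, sst=400, n=40, k=3)
# Hypothetical 4th predictor that barely reduces SSE (80 -> 79).
bigger = r2_and_adjusted(sse=79, sst=400, n=40, k=4)

print(f"k=3: R2={base[0]:.4f}, adj={base[1]:.4f}")    # k=3: R2=0.8000, adj=0.7833
print(f"k=4: R2={bigger[0]:.4f}, adj={bigger[1]:.4f}")  # k=4: R2=0.8025, adj=0.7799
```

R² creeps up from 0.8000 to 0.8025, but adjusted R² falls from 0.7833 to 0.7799. The partial F for the added variable here is (80 − 79)/(79/35) ≈ 0.44, which is below 1, so its |t| is below 1 too.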

What is the difference between the overall F-test and the individual t-tests?

The F-test asks whether the model collectively explains Y (any slope non-zero). Each t-test asks whether one specific coefficient is non-zero, holding the others fixed. A model can be F-significant overall while some individual coefficients are not.
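The last point can look paradoxical. One common way it happens is near-collinearity: when two predictors carry almost the same information, the model as a whole fits well, but neither coefficient can be pinned down on its own. This sketch assumes numpy and uses hypothetical data built for that purpose, with x2 almost a copy of x1 (n = 8, k = 2, so 5 residual df). It computes F = MSR/MSE and each slope's t = bⱼ/se(bⱼ), taking the standard errors from MSE·(XᵀX)⁻¹.

```python
import numpy as np

# Hypothetical data: x2 is x1 plus a tiny wiggle, so the two are near-collinear.
x1 = np.arange(1.0, 9.0)                                   # 1, 2, ..., 8
x2 = x1 + 0.1 * np.array([1, -1, -1, 1, 1, -1, -1, 1])
noise = 0.5 * np.array([1, -1, -1, 1, -1, 1, 1, -1])
y = 5 + x1 + x2 + noise

n, k = len(y), 2
X = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

resid = y - X @ b
sse = resid @ resid
sst = np.sum((y - y.mean()) ** 2)
ssr = sst - sse
mse = sse / (n - k - 1)

# Overall F-test: MSR / MSE on (k, n - k - 1) df.
f_stat = (ssr / k) / mse

# Individual t-tests: b_j / se(b_j), with se(b_j) from MSE * (X'X)^-1.
se = np.sqrt(mse * np.diag(np.linalg.inv(X.T @ X)))
t_stats = b / se

print(f"F = {f_stat:.1f} on ({k}, {n - k - 1}) df")
print("slope t-values:", np.round(t_stats[1:], 3))
```

F comes out at about 210, far above the 5% critical value F(0.05; 2, 5) ≈ 5.79. Yet both slope t-values are below 0.5, well under t(0.025; 5) ≈ 2.571. The model is jointly significant even though neither slope is individually significant.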

Study strategy

Exam move

Anchor the degrees of freedom: individual t-tests and the SER (standard error of the regression, √MSE) use n−k−1, and the F-test uses (k, n−k−1). Practise interpreting a dummy coefficient as a shift relative to the reference category, and rehearse the c−1 rule on a worked example with three or more levels. The exam usually supplies an Excel regression output, so drill reading R², adjusted R², the F-statistic and the coefficient t-values straight off the printout and translating each into a business sentence.
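A tiny plain-Python sketch of the degrees-of-freedom anchors, applied to the worked example (n = 40, k = 3, SSE = 80). Both helper names are hypothetical. In Excel's Regression output the same df values appear in the ANOVA table, and the SER is reported as "Standard Error" under Regression Statistics.

```python
import math

def df_anchors(n, k):
    """Degrees of freedom used in an MLR with n observations and k predictors."""
    return {
        "t_test": n - k - 1,        # each coefficient t-test
        "F_test": (k, n - k - 1),   # overall F: (numerator, denominator)
    }

def standard_error_of_regression(sse, n, k):
    """SER = sqrt(MSE) = sqrt(SSE / (n - k - 1))."""
    return math.sqrt(sse / (n - k - 1))

print(df_anchors(n=40, k=3))                               # {'t_test': 36, 'F_test': (3, 36)}
print(round(standard_error_of_regression(80, 40, 3), 4))   # 1.4907
```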
