University of Sydney · S1 2026 · FACULTY OF BUSINESS & ECONOMICS

QBUS5001 · Foundation In Data Analytics For Business

- one subject, every graph, every model, every mark
50% final exam · hurdle · 14 Chapters · 8-page Bible
Our own words - no uploaded lecturer files
Built to mirror S1 2026 · updated this semester
Chapter 10 of 11 · QBUS5001

Regression Diagnostics & Inference

Module 11 asks whether a fitted line can be trusted. The four L.I.N.E. assumptions — Linearity, Independence of errors, Normality of errors, Equal variance — are checked through residual plots. You then do inference: a t-test on the slope (usually testing β₁ = 0), confidence and prediction intervals for Y, and a t-test on the correlation coefficient.

For time-series data, autocorrelation is diagnosed with the Durbin–Watson statistic (near 2 = none, below 2 = positive, above 2 = negative). The module closes with the classic pitfalls: extrapolation, confusing correlation with causation, and influential outliers.

In this chapter

What this chapter covers

  • 01 The four L.I.N.E. assumptions and residual-plot checks
  • 02 t-test on the slope: T = (b₁ − β₁)/s(b₁) ~ t(n−2)
  • 03 Confidence interval for the slope
  • 04 t-test on the correlation coefficient
  • 05 Confidence interval for mean Y at a given X
  • 06 Prediction interval for an individual Y
  • 07 Autocorrelation and the Durbin–Watson statistic
  • 08 Pitfalls: extrapolation, causation vs correlation, outliers
Worked example

t-test on a regression slope

Q [6 marks]. A simple linear regression on n = 22 observations estimates a slope of b₁ = 1.40 with standard error s(b₁) = 0.45. Test at α = 0.05 whether the slope is significantly different from zero. Use t(0.025, 20) ≈ 2.086.
  • [1 mark] Hypotheses: H₀: β₁ = 0 versus H₁: β₁ ≠ 0 (the variable X has no linear effect on Y under H₀).
  • [1 mark] Degrees of freedom: n − 2 = 22 − 2 = 20.
  • [2 marks] Test statistic: T = (b₁ − 0)/s(b₁) = 1.40/0.45 = 3.1111.
  • [1 mark] Critical value: reject if |T| > t(0.025, 20) ≈ 2.086.
  • [1 mark] Decision and conclusion: |T| = 3.1111 > 2.086, so reject H₀. The slope is significantly different from zero, so X is a significant linear predictor of Y at the 5% level.
T = 1.40/0.45 = 3.1111 on 20 df exceeds the critical 2.086, so reject H₀: the slope is significantly different from zero and X is a significant predictor of Y at 5%.
Sia tip: The slope t-test uses n−2 degrees of freedom because two parameters, b₀ and b₁, are estimated. In Excel's regression output, a two-sided p-value below α leads to the same decision as |T| exceeding the critical value, so quote whichever the question provides.
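The steps of the worked example can be reproduced in a few lines. This is a minimal sketch, not the unit's official method: the function name `slope_t_test` and its arguments are hypothetical, and the critical value is passed in from the question rather than looked up. It also returns the matching 95% confidence interval for the slope, b₁ ± t × s(b₁), which is assumed here to be the standard textbook form.

```python
def slope_t_test(b1, se_b1, n, t_crit, beta1_null=0.0):
    """Two-sided t-test on a simple-regression slope, plus the matching CI.

    t_crit is the tabulated critical value t(alpha/2, n-2), supplied by the
    caller (hypothetical helper; nothing here looks up the t table).
    """
    df = n - 2                                # two parameters estimated: b0, b1
    t_stat = (b1 - beta1_null) / se_b1        # T = (b1 - beta1) / s(b1)
    margin = t_crit * se_b1                   # half-width of the slope CI
    return {
        "df": df,
        "t_stat": t_stat,
        "reject_h0": abs(t_stat) > t_crit,    # two-sided decision rule
        "ci": (b1 - margin, b1 + margin),
    }


# Numbers taken from the worked example above.
result = slope_t_test(b1=1.40, se_b1=0.45, n=22, t_crit=2.086)
print(result)
```

With these inputs the interval is roughly 0.46 to 2.34. It excludes zero, which agrees with rejecting H₀ at the 5% level.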
Glossary

Key terms

L.I.N.E. assumptions
The four conditions for valid simple regression inference: Linearity, Independence of errors, Normality of errors, and Equal variance (homoscedasticity).
Homoscedasticity
The assumption that the error variance is constant across all values of X; its violation (a fan-shaped residual plot) is heteroscedasticity.
Durbin–Watson statistic
A diagnostic for autocorrelation of residuals lying in [0, 4]: a value near 2 indicates no autocorrelation, values toward 0 indicate positive autocorrelation, and values toward 4 indicate negative autocorrelation.
Prediction interval
An interval for an individual future Y at a given X, wider than the confidence interval for the mean Y because it adds the irreducible scatter of single observations.
Influential outlier
An observation with an extreme X or large residual that disproportionately changes the fitted line; flagged in residual analysis as a pitfall.
FAQ

Regression Diagnostics & Inference FAQ

What is the difference between a confidence interval and a prediction interval here?

The confidence interval estimates the mean of Y at a given X; the prediction interval estimates a single new Y at that X. The prediction interval is always wider because it includes the variability of an individual observation (the “1 +” term under the root).
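For reference, the two intervals in their usual textbook form are sketched below. The notation is an assumption and may differ from the unit's formula sheet: ŷ₀ is the fitted value at x₀, s_e is the standard error of the estimate, and the sum runs over the sample X values.

```latex
% Confidence interval for the mean of Y at X = x_0
\[
\hat{y}_0 \;\pm\; t_{\alpha/2,\,n-2}\; s_e
\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}}
\]

% Prediction interval for an individual Y at X = x_0
\[
\hat{y}_0 \;\pm\; t_{\alpha/2,\,n-2}\; s_e
\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}}
\]
```

The only difference is the extra 1 under the root, so the prediction interval is wider at every x₀. Both intervals widen as x₀ moves away from x̄.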

How do I read the Durbin–Watson statistic?

It ranges from 0 to 4. A value near 2 means no autocorrelation; values toward 0 indicate positive autocorrelation (common in time series); values toward 4 indicate negative autocorrelation. When the statistic falls between the table bounds dL and dU, the test is inconclusive.
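To see where the 0-to-4 scale comes from, the statistic can be computed directly from residuals in time order using the standard definition d = Σ(eₜ − eₜ₋₁)² / Σeₜ². The sketch below is illustrative only: the function name `durbin_watson` and both residual series are hypothetical, made up to show the two extremes.

```python
def durbin_watson(residuals):
    """Durbin-Watson d for residuals listed in time order."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    # Sum of squared changes between consecutive residuals
    num = sum((curr - prev) ** 2
              for prev, curr in zip(residuals, residuals[1:]))
    # Sum of squared residuals
    den = sum(e ** 2 for e in residuals)
    if den == 0:
        raise ValueError("all residuals are zero")
    return num / den


# Hypothetical residuals that drift slowly: neighbours are similar,
# so the numerator is small and d falls well below 2.
trending = [1.0, 0.8, 0.5, 0.1, -0.3, -0.6, -0.9, -0.6]

# Hypothetical residuals that flip sign every period: neighbours are
# far apart, so the numerator is large and d rises well above 2.
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

d_trend = durbin_watson(trending)
d_alt = durbin_watson(alternating)
print(round(d_trend, 3), round(d_alt, 3))
```

The drifting series mimics positive autocorrelation and the sign-flipping series mimics negative autocorrelation. In an exam the value is usually read from the software output and compared with dL and dU.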

Does a significant slope prove X causes Y?

No. Regression establishes a statistical association, not causation. A significant slope can arise from confounding, reverse causation or coincidence; causal claims need experimental design or strong theory. Confusing correlation with causation is a key pitfall the module emphasises.

Study strategy

Exam move

Memorise L.I.N.E. as a checklist and pair each letter with the residual plot that diagnoses it (curvature → linearity, runs/patterns → independence, QQ-plot → normality, funnel → equal variance). For inference, note that the slope t-test uses n−2 df, and practise reading Excel regression output directly, since the exam often hands you the coefficient table and asks you to test and interpret rather than compute from raw data.
