University of Sydney · S1 2026 · FACULTY OF BUSINESS & ECONOMICS

QBUS5001 · Foundation In Data Analytics For Business

- one subject, every graph, every model, every mark

50% final exam · hurdle · 14 Chapters · 8-page Bible

Our own words - no uploaded lecturer files

Built to mirror S1 2026 · updated this semester

Chapter 10 of 11 · QBUS5001

Regression Diagnostics & Inference

Module 11 asks whether a fitted line can be trusted. The four L.I.N.E. assumptions — Linearity, Independence of errors, Normality of errors, Equal variance — are checked through residual plots. You then do inference: a t-test on the slope (usually testing β₁ = 0), confidence and prediction intervals for Y, and a t-test on the correlation coefficient.

For time-series data, autocorrelation is diagnosed with the Durbin–Watson statistic (near 2 = none, below 2 = positive, above 2 = negative). The module closes with the classic pitfalls: extrapolation, confusing correlation with causation, and influential outliers.
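As a rough illustration of where the Durbin–Watson number comes from, the sketch below computes it from a list of residuals kept in time order, using the standard definition (sum of squared successive differences divided by the sum of squared residuals). The function name `durbin_watson` and both residual series are hypothetical, made up for illustration rather than taken from the course data.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in time order."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    # Numerator: squared differences between successive residuals.
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2
        for t in range(1, len(residuals))
    )
    # Denominator: sum of squared residuals.
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residuals, invented for illustration only.
trending = [1.0, 0.8, 0.5, -0.2, -0.7, -1.0]      # neighbours look alike
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]   # sign flips every step

dw_trending = durbin_watson(trending)
dw_alternating = durbin_watson(alternating)
print(f"trending:    DW = {dw_trending:.3f}")     # well below 2
print(f"alternating: DW = {dw_alternating:.3f}")  # well above 2
```

Residuals that drift slowly give small successive differences and push the statistic towards 0, while residuals that flip sign each period push it towards 4.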

In this chapter

What this chapter covers

01 The four L.I.N.E. assumptions and residual-plot checks
02 t-test on the slope: T = (b₁ − β₁)/s(b₁) ~ t(n−2)
03 Confidence interval for the slope
04 t-test on the correlation coefficient
05 Confidence interval for mean Y at a given X
06 Prediction interval for an individual Y
07 Autocorrelation and the Durbin–Watson statistic
08 Pitfalls: extrapolation, causation vs correlation, outliers

Worked example · free

t-test on a regression slope

Q [6 marks]. A simple linear regression on n = 22 observations estimates a slope of b₁ = 1.40 with standard error s(b₁) = 0.45. Test at α = 0.05 whether the slope is significantly different from zero. Use t(0.025, 20) ≈ 2.086.

[1 mark] Hypotheses: H₀: β₁ = 0 versus H₁: β₁ ≠ 0 (the variable X has no linear effect on Y under H₀).
[1 mark] Degrees of freedom: n − 2 = 22 − 2 = 20.
[2 marks] Test statistic: T = (b₁ − 0)/s(b₁) = 1.40/0.45 = 3.1111.
[1 mark] Critical value: reject if |T| > t(0.025, 20) ≈ 2.086.
[1 mark] Decision and conclusion: 3.1111 > 2.086, so reject H₀: the slope is significantly different from zero, so X is a significant linear predictor of Y at the 5% level.

T = 1.40/0.45 = 3.1111 on 20 df exceeds the critical 2.086, so reject H₀: the slope is significantly different from zero and X is a significant predictor of Y at 5%.
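The arithmetic in this worked example can be checked with a few lines of Python. This is a minimal sketch: the helper name `slope_t_test` is hypothetical, the critical value 2.086 is the one supplied in the question rather than looked up in code, and the confidence interval b₁ ± t·s(b₁) is an extra step the question does not ask for.

```python
def slope_t_test(b1, se_b1, t_crit, beta1_null=0.0):
    """Two-sided t-test on a regression slope, plus the matching CI."""
    t_stat = (b1 - beta1_null) / se_b1      # T = (b1 - beta1) / s(b1)
    reject = abs(t_stat) > t_crit           # two-sided decision rule
    margin = t_crit * se_b1                 # half-width of the CI for the slope
    return t_stat, reject, (b1 - margin, b1 + margin)


n = 22
df = n - 2                                  # two estimated parameters: b0 and b1
t_stat, reject, ci = slope_t_test(b1=1.40, se_b1=0.45, t_crit=2.086)

print(f"df = {df}, T = {t_stat:.4f}, reject H0: {reject}")
print(f"95% CI for the slope: ({ci[0]:.4f}, {ci[1]:.4f})")
```

The interval runs from about 0.4613 to 2.3387 and excludes zero, which agrees with rejecting H₀ at the 5% level.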

Sia tip: The slope t-test uses n−2 degrees of freedom (two parameters estimated, b₀ and b₁). Rejecting H₀ because |T| exceeds the critical value and rejecting it because the p-value in Excel's regression output is below α are the same decision, so quote whichever the question provides.
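The t-test on the correlation coefficient uses the same n − 2 degrees of freedom, with the standard statistic t = r·√(n − 2)/√(1 − r²). The sketch below assumes a made-up sample correlation of r = 0.5 on n = 22 observations; only the critical value 2.086 for 20 df comes from the worked example above.

```python
import math


def correlation_t_stat(r, n):
    """t statistic for H0: rho = 0, on n - 2 degrees of freedom."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    if n < 3:
        raise ValueError("need at least three observations")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


r, n = 0.5, 22            # hypothetical values, for illustration only
t_crit = 2.086            # t(0.025, 20), as quoted in the worked example
t_r = correlation_t_stat(r, n)
reject_rho = abs(t_r) > t_crit

print(f"t = {t_r:.3f} on {n - 2} df, reject H0: {reject_rho}")
```

In simple linear regression this statistic equals the slope's T computed from the same data, so the two tests always reach the same decision.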

Glossary

Key terms

L.I.N.E. assumptions: The four conditions for valid simple regression inference: Linearity, Independence of errors, Normality of errors, and Equal variance (homoscedasticity).
Homoscedasticity: The assumption that the error variance is constant across all values of X; its violation (a fan-shaped residual plot) is heteroscedasticity.
Durbin–Watson statistic: A diagnostic for autocorrelation in the residuals that takes values in [0, 4]: near 2 indicates no autocorrelation, below 2 positive autocorrelation, above 2 negative.
Prediction interval: An interval for an individual future Y at a given X, wider than the confidence interval for the mean Y because it adds the irreducible scatter of single observations.
Influential outlier: An observation with an extreme X or large residual that disproportionately changes the fitted line; flagged in residual analysis as a pitfall.

FAQ

Regression Diagnostics & Inference FAQ

What is the difference between a confidence interval and a prediction interval here?

The confidence interval estimates the mean of Y at a given X; the prediction interval estimates a single new Y at that X. The prediction interval is always wider because it includes the variability of an individual observation (the “1 +” term under the root).
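To see the effect of the "1 +" term, the sketch below computes both half-widths from the usual simple-regression formulas: t·sₑ·√(1/n + (x₀ − x̄)²/SSX) for the mean of Y and t·sₑ·√(1 + 1/n + (x₀ − x̄)²/SSX) for an individual Y. Every input value here (x₀, x̄, SSX, sₑ) is hypothetical, and the function name is an illustration, not part of the course material.

```python
import math


def interval_half_widths(n, x0, x_bar, ssx, s_e, t_crit):
    """Half-widths of the CI for mean Y and the PI for a single Y at x0."""
    # Shared term: grows as x0 moves away from the sample mean of X.
    distance_term = 1 / n + (x0 - x_bar) ** 2 / ssx
    ci_half = t_crit * s_e * math.sqrt(distance_term)
    # The extra "1 +" is the scatter of one observation around the line.
    pi_half = t_crit * s_e * math.sqrt(1 + distance_term)
    return ci_half, pi_half


# Hypothetical inputs, chosen only to make the comparison visible.
ci_half, pi_half = interval_half_widths(
    n=22, x0=12.0, x_bar=10.0, ssx=200.0, s_e=3.0, t_crit=2.086
)
ci_at_mean, pi_at_mean = interval_half_widths(
    n=22, x0=10.0, x_bar=10.0, ssx=200.0, s_e=3.0, t_crit=2.086
)

print(f"at x0 = 12: CI half-width {ci_half:.3f}, PI half-width {pi_half:.3f}")
print(f"at x0 = 10: CI half-width {ci_at_mean:.3f}, PI half-width {pi_at_mean:.3f}")
```

Both intervals are narrowest at x₀ = x̄ and widen as x₀ moves away from it, which is also why extrapolating beyond the observed X range is risky.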

How do I read the Durbin–Watson statistic?

It ranges from 0 to 4. A value near 2 means no autocorrelation; below 2 indicates positive autocorrelation (common in time-series); above 2 indicates negative. Between the dL and dU table bounds the test is inconclusive.
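The reading rule can be written as a small decision function. This sketch assumes the usual textbook convention that the upper half of the scale mirrors the lower half (so 4 − dU to 4 − dL is also inconclusive); the bounds 1.2 and 1.4 are placeholders, not values from a real table, and the function name is hypothetical.

```python
def classify_durbin_watson(d, d_lower, d_upper):
    """Map a Durbin-Watson value to a verdict using table bounds dL and dU."""
    if not 0 <= d <= 4:
        raise ValueError("the Durbin-Watson statistic lies in [0, 4]")
    if d < d_lower:
        return "positive autocorrelation"
    if d <= d_upper:
        return "inconclusive"
    if d < 4 - d_upper:
        return "no evidence of autocorrelation"
    if d <= 4 - d_lower:
        return "inconclusive"
    return "negative autocorrelation"


# Placeholder bounds: look up the real dL and dU for your n in the table.
D_LOWER, D_UPPER = 1.2, 1.4

verdicts = {d: classify_durbin_watson(d, D_LOWER, D_UPPER)
            for d in (0.9, 1.3, 2.0, 2.7, 3.5)}
for d, verdict in verdicts.items():
    print(f"DW = {d}: {verdict}")
```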

Does a significant slope prove X causes Y?

No. Regression establishes a statistical association, not causation. A significant slope can arise from confounding, reverse causation or coincidence, so causal claims need experimental design or strong theory. Treating correlation as causation is one of the key pitfalls the module emphasises.

Study strategy

Exam move

Memorise L.I.N.E. as a checklist and pair each letter with the residual plot that diagnoses it (curvature → linearity, runs/patterns → independence, QQ-plot → normality, funnel → equal variance). For inference, note that the slope t-test uses n−2 df, and practise reading Excel regression output directly, since the exam often hands you the coefficient table and asks you to test and interpret rather than compute from raw data.
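Every plot in that checklist starts from the same ingredient: the residuals, eᵢ = yᵢ − (b₀ + b₁xᵢ). The sketch below fits a least-squares line to a small made-up data set and computes them; the data and the helper name `fit_line` are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for simple regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ssx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / ssx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]

b0, b1 = fit_line(xs, ys)
fitted = [b0 + b1 * x for x in xs]
residuals = [y - y_hat for y, y_hat in zip(ys, fitted)]

# Residuals vs X (or vs fitted values) is the plot you scan for curvature
# and funnels; residuals in time order is the one you scan for runs.
for x, e in zip(xs, residuals):
    print(f"x = {x}: residual = {e:+.3f}")
```

Least-squares residuals always sum to zero, so the diagnostic question is never whether they centre on zero but whether they show any pattern around it.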
