University of Sydney · S1 2026 · FACULTY OF BUSINESS & ECONOMICS

ECMT1010 · Introduction To Economic Statistics

- one subject, every graph, every model, every mark
50% final exam · hurdle · 14 Chapters · 7-page Bible
Our own words - no uploaded lecturer files
Built to mirror S1 2026 · updated this semester
Chapter 8 of 11 · ECMT1010

Simple Linear Regression

Week 10 covers modelling a straight-line relationship between two quantitative variables: the least-squares line ŷ = b₀ + b₁x, with slope b₁ = r·(sy/sx), the meaning of the slope and intercept, residuals, inference for the slope (t with df = n − 2), and goodness of fit via R² = r² and the ANOVA decomposition. It is examined in short-answer questions: interpret the slope, predict (and avoid extrapolation), test H₀: β₁ = 0, and report R² in words.

In this chapter

What this chapter covers

  • 1. The least-squares line ŷ = b₀ + b₁x, fitted to minimise Σ(yᵢ − ŷᵢ)²
  • 2. Slope b₁ = r·(sy/sx) and intercept b₀ = ȳ − b₁·x̄
  • 3. Interpreting the slope (change in ŷ per 1-unit x) and intercept (ŷ at x = 0)
  • 4. Residuals eᵢ = yᵢ − ŷᵢ and the extrapolation warning (only predict within the x range)
  • 5. Inference for the slope: t = b₁/SE(b₁), df = n − 2, and a CI for β₁
  • 6. The correlation test t = r√(n − 2)/√(1 − r²) on df = n − 2
  • 7. R² = r² (simple regression): the share of variation in y explained by x
  • 8. The ANOVA decomposition SST = SSModel + SSE and the standard error of the regression sₑ
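
The slope, intercept and residual formulas in the list above can be sketched in a few lines of Python. The five data points are invented purely for illustration (they are not from the unit), and the sample standard deviations are assumed to use the usual n − 1 divisor.

```python
from math import sqrt

# Hypothetical data, invented for illustration (not from the unit).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squares and cross-products about the means.
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Sample standard deviations (n - 1 divisor) and the correlation r.
s_x = sqrt(sxx / (n - 1))
s_y = sqrt(syy / (n - 1))
r = sxy / sqrt(sxx * syy)

# Slope and intercept from the chapter's formulas.
b1 = r * (s_y / s_x)
b0 = y_bar - b1 * x_bar

# Fitted values and residuals e_i = y_i - y_hat_i.
y_hat = [b0 + b1 * xi for xi in x]
residuals = [yi - yh for yi, yh in zip(y, y_hat)]

print(f"y_hat = {b0:.3f} + {b1:.3f}x")
print("residuals:", [round(e, 2) for e in residuals])
```

For this toy data the fitted line is ŷ = 2.2 + 0.6x, and the residuals sum to zero (up to rounding), which is a property of any least-squares line that includes an intercept.
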
Worked example

Regression slope, prediction, fit and a slope test

Q [8 marks]. A regression of weekly revenue (y, $000) on advertising spend (x, $000) for n = 22 cafés gives ŷ = 9.6 + 1.40x, with correlation r = 0.72. (a) Interpret the slope. (b) Predict revenue when ad spend is $5,000. (c) What proportion of the variation in revenue is explained by ad spend? (d) The slope's standard error is SE(b₁) = 0.38 — test H₀: β₁ = 0 at the 5% level.
  • (a) [2 marks] Interpret the slope: each extra $1,000 of advertising spend is associated with $1,400 more weekly revenue (slope 1.40, both variables measured in $000).
  • (b) [2 marks] Predict at x = 5: ŷ = 9.6 + 1.40(5) = 9.6 + 7.0 = 16.6 → about $16,600 weekly revenue.
  • (c) [2 marks] Goodness of fit: R² = r² = 0.72² = 0.518, so about 52% of the variation in revenue is explained by advertising spend.
  • (d) [2 marks] Test the slope: t = b₁/SE(b₁) = 1.40/0.38 ≈ 3.68 on df = n − 2 = 20; the two-sided 5% critical value is t(20) ≈ 2.086. Since 3.68 > 2.086, reject H₀. There is strong evidence that revenue is positively related to ad spend.
Answer: (a) +$1,400 revenue per extra $1,000 of ad spend; (b) ŷ ≈ $16,600 at $5,000 spend; (c) R² = 0.518, ~52% of variation explained; (d) t ≈ 3.68 > 2.086, so reject H₀: a significant positive relationship.
Sia tip — State the slope in real units with the direction, not just the number. Use R² = r² only in simple regression, and report it as 'X% of the variation in y is explained by x'. Predicting outside the range of x in the data is extrapolation — flag it rather than trusting the number.
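
As a quick check of the arithmetic in parts (b) to (d), the sketch below redoes the calculations with the figures quoted in the question. The critical value 2.086 is taken from the worked solution rather than computed; the variable names are just illustrative.

```python
# Figures quoted in the worked example above.
b0, b1 = 9.6, 1.40   # fitted line y_hat = 9.6 + 1.40x (both variables in $000)
r, n = 0.72, 22      # correlation and sample size
se_b1 = 0.38         # standard error of the slope
t_crit = 2.086       # two-sided 5% critical value on df = 20, as quoted above

y_hat = b0 + b1 * 5              # (b) prediction at x = 5, i.e. $5,000 of ad spend
r_squared = r ** 2               # (c) R^2 = r^2 (simple regression only)
df = n - 2                       # (d) degrees of freedom for the slope test
t_stat = b1 / se_b1              # (d) test statistic for H0: beta1 = 0
reject_h0 = abs(t_stat) > t_crit

print(f"(b) y_hat = {y_hat:.1f} ($000)")
print(f"(c) R^2 = {r_squared:.3f}")
print(f"(d) t = {t_stat:.2f} on df = {df}, reject H0: {reject_h0}")
```

The printed values match the worked solution: 16.6, 0.518, and t ≈ 3.68 on 20 df with H₀ rejected.
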
Glossary

Key terms

Least-squares line
The fitted line ŷ = b₀ + b₁x chosen to minimise the sum of squared residuals Σ(yᵢ − ŷᵢ)². It is the best straight-line summary of how y depends on x.
Slope (b₁)
The predicted change in y per one-unit increase in x, computed as b₁ = r·(sy/sx). Its sign matches the sign of the correlation, and inference about it tests whether x and y are linearly related.
Residual
The vertical gap between an observed and a predicted value, eᵢ = yᵢ − ŷᵢ. Residual plots are used to check the model assumptions of linearity and constant spread.
R² (coefficient of determination)
The proportion of the variation in y explained by the regression, between 0 and 1. In simple linear regression R² = r², so an r of 0.72 gives R² ≈ 0.52, meaning about 52% of the variation in y is explained by x.
Slope inference
Testing H₀: β₁ = 0 with t = b₁/SE(b₁) on df = n − 2, or building a CI b₁ ± t*(df, α/2)·SE(b₁). Rejecting H₀ is evidence of a real linear relationship.
ANOVA decomposition
The split of total variation SST = SSModel + SSE into the part explained by the model and the leftover error. R² = SSModel/SST, and the standard error of the regression is sₑ = √(SSE/(n − 2)).
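
The ANOVA decomposition can be checked numerically. The sketch below uses five invented data points (not from the unit) and computes the slope as Σ(x − x̄)(y − ȳ)/Σ(x − x̄)², which is algebraically the same as r·(sy/sx).

```python
from math import sqrt

# Hypothetical data, invented for illustration (not from the unit).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope and intercept.
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

# The three sums of squares in the decomposition.
sst = sum((yi - y_bar) ** 2 for yi in y)                # total variation in y
ss_model = sum((yh - y_bar) ** 2 for yh in y_hat)       # explained by the line
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # leftover error

r_squared = ss_model / sst
s_e = sqrt(sse / (n - 2))   # standard error of the regression

print(f"SST = {sst:.2f}, SSModel = {ss_model:.2f}, SSE = {sse:.2f}")
print(f"R^2 = {r_squared:.3f}, s_e = {s_e:.3f}")
```

For this toy data SST = 6.0 splits into SSModel = 3.6 and SSE = 2.4, so R² = 0.6 and sₑ = √(2.4/3) ≈ 0.894.
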
FAQ

Simple Linear Regression FAQ

How do I interpret the slope and intercept correctly?

The slope is the predicted change in y for a one-unit increase in x, stated in the real units of both variables and with the right sign — e.g. '+$1,400 of revenue per extra $1,000 of ad spend'. The intercept is the predicted y when x = 0; it is only meaningful if x = 0 is sensible and within the data range, otherwise it is just where the line crosses the axis (often an extrapolation) and should not be over-interpreted.

What does R² actually mean and why does R² = r²?

R² is the proportion of the variation in y that the regression explains, ranging from 0 (no explanatory power) to 1 (a perfect fit). You report it as 'X% of the variation in y is explained by x'. In simple linear regression — one predictor — it equals the square of the correlation, R² = r², because the single predictor's linear association with y is exactly what r measures. With more predictors (a later unit) this identity no longer holds.
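
For readers who want the algebra behind the identity, here is a short derivation sketch built only from the chapter's formulas b₁ = r·(sy/sx), b₀ = ȳ − b₁·x̄ and R² = SSModel/SST. It assumes sx and sy are sample standard deviations with the n − 1 divisor, so that SST = (n − 1)·sy².

```latex
\begin{aligned}
\hat{y}_i - \bar{y} &= (b_0 + b_1 x_i) - (b_0 + b_1 \bar{x}) = b_1 (x_i - \bar{x}) \\
\text{SSModel} &= \sum_i (\hat{y}_i - \bar{y})^2
                = b_1^2 \sum_i (x_i - \bar{x})^2
                = b_1^2 (n - 1) s_x^2 \\
               &= r^2 \frac{s_y^2}{s_x^2} (n - 1) s_x^2
                = r^2 (n - 1) s_y^2
                = r^2 \, \text{SST} \\
R^2 &= \frac{\text{SSModel}}{\text{SST}} = r^2
\end{aligned}
```

The first line uses ȳ = b₀ + b₁·x̄, which is just the intercept formula rearranged.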

Why is the slope test on n − 2 degrees of freedom?

Because fitting a line estimates two parameters from the data — the intercept b₀ and the slope b₁ — so two degrees of freedom are 'used up', leaving n − 2 for the error. That is the df you read the t table at when testing H₀: β₁ = 0 with t = b₁/SE(b₁), and also the divisor in the standard error of the regression sₑ = √(SSE/(n − 2)). State the df explicitly so the marker can check the table line.
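
The same df = n − 2 line of the t table also supplies the multiplier for a confidence interval for β₁. The sketch below applies b₁ ± t*·SE(b₁) to the figures from the worked example (b₁ = 1.40, SE(b₁) = 0.38, n = 22), taking the two-sided 5% critical value 2.086 as quoted there rather than computing it.

```python
# Figures from the worked example; t_star is the quoted table value for df = 20.
b1 = 1.40
se_b1 = 0.38
n = 22
df = n - 2          # two parameters (b0 and b1) estimated from the data
t_star = 2.086      # t*(df = 20, alpha/2 = 0.025), as quoted in the worked example

margin = t_star * se_b1
ci_lower = b1 - margin
ci_upper = b1 + margin

print(f"df = {df}")
print(f"95% CI for beta1: ({ci_lower:.3f}, {ci_upper:.3f})")
```

The interval is roughly 0.61 to 2.19, that is, about $610 to $2,190 more weekly revenue per extra $1,000 of ad spend. It excludes 0, which agrees with rejecting H₀: β₁ = 0 at the 5% level.
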

What is extrapolation and why is it a problem?

Extrapolation is using the fitted line to predict y for an x value outside the range observed in the data. It is risky because you have no evidence the linear relationship continues beyond that range — the true relationship could bend, flatten or reverse. The line is only validated where there are data, so predictions inside the x range are reasonable, but a prediction far outside should be flagged as unreliable rather than reported as fact.
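
One way to build the habit of flagging extrapolation is to check the x value against the data range before trusting a prediction. The sketch below uses the worked example's line ŷ = 9.6 + 1.40x; the observed range of ad spend (1 to 8, in $000) is a made-up assumption, since the question does not state it, and `predict_revenue` is a hypothetical helper name.

```python
def predict_revenue(x, b0=9.6, b1=1.40, x_min=1.0, x_max=8.0):
    """Return (y_hat, is_extrapolation) for ad spend x in $000.

    x_min and x_max are ASSUMED bounds of the observed ad spend;
    the worked example does not give the real range.
    """
    y_hat = b0 + b1 * x
    is_extrapolation = not (x_min <= x <= x_max)
    return y_hat, is_extrapolation


for spend in (5.0, 20.0):
    y_hat, flagged = predict_revenue(spend)
    note = "extrapolation, treat as unreliable" if flagged else "within the data range"
    print(f"x = {spend:.1f}: y_hat = {y_hat:.1f} ({note})")
```

Under the assumed range, the prediction at x = 5 is inside the data, while the prediction at x = 20 is produced by the same formula but carries no support from the data and should be flagged.
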

Study strategy

Exam move

Treat a regression question as a checklist of standard sub-tasks the exam strings together: interpret the slope (units + direction), predict a value (and check it is not extrapolation), report R² in words, and test the slope with t = b₁/SE(b₁) on n − 2 df. Practise writing slope interpretations as full sentences in the variables' real units, because a bare number rarely earns the mark. Memorise that R² = r² only in simple regression and that fit means 'share of variation in y explained'. Keep the ANOVA picture in mind — SST splits into explained (SSModel) and leftover (SSE) — so you can connect R², sₑ and the slope test. Always flag extrapolation and finish a slope test with a strength-of-evidence sentence about the relationship.
