ECON20003 · Quantitative Methods 2
Correlation & Simple Linear Regression
Correlation & Simple Linear Regression moves from comparing groups to modelling a relationship between two numerical variables. You measure linear association with covariance and the correlation coefficient r, then fit the ordinary-least-squares line ŷ = β̂₀ + β̂₁x. Inference centres on the t-test of the slope (is there a real linear relationship?), the coefficient of determination R² (how much variation the line explains, equal to r² in simple regression), and the distinction between a confidence interval for the mean response and a wider prediction interval for an individual value.
What this chapter covers
- 01 Covariance and correlation r ∈ [−1, 1]; test ρ = 0 with t = r√(n−2)/√(1−r²)
- 02 OLS estimates: β̂₁ = S_xy/S²_x, β̂₀ = ȳ − β̂₁x̄
- 03 Gauss-Markov assumptions for valid inference
- 04 Slope t-test: t = β̂₁/SE(β̂₁), df = n − 2
- 05 R² = 1 − SSE/SST = r² in simple regression
- 06 Confidence interval for the mean response vs the wider prediction interval
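The formulas in this list can be checked against each other numerically. The following is a minimal sketch, not part of the course material: the six data points are invented for illustration, and the variable names (`s_xx`, `s_xy`, `se_b1` and so on) are assumptions of this example. It uses only the Python standard library.

```python
import math
from statistics import mean

# Hypothetical data, invented purely for illustration (not from the chapter).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.8]

n = len(x)
x_bar, y_bar = mean(x), mean(y)

# Sample variances and covariance (divisor n - 1).
s_xx = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
s_yy = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

# Correlation coefficient and OLS estimates.
r = s_xy / math.sqrt(s_xx * s_yy)
b1 = s_xy / s_xx           # slope: S_xy / S²_x
b0 = y_bar - b1 * x_bar    # intercept: ȳ − β̂₁x̄

# R² from the sums of squares.
fitted = [b0 + b1 * xi for xi in x]
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
sst = sum((yi - y_bar) ** 2 for yi in y)
r_squared = 1 - sse / sst

# Slope t-statistic and correlation t-statistic, both with df = n - 2.
s_e = math.sqrt(sse / (n - 2))                 # residual standard error
se_b1 = s_e / math.sqrt((n - 1) * s_xx)        # SE(β̂₁)
t_slope = b1 / se_b1
t_corr = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

print(f"r = {r:.4f}, slope = {b1:.4f}, intercept = {b0:.4f}")
print(f"R^2 = {r_squared:.4f}, r^2 = {r ** 2:.4f}")
print(f"t (slope) = {t_slope:.4f}, t (correlation) = {t_corr:.4f}")
```

In simple regression the two printed t-values agree, as do R² and r², which is why both work as quick self-checks.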
t-test on a regression slope and R²
- 1 mark: State the hypotheses for the slope (two-tailed): H₀: β₁ = 0 versus H₁: β₁ ≠ 0.
- 1 mark: Identify the distribution: t with df = n − 2 = 18.
- 2 marks: Compute the test statistic: t = β̂₁/SE(β̂₁) = 0.45/0.15 = 3.0.
- 1 mark: Decision rule: the two-tailed critical value is t₀.₀₂₅,₁₈ = 2.101. Since 3.0 > 2.101, reject H₀.
- 2 marks: Coefficient of determination: in simple regression R² = r² = 0.60² = 0.36.
- 1 mark: Conclude in context: there is significant evidence of a positive linear relationship; advertising spend explains about 36% of the variation in weekly sales.
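The arithmetic in the steps above can be reproduced in a few lines. This is a sketch only: the numbers are the ones quoted in the worked solution, n = 20 is inferred from df = n − 2 = 18, and the variable names are assumptions of this example.

```python
import math

# Values quoted in the worked solution; n = 20 is implied by df = n - 2 = 18.
n = 20
slope, se_slope = 0.45, 0.15
r = 0.60
t_crit = 2.101  # two-tailed critical value t(0.025, 18), as quoted in the solution

df = n - 2
t_stat = slope / se_slope          # test statistic for H0: beta1 = 0
reject_h0 = abs(t_stat) > t_crit   # two-tailed decision rule
r_squared = r ** 2                 # R² = r² holds in simple regression only

print(f"df = {df}, t = {t_stat:.2f}, reject H0: {reject_h0}, R^2 = {r_squared:.2f}")
# prints: df = 18, t = 3.00, reject H0: True, R^2 = 0.36
```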
Key terms
- Correlation coefficient r
- A unit-free measure of linear association between two variables, ranging from −1 to +1. It captures only the strength and direction of a linear relationship: two variables with a strong curved relationship can still have r near 0.
- OLS slope β̂₁
- The least-squares estimate of how much ŷ changes per one-unit increase in x: β̂₁ = S_xy/S²_x, the sample covariance of x and y divided by the sample variance of x. Together with β̂₀, it minimises the sum of squared residuals between the observed and fitted values.
- Coefficient of determination R²
- The proportion of variation in Y explained by the model, R² = 1 − SSE/SST, between 0 and 1. In simple linear regression R² equals r².
- Confidence vs prediction interval
- A confidence interval estimates the MEAN response at a given x; a prediction interval estimates an INDIVIDUAL future value. The prediction interval is wider because it adds the variance of a single observation around the line, which appears as an extra +1 under the square root in the standard-error formula.
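Written out, the two intervals differ only by that extra 1 under the root. The expressions below are the standard simple-regression forms; the symbols x₀ (the chosen value of x), ŷ₀ = β̂₀ + β̂₁x₀ and s_e = √(SSE/(n − 2)) are notation assumed here rather than taken from the chapter.

```latex
\begin{align*}
\text{CI for the mean response:}\quad
  & \hat{y}_0 \pm t_{\alpha/2,\,n-2}\; s_e
    \sqrt{\frac{1}{n} + \frac{(x_0-\bar{x})^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2}} \\[4pt]
\text{PI for an individual value:}\quad
  & \hat{y}_0 \pm t_{\alpha/2,\,n-2}\; s_e
    \sqrt{1 + \frac{1}{n} + \frac{(x_0-\bar{x})^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2}}
\end{align*}
```

Both intervals are narrowest at x₀ = x̄ and widen as x₀ moves away from the mean of x.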
Correlation & Simple Linear Regression FAQ
Why is a prediction interval wider than a confidence interval at the same x?
A confidence interval captures uncertainty about the average response at that x. A prediction interval must also capture the scatter of a single new observation around that average, so its standard error carries an extra variance term (a +1 under the square root) and the interval is always wider.
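A small numerical illustration of this answer, offered as a sketch under stated assumptions: the data are invented, the helper `half_widths` is hypothetical, and the critical value 2.776 is the two-tailed 5% t value for df = 4, matching the six invented observations.

```python
import math
from statistics import mean


def half_widths(x, y, x0, t_crit):
    """Return (CI half-width, PI half-width) at x0 for a simple OLS fit."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = ss_xy / ss_x
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(sse / (n - 2))                # residual standard error
    core = 1 / n + (x0 - x_bar) ** 2 / ss_x       # shared term under the root
    ci = t_crit * s_e * math.sqrt(core)           # mean response
    pi = t_crit * s_e * math.sqrt(1 + core)       # individual value: extra +1
    return ci, pi


# Hypothetical data (n = 6, so df = 4).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.8]

ci_hw, pi_hw = half_widths(x, y, x0=3.5, t_crit=2.776)
print(f"CI half-width: {ci_hw:.3f}, PI half-width: {pi_hw:.3f}")
```

Because the only difference is the 1 added under the root, the prediction half-width exceeds the confidence half-width at every x₀, not just at the value tried here.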
Does a significant slope prove advertising causes sales?
No. Regression establishes association, not causation. A significant slope means a linear relationship is statistically detectable, but omitted variables, reverse causality or confounding can drive it — a point the specification chapter develops further.
Exam move
Practise reading the slope's Estimate, Std. Error, t value and Pr(>|t|) straight off an R lm summary, and convert the slope into a one-sentence interpretation in the units of the problem. Keep df = n − 2 and remember R² = r² as quick self-checks.