QBUS5001 · Foundation In Data Analytics For Business
Simple Linear Regression
Module 10 fits a straight line to paired observations on two variables. The least-squares estimates are b₁ = Σ(xᵢ−x̄)(yᵢ−ȳ)/Σ(xᵢ−x̄)² and b₀ = ȳ − b₁x̄, giving the fitted line Ŷ = b₀ + b₁X. You interpret the slope (the expected change in Y for a one-unit increase in X) and the intercept, decompose variation as SST = SSR + SSE, and report model fit with R² and the standard error of the regression (SER).
This module is where the descriptive covariance of Module 1 becomes a predictive tool, and it sets up the diagnostics and inference that follow in Module 11.
What this chapter covers
- 01 Population model E(Y|X) = β₀ + β₁X and the error term
- 02 Least-squares slope b₁ and intercept b₀
- 03 Fitted values Ŷ and residuals e = Y − Ŷ
- 04 Interpreting the slope and intercept in business terms
- 05 Sums of squares: SST = SSR + SSE
- 06 R² = SSR/SST as the proportion of variation explained
- 07 Standard error of the regression (SER)
- 08 Prediction and the danger of extrapolation
Estimating a simple linear regression line
- [1 mark] Compute the means: x̄ = (2+3+5+6+9)/5 = 25/5 = 5; ȳ = (6+7+10+11+16)/5 = 50/5 = 10.
- [1 mark] Compute Σ(x−x̄)(y−ȳ): (−3)(−4) + (−2)(−3) + (0)(0) + (1)(1) + (4)(6) = 12 + 6 + 0 + 1 + 24 = 43.
- [1 mark] Compute Σ(x−x̄)²: 9 + 4 + 0 + 1 + 16 = 30.
- [1 mark] Slope: b₁ = 43/30 = 1.4333.
- [1 mark] Intercept: b₀ = ȳ − b₁x̄ = 10 − 1.4333×5 = 10 − 7.1667 = 2.8333. Line: Ŷ = 2.8333 + 1.4333X.
- [1 mark] Interpret: each additional $100 of advertising spend (one unit of X) is associated with about $1,433 more in weekly sales (since Y is in $000s).
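
The hand calculation above can be cross-checked with a short script. This is a minimal sketch in plain Python, assuming the X and Y values are the ones listed in the mean calculations; the variable names (`s_xy`, `s_xx` and so on) are illustrative choices, not notation from the course.

```python
# X and Y values as they appear in the mean calculations of the worked solution.
x = [2, 3, 5, 6, 9]
y = [6, 7, 10, 11, 16]

n = len(x)
x_bar = sum(x) / n  # mean of X
y_bar = sum(y) / n  # mean of Y

# Sum of cross-products of deviations, and sum of squared X deviations.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)

# Least-squares slope and intercept.
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

print(f"x_bar = {x_bar}, y_bar = {y_bar}")
print(f"S_xy = {s_xy}, S_xx = {s_xx}")
print(f"b1 = {b1:.4f}, b0 = {b0:.4f}")
```

The printed slope and intercept should agree with the 1.4333 and 2.8333 obtained by hand.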
Key terms
- Least-squares estimates
- The slope b₁ and intercept b₀ that minimise the sum of squared residuals; b₁ = Σ(x−x̄)(y−ȳ)/Σ(x−x̄)² and b₀ = ȳ − b₁x̄.
- Residual (e)
- The vertical gap between an observed value and the fitted line, eᵢ = yᵢ − ŷᵢ; least squares minimises the sum of their squares.
- SST, SSR, SSE
- Total (SST), explained/regression (SSR) and unexplained/error (SSE) sums of squares; they satisfy SST = SSR + SSE.
- Coefficient of determination (R²)
- R² = SSR/SST, the fraction of the variation in Y explained by the model; it ranges from 0 to 1 in simple regression.
- Standard error of the regression (SER)
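- SER = √(SSE/(n−k−1)), where k is the number of predictors; in simple regression k = 1, so SER = √(SSE/(n−2)). It measures the typical size of a residual in the units of Y; smaller values indicate a tighter fit around the line.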
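
To see how these terms fit together numerically, the sketch below computes the residuals, the three sums of squares, R² and SER. It assumes the five data points from the worked example earlier in this chapter and sets k = 1 for a single predictor; the variable names are illustrative only.

```python
import math

# Data assumed from the worked example in this chapter.
x = [2, 3, 5, 6, 9]
y = [6, 7, 10, 11, 16]
n = len(x)
k = 1  # number of predictors in simple regression

x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares estimates.
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar

# Fitted values and residuals e = y - y_hat.
y_hat = [b0 + b1 * xi for xi in x]
residuals = [yi - yh for yi, yh in zip(y, y_hat)]

# Sums of squares: total, regression (explained), error (unexplained).
sst = sum((yi - y_bar) ** 2 for yi in y)
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
sse = sum(e ** 2 for e in residuals)

r_squared = ssr / sst
ser = math.sqrt(sse / (n - k - 1))

print(f"SST = {sst:.4f}, SSR = {ssr:.4f}, SSE = {sse:.4f}")
print(f"SSR + SSE = {ssr + sse:.4f}")
print(f"R^2 = {r_squared:.4f}, SER = {ser:.4f}")
```

For this dataset SSR + SSE should reproduce SST up to floating-point rounding, which is a quick sanity check on any hand-built sums-of-squares table.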
Simple Linear Regression FAQ
How do I interpret the intercept b₀?
It is the expected value of Y when X = 0. Often this is outside the range of the data (e.g. zero advertising), so treat it as a mathematical anchor for the line rather than a meaningful business prediction unless X = 0 is realistic.
What does R² tell me and what does it not?
R² is the proportion of variation in Y the model explains. It does not tell you whether the relationship is causal, whether the model assumptions hold, or whether predictions outside the data range are safe.
Why is extrapolation risky?
The estimated line is only supported by the range of X observed. Predicting Y for X values far outside that range assumes the linear relationship continues, which the data cannot justify and which often fails in practice.
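
One simple way to make the extrapolation risk visible is to flag any prediction whose X value lies outside the observed range. The sketch below assumes the data from the worked example in this chapter; `predict_with_range_check` is a hypothetical helper written for illustration, not a function from Excel or any library.

```python
# Data assumed from the worked example in this chapter.
x = [2, 3, 5, 6, 9]
y = [6, 7, 10, 11, 16]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar


def predict_with_range_check(x_new, b0, b1, x_min, x_max):
    """Hypothetical helper: return the fitted value and an extrapolation flag."""
    y_hat = b0 + b1 * x_new
    is_extrapolation = not (x_min <= x_new <= x_max)
    return y_hat, is_extrapolation


# X = 7 lies inside the observed range [2, 9]; X = 20 lies far outside it.
for x_new in (7, 20):
    y_hat, is_extrapolation = predict_with_range_check(x_new, b0, b1, min(x), max(x))
    label = "extrapolation, treat with caution" if is_extrapolation else "within observed range"
    print(f"X = {x_new}: predicted Y = {y_hat:.4f} ({label})")
```

The flag does not make an out-of-range prediction wrong; it only signals that the data provide no evidence that the line still holds there.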
Exam move
Practise the slope-intercept calculation by hand on a five- or six-point dataset until the table method is second nature, then verify with Excel's Data Analysis → Regression so you can read the same output the exam shows. Always finish a regression answer with a slope interpretation in the variables' actual units. That sentence is reliably worth a mark and is where rushed answers fall short.