University of Adelaide · FACULTY OF BUSINESS & ECONOMICS

ECON1012 · Data Analytics

Chapter 9 of 11 · ECON 1012

Correlation & the Regression Model

Correlation & the Regression Model (Module 9, Week 9) is where ECON 1012 starts studying the relationship between two numerical variables. You start with the scatter diagram and its pattern gallery, then measure association two ways: covariance s_xy, whose sign gives the direction but whose size is hard to judge, and the coefficient of correlation r = s_xy/(s_x·s_y), a unit-free number always between −1 and +1. The module then builds the simple linear regression model Yᵢ = β₀ + β₁Xᵢ + εᵢ — an explanatory X, a response Y and an error term — and introduces the OLS estimator, which chooses β̂₀ and β̂₁ to minimise the sum of squared residuals, giving a fitted line Ŷ = β̂₀ + β̂₁X to interpret and predict with. 'Correlation is not causation' is itself examinable wording; Week 10 carries the same model into inference and model fit.

In this chapter

What this chapter covers

  • 01 Scatter diagrams: positive linear, negative linear, nonlinear, or no relationship
  • 02 Sample covariance s_xy = Σ(xᵢ − x̄)(yᵢ − ȳ)/(n − 1): sign gives direction, but size is 'all relative'
  • 03 Correlation r = s_xy/(s_x·s_y): unit-free, always between −1 and +1
  • 04 Correlation ≠ causation: a huge r never proves X drives Y
  • 05 Roles: explanatory X (independent) on the x-axis, response Y (dependent) on the y-axis
  • 06 Population model Yᵢ = β₀ + β₁Xᵢ + εᵢ: slope β₁ = change in Y associated with a unit change in X
  • 07 OLS picks β̂₀ and β̂₁ to minimise the sum of squared residuals Σûᵢ²
  • 08 Prediction: plug X into Ŷ = β̂₀ + β̂₁X; residual ûᵢ = Yᵢ − Ŷᵢ is not the error term εᵢ
Worked example

Covariance, correlation and a first fitted line

Q [10 marks]. A suburban gym trials extra group classes. Over n = 5 weeks it records classes run (x) and casual visit passes sold (y): (2, 38), (4, 33), (6, 45), (8, 44), (10, 55). Summary sums: Σx = 30, Σy = 215, Σxy = 1380, Σx² = 220, Σy² = 9519. (a) Compute the sample covariance. (b) Compute the coefficient of correlation and interpret its direction and strength. (c) For a regression, which variable should be X and which Y? (d) An Excel regression on these data returns Ŷ = 29.5 + 2.25X. Interpret the slope, then predict passes sold in a week with 7 classes.
  • [2 marks] (a) Shortcut formula: s_xy = [Σxy − (Σx)(Σy)/n]/(n − 1) = [1380 − (30 × 215)/5]/4 = (1380 − 1290)/4 = 90/4 = 22.5.
  • [2 marks] (b) Standard deviations first: s_x² = [Σx² − (Σx)²/n]/(n − 1) = (220 − 180)/4 = 10, so s_x = √10 ≈ 3.162; s_y² = [Σy² − (Σy)²/n]/(n − 1) = (9519 − 9245)/4 = 68.5, so s_y = √68.5 ≈ 8.276.
  • [2 marks] (b) r = s_xy/(s_x·s_y) = 22.5/(3.162 × 8.276) = 22.5/26.17 ≈ 0.86, a strong positive linear relationship: weeks with more classes tend to sell more passes.
  • [1 mark] (c) Classes run is what the gym changes (the driver), so X = classes; passes sold is the outcome, so Y = passes. Assign roles from the story, never from which column is listed first.
  • [2 marks] (d) Slope: each additional class per week is, on average, associated with about 2.25 more casual passes sold. State size and direction, and say 'associated with', not causal wording.
  • [1 mark] (d) Prediction at X = 7: Ŷ = 29.5 + 2.25 × 7 = 29.5 + 15.75 = 45.25, i.e. about 45 passes.
(a) s_xy = 22.5; (b) r ≈ 0.86, a strong positive linear relationship; (c) X = classes run, Y = passes sold; (d) each extra class is associated with about 2.25 more passes on average, and the predicted sales for a 7-class week are Ŷ = 45.25 (about 45 passes).
Sia tip — Covariance answers only 'which direction?' — its size depends on the units, so never call 22.5 'strong'; strength comes from r. And the safe slope sentence is 'a one-unit increase in X is, on average, associated with a β̂₁ change in Y' — causal wording loses marks.
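If you want to cross-check parts (a) and (b) away from the calculator, the shortcut sums can be reproduced in a few lines of Python. This is only a sketch for checking your working, not part of the ECON 1012 materials; the variable names are illustrative, and the exam itself is still hand-calculation.

```python
import math

# Gym data from the worked example: classes run (x), casual passes sold (y)
x = [2, 4, 6, 8, 10]
y = [38, 33, 45, 44, 55]
n = len(x)

# Summary sums, matching the ones given in the question
sum_x = sum(x)
sum_y = sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)

# Shortcut formulas with the sample (n - 1) divisor
s_xy = (sum_xy - sum_x * sum_y / n) / (n - 1)
var_x = (sum_x2 - sum_x ** 2 / n) / (n - 1)
var_y = (sum_y2 - sum_y ** 2 / n) / (n - 1)

# Coefficient of correlation: covariance over the product of standard deviations
r = s_xy / (math.sqrt(var_x) * math.sqrt(var_y))

print(f"s_xy = {s_xy:.2f}")                         # s_xy = 22.50
print(f"s_x^2 = {var_x:.2f}, s_y^2 = {var_y:.2f}")  # s_x^2 = 10.00, s_y^2 = 68.50
print(f"r = {r:.2f}")                               # r = 0.86
```

The printed values agree with the marked solution: covariance 22.5, variances 10 and 68.5, and r of about 0.86.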
Glossary

Key terms

Covariance
A measure of how two variables move together: for a sample, s_xy = Σ(xᵢ − x̄)(yᵢ − ȳ)/(n − 1). Positive means same-direction movement, negative means opposite — but its magnitude depends on the units, so it cannot grade strength.
Coefficient of correlation
The covariance divided by the standard deviations of the variables: sample r = s_xy/(s_x·s_y), population ρ = σ_xy/(σ_x·σ_y). It always lies between −1 and +1; ±1 is a perfect linear relationship and 0 is no linear relationship.
Explanatory and response variables
The explanatory (independent) variable X is the driver, plotted on the x-axis; the response (dependent) variable Y is the outcome, plotted on the y-axis. Deciding the roles from the question's story is itself an examinable step.
Simple linear regression model
The population relationship Yᵢ = β₀ + β₁Xᵢ + εᵢ: the line β₀ + β₁X holds on average over the population, β₁ is the change in Y associated with a unit change in X, β₀ is the average value of Y when X = 0, and the error term ε captures everything else that affects Y. ECON 1012 uses one X variable only.
OLS estimator
Ordinary least squares: the rule that chooses β̂₀ and β̂₁ so the fitted regression line is as close as possible to the observed data, by minimising the sum of squared residuals Σûᵢ².
Residual
The gap between an observed value and the fitted line, ûᵢ = Yᵢ − Ŷᵢ. It estimates — but is not the same thing as — the unobservable error term εᵢ = Yᵢ − β₀ − β₁Xᵢ, which is measured from the population line.
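The OLS estimator and residual entries above can be made concrete with the gym data from the worked example. The sketch below assumes the standard closed-form result for a single X, β̂₁ = s_xy/s_x² and β̂₀ = ȳ − β̂₁x̄, which this chapter summary does not derive; the helper name ssr is made up for illustration.

```python
# Gym data from the worked example
x = [2, 4, 6, 8, 10]
y = [38, 33, 45, 44, 55]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Closed-form OLS for one X (standard result, assumed here rather than derived)
s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)
var_x = sum((a - x_bar) ** 2 for a in x) / (n - 1)
b1 = s_xy / var_x          # slope estimate
b0 = y_bar - b1 * x_bar    # intercept estimate


def ssr(intercept, slope):
    """Sum of squared residuals for a candidate line."""
    return sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))


# Residuals: observed Y minus fitted Y-hat on the estimated line
residuals = [b - (b0 + b1 * a) for a, b in zip(x, y)]

print("slope:", b1, "intercept:", b0)
print("residuals:", residuals)
print("SSR on the OLS line:", ssr(b0, b1))
print("SSR with the slope nudged up:", ssr(b0, b1 + 0.1))
print("SSR with the intercept nudged down:", ssr(b0 - 1, b1))
print("prediction at X = 7:", b0 + b1 * 7)
```

The slope and intercept match the Excel output quoted in part (d), and nudging either coefficient away from the OLS values makes the sum of squared residuals larger. The residuals are measured from the fitted line, so they can be computed; the error terms εᵢ cannot, because β₀ and β₁ are unknown.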
FAQ

Correlation & the Regression Model FAQ

What is the difference between covariance and correlation in ECON 1012?

Both measure how two variables move together. Covariance gives the direction — positive, negative, or roughly zero — but its magnitude depends on the units, so judging whether a particular covariance is large is, as the module puts it, 'all relative'. Correlation divides the covariance by both standard deviations, producing a unit-free number between −1 and +1 that grades strength as well as direction. MCQs regularly contrast exactly these two facts.
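A quick way to see 'all relative' in action is to change the units and recompute. The sketch below reuses the gym data from the worked example and invents one change of units purely for illustration: recording sales in tens of passes instead of single passes. The helper functions are written out by hand so nothing depends on a particular library version.

```python
import math


def sample_cov(x, y):
    """Sample covariance with the (n - 1) divisor."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)


def corr(x, y):
    """Coefficient of correlation r = s_xy / (s_x * s_y)."""
    return sample_cov(x, y) / math.sqrt(sample_cov(x, x) * sample_cov(y, y))


classes = [2, 4, 6, 8, 10]
passes = [38, 33, 45, 44, 55]
# Hypothetical change of units: the same sales recorded in tens of passes
passes_tens = [p / 10 for p in passes]

print("original units:", sample_cov(classes, passes), corr(classes, passes))
print("tens of passes:", sample_cov(classes, passes_tens), corr(classes, passes_tens))
```

The covariance shrinks by a factor of ten when the units change, while r stays where it was. That is why the size of a covariance cannot grade strength and r can.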

Does a strong correlation prove that X causes Y?

No. 'Correlation is not causation' is taught explicitly, using spurious pairs with r above 0.95 between absurd, unrelated variables. A high r means a strong linear association, which can come from coincidence or a third variable driving both. When interpreting a slope, write 'associated with' rather than causal language — that wording is exactly what markers look for.
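The third-variable case can be sketched with made-up numbers. Everything in the block below is invented for illustration (it is not module data): a variable z drives both x and y, and the small 'wiggle' terms are deterministic stand-ins for everything else.

```python
import math


def corr(x, y):
    """Coefficient of correlation; the (n - 1) divisors cancel, so they are omitted."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Made-up illustration: a third variable z drives both x and y
z = list(range(1, 21))
wiggle_x = [1 if i % 2 == 0 else -1 for i in range(20)]   # +1, -1, +1, -1, ...
wiggle_y = [2 if i % 4 < 2 else -2 for i in range(20)]    # +2, +2, -2, -2, ...
x = [2 * zi + w for zi, w in zip(z, wiggle_x)]
y = [3 * zi + w for zi, w in zip(z, wiggle_y)]

# Strip out the part of each variable that comes from z
x_net = [xi - 2 * zi for xi, zi in zip(x, z)]
y_net = [yi - 3 * zi for yi, zi in zip(y, z)]

print("r between x and y:", round(corr(x, y), 3))
print("r once z's contribution is removed:", corr(x_net, y_net))
```

Here x never enters the formula for y, yet r between them is very high because both follow z. Once z's contribution is removed, nothing is left linking them, which is the situation 'associated with' wording is designed to respect.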

Do I have to compute the OLS coefficients by hand in Week 9?

Week 9 introduces the model and what OLS does — choose the line that minimises the sum of squared residuals — and the workshop's Excel activity computes the covariance and correlation coefficient. The estimation mechanics, slope significance test and R² arrive in Week 10. The final exam is hand-calculation with a calculator and provided tables, so practise the shortcut sums for s_xy and r now; check myLearning for exactly what the Module 9 quiz covers.

How do I decide which variable is X and which is Y?

X is the explanatory (independent) variable — the one you change or observe as the driver — and Y is the response (dependent) variable, the outcome you want to explain or predict. Read the question's story: 'the effect of advertising on sales' makes advertising X and sales Y, regardless of which column the table lists first.


Study strategy

Exam move

Three r facts are near-guaranteed MCQ material: r lives in [−1, +1]; points exactly on a straight line mean r = +1 or −1, sign set by the slope; and r = 0.80 squares to r² = 0.64, read as '64% of the variation in Y is explained by X' — watch percent-versus-proportion distractors. In case-study answers, interpret a slope with size, direction and 'associated with'; causal wording is a marked error, as is calling a covariance 'strong' when its size is relative to the units. Keep the residual ûᵢ = Yᵢ − Ŷᵢ distinct from the error term εᵢ. Put the shortcut formulas for s_xy, s_x², s_y² and r on your A4 note sheet, and re-attempt the randomised Module 9 quiz on myLearning until they run without thinking.
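Two of these r facts are easy to verify for yourself. The sketch below uses made-up points that sit exactly on a rising line and on a falling line, then shows r² for r = 0.80 written both as a proportion and as a percentage; the data and names are illustrative only.

```python
import math


def corr(x, y):
    """Coefficient of correlation; the (n - 1) divisors cancel, so they are omitted."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


x = [1, 2, 3, 4, 5]
rising = [3 + 2 * a for a in x]     # made-up points exactly on a line with positive slope
falling = [20 - 4 * a for a in x]   # made-up points exactly on a line with negative slope

r_rising = corr(x, rising)
r_falling = corr(x, falling)
print("r on a rising line:", r_rising)
print("r on a falling line:", r_falling)

# r = 0.80 from the text: the same r squared, written two ways
r = 0.80
r_squared = r ** 2
print(f"proportion: {r_squared:.2f}, percentage: {r_squared:.0%}")
```

A perfect line gives r of +1 or −1 whatever the steepness, with the sign following the slope. The last line is the percent-versus-proportion trap: 0.64 and 64% are the same answer, but 0.64% is not.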
