ADELAIDE · S1 2026 · FACULTY OF BUSINESS & ECONOMICS

ECON2515 · Intermediate Applied Econometrics II

50% final exam · hurdle · 14 Chapters · 8-page Bible

Our own words - no uploaded lecturer files

Built to mirror S1 2026 · updated this semester

Chapter 2 of 10 · ECON 2515

Simple Linear Regression and the OLS Estimator

Week 2 is the hinge of ECON 2515: it moves from describing data to modelling a relationship with the population model y = β₁ + β₂x + u, whose conditional mean is the population line E[y|x] = β₁ + β₂x. That line is estimated from one sample by ordinary least squares (OLS), which picks the slope and intercept that minimise the sum of squared residuals Σû². The examined skills are separating the population line (β, u) from the estimated one (β̂, û), computing β̂₂ and β̂₁ by hand, listing the SLR.1–SLR.5 assumptions, and explaining unbiasedness and the standard error of the slope. The exam rewards interpretation and judgement over formula recall, since a formula sheet and statistical tables are provided.

In this chapter

What this chapter covers

1. Population regression function: y = β₁ + β₂x + u, with E[y|x] = β₁ + β₂x; the β's are fixed unknown parameters, u holds all other factors
2. Linear in the parameters: β₁ + β₂ln(x) is fine; β₂²x is not (you may transform x, not the β's)
3. Sample regression function: ŷ = β̂₁ + β̂₂x; the β̂'s are statistics that vary from sample to sample, û = y − ŷ
4. Error u vs residual û: u is the gap to the true line (unseen); û is the gap to the fitted line (observed)
5. Least-squares criterion: choose β̂₁, β̂₂ to minimise Σû² = Σ(y − ŷ)²; the line runs through (x̄, ȳ)
6. OLS slope & intercept: β̂₂ = Σ(x−x̄)(y−ȳ) / Σ(x−x̄)², β̂₁ = ȳ − β̂₂x̄; interpret in the variables' units
7. Assumptions SLR.1–SLR.5: linearity in parameters, random sampling, variation in x, zero conditional mean E[u|x]=0, homoskedasticity
8. Unbiasedness (SLR.1–4) & variance (needs SLR.5): Var(β̂₂) = σ² / Σ(x−x̄)², se = √(σ̂²/Σ(x−x̄)²), σ̂² = Σû²/(n−k)

Worked example · free

Hand-compute the OLS line and read the residual

Q [8 marks]. A bakery records morning foot-traffic x (hundreds of passers-by) and loaves sold y over 5 days: (x, y) = (2, 18), (3, 21), (4, 30), (6, 39), (5, 27). (a) Find β̂₂ and β̂₁. (b) Interpret both. (c) Give the fitted value and residual for the (4, 30) day.

[+2] Means: x̄ = (2+3+4+6+5)/5 = 4; ȳ = (18+21+30+39+27)/5 = 27.
[+2] Deviation table (x−x̄, y−ȳ): (−2,−9), (−1,−6), (0,3), (2,12), (1,0). Products (x−x̄)(y−ȳ): 18, 6, 0, 24, 0 → Σ = 48. Squares (x−x̄)²: 4, 1, 0, 4, 1 → Σ = 10.
[+2] (a) β̂₂ = 48 ÷ 10 = 4.8; β̂₁ = ȳ − β̂₂x̄ = 27 − 4.8·4 = 7.8. So ŷ = 7.8 + 4.8x. Check the centroid: 7.8 + 4.8·4 = 27 = ȳ ✓.
[+1] (b) β̂₂ = 4.8: each extra 100 passers-by is associated with about 4.8 more loaves sold, on average. β̂₁ = 7.8: predicted sales at zero foot-traffic, which is an anchor only (x = 0 is outside the data) and not economically meaningful.
[+1] (c) At x = 4: ŷ = 7.8 + 4.8·4 = 27.0; residual û = y − ŷ = 30 − 27.0 = +3.0 (that day sold 3 loaves above the line).

ŷ = 7.8 + 4.8x; each extra 100 passers-by is associated with ≈ 4.8 more loaves. At x = 4 the fitted value is 27.0 and the residual is +3.0.

Sia tip — The whole calculation is one deviation table: slope = Σ(product) ÷ Σ(x-deviation²), then β̂₁ = ȳ − β̂₂x̄. Always confirm the line runs through (x̄, ȳ) — if β̂₁ + β̂₂x̄ ≠ ȳ, a mean or a deviation is wrong.
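The deviation-table calculation in the worked example can be checked in a few lines of Python. This is a minimal sketch using only the five bakery observations from the question; the function name `ols_line` is a hypothetical helper introduced here for illustration, not part of any course material.

```python
# Bakery data from the worked example: x = foot-traffic (hundreds), y = loaves sold.
x = [2, 3, 4, 6, 5]
y = [18, 21, 30, 39, 27]

def ols_line(x, y):
    """Return (intercept, slope) using the deviation-table formulas."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of products of deviations and sum of squared x-deviations.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("no variation in x: the slope is undefined (SLR.3 fails)")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

b1, b2 = ols_line(x, y)
fitted_at_4 = b1 + b2 * 4          # fitted value for the (4, 30) day
residual_at_4 = 30 - fitted_at_4   # observed y minus fitted y
print(round(b1, 2), round(b2, 2), round(fitted_at_4, 2), round(residual_at_4, 2))
# prints: 7.8 4.8 27.0 3.0
```

The `sxx == 0` guard mirrors SLR.3: with no spread in x the slope formula divides by zero, so no line can be estimated.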

Glossary

Key terms

Population regression function (PRF): The true relationship in the whole population, y = β₁ + β₂x + u, with conditional mean E[y|x] = β₁ + β₂x. β₁ and β₂ are fixed, unknown parameters; the error u collects every other factor affecting y and is never observed.
Sample regression function (SRF): The line OLS fits to one sample, ŷ = β̂₁ + β̂₂x. The estimators β̂₁, β̂₂ are sample statistics — a different sample gives a different line — and û = y − ŷ is the residual.
Error u vs residual û: u is the population error: a point's vertical gap to the true (PRF) line, unobservable. û is the sample residual: the gap to the fitted (SRF) line, observable. They are not interchangeable, and OLS never sees u.
OLS / least-squares criterion: Ordinary least squares chooses β̂₁, β̂₂ to minimise the sum of squared residuals Σû² = Σ(y − ŷ)². Squaring treats over- and under-predictions symmetrically and yields closed-form estimators; the resulting line always passes through the centroid (x̄, ȳ).
OLS slope and intercept: β̂₂ = Σ(xᵢ−x̄)(yᵢ−ȳ) / Σ(xᵢ−x̄)² (the sample covariance of x and y over the sample variance of x) and β̂₁ = ȳ − β̂₂x̄. The slope is the estimated change in mean y per one-unit rise in x, and it carries a ceteris-paribus (causal) reading only when SLR.4 holds; the intercept is often just a mathematical anchor.
SLR.1–SLR.5 assumptions: SLR.1 linear in parameters; SLR.2 random sampling; SLR.3 variation in x (needed to estimate a slope); SLR.4 zero conditional mean E[u|x]=0 (gives unbiasedness and a causal reading); SLR.5 homoskedasticity Var(u|x)=σ² (validates the usual standard errors).
Unbiasedness: Under SLR.1–SLR.4, E[β̂] = β: averaged over many samples the estimates centre on the truth. It is a property of the estimator across repeated samples, NOT a claim that any single estimate equals the true parameter.
Variance and standard error of β̂₂: Under SLR.1–SLR.5, Var(β̂₂) = σ² / Σ(xᵢ−x̄)². It rises with the error variance σ² and falls as the spread in x or the sample size grows, because both enlarge Σ(xᵢ−x̄)². Since σ² is unknown it is estimated by σ̂² = Σû²/(n−k), where k is the number of estimated parameters (k = 2 in simple regression), and se(β̂₂) = √(σ̂²/Σ(xᵢ−x̄)²) is the denominator of every slope t-test.
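To make the standard-error formula concrete, the sketch below applies it to the five bakery observations from the worked example. It assumes k = 2 (intercept and slope), so the degrees of freedom are n − 2 = 3; the variable names are illustrative only.

```python
import math

# Bakery data from the worked example.
x = [2, 3, 4, 6, 5]
y = [18, 21, 30, 39, 27]
n, k = len(x), 2  # k = 2 estimated parameters (intercept and slope)

x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b2 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b1 = y_bar - b2 * x_bar

# Residuals are the gaps to the fitted line; they sum to zero by construction.
residuals = [yi - (b1 + b2 * xi) for xi, yi in zip(x, y)]
ssr = sum(u ** 2 for u in residuals)     # sum of squared residuals
sigma2_hat = ssr / (n - k)               # estimate of the error variance
se_b2 = math.sqrt(sigma2_hat / sxx)      # standard error of the slope
print(round(ssr, 2), round(sigma2_hat, 2), round(se_b2, 3))
# prints: 39.6 13.2 1.149
```

With only five observations the slope estimate of 4.8 comes with a standard error of about 1.15, which is why a small sample gives a fairly imprecise slope.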

FAQ

Simple Linear Regression and the OLS Estimator FAQ

What is the difference between the PRF and the SRF?

The population regression function (PRF) is the true, unseen line y = β₁ + β₂x + u, whose coefficients β are fixed unknown constants. The sample regression function (SRF) ŷ = β̂₁ + β̂₂x is the line OLS estimates from one sample, whose coefficients β̂ are statistics that change from sample to sample. In one phrase: β is the truth, β̂ is the guess from data. This distinction shows up on almost every ECON 2515 paper, so state it in symbols and in words.

Is the residual û the same as the error u?

No, and mixing them up is a classic mark-loser. The error u is the vertical gap from a point to the TRUE population line, and it is unobservable. The residual û = y − ŷ is the gap to the FITTED sample line, and it is observable and sample-specific. OLS minimises Σû²; it never observes u. Also, 'OLS estimates u' is wrong — OLS estimates the parameters β (producing β̂).

How do I compute β̂₂ and β̂₁ by hand?

Build one deviation table. Find x̄ and ȳ, then the columns (x−x̄), (y−ȳ), their product, and (x−x̄)². The slope is β̂₂ = Σ(product) ÷ Σ(x−x̄)², and the intercept is β̂₁ = ȳ − β̂₂x̄. Finish with the free check: β̂₁ + β̂₂x̄ should equal ȳ exactly, because the OLS line always passes through the centroid (x̄, ȳ).
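For readers who want to see where the two formulas come from, here is a short derivation sketch. It minimises the sum of squared residuals with respect to each estimate, using the same symbols as the rest of the chapter; the function name S is notation introduced here.

```latex
\begin{align*}
S(\hat\beta_1,\hat\beta_2) &= \sum_{i=1}^{n}\bigl(y_i-\hat\beta_1-\hat\beta_2 x_i\bigr)^2 \\[4pt]
\frac{\partial S}{\partial\hat\beta_1}
  &= -2\sum_{i=1}^{n}\bigl(y_i-\hat\beta_1-\hat\beta_2 x_i\bigr)=0
  \;\Longrightarrow\; \hat\beta_1=\bar y-\hat\beta_2\bar x \\[4pt]
\frac{\partial S}{\partial\hat\beta_2}
  &= -2\sum_{i=1}^{n}x_i\bigl(y_i-\hat\beta_1-\hat\beta_2 x_i\bigr)=0
  \;\Longrightarrow\; \hat\beta_2=\frac{\sum_{i=1}^{n}(x_i-\bar x)(y_i-\bar y)}{\sum_{i=1}^{n}(x_i-\bar x)^2}
\end{align*}
```

The first condition says the residuals sum to zero, which rearranges to ȳ = β̂₁ + β̂₂x̄. That is exactly the centroid check. The second follows by substituting β̂₁ = ȳ − β̂₂x̄ into the condition and using Σ(yᵢ − ȳ) = 0 and Σ(xᵢ − x̄) = 0.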

What does each SLR assumption actually buy me?

SLR.1–SLR.3 (linearity, random sampling, variation in x) let you fit a line at all — without variation in x the slope is undefined. SLR.4, zero conditional mean E[u|x]=0, is the one that makes β̂ unbiased and lets you read the slope causally; its failure (endogeneity) is the headline bias. SLR.5, homoskedasticity, makes the usual variance formula and standard errors valid. Unbiasedness needs SLR.1–4; the variance formula also needs SLR.5.

Does 'unbiased' mean my single estimate equals the true value?

No. Unbiasedness (E[β̂] = β) is a property of the estimator over MANY hypothetical samples: if you re-sampled endlessly and averaged the slope estimates they would centre on the truth. It says nothing about your one sample's number being 'right'. In the exam, phrase it as 'on average, across repeated samples' — that wording earns the mark.
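The repeated-sampling idea can be illustrated with a small simulation. The sketch below is an assumption-laden toy, not course material: the "true" values `TRUE_B1`, `TRUE_B2`, the error spread `SIGMA`, the sample size and the number of replications are all hypothetical choices, and the errors are drawn independently of x so that SLR.4 holds by construction.

```python
import random

random.seed(2515)  # fixed seed so the run is repeatable

TRUE_B1, TRUE_B2 = 7.8, 4.8   # hypothetical population parameters
SIGMA = 3.0                   # hypothetical error standard deviation
x = [2, 3, 4, 5, 6] * 4       # fixed regressor values, n = 20

def ols_slope(x, y):
    """OLS slope from the deviation formula."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

slopes = []
for _ in range(5000):
    # Each pass is one hypothetical sample: same x, fresh mean-zero errors.
    y = [TRUE_B1 + TRUE_B2 * xi + random.gauss(0, SIGMA) for xi in x]
    slopes.append(ols_slope(x, y))

mean_slope = sum(slopes) / len(slopes)
print("average slope over 5000 samples:", round(mean_slope, 3))
print("smallest and largest single estimate:",
      round(min(slopes), 2), round(max(slopes), 2))
```

The average should sit very close to 4.8 while individual estimates scatter on both sides of it, which is the distinction between an unbiased estimator and a "correct" single estimate.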

When does a regression slope give a causal effect?

Only when SLR.4 (E[u|x]=0) holds. If an omitted variable — like ability in a wage-on-education regression — is correlated with x and also affects y, the estimate is biased and the slope is just an association. To sign the omitted-variable bias, sign two channels: (omitted → x) and (omitted → y); same sign gives upward bias, opposite signs give downward bias. Say 'associated with', not 'causes', until you can defend exogeneity.
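A simulation can show the sign rule at work. The sketch below invents a wage-on-education data-generating process in which ability raises both education and wages (both channels positive); every number in it, including `TRUE_RETURN`, is a hypothetical choice for illustration. In this particular design the large-sample slope is 0.5 + Cov(educ, ability)/Var(educ) = 0.5 + 2/5 = 0.9, so the estimate should overshoot the true effect.

```python
import random

random.seed(2515)  # fixed seed so the run is repeatable

TRUE_RETURN = 0.5  # hypothetical true effect of education on the wage measure

def ols_slope(x, y):
    """OLS slope from the deviation formula."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

educ, wage = [], []
for _ in range(5000):
    ability = random.gauss(0, 1)                   # omitted from the regression
    e = 12 + 2 * ability + random.gauss(0, 1)      # channel 1: ability raises education (+)
    w = 1 + TRUE_RETURN * e + 1.0 * ability + random.gauss(0, 1)  # channel 2: ability raises wage (+)
    educ.append(e)
    wage.append(w)

# Regressing wage on education alone pushes ability into the error term,
# so the error is correlated with education and SLR.4 fails.
slope = ols_slope(educ, wage)
print("true effect:", TRUE_RETURN, " OLS slope:", round(slope, 3))
```

Flipping the sign of one channel (for example `12 - 2 * ability`) would be expected to push the estimate below 0.5 instead, matching the "opposite signs give downward bias" rule.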

Study strategy

Exam move

Drill three moves until they are automatic. (1) The vocabulary check: for any statement, decide whether it is about the population (β, u, PRF) or the sample (β̂, û, SRF) and use the matching symbol — this alone protects easy MCQ and short-answer marks. (2) The hand computation: from a small dataset, build the deviation table, get β̂₂ = Σ(x−x̄)(y−ȳ)/Σ(x−x̄)² and β̂₁ = ȳ − β̂₂x̄, then verify the centroid — this is the recurring Part-B calculation. (3) The assumption map: memorise which SLR condition does which job (SLR.3 estimability, SLR.4 unbiasedness/causality, SLR.5 valid standard errors) so you can answer 'which assumption is violated and what breaks?' — including the trap that heteroskedasticity leaves β̂ unbiased and only wrecks the SEs. Because the exam is application-focused with a formula sheet provided, spend your revision on interpreting and defending results, not on memorising formulae.
