ECON1012 · Data Analytics
Comparing Two Populations
Comparing Two Populations (Module 8, Week 8) extends ECON 1012's one-sample toolkit to the question analysts actually ask: is the mean of one group different from another? The parameter is the difference μ₁ − μ₂, estimated by X̄₁ − X̄₂ — a single random variable with mean μ₁ − μ₂ and variance σ₁²/n₁ + σ₂²/n₂. Everything then splits into three cases: population variances known (a Z statistic, rarely usable in practice), unknown but assumed equal (the pooled-variance t-test with df = n₁ + n₂ − 2), and unknown and unequal (the unequal-variances t-test with its own df formula). For each case you build confidence intervals for μ₁ − μ₂ and run the same six-step hypothesis test and p-value rules learned in Week 7. The module closes with a caution: an observed difference between groups is a comparison, not by itself proof of causation.
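The mean and variance of X̄₁ − X̄₂ quoted above follow from the rules for a linear combination of random variables. The only assumption needed is that the two samples are independent, so the covariance term drops out:

```latex
\begin{aligned}
E(\bar X_1 - \bar X_2) &= E(\bar X_1) - E(\bar X_2) = \mu_1 - \mu_2 \\
V(\bar X_1 - \bar X_2) &= V(\bar X_1) + (-1)^2\, V(\bar X_2) - 2\,\operatorname{Cov}(\bar X_1, \bar X_2) \\
&= \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}
\qquad \text{since } \operatorname{Cov}(\bar X_1, \bar X_2) = 0 \text{ for independent samples}
\end{aligned}
```

The (−1)² factor is why the variances add: squaring the minus sign removes it.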
What this chapter covers
- 01 Parameter of interest: μ₁ − μ₂, estimated by X̄₁ − X̄₂, an unbiased, consistent estimator that is itself a single random variable
- 02 E(X̄₁ − X̄₂) = μ₁ − μ₂ and V(X̄₁ − X̄₂) = σ₁²/n₁ + σ₂²/n₂: variances ADD, even for a difference
- 03 Case 1, σ₁² and σ₂² known: Z = [(X̄₁ − X̄₂) − (μ₁ − μ₂)] / √(σ₁²/n₁ + σ₂²/n₂)
- 04 Case 2, unknown but assumed equal: pooled sₚ² = [(n₁−1)s₁² + (n₂−1)s₂²]/(n₁ + n₂ − 2), t with df = n₁ + n₂ − 2
- 05 Case 3, unknown and assumed different: t = [(X̄₁ − X̄₂) − (μ₁ − μ₂)] / √(s₁²/n₁ + s₂²/n₂) with the unequal-variances df formula
- 06 Confidence intervals: (x̄₁ − x̄₂) ± (critical value) × (standard error), swapping in the case's SE
- 07 Hypothesis tests: H₀: μ₁ − μ₂ = 0 versus H₁: μ₁ − μ₂ < 0, ≠ 0 or > 0, using the same six steps and p-value rules as Week 7
- 08 When variances can be treated as equal, prefer the pooled t: its df is at least as large as the unequal-variances df
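Case 1 is the only case where the standard normal applies directly. The Python sketch below shows the Z statistic and the generic "estimate ± critical value × SE" interval. It assumes, purely for illustration, that the branch figures from the worked example that follows (variances 20 and 28, n = 12 each) were known population variances. The helper names `se_known`, `z_statistic` and `confidence_interval` are hypothetical, not course code.

```python
import math


def se_known(var1, n1, var2, n2):
    """Standard error of xbar1 - xbar2 when sigma1^2, sigma2^2 are known: variances add."""
    return math.sqrt(var1 / n1 + var2 / n2)


def z_statistic(xbar1, var1, n1, xbar2, var2, n2, hyp_diff=0.0):
    """Case 1 statistic: Z = [(xbar1 - xbar2) - (mu1 - mu2)] / SE."""
    return ((xbar1 - xbar2) - hyp_diff) / se_known(var1, n1, var2, n2)


def confidence_interval(diff, critical, se):
    """Generic interval: (xbar1 - xbar2) +/- critical value * standard error."""
    half_width = critical * se
    return diff - half_width, diff + half_width


# Hypothetical: treat 20 and 28 as KNOWN population variances.
se = se_known(20, 12, 28, 12)                    # sqrt(20/12 + 28/12) = sqrt(4) = 2
z = z_statistic(87, 20, 12, 82, 28, 12)          # (87 - 82 - 0) / 2 = 2.5
lo, hi = confidence_interval(87 - 82, 1.96, se)  # z_0.025 = 1.96 from the Z table
print(f"SE = {se:.2f}, Z = {z:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
# prints: SE = 2.00, Z = 2.50, 95% CI = (1.08, 8.92)
```

Only the critical values change between cases: with known variances the right-tail rule at α = 0.05 would use z₀.₀₅ = 1.645 instead of a t-table entry.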
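Pooled-variance t-test and CI for two branch means
Data used in the solution: Northside n₁ = 12, x̄₁ = 87, s₁² = 20; Southside n₂ = 12, x̄₂ = 82, s₂² = 28 (daily sales in cups).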
- (a) Step 1, hypotheses (2 marks): let μ₁ be Northside's mean daily sales and μ₂ Southside's. H₀: μ₁ − μ₂ = 0 versus H₁: μ₁ − μ₂ > 0; the worded claim 'sells more' makes this a right-tail test.
- (a) Step 2, test statistic (1 mark): variances are unknown but assumed equal, so use the pooled-variance t statistic with df = n₁ + n₂ − 2 = 12 + 12 − 2 = 22.
- (a) Steps 3 and 4, significance level and decision rule (1 mark): α = 0.05; reject H₀ if t₀ > t₀.₀₅,₂₂ = 1.717 (from the t table).
- (a) Step 5, pooled variance (2 marks): sₚ² = [(12−1)(20) + (12−1)(28)] / 22 = (220 + 308)/22 = 528/22 = 24.
- (a) Step 5 (cont.), standard error and statistic (2 marks): SE = √(sₚ²(1/n₁ + 1/n₂)) = √(24 × (1/12 + 1/12)) = √4 = 2, so t₀ = (87 − 82 − 0)/2 = 2.50.
- (a) Step 6, conclusion (2 marks): since t₀ = 2.50 > 1.717, reject H₀ in favour of H₁. The one-tail p-value lies between 0.01 and 0.025 (t₀.₀₂₅,₂₂ = 2.074 and t₀.₀₁,₂₂ = 2.508), below α. There is sufficient evidence to infer that Northside's mean daily sales are higher, at a significance level of 5%.
- (b) 95% CI for μ₁ − μ₂ (2 marks): (x̄₁ − x̄₂) ± t₀.₀₂₅,₂₂ × SE = 5 ± 2.074 × 2 = 5 ± 4.148 → (0.85, 9.15) cups per day.
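The hand calculation in (a) and (b) can be checked with a short Python sketch. It uses only the summary statistics from the solution (n₁ = n₂ = 12, x̄₁ = 87, x̄₂ = 82, s₁² = 20, s₂² = 28). Instead of calling a statistics library, it hard-codes the df = 22 row of the t table, mirroring the hand-calculation exam. `pooled_t_test`, `one_tail_p_bracket` and `T_TABLE_DF22` are illustrative names.

```python
import math

# Upper-tail critical values t_{A,22} from a standard t table (A = right-tail area).
# 1.717, 2.074 and 2.508 are quoted in the solution; 2.819 is the A = 0.005 column.
T_TABLE_DF22 = {0.05: 1.717, 0.025: 2.074, 0.01: 2.508, 0.005: 2.819}


def pooled_t_test(xbar1, s2_1, n1, xbar2, s2_2, n2, hyp_diff=0.0):
    """Equal-variances test: return (pooled variance, SE, t statistic, df)."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s2_1 + (n2 - 1) * s2_2) / df  # each variance weighted by n_i - 1
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    t = ((xbar1 - xbar2) - hyp_diff) / se
    return sp2, se, t, df


def one_tail_p_bracket(t, table):
    """Return (low, high) with low < right-tail p-value <= high, read from one table row."""
    upper = 1.0
    for area in sorted(table, reverse=True):  # 0.05, 0.025, 0.01, 0.005
        if t >= table[area]:
            upper = area          # t reaches this critical value, so p <= area
        else:
            return area, upper    # t falls short of this one, so p > area
    return 0.0, upper             # t lies beyond the last column of the row


sp2, se, t, df = pooled_t_test(87, 20, 12, 82, 28, 12)
reject = t > T_TABLE_DF22[0.05]         # Steps 3 and 4: right-tail rule at alpha = 0.05
p_low, p_high = one_tail_p_bracket(t, T_TABLE_DF22)
diff = 87 - 82
half_width = T_TABLE_DF22[0.025] * se   # part (b): a 95% two-sided CI uses t_{0.025,22}
ci = (diff - half_width, diff + half_width)
print(f"sp2 = {sp2:.0f}, SE = {se:.0f}, t = {t:.2f}, df = {df}, reject H0: {reject}")
print(f"p-value in ({p_low}, {p_high}], 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

The bracket function reproduces the Step 6 reasoning: t₀ = 2.50 clears 2.074 but not 2.508, so 0.01 < p ≤ 0.025.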
Key terms
- Difference between two means (μ₁ − μ₂)
- The parameter of interest when comparing two populations, estimated by X̄₁ − X̄₂ — an unbiased and consistent estimator that behaves as a single random variable with mean μ₁ − μ₂ and variance σ₁²/n₁ + σ₂²/n₂.
- Independent samples
- Two random samples drawn separately from two populations, so no observation in one sample is linked to any observation in the other; this is the setting for all of Module 8's intervals and tests.
- Pooled variance sₚ²
- The combined variance estimate sₚ² = [(n₁−1)s₁² + (n₂−1)s₂²]/(n₁ + n₂ − 2), used when the two unknown population variances are assumed equal; each sample's variance is weighted by its degrees of freedom nᵢ − 1.
- Equal-variances (pooled) t-test
- The two-sample test t = [(X̄₁ − X̄₂) − (μ₁ − μ₂)] / √(sₚ²(1/n₁ + 1/n₂)) with df = n₁ + n₂ − 2; preferred whenever there is sufficient evidence that the variances are equal, because its degrees of freedom are at least as large as the unequal-variances alternative.
- Unequal-variances t-test
- The two-sample test t = [(X̄₁ − X̄₂) − (μ₁ − μ₂)] / √(s₁²/n₁ + s₂²/n₂), used when the unknown variances are assumed different; its degrees of freedom come from a separate formula using s₁²/n₁ and s₂²/n₂ and are usually fractional.
- Standard error of X̄₁ − X̄₂
- The square root of σ₁²/n₁ + σ₂²/n₂ (or its sample-based version); because the samples are independent, the variances of the two sample means ADD — they are never subtracted, even though the estimator is a difference.
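A quick Monte Carlo check of the "variances ADD" point, sketched with Python's standard `random` and `statistics` modules. The population values (μ₁ = 87, σ₁ = 6, μ₂ = 82, σ₂ = 4, n₁ = n₂ = 12) and the normal shape of both populations are assumptions made up for illustration; they are not course data.

```python
import random
import statistics

random.seed(1012)  # fixed seed so the run is reproducible

MU1, SIGMA1, N1 = 87.0, 6.0, 12   # hypothetical population 1
MU2, SIGMA2, N2 = 82.0, 4.0, 12   # hypothetical population 2
REPS = 20_000                     # number of simulated pairs of independent samples


def sample_mean(mu, sigma, n):
    """Mean of one random sample of size n drawn from N(mu, sigma^2)."""
    return statistics.fmean(random.gauss(mu, sigma) for _ in range(n))


diffs = [sample_mean(MU1, SIGMA1, N1) - sample_mean(MU2, SIGMA2, N2)
         for _ in range(REPS)]

theory_mean = MU1 - MU2                        # 5.0
theory_var = SIGMA1**2 / N1 + SIGMA2**2 / N2   # 36/12 + 16/12, about 4.333
wrong_var = SIGMA1**2 / N1 - SIGMA2**2 / N2    # 36/12 - 16/12, about 1.667 (the slip)

print(f"simulated mean {statistics.fmean(diffs):.3f} vs theory {theory_mean}")
print(f"simulated variance {statistics.variance(diffs):.3f} vs theory {theory_var:.3f}"
      f" (subtracting would give {wrong_var:.3f})")
```

The simulated variance of X̄₁ − X̄₂ lands near the sum, not the difference, of the two sample-mean variances.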
Comparing Two Populations FAQ
How do I choose between Z, the pooled t, and the unequal-variances t in ECON 1012?
Ask one question: what do you know about the population variances? Both known → Z (the course notes this case is hardly used in practice, since variances are rarely known). Unknown but assumed equal → the pooled-variance t with df = n₁ + n₂ − 2. Unknown and assumed different → the unequal-variances t with its own df formula. The course's rule of thumb: whenever there is sufficient evidence that the variances are equal, prefer the pooled test, because its degrees of freedom are at least as large.
Is a two-population test likely on the ECON 1012 final exam?
The final exam (50%, 180 minutes, invigilated, covering Weeks 1-10) has 25 MCQs plus 3 case-study questions, and the official FAQ says the case studies resemble workshop examples. A six-step two-sample t-test with Type I/II error sub-questions is one of the classic case-study shapes for this course, so practise writing the full six steps by hand — the exam is hand-calculation with Z and t tables provided, and you may bring one double-sided A4 note sheet.
What degrees of freedom do I use for a two-sample t-test?
Pooled (equal-variances) test: df = n₁ + n₂ − 2 — a common slip is writing n₁ + n₂ − 1. Unequal-variances test: df = (s₁²/n₁ + s₂²/n₂)² / [(s₁²/n₁)²/(n₁−1) + (s₂²/n₂)²/(n₂−1)], which usually comes out fractional. The course materials do not state a rounding convention for the fractional case, so use a nearby row of the provided t table (rounding down is the conservative habit) and check the current unit outline or myLearning if in doubt.
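A minimal Python sketch of the unequal-variances df formula, applied for comparison to the branch data from the pooled example (s₁² = 20, s₂² = 28, n₁ = n₂ = 12). Rounding down with `math.floor` follows the conservative habit described above, not a stated course rule. `welch_df` and `welch_t` are illustrative names.

```python
import math


def welch_df(s2_1, n1, s2_2, n2):
    """Unequal-variances degrees of freedom (usually fractional)."""
    a = s2_1 / n1   # s1^2 / n1
    b = s2_2 / n2   # s2^2 / n2
    return (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))


def welch_t(xbar1, s2_1, n1, xbar2, s2_2, n2, hyp_diff=0.0):
    """Unequal-variances t statistic: sample variances (not SDs) enter the SE."""
    return ((xbar1 - xbar2) - hyp_diff) / math.sqrt(s2_1 / n1 + s2_2 / n2)


df_exact = welch_df(20, 12, 28, 12)   # 1584/74, about 21.41
df_table = math.floor(df_exact)       # conservative choice: read the df = 21 row
pooled_df = 12 + 12 - 2               # 22, never smaller than the unequal-variances df
t_unequal = welch_t(87, 20, 12, 82, 28, 12)
print(f"unequal-variances df = {df_exact:.2f} -> table row {df_table}; pooled df = {pooled_df}")
```

With equal sample sizes the two standard errors coincide (both equal 2 here, so t = 2.50 either way); the only difference is the smaller df of about 21.41, which is why the pooled test is preferred when equal variances are a fair assumption.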
If my two-sample test is significant, does that prove one group causes the difference?
No. A significant test says the observed gap is unlikely to be pure sampling noise — it says nothing about WHY the gap exists. If people select themselves into groups (customers who choose a product, workers who choose a shift), the groups can differ for many reasons besides the treatment you care about. Week 8's takeaway is exactly this: comparing two populations is not the same as identifying a causal effect.
Exam move
Train the case decision until it is reflexive: σ² known → Z; unknown but assumed equal → pooled t with df = n₁ + n₂ − 2; unknown and unequal → the unequal-variances t. The marks live in the details: weight each sample variance by nᵢ − 1 (not nᵢ) inside sₚ², keep variances — not standard deviations — in every formula, and remember variances add in the standard error even though the estimator is a difference. Read the tail from the wording: 'faster' or 'more' is one-tailed, 'differ' is two-tailed. Write all six steps every time and conclude with 'reject / do not reject H₀' plus a plain-English sentence — never 'accept H₀'. Pair each test with Type I/II reasoning: exam case studies bolt those sub-questions on.
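The "weight by nᵢ − 1" and "variances, not standard deviations" warnings only bite when the sample sizes differ. This Python sketch therefore uses hypothetical unbalanced samples (n₁ = 5, s₁ = 3; n₂ = 15, s₂ = 6, invented for illustration) to compare the correct pooled variance with two common slips.

```python
import math

n1, s1 = 5, 3.0     # hypothetical small sample: size and standard deviation
n2, s2 = 15, 6.0    # hypothetical larger sample

# Correct: weight each VARIANCE by its degrees of freedom n_i - 1.
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)

# Slip 1: weight by n_i and divide by n1 + n2.
sp2_weight_n = (n1 * s1**2 + n2 * s2**2) / (n1 + n2)

# Slip 2: plug standard deviations into the formula instead of variances.
sp2_using_sds = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)

se = math.sqrt(sp2 * (1 / n1 + 1 / n2))   # SE of xbar1 - xbar2 from the correct sp^2
print(f"correct sp^2 = {sp2:.2f}, weight-by-n slip = {sp2_weight_n:.2f}, "
      f"SD slip = {sp2_using_sds:.2f}, SE = {se:.3f}")
```

Here the weight-by-n slip lands close to the correct value (29.25 against 30), while the standard-deviation slip is off by a factor of more than five (about 5.33), which would badly inflate the t statistic.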