ECMT1010 · Introduction To Economic Statistics
Inference for Proportions
Weeks 8–9 apply the normal-formula machinery to categorical data: the SE for a proportion, √(p(1−p)/n); the CI for one proportion, p̂ ± z*·SE; the z-test for one proportion; and the two-proportion test that uses a pooled p̂ under H₀: p₁ = p₂. The topic is examined as short-answer 'set up, substitute, conclude in context' questions. The recurring trap is using p̂ in the SE for a test, where you must use the null value p₀ (one sample) or the pooled p̂ (two samples).
What this chapter covers
1. The SE for a proportion (CLT formula): SE = √(p(1 − p)/n)
2. CI for one proportion: p̂ ± z*·√(p̂(1 − p̂)/n), valid when np̂ ≥ 10 and n(1 − p̂) ≥ 10
3. HT for one proportion: z = (p̂ − p₀)/√(p₀(1 − p₀)/n), using the NULL value p₀ in the SE
4. Why the CI uses p̂ but the test uses p₀ (the SE is computed under the assumed truth)
5. Difference in two proportions: SE = √(p̂₁(1 − p̂₁)/n₁ + p̂₂(1 − p̂₂)/n₂) for a CI
6. The pooled proportion p̂ = (count₁ + count₂)/(n₁ + n₂) for the two-proportion TEST
7. Two-proportion z-test: z = (p̂₁ − p̂₂)/√(p̂(1 − p̂)(1/n₁ + 1/n₂))
8. Checking the CLT conditions for proportions before trusting the normal approximation
A two-proportion test with a pooled proportion
- (2 marks) Define parameters and state hypotheses. Let p₁ and p₂ be the true add-to-cart proportions for designs A and B. H₀: p₁ = p₂ versus Hₐ: p₁ ≠ p₂ (two-sided).
- (1 mark) Compute the sample proportions: p̂₁ = 92/400 = 0.230 and p̂₂ = 99/360 = 0.275.
- (1 mark) Compute the pooled proportion under H₀: p̂ = (92 + 99)/(400 + 360) = 191/760 ≈ 0.251.
- (2 marks) Compute the SE with the pooled p̂: SE = √(0.251·0.749·(1/400 + 1/360)) = √(0.18800·0.005278) = √0.0009923 ≈ 0.0315.
- (1 mark) Compute the test statistic: z = (p̂₁ − p̂₂)/SE = (0.230 − 0.275)/0.0315 = −0.045/0.0315 ≈ −1.43.
- (1 mark) Apply the decision rule and conclude: the two-sided 5% critical value is z* = 1.96; since |−1.43| = 1.43 < 1.96, do not reject H₀. There is only weak evidence of a difference in add-to-cart rates between the two designs.
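The steps above can be checked with a short script. This is a minimal sketch, not a prescribed course method: the function name `two_proportion_z_test` is made up for illustration, and it uses only Python's standard library. It carries full precision through every step instead of rounding intermediates, so its output can differ from a hand calculation in the last digit.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z statistic for H0: p1 = p2.

    x1, x2 are success counts and n1, n2 are sample sizes.
    (Hypothetical helper, named here only for illustration.)
    """
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Under H0 there is one common proportion, so combine the counts.
    pooled = (x1 + x2) / (n1 + n2)
    # The test SE uses the pooled proportion for both groups.
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    return pooled, se, z


# Counts from the worked example: design A 92/400, design B 99/360.
pooled, se, z = two_proportion_z_test(92, 400, 99, 360)

# Two-sided 5% critical value (about 1.96).
z_star = NormalDist().inv_cdf(0.975)
reject = abs(z) > z_star

print(f"pooled = {pooled:.4f}, SE = {se:.4f}, z = {z:.2f}")
print("reject H0" if reject else "do not reject H0")
```

Computing 1/n₁ + 1/n₂ in code is a useful cross-check, because that sum is the easiest term to slip on by hand.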
Key terms
- SE for a proportion: The standard error of a sample proportion, SE = √(p(1 − p)/n). Which p you plug in depends on the task: p̂ for a confidence interval, the null value p₀ for a one-sample test, and the pooled p̂ for a two-sample test.
- CI for a proportion: An interval p̂ ± z*·√(p̂(1 − p̂)/n) estimating the population proportion. It is valid when the success/failure counts are large enough (np̂ ≥ 10 and n(1 − p̂) ≥ 10).
- One-proportion z-test: A test of H₀: p = p₀ using z = (p̂ − p₀)/√(p₀(1 − p₀)/n). The SE is built from the NULL value p₀, because the sampling distribution is computed assuming H₀ is true.
- Difference in two proportions: The comparison p̂₁ − p̂₂. For a confidence interval the SE keeps the two samples separate: √(p̂₁(1 − p̂₁)/n₁ + p̂₂(1 − p̂₂)/n₂).
- Pooled proportion: The combined estimate p̂ = (count₁ + count₂)/(n₁ + n₂) used in the SE of a two-proportion TEST, because under H₀: p₁ = p₂ there is a single common proportion to estimate.
- CLT conditions for a proportion: The normal approximation is reliable when there are at least about 10 successes and 10 failures in each group (np ≥ 10 and n(1 − p) ≥ 10). Below this, use a simulation method instead.
Inference for Proportions FAQ
Why does the one-proportion test use p₀ in the SE but the CI uses p̂?
Because the standard error must be computed under the relevant assumption. A hypothesis test asks 'how surprising is this data IF H₀ is true?', so the sampling distribution — and hence the SE — is built using the assumed null value p₀: SE = √(p₀(1 − p₀)/n). A confidence interval makes no such assumption; it just estimates the true proportion, so it uses your best estimate p̂: SE = √(p̂(1 − p̂)/n).
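The contrast can be seen side by side in code. The sketch below is illustrative only: the function names and the data (130 successes in 250 trials, null value p₀ = 0.5) are hypothetical and do not come from the course materials. The only difference between the two functions is which proportion goes into the SE.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_ci(x, n, confidence=0.95):
    """Confidence interval for one proportion; the SE uses p_hat."""
    p_hat = x / n
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)
    se = sqrt(p_hat * (1 - p_hat) / n)  # best estimate of p
    return p_hat - z_star * se, p_hat + z_star * se


def one_proportion_z(x, n, p0):
    """z statistic for H0: p = p0; the SE uses the null value p0."""
    p_hat = x / n
    se = sqrt(p0 * (1 - p0) / n)  # computed as if H0 were true
    return (p_hat - p0) / se


# Hypothetical data, chosen only to illustrate the two formulas.
x, n, p0 = 130, 250, 0.5

low, high = one_proportion_ci(x, n)
z = one_proportion_z(x, n, p0)

print(f"95% CI: ({low:.4f}, {high:.4f})")
print(f"z = {z:.3f}")
```

When p̂ is close to p₀ the two SEs are nearly equal, as here; the choice matters more when the sample proportion is far from the null value.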
When do I pool the two proportions and when do I keep them separate?
Pool for the two-proportion TEST. Under H₀: p₁ = p₂ both groups share a single proportion, so you combine the counts into one pooled p̂ = (count₁ + count₂)/(n₁ + n₂) and use it in the SE. For a two-proportion confidence INTERVAL there is no such null, so you keep p̂₁ and p̂₂ separate in the SE. A simple rule: tests pool, intervals do not.
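For the interval side of that rule, here is a minimal sketch of an unpooled confidence interval for p₁ − p₂. It reuses the counts from the worked example (92/400 and 99/360); the function name `two_proportion_ci` and the 95% default are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ci(x1, n1, x2, n2, confidence=0.95):
    """Unpooled CI for p1 - p2: each sample keeps its own p_hat in the SE."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # No null hypothesis here, so the two proportions are NOT combined.
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)
    diff = p1_hat - p2_hat
    return diff - z_star * se, diff + z_star * se, se


low, high, se = two_proportion_ci(92, 400, 99, 360)

print(f"unpooled SE = {se:.4f}")
print(f"95% CI for p1 - p2: ({low:.4f}, {high:.4f})")
```

For these counts the interval contains 0, which agrees with the two-sided test at the 5% level failing to reject H₀.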
What conditions must hold before I use these formulas?
The normal approximation needs roughly at least 10 successes and 10 failures in each sample: np ≥ 10 and n(1 − p) ≥ 10. With small samples or a proportion close to 0 or 1 these fail and the bell shape is a poor fit, so you should switch to a simulation method (bootstrap for a CI, randomization for a test). Always check and state the conditions before applying the z formula.
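The condition check is simple enough to script. In the sketch below, `normal_approx_ok` is a hypothetical helper and the cutoff of 10 follows the rule stated above. Pass whichever proportion the task calls for: p̂ for a CI, or p₀ for a one-sample test.

```python
def normal_approx_ok(n, p, minimum=10):
    """True when expected successes n*p and failures n*(1-p) both reach `minimum`."""
    return n * p >= minimum and n * (1 - p) >= minimum


# Worked-example groups: 92/400 and 99/360 both have plenty of successes and failures.
group_a_ok = normal_approx_ok(400, 92 / 400)
group_b_ok = normal_approx_ok(360, 99 / 360)

# Hypothetical small sample: n = 40 with p = 0.1 gives only about 4 expected successes.
small_ok = normal_approx_ok(40, 0.1)

print(group_a_ok, group_b_ok, small_ok)
```

When the check fails, as in the last case, the z formulas should not be trusted and a simulation method is the safer route.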
How do I interpret the result of a proportion test?
Translate the decision back into the context with a strength-of-evidence sentence. If you reject H₀, say there is 'significant evidence that the proportion is greater/less/different…'; if you do not reject, say there is 'insufficient (or only weak) evidence that…'. Never say a proportion test 'proves' anything, and always state it about the population proportion, not the observed sample proportion.
Exam move
The whole chapter rewards getting the SE right, so make a one-line lookup table you can recall under pressure: CI for one p uses p̂; test for one p uses p₀; CI for two p keeps them separate; test for two p pools. Practise spotting from the wording whether you have one proportion or two, and whether it is a CI or a test, before you write any formula. Drill the two-proportion pooled test end to end because it has the most moving parts (two p̂s, a pooled p̂, a combined SE, a z, a critical value, a conclusion) and is a favourite long-answer question. Always check np ≥ 10 and n(1 − p) ≥ 10 first, and close with a strength-of-evidence sentence in context — 'weak/strong/insufficient evidence that…' — which earns the final mark.