ECMT1010 · Introduction To Economic Statistics
Hypothesis Testing & Randomization
Weeks 5–6 build the formal test of evidence: state a null and an alternative hypothesis in terms of population parameters, build a randomization distribution in StatKey for a world where H₀ is true, read off the p-value, and make a decision by comparing it with the significance level α. The topic is examined as MCQ and short-answer questions. The marks come from the H₀/Hₐ setup, the correct one- vs two-sided p-value, the reject/inconclusive decision, the one-sentence conclusion in context, and naming Type I vs Type II errors.
What this chapter covers
- 1. Null H₀ (contains '=') vs alternative Hₐ (>, <, or ≠), always in population parameters
- 2. The randomization distribution: simulate statistics from a world where H₀ is true
- 3. Reallocate (experiments) vs shift/shuffle (observational) to make H₀ true
- 4. The p-value: probability of a result this extreme (or more) if H₀ is true
- 5. One-sided (one tail) vs two-sided (both tails) p-values
- 6. Significance level α: p < α → reject H₀ (significant); p ≥ α → inconclusive
- 7. The CI ↔ HT equivalence: a two-sided test at α matches a (1 − α) CI
- 8. Type I error (reject a true H₀, prob α), Type II error (keep a false H₀, prob β), and power = 1 − β
A one-sided test from a randomization distribution
- (2 marks) Define the parameter and state the hypotheses. Let p = the true proportion of customers who buy a coffee. The claim is 'at least 60%', and we are testing whether the proportion is actually lower, so H₀: p = 0.60 versus Hₐ: p < 0.60 (one-sided, left tail).
- (1 mark) Read the p-value off the randomization distribution: it is the proportion of simulated statistics at or below the observed sample proportion p̂ = 0.533, which is 0.072.
- (1 mark) Apply the decision rule: p-value = 0.072 > 0.05 = α, so do not reject H₀.
- (1 mark) Conclude in context: the result is not statistically significant at the 5% level; there is insufficient evidence that fewer than 60% of customers buy a coffee.
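The steps above can be sketched in code as a rough stand-in for what StatKey does. This is a minimal illustration, not the course's method: the sample size of 120 (64 buyers, giving 0.533) is an assumption because the example does not state n, and the function name, seed, and number of simulations are hypothetical choices.

```python
import random

def randomization_p_value_left(null_p, n, observed_count, n_sims=5000, seed=1):
    """Left-tail p-value for H0: p = null_p versus Ha: p < null_p."""
    rng = random.Random(seed)
    at_or_below = 0
    for _ in range(n_sims):
        # One simulated sample from a world where H0 is true.
        successes = sum(rng.random() < null_p for _ in range(n))
        # Count simulated statistics at least as extreme (as low) as the observed one.
        if successes <= observed_count:
            at_or_below += 1
    return at_or_below / n_sims

# ASSUMPTION: n = 120 with 64 buyers (64/120 = 0.533); the source does not give n.
N_ASSUMED, BUYERS_ASSUMED = 120, 64
ALPHA = 0.05

p_value = randomization_p_value_left(0.60, N_ASSUMED, BUYERS_ASSUMED)
decision = "reject H0" if p_value < ALPHA else "do not reject H0"
print(f"p-value ~ {p_value:.3f}; decision at alpha = {ALPHA}: {decision}")
```

Because the sample size is assumed and the distribution is simulated, the printed p-value will not match 0.072 exactly; the point is that the p-value is simply the left-tail proportion of the simulated statistics.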
Key terms
- Null vs alternative hypothesis: H₀ is the 'no effect / no difference' claim and always contains '='; Hₐ is the claim you are gathering evidence for and contains >, <, or ≠. Both are stated in terms of population parameters, not sample statistics.
- Randomization distribution: The distribution of a statistic simulated from a world where H₀ is true, built by reallocating responses (experiments) or shifting/shuffling the data (observational). The observed statistic is compared against it to get the p-value.
- p-value: The probability of obtaining a statistic as extreme as, or more extreme than, the observed one, assuming H₀ is true. A small p-value means the data would be surprising under H₀, which is evidence against it.
- Significance level (α): The pre-set threshold for the p-value (commonly 0.05). If p < α you reject H₀ and call the result statistically significant; if p ≥ α the result is inconclusive and you do not reject H₀.
- Type I and Type II error: A Type I error rejects a true H₀ (probability α); a Type II error fails to reject a false H₀ (probability β). Power = 1 − β is the chance of correctly detecting a real effect, and it rises with a larger sample, a larger effect, or a larger α.
- CI ↔ HT equivalence: A two-sided hypothesis test at level α gives the same decision as a (1 − α) confidence interval: you reject H₀ exactly when the (1 − α) CI excludes the null value.
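To make the 'reallocating responses' idea from the randomization distribution entry concrete, here is a minimal sketch for a two-group experiment. The scores are invented for illustration, and the function name, seed, and simulation count are hypothetical. It tests H₀: μ_T = μ_C against Hₐ: μ_T > μ_C by repeatedly shuffling the pooled responses into new groups.

```python
import random

def reallocation_p_value(treatment, control, n_sims=5000, seed=1):
    """Right-tail p-value for H0: mu_T = mu_C versus Ha: mu_T > mu_C."""
    rng = random.Random(seed)
    n_t = len(treatment)
    observed = sum(treatment) / n_t - sum(control) / len(control)
    pooled = list(treatment) + list(control)
    as_extreme = 0
    for _ in range(n_sims):
        # If H0 is true the group labels do not matter, so reallocate them at random.
        rng.shuffle(pooled)
        new_t, new_c = pooled[:n_t], pooled[n_t:]
        diff = sum(new_t) / len(new_t) - sum(new_c) / len(new_c)
        if diff >= observed:
            as_extreme += 1
    return observed, as_extreme / n_sims

# Hypothetical data, invented for this sketch only.
treatment = [14, 18, 17, 21, 16, 19, 20, 15]
control = [12, 15, 13, 17, 11, 16, 14, 13]

observed_diff, p_value = reallocation_p_value(treatment, control)
print(f"observed difference = {observed_diff}, p-value ~ {p_value:.4f}")
```

Each shuffle produces one statistic from a world where H₀ is true; the p-value is the share of those statistics at least as large as the observed difference.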
Hypothesis Testing & Randomization FAQ
What does the p-value actually tell me?
It tells you how surprising your data would be if the null hypothesis were true — specifically, the probability of a statistic at least as extreme as the one you observed under H₀. A small p-value means 'data like this rarely happens when H₀ holds', which is evidence against H₀. It is NOT the probability that H₀ is true, and it is NOT the probability you made a mistake.
How do I decide whether the test is one-sided or two-sided?
Look at the research claim. If it specifies a direction — 'less than', 'more than', 'increases', 'reduces' — use a one-sided Hₐ (< or >) and one tail of the randomization distribution. If it just asks whether there is any difference or change, use a two-sided Hₐ (≠) and both tails, which makes the p-value roughly double the one-sided value. Set the direction from the question before you compute anything.
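As a small illustration of one tail versus both tails, the sketch below computes all three p-values from the same randomization distribution. The list of simulated statistics is a toy example, the function name is hypothetical, and the two-sided value uses the common 'double the smaller tail' convention, which is an assumption about how the two-sided p-value is defined.

```python
def tail_p_values(simulated, observed):
    """Left-tail, right-tail, and two-sided p-values from simulated statistics."""
    n = len(simulated)
    left = sum(s <= observed for s in simulated) / n    # Ha uses '<'
    right = sum(s >= observed for s in simulated) / n   # Ha uses '>'
    # Two-sided: double the smaller tail, capped at 1.
    two_sided = min(1.0, 2 * min(left, right))
    return {"left": left, "right": right, "two_sided": two_sided}

# Toy randomization distribution (16 simulated statistics), for illustration only.
simulated = [-3, -2, -2, -1, -1, -1, 0, 0, 0, 0, 1, 1, 1, 2, 2, 3]
tails = tail_p_values(simulated, observed=2)
print(tails)  # {'left': 0.9375, 'right': 0.1875, 'two_sided': 0.375}
```

With an observed statistic of 2, three of the 16 simulated values are at or above it, so the right-tail p-value is 3/16 and the two-sided p-value is twice that.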
If I do not reject H₀, have I proved it is true?
No. 'Do not reject H₀' means the evidence was not strong enough to rule it out at your chosen α — the result is inconclusive, not proof. Absence of evidence is not evidence of absence: a small sample can fail to detect a real effect (a Type II error). The correct wording is 'there is insufficient evidence that…', never 'we have shown there is no effect'.
What is the difference between a Type I and a Type II error?
A Type I error is a false alarm: you reject a null hypothesis that is actually true, and its probability is α (so a 5% test makes a Type I error 5% of the time when H₀ holds). A Type II error is a miss: you fail to reject a null that is actually false, with probability β. Power = 1 − β is the chance of catching a real effect. For a fixed sample size, lowering α to guard against false alarms raises β, so there is always a trade-off.
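The trade-off can be checked by simulation. The sketch below is an illustration under stated assumptions, not a course formula: it uses a left-tailed one-proportion test with a hypothetical n = 100 and H₀: p = 0.60, and it measures power against a hypothetical true proportion of 0.50. All names, the seed, and the simulation count are made up for the example.

```python
import random
from bisect import bisect_right

def simulate_counts(p, n, n_sims, rng):
    """Success counts for n_sims samples of size n when the true proportion is p."""
    return [sum(rng.random() < p for _ in range(n)) for _ in range(n_sims)]

def rejection_rate(true_p, sorted_null_counts, n, alpha, n_sims, rng):
    """Share of samples drawn with proportion true_p that lead to rejecting H0."""
    rejections = 0
    for count in simulate_counts(true_p, n, n_sims, rng):
        # Left-tail p-value: share of null-world counts at or below this count.
        p_value = bisect_right(sorted_null_counts, count) / len(sorted_null_counts)
        if p_value < alpha:
            rejections += 1
    return rejections / n_sims

rng = random.Random(1)
# ASSUMPTIONS: sample size, null value, and the 'real' proportion are hypothetical.
N, NULL_P, TRUE_P_IF_H0_FALSE, SIMS = 100, 0.60, 0.50, 3000
null_counts = sorted(simulate_counts(NULL_P, N, SIMS, rng))

type_one_rate = rejection_rate(NULL_P, null_counts, N, 0.05, SIMS, rng)
power_05 = rejection_rate(TRUE_P_IF_H0_FALSE, null_counts, N, 0.05, SIMS, rng)
power_01 = rejection_rate(TRUE_P_IF_H0_FALSE, null_counts, N, 0.01, SIMS, rng)

print(f"Type I rate at alpha = 0.05: {type_one_rate:.3f}")
print(f"Power at alpha = 0.05: {power_05:.3f}; at alpha = 0.01: {power_01:.3f}")
```

When H₀ is true the rejection rate should sit near α (often a little under it, because counts are discrete), and moving from α = 0.05 to α = 0.01 should lower the power, which is the same as raising β.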
Exam move
Treat every test as a five-line ritual and rehearse it until it is automatic: define the parameter, state H₀ and Hₐ in that parameter, find the p-value (one tail or two), compare with α, then write a one-sentence conclusion in context using the word 'evidence'. The single biggest mark-loser is stating hypotheses about the sample statistic instead of the population parameter, so always use μ or p, never x̄ or p̂. Understand the randomization distribution conceptually — it is the world where H₀ is true — so you can explain why the p-value is a tail area. Memorise the error table (Type I = false alarm = α; Type II = miss = β; power = 1 − β) and the CI↔HT shortcut, both of which appear as MCQs, and practise saying 'inconclusive' rather than 'we proved H₀'.