MAST90105 · Methods Of Mathematical Statistics
Hypothesis Testing and Neyman–Pearson
A hypothesis test pits a null H0 against an alternative Ha and uses the data to decide between them, accepting two kinds of error: a Type I error (rejecting a true null, with probability α, the significance level) and a Type II error (failing to reject a false null, with probability β). The test’s power is 1 − β, its chance of catching a real effect. The chapter’s theoretical centrepiece is the Neyman–Pearson lemma: for a simple-versus-simple test, the most powerful test at level α rejects when the likelihood ratio exceeds a threshold. This optimality result justifies likelihood-ratio tests and the standard statistics that follow. Those standard tests are then assembled from the sampling distributions built earlier: the one-sample t for a mean (σ unknown), the χ² test for a variance, and the large-sample z test for a proportion. Throughout, the chapter ties testing to intervals (a level-α two-sided test rejects exactly when the null value falls outside the (1−α) CI) and keeps the p-value interpretation honest.
What this chapter covers
- 9.1 Null vs alternative; the logic of a test
- 9.2 Type I / Type II errors, α, β and power
- 9.3 The Neyman–Pearson lemma and the most powerful test
- 9.4 Likelihood-ratio tests
- 9.5 The standard tests: one-sample t, variance χ², proportion z
- 9.6 The test–CI duality and the p-value
Worked example: a one-sample t-test for a mean
- (+1) Hypotheses and test. H0: μ = 500 vs Ha: μ ≠ 500 (two-sided), for a sample of n = 25 fills with X̄ = 496 ml and S = 10 ml. With σ unknown, use the one-sample t-statistic with n − 1 = 24 df.
- (+1) Standard error. S/√n = 10/√25 = 10/5 = 2.
- (+1) Test statistic. t = (X̄ − μ0)/(S/√n) = (496 − 500)/2 = −2.0.
- (+1) Critical value. Two-sided 5% with 24 df: reject if |t| > t_{0.025,24} = 2.064 (from the provided table).
- (+1) Decide. |−2.0| = 2.0 < 2.064, so we do NOT reject H0 at the 5% level.
- (+1) Interpret. The evidence that the mean fill differs from 500 ml is not quite significant at 5%: the observed 4 ml shortfall is within sampling noise for this n and S. (Equivalently, 500 lies just inside the 95% CI, 496 ± 2.064 × 2 = (491.87, 500.13).)
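The arithmetic of the worked example can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative, and the critical value is typed in from the table rather than computed.

```python
import math

# Summary statistics taken from the worked example.
n = 25          # sample size
xbar = 496.0    # sample mean (ml)
s = 10.0        # sample standard deviation (ml)
mu0 = 500.0     # null value of the mean (ml)
t_crit = 2.064  # table value t_{0.025,24}, entered by hand

se = s / math.sqrt(n)            # standard error of the mean
t_stat = (xbar - mu0) / se       # one-sample t statistic
reject = abs(t_stat) > t_crit    # two-sided decision at the 5% level

print(f"SE = {se:.3f}, t = {t_stat:.3f}, reject H0: {reject}")
# prints: SE = 2.000, t = -2.000, reject H0: False
```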
Key terms
- Type I and Type II errors
- A Type I error rejects a true null (probability α, the significance level); a Type II error fails to reject a false null (probability β). Lowering α raises β for fixed n — the fundamental trade-off a test balances.
- Power
- 1 − β, the probability a test correctly rejects a false null — its sensitivity to a real effect. Power rises with sample size, effect size and α, and falls with variability. The Neyman–Pearson test maximises it at fixed α.
- Neyman–Pearson lemma
- For a simple null versus a simple alternative, the most powerful level-α test rejects when the likelihood ratio L(θ1)/L(θ0) exceeds a constant chosen to give size α. It is the optimality theorem behind likelihood-ratio tests and the usual statistics.
- Likelihood-ratio test (LRT)
- A general test that rejects for small values of the ratio of the maximised likelihood under H0 to the maximised likelihood overall; under H0, −2·ln of that ratio is approximately χ² in large samples (Wilks), with degrees of freedom equal to the number of parameters H0 restricts. It extends Neyman–Pearson to composite hypotheses.
- p-value
- The probability, computed under H0, of a test statistic at least as extreme as the one observed. Reject when it is below α. It is not the probability that H0 is true — a routinely mis-stated definition.
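To make the p-value definition concrete, here is a minimal sketch that computes a two-sided p-value for a t statistic using only the standard library. It assumes the t density formula and integrates it numerically with Simpson's rule; the function names `t_pdf` and `two_sided_p_value` are illustrative, and in practice a statistics package or the provided table would be used instead.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p_value(t_obs, df, steps=2000):
    """P(|T| >= |t_obs|) under H0, via Simpson's rule on [0, |t_obs|].

    steps must be even for Simpson's rule.
    """
    b = abs(t_obs)
    if b == 0:
        return 1.0
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3          # P(0 <= T <= |t_obs|)
    return 1.0 - 2.0 * central       # both tails, by symmetry

# The worked example's statistic: t = -2.0 with 24 df.
p = two_sided_p_value(-2.0, 24)
print(f"two-sided p-value = {p:.4f}")
```

For the worked example the result lands a little above 0.05, which agrees with the decision not to reject at the 5% level. The number is a tail probability computed under H0, not the probability that H0 is true.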
Hypothesis Testing and Neyman–Pearson FAQ
What is the difference between the significance level and the power?
The significance level α is the probability of a Type I error — rejecting a true null — which you fix in advance, commonly at 5%. Power is 1 − β, the probability of correctly rejecting a false null. You control α by the cut-off; power then depends on the true effect size, the sample size and the variability. For a fixed n there is a trade-off: shrinking α reduces power.
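The trade-off can be seen numerically in a simplified setting. The sketch below assumes an upper-tailed z test with known σ (not the t test of the worked example), and the numbers μ0 = 0, μ1 = 0.5, σ = 1 are made up purely for illustration.

```python
import math
from statistics import NormalDist

def power_one_sided_z(mu0, mu1, sigma, n, alpha):
    """Power of the upper-tailed z test of H0: mu = mu0 when the true mean is mu1."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)                   # critical value for size alpha
    shift = (mu1 - mu0) / (sigma / math.sqrt(n))     # effect size in standard-error units
    return 1 - z.cdf(z_alpha - shift)

# Illustrative (hypothetical) settings.
base = power_one_sided_z(0.0, 0.5, 1.0, n=25, alpha=0.05)
stricter = power_one_sided_z(0.0, 0.5, 1.0, n=25, alpha=0.01)
larger_n = power_one_sided_z(0.0, 0.5, 1.0, n=100, alpha=0.05)

print(f"alpha=0.05, n=25 : power = {base:.3f}")
print(f"alpha=0.01, n=25 : power = {stricter:.3f}")
print(f"alpha=0.05, n=100: power = {larger_n:.3f}")
```

With n fixed, tightening α from 0.05 to 0.01 lowers the power; keeping α at 0.05 and raising n increases it.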
What does the Neyman–Pearson lemma give me?
It identifies the best possible test for a simple null against a simple alternative: among all tests of size at most α, the one that rejects when the likelihood ratio exceeds a threshold has the highest power. This motivates the standard t, χ² and z tests, which are (or approximate) likelihood-ratio tests, and it is why ‘reject when the likelihood ratio is large’ is the organising principle of the chapter.
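As an illustration of how the lemma turns into a rejection region, consider a standard textbook case that is assumed here rather than taken from the chapter: X1, …, Xn i.i.d. N(μ, σ²) with σ known, testing H0: μ = μ0 against Ha: μ = μ1 with μ1 > μ0. The likelihood ratio simplifies as follows.

```latex
\begin{align*}
\frac{L(\mu_1)}{L(\mu_0)}
  &= \exp\!\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}
       \left[(x_i-\mu_1)^2-(x_i-\mu_0)^2\right]\right\} \\
  &= \exp\!\left\{\frac{n(\mu_1-\mu_0)}{\sigma^2}\,\bar{x}
       -\frac{n(\mu_1^2-\mu_0^2)}{2\sigma^2}\right\}.
\end{align*}
% Since \mu_1 > \mu_0, the ratio is increasing in \bar{x}, so
\[
\frac{L(\mu_1)}{L(\mu_0)} > k
\iff \bar{x} > c,
\qquad
c = \mu_0 + z_{\alpha}\,\frac{\sigma}{\sqrt{n}},
\]
% where z_\alpha is the upper-\alpha standard normal quantile,
% chosen so that P(\bar{X} > c \mid \mu = \mu_0) = \alpha.
```

The cut-off c does not depend on μ1, so in this setting the same region is most powerful against every alternative μ1 > μ0.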
How are confidence intervals and tests related?
They are two views of the same inference. A two-sided level-α test of H0: θ = θ0 rejects exactly when θ0 falls outside the (1−α) confidence interval for θ. So you can read a test off an interval and vice versa, which is a fast way to check your work — if the null value sits inside the 95% CI, a 5% two-sided test does not reject.
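The duality can be checked directly on the worked example's summary statistics (X̄ = 496, S = 10, n = 25, table value 2.064). The sketch below uses illustrative helper names and a handful of arbitrarily chosen null values; for each one it compares the test decision with whether the value lies outside the 95% interval.

```python
import math

def t_test_rejects(xbar, s, n, mu0, t_crit):
    """Two-sided one-sample t test: True if H0: mu = mu0 is rejected."""
    t_stat = (xbar - mu0) / (s / math.sqrt(n))
    return abs(t_stat) > t_crit

def t_interval(xbar, s, n, t_crit):
    """Two-sided confidence interval xbar +/- t_crit * s / sqrt(n)."""
    half_width = t_crit * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width

xbar, s, n, t_crit = 496.0, 10.0, 25, 2.064   # table value entered by hand
lo, hi = t_interval(xbar, s, n, t_crit)

checks = []
for mu0 in (490.0, 495.0, 500.0, 501.0):      # arbitrary null values to try
    rejects = t_test_rejects(xbar, s, n, mu0, t_crit)
    outside = not (lo <= mu0 <= hi)
    checks.append((mu0, rejects, outside))
    print(f"mu0 = {mu0}: reject = {rejects}, outside CI = {outside}")
```

In every row the two booleans agree: 490 and 501 fall outside the interval and are rejected, while 495 and 500 fall inside and are not.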
Exam move
Fix the four-line test template in memory — hypotheses, standardised statistic, table critical value, decision plus interpretation — and run it for the three workhorse tests (mean t, variance χ², proportion z) until the right statistic jumps out from the cue. Understand the Neyman–Pearson lemma well enough to derive a simple-versus-simple rejection region from the likelihood ratio, since that is the chapter’s theory mark. Keep α, β and power straight and be precise about the p-value (it is computed under H0, and is not P(H0 true)). Use the test–CI duality to cross-check decisions, and remember the provided table supplies the quantiles but never the setup.
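As a memory aid, the three workhorse statistics can be written as short functions. The formulas are the usual textbook ones and are assumed here rather than quoted from the chapter; the function names are illustrative, and the inputs for the variance and proportion tests are made-up numbers.

```python
import math

def t_statistic(xbar, s, n, mu0):
    """One-sample t for a mean (sigma unknown); compare with t, n - 1 df."""
    return (xbar - mu0) / (s / math.sqrt(n))

def chi2_variance_statistic(s, n, sigma0):
    """Chi-square statistic for a variance (normal population assumed);
    compare with chi-square, n - 1 df."""
    return (n - 1) * s ** 2 / sigma0 ** 2

def z_proportion_statistic(successes, n, p0):
    """Large-sample z for a proportion; compare with the standard normal."""
    p_hat = successes / n
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

t_val = t_statistic(496.0, 10.0, 25, 500.0)        # worked example: -2.0
chi2_val = chi2_variance_statistic(10.0, 25, 8.0)  # hypothetical sigma0 = 8: 37.5
z_val = z_proportion_statistic(60, 100, 0.5)       # hypothetical data: about 2.0

print(t_val, chi2_val, z_val)
```

Each function covers only the "standardised statistic" line of the template; the hypotheses, the table critical value and the interpretation still have to be supplied by hand.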