ECMT1010 · Introduction To Economic Statistics
The Normal Distribution & the CLT
Weeks 6 and 8 are the bridge from simulation to formulas. The normal distribution describes bell-shaped data; the standard normal N(0,1) lets you convert any value to a z-score and read off areas and percentiles; and the Central Limit Theorem ensures that the sampling distribution of a mean or proportion is approximately normal for large n. The topic is examined through MCQs and short-answer questions: standardise a value, find a tail area, compute the SD of a sample mean, and check the CLT conditions.
What this chapter covers
1. The normal density: symmetric, bell-shaped, with inflection points at μ ± σ
2. The standard normal N(0,1) and standardising z = (x − μ)/σ; back-converting x = μ + z·σ
3. Reading areas (probabilities) and percentiles from the N(0,1) table
4. The Central Limit Theorem: the sampling distribution of a mean/proportion is ≈ normal for large n
5. CLT conditions: n ≥ 30 for a mean; np ≥ 10 and n(1 − p) ≥ 10 for a proportion
6. The SD of the sample mean SD(X̄) = σ/√n, and why bigger n means a tighter distribution
7. Normal-approximation confidence intervals: statistic ± z*·SE
8. The standard z* critical values: 1.645 (90%), 1.960 (95%), 2.576 (99%)
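The last two items in the list amount to a one-line formula. Below is a minimal Python sketch, assuming the three z* values quoted above; `normal_ci` and `Z_STAR` are hypothetical names, and the sample mean of $118 with SE = $5 is an invented input used only to show the arithmetic.

```python
# z* multipliers as read from the N(0,1) table, keyed by confidence level
Z_STAR = {0.90: 1.645, 0.95: 1.960, 0.99: 2.576}


def normal_ci(statistic, se, level=0.95):
    """Hypothetical helper: normal-approximation interval statistic ± z* * SE."""
    margin = Z_STAR[level] * se
    return statistic - margin, statistic + margin


# Invented inputs: a sample mean of 118 with SE = 30/sqrt(36) = 5
low, high = normal_ci(118, 5, level=0.95)
print(f"95% CI: ({low:.1f}, {high:.1f})")  # 95% CI: (108.2, 127.8)
```

Swapping `level` to 0.90 or 0.99 only changes the multiplier, so the interval narrows or widens around the same statistic.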
A normal probability and the CLT for a sample mean
- (a) [3 marks] Standardise $165: z = (165 − 120)/30 = 45/30 = 1.50. The N(0,1) table gives a left area of 0.9332 at z = 1.50, so the area above is 1 − 0.9332 ≈ 0.067: about 6.7% of households pay more than $165.
- (b) [2 marks] The SD of the sample mean is SD(X̄) = σ/√n = 30/√36 = 30/6 = $5.
- (c) [1 mark] Yes. By the CLT, with n = 36 ≥ 30 the sampling distribution of the mean is approximately normal regardless of the population's shape (and here the population is already normal, which makes it exactly normal).
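The three answers can be checked numerically. This sketch assumes the values implied by the working (μ = $120, σ = $30, n = 36) and uses Python's standard-library `statistics.NormalDist` in place of the printed table.

```python
from math import sqrt
from statistics import NormalDist

# Values implied by the worked answers (assumed): mu = 120, sigma = 30, n = 36
mu, sigma, n = 120, 30, 36

# (a) One household: standardise, then take the upper-tail area (1 minus the left area)
z = (165 - mu) / sigma
p_above = 1 - NormalDist().cdf(z)
print(f"z = {z:.2f}, P(X > 165) = {p_above:.3f}")  # z = 1.50, P(X > 165) = 0.067

# (b) SD of the sample mean
sd_mean = sigma / sqrt(n)
print(f"SD(X-bar) = {sd_mean}")  # SD(X-bar) = 5.0

# (c) CLT rule of thumb for a mean
print("n >= 30:", n >= 30)  # n >= 30: True
```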
Key terms
- Normal distribution: A symmetric, bell-shaped distribution determined by its mean μ and SD σ, written x ~ N(μ, σ). Its inflection points sit at μ ± σ, and the 68–95–99.7 rule gives the approximate areas within 1, 2 and 3 SDs of the mean.
- Standard normal N(0,1): The normal distribution with mean 0 and SD 1. Any normal value standardises to it via z = (x − μ)/σ, and the provided N(0,1) table converts z-scores into areas and percentiles.
- z-score (standardising): z = (x − μ)/σ rescales a value to N(0,1) units, i.e. the number of SDs it sits from the mean. Back-converting with x = μ + z·σ turns a percentile's z-score into a raw value.
- Central Limit Theorem (CLT): For a large enough sample, the sampling distribution of the sample mean (or proportion) is approximately normal and centred on μ (or p), regardless of the population's shape. This is the result that licenses all the z- and t-based formula procedures.
- CLT conditions: Rules of thumb for when n is 'large enough', namely n ≥ 30 for a quantitative mean, and np ≥ 10 and n(1 − p) ≥ 10 for a proportion. If they fail, the normal approximation is unreliable and you should use a simulation method instead.
- z* critical value: The N(0,1) multiplier in statistic ± z*·SE for a chosen confidence level: z* = 1.645 for 90%, 1.960 for 95% and 2.576 for 99%. These are read straight from the provided table.
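As a cross-check on the table, Python's standard-library `statistics.NormalDist` can recover both the 68–95–99.7 areas and the z* multipliers. This is a self-study sketch, not a replacement for the provided table; the helper name `z_star` is hypothetical.

```python
from statistics import NormalDist

std = NormalDist()  # N(0,1)

# 68-95-99.7 rule: area within k SDs of the mean (prints 0.6827, 0.9545, 0.9973)
for k in (1, 2, 3):
    area = std.cdf(k) - std.cdf(-k)
    print(f"within {k} SD: {area:.4f}")


def z_star(level):
    """Hypothetical helper: z* leaving equal tails outside a central area of `level`."""
    return std.inv_cdf(1 - (1 - level) / 2)


# Prints 1.645, 1.960 and 2.576
for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z* = {z_star(level):.3f}")
```

The 95% multiplier is the z with 0.975 to its left, because 0.025 sits in each tail.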
The Normal Distribution & the CLT FAQ
What is the difference between σ and σ/√n?
σ is the standard deviation of individual values in the population — how spread out single observations are. σ/√n is the standard deviation of the sample mean (often called its standard error) — how much the average of n observations bounces around. The mean is far less variable than an individual value, which is why σ/√n shrinks as n grows. Use σ for a question about one value and σ/√n for a question about a sample average.
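The contrast shows up clearly in numbers. This sketch assumes the worked example's population (μ = $120, σ = $30, n = 36) and an illustrative threshold of $130 chosen only for comparison.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 120, 30, 36  # assumed, as in the worked example

one_value = NormalDist(mu, sigma)              # one household: SD = sigma
sample_mean = NormalDist(mu, sigma / sqrt(n))  # average of n households: SD = sigma/sqrt(n)

# Illustrative threshold of 130
p_one = 1 - one_value.cdf(130)
p_mean = 1 - sample_mean.cdf(130)
print(f"P(X > 130)     = {p_one:.4f}")   # 0.3694
print(f"P(X-bar > 130) = {p_mean:.4f}")  # 0.0228
```

$130 is only a third of an SD above the mean for one household (z ≈ 0.33) but two SDs above it for a sample average (z = 2), which is why the two probabilities differ so much.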
How do I read an 'above' or 'between' probability from the table?
First standardise the boundary (or boundaries) with z = (x − μ)/σ, then sketch the curve and shade the area you want. The provided N(0,1) table usually gives the area to the LEFT of z. For an 'above' probability, look up the left area and subtract it from 1; for a 'between' probability, subtract the smaller left area from the larger. Always shade first so you know which arithmetic to do.
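The same 'left area' logic can be written out directly. This sketch assumes N(120, 30) from the worked example, and the $105 lower boundary is an illustrative choice; `NormalDist().cdf` plays the role of the left-area table.

```python
from statistics import NormalDist

mu, sigma = 120, 30     # assumed, as in the worked example
phi = NormalDist().cdf  # left-tail area of N(0,1), like the table


def z_score(x):
    return (x - mu) / sigma


# 'Above' 165: 1 minus the left area
above = 1 - phi(z_score(165))

# 'Between' 105 and 165: larger left area minus smaller left area
between = phi(z_score(165)) - phi(z_score(105))

print(f"P(X > 165)       = {above:.4f}")    # 0.0668
print(f"P(105 < X < 165) = {between:.4f}")  # 0.6247
```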
Why does the Central Limit Theorem matter so much?
Because it lets you use the normal (and t) formula procedures even when the underlying data are not normal. The CLT says that for a large enough sample the distribution of the sample mean (or proportion) is approximately normal regardless of the population's shape, centred on the true parameter, with SD σ/√n for a mean and √(p(1 − p)/n) for a proportion. Without it, the z* confidence intervals and z-tests used in the rest of the unit would not be justified.
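A quick simulation makes the claim concrete. This sketch assumes a strongly right-skewed population (exponential with mean 1 and SD 1, chosen only for illustration, not from the unit) and draws 10,000 samples of size n = 36.

```python
import random
from statistics import fmean, stdev

random.seed(1)  # reproducible run
n, reps = 36, 10_000

# Skewed population (assumed for illustration): exponential with mean 1 and SD 1
sample_means = [fmean(random.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

centre = fmean(sample_means)
spread = stdev(sample_means)
# Share of sample means within 1.96 SDs of the centre (about 0.95 if roughly normal)
inside = sum(abs(m - centre) < 1.96 * spread for m in sample_means) / reps

print(f"mean of X-bar = {centre:.3f}  (population mean 1)")
print(f"SD of X-bar   = {spread:.3f}  (sigma/sqrt(n) = {1 / n ** 0.5:.3f})")
print(f"within 1.96 SD: {inside:.3f}")
```

Even though single exponential values are heavily skewed, the sample means centre on 1, spread by roughly σ/√n ≈ 0.167, and about 95% of them fall within 1.96 SDs, as a normal curve predicts.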
What are the conditions for the CLT to apply?
For a mean, the usual rule of thumb is n ≥ 30 (smaller is fine if the population itself is roughly normal). For a proportion the conditions are np ≥ 10 and n(1 − p) ≥ 10 — you need enough expected 'successes' and 'failures' for the bell shape to hold. If the conditions fail (small n, or a very skewed population, or a proportion near 0 or 1 with small n), the normal approximation is unreliable and the simulation methods (bootstrap, randomization) are safer.
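The rules of thumb translate directly into two checks. The function names below are hypothetical, and the example sample sizes and proportions are invented to show one pass and one failure.

```python
def clt_ok_mean(n, population_roughly_normal=False):
    """Rule of thumb for a mean: n >= 30 (smaller n is acceptable if the population is roughly normal)."""
    return population_roughly_normal or n >= 30


def clt_ok_proportion(n, p):
    """Rule of thumb for a proportion: at least 10 expected successes and 10 expected failures."""
    return n * p >= 10 and n * (1 - p) >= 10


print(clt_ok_mean(36))                                   # True
print(clt_ok_mean(12))                                   # False
print(clt_ok_mean(12, population_roughly_normal=True))   # True
print(clt_ok_proportion(50, 0.10))                       # False: np = 5
print(clt_ok_proportion(200, 0.10))                      # True: np = 20, n(1 - p) = 180
```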
Exam move
Make standardising second nature: z = (x − μ)/σ to go from a raw value to N(0,1), and x = μ + z·σ to go back from a percentile to a raw value. Always sketch the bell curve and shade the region before touching the table, because the table gives left areas and most mark-losses come from grabbing the wrong area. Drill the σ-versus-σ/√n distinction with a mix of 'one household' and 'sample of n households' questions until it is reflexive. Memorise the three z* values (1.645/1.960/2.576) and the CLT conditions (n ≥ 30 for a mean; np and n(1 − p) ≥ 10 for a proportion), since both turn up as quick MCQs and as the justification line examiners want before you apply a normal formula.
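Both directions of standardising fit in a few lines. This sketch assumes N(120, 30) from the worked example and takes the 90th percentile as an illustrative target; `standardise` and `back_convert` are hypothetical helper names.

```python
from statistics import NormalDist

mu, sigma = 120, 30  # assumed, as in the worked example


def standardise(x):
    return (x - mu) / sigma  # raw value -> z


def back_convert(z):
    return mu + z * sigma    # z -> raw value


# Percentile -> raw value: the 90th percentile of N(120, 30)
z90 = NormalDist().inv_cdf(0.90)
x90 = back_convert(z90)
print(f"z = {z90:.3f}, x = {x90:.2f}")   # z = 1.282, x = 158.45
print(f"check: {standardise(x90):.3f}")  # check: 1.282
```

Standardising the back-converted value returns the original z, which is a handy self-check in an exam as well.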