MAST90105 · Methods Of Mathematical Statistics
Estimator Properties and the Cramér–Rao Lower Bound
An estimator θ̂ is graded on three numbers: its bias E[θ̂] − θ (how far off on average), its variance (how much it bounces), and the mean squared error MSE = variance + bias² that combines them — the bias–variance decomposition that explains why a slightly biased estimator can beat an unbiased one. The chapter then asks the deep question: among all unbiased estimators, how small can the variance possibly be? The answer is the Cramér–Rao lower bound (CRLB), Var(θ̂) ≥ 1/[nI(θ)], where Fisher information I(θ) = −E[ℓ″(θ)] measures how sharply the log-likelihood peaks. An unbiased estimator that hits the bound is efficient. The bound only holds under regularity conditions — and the chapter is careful about when they fail (again, the Uniform(0,θ) support-dependent case). It closes with the large-sample payoff: the MLE is asymptotically normal and achieves the CRLB as n grows, which is the theoretical reason MLEs are the default.
What this chapter covers
- 7.1 Bias, variance and MSE
- 7.2 The bias–variance decomposition
- 7.3 Fisher information I(θ) = −E[ℓ″]
- 7.4 The Cramér–Rao lower bound and efficiency
- 7.5 Regularity conditions and when they fail
- 7.6 Asymptotic normality of the MLE
Worked example: Fisher information and the CRLB for a Poisson rate
- +1 Log-density. For one observation, ln f(x; μ) = −μ + x·ln μ − ln(x!); the score is ∂/∂μ ln f = −1 + x/μ.
- +1 Second derivative. ∂²/∂μ² ln f = −x/μ².
- +1 Fisher information (per observation). I(μ) = −E[−X/μ²] = E[X]/μ² = μ/μ² = 1/μ.
- +1 CRLB. For n observations the bound is Var(μ̂) ≥ 1/[nI(μ)] = μ/n.
- +1 The MLE’s variance. μ̂ = X̄, and Var(X̄) = Var(X)/n = μ/n (the Poisson variance is μ).
- +1 Conclude. X̄ is unbiased (E[X̄] = μ) and Var(μ̂) = μ/n equals the CRLB, so X̄ is the efficient (minimum-variance unbiased) estimator of the Poisson rate.
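The steps of the worked example can be checked numerically. The sketch below is an illustration rather than part of the chapter: it sums over a truncated Poisson support and evaluates the Fisher information in both forms, E[score²] and −E[second derivative], before forming the bound. The function names, the cut-off x_max = 200 and the values μ = 3, n = 25 are assumptions chosen for the example.

```python
import math

def poisson_pmf(x: int, mu: float) -> float:
    """Poisson pmf, computed on the log scale so x! cannot overflow."""
    return math.exp(-mu + x * math.log(mu) - math.lgamma(x + 1))

def poisson_fisher_info(mu: float, x_max: int = 200) -> tuple:
    """Per-observation Fisher information, computed two ways.

    Returns (E[score^2], -E[second derivative]), with each expectation
    taken as a sum over x = 0..x_max. The cut-off x_max is an assumption;
    the neglected tail is negligible for moderate mu.
    """
    score_sq = 0.0
    neg_curvature = 0.0
    for x in range(x_max + 1):
        p = poisson_pmf(x, mu)
        score = -1.0 + x / mu        # d/dmu ln f(x; mu)
        second = -x / mu**2          # d^2/dmu^2 ln f(x; mu)
        score_sq += p * score**2
        neg_curvature -= p * second
    return score_sq, neg_curvature

mu, n = 3.0, 25                      # illustrative values only
info_score, info_curv = poisson_fisher_info(mu)
crlb = 1.0 / (n * info_curv)         # 1 / [n I(mu)]
var_xbar = mu / n                    # Var(X-bar) = Var(X) / n

print(f"I(mu) via E[score^2]      : {info_score:.6f}")
print(f"I(mu) via -E[second deriv]: {info_curv:.6f}")
print(f"1/mu                      : {1.0 / mu:.6f}")
print(f"CRLB = {crlb:.6f}, Var(X-bar) = {var_xbar:.6f}")
```

Both forms of the information should agree with 1/μ, and the bound should coincide with Var(X̄), which is the efficiency conclusion of the final step.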
Key terms
- Bias
- E[θ̂] − θ — the systematic error of an estimator. An estimator is unbiased when this is zero. Bias is not automatically bad: a little bias can buy a large variance reduction and lower the overall MSE.
- Mean squared error (MSE)
- E[(θ̂ − θ)²] = Var(θ̂) + [bias(θ̂)]² — the bias–variance decomposition. It is the single number that trades accuracy against precision and lets you compare a biased estimator with an unbiased one.
- Fisher information
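- I(θ) = −E[ℓ″(θ)] = E[(ℓ′(θ))²], where ℓ(θ) = ln f(X; θ) is the log-density of a single observation and the two forms agree under the regularity conditions. It measures how sharply the log-likelihood curves at the truth. Larger information means the parameter is more identifiable and the achievable variance is smaller.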
- Cramér–Rao lower bound (CRLB)
- Under regularity conditions, any unbiased estimator based on n independent observations satisfies Var(θ̂) ≥ 1/[nI(θ)]. It is the floor on precision; an unbiased estimator that meets it is efficient. The bound is not guaranteed when the support depends on θ, as in Uniform(0,θ).
- Asymptotic normality of the MLE
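- Under the same regularity conditions, as n grows, √n(θ̂_MLE − θ) converges in distribution to N(0, 1/I(θ)). For large n the MLE is therefore approximately N(θ, 1/[nI(θ)]): normal, centred at the truth, with variance equal to the CRLB. This justifies Wald confidence intervals and large-sample tests built from the MLE.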
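The asymptotic-normality statement in the key terms can be illustrated by simulation. The sketch below is a rough check under stated assumptions, not the chapter's own method: it uses the Poisson model from the worked example (MLE X̄, I(μ) = 1/μ), a hand-written Poisson sampler based on Knuth's multiplication method, and arbitrary values μ = 2, n = 50 with 4000 replications and a fixed seed.

```python
import math
import random

def poisson_draw(mu: float, rng: random.Random) -> int:
    """One Poisson(mu) draw via Knuth's multiplication method (fine for small mu)."""
    limit = math.exp(-mu)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

rng = random.Random(1)               # fixed seed so the run is repeatable
mu, n, reps = 2.0, 50, 4000          # illustrative values only

z_values = []
covered = 0
for _ in range(reps):
    xbar = sum(poisson_draw(mu, rng) for _ in range(n)) / n   # the MLE
    z_values.append(math.sqrt(n) * (xbar - mu))
    # Wald interval: plug the MLE into the asymptotic variance 1/[n I(mu)] = mu/n.
    half_width = 1.96 * math.sqrt(xbar / n)
    if xbar - half_width <= mu <= xbar + half_width:
        covered += 1

z_mean = sum(z_values) / reps
z_var = sum((z - z_mean) ** 2 for z in z_values) / reps
coverage = covered / reps

print(f"mean of sqrt(n)(MLE - mu): {z_mean:.3f}  (theory: 0)")
print(f"variance                 : {z_var:.3f}  (theory: 1/I(mu) = {mu})")
print(f"Wald 95% coverage        : {coverage:.3f}")
```

The simulated variance should sit close to 1/I(μ) = μ and the Wald coverage close to 95%, up to Monte Carlo noise and the discreteness of the Poisson.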
Estimator Properties and the Cramér–Rao Lower Bound FAQ
Why prefer an estimator with lower MSE over an unbiased one?
Because MSE = variance + bias² is what actually measures closeness to the truth. An unbiased estimator can have a large variance, and a slightly biased estimator with much smaller variance can have a lower MSE and so be closer on average. Unbiasedness is a desirable property, not the goal; minimising MSE (or finding the minimum-variance unbiased estimator) is the real objective.
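A small simulation makes the trade-off concrete. The example below is an illustration that is not taken from the chapter: it assumes normal data with true variance σ² = 4 and n = 10, and compares the unbiased sample variance (divisor n − 1) with the biased version (divisor n). The helper name summarise, the seed and the replication count are all hypothetical choices.

```python
import random

def summarise(estimates, truth):
    """Return (bias, variance, mse) for a list of estimates of `truth`."""
    m = len(estimates)
    mean = sum(estimates) / m
    bias = mean - truth
    variance = sum((e - mean) ** 2 for e in estimates) / m
    mse = sum((e - truth) ** 2 for e in estimates) / m
    return bias, variance, mse

rng = random.Random(0)               # fixed seed so the run is repeatable
sigma2, n, reps = 4.0, 10, 20000     # illustrative values only

unbiased, biased = [], []
for _ in range(reps):
    xs = [rng.gauss(0.0, sigma2 ** 0.5) for _ in range(n)]
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)   # sum of squared deviations
    unbiased.append(ss / (n - 1))           # unbiased for sigma^2
    biased.append(ss / n)                   # biased downwards, less variable

bias_u, var_u, mse_u = summarise(unbiased, sigma2)
bias_b, var_b, mse_b = summarise(biased, sigma2)

for label, b, v, m in [("divisor n-1", bias_u, var_u, mse_u),
                       ("divisor n  ", bias_b, var_b, mse_b)]:
    print(f"{label}: bias={b:+.3f}  var={v:.3f}  "
          f"var+bias^2={v + b * b:.3f}  mse={m:.3f}")
```

In each row the columns var+bias² and mse agree, which is the decomposition itself, and the divisor-n estimator should show a small negative bias but a lower MSE than the unbiased one.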
What is Fisher information, intuitively?
It measures how much a sample tells you about the parameter, through the curvature of the log-likelihood at the truth: a sharply peaked log-likelihood (large −ℓ″) pins the parameter down tightly, so the information is high and the achievable variance is low. The reciprocal of n times the information, 1/[nI(θ)], is the Cramér–Rao lower bound: under the regularity conditions, the best precision any unbiased estimator can reach.
When does the Cramér–Rao bound not apply?
When the regularity conditions fail — most importantly when the support of the distribution depends on the parameter, as in Uniform(0,θ). There the usual differentiation-under-the-integral steps that derive the bound are invalid, and an estimator can actually have a variance below the naive bound. Always check the regularity conditions (especially a parameter-free support) before quoting the CRLB.
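The Uniform(0,θ) failure can be seen in a short simulation. This is a sketch under stated assumptions, not the chapter's derivation: the "naive" information is taken as E[(∂/∂θ ln f)²] = 1/θ² from ln f = −ln θ, and the estimator is the rescaled sample maximum (n + 1)/n · X₍ₙ₎, which is unbiased with exact variance θ²/[n(n + 2)]. The values θ = 1, n = 10, the seed and the replication count are arbitrary.

```python
import random

rng = random.Random(2)               # fixed seed so the run is repeatable
theta, n, reps = 1.0, 10, 20000      # illustrative values only

estimates = []
for _ in range(reps):
    sample_max = max(rng.uniform(0.0, theta) for _ in range(n))
    estimates.append((n + 1) / n * sample_max)   # unbiased rescaling of the maximum

mean_est = sum(estimates) / reps
var_est = sum((e - mean_est) ** 2 for e in estimates) / reps

naive_info = 1.0 / theta**2               # E[(d/dtheta ln f)^2], ignoring the support
naive_bound = 1.0 / (n * naive_info)      # theta^2 / n
exact_var = theta**2 / (n * (n + 2))      # known variance of this estimator

print(f"mean of estimator : {mean_est:.4f}  (theta = {theta})")
print(f"simulated variance: {var_est:.5f}")
print(f"exact variance    : {exact_var:.5f}")
print(f"naive 'CRLB'      : {naive_bound:.5f}")
```

The estimator's variance shrinks like 1/n² and falls well below the naive figure θ²/n, which is only possible because the regularity conditions behind the bound do not hold here.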
Exam move
Treat this chapter and point estimation as the joint heart of the final exam and over-prepare both. Be able to compute bias, variance and MSE for a given estimator, and state the bias–variance decomposition from memory. For the CRLB, practise the full pipeline: differentiate the log-density twice, take the negative expectation to get the per-observation Fisher information I(θ), take the reciprocal of nI(θ) for the bound, then check that the estimator is unbiased and compare its variance with the bound to decide efficiency. Put the CRLB formula, the definition of Fisher information, and the regularity conditions (especially ‘support must not depend on θ’) on your A4 sheet, because the table cannot give you any of them.