University of Sydney · S1 2026 · FACULTY OF SCIENCE

DATA1001 · Foundations Of Data Science

- one subject, every graph, every model, every mark

50% final exam · hurdle · 14 Chapters · 5-page Bible

Our own words - no uploaded lecturer files

Built to mirror S1 2026 · updated this semester

Chapter 6 of 7 · DATA1001

Sampling Distributions

This chapter is the hinge between a single sample and a statement about the population. You want a parameter (the true mean μ or proportion p) but all you have is a statistic (x̄ or p̂) computed from one sample. The central limit theorem is the engine: the sampling distribution of the mean (or sum) is approximately Normal for large n, regardless of the population's own shape, and for the mean it is centred on the parameter and narrows as n grows. That lets you attach a margin of error and build a confidence interval: estimate ± z×SE. The subtlety the exam tests hardest is what "95% confident" actually means: it is a statement about the long-run reliability of the procedure, not the probability that a particular interval contains the parameter. The chapter also covers the bias types that no sample size cures, and the bootstrap as a resampling way to get an SE when a formula is awkward.

In this chapter

What this chapter covers

01 Parameter vs statistic; the sampling frame
02 The central limit theorem: the sampling distribution of the mean
03 Bias types revisited (size cures variance, not bias)
04 Confidence intervals: estimate ± z×SE
05 What '95% confident' really means; the bootstrap

Worked example · free

Worked example: a confidence interval for a proportion

Q [6 marks]. In a simple random sample of n = 400 voters, 220 say they will vote Yes. (a) Find the sample proportion and its standard error. (b) Build an approximate 95% confidence interval for the true proportion. (c) State precisely what "95% confident" means here.

+1 (a) p̂ = 220/400 = 0.55.
+1 (a) SE = √(p̂(1−p̂)/n) = √(0.55×0.45/400) ≈ 0.0249.
+2 (b) 95% CI = p̂ ± 1.96×SE = 0.55 ± 1.96×0.0249 = 0.55 ± 0.049 = (0.501, 0.599).
+2 (c) If we repeated this sampling procedure many times, about 95% of the intervals built this way would contain the true proportion.

p̂ = 0.55 with SE ≈ 0.025, giving an approximate 95% CI of (0.501, 0.599); "95% confident" describes the procedure — about 95% of intervals built this way capture the true proportion — not the chance that this one interval does.
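
The arithmetic in parts (a) and (b) can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the helper name proportion_ci is our own label for illustration, not something from the course materials.

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Return (p_hat, se, lower, upper) for a Normal-approximation interval."""
    p_hat = successes / n                      # sample proportion
    se = math.sqrt(p_hat * (1 - p_hat) / n)    # standard error of p_hat
    margin = z * se                            # margin of error
    return p_hat, se, p_hat - margin, p_hat + margin

p_hat, se, lower, upper = proportion_ci(220, 400)
print(f"p_hat = {p_hat:.2f}, SE = {se:.4f}, 95% CI = ({lower:.3f}, {upper:.3f})")
# prints: p_hat = 0.55, SE = 0.0249, 95% CI = (0.501, 0.599)
```

Changing z to 1.645 or 2.576 gives the usual 90% and 99% versions of the same interval.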

Sia tip: The classic way to lose marks is saying "there is a 95% probability the true proportion is in (0.501, 0.599)". Once the interval is computed it either contains the parameter or it doesn't; the 95% is the long-run success rate of the method. Get this wording exactly right, because it is examined every year.

Glossary

Key terms

Parameter: A fixed, unknown number describing the whole population — the true mean μ or proportion p. Inference is the business of estimating a parameter from a statistic and quantifying how far off the estimate might be.
Statistic: A number computed from the sample — the sample mean x̄ or proportion p̂ — used to estimate the corresponding parameter. It varies from sample to sample, and that variation is the sampling distribution.
Central limit theorem (CLT): The result that the sampling distribution of the mean (or sum) is approximately Normal for large n, whatever the shape of the population. For the mean it is centred on μ with SE = σ/√n (σ being the population SD), so the spread shrinks like 1/√n. It is what lets us use the Normal curve for inference.
Confidence interval: A range built as estimate ± z×SE that is designed to capture the parameter a stated proportion of the time (e.g. 95% with z = 1.96). The margin of error is z×SE, which is half the interval's width; it narrows as n grows.
Bootstrap: A resampling method: draw new samples (with replacement) from the data itself, recompute the statistic each time, and use the spread of those values as the standard error. It estimates an SE empirically when a formula is awkward or unavailable.

FAQ

Sampling Distributions FAQ

What's the difference between a parameter and a statistic?

A parameter is a fixed but unknown number describing the whole population (the true mean μ or proportion p); a statistic is a number you compute from your sample (x̄ or p̂) to estimate it. You never see the parameter directly — inference is using the statistic, plus its standard error, to say something credible about the parameter.

What does the central limit theorem actually let me do?

It lets you treat the sampling distribution of a mean or proportion as approximately Normal for large n, no matter how skewed the population is. That single fact is why you can attach a Normal-based margin of error, build confidence intervals and run z-tests. The catch is "large enough n": the heavier the skew, the larger the sample must be before the Normal approximation is safe.
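
You can see the theorem at work by simulation. The sketch below assumes an Exponential population with mean 1 and SD 1 (our own choice of a strongly right-skewed example, not one taken from the course) and draws many samples of size n = 50. The sample means should centre near 1, with a spread close to σ/√n.

```python
import random
import statistics

rng = random.Random(1001)  # fixed seed so the run is repeatable

def sample_mean(n):
    # Assumed population: Exponential with rate 1 (right-skewed, mean 1, SD 1).
    return statistics.fmean([rng.expovariate(1.0) for _ in range(n)])

n = 50
means = [sample_mean(n) for _ in range(5000)]

centre = statistics.fmean(means)   # should sit close to the population mean, 1
spread = statistics.stdev(means)   # empirical SE of the sample mean
theory_se = 1 / n ** 0.5           # sigma / sqrt(n) with sigma = 1

print(f"centre of sample means: {centre:.3f}")
print(f"spread of sample means: {spread:.3f} (theory: {theory_se:.3f})")
```

Rerunning with a smaller n, such as 5, and plotting a histogram of means is a quick way to see how much skew survives when the sample is small.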

What does '95% confidence' really mean?

It is a statement about the procedure, not about one interval. If you repeated the whole sampling-and-interval process many times, about 95% of the intervals you built would contain the true parameter. For a specific computed interval, the parameter is either in it or not — there is no 95% probability attached to that one interval. Saying otherwise is the most penalised CI error.
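
The long-run reading can be checked by simulation. In this sketch the true proportion is set to 0.55 (an assumption that is only possible because we are simulating; in practice it is unknown), and the same n = 400 interval is rebuilt many times. Each individual interval either contains 0.55 or it does not; the 95% shows up only as the share of intervals that do.

```python
import math
import random

rng = random.Random(2026)  # fixed seed so the run is repeatable
TRUE_P = 0.55    # assumed true proportion, known only inside the simulation
N = 400          # sample size for each simulated survey
REPEATS = 2000   # number of times the whole procedure is repeated

hits = 0
for _ in range(REPEATS):
    yes = sum(rng.random() < TRUE_P for _ in range(N))   # simulate one sample
    p_hat = yes / N
    se = math.sqrt(p_hat * (1 - p_hat) / N)
    if p_hat - 1.96 * se <= TRUE_P <= p_hat + 1.96 * se:
        hits += 1                                        # this interval captured p

coverage = hits / REPEATS
print(f"{coverage:.1%} of {REPEATS} intervals contain the true proportion")
```

The printed share should land near 95%, though it will wobble a little from seed to seed.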

When would I use the bootstrap instead of a formula?

When a clean SE formula is awkward, unavailable, or relies on assumptions you can't justify — for example the SE of a median or a complicated statistic. You resample (with replacement) from your own data many times, recompute the statistic each time, and read the SE off the spread of those bootstrap values. It is a general-purpose, computer-based route to the same margin-of-error reasoning.
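
The resampling recipe above can be sketched as follows. The data are simulated, illustrative values rather than real course data, and bootstrap_se is a hypothetical helper name. As a sanity check, the bootstrap SE of the mean is compared with the formula s/√n, which should give a similar answer.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical right-skewed sample of 60 values (simulated for illustration).
data = [rng.expovariate(1 / 20) for _ in range(60)]

def bootstrap_se(values, stat, reps=2000):
    """Estimate the SE of `stat` as the SD of its values across resamples."""
    boot_stats = []
    for _ in range(reps):
        resample = rng.choices(values, k=len(values))  # draw with replacement
        boot_stats.append(stat(resample))
    return statistics.stdev(boot_stats)

se_median = bootstrap_se(data, statistics.median)   # no simple formula for this
se_mean = bootstrap_se(data, statistics.fmean)      # has a formula, so we can compare
formula_se_mean = statistics.stdev(data) / len(data) ** 0.5

print(f"bootstrap SE of the median: {se_median:.2f}")
print(f"bootstrap SE of the mean:   {se_mean:.2f} (formula: {formula_se_mean:.2f})")
```

An approximate 95% interval for the median would then be the sample median ± 1.96 × se_median, provided the bootstrap values look roughly Normal.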

Study strategy

Exam move

Keep the parameter-statistic distinction front of mind: you estimate a fixed unknown parameter with a variable statistic, and the central limit theorem tells you that statistic is approximately Normal around the parameter with SE shrinking like 1/√n. For confidence intervals, drill the mechanics (estimate ± z×SE) but spend most of your effort on the interpretation — "95% of intervals built this way capture the parameter" — because that wording is examined relentlessly and the probability-of-one-interval phrasing loses the mark. Remember that bias is a fixed offset no sample size removes, and keep the bootstrap in your toolkit for when a formula SE is awkward.
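
The point that bias does not shrink with n can also be simulated. The numbers here are hypothetical: the population proportion is set to 0.50, but the sampling frame is assumed to reach only a subgroup in which the proportion is 0.60. As n grows the SE collapses, yet the estimate settles on the wrong value.

```python
import math
import random

rng = random.Random(6)  # fixed seed so the run is repeatable
TRUE_P = 0.50    # hypothetical population proportion
FRAME_P = 0.60   # hypothetical proportion inside the flawed sampling frame

def biased_estimate(n):
    """Sample n people from the flawed frame and return (p_hat, se)."""
    yes = sum(rng.random() < FRAME_P for _ in range(n))
    p_hat = yes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, se

results = {n: biased_estimate(n) for n in (100, 10_000, 100_000)}
for n, (p_hat, se) in results.items():
    print(f"n = {n:>7}: p_hat = {p_hat:.3f}, SE = {se:.4f}, error = {p_hat - TRUE_P:+.3f}")
```

At the largest n the interval is very narrow but sits around 0.60, so it confidently misses the true 0.50.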
