University of Sydney · S1 2026 · FACULTY OF SCIENCE

DATA1001 · Foundations of Data Science

- one subject, every graph, every model, every mark

50% final exam · hurdle · 14 Chapters · 5-page Bible

Our own words - no uploaded lecturer files

Built to mirror S1 2026 · updated this semester

Chapter 3 of 7 · DATA1001

The Normal Distribution

The Normal model — the bell curve N(μ, σ²) — is the bridge from describing data to reasoning about chance. Its master device is the standard unit, or z-score: z = (x − x̄)/s tells you how many SDs a value sits from its mean, putting any value on a common scale. From there the 68–95–99.7 empirical rule lets you split a roughly Normal distribution by eye, and the Normal table (pnorm / qnorm) lets you read an exact area below a value or find the value at a given percentile. The model also frames the idea of measurement error — observed = exact + chance error — which underpins the whole inference half of the course. Master the z-score and reading areas both ways (value → area with pnorm, area → value with qnorm) and you have the engine that every later test reuses.

In this chapter

What this chapter covers

01 The Normal curve N(μ, σ²) and its shape
02 Standard units / z-scores: z = (x − x̄)/s
03 The 68–95–99.7 empirical rule
04 Reading areas and percentiles off the Normal table (pnorm / qnorm)
05 Measurement error: observed = exact + chance error

Worked example

Worked example: z-scores, areas and percentiles

Q [6 marks]. Exam scores are approximately Normal with mean 70 and SD 5. (a) A student scores 78. Find the z-score and the percentage scoring below them. (b) Use the empirical rule to give the range covering the middle 95% of scores. (c) What score is the 90th percentile?

+1 (a) z-score: z = (78 − 70)/5 = 8/5 = 1.6.
+1 (a) Area below: pnorm(1.6) ≈ 0.945, so about 94.5% of students score below 78.
+2 (b) Empirical rule: the middle 95% lies within ±2 SD of the mean, i.e. 70 ± 2×5 = 60 to 80.
+2 (c) 90th percentile: qnorm(0.90) ≈ 1.28, so the score = 70 + 1.28×5 ≈ 76.4.

z = 1.6 (about 94.5% score below 78); the middle 95% of scores lie between 60 and 80; and the 90th percentile is about 76.4. Standardise first, then read areas with pnorm and values with qnorm.
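The three parts of the worked example can be checked numerically. The sketch below is an illustration only: it assumes Python's standard-library statistics.NormalDist as a stand-in for the Normal table, with cdf playing the role of pnorm and inv_cdf the role of qnorm. The variable names are made up for this example.

```python
from statistics import NormalDist

standard = NormalDist()  # standard Normal: mean 0, SD 1

# (a) z-score for a raw score of 78, then the area below it.
z = (78 - 70) / 5
area_below = standard.cdf(z)      # plays the role of pnorm(1.6)

# (b) Empirical rule: the middle 95% is roughly mean +/- 2 SD.
low, high = 70 - 2 * 5, 70 + 2 * 5

# (c) 90th percentile: z at cumulative area 0.90, rebuilt as mean + z*SD.
z90 = standard.inv_cdf(0.90)      # plays the role of qnorm(0.90)
score90 = 70 + z90 * 5

print(round(z, 1), round(area_below, 3), (low, high), round(score90, 1))
# prints: 1.6 0.945 (60, 80) 76.4
```

The printed values agree with the marked solution: standardise, read the area, then convert the percentile z back to raw units.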

Sia tip — Watch the direction: pnorm goes value → area (how much is below), qnorm goes area → value (what score sits at this percentile). The most common slip is converting back to raw units the wrong way — always rebuild as x = mean + z×SD.

Glossary

Key terms

Normal distribution: The symmetric bell-shaped curve N(μ, σ²), fully described by its mean and SD. Many measurements and (via the central limit theorem) sample sums and means from large enough samples are approximately Normal, which is why it is the reference curve for inference.
Standard units (z-score): z = (x − x̄)/s, the number of SDs a value sits from its mean. It puts any value on a common, unitless scale and is the key that unlocks the Normal table. It also has the same shape as the (OV−EV)/SE statistic in the inference engine: a distance from what is expected, measured in units of spread.
68–95–99.7 rule: For a roughly Normal distribution, about 68% of values fall within 1 SD of the mean, 95% within 2 SD, and 99.7% within 3 SD. It lets you split a distribution and sanity-check Normal-table answers by eye.
pnorm and qnorm: The two directions of the Normal table. pnorm(z) returns the area (probability) below a z-score; qnorm(p) returns the z-score (or, scaled, the value) at a given cumulative probability. pnorm is value → area; qnorm is area → value.
Measurement error: The idea that an observed measurement = the exact value + chance error. Repeated measurements of the same quantity scatter around the truth with an SD that quantifies the instrument's precision — the seed of all later standard-error reasoning.

FAQ

The Normal Distribution FAQ

What is a z-score and why standardise?

A z-score, z = (x − x̄)/s, counts how many SDs a value sits above or below its mean. Standardising puts values measured in different units on one common scale, so you can compare them and read probabilities straight off the single standard Normal table. It has the same form as the (OV−EV)/SE engine that runs the whole inference half of the course: a difference from what is expected, divided by a measure of spread.
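To see why a common scale helps, here is a minimal sketch that compares two values measured in different units. The helper name z_score and all the numbers are hypothetical, chosen only for illustration.

```python
def z_score(x, mean, sd):
    """Number of SDs that x sits above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    return (x - mean) / sd

# Made-up numbers for illustration only.
z_height = z_score(180, mean=170, sd=8)    # height in centimetres
z_weight = z_score(80, mean=70, sd=12)     # weight in kilograms

# Both values are 10 raw units above their mean, but in standard units
# the height is further out than the weight.
print(round(z_height, 2), round(z_weight, 2))
# prints: 1.25 0.83
```

The raw gaps look identical, yet the z-scores show the height is the more unusual value, because its distribution has the smaller SD.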

When can I use the 68-95-99.7 rule?

Only when the distribution is roughly Normal (symmetric, bell-shaped). Then about 68% of the data lie within 1 SD of the mean, 95% within 2 SD and 99.7% within 3 SD. It is a fast way to give a range or sanity-check a table answer, but it is an approximation — for exact areas or non-round z-scores use pnorm/qnorm.
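The rule's three percentages are rounded versions of exact Normal areas. The sketch below assumes Python's standard-library statistics.NormalDist as a stand-in for the Normal table, and the helper name area_within is made up for this example.

```python
from statistics import NormalDist

standard = NormalDist()  # standard Normal: mean 0, SD 1

def area_within(k):
    """Area under the standard Normal curve between -k and +k SDs."""
    return standard.cdf(k) - standard.cdf(-k)

for k in (1, 2, 3):
    print(k, round(area_within(k), 4))
# prints:
# 1 0.6827
# 2 0.9545
# 3 0.9973

# The z that cuts off exactly the middle 95% is slightly below 2.
z_95 = standard.inv_cdf(0.975)
print(round(z_95, 2))
# prints: 1.96
```

So "within 2 SD" really covers about 95.45%, which is why the rule is good for a quick range but the table is needed for exact answers.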

What's the difference between pnorm and qnorm?

They run in opposite directions. pnorm(z) takes a z-score and returns the area (probability) below it — value to area. qnorm(p) takes a probability and returns the z-score at that percentile — area to value. "What % score below 78?" uses pnorm; "what is the 90th-percentile score?" uses qnorm. Mixing them up is the classic Normal-model error.
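One way to convince yourself that the two functions are inverses is a round trip. In this sketch, pnorm and qnorm are hypothetical Python wrappers named after the functions in the text. They assume the standard-library statistics.NormalDist and are not the course's own software.

```python
from statistics import NormalDist

_standard = NormalDist()  # standard Normal: mean 0, SD 1

def pnorm(z):
    """Value -> area: cumulative area below the z-score."""
    return _standard.cdf(z)

def qnorm(p):
    """Area -> value: z-score at cumulative probability p (0 < p < 1)."""
    return _standard.inv_cdf(p)

z = 1.6
p = pnorm(z)        # area below z = 1.6
z_back = qnorm(p)   # feeding the area back in recovers the z-score

print(round(p, 3), round(z_back, 3))
# prints: 0.945 1.6
```

If an answer from pnorm is larger than 1, or an input to qnorm is larger than 1, the two directions have been swapped.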

How does measurement error fit in?

DATA1001 frames a measurement as observed = exact + chance error. If you weigh the same mass several times, the readings scatter around the true value, and the SD of those repeats measures the instrument's precision. This is the conceptual seed of the standard error: a statistic, like a single measurement, has a predictable spread around the quantity it estimates.
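The observed = exact + chance error model can be simulated in a few lines. This is a sketch under stated assumptions: the true mass, the scale's precision and the Normal shape of the chance error are all hypothetical choices made for illustration.

```python
import random
from statistics import mean, stdev

rng = random.Random(1001)   # fixed seed so reruns give the same readings

exact = 100.0      # hypothetical true mass in grams
error_sd = 0.5     # hypothetical precision of the scale in grams

# observed = exact + chance error, with chance error centred on 0
readings = [exact + rng.gauss(0, error_sd) for _ in range(1000)]

avg_reading = mean(readings)    # should land close to the exact value
spread = stdev(readings)        # should land close to error_sd

print(round(avg_reading, 2), round(spread, 2))
```

The average of the repeats sits near the exact value and their SD sits near the error SD, which is the sense in which the SD of repeated readings measures the instrument's precision.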

Study strategy

Exam move

Make the z-score automatic: standardise first (z = (x − x̄)/s), then decide which direction you need. For "how much is below / above / between", go value → area with pnorm and the empirical rule; for "what value sits at this percentile", go area → value with qnorm and rebuild raw units as x = mean + z×SD. Sanity-check every table answer against 68–95–99.7. Keep the measurement-error framing (observed = exact + chance error) in mind, because it is the bridge to the standard error and the inference engine in the back half of the course.
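The exam routine can be sketched as four small helpers, one per question type. The function names are hypothetical, and the sketch assumes Python's standard-library statistics.NormalDist in place of the Normal table. The numbers reuse the worked example (mean 70, SD 5).

```python
from statistics import NormalDist

_standard = NormalDist()  # standard Normal: mean 0, SD 1

def area_below(x, mean, sd):
    """Value -> area: standardise, then read the area below."""
    z = (x - mean) / sd
    return _standard.cdf(z)

def area_above(x, mean, sd):
    """Total area is 1, so above = 1 - below."""
    return 1 - area_below(x, mean, sd)

def area_between(a, b, mean, sd):
    """Area between a and b = area below b minus area below a."""
    return area_below(b, mean, sd) - area_below(a, mean, sd)

def value_at(p, mean, sd):
    """Area -> value: find z at percentile p, rebuild as mean + z*SD."""
    z = _standard.inv_cdf(p)
    return mean + z * sd

print(round(area_above(78, 70, 5), 3))        # prints: 0.055
print(round(area_between(60, 80, 70, 5), 3))  # prints: 0.955
print(round(value_at(0.90, 70, 5), 1))        # prints: 76.4
```

The "between" answer of about 95.5% is the sanity check against 68–95–99.7 in action: 60 to 80 is the mean plus or minus 2 SD.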
