University of Sydney · S1 2026 · FACULTY OF SCIENCE

DATA1001 · Foundations Of Data Science

- one subject, every graph, every model, every mark

50% final exam · hurdle · 14 Chapters · 5-page Bible

Our own words - no uploaded lecturer files

Built to mirror S1 2026 · updated this semester

Chapter 5 of 7 · DATA1001

Probability and the Box Model

Probability is the engine room of inference, and DATA1001 keeps it concrete with Freedman's box model: imagine the chance process as drawing tickets from a box. Two rules run everything: add probabilities for "or" (mutually exclusive events) and multiply for "and" (independent events), taking care over whether you draw with or without replacement. The box model then gives the two quantities every later test needs. For the sum of n draws made at random with replacement: EV = n×(box mean) and SE = √n×(box SD). For the average: EV = box mean and SE = (box SD)/√n. Crucially, the SE grows like √n for sums but shrinks like 1/√n for averages. That shrinking SE is why bigger samples give more precise estimates, and it is the law of large numbers in action. These formulas supply the EV and SE that the hypothesis-testing engine, (OV − EV)/SE, plugs in; the OV (observed value) comes from the data.
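As a minimal sketch of the four formulas above, assuming draws are made at random with replacement: the helper names (`box_mean`, `box_sd`, `sum_ev_se`, `average_ev_se`) and the six-ticket die box are illustrative choices, not part of the course material.

```python
import math

def box_mean(box):
    """Average of the tickets in the box."""
    return sum(box) / len(box)

def box_sd(box):
    """Population SD of the tickets (divide by the number of tickets, not by one less)."""
    m = box_mean(box)
    return math.sqrt(sum((t - m) ** 2 for t in box) / len(box))

def sum_ev_se(box, n):
    """EV and SE for the sum of n draws made with replacement."""
    return n * box_mean(box), math.sqrt(n) * box_sd(box)

def average_ev_se(box, n):
    """EV and SE for the average of n draws made with replacement."""
    return box_mean(box), box_sd(box) / math.sqrt(n)

# Hypothetical box: one ticket per face of a fair die.
die_box = [1, 2, 3, 4, 5, 6]
print(sum_ev_se(die_box, 25))
print(average_ev_se(die_box, 25))
```

Both pairs come from the same box mean and box SD; only the scaling by n and √n differs.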

In this chapter

What this chapter covers

01 Probability rules: add for 'or', multiply for 'and'
02 With vs without replacement; independence
03 The binomial idea
04 The box model: tickets, draws, EV and SE for the sum and the average
05 Standard-error reasoning and the law of large numbers

Worked example · free

Worked example: EV and SE from a box

Q [6 marks]. A roulette-style box has 18 tickets marked +1 (win) and 20 marked −1 (lose). You bet $1 on red each spin and play n = 100 spins. (a) Find the box mean and box SD. (b) Find the EV and SE of your net result. (c) Give an approximate 95% range for the net result.

+1 (a) Box mean = (18×(+1) + 20×(−1))/38 = −2/38 ≈ −0.0526.
+1 (a) Box SD: the tickets take only two values, so SD = (big − small)×√(fraction big × fraction small) = (1 − (−1))×√((18/38)×(20/38)) ≈ 2×0.4993 ≈ 0.999 (essentially 1).
+1 (b) EV of the sum = n×(box mean) = 100×(−0.0526) ≈ −$5.26.
+1 (b) SE of the sum = √n×(box SD) = √100×0.999 = 9.99 ≈ $10.
+2 (c) Approx 95% range = EV ± 2×SE = −5.26 ± 20, i.e. about −$25 to +$15.

The box mean is −0.0526 and box SD ≈ 1, so over 100 spins the net result has EV ≈ −$5.26 and SE ≈ $10; about 95% of the time you finish between roughly −$25 and +$15. The negative EV is the house edge; the SE measures the swing.
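The arithmetic in this worked example can be checked with a short script. This is a sketch using Python's standard `statistics` module; the variable names are illustrative, and `pstdev` is used because the box SD divides by the number of tickets.

```python
import math
import statistics

# The box from the question: 18 tickets marked +1 and 20 marked -1.
box = [1] * 18 + [-1] * 20
n = 100

mean = statistics.fmean(box)   # box mean
sd = statistics.pstdev(box)    # box SD (population SD of the tickets)

ev_sum = n * mean              # EV of the net result
se_sum = math.sqrt(n) * sd     # SE of the net result
low, high = ev_sum - 2 * se_sum, ev_sum + 2 * se_sum

print(f"box mean = {mean:.4f}, box SD = {sd:.3f}")
print(f"EV = {ev_sum:.2f}, SE = {se_sum:.2f}")
print(f"approx 95% range: {low:.1f} to {high:.1f}")
```

Rounded to the nearest dollar, the printed range agrees with the −$25 to +$15 in part (c).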

Sia tip: keep the two SE formulas straight. The sum's SE = √n×(box SD) grows with n, while the average's SE = (box SD)/√n shrinks with n. The house edge (negative EV) is fixed per spin, so the expected loss grows in proportion to n while the swing grows only like √n; over a long session the loss increasingly dominates the swing.

Glossary

Key terms

Box model: Freedman's device for any chance process: tickets in a box, drawn n times. It converts a real process into a box mean and box SD, from which the EV and SE of the sum or average follow mechanically — the foundation for every later inference.
Expected value (EV): The long-run average outcome. For the sum of n draws, EV = n×(box mean); for the average, EV = box mean. It is the centre that the OV (observed value) is compared against in the inference engine.
Standard error (SE): The SD of a chance quantity — how much a sum or average wobbles from repetition to repetition. For the sum, SE = √n×(box SD); for the average, SE = (box SD)/√n. The average's SE shrinks like 1/√n, which is why bigger samples are more precise.
Independence: Two events are independent when one occurring does not change the probability of the other; then P(A and B) = P(A)×P(B). Drawing with replacement keeps draws independent; drawing without replacement makes them dependent.
Law of large numbers: As the number of draws grows, the observed average converges to the box mean (the EV), because the average's SE shrinks like 1/√n. It is why long-run frequencies stabilise and why larger samples estimate parameters more precisely.
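The law of large numbers can be seen in a small simulation. This is a rough sketch that reuses the roulette box from the worked example; the seed value and the function name `average_of_draws` are arbitrary choices made for illustration.

```python
import random

box = [1] * 18 + [-1] * 20        # roulette box from the worked example
box_mean = sum(box) / len(box)

rng = random.Random(1001)         # fixed seed so the run is repeatable (arbitrary choice)

def average_of_draws(n):
    """Average of n draws made at random with replacement from the box."""
    return sum(rng.choices(box, k=n)) / n

for n in (100, 10_000, 100_000):
    avg = average_of_draws(n)
    print(f"n = {n:>7}: average = {avg:+.4f}, distance from box mean = {abs(avg - box_mean):.4f}")
```

Any single run can wobble, but the typical distance from the box mean shrinks like 1/√n as n grows.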

FAQ

Probability and the Box Model FAQ

When do I add probabilities and when do I multiply?

Add for "or" with mutually exclusive events: P(A or B) = P(A) + P(B). Multiply for "and" with independent events: P(A and B) = P(A)×P(B). The catch is replacement — drawing without replacement changes the probabilities on later draws, so the events are no longer independent and you must update the fractions as you go.
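A small sketch with exact fractions shows both rules and the replacement catch. The die and the box of 3 red and 2 blue tickets are hypothetical examples, not taken from the course.

```python
from fractions import Fraction

# Addition rule: on one roll of a fair die, "1" and "2" are mutually exclusive.
p_one_or_two = Fraction(1, 6) + Fraction(1, 6)

# Hypothetical box: 3 red tickets and 2 blue tickets, two draws.
red, total = 3, 5

# With replacement the box is unchanged, so the draws are independent.
p_two_reds_with = Fraction(red, total) * Fraction(red, total)

# Without replacement one red ticket is gone, so update the second fraction.
p_two_reds_without = Fraction(red, total) * Fraction(red - 1, total - 1)

print(p_one_or_two, p_two_reds_with, p_two_reds_without)  # prints: 1/3 9/25 3/10
```

The only change between the last two calculations is the second fraction, which is exactly the update the answer above describes.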

What is the box model and why is it so central?

The box model represents any chance process as tickets in a box that you draw n times. It is central because it gives you, mechanically, the EV and SE of the sum and the average — and those are exactly the ingredients the hypothesis-testing engine needs. Learn to build the right box (what's on the tickets, in what proportions) and the rest of the inference course is plugging numbers into formulas.

Why does the standard error sometimes grow and sometimes shrink with n?

Both happen, for different quantities. The SE of the sum is √n×(box SD), which grows with n — totals get more variable in absolute terms. The SE of the average is (box SD)/√n, which shrinks with n — averages get more precise. The shrinking-average SE is the law of large numbers and the reason larger samples give tighter estimates.
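To see both behaviours side by side, the sketch below tabulates the two SEs for a few sample sizes. The box SD of 1.0 is an illustrative value (close to the roulette box's 0.999), and the function names are assumptions made for this example.

```python
import math

box_sd = 1.0  # illustrative box SD, close to the roulette box's 0.999

def se_of_sum(n, sd=box_sd):
    """SE of the sum of n draws: grows like the square root of n."""
    return math.sqrt(n) * sd

def se_of_average(n, sd=box_sd):
    """SE of the average of n draws: shrinks like 1 over the square root of n."""
    return sd / math.sqrt(n)

print(f"{'n':>6} {'SE of sum':>10} {'SE of average':>14}")
for n in (25, 100, 400, 1600):
    print(f"{n:>6} {se_of_sum(n):>10.2f} {se_of_average(n):>14.4f}")
```

Each row quadruples n, which doubles the SE of the sum and halves the SE of the average.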

How do EV and SE connect to hypothesis testing?

Directly. The (OV−EV)/SE engine that runs every test in the next chapter uses exactly these quantities: OV is the observed statistic, EV is what you'd expect if the null were true (n×(box mean) for a sum, the box mean for an average), and SE is the matching box-model standard error. So mastering the box model here is mastering the denominator and centre of every test you'll meet later.
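As a hedged illustration of how the pieces slot together, the sketch below computes (OV−EV)/SE for a made-up case: a null box for a fair coin and a hypothetical observed count of 60 heads in 100 tosses. The function name `standardised_statistic` and all the numbers are assumptions for this example only.

```python
import math

def standardised_statistic(ov, ev, se):
    """Distance between the observed and expected value, measured in SEs."""
    return (ov - ev) / se

# Hypothetical null box for a fair coin: one ticket 0 (tail), one ticket 1 (head).
box_mean, box_sd = 0.5, 0.5
n = 100
ov = 60  # hypothetical observed number of heads

ev = n * box_mean              # EV of the sum under the null
se = math.sqrt(n) * box_sd     # SE of the sum under the null
z = standardised_statistic(ov, ev, se)
print(z)  # prints: 2.0
```

Here the observed count sits 2 SEs above what the null box predicts; the box supplied both the centre (EV) and the denominator (SE).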

Study strategy

Exam move

Get fluent at building the box: decide what is written on the tickets and in what proportions, then read off the box mean and box SD. Memorise the two pairs — sum: EV = n×mean, SE = √n×SD; average: EV = mean, SE = SD/√n — and keep clear that the sum's SE grows while the average's SE shrinks with n. Use "add for or, multiply for and", and always check replacement before assuming independence. Because these EV and SE values become the EV and SE of the testing engine, practise them until they are automatic; the back half of the course rewards the box model heavily.
