MAST90105 · Methods Of Mathematical Statistics
Probability Foundations and Bayes
Everything later in the course is written in the grammar of probability, so the subject opens by nailing that grammar down: the three axioms, the handful of set identities (complement, union, the inclusion–exclusion rule), and the two ways of counting equally likely outcomes, namely permutations when order matters and combinations when it does not. On top of that sits the engine of inference: conditional probability P(A|B) = P(A∩B)/P(B), defined for P(B) > 0, the multiplication and total-probability rules that decompose an event across a partition, and Bayes’ theorem, which flips a conditional to update a prior into a posterior. The chapter closes on independence: the precise statement P(A∩B) = P(A)P(B), and why ‘mutually exclusive’ is not a synonym for it but, for events of positive probability, a form of dependence. Master Bayes here and the Bayesian estimation chapter later is just this rule applied to a likelihood and a prior.
What this chapter covers
- 1.1 The three axioms and the basic set identities
- 1.2 Counting: permutations vs combinations
- 1.3 Conditional probability and the multiplication rule
- 1.4 The law of total probability over a partition
- 1.5 Bayes’ theorem: flipping the conditional
- 1.6 Independence vs mutual exclusivity
Worked example: Bayes’ theorem on a diagnostic test
- (+1) Name the method. We want P(D | +) from P(+ | D), a conditional flip, so use Bayes’ theorem with the law of total probability in the denominator.
- (+1) Total probability of a positive, with sensitivity P(+|D) = 0.95, false-positive rate P(+|D′) = 0.04 and prevalence P(D) = 0.01: P(+) = P(+|D)P(D) + P(+|D′)P(D′) = 0.95(0.01) + 0.04(0.99) = 0.0095 + 0.0396 = 0.0491.
- (+1) Bayes: P(D | +) = P(+|D)P(D) / P(+) = 0.0095 / 0.0491 ≈ 0.1935.
- (+1) Interpret: despite a positive result the chance of disease is only about 19%, because the disease is rare and false positives swamp true positives. This is the base-rate effect.
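The steps above can be sketched in a few lines of Python. This is a minimal illustration, not part of the course material: the function name `posterior` and its parameter names are hypothetical, and it assumes the two-event partition {D, D′} used in the worked example.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """Return P(D | +) for a two-event partition {D, D'} (hypothetical helper)."""
    # Law of total probability: P(+) = P(+|D)P(D) + P(+|D')P(D')
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    # Bayes' theorem: P(D | +) = P(+|D)P(D) / P(+)
    return sensitivity * prior / p_positive


# Values taken from the worked example above.
p = posterior(prior=0.01, sensitivity=0.95, false_positive_rate=0.04)
print(round(p, 4))  # 0.1935
```

Keeping the total-probability denominator as its own named variable mirrors the structure of the hand calculation: likelihood × prior on top, P(+) underneath.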
Key terms
- Conditional probability: P(A | B) = P(A∩B)/P(B) for P(B) > 0, the probability of A once B is known to have occurred. It rescales the sample space to B, and is the building block of the multiplication rule, total probability and Bayes.
- Law of total probability: if B₁,…,Bₖ partition the sample space, then P(A) = ∑ᵢ P(A|Bᵢ)P(Bᵢ). You reach A by summing over every disjoint route to it. It supplies the denominator in Bayes’ theorem.
- Bayes’ theorem: P(Bᵢ | A) = P(A|Bᵢ)P(Bᵢ) / ∑ⱼ P(A|Bⱼ)P(Bⱼ). It flips a conditional, turning the easy direction P(A|B) and a prior P(B) into the posterior P(B|A). It is the whole of Bayesian inference in one line.
- Independence: A and B are independent iff P(A∩B) = P(A)P(B), equivalently P(A|B) = P(A) when P(B) > 0, so knowing B tells you nothing about A. Distinct from mutually exclusive events, which cannot both occur and so, when both have positive probability, are dependent (each excludes the other).
- Permutation vs combination: a permutation counts ordered arrangements, nPr = n!/(n−r)!; a combination counts unordered selections, nCr = n!/[r!(n−r)!]. Order matters for permutations and not for combinations; confusing the two is a common counting error, so settle the order question first.
Probability Foundations and Bayes FAQ
What is the difference between mutually exclusive and independent?
They are opposites, not synonyms. Mutually exclusive means the events cannot both happen, P(A∩B) = 0, so if one occurs the other cannot — that is maximal dependence. Independent means P(A∩B) = P(A)P(B), so one happening tells you nothing about the other. Two events with positive probability cannot be both mutually exclusive and independent.
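A small exact check makes the distinction concrete. The sketch below assumes a fair six-sided die as the sample space; the events `even`, `low` and `odd` and the helper functions are hypothetical names chosen for illustration.

```python
from fractions import Fraction

# Assumed sample space: one roll of a fair six-sided die.
omega = frozenset(range(1, 7))


def prob(event):
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(event & omega), len(omega))


def independent(a, b):
    """True iff P(A∩B) = P(A)P(B)."""
    return prob(a & b) == prob(a) * prob(b)


def mutually_exclusive(a, b):
    """True iff P(A∩B) = 0."""
    return prob(a & b) == 0


even = frozenset({2, 4, 6})  # P = 1/2
low = frozenset({1, 2})      # P = 1/3
odd = frozenset({1, 3, 5})   # P = 1/2

print(independent(even, low), mutually_exclusive(even, low))  # True False
print(independent(even, odd), mutually_exclusive(even, odd))  # False True
```

Here `even` and `low` overlap in exactly the outcome 2, giving P(A∩B) = 1/6 = (1/2)(1/3), so they are independent but not mutually exclusive. `even` and `odd` never overlap, so P(A∩B) = 0 while P(A)P(B) = 1/4, which makes them mutually exclusive and dependent.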
When do I use a permutation versus a combination?
Ask whether order matters. If arranging r items out of n in a sequence (a podium, a password), use a permutation nPr. If choosing an unordered subset (a committee, a hand of cards), use a combination nCr. A quick check: a combination times r! equals the corresponding permutation, because each unordered selection has r! orderings.
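The quick check can be verified numerically. The following is a sketch using Python's standard library (`math.perm` and `math.comb` require Python 3.8 or later); the sizes n = 5 and r = 3 are arbitrary values chosen for illustration.

```python
import itertools
import math

n, r = 5, 3  # hypothetical sizes chosen for illustration

n_perm = math.perm(n, r)  # ordered arrangements: n! / (n - r)!
n_comb = math.comb(n, r)  # unordered selections: n! / (r! (n - r)!)

# Cross-check both formulas by brute-force enumeration.
items = range(n)
assert n_perm == len(list(itertools.permutations(items, r)))
assert n_comb == len(list(itertools.combinations(items, r)))

# Each unordered selection has r! orderings, so nCr * r! == nPr.
assert n_comb * math.factorial(r) == n_perm

print(n_perm, n_comb)  # 60 10
```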
Why does a positive test still leave a low probability of disease?
Because the disease is rare. Bayes weighs the likelihood by the prior, so with a 1% base rate the small false-positive rate applied to the large healthy group produces more false positives than the test produces true positives. The posterior, not the sensitivity, is the answer to ‘do I have it?’ — always compute the full total-probability denominator.
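Counting people rather than multiplying probabilities often makes the base-rate effect easier to see. The sketch below assumes a hypothetical cohort of 10,000 people and reuses the rates from the worked example (1% prevalence, 95% sensitivity, 4% false-positive rate); exact fractions are used to avoid rounding.

```python
from fractions import Fraction

cohort = 10_000  # hypothetical cohort size, chosen for round numbers
prior = Fraction(1, 100)               # P(D)
sensitivity = Fraction(95, 100)        # P(+ | D)
false_positive_rate = Fraction(4, 100) # P(+ | D')

diseased = cohort * prior                         # 100 people
healthy = cohort - diseased                       # 9900 people
true_positives = diseased * sensitivity           # 95 people
false_positives = healthy * false_positive_rate   # 396 people

# Posterior = true positives as a share of all positives.
posterior = true_positives / (true_positives + false_positives)

print(true_positives, false_positives, posterior)  # 95 396 95/491
```

Of the 491 positive results, only 95 belong to people with the disease, which is the same 0.1935 obtained from Bayes’ theorem.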
Exam move
Get the vocabulary watertight before the calculations: write the axioms, the conditional-probability definition, the total-probability sum and Bayes’ theorem on one card and practise re-deriving Bayes from the multiplication rule. Drill the base-rate diagnostic problem until the structure — likelihood × prior over a total-probability denominator — is automatic, because the same shape returns in the Bayesian estimation chapter as posterior ∝ prior × likelihood. Keep ‘independent’ and ‘mutually exclusive’ strictly apart, and always confirm whether a counting question cares about order before reaching for nPr or nCr.
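For the re-derivation drill, one possible layout is sketched below. It assumes P(A) > 0 and that B₁,…,Bₖ partition the sample space with each P(Bⱼ) > 0; the first line writes the multiplication rule in both directions, and the last step substitutes the law of total probability for P(A).

```latex
\begin{align*}
P(A \cap B_i) &= P(A \mid B_i)\,P(B_i) = P(B_i \mid A)\,P(A) \\[4pt]
P(B_i \mid A) &= \frac{P(A \mid B_i)\,P(B_i)}{P(A)} \\[4pt]
              &= \frac{P(A \mid B_i)\,P(B_i)}{\sum_{j=1}^{k} P(A \mid B_j)\,P(B_j)}
\end{align*}
```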