Download PDF
A4 landscape · ~7pt body · 6 cols · ink + yellow highlight only · the whole unit on two sides 2 sides · MAT9004 · Monash
MAT9004

Mathematical Foundations for Data Science & AI

Monash University · Faculty of Science
Exam Revision
Sem 1 2026 · Side 1 of 2
All six topics · the continuous half
SIDE 1/2   CONTINUOUS · Single-variable calculus · Optimisation · Integration · Linear algebra · Eigenvalues · A=PDP⁻¹ · Multivariable calculus · Hessian test Revision sheet · closed-book exam Compiled by AskSia · mapped to the MAT9004 syllabus · asksia.ai/cheatsheet/monash-mat9004

0 · Exam Blueprintread first

One paper, six disjoint maths worlds. The final is 60% of the unit AND a hurdle — you must score ≥45/100 on the exam itself to pass, regardless of coursework.

ItemDetail
Weight60% · hurdle ≥45%
Coursework2 assign×10% + 5 quiz×4%
Duration~3 h 10 min · e-exam
Materialsclosed-book · NO calc
Formula sheetprovided in paper
Template~36 questions, fixed slots

~36-slot pattern: ~31 short-answer (1.5–3 marks, answer is an integer or lowest-terms a/b, no spaces, no decimals) + 5 long-answer (6 marks: global extrema · Hessian classify · eigen/diagonalise · free-variable system · two-stage Bayes).

This is a REVISION sheet — you cannot bring it in. The exam gives its own formula sheet, so memorise the METHODS, not the formulas: the recipe wins marks, the formula is handed to you.

Sia → Everything must be hand-computable. Practise Gaussian elimination, char-poly eigenvalues and Bayes by hand until fluent — no calculator on the day.

1 · Derivatives · RulesArea 1 · L4–5

f'(a) = slope of tangent = limx→a (f(x)−f(a))/(x−a). |x| is not differentiable at 0.

f(x)f'(x)
c (const)0
xᵇb·x^(b−1)
ln(a)·aˣ
ln x1/x
logₐ x1/(ln(a)·x)

Combining rules(c·f)'=c·f' · (f±g)'=f'±g'
product: (fg)' = f'g + fg'
chain: (f(g(x)))' = g'(x)·f'(g(x))

Common chains(e^(cx+d))' = c·e^(cx+d)
(ln(cx+d))' = c/(cx+d)
(a^(cx+d))' = c·ln(a)·a^(cx+d)

nth derivative f⁽ⁿ⁾ = differentiate n times (f''=f⁽²⁾). Worked tangent slope: for h(x)=x·e^(2x), h'=e^(2x)(1+2x) ⇒ h'(0)=1 (product rule, then evaluate).

1b · Function TypesL1–3

Convex: chord lies on/above plot ⇔ f''≥0. Concave: chord below ⇔ f''≤0. Lines are both.

Transform-plot trick: log-log linear ⇒ power law (slope −a); log-lin linear ⇒ exponential (slope ln a); lin-log ⇒ logarithmic.

Log & exponential rulesaˣaʸ=a^(x+y) · (aˣ)ʸ=a^(xy) · a⁰=1
logₐ(xy)=logₐx+logₐy · logₐ(xᵇ)=b logₐx
change of base: logₐx = log_b x / log_b a

Line through 2 points: m=(y₂−y₁)/(x₂−x₁), then y=mx+b with b=y₁−mx₁; zero at x=−b/m.

Injective = distinct inputs give distinct outputs; surjective = image fills codomain; bijective = both ⇔ has an inverse.

2 · Optimisation (1 var)Area 1 · L5–6 ★

Stationary point s: f'(s)=0. Sign of f' gives increase (f'>0) / decrease (f'<0).

2nd-derivative test (at stationary a)f''(a) > 0 ⇒ local min
f''(a) < 0 ⇒ local max
f''(a) = 0 ⇒ inconclusive (x³,x⁴,−x⁴)

First-deriv (sign) test: at stationary m, f' goes +→− ⇒ max; −→+ ⇒ min. Inflection = concavity changes.

Global extrema on [c,d]candidates = stationary pts + singular pts (f' undef) + endpoints c,d.
Evaluate f at ALL ⇒ largest = max, smallest = min.

Shortcut: local min of a convex f is the global min; local max of concave = global max.

Quadratic roots: x²+ax+b=0 ⇒ x=−a/2±√(a²/4−b); real iff a²≥4b. (x−u)(x−v)=x²−(u+v)x+uv.

Sia → The long-answer "global extrema on [a,b]" question forgets the endpoints at your peril — they are candidates too. Always tabulate f at every candidate.

2b · Worked · Extrema on intervalrenumbered

f'(t)=t³−5t²+6t=t(t−2)(t−3) on [−1,3]. Stationary t=0,2,3.

f''=3t²−10t+6: f''(0)=6>0 (min), f''(2)=−2<0 (max), f''(3)=3>0 (min). Compare f at {−1,0,2,3} ⇒ pick global max/min by value.

Cubic variant: f(x)=2x³−3x²−12x+4 on [−3,3] ⇒ stationary x=−1,2; global max (−1,11), global min at the endpoint (−3,−41). The endpoint wins — count it.

2c · Worked · Can min surfaceapplied

Volume πr²h = 128π ⇒ h=128/r². Surface f(r)=2π(r²+128/r); f'=2π(2r−128/r²)=0 ⇒ r³=64 ⇒ r=4. f''>0 (convex) ⇒ global min; h=128/16=8.

Page-layout variant: printable A=(x−4)(y−6) with xy=294 ⇒ A=318−6x−1176/x; A'=−6+1176/x²=0 ⇒ x=14, A''<0 (concave, so a max), y=21.

Convexity-on-interval trap: for f''=6ax−12 to be neither convex nor concave on (2,3), require f'' to change sign there ⇒ solve for the parameter range, don't just plug one point.

Constrained product, variants: max xy s.t. 2x+3y=60 ⇒ 150; s.t. x+3y=60 ⇒ 300. Same substitute-then-optimise recipe; check it's a max (f''<0), not a min.

Singular points (f' undefined, e.g. a corner like |x| at 0) are candidates too — don't only solve f'=0. The Extreme Value Theorem guarantees a continuous f on [c,d] attains both extrema.

2d · RSS / least squaresapplication

Residual sum of squaresRSS = Σᵢ (yᵢ − f(xᵢ))²

Fit f(x)=2x−1 to (2,1),(3,4),(5,2): preds 3,5,9; residuals −2,−1,−7 ⇒ RSS = 4+1+49 = 54. Squares are differentiable & punish big errors.

Workflow: ① f', solve f'=0 + find singular pts; ② classify with f'' (or sign change); ③ compare f at stationary/singular/boundary; use convexity if available.

3 · Integration · FTCArea 1 · L7

Antiderivative F: F'=f, unique up to +c. ∫f dx = F(x)+c.

f(x)∫f dx
xᵃ (a≠−1)x^(a+1)/(a+1)+c
x⁻¹ln|x|+c
e^(ax)(1/a)e^(ax)+c

Fundamental Theorem of Calculus∫ₐᵇ f(x)dx = F(b) − F(a)
G(x)=∫ₐˣ f(z)dz ⇒ G'=f

Linearity ∫(f+g)=∫f+∫g, ∫cf=c∫f. Additivity ∫ₐᶜ=∫ₐᵇ+∫_bᶜ (piecewise).

Rate ⇒ total: if E'(x)=rate, total change = ∫ rate dx.

Sia → A definite integral is signed area — area below the x-axis counts negative. Sketch first if signs are in doubt.

3b · Worked · Definite + FTCrenumbered

∫₀² (x³−6x²)dx = [x⁴/4 − 2x³]₀² = 4 − 16 = −12.

Rate→total (battery): cost falls at rate (x+1)⁻², start $5. D(4)=5+∫₀⁴ −(x+1)⁻²dx = 5 + [(x+1)⁻¹]₀⁴ = 5 + (1/5 − 1) = $4.20 = 21/5. Antiderivative chosen so D(0)=5.

Look-up integrals (used in probability): ∫xe⁻ˣdx = −e⁻ˣ(x+1)+c; ∫e^(−x²/2)dx is not elementary (the normal's normaliser uses erf — hence z-tables). ∫₂(6x²+6x−4)dx-type slots are routine power-rule.

3c · Σ / Π Notationslots 1–2

Σ_{x=a}^{b} f(x)=f(a)+…+f(b); Π is the product. Empty sum = 0, empty product = 1. e.g. Σ_{k=1}^{4} k² = 1+4+9+16 = 30; Π_{k=1}^{4} k = 4! = 24. ℕ={0,1,2,…} here (0 is natural). "iff" = if and only if.

4 · VectorsArea 2 · L8

ℝᵈ = column d-tuples; add & scale component-wise.

Dot product & normv·w = v₁w₁+…+v_dw_d
‖v‖ = √(v·v) = √(v₁²+…+v_d²)
orthogonal ⇔ v·w = 0

Linear comb. w=a₁v₁+…+aₙvₙ. Linearly dependent = one vᵢ is a combo of the rest. Pairwise-orthogonal nonzero vectors are independent.

Worked norm: ‖(3,12,−4)‖=√(9+144+16)=√169=13; ‖(−8,9,−12)‖=√289=17. Orthogonal solve: (2,−1,z)·(3,4,1)=0 ⇒ 6−4+z=0 ⇒ z=−2.

Line interval joining u,v = {αu+(1−α)v : α∈[0,1]}. Geometrically a vector = displacement (direction + length, no fixed position).

4b · Inverse FunctionsL2

g=f⁻¹ ⇔ g(f(x))=x and f(g(y))=y. f⁻¹(x) ≠ 1/f(x). To find f⁻¹: solve y=f(x) for x. f has an inverse iff bijective (injective + surjective).

Worked: f(x)=x²+4x on [0,∞), find f⁻¹(32): x²+4x=32 ⇒ x²+4x−32=0 ⇒ (x+8)(x−4)=0 ⇒ x=4 (take the non-negative root).

4c · Constrained Productsubstitution

Max xy s.t. 2x+5y=100: y=(100−2x)/5 ⇒ maximise (1/5)(100x−2x²); deriv 100−4x=0 ⇒ x=25, y=10 ⇒ xy=250. Substitute the constraint, reduce to one variable, then optimise — also a Lagrange-style multivariable framing.

5 · Matrices & SystemsArea 2 · L8–10 ★

Mult: A (m×n)·B (n×r) = (m×r); (AB)ᵢⱼ = row i of A · col j of B. AB ≠ BA in general. A(BC)=(AB)C.

Gaussian elimination — 3 valid row ops① swap two rows
② multiply a row by a nonzero k
③ add a multiple of one row to another

Row-reduce Ax=b to upper-triangular, then back-substitute. Apply ops top-to-bottom, NOT simultaneously.

OutcomeSignal
No solutionrow 0=3 (contradiction)
Uniquefull pivots
∞ manyrow 0=0 ⇒ free var t

Free variable ⇒ write solution in vector form (point + t·direction). Fewer equations than variables ⇒ usually ∞ many.

6 · Determinant & InverseArea 2 · L10

2×2 det & inversedet[a b; c d] = ad − bc
A⁻¹ = (1/det A)·[d −b; −c a]

A invertible ⇔ det(A) ≠ 0. det(AB)=det(A)det(B); det(I)=1. Identity Iₙ: 1s on diagonal, AI=IA=A.

Solve via inverseA invertible ⇒ Ax=b has unique x = A⁻¹b

Worked: det[1,1;−1,1]=1−(−1)=2 ⇒ invertible. det[1,1;1,1]=0 ⇒ not invertible (repeated row). Singular-parameter Q: set ad−bc=0, solve.

If A is square but not invertible, Ax=b has either no solution or infinitely many (never a unique one).

6b · Worked · Ax=brenumbered

3x+y=2, 5x+2y=3 ⇒ det=1, A⁻¹=[2,−1;−5,3]; x=A⁻¹b = (2·2−1·3, −5·2+3·3) = (1,−1). Reuse A⁻¹ for many b.

6c · Worked · Gaussian elimrenumbered

Solve  x+2y−z=6 ; −x+y+2z=3 ; x+y−z=8.

R2+R1, R3−R1: gives 3y+z=9 and −y=2 ⇒ y=−2; back-sub z=9−3y=15? recheck signs in your own working — the discipline is clear one column at a time, top-to-bottom, then back-substitute and verify in the original.

6d · Free-Variable Systemsparametrise

Underdetermined (fewer equations than unknowns) ⇒ a row collapses to 0=0 ⇒ one free variable t∈ℝ.

Express each pivot variable in terms of t, then write the solution set as point + t·direction (a line) — e.g. (x,y,z,w) = (a,b,c,0) + t(p,q,r,1). State "for all t∈ℝ"; two free variables ⇒ a plane.

Two-variable products (Hadamard) exist but are NOT the matrix product used here — always use the row·column rule. (kA)B = k(AB).

Worked 2×2 mult: [1,2;0,1]·[3;4] = [1·3+2·4; 0·3+1·4] = [11;4]. Mult is defined only when columns of A = rows of B.

A vector is an m×1 matrix. Identity Iₙ acts as 1: AIₙ=A. If BA=I then AB=I, so each is the other's inverse. A(B+D)=AB+AD distributes.

7 · Eigenvalues & EigenvectorsArea 2 · L11–12 ★

For square A: Ax = λx, x ≠ 0. x = eigenvector, λ = eigenvalue. Zero vector is never an eigenvector; λ=0 can be an eigenvalue.

Recipe① eigenvalues: det(A − λI) = 0 (char. poly)
② eigenvectors: solve (A − λI)x = 0
(always ∞ many — scalar multiples)

If v is an eigenvector so is cv (c≠0). Char-poly degree = n; roots = the eigenvalues.

DiagonalisationA = P D P⁻¹ · P = (v₁|…|vₙ), D = diag(λᵢ)
⇒ Aⁿ = P Dⁿ P⁻¹ (Dⁿ = diag raised to n)

Construct when n distinct eigenvalues. Use: linear recurrences/Markov processes aₙ=Vⁿa₀; long run ruled by the largest eigenvalue (=1 for stochastic); PageRank = eigenvector for λ=1.

Sia → P⁻¹ is usually not needed for the long-answer — examiners want eigenvalues, eigenvectors and the assembled P, D. Show det(A−λI)=0 explicitly.

7b · Worked · Diagonalise 2×2renumbered

A=[0,2;−1,3]: det(A−λI)=λ²−3λ+2=(λ−1)(λ−2) ⇒ λ=1,2.

λ=2: (A−2I)x=0 ⇒ v=[1,1]ᵀ. λ=1: v=[2,1]ᵀ. So P=[1,2;1,1], D=[2,0;0,1], A=PDP⁻¹.

Sanity: trace 0+3=3=1+2 ✓; det 0·3−2·(−1)=2=1·2 ✓.

7c · Worked · Aⁿ recurrencematrix powers

cₙ₊₁=4cₙ+4uₙ, uₙ₊₁=cₙ+4uₙ ⇒ A=[4,4;1,4]. det(A−λI)=λ²−8λ+12=(λ−2)(λ−6) ⇒ λ=2,6; eigenvectors [2,−1]ᵀ,[2,1]ᵀ. Long-run ratio → dominant eigenvalue 6.

7d · Diagonal Matriceswhy diag is easy

D (Dᵢⱼ=0 off-diagonal): Dⁿ raises each diagonal entry to n. That is the whole point of A=PDP⁻¹ — push the power onto D where it's trivial, then conjugate back.

Char-poly check: for a 2×2, det(A−λI)=λ² − (trace)λ + det. So λ²−3λ+2 came from trace 3, det 2. Sum of eigenvalues = trace; product = det — a fast sanity check.

7e · Worked · Eigenvector unknownshort slot

Given λ is an eigenvalue and an eigenvector of the form (x,1,z)ᵀ, solve (A−λI)v=0 row-by-row for x and z (a small linear system). Likewise "find the missing entry of A⁻¹": use A·A⁻¹=I and read off one equation.

7f · Eigen Recipe Recapstep list

  1. Form A−λI; expand det(A−λI)=0 to the characteristic polynomial
  2. Solve for the roots λ₁,…,λₙ (the eigenvalues)
  3. For each λ, row-reduce (A−λI)x=0; the free variable gives the eigenvector (pick the neatest scalar multiple)
  4. Distinct λ ⇒ assemble P=(v₁|…|vₙ), D=diag(λᵢ) ⇒ A=PDP⁻¹
  5. For powers/recurrences quote Aⁿ=PDⁿP⁻¹

8 · Multivariable CalculusArea 3 · L13–16 ★

Partial derivative fₓ: differentiate, treat y as constant; f_y treats x as constant.

Gradient∇f = [fₓ ; f_y]
points in direction of steepest increase
∇f ⟂ the level set through the point

Level set of value c = {(x,y): f=c}; level curves never cross. Linear approx: f(x+Δx,y+Δy) ≈ f + fₓΔx + f_yΔy.

Stationary point∇f = 0 ⇔ fₓ = 0 AND f_y = 0

Types: local min, local max, saddle (max one way, min the other). For nice f, mixed partials equal: fₓy = f_yx.

9 · Hessian TestArea 3 · L15–16 ★

Hessian & discriminantH = [fₓₓ fₓy ; f_yx f_yy]
D = det H = fₓₓ·f_yy − (fₓy)²

D = det HVerdict
D>0, fₓₓ>0local min
D>0, fₓₓ<0local max
D<0saddle
D=0inconclusive

Convex (2-var): fₓₓ≥0, f_yy≥0 AND det H≥0 everywhere ⇒ local min = global min.

Sia → Solve ∇f=0 by factoring, not expanding — e.g. fₓ=x(x−4y)=0 splits into x=0 or x=4y. Test every stationary point with its own H.

9b · Worked · Classify all stat. ptsrenumbered

f=2x+x²+x²y−2xy²: ∇f=(2+2x+2xy−2y², x²−4xy); H=[2+2y, 2x−4y; 2x−4y, −4x].

f_y=x(x−4y)=0. Cases give (0,±1),(−4,−1): det H=−16<0 ⇒ saddles; (−3,−3): det H>0, fₓₓ>0 ⇒ local min.

9c · Worked · ∇g & det Hshort slots

g=2xy(x²+y): ∇g at (2,1) = (26,24). Stationary of f=x−ln(x²+y²): ∇f=0 ⇒ (2,0). Given ∇f=(2xy²+2x, 2x²y−3y²) ⇒ det H(1,−1)=16.

9d · How the Islands Connectrecurring themes

  • Saddle points have no 1-var analogue — they're why the 2-var test needs det H, not just fₓₓ
  • Convexity ⇒ global extremum in BOTH 1-var (f''≥0) and 2-var (det H≥0 + fₓₓ≥0)
  • Matrix multiplication drives linear systems, Aⁿ eigen-powers, Markov/PageRank AND graph walk-counting (Aᵏ)
  • Gaussian elimination solves Ax=b AND finds eigenvectors via (A−λI)x=0
  • Integration is the engine for continuous probability (pdf normalisation, E, Var on side 2)

Formula Beltside 1

(xᵇ)'=bx^(b−1) · (eˣ)'=eˣ · chain g'·f'(g)
2nd-deriv: f''>0 min, <0 max, =0 ?
∫xᵃ=x^(a+1)/(a+1) · FTC F(b)−F(a)
det₂=ad−bc · Ax=b ⇒ x=A⁻¹b
det(A−λI)=0 · Aⁿ=PDⁿP⁻¹
det H = fₓₓf_yy−fₓy² (<0 saddle)

asksia.ai/cheatsheet/
mat9004 · side 1/2
AskSiaCheatsheet Series
Independent revision aid · check learning.monash.edu for exam rules · © 2026
flip → for side 2 · counting, probability & graphs
MAT9004
Mathematical Foundations for Data Science & AI
Monash University · Faculty of Science
Exam Revision
Sem 1 2026 · Side 2 of 2
Counting · probability · graphs
SIDE 2/2   DISCRETE · Counting · Selection types · Binomial · Inclusion–exclusion · Probability · Bayes · Distributions · Graphs · Trees · Euler/Hamilton Revision sheet · closed-book exam Compiled by AskSia · mapped to the MAT9004 syllabus · asksia.ai/cheatsheet/monash-mat9004

10 · Counting · Basic RulesArea 4 · L17

Product & sum rulesAND (sequence) ⇒ multiply: |S| = Π kⱼ
OR (disjoint cases) ⇒ add: |S| = Σ |Sⱼ|
complement: |good| = |S| − |bad|

Keyword test: "and"/"then" ⇒ ×; "or"/disjoint cases ⇒ +. n! = 1·2·…·n; 0! = 1. Sequences with kⱼ choices each ⇒ |S| = Π kⱼ.

11 · Four Selection TypesArea 4 · L17–18 ★

Draw r from n — ask: order matter? repeats allowed?

No repeatWith repeat
Orderedn!/(n−r)!
UnorderedC(n,r)C(n+r−1, r)

Binomial coefficientC(n,r) = "n choose r" = n!/(r!(n−r)!)

Permutation of all n = n!. Unordered-with-repetition ↔ multisets (stars & bars: r stars, n−1 bars). Ordered-no-repeat = n(n−1)…(n−r+1).

Sia → Decide the two yes/no questions before writing anything — "order? repeats?" picks the cell. The classic trap is reading "choose a committee" (unordered) as a permutation.

11b · Worked · Selectionsrenumbered

Password 10 symbols from 31 = ordered+repeat = 31¹⁰. All-distinct = 31·30·…·22 = 31!/21!.

Committee C(12,9)=C(12,3)=220. C(11,6)=462.

Buy 8 loaves, 6 types (unordered+repeat) = C(13,8)=C(13,5)=1287.

4-digit all-odd, no repeat: 5·4·3·2 = 120. 3-digit all-even, no repeat, no leading 0: 4·4·3 = 48.

Alphabetical-order passwords (letters must be in order) ⇒ one per unordered selection: no repeats = C(26,10); with repeats = C(35,10).

12 · Binomial TheoremArea 4 · L18

Expansion & identities(x+y)ⁿ = Σ_{r=0}^{n} C(n,r) x^(n−r) yʳ
row sum: Σ_r C(n,r) = 2ⁿ
Pascal: C(n,r)=C(n−1,r−1)+C(n−1,r)
symmetry: C(n,r)=C(n,n−r)

2ⁿ = total subsets of an n-set. Grid paths across m×k = a binomial coefficient (choose where the up-moves go).

"Express count as a power, find n": 32⁷ = (2⁵)⁷ = 2³⁵ ⇒ n=35; 27⁷ = (3³)⁷ = 3²¹ ⇒ n=21. (Rewrite the base as a prime power, then match exponents.) Symmetry C(n,r)=C(n,n−r) keeps the arithmetic small.

12b · Exactly-k & Positionschoose-then-fill

Count arrangements with exactly k of a special kind: choose the positions × fill them × fill the rest. e.g. a 10-symbol string with exactly 3 specials (5 each) and 7 letters (26 each) = C(10,3)·5³·26⁷.

"At least one" via complement: count = (total) − (none). e.g. fleets with ≥1 van = C(235,3) − C(171,3) (all fleets minus van-free fleets) — usually far easier than summing cases. Same idea as Pr(Ā)=1−Pr(A).

13 · Inclusion–Exclusion & PigeonholeArea 4 · L18

Two / three sets|A∪B| = |A|+|B| − |A∩B|
|A∪B∪C| = Σ singles − Σ pairs + |A∩B∩C|

None-of-N (general I–E): |U| − Σ singles + Σ pairs − Σ triples + … + (−1)ᴺ(all-N). Same structure as Pr(A∪B).

Pigeonholen items, m boxes, n > m ⇒ some box ≥ 2
generalised ⇒ some box ≥ ⌈n/m⌉

Use: "must two share…?" or "can all be distinct?" — e.g. 10 people, degrees in {0,…,9}: 0 and 9 can't coexist ⇒ two share a friend-count.

Worked I–E: |U|=100, |A|=30, |B|=40, |A∩B|=10 ⇒ satisfying neither = 100−30−40+10 = 40. (Same add-subtract pattern as Pr(A∪B) — the two topics share one identity.)

Three-set "none": |U|−Σ|Aᵢ|+Σ|Aᵢ∩Aⱼ|−|A₁∩A₂∩A₃|. Venn diagrams make the signs obvious. The general rule alternates sign: +singles? no — subtract singles, add pairs, subtract triples…

14 · Probability FoundationsArea 5 · L19

Sample space S; Pr:S→[0,1], Σ Pr(s)=1. Event A⊆S, Pr(A)=Σ_{x∈A}Pr(x).

Core rulesuniform: Pr(A) = |A|/|S| (pure counting)
complement: Pr(Ā) = 1 − Pr(A)
union: Pr(A∪B)=Pr(A)+Pr(B)−Pr(A∩B)
independent: Pr(A∩B)=Pr(A)·Pr(B)

Mutually exclusive ⇒ Pr(A∩B)=0 ⇒ Pr(A∪B)=Pr(A)+Pr(B). Independent repeated trials: Pr((s₁,s₂))=Pr(s₁)Pr(s₂).

Worked: Pr(A)=0.3, Pr(B)=0.4, Pr(A∪B)=0.5 ⇒ Pr(A∩B)=0.3+0.4−0.5=0.2; since 0.2≠0.3·0.4=0.12, A and B are not independent.

Each trial yields exactly one outcome; the unit's probability is discrete apart from the continuous distributions in L22.

15 · Conditional & BayesArea 5 · L20 ★

Conditional & multiplicationPr(A|B) = Pr(A∩B)/Pr(B)
⇒ Pr(A∩B) = Pr(A|B)·Pr(B)

Total probability & BayesPr(B) = Pr(B|A)Pr(A) + Pr(B|Ā)Pr(Ā)
Pr(A|B) = Pr(B|A)Pr(A) / Pr(B)

Independence ⇔ Pr(A)=Pr(A|B). Use a tree diagram; watch the base-rate fallacy (rare-disease tests).

Sia → The long-answer Bayes is two steps: total probability first (the denominator), then divide. Write both fractions — method marks survive an arithmetic slip.

15b · Worked · Two-stage Bayesrenumbered

Supplier A: 20% of parts, 20% late. B: 80%, 5% late.

Pr(late)=0.2·0.2+0.8·0.05 = 0.04+0.04 = 0.08. Pr(A|late)=0.04/0.08 = 1/2.

Commuter variant: walks 10% (late 60%), drives 90% (late 20%). Pr(walk|late)=0.06/(0.06+0.18)=1/4; Pr(drove|on-time)=0.72/0.76=18/19.

16 · Random VariablesArea 5 · L21 ★

RV X: S→ℝ. Distribution = list Pr(X=x). Independent ⇔ Pr(X=x,Y=y)=Pr(X=x)Pr(Y=y) for all x,y.

Expectation & varianceE[X] = Σ pᵢxᵢ (weighted average)
Var[X] = E[(X−μ)²] = E[X²] − μ²
σ = √Var[X]

AlgebraE[X+Y]=E[X]+E[Y] (even if dependent!)
E[kX]=kE[X] · Var[kX]=k²Var[X]
indep: E[XY]=E[X]E[Y], Var[X+Y]=Var[X]+Var[Y]

Linearity-of-expectation (indicator trick): E[Σ Xᵢ]=Σ E[Xᵢ] works even when the Xᵢ are dependent — e.g. expected #adjacent-odd-pairs in a 4-digit PIN = 3·(1/4) = 3/4.

16b · Worked · E & Varrenumbered

Die X (1–6): E=3.5. Coin game: 2H→+24, 1H→+2, 0H→−8 ⇒ E = 24·¼+2·½−8·¼ = 6+1−2 = 5.

Two indep uniforms: Var(2X+3Y)=4Var(X)+9Var(Y). X∈{3,7,11} probs ¼,¼,½ ⇒ E[X]=8. Var uses E[X²]−μ², not E[X]².

17 · DistributionsArea 5 · L22 ★

DistEVar
Bernoulli(p)pp(1−p)
Binomial(n,p)npnp(1−p)
Geometric(p)(1−p)/p(1−p)/p²
Unif {a..b}(a+b)/2((b−a+1)²−1)/12
Unif [a,b](a+b)/2(b−a)²/12
Expon(λ)1/λ1/λ²
Normal(μ,σ)μσ²

Binomial Pr(X=k)=C(n,k)pᵏ(1−p)^(n−k); geometric Pr(X=k)=p(1−p)ᵏ (k failures first). Z=(X−μ)/σ = standard normal score.

Recognise the model: single success/fail ⇒ Bernoulli; #successes in n trials ⇒ Binomial; #failures before first success ⇒ Geometric; sums/large counts ⇒ tend to Normal (CLT intuition).

17b · Continuous · PDFnormalise!

PDF requirementsf ≥ 0 everywhere AND ∫ f over domain = 1
Pr(a≤X≤b)=∫ₐᵇ f dx · Pr(X=a)=0

E[X]=∫ x f dx; Var=∫ x²f dx − (E[X])². Find k: h=kx(1−x) on (0,1) ⇒ ∫=k/6=1 ⇒ k=6. Variant kx²(1−x) ⇒ k=12. Always set ∫f=1 first, then compute moments.

Worked E,Var: h(x)=(x−1)² on (0,2): ∫h=1 ✓, E[X]=∫x h=1, Var=∫x²h−1=1/5. (Integration is the engine for continuous distributions — Area 1 feeding Area 5.)

Continuous uniform on [a,b]: f=1/(b−a). Exponential(λ) f=λe^(−λx), x≥0 — memoryless, the Poisson waiting time. ≤ and < are interchangeable since Pr(X=a)=0.

17c · Sum of Independent RVsconvolution

Distribution of Z=X+Y: sum probabilities over all (x,y) with x+y=z (enumerate the pairs). For a count of successes across n independent Bernoulli(p) trials, the sum is exactly Binomial(n,p).

Law of large numbers (intuition): the average of n iid trials → E[X] as n→∞.

18 · Graphs · VocabularyArea 6 · L23

Graph = vertices + edges (pairs). Simple = no loops, no parallel edges. Walk = vertex sequence joined by edges; length = #steps. Path = walk with all-distinct vertices.

Connected = every pair joined by a path. Cycle = walk start=end, length ≥3 (simple). k-regular = every vertex degree k.

Same graph can be drawn many ways — crossings/curvature don't matter.

19 · Degrees & HandshakingArea 6 · L23 ★

Handshaking lemmaΣ deg(v) = 2·|E|
⇒ #edges = (Σ deg)/2
⇒ an EVEN number of vertices have odd degree

Degree = #incident edges. Path: endpoints deg 1, internal deg 2; cycle: all deg 2; k-regular: all deg k. Σ deg odd ⇒ no graph; |E| from a valid sequence = Σ/2.

Feasibility tool: if Σ deg is odd, no such graph. No 3-regular graph on 7 vertices (Σ=21 odd).

Sia → "Can such a graph exist?" ⇒ reach for handshaking: sum the degrees, check it's even, and check it doesn't exceed the max possible (or break tree limits).

19b · Worked · Degree sequencerenumbered

(1,1,1,1,2,2,4): Σ=12 ⇒ 6 edges. (1,2,2,2,3,3,3): Σ=16 ⇒ 8 edges. (3,3,3,1,1): Σ=11 odd ⇒ impossible.

3-regular on 10 vertices ⇒ 10·3/2 = 15 edges. (3,3,3,3,4): Σ=16 ⇒ 8 edges. (3,2,1,0) on 4 vertices: Σ even but no such graph (deg-3 needs all others as neighbours, contradicting deg-0).

20 · TreesArea 6 · L24 ★

Tree = connected + no cycles. Equivalent: any two vertices joined by a unique path; deleting any edge disconnects it; adding any edge makes a cycle.

Edge counta tree on n vertices has exactly n − 1 edges

So 10 vertices + 8 edges can't be connected. Spanning tree = tree inside G with all vertices (delete one edge per cycle). Removable edges keeping connectivity = |E| − (|V|−1). A tree with a Hamilton path is a path.

20b · Counting Cycle Graphsrenumbered

How many distinct cycle graphs on {A,B,C,D}? Each vertex has degree 2; fix A's two neighbours (C(3,2)=3 ways) and each choice completes uniquely ⇒ 3 cycles. (Counting meets graphs.)

20c · Graph Drawingssame graph

Two drawings are the same graph if vertices & edges match — bends, crossings and positions are irrelevant. Deleting any vertex of a cycle leaves a connected path.

Walk vs path: a walk may repeat vertices; a path may not. If any walk A→B exists, a path A→B also exists. Digraph edges have direction; a multigraph allows loops/parallel edges. A cycle has length ≥3 in a simple graph.

21 · Adjacency MatrixArea 6 · L24 ★

A: aᵢⱼ=1 if vᵢ adjacent to vⱼ, else 0. Symmetric for undirected simple graphs. From A: #vertices = dimension; #edges = (count of 1s)/2; #components by tracing connectivity.

Walk counting — links to Area 2(Aᵏ)ᵢⱼ = number of walks of length k from vᵢ to vⱼ

Compute walk counts by matrix powers (same machinery as eigen/diagonalisation). Worked read-off: a 9×9 0/1 matrix with twelve 1s ⇒ 9 vertices, 6 edges; trace components.

(A²)ᵢᵢ = degree of vᵢ (number of length-2 closed walks = its neighbours). The diagonal of A is all 0 for a simple graph (no loops).

Sia → Aᵏ ties graphs back to linear algebra — if you can diagonalise A you can get walk counts for any k. Same toolbox, two topic islands.

22 · Euler vs HamiltonArea 6 · L24

TypeVisitsExists iff
Euler circuitevery edge once, closedconnected & ALL degrees even
Euler trailevery edge once, openconnected & ≤2 odd-deg
Hamilton cycleevery vertex oncehard (no easy test)

Euler = edges, Hamilton = vertices. Königsberg's bridges: 4 odd-degree vertices ⇒ no Euler trail. Hamilton sufficient (Dirac): simple, n≥3, every degree ≥ n/2 ⇒ hamiltonian (not necessary). Hamiltonicity relates to P vs NP.

Mnemonic: Euler walks the edges (every bridge once); Hamilton visits the towns (every vertex once). Euler has a clean degree test; Hamilton has none easy — that's the whole difference.

A connected graph always has a spanning tree (delete one edge from each cycle until acyclic) — the backbone for counting removable edges.

22b · Worked · Tree feasibilityrenumbered

Can a 101-vertex graph with ten vertices of degree 11 be a tree? Σ deg ≥ 11·10 + 91·1 = 201 ⇒ |E| ≥ 201/2 ⇒ > 100 = n−1. No — a tree needs exactly 100 edges.

Cycle from edges: adjacency 1s at (2,3),(2,4),(3,5),(4,5) form the walk 2-3-5-4-2 = a cycle ⇒ not a tree (trees are acyclic).

22c · Distinct-degree trappigeonhole

Can 10 people all have a different number of friends? Degrees must lie in {0,…,9}, but degree 0 and degree 9 cannot coexist (the deg-9 person is friends with everyone, contradicting the deg-0 person) ⇒ only 9 usable values for 10 people ⇒ two must share. No.

22d · Removable Edgesspanning tree

Edges you can delete while staying connected = |E| − (|V|−1) (drop one per independent cycle until only a spanning tree remains). 3-regular on 10 vertices: 15 − 9 = 6 removable.

23 · Method-Trigger Checklistread the verb

Question says…Reach for
"max/min on [a,b]"stat pts + endpoints
"max/min/saddle f(x,y)"∇f=0 then det H
"solve the system"Gaussian elim
"not invertible"set det = 0
"A=PDP⁻¹ / Aⁿ"det(A−λI)=0
"how many ways"order? repeat? ⇒ cell
"committee / choose"C(n,r)
"given … find Pr(·|·)"total prob + Bayes
"E[·] / Var(·) from pdf"normalise, ∫xf, ∫x²f
"# edges from degrees"handshaking /2
"walks of length k"(Aᵏ)ᵢⱼ
"can graph exist?"parity / n−1 edges

23b · Counting → Probabilitythe bridge

Uniform probability is pure counting: Pr(A)=|A|/|S|, so every Area-4 selection formula feeds straight into Area 5. Binomial probabilities ARE C(n,k) weighted by pᵏ(1−p)^(n−k).

Independence test for RVs: one mismatched cell Pr(X=x,Y=y)≠Pr(X=x)Pr(Y=y) disproves independence; check before using Var[X+Y]=Var[X]+Var[Y].

Inclusion–exclusion ↔ Pr(A∪B) are the same identity; counting ↔ uniform probability via |A|/|S|; Aᵏ walk-counting ↔ eigen-powers. Six islands, one toolbox.

24 · Exam Disciplinedon't lose marks

  • Short-answer = integer or lowest-terms a/b, no spaces, no decimals
  • No calculator — keep arithmetic clean, check by substitution
  • Long-answer: show the working; method marks survive a wrong final number
  • Gaussian elim: ops top-to-bottom, never simultaneously; verify in the original system
  • Global extrema: include the endpoints
  • Bayes: compute the denominator (total prob) first
  • Hand-written parts: scan + upload in the 30 min after the e-exam
  • Counting: ask order? repeats? before picking a formula
  • "Is this graph possible?" ⇒ check degree parity and the n−1 edge bound
  • Eigen long-answer: show det(A−λI)=0 and assemble P, D explicitly
Sia → The formula sheet is given — so spend revision time on recipes, not memorising formulas: be able to run each method end-to-end, by hand, under time.

Formula Beltside 2

ordered: n!/(n−r)!, nʳ · unord: C(n,r), C(n+r−1,r)
(x+y)ⁿ=ΣC(n,r)x^(n−r)yʳ · Σ_r C(n,r)=2ⁿ
Pr(A|B)=Pr(B|A)Pr(A)/Pr(B)
Var[X]=E[X²]−μ² · Bin: E=np,Var=np(1−p)
pigeonhole ⌈n/m⌉ · uniform Pr=|A|/|S|
Σ deg=2|E| · tree: n−1 edges · (Aᵏ)ᵢⱼ=walks

asksia.ai/cheatsheet/
mat9004 · side 2/2
AskSiaCheatsheet Series
Independent revision aid · check learning.monash.edu · not affiliated with Monash · © 2026
good luck.   drill the methods.

Want one for YOUR exact syllabus?

Sia is your free desktop study agent. Drop your Monash University MAT9004 slides — Sia builds a sheet tailored to YOUR exam. Better than this library because it knows YOUR materials.

↓ Download Sia · Free