Mathematical Foundations for Data Science & AI
Sem 1 2026 · Side 1 of 2
All six topics · the continuous half
0 · Exam Blueprint [read first]
One paper, six disjoint maths worlds. The final is 60% of the unit AND a hurdle — you must score ≥45/100 on the exam itself to pass, regardless of coursework.
| Item | Detail |
|---|---|
| Weight | 60% · hurdle ≥45% |
| Coursework | 2 assign×10% + 5 quiz×4% |
| Duration | ~3 h 10 min · e-exam |
| Materials | closed-book · NO calc |
| Formula sheet | provided in paper |
| Template | ~36 questions, fixed slots |
~36-slot pattern: ~31 short-answer (1.5–3 marks, answer is an integer or lowest-terms a/b, no spaces, no decimals) + 5 long-answer (6 marks: global extrema · Hessian classify · eigen/diagonalise · free-variable system · two-stage Bayes).
This is a REVISION sheet — you cannot bring it in. The exam gives its own formula sheet, so memorise the METHODS, not the formulas: the recipe wins marks, the formula is handed to you.
1 · Derivatives · Rules [Area 1 · L4–5]
f'(a) = slope of tangent = lim_{x→a} (f(x)−f(a))/(x−a). |x| is not differentiable at 0.
| f(x) | f'(x) |
|---|---|
| c (const) | 0 |
| xᵇ | b·x^(b−1) |
| aˣ | ln(a)·aˣ |
| eˣ | eˣ |
| ln x | 1/x |
| logₐ x | 1/(ln(a)·x) |
Combining rules: (c·f)'=c·f' · (f±g)'=f'±g'
product: (fg)' = f'g + fg'
chain: (f(g(x)))' = g'(x)·f'(g(x))
Common chains: (e^(cx+d))' = c·e^(cx+d)
(ln(cx+d))' = c/(cx+d)
(a^(cx+d))' = c·ln(a)·a^(cx+d)
nth derivative f⁽ⁿ⁾ = differentiate n times (f''=f⁽²⁾). Worked tangent slope: for h(x)=x·e^(2x), h'=e^(2x)(1+2x) ⇒ h'(0)=1 (product rule, then evaluate).
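The worked tangent slope can be sanity-checked numerically. This is a minimal sketch, assuming a central-difference approximation with a hypothetical step size of 1e-6; the helper names are illustrative and not part of the unit material.

```python
import math

def h(x):
    # The worked example: h(x) = x * e^(2x)
    return x * math.exp(2 * x)

def h_prime(x):
    # Product rule, then chain rule: e^(2x) + x * 2e^(2x) = e^(2x) * (1 + 2x)
    return math.exp(2 * x) * (1 + 2 * x)

def central_difference(f, a, step=1e-6):
    # Numerical slope estimate; the step size is an assumption, not a tuned value
    return (f(a + step) - f(a - step)) / (2 * step)

slope_exact = h_prime(0.0)
slope_numeric = central_difference(h, 0.0)
```

The two slopes should agree to several decimal places; a mismatch usually points to a dropped chain-rule factor.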
1b · Function Types [L1–3]
Convex: chord lies on/above plot ⇔ f''≥0. Concave: chord below ⇔ f''≤0. Lines are both.
Transform-plot trick: log-log linear ⇒ power law (slope −a); log-lin linear ⇒ exponential (slope ln a); lin-log ⇒ logarithmic.
Log & exponential rules: aˣaʸ=a^(x+y) · (aˣ)ʸ=a^(xy) · a⁰=1
logₐ(xy)=logₐx+logₐy · logₐ(xᵇ)=b logₐx
change of base: logₐx = log_b x / log_b a
Line through 2 points: m=(y₂−y₁)/(x₂−x₁), then y=mx+b with b=y₁−mx₁; zero at x=−b/m.
Injective = distinct inputs give distinct outputs; surjective = image fills codomain; bijective = both ⇔ has an inverse.
2 · Optimisation (1 var) [Area 1 · L5–6 ★]
Stationary point s: f'(s)=0. Sign of f' gives increase (f'>0) / decrease (f'<0).
2nd-derivative test (at stationary a): f''(a) > 0 ⇒ local min
f''(a) < 0 ⇒ local max
f''(a) = 0 ⇒ inconclusive (x³,x⁴,−x⁴)
First-deriv (sign) test: at stationary m, f' goes +→− ⇒ max; −→+ ⇒ min. Inflection = concavity changes.
Global extrema on [c,d]: candidates = stationary pts + singular pts (f' undef) + endpoints c,d.
Evaluate f at ALL ⇒ largest = max, smallest = min.
Shortcut: local min of a convex f is the global min; local max of concave = global max.
Quadratic roots: x²+ax+b=0 ⇒ x=−a/2±√(a²/4−b); real iff a²≥4b. (x−u)(x−v)=x²−(u+v)x+uv.
2b · Worked · Extrema on interval [renumbered]
f'(t)=t³−5t²+6t=t(t−2)(t−3) on [−1,3]. Stationary t=0,2,3.
f''=3t²−10t+6: f''(0)=6>0 (min), f''(2)=−2<0 (max), f''(3)=3>0 (min). Compare f at {−1,0,2,3} ⇒ pick global max/min by value.
Cubic variant: f(x)=2x³−3x²−12x+4 on [−3,3] ⇒ stationary x=−1,2; global max (−1,11), global min at the endpoint (−3,−41). The endpoint wins — count it.
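The candidate-comparison recipe for the cubic variant can be sketched as below. The candidate list is typed in by hand from the stationary points and endpoints found above; the variable names are illustrative.

```python
def f(x):
    # Cubic variant from the worked example
    return 2 * x**3 - 3 * x**2 - 12 * x + 4

# Candidates on [-3, 3]: both endpoints plus the stationary points x = -1 and x = 2
candidates = [-3, -1, 2, 3]
values = {x: f(x) for x in candidates}

x_max = max(values, key=values.get)  # location of the global max
x_min = min(values, key=values.get)  # location of the global min
```

Evaluating f at every candidate is the whole method: the stationary point x = 2 gives only −16, so the endpoint x = −3 takes the global min.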
2c · Worked · Can min surface [applied]
Volume πr²h = 128π ⇒ h=128/r². Surface f(r)=2π(r²+128/r); f'=2π(2r−128/r²)=0 ⇒ r³=64 ⇒ r=4. f''>0 (convex) ⇒ global min; h=128/16=8.
Page-layout variant: printable A=(x−4)(y−6) with xy=294 ⇒ A=318−6x−1176/x; A'=−6+1176/x²=0 ⇒ x=14, A''<0 (concave, so a max), y=21.
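A quick exact-arithmetic check of the page-layout variant, as a sketch: it confirms the derivative vanishes at x=14 and that neighbouring widths give a smaller printable area. The function names and the choice of neighbours 13 and 15 are assumptions for illustration.

```python
from fractions import Fraction

def printable_area(x):
    # A = (x - 4)(y - 6) with the constraint xy = 294, so y = 294/x
    x = Fraction(x)
    return (x - 4) * (Fraction(294) / x - 6)

def area_derivative(x):
    # A' = -6 + 1176/x^2, from A = 318 - 6x - 1176/x
    return -6 + Fraction(1176) / Fraction(x) ** 2

best_x = 14
best_y = Fraction(294, best_x)
best_area = printable_area(best_x)
```

Here `best_area` comes out as 150, and both `printable_area(13)` and `printable_area(15)` fall just below it, consistent with a maximum.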
Convexity-on-interval trap: for f with f''=6ax−12 to be neither convex nor concave on (2,3), f'' must change sign inside the interval, i.e. its zero x=2/a must lie in (2,3) ⇒ 2/3<a<1. Solve for the parameter range; don't just plug in one point.
Constrained product, variants: max xy s.t. 2x+3y=60 ⇒ 150; s.t. x+3y=60 ⇒ 300. Same substitute-then-optimise recipe; check it's a max (f''<0), not a min.
Singular points (f' undefined, e.g. a corner like |x| at 0) are candidates too — don't only solve f'=0. The Extreme Value Theorem guarantees a continuous f on [c,d] attains both extrema.
2d · RSS / least squares [application]
Residual sum of squares: RSS = Σᵢ (yᵢ − f(xᵢ))²
Fit f(x)=2x−1 to (2,1),(3,4),(5,2): preds 3,5,9; residuals −2,−1,−7 ⇒ RSS = 4+1+49 = 54. Squares are differentiable & punish big errors.
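The RSS arithmetic above translates directly into a few lines. This is a minimal sketch; `model` and `points` are illustrative names.

```python
# Data points (x, y) and the candidate line f(x) = 2x - 1 from the example
points = [(2, 1), (3, 4), (5, 2)]

def model(x):
    return 2 * x - 1

predictions = [model(x) for x, _ in points]
residuals = [y - model(x) for x, y in points]  # observed minus predicted
rss = sum(r * r for r in residuals)            # residual sum of squares
```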
Workflow: ① f', solve f'=0 + find singular pts; ② classify with f'' (or sign change); ③ compare f at stationary/singular/boundary; use convexity if available.
3 · Integration · FTC [Area 1 · L7]
Antiderivative F: F'=f, unique up to +c. ∫f dx = F(x)+c.
| f(x) | ∫f dx |
|---|---|
| xᵃ (a≠−1) | x^(a+1)/(a+1)+c |
| x⁻¹ | ln|x|+c |
| e^(ax) | (1/a)e^(ax)+c |
Fundamental Theorem of Calculus: ∫ₐᵇ f(x)dx = F(b) − F(a)
G(x)=∫ₐˣ f(z)dz ⇒ G'=f
Linearity ∫(f+g)=∫f+∫g, ∫cf=c∫f. Additivity ∫ₐᶜ=∫ₐᵇ+∫_bᶜ (piecewise).
Rate ⇒ total: if E'(x)=rate, total change = ∫ rate dx.
3b · Worked · Definite + FTC [renumbered]
∫₀² (x³−6x²)dx = [x⁴/4 − 2x³]₀² = 4 − 16 = −12.
Rate→total (battery): cost falls at rate (x+1)⁻², start $5. D(4)=5+∫₀⁴ −(x+1)⁻²dx = 5 + [(x+1)⁻¹]₀⁴ = 5 + (1/5 − 1) = $4.20 = 21/5. Antiderivative chosen so D(0)=5.
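Both worked integrals can be reproduced with exact fractions by evaluating an antiderivative at the limits, which matches the no-decimals answer format. A sketch, with illustrative function names:

```python
from fractions import Fraction

def F(x):
    # Antiderivative of x^3 - 6x^2
    x = Fraction(x)
    return x**4 / 4 - 2 * x**3

definite_integral = F(2) - F(0)

def G(x):
    # Antiderivative of the rate -(x + 1)^(-2)
    return 1 / (Fraction(x) + 1)

# Start value plus accumulated change over [0, 4]
cost_at_4 = 5 + (G(4) - G(0))
```

`cost_at_4` is the fraction 21/5, already in lowest terms.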
Look-up integrals (used in probability): ∫xe⁻ˣdx = −e⁻ˣ(x+1)+c; ∫e^(−x²/2)dx is not elementary (the normal cdf needs erf, hence z-tables). ∫₂(6x²+6x−4)dx-type slots are routine power-rule.
3c · Σ / Π Notation [slots 1–2]
Σ_{x=a}^{b} f(x)=f(a)+…+f(b); Π is the product. Empty sum = 0, empty product = 1. e.g. Σ_{k=1}^{4} k² = 1+4+9+16 = 30; Π_{k=1}^{4} k = 4! = 24. ℕ={0,1,2,…} here (0 is natural). "iff" = if and only if.
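Python's built-ins follow the same conventions, including the empty sum and empty product. A small sketch (`math.prod` needs Python 3.8 or later):

```python
import math

# Sigma: sum of k^2 for k = 1..4 (range excludes its upper bound, hence 5)
sum_of_squares = sum(k**2 for k in range(1, 5))

# Pi: product of k for k = 1..4, i.e. 4!
product_1_to_4 = math.prod(range(1, 5))

# Empty sum is 0, empty product is 1
empty_sum = sum([])
empty_product = math.prod([])
```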
4 · Vectors [Area 2 · L8]
ℝᵈ = column d-tuples; add & scale component-wise.
Dot product & norm: v·w = v₁w₁+…+v_dw_d
‖v‖ = √(v·v) = √(v₁²+…+v_d²)
orthogonal ⇔ v·w = 0
Linear comb. w=a₁v₁+…+aₙvₙ. Linearly dependent = one vᵢ is a combo of the rest. Pairwise-orthogonal nonzero vectors are independent.
Worked norm: ‖(3,12,−4)‖=√(9+144+16)=√169=13; ‖(−8,9,−12)‖=√289=17. Orthogonal solve: (2,−1,z)·(3,4,1)=0 ⇒ 6−4+z=0 ⇒ z=−2.
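The dot product and norm definitions can be sketched in a few lines and checked against the worked numbers. The helper names are illustrative.

```python
import math

def dot(v, w):
    # Component-wise products, summed
    return sum(a * b for a, b in zip(v, w))

def norm(v):
    # Length = square root of v . v
    return math.sqrt(dot(v, v))

norm_a = norm((3, 12, -4))
norm_b = norm((-8, 9, -12))

# Orthogonality check with the solved value z = -2
orthogonal_check = dot((2, -1, -2), (3, 4, 1))
```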
Line interval joining u,v = {αu+(1−α)v : α∈[0,1]}. Geometrically a vector = displacement (direction + length, no fixed position).
4b · Inverse Functions [L2]
g=f⁻¹ ⇔ g(f(x))=x and f(g(y))=y. f⁻¹(x) ≠ 1/f(x). To find f⁻¹: solve y=f(x) for x. f has an inverse iff bijective (injective + surjective).
Worked: f(x)=x²+4x on [0,∞), find f⁻¹(32): x²+4x=32 ⇒ x²+4x−32=0 ⇒ (x+8)(x−4)=0 ⇒ x=4 (take the non-negative root).
4c · Constrained Product [substitution]
Max xy s.t. 2x+5y=100: y=(100−2x)/5 ⇒ maximise (1/5)(100x−2x²); deriv 100−4x=0 ⇒ x=25, y=10 ⇒ xy=250. Substitute the constraint, reduce to one variable, then optimise — also a Lagrange-style multivariable framing.
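The substitute-then-optimise recipe can be written once for a general constraint ax+by=c. This generalisation is an illustration rather than something stated on the sheet, and it assumes a>0 and b>0 so that the stationary point is a maximum.

```python
from fractions import Fraction

def max_product_on_line(a, b, c):
    """Maximise x*y subject to a*x + b*y = c, assuming a > 0 and b > 0."""
    # Substitute y = (c - a*x)/b, so x*y = (c*x - a*x**2)/b.
    # Setting the derivative (c - 2*a*x)/b to zero gives x = c/(2a);
    # the second derivative -2a/b is negative, so this is a max.
    x = Fraction(c, 2 * a)
    y = (c - a * x) / b
    return x, y, x * y

result_main = max_product_on_line(2, 5, 100)
result_variant_1 = max_product_on_line(2, 3, 60)
result_variant_2 = max_product_on_line(1, 3, 60)
```

The three calls reproduce the products 250, 150 and 300 quoted in the worked examples.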
5 · Matrices & Systems [Area 2 · L8–10 ★]
Mult: A (m×n)·B (n×r) = (m×r); (AB)ᵢⱼ = row i of A · col j of B. AB ≠ BA in general. A(BC)=(AB)C.
Gaussian elimination, 3 valid row ops: ① swap two rows
② multiply a row by a nonzero k
③ add a multiple of one row to another
Row-reduce Ax=b to upper-triangular, then back-substitute. Apply ops top-to-bottom, NOT simultaneously.
| Outcome | Signal |
|---|---|
| No solution | row 0=3 (contradiction) |
| Unique | full pivots |
| ∞ many | row 0=0 ⇒ free var t |
Free variable ⇒ write solution in vector form (point + t·direction). Fewer equations than variables ⇒ usually ∞ many.
6 · Determinant & Inverse [Area 2 · L10]
2×2 det & inverse: det[a b; c d] = ad − bc
A⁻¹ = (1/det A)·[d −b; −c a]
A invertible ⇔ det(A) ≠ 0. det(AB)=det(A)det(B); det(I)=1. Identity Iₙ: 1s on diagonal, AI=IA=A.
Solve via inverse: A invertible ⇒ Ax=b has unique x = A⁻¹b
Worked: det[1,1;−1,1]=1−(−1)=2 ⇒ invertible. det[1,1;1,1]=0 ⇒ not invertible (repeated row). Singular-parameter Q: set ad−bc=0, solve.
If A is square but not invertible, Ax=b has either no solution or infinitely many (never a unique one).
6b · Worked · Ax=b [renumbered]
3x+y=2, 5x+2y=3 ⇒ det=1, A⁻¹=[2,−1;−5,3]; x=A⁻¹b = (2·2−1·3, −5·2+3·3) = (1,−1). Reuse A⁻¹ for many b.
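The 2×2 inverse formula and the solve step can be sketched with exact fractions. The function names are illustrative; the singular case raises an error because no inverse exists when det=0.

```python
from fractions import Fraction

def inverse_2x2(a, b, c, d):
    # Inverse of [a b; c d] is (1/det) * [d -b; -c a]
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular (det = 0), no inverse")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

def apply_2x2(matrix, v):
    # Row-times-column rule for a 2x2 matrix and a 2-vector
    return [matrix[0][0] * v[0] + matrix[0][1] * v[1],
            matrix[1][0] * v[0] + matrix[1][1] * v[1]]

A_inv = inverse_2x2(3, 1, 5, 2)
solution = apply_2x2(A_inv, [2, 3])
```

Once `A_inv` is known, any further right-hand side costs only one `apply_2x2` call.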
6c · Worked · Gaussian elim [renumbered]
Solve x+2y−z=6 ; −x+y+2z=3 ; x+y−z=8.
R2+R1 gives 3y+z=9; R3−R1 gives −y=2 ⇒ y=−2. Back-substitute: z=9−3y=15, then x=6−2y+z=25. Verify in the originals: −25−2+30=3 ✓ and 25−2−15=8 ✓. The discipline: clear one column at a time, top-to-bottom, then back-substitute and verify in the original equations.
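The elimination can be checked mechanically. Below is a minimal sketch of forward elimination plus back-substitution in exact fractions. It assumes a square system with a unique solution, and it clears columns in a fixed order, so its intermediate rows may differ from hand working even though the answer matches.

```python
from fractions import Fraction

def solve_unique(A, b):
    """Gaussian elimination for a square system assumed to have a unique solution."""
    n = len(A)
    # Augmented matrix [A | b] in exact arithmetic
    rows = [[Fraction(x) for x in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        # Row op 1: swap in a row with a nonzero pivot if needed
        pivot = next((r for r in range(col, n) if rows[r][col] != 0), None)
        if pivot is None:
            raise ValueError("no unique solution")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        # Row op 3: subtract multiples of the pivot row from the rows below
        for r in range(col + 1, n):
            factor = rows[r][col] / rows[col][col]
            rows[r] = [x - factor * p for x, p in zip(rows[r], rows[col])]
    # Back-substitute from the last row upward
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        known = sum(rows[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (rows[i][n] - known) / rows[i][i]
    return x

solution = solve_unique([[1, 2, -1], [-1, 1, 2], [1, 1, -1]], [6, 3, 8])
```

For the system above this returns x=25, y=−2, z=15, which satisfies all three original equations.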
6d · Free-Variable Systems [parametrise]
Underdetermined (fewer equations than unknowns), or a row that collapses to 0=0, ⇒ at least one free variable t∈ℝ (provided the system is consistent).
Express each pivot variable in terms of t, then write the solution set as point + t·direction (a line) — e.g. (x,y,z,w) = (a,b,c,0) + t(p,q,r,1). State "for all t∈ℝ"; two free variables ⇒ a plane.
Entry-wise (Hadamard) products exist but are NOT the matrix product used here; always use the row·column rule. (kA)B = k(AB).
Worked 2×2 mult: [1,2;0,1]·[3;4] = [1·3+2·4; 0·3+1·4] = [11;4]. Mult is defined only when columns of A = rows of B.
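The row·column rule, with its shape check, can be sketched as follows. The second matrix `B` is a hypothetical example chosen only to show that AB and BA can differ.

```python
def matmul(A, B):
    # Defined only when the number of columns of A equals the number of rows of B
    if len(A[0]) != len(B):
        raise ValueError("columns of A must equal rows of B")
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[1, 2], [0, 1]]
column = [[3], [4]]          # a vector written as a 2x1 matrix
product = matmul(A, column)

B = [[0, 1], [1, 0]]         # hypothetical matrix for the order check
AB = matmul(A, B)
BA = matmul(B, A)
```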
A vector is an m×1 matrix. Identity Iₙ acts as 1: AIₙ=A. For square A and B, if BA=I then AB=I, so each is the other's inverse. A(B+D)=AB+AD distributes.
7 · Eigenvalues & Eigenvectors [Area 2 · L11–12 ★]
For square A: Ax = λx, x ≠ 0. x = eigenvector, λ = eigenvalue. Zero vector is never an eigenvector; λ=0 can be an eigenvalue.
Recipe: ① eigenvalues: det(A − λI) = 0 (char. poly)
② eigenvectors: solve (A − λI)x = 0
(always ∞ many — scalar multiples)
If v is an eigenvector so is cv (c≠0). Char-poly degree = n; roots = the eigenvalues.
Diagonalisation: A = P D P⁻¹ · P = (v₁|…|vₙ), D = diag(λᵢ)
⇒ Aⁿ = P Dⁿ P⁻¹ (Dⁿ = diag raised to n)
Construct when n distinct eigenvalues. Use: linear recurrences/Markov processes aₙ=Vⁿa₀; long run ruled by the largest eigenvalue (=1 for stochastic); PageRank = eigenvector for λ=1.
7b · Worked · Diagonalise 2×2 [renumbered]
A=[0,2;−1,3]: det(A−λI)=λ²−3λ+2=(λ−1)(λ−2) ⇒ λ=1,2.
λ=2: (A−2I)x=0 ⇒ v=[1,1]ᵀ. λ=1: v=[2,1]ᵀ. So P=[1,2;1,1], D=[2,0;0,1], A=PDP⁻¹.
Sanity: trace 0+3=3=1+2 ✓; det 0·3−2·(−1)=2=1·2 ✓.
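For a 2×2 matrix the eigenvalues follow from the trace and determinant alone, and each eigenvector can be verified by checking Av=λv. A sketch, assuming real eigenvalues (non-negative discriminant); the helper name `apply` is illustrative.

```python
import math

A = [[0, 2], [-1, 3]]
trace = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]

# Roots of lambda^2 - trace*lambda + det = 0 (assumes the discriminant is >= 0)
root = math.sqrt(trace**2 - 4 * det)
eigenvalues = sorted([(trace - root) / 2, (trace + root) / 2])

def apply(matrix, v):
    # Matrix-vector product for a 2x2 matrix
    return [matrix[0][0] * v[0] + matrix[0][1] * v[1],
            matrix[1][0] * v[0] + matrix[1][1] * v[1]]

image_of_v2 = apply(A, [1, 1])  # should equal 2 * [1, 1]
image_of_v1 = apply(A, [2, 1])  # should equal 1 * [2, 1]
```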
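7c · Worked · Aⁿ recurrence [matrix powers]
cₙ₊₁=4cₙ+4uₙ, uₙ₊₁=cₙ+4uₙ ⇒ A=[4,4;1,4]. det(A−λI)=λ²−8λ+12=(λ−2)(λ−6) ⇒ λ=2,6; eigenvectors [2,−1]ᵀ (λ=2), [2,1]ᵀ (λ=6). Long run: the growth ratio cₙ₊₁/cₙ → dominant eigenvalue 6, and cₙ:uₙ → 2:1 (the dominant eigenvector), provided the start has a component along [2,1]ᵀ.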
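Iterating the recurrence directly shows the dominant eigenvalue taking over. This is a sketch with a hypothetical starting state (c₀,u₀)=(1,0) and 30 steps; neither value comes from the sheet.

```python
def step(c, u):
    # One application of A = [4, 4; 1, 4] to the state (c, u)
    return 4 * c + 4 * u, c + 4 * u

c, u = 1, 0  # hypothetical start; it has a component along the eigenvector [2, 1]
for _ in range(30):
    previous_c = c
    c, u = step(c, u)

growth_ratio = c / previous_c  # tends to the dominant eigenvalue
component_ratio = c / u        # tends to the ratio in the dominant eigenvector
```

After 30 steps the contribution of λ=2 has shrunk by a factor of about (2/6)³⁰, so `growth_ratio` is very close to 6 and `component_ratio` very close to 2.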
7d · Diagonal Matrices [why diag is easy]
D (Dᵢⱼ=0 off-diagonal): Dⁿ raises each diagonal entry to n. That is the whole point of A=PDP⁻¹ — push the power onto D where it's trivial, then conjugate back.
Char-poly check: for a 2×2, det(A−λI)=λ² − (trace)λ + det. So λ²−3λ+2 came from trace 3, det 2. Sum of eigenvalues = trace; product = det — a fast sanity check.
7e · Worked · Eigenvector unknown [short slot]
Given λ is an eigenvalue and an eigenvector of the form (x,1,z)ᵀ, solve (A−λI)v=0 row-by-row for x and z (a small linear system). Likewise "find the missing entry of A⁻¹": use A·A⁻¹=I and read off one equation.
7f · Eigen Recipe Recap [step list]
- Form A−λI; expand det(A−λI)=0 to the characteristic polynomial
- Solve for the roots λ₁,…,λₙ (the eigenvalues)
- For each λ, row-reduce (A−λI)x=0; the free variable gives the eigenvector (pick the neatest scalar multiple)
- Distinct λ ⇒ assemble P=(v₁|…|vₙ), D=diag(λᵢ) ⇒ A=PDP⁻¹
- For powers/recurrences quote Aⁿ=PDⁿP⁻¹
8 · Multivariable Calculus [Area 3 · L13–16 ★]
Partial derivative fₓ: differentiate, treat y as constant; f_y treats x as constant.
Gradient: ∇f = [fₓ ; f_y]
points in direction of steepest increase
∇f ⟂ the level set through the point
Level set of value c = {(x,y): f=c}; level curves never cross. Linear approx: f(x+Δx,y+Δy) ≈ f + fₓΔx + f_yΔy.
Stationary point: ∇f = 0 ⇔ fₓ = 0 AND f_y = 0
Types: local min, local max, saddle (max one way, min the other). For nice f, mixed partials equal: fₓy = f_yx.
9 · Hessian Test [Area 3 · L15–16 ★]
Hessian & discriminant: H = [fₓₓ fₓy ; f_yx f_yy]
D = det H = fₓₓ·f_yy − (fₓy)²
| D = det H | Verdict |
|---|---|
| D>0, fₓₓ>0 | local min |
| D>0, fₓₓ<0 | local max |
| D<0 | saddle |
| D=0 | inconclusive |
Convex (2-var): fₓₓ≥0, f_yy≥0 AND det H≥0 everywhere ⇒ local min = global min.
9b · Worked · Classify all stat. pts [renumbered]
f=2x+x²+x²y−2xy²: ∇f=(2+2x+2xy−2y², x²−4xy); H=[2+2y, 2x−4y; 2x−4y, −4x].
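f_y=x(x−4y)=0 ⇒ x=0 or x=4y. x=0: fₓ=2−2y²=0 ⇒ (0,±1). x=4y: fₓ=6y²+8y+2=2(3y+1)(y+1)=0 ⇒ (−4,−1) or (−4/3,−1/3). At (0,±1) and (−4,−1): det H=−16<0 ⇒ saddles; at (−4/3,−1/3): det H=16/3>0, fₓₓ=4/3>0 ⇒ local min.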
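The classification can be verified point by point with exact fractions. Solving f_y=0 gives x=0 or x=4y, and the case x=4y reduces fₓ=0 to 2(3y+1)(y+1)=0, so the four stationary points are (0,1), (0,−1), (−4,−1) and (−4/3,−1/3). The sketch below hard-codes the gradient and Hessian entries from the worked example; the function names are illustrative.

```python
from fractions import Fraction

def gradient(x, y):
    # (f_x, f_y) for f = 2x + x^2 + x^2*y - 2x*y^2
    return (2 + 2 * x + 2 * x * y - 2 * y**2, x**2 - 4 * x * y)

def hessian_entries(x, y):
    # (f_xx, f_xy, f_yy)
    return (2 + 2 * y, 2 * x - 4 * y, -4 * x)

def classify(x, y):
    fxx, fxy, fyy = hessian_entries(x, y)
    d = fxx * fyy - fxy**2  # det H
    if d < 0:
        return "saddle"
    if d == 0:
        return "inconclusive"
    return "local min" if fxx > 0 else "local max"

points = [(0, 1), (0, -1), (-4, -1), (Fraction(-4, 3), Fraction(-1, 3))]
verdicts = [classify(x, y) for x, y in points]
```

Checking `gradient` first matters: a point that does not satisfy ∇f=0 is not stationary, and the Hessian test does not apply to it.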
9c · Worked · ∇g & det H [short slots]
g=2xy(x²+y): ∇g at (2,1) = (26,24). Stationary of f=x−ln(x²+y²): ∇f=0 ⇒ (2,0). Given ∇f=(2xy²+2x, 2x²y−3y²) ⇒ det H(1,−1)=16.
9d · How the Islands Connect [recurring themes]
- Saddle points have no 1-var analogue — they're why the 2-var test needs det H, not just fₓₓ
- Convexity ⇒ global extremum in BOTH 1-var (f''≥0) and 2-var (det H≥0 + fₓₓ≥0)
- Matrix multiplication drives linear systems, Aⁿ eigen-powers, Markov/PageRank AND graph walk-counting (Aᵏ)
- Gaussian elimination solves Ax=b AND finds eigenvectors via (A−λI)x=0
- Integration is the engine for continuous probability (pdf normalisation, E, Var on side 2)
Formula Belt [side 1]
(xᵇ)'=bx^(b−1) · (eˣ)'=eˣ · chain g'·f'(g)
2nd-deriv: f''>0 min, <0 max, =0 ?
∫xᵃ=x^(a+1)/(a+1) · FTC F(b)−F(a)
det₂=ad−bc · Ax=b ⇒ x=A⁻¹b
det(A−λI)=0 · Aⁿ=PDⁿP⁻¹
det H = fₓₓf_yy−fₓy² (<0 saddle)