MAT9004 · Mathematical Foundations for Data Science and AI
Multivariable Calculus
Multivariable calculus is single-variable calculus done in two variables at once — the maths behind optimising a model with more than one parameter. You start with functions of two variables and their surfaces, read them through level sets and contour maps, then differentiate with partial derivatives (first and second order, including the cross-partials and Clairaut's symmetry). The two key objects are the gradient ∇f — the vector of first partials that points in the direction of steepest ascent — and the Hessian of second partials. Optimisation mirrors the 1-D method: find stationary points where ∇f = 0, then classify each with the Hessian determinant test (the flagship long-answer). The chapter closes with global optimisation and convexity, which is exactly why some machine-learning training problems are easy and others are hard.
What this chapter covers
- 3.1 Functions of two variables and their surfaces
- 3.2 Level sets and contour maps
- 3.3 Partial derivatives
- 3.4 Second-order and cross-partials (Clairaut)
- 3.5 The gradient ∇f: direction of steepest ascent
- 3.6 First-order approximation and the tangent plane
- 3.7 Stationary points: ∇f = 0
- 3.8 The Hessian determinant test
- 3.9–3.10 Full worked classification and global optimisation
- 3.11 Convexity: why training is easy or hard
Worked example: partial derivatives and the gradient
For f(x, y) = x²y + 3y²: (a) find both first partial derivatives; (b) evaluate the gradient at (1, 2).
- [+1] (a) Differentiate in x, treating y as constant: f_x = ∂/∂x (x²y + 3y²) = 2xy.
- [+1] (a) Differentiate in y, treating x as constant: f_y = ∂/∂y (x²y + 3y²) = x² + 6y.
- [+1] (b) Evaluate at (1, 2): f_x(1, 2) = 2·1·2 = 4; f_y(1, 2) = 1² + 6·2 = 13.
- [+1] Write the gradient: ∇f(1, 2) = (4, 13).
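The hand-derived partials above can be sanity-checked numerically. The sketch below is one possible check, not part of the unit's method: it approximates each partial with a central difference (vary one variable, hold the other fixed) and compares it with the formulas f_x = 2xy and f_y = x² + 6y. The helper names and the step size h are illustrative choices.

```python
def f(x, y):
    # Function from the worked example: f(x, y) = x^2 * y + 3 * y^2
    return x**2 * y + 3 * y**2


def numerical_gradient(func, x, y, h=1e-6):
    # Central differences: nudge one variable, hold the other constant
    df_dx = (func(x + h, y) - func(x - h, y)) / (2 * h)
    df_dy = (func(x, y + h) - func(x, y - h)) / (2 * h)
    return df_dx, df_dy


def analytic_gradient(x, y):
    # Hand-derived partials: f_x = 2xy, f_y = x^2 + 6y
    return 2 * x * y, x**2 + 6 * y


num_gx, num_gy = numerical_gradient(f, 1.0, 2.0)
ana_gx, ana_gy = analytic_gradient(1.0, 2.0)
print(f"numerical gradient at (1, 2): ({num_gx:.4f}, {num_gy:.4f})")
print(f"analytic gradient at (1, 2):  ({ana_gx:.4f}, {ana_gy:.4f})")
```

If the two lines disagree beyond rounding error, the usual culprit is a term that was differentiated as if the other variable were not constant.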
Key terms
- Partial derivative
- The derivative of a multivariable function with respect to one variable while holding the others constant, written f_x or ∂f/∂x. It measures how f changes as you move along just one axis.
- Gradient
- The vector of first partial derivatives, ∇f = (f_x, f_y). It points in the direction of steepest ascent, and its zero locates the stationary points: ∇f = 0.
- Hessian
- The matrix of second partial derivatives of a two-variable function, with rows (f_xx, f_xy) and (f_yx, f_yy). Its determinant at a stationary point, together with the sign of f_xx, classifies that point as a maximum, minimum or saddle. This is the multivariable analogue of the second-derivative test.
- Saddle point
- A stationary point that is a maximum along one direction and a minimum along another, so it is neither. The Hessian test flags it when the Hessian determinant is negative.
- Convexity
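- A function is convex if its graph curves upward everywhere, so the straight line joining any two points on the graph never dips below the graph. Every local minimum of a convex function is a global minimum (and a strictly convex function has at most one minimiser), which is why convex training problems are easy to optimise and non-convex ones are not.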
Multivariable Calculus FAQ
How do I take a partial derivative correctly?
Differentiate with respect to the named variable and treat every other variable as a constant. For f_x, treat y as a fixed number, so a pure-y term like 3y² differentiates to zero. The most common slip is forgetting to hold the other variable constant, or dropping a term that survives.
How does the Hessian test classify a stationary point?
At a stationary point where ∇f = 0, compute the Hessian determinant D = f_xx·f_yy − (f_xy)². If D > 0 and f_xx > 0 it is a local minimum; if D > 0 and f_xx < 0 a local maximum; if D < 0 it is a saddle point; and if D = 0 the test is inconclusive.
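The four cases of the test translate directly into a small decision function. The following is a minimal sketch: the function name and the example functions in the comments are illustrative choices, not taken from the unit, and the three arguments are assumed to be the second partials already evaluated at a stationary point.

```python
def classify_stationary_point(fxx, fyy, fxy):
    """Apply the Hessian determinant test to second partials
    evaluated at a point where the gradient is zero."""
    D = fxx * fyy - fxy**2
    if D > 0 and fxx > 0:
        return "local minimum"
    if D > 0 and fxx < 0:
        return "local maximum"
    if D < 0:
        return "saddle point"
    # D == 0: the test gives no verdict
    return "inconclusive"


# Illustrative cases, each evaluated at the origin:
print(classify_stationary_point(2, 2, 0))    # x^2 + y^2
print(classify_stationary_point(-2, -2, 0))  # -x^2 - y^2
print(classify_stationary_point(2, -2, 0))   # x^2 - y^2
print(classify_stationary_point(0, 0, 0))    # x^4 + y^4
```

The last case is a reminder that "inconclusive" is not a classification: x⁴ + y⁴ does have a minimum at the origin, but the determinant test cannot detect it.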
What is the gradient actually telling me?
The gradient ∇f at a point is a vector that points in the direction in which the function increases fastest, and its length is that maximum rate of increase. Setting ∇f = 0 finds the flat points — the candidate maxima, minima and saddles — which is the first step of two-variable optimisation.
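Both claims about the gradient can be checked numerically. The sketch below assumes the standard formula for the rate of change of f along a unit vector u, namely ∇f · u, and reuses ∇f(1, 2) = (4, 13) from the worked example. It scans directions around the circle in 0.1 degree steps (an arbitrary resolution) and compares the steepest one with the gradient's own direction and length.

```python
import math

# Gradient of f(x, y) = x^2*y + 3y^2 at (1, 2), from the worked example
grad = (4.0, 13.0)


def directional_derivative(grad, angle):
    # Rate of change of f along the unit vector (cos(angle), sin(angle))
    return grad[0] * math.cos(angle) + grad[1] * math.sin(angle)


# Scan 3600 unit directions and keep the one with the largest rate of increase
angles = [2 * math.pi * k / 3600 for k in range(3600)]
best_angle = max(angles, key=lambda a: directional_derivative(grad, a))

grad_angle = math.atan2(grad[1], grad[0])   # direction the gradient points in
grad_length = math.hypot(grad[0], grad[1])  # length of the gradient vector

print(f"steepest scanned direction: {math.degrees(best_angle):.1f} degrees")
print(f"gradient direction:         {math.degrees(grad_angle):.1f} degrees")
print(f"rate along best direction:  {directional_derivative(grad, best_angle):.3f}")
print(f"gradient length:            {grad_length:.3f}")
```

The scanned direction should agree with the gradient direction to within the scan resolution, and the best rate should match the gradient's length.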
Why does convexity matter for AI?
Training a model means minimising a loss function. If the loss is convex, every local minimum is a global minimum, so a gradient method with a sensible step size cannot be trapped at a worse point. If it is non-convex there can be many local minima and saddle points, so optimisation can get stuck at one of them, which is exactly why the chapter ties convexity to whether training is easy or hard.
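A small experiment makes the contrast concrete. This is a sketch under stated assumptions: the two loss functions, the step size, the iteration count and the starting points are all illustrative choices, not taken from the unit. Plain gradient descent is run from two starting points on a convex bowl and on a non-convex "tilted double well" whose two valleys have different depths.

```python
def gradient_descent(grad, start, step=0.05, iterations=500):
    # Repeatedly step against the gradient (the direction of steepest descent)
    x, y = start
    for _ in range(iterations):
        gx, gy = grad(x, y)
        x, y = x - step * gx, y - step * gy
    return x, y


# Convex bowl (illustrative): f(x, y) = x^2 + y^2
def convex_loss(x, y):
    return x**2 + y**2


def convex_grad(x, y):
    return 2 * x, 2 * y


# Non-convex tilted double well (illustrative): g(x, y) = (x^2 - 1)^2 + 0.3x + y^2
def nonconvex_loss(x, y):
    return (x**2 - 1) ** 2 + 0.3 * x + y**2


def nonconvex_grad(x, y):
    return 4 * x**3 - 4 * x + 0.3, 2 * y


starts = [(1.5, 1.0), (-1.5, 1.0)]
convex_ends = [gradient_descent(convex_grad, s) for s in starts]
nonconvex_ends = [gradient_descent(nonconvex_grad, s) for s in starts]

for s, end in zip(starts, convex_ends):
    print(f"convex,     start {s}: end ({end[0]:.3f}, {end[1]:.3f}), loss {convex_loss(*end):.3f}")
for s, end in zip(starts, nonconvex_ends):
    print(f"non-convex, start {s}: end ({end[0]:.3f}, {end[1]:.3f}), loss {nonconvex_loss(*end):.3f}")
```

On the bowl, both runs should end at the same point. On the double well, each run settles in the valley nearest its starting point, and the run that starts on the right ends with a higher loss than the one that starts on the left: it has found a local minimum that is not the global one.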
Exam move
The flagship long-answer is a full two-variable classification, so drill the chain end to end: compute both first partials, set ∇f = 0 and solve the system for every stationary point, then build the Hessian and evaluate D = f_xx·f_yy − (f_xy)² at each point to classify it. Use Clairaut's symmetry (f_xy = f_yx) as a free check on your cross-partials. Lay the work out cleanly (partials, gradient equation, stationary points, Hessian, verdict) because each stage carries marks. Keep the gradient's meaning (direction of steepest ascent) and the saddle case (D < 0) at your fingertips, since the wording of the conclusion is itself worth a mark.
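As an illustration of that layout, here is the full chain for a hypothetical function, f(x, y) = x³ − 3x + y². It is chosen only because the algebra is short and is not taken from the unit's materials.

```latex
\begin{aligned}
f(x, y) &= x^3 - 3x + y^2 \\
f_x &= 3x^2 - 3, \qquad f_y = 2y \\
\nabla f = 0 &\;\Rightarrow\; x = \pm 1,\ y = 0
  \quad\text{(stationary points } (1, 0) \text{ and } (-1, 0)\text{)} \\
f_{xx} &= 6x, \qquad f_{yy} = 2, \qquad f_{xy} = f_{yx} = 0 \\
D &= f_{xx} f_{yy} - (f_{xy})^2 = 12x \\
(1, 0):&\quad D = 12 > 0,\ f_{xx} = 6 > 0 \;\Rightarrow\; \text{local minimum} \\
(-1, 0):&\quad D = -12 < 0 \;\Rightarrow\; \text{saddle point}
\end{aligned}
```

Each line corresponds to one stage of the chain, and the final two lines state the verdict in words, which is the part the marker reads for the conclusion mark.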