COMP5318 · Machine Learning and Data Mining
Neural Networks & Backpropagation
This chapter of University of Sydney COMP5318 Machine Learning and Data Mining shows how a network is built from simple neurons a = f(w·x + b), why a single perceptron is only a linear classifier (it cannot solve XOR), and how a sigmoid multilayer perceptron (MLP) is trained by gradient descent using backpropagation. In the closed-book final it is examined as a by-hand mechanism: one forward pass, one backpropagation step, a parameter count, and the vanishing-gradient idea.
What this chapter covers
- 01 Write a neuron as a = f(w·x + b) and name the roles of the weights, bias and activation
- 02 Use the perceptron step activation and see why one unit draws a single linear boundary
- 03 Explain why XOR is not linearly separable and needs a hidden layer
- 04 Apply the perceptron learning rule w_new = w_old + e·x, b_new = b_old + e
- 05 State the sigmoid s(z) = 1/(1 + e^-z) and its derivative s'(z) = o(1 - o), where o = s(z)
- 06 Run a forward pass through a small sigmoid MLP, hidden units first then the output
- 07 Compute the output error signal delta = (t - o)·o(1 - o) and back-propagate to hidden units
- 08 Update weights with the rule dw = eta·delta·o_source and update biases the same way
- 09 Count trainable parameters of a Dense layer as n_in·n_out + n_out
- 10 Describe the vanishing-gradient problem and fixes such as ReLU, better init and dropout
One perceptron learning-rule update, by hand
Given: weights w1 = 0.2, w2 = 0.3, bias b = -0.6, input x = (1, 0), target t = 1.
- (+1) Weighted sum: n = w1·x1 + w2·x2 + b = 0.2(1) + 0.3(0) + (-0.6) = -0.4.
- (+1) Activation and error: n = -0.4 < 0, so the step gives a = 0. The error is e = t - a = 1 - 0 = +1, so the point was misclassified.
- (+1) Update (add e·xi to each weight and e to the bias): w1 = 0.2 + 1(1) = 1.2; w2 = 0.3 + 1(0) = 0.3, unchanged because its input was 0; b = -0.6 + 1 = 0.4.
- (+1) Re-check the same point: n = 1.2(1) + 0.3(0) + 0.4 = 1.6 >= 0, so a = 1 = t. The point is now classified correctly.
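The hand calculation above can be checked with a few lines of Python. This is a minimal sketch, not course-supplied code: the function names `step` and `perceptron_update` are made up for illustration, and the numbers are the ones used in the worked steps.

```python
def step(n):
    """Perceptron step activation: 1 if the weighted sum is >= 0, else 0."""
    return 1 if n >= 0 else 0


def perceptron_update(w, b, x, t):
    """Apply one perceptron learning-rule update for a single example.

    Returns the new weights and bias plus the intermediate values
    (weighted sum n, activation a, error e) so each step can be inspected.
    """
    n = sum(wi * xi for wi, xi in zip(w, x)) + b   # weighted sum
    a = step(n)                                    # step activation
    e = t - a                                      # error: -1, 0 or +1
    new_w = [wi + e * xi for wi, xi in zip(w, x)]  # w_new = w_old + e*x
    new_b = b + e                                  # b_new = b_old + e
    return new_w, new_b, n, a, e


# Values taken from the worked example.
w, b = [0.2, 0.3], -0.6
x, t = [1, 0], 1

new_w, new_b, n, a, e = perceptron_update(w, b, x, t)
print("n =", round(n, 4), "a =", a, "e =", e)
print("updated weights:", [round(v, 4) for v in new_w], "bias:", round(new_b, 4))

# Re-check the same point with the updated parameters.
n_after = sum(wi * xi for wi, xi in zip(new_w, x)) + new_b
print("after update: n =", round(n_after, 4), "a =", step(n_after))
```

Rounding is applied only when printing, because floating-point sums such as 0.2 - 0.6 are not stored exactly.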
Key terms
- Neuron
- A unit that computes a = f(w·x + b): a weighted sum of its inputs plus a bias, passed through an activation function f.
- Perceptron
- A neuron with a step activation (output 1 if the weighted sum is >= 0, else 0). It is a linear classifier and cannot represent XOR.
- Sigmoid
- The activation s(z) = 1/(1 + e^-z), giving an output in (0, 1) with s(0) = 0.5; smooth and differentiable, with derivative s'(z) = o(1 - o), where o = s(z) is the unit's output.
- Multilayer perceptron (MLP)
- A feed-forward, fully-connected network with one or more hidden layers of sigmoid (or similar) neurons between the inputs and the output.
- Forward pass
- Computing outputs left to right: each neuron k forms z_k = sum_i w_ik·o_i + b_k, then o_k = s(z_k). Hidden units are computed before the output.
- Backpropagation
- The efficient computation of gradients for gradient descent: after a forward pass, an error signal delta is pushed backwards, so each weight is updated by dw = eta·delta·o_source.
- Error signal (delta)
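- For an output neuron delta = (t - o)·o(1 - o); for a hidden neuron delta = o(1 - o)·sum_i w_i·delta_i, where the sum runs over the neurons i that the hidden neuron feeds, w_i is the weight to neuron i, and the weights are the ones used in the forward pass (before any update).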
- Vanishing gradient
- Because the sigmoid derivative is at most 0.25 and shrinks toward 0 when a unit saturates, gradients multiplied back through many layers become tiny, so early layers learn very slowly.
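The vanishing-gradient entry above can be made concrete numerically. The sketch below is illustrative only: it looks at the sigmoid-derivative factors alone and ignores the weight terms that also appear in a real back-propagated gradient, and the helper name `best_case_factor` is made up for this example.

```python
import math


def sigmoid(z):
    """Sigmoid activation s(z) = 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z):
    """Sigmoid derivative written in terms of the output: o * (1 - o)."""
    o = sigmoid(z)
    return o * (1.0 - o)


# The derivative peaks at z = 0, where o = 0.5 and o(1 - o) = 0.25.
peak = sigmoid_grad(0.0)

# A saturated unit (large |z|) has a derivative close to 0.
saturated = sigmoid_grad(5.0)


def best_case_factor(n_layers):
    """Largest possible product of n_layers sigmoid derivatives (0.25 each)."""
    return peak ** n_layers


print("peak derivative:", peak)
print("derivative at z = 5:", round(saturated, 4))
for n_layers in (1, 5, 10):
    print(n_layers, "layers -> factor at most", best_case_factor(n_layers))
```

Even in the best case the factor shrinks by 4x per sigmoid layer, which is why the earliest layers of a deep sigmoid network receive very small updates.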
Neural Networks & Backpropagation FAQ
Are neural networks examined by calculation or by explanation in COMP5318?
Both, and usually by calculation. The final is paper-based, 2 hours, closed book with only a non-programmable calculator, and neural networks appear as short numeric items: run a forward pass through a small sigmoid network, take one backpropagation step to update a weight, count trainable parameters of a Dense network, or explain the vanishing-gradient problem in a sentence. Marks are awarded for the method and the correct final number, so always show each substitution.
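For the parameter-count item, the rule n_in·n_out + n_out per Dense layer can be sketched as follows. The layer sizes used here (4 inputs, 3 hidden units, 2 outputs) are hypothetical and chosen only to illustrate the arithmetic; the function names are likewise made up.

```python
def dense_params(n_in, n_out):
    """Trainable parameters of one Dense layer: weights plus one bias per unit."""
    return n_in * n_out + n_out


def mlp_params(layer_sizes):
    """Total parameters of a fully-connected network.

    layer_sizes lists the number of units per layer, inputs first,
    e.g. [4, 3, 2] means 4 inputs -> 3 hidden units -> 2 outputs.
    """
    return sum(dense_params(n_in, n_out)
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


sizes = [4, 3, 2]  # hypothetical network, not from the course material
print("hidden layer:", dense_params(4, 3))   # 4*3 + 3 = 15
print("output layer:", dense_params(3, 2))   # 3*2 + 2 = 8
print("total:", mlp_params(sizes))           # 15 + 8 = 23
```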
What is the difference between the perceptron rule and backpropagation?
The perceptron rule trains a single step-activation unit by correcting the weights with the raw error, w_new = w_old + e·x, and converges only if the data is linearly separable. Backpropagation trains a multilayer sigmoid network by gradient descent: it uses the differentiable sigmoid so it can compute an error signal delta at every neuron and update each weight by dw = eta·delta·o_source. In short, the perceptron rule is a one-layer, error-driven fix, while backpropagation propagates gradients through hidden layers.
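To see the backpropagation side in action, here is a minimal sketch of one forward pass and one update on a tiny 2-2-1 sigmoid network. Every number in it (inputs, target, weights, learning rate) is hypothetical and chosen only for illustration, and the function and variable names are assumptions rather than anything from the course. It applies the delta rules stated above and computes all deltas with the old weights before changing anything.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: hidden units first, then the single output unit."""
    o_hidden = [sigmoid(sum(w * xi for w, xi in zip(w_row, x)) + b)
                for w_row, b in zip(w_hidden, b_hidden)]
    o_out = sigmoid(sum(w * h for w, h in zip(w_out, o_hidden)) + b_out)
    return o_hidden, o_out


# Hypothetical example values (not taken from the course material).
x, t, eta = [1.0, 0.0], 1.0, 0.5
w_hidden = [[0.1, 0.2], [0.3, 0.4]]  # w_hidden[j][i]: input i -> hidden unit j
b_hidden = [0.0, 0.0]
w_out = [0.5, -0.5]                  # w_out[j]: hidden unit j -> output
b_out = 0.0

o_hidden, o_out = forward(x, w_hidden, b_hidden, w_out, b_out)

# Error signals, all computed with the OLD weights.
delta_out = (t - o_out) * o_out * (1 - o_out)
delta_hidden = [h * (1 - h) * w_out[j] * delta_out
                for j, h in enumerate(o_hidden)]

# Updates: dw = eta * delta * o_source; a bias behaves like a weight on a constant 1.
new_w_out = [w_out[j] + eta * delta_out * o_hidden[j] for j in range(2)]
new_b_out = b_out + eta * delta_out
new_w_hidden = [[w_hidden[j][i] + eta * delta_hidden[j] * x[i] for i in range(2)]
                for j in range(2)]
new_b_hidden = [b_hidden[j] + eta * delta_hidden[j] for j in range(2)]

_, o_out_after = forward(x, new_w_hidden, new_b_hidden, new_w_out, new_b_out)
print("output before:", round(o_out, 4), "after:", round(o_out_after, 4))
```

As with the perceptron rule, weights attached to a zero input (here the second input) do not move, because their update is multiplied by o_source = 0.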
Can AI help me with neural networks and backpropagation in COMP5318?
Yes, for building understanding rather than for handing in answers. A study tool like Sia can explain, step by step, how a forward pass flows from the hidden units to the output, why the sigmoid derivative is o(1 - o), or how an output delta differs from a hidden delta, and it can check your reasoning on practice numbers you make up. Use it to make the method automatic so you can reproduce it under closed-book exam conditions; do not use it to obtain answers for assessed quizzes or assignments, and always acknowledge any AI tools you use, as the University's academic-integrity policy requires.
Exam move
Treat backpropagation as a drill you can run without thinking. First, get the forward pass mechanical: compute every hidden unit's z and output, then the output unit, carrying about four decimals and remembering the sigmoid uses e^-z. Second, memorise the two delta rules, output delta = (t - o)·o(1 - o) and hidden delta = o(1 - o)·sum w·delta, and always compute all deltas with the old weights before applying any update dw = eta·delta·o_source. Third, keep the quick wins ready: the perceptron rule (weights on zero inputs never move, and it cannot do XOR) and the parameter count n_in·n_out + n_out per Dense layer with Flatten and pooling contributing zero. On a 2-hour (120-minute) paper, budget about one minute per mark, so a typical 8-mark forward-plus-backward part is roughly 8 minutes, comfortably inside the time. Keep the two hurdles in view: you need at least 40% in the final exam or your unit mark is capped at 45, and at least 50% overall to pass, and you should confirm the exact exam date on the Canvas exam timetable.