University of Sydney · FACULTY OF COMPUTER SCIENCE

COMP5318 · Machine Learning and Data Mining


Computer Science · 14 Chapters · 8-page Bible

Our own words - no uploaded lecturer files

Updated for this semester

Chapter 7 of 11 · COMP5318

Neural Networks & Backpropagation

This chapter of University of Sydney COMP5318 Machine Learning and Data Mining shows how a network is built from simple neurons a = f(w·x + b), why a single perceptron is only a linear classifier (it cannot solve XOR), and how a sigmoid multilayer perceptron (MLP) is trained by gradient descent using backpropagation. In the closed-book final it is examined as a by-hand mechanism: one forward pass, one backpropagation step, a parameter count, and the vanishing-gradient idea.

In this chapter

What this chapter covers

01 Write a neuron as a = f(w·x + b) and name the roles of the weights, bias and activation
02 Use the perceptron step activation and see why one unit draws a single linear boundary
03 Explain why XOR is not linearly separable and needs a hidden layer
04 Apply the perceptron learning rule w_new = w_old + e·x, b_new = b_old + e, where e = t - a is the error
05 State the sigmoid s(z) = 1/(1 + e^-z) and its derivative s'(z) = o(1 - o), where o = s(z)
06 Run a forward pass through a small sigmoid MLP, hidden units first then the output
07 Compute the output error signal delta = (t - o)·o(1 - o) and back-propagate to hidden units
08 Update weights with the rule dw = eta·delta·o_source and update biases the same way
09 Count trainable parameters of a Dense layer as n_in·n_out + n_out
10 Describe the vanishing-gradient problem and fixes such as ReLU, better init and dropout

Worked example · free

One perceptron learning-rule update, by hand

Q [4 marks]. A 2-input perceptron has weights w1 = 0.2, w2 = 0.3 and bias b = -0.6, with a step activation (output 1 if the weighted sum is >= 0, else 0). Present the training point x = (1, 0) whose target is t = 1. Compute the output, the error, apply the perceptron learning rule, and confirm the point is now classified correctly.

+1 Weighted sum: n = w1·x1 + w2·x2 + b = 0.2(1) + 0.3(0) + (-0.6) = -0.4.
+1 Activation and error: n = -0.4 < 0, so the step gives a = 0. The error is e = t - a = 1 - 0 = +1, so the point was misclassified.
+1 Update (add e·xi to each weight and e to the bias): w1 = 0.2 + 1(1) = 1.2; w2 = 0.3 + 1(0) = 0.3, unchanged because its input was 0; b = -0.6 + 1 = 0.4.
+1 Re-check the same point: n = 1.2(1) + 0.3(0) + 0.4 = 1.6 >= 0, so a = 1 = t. The point is now classified correctly.

New weights w = (1.2, 0.3) and bias b = 0.4; the point x = (1, 0) is now output as 1, matching its target. Only w1 and b changed because the second input was 0.
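The hand calculation above can be checked with a few lines of Python. This is a minimal sketch, not course-supplied code: the function names step and perceptron_update are hypothetical, and the numbers are the ones given in the question.

```python
def step(n):
    """Step activation: 1 if the weighted sum is >= 0, else 0."""
    return 1 if n >= 0 else 0


def perceptron_update(w, b, x, t):
    """Apply one fixed-increment perceptron update.

    Returns the new weights, new bias, the output a computed with the
    old weights, and the error e = t - a.
    """
    n = sum(wi * xi for wi, xi in zip(w, x)) + b  # weighted sum
    a = step(n)                                   # output before the update
    e = t - a                                     # raw error
    w_new = [wi + e * xi for wi, xi in zip(w, x)] # add e * x_i to each weight
    b_new = b + e                                 # add e to the bias
    return w_new, b_new, a, e


# Values from the worked example.
w, b = [0.2, 0.3], -0.6
x, t = [1, 0], 1

w_new, b_new, a, e = perceptron_update(w, b, x, t)
print("output before update:", a, "error:", e)
print("new weights:", w_new, "new bias:", b_new)

# Re-check the same point with the updated parameters.
a_after = step(sum(wi * xi for wi, xi in zip(w_new, x)) + b_new)
print("output after update:", a_after)
```

Because the update multiplies the error by each input, the weight on the zero input stays at 0.3, matching the hand working.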

Sia tip — This fixed-increment perceptron rule has no separate learning rate (it acts as 1). A weight whose input is 0 never moves, and a correctly classified point (e = 0) leaves every weight unchanged. The rule is guaranteed to converge only when the classes are linearly separable, so it can never learn XOR.
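The convergence claim in the tip can be illustrated by running the same rule over a whole dataset. The sketch below is an illustration under assumptions: the helper names train_epoch and mistakes_per_epoch are hypothetical, the weights start at zero, and AND is used only as a convenient linearly separable contrast to XOR.

```python
def step(n):
    """Step activation: 1 if the weighted sum is >= 0, else 0."""
    return 1 if n >= 0 else 0


def train_epoch(w, b, data):
    """One pass of the fixed-increment rule over the data.

    Returns the updated weights, updated bias and the number of
    misclassified points seen during the pass.
    """
    mistakes = 0
    for x, t in data:
        a = step(sum(wi * xi for wi, xi in zip(w, x)) + b)
        e = t - a
        if e != 0:
            mistakes += 1
            w = [wi + e * xi for wi, xi in zip(w, x)]
            b = b + e
    return w, b, mistakes


def mistakes_per_epoch(data, epochs=50):
    """Train from zero weights and record the mistakes made in each epoch."""
    w, b = [0.0, 0.0], 0.0
    history = []
    for _ in range(epochs):
        w, b, m = train_epoch(w, b, data)
        history.append(m)
    return history


XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

xor_history = mistakes_per_epoch(XOR)
and_history = mistakes_per_epoch(AND)
print("AND mistakes per epoch:", and_history[:10])
print("XOR mistakes per epoch:", xor_history[:10])
```

On AND the mistake count reaches zero and stays there. On XOR every epoch contains at least one mistake, because an error-free epoch would require a single line that separates the four points.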

Glossary

Key terms

Neuron: A unit that computes a = f(w·x + b): a weighted sum of its inputs plus a bias, passed through an activation function f.
Perceptron: A neuron with a step activation (output 1 if the weighted sum is >= 0, else 0). It is a linear classifier and cannot represent XOR.
Sigmoid: The activation s(z) = 1/(1 + e^-z), giving an output in (0, 1) with s(0) = 0.5; smooth and differentiable, with derivative s'(z) = o(1 - o), where o = s(z) is the unit's output.
Multilayer perceptron (MLP): A feed-forward, fully-connected network with one or more hidden layers of sigmoid (or similar) neurons between the inputs and the output.
Forward pass: Computing outputs left to right: each neuron k forms z_k = sum_i w_ik·o_i + b_k, then o_k = s(z_k). Hidden units are computed before the output.
Backpropagation: The efficient computation of gradients for gradient descent: after a forward pass, an error signal delta is pushed backwards, and each weight is updated by dw = eta·delta·o_source, where eta is the learning rate and o_source is the output of the neuron (or the input) that feeds the weight.
Error signal (delta): For an output neuron delta = (t - o)·o(1 - o); for a hidden neuron j, delta_j = o_j(1 - o_j)·sum_k w_jk·delta_k, summed over the neurons k that j feeds, using the pre-update weights from the forward pass.
Vanishing gradient: Because the sigmoid derivative is at most 0.25 and shrinks toward 0 when a unit saturates, gradients multiplied back through many layers become tiny, so early layers learn very slowly.
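The vanishing-gradient entry can be made concrete with a short numeric sketch. It only looks at the sigmoid-derivative factor and assumes, for simplicity, that the weights along the path have magnitude about 1; real gradients also depend on the actual weights. The function names are hypothetical.

```python
import math


def sigmoid(z):
    """Sigmoid activation s(z) = 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_prime(z):
    """Sigmoid derivative written in terms of the output: o * (1 - o)."""
    o = sigmoid(z)
    return o * (1.0 - o)


# The derivative peaks at z = 0 and shrinks as the unit saturates.
for z in (0.0, 2.0, 5.0, 10.0):
    print(f"z = {z:5.1f}  s'(z) = {sigmoid_prime(z):.6f}")

# Best case: every layer contributes the maximum factor 0.25.
# Assumes weights of magnitude about 1 along the path (illustration only).
for depth in (1, 5, 10, 20):
    print(f"{depth:2d} sigmoid layers: derivative factor <= {0.25 ** depth:.3e}")
```

Even in the best case the factor falls below one in a million after ten sigmoid layers, which is why early layers receive very small updates.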

FAQ

Neural Networks & Backpropagation FAQ

Are neural networks examined by calculation or by explanation in COMP5318?

Both, and usually by calculation. The final is paper-based, 2 hours, closed book with only a non-programmable calculator, and neural networks appear as short numeric items: run a forward pass through a small sigmoid network, take one backpropagation step to update a weight, count trainable parameters of a Dense network, or explain the vanishing-gradient problem in a sentence. Marks are awarded for the method and the correct final number, so always show each substitution.

What is the difference between the perceptron rule and backpropagation?

The perceptron rule trains a single step-activation unit by correcting the weights with the raw error, w_new = w_old + e·x, and converges only if the data is linearly separable. Backpropagation trains a multilayer sigmoid network by gradient descent: it uses the differentiable sigmoid so it can compute an error signal delta at every neuron and update each weight by dw = eta·delta·o_source. In short, the perceptron rule is a one-layer, error-driven fix, while backpropagation propagates gradients through hidden layers.
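For contrast with the perceptron rule, here is a minimal sketch of one forward pass and one backpropagation step on a 2-2-1 sigmoid network, using the delta rules from this chapter. Every number (inputs, target, weights, biases, eta = 0.5) is made up for illustration, and the variable names are hypothetical.

```python
import math


def sigmoid(z):
    """Sigmoid activation s(z) = 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, W_h, b_h, w_o, b_o):
    """Forward pass: hidden units first, then the single output unit."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W_h, b_h)]
    o = sigmoid(sum(w * hj for w, hj in zip(w_o, h)) + b_o)
    return h, o


# Hypothetical network and training point (illustrative values only).
x, t, eta = [1.0, 0.0], 1.0, 0.5
W_h = [[0.1, -0.2],   # W_h[j][i]: weight from input i to hidden unit j
       [0.3, 0.4]]
b_h = [0.1, -0.1]
w_o = [0.2, -0.3]     # w_o[j]: weight from hidden unit j to the output
b_o = 0.05

h, o = forward(x, W_h, b_h, w_o, b_o)

# Compute ALL deltas with the old weights before changing anything.
delta_o = (t - o) * o * (1 - o)
delta_h = [hj * (1 - hj) * w_o[j] * delta_o for j, hj in enumerate(h)]

# Updates: dw = eta * delta * o_source; a bias behaves like a weight on input 1.
w_o_new = [w + eta * delta_o * hj for w, hj in zip(w_o, h)]
b_o_new = b_o + eta * delta_o
W_h_new = [[w + eta * delta_h[j] * xi for w, xi in zip(W_h[j], x)]
           for j in range(len(W_h))]
b_h_new = [b + eta * dj for b, dj in zip(b_h, delta_h)]

_, o_after = forward(x, W_h_new, b_h_new, w_o_new, b_o_new)
print(f"output before: {o:.4f}  after one step: {o_after:.4f}  target: {t}")
```

As with the perceptron rule, weights attached to the zero input do not move, but here every other weight in both layers receives a small gradient-based correction.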

Can AI help me with neural networks and backpropagation in COMP5318?

Yes, for building understanding rather than for handing in answers. A study tool like Sia can explain, step by step, how a forward pass flows from the hidden units to the output, why the sigmoid derivative is o(1 - o), or how an output delta differs from a hidden delta, and it can check your reasoning on practice numbers you make up. Use it to make the method automatic so you can reproduce it under closed-book exam conditions; do not use it to obtain answers for assessed quizzes or assignments, and always acknowledge any AI tools you use, as the University's academic-integrity policy requires.


Study strategy

Exam move

Treat backpropagation as a drill you can run without thinking. First, get the forward pass mechanical: compute every hidden unit's z and output, then the output unit, carrying about four decimals and remembering the sigmoid uses e to the minus z. Second, memorise the two delta rules, output delta = (t - o)·o(1 - o) and hidden delta = o(1 - o)·sum w·delta, and always compute all deltas with the old weights before applying any update dw = eta·delta·o_source. Third, keep the quick wins ready: the perceptron rule (weights on zero inputs never move, and it cannot do XOR) and the parameter count n_in·n_out + n_out per Dense layer with Flatten and pooling contributing zero. On a 2-hour (120-minute) paper, budget about one minute per mark, so a typical 8-mark forward-plus-backward part is roughly 8 minutes, comfortably inside the time. Keep the two hurdles in view: you need at least 40% in the final exam or your unit mark is capped at 45, and at least 50% overall to pass, and you should confirm the exact exam date on the Canvas exam timetable.
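The parameter-count quick win is easy to rehearse in code. This is a small sketch with hypothetical helper names (dense_params, mlp_params); the layer sizes are illustrative and not taken from any exam question.

```python
def dense_params(n_in, n_out):
    """Trainable parameters of one Dense layer: weights plus biases."""
    return n_in * n_out + n_out


def mlp_params(layer_sizes):
    """Total parameters for a stack of Dense layers.

    layer_sizes lists the input width followed by each layer's width,
    for example [2, 2, 1] for a 2-2-1 network. Flatten and pooling
    layers add nothing, so they are simply left out of the list.
    """
    return sum(dense_params(a, b)
               for a, b in zip(layer_sizes, layer_sizes[1:]))


print(dense_params(2, 2))          # 2*2 + 2 = 6
print(mlp_params([2, 2, 1]))       # 6 + (2*1 + 1) = 9
print(mlp_params([784, 128, 10]))  # 100480 + 1290 = 101770
```

Writing out each layer's n_in·n_out + n_out term separately, as in the comments, is the same working you would show on paper.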
