University of Sydney · FACULTY OF COMPUTER SCIENCE

COMP5318 · Machine Learning and Data Mining

- one subject, every graph, every model, every mark

Computer Science14 Chapters10-page Bible

Our own words - no uploaded lecturer files

Updated for this semester

Chapter 10 of 11 · COMP5318

Markov Models & HMMs

Markov chains and Hidden Markov Models are the sequence-modelling topic of COMP5318 Machine Learning and Data Mining at the University of Sydney, sitting alongside the unit's other examinable methods (k-NN, Naive Bayes, decision trees, PCA, neural networks and clustering). A Markov chain assumes the next state depends only on the current one; a Hidden Markov Model hides those states behind observations you must reason back from.

The examinable core is compact and calculator-friendly: multiply probabilities along a chain, run the Forward algorithm to evaluate a sequence, and run Viterbi to decode the most likely hidden path.

In this chapter

What this chapter covers

01State the first-order Markov assumption: the next state depends only on the current state, not the full history
02Read a transition matrix a_ij (row = from, column = to) and check that every row sums to 1
03Compute the probability of a state sequence: initial probability times one transition per step
04Identify the four parts of an HMM: hidden states, transition matrix A, emission matrix E, initial distribution A0
05Distinguish the three HMM problems: evaluation (Forward), decoding (Viterbi), learning (Baum-Welch / EM)
06Run the Forward algorithm (initialise, iterate with a SUM, terminate) to find P(observation sequence)
07Run the Viterbi algorithm (same recurrence with a MAX plus back-pointers) to find the most likely hidden path
08Explain why dynamic programming beats naive path enumeration: O(N^2 T) instead of O(N^T)
09Avoid the classic traps: reversing a transition, confusing transitions A with emissions E, and using sum where max is needed

Worked example · free

Probability of a state sequence in a two-state Markov chain

Q [3 marks]. A process moves between two states A and B. The initial distribution is P(A)=0.7, P(B)=0.3. The transition matrix (rows = from) is A->A=0.8, A->B=0.2, B->A=0.5, B->B=0.5. Find the probability of the sequence A -> A -> B -> B.

+1Write the rule. A length-4 sequence uses one initial factor and three transitions: P = P(A) . a_AA . a_AB . a_BB.
+1Read the factors from the initial distribution and the matrix rows: P(A)=0.7, a_AA=0.8, a_AB=0.2, a_BB=0.5.
+1Multiply along the chain: 0.7 x 0.8 = 0.56, x 0.2 = 0.112, x 0.5 = 0.056.

P(A -> A -> B -> B) = 0.056.

Sia tip — Count your factors first: K states use one initial probability and K-1 transitions (here 1 + 3). Always read a transition in the stated direction (row = from, column = to) - reversing a_AB into a_BA is the most common slip on this question.

Glossary

Key terms

Markov assumption: The memoryless property that the next state depends only on the current state, P(pi_t | pi_1..pi_{t-1}) = P(pi_t | pi_{t-1}), not on the earlier history.
Transition matrix (A): The table of state-to-state probabilities a_ij = P(next = j | current = i). Every row sums to 1 because from any state the process must go somewhere.
Initial distribution (A0): The probability of each possible starting state, P(pi_1). It also sums to 1.
Hidden Markov Model (HMM): A Markov chain whose states are hidden; each state emits an observation, and only the observations are seen. Defined by hidden states, transition matrix A, emission matrix E and initial distribution A0.
Emission matrix (E): The probabilities e_k(x) = P(observation x | state k). Each state's emission row sums to 1. It is separate from the transition matrix A.
Forward algorithm: Dynamic programming that evaluates P(observation sequence) by summing over hidden paths. Initialise f_k(1)=A0(k) e_k(x_1); iterate f_k(i)=e_k(x_i) SUM_j f_j(i-1) a_jk; terminate P(X)=SUM_k f_k(m). Cost O(N^2 T).
Viterbi algorithm: The decoding algorithm: the same recurrence as Forward but with MAX instead of sum, plus back-pointers. It returns the single most likely hidden state sequence.
Baum-Welch (EM): The learning algorithm that estimates an HMM's A, E and A0 from data using Expectation-Maximisation. Named as the third HMM problem but not a hand calculation in this unit.

FAQ

Markov Models & HMMs FAQ

What is the difference between the Forward and Viterbi algorithms?

They share the same trellis and the same initialisation, but they answer different questions and use a different operator. Forward SUMS over all incoming paths and returns a probability P(observation sequence) - it evaluates how likely the observations are. Viterbi takes the MAX incoming path (and stores back-pointers) and returns a sequence of states - it decodes the single most likely hidden path. A quick check in the exam: if the question asks for a probability, use Forward's sum; if it asks which states, use Viterbi's max and trace the pointers back.

Do I need to memorise the Forward and Viterbi formulas for the closed-book exam?

Yes - the COMP5318 final is closed book with a non-programmable calculator, so you should be able to write the recurrences from memory: initialise with initial-probability times emission, iterate by combining the previous column (via transitions) and multiplying by the current emission, and terminate by summing (Forward) or taking the argmax and tracing back (Viterbi). Practise on a small two-state, two-observation example until the trellis is automatic. Always confirm the exam format and date on the Canvas exam timetable.

Can AI help me with Markov models and HMMs in COMP5318?

Yes, as a study aid within the University's academic-integrity rules. A tutor like Sia can explain the Markov assumption step by step, walk you through a Forward or Viterbi trellis on a practice HMM, and check where your working went wrong - which builds the skill you need for the closed-book exam. What it must not do is hand you answers to submitted assessment or sit the exam for you, and any AI tool used in assessable work must be acknowledged. Use it to understand the method, not to bypass it.

Studying with AI? Sia — free AI machine learning tutor works through COMP5318 step by step.

Study strategy

Exam move

Treat this chapter as three small, repeatable computations rather than theory. First, drill the Markov-chain sequence rule (one initial factor, K-1 transitions) until reading a transition in the correct row=from, column=to direction is automatic. Next, learn the four HMM parts and keep transitions (A) and emissions (E) in separate columns of your working so you never multiply the wrong table. Then practise the Forward and Viterbi recurrences on tiny two-state, two-observation examples by hand: they share the same trellis, so the only thing to master is sum (Forward, gives a probability) versus max-plus-back-pointers (Viterbi, gives a path). Show every factor you multiply - method marks are awarded for the correct set-up even when a single product slips. Budget about one minute per mark, and confirm the exam format and date on the Canvas exam timetable.

Keep going — explore the course

A+Everything unlocked

Unlocks this Bible + all 25 of your University of Sydney subjects - and 1,000+ Bibles across every Australian university.

Sia - your COMP5318 tutor, unlimited, worked the way the exam marks it

The full 10-page Bible + practice bank with worked solutions

Chrome extension - sync your LMS so Sia knows your deadlines

Bilingual EN / Chinese on every Bible and every Sia answer

$25/ month

30-day money-back · cancel in one tap · how it works