COMP5318 · Machine Learning and Data Mining
Markov Models & HMMs
Markov chains and Hidden Markov Models are the sequence-modelling topic of COMP5318 Machine Learning and Data Mining at the University of Sydney, sitting alongside the unit's other examinable methods (k-NN, Naive Bayes, decision trees, PCA, neural networks and clustering). A Markov chain assumes the next state depends only on the current one; a Hidden Markov Model hides those states behind observations you must reason back from.
The examinable core is compact and calculator-friendly: multiply probabilities along a chain, run the Forward algorithm to evaluate a sequence, and run Viterbi to decode the most likely hidden path.
What this chapter covers
- 01State the first-order Markov assumption: the next state depends only on the current state, not the full history
- 02Read a transition matrix a_ij (row = from, column = to) and check that every row sums to 1
- 03Compute the probability of a state sequence: initial probability times one transition per step
- 04Identify the four parts of an HMM: hidden states, transition matrix A, emission matrix E, initial distribution A0
- 05Distinguish the three HMM problems: evaluation (Forward), decoding (Viterbi), learning (Baum-Welch / EM)
- 06Run the Forward algorithm (initialise, iterate with a SUM, terminate) to find P(observation sequence)
- 07Run the Viterbi algorithm (same recurrence with a MAX plus back-pointers) to find the most likely hidden path
- 08Explain why dynamic programming beats naive path enumeration: O(N^2 T) instead of O(N^T)
- 09Avoid the classic traps: reversing a transition, confusing transitions A with emissions E, and using sum where max is needed
Probability of a state sequence in a two-state Markov chain
- +1Write the rule. A length-4 sequence uses one initial factor and three transitions: P = P(A) . a_AA . a_AB . a_BB.
- +1Read the factors from the initial distribution and the matrix rows: P(A)=0.7, a_AA=0.8, a_AB=0.2, a_BB=0.5.
- +1Multiply along the chain: 0.7 x 0.8 = 0.56, x 0.2 = 0.112, x 0.5 = 0.056.
Key terms
- Markov assumption
- The memoryless property that the next state depends only on the current state, P(pi_t | pi_1..pi_{t-1}) = P(pi_t | pi_{t-1}), not on the earlier history.
- Transition matrix (A)
- The table of state-to-state probabilities a_ij = P(next = j | current = i). Every row sums to 1 because from any state the process must go somewhere.
- Initial distribution (A0)
- The probability of each possible starting state, P(pi_1). It also sums to 1.
- Hidden Markov Model (HMM)
- A Markov chain whose states are hidden; each state emits an observation, and only the observations are seen. Defined by hidden states, transition matrix A, emission matrix E and initial distribution A0.
- Emission matrix (E)
- The probabilities e_k(x) = P(observation x | state k). Each state's emission row sums to 1. It is separate from the transition matrix A.
- Forward algorithm
- Dynamic programming that evaluates P(observation sequence) by summing over hidden paths. Initialise f_k(1)=A0(k) e_k(x_1); iterate f_k(i)=e_k(x_i) SUM_j f_j(i-1) a_jk; terminate P(X)=SUM_k f_k(m). Cost O(N^2 T).
- Viterbi algorithm
- The decoding algorithm: the same recurrence as Forward but with MAX instead of sum, plus back-pointers. It returns the single most likely hidden state sequence.
- Baum-Welch (EM)
- The learning algorithm that estimates an HMM's A, E and A0 from data using Expectation-Maximisation. Named as the third HMM problem but not a hand calculation in this unit.
Markov Models & HMMs FAQ
What is the difference between the Forward and Viterbi algorithms?
They share the same trellis and the same initialisation, but they answer different questions and use a different operator. Forward SUMS over all incoming paths and returns a probability P(observation sequence) - it evaluates how likely the observations are. Viterbi takes the MAX incoming path (and stores back-pointers) and returns a sequence of states - it decodes the single most likely hidden path. A quick check in the exam: if the question asks for a probability, use Forward's sum; if it asks which states, use Viterbi's max and trace the pointers back.
Do I need to memorise the Forward and Viterbi formulas for the closed-book exam?
Yes - the COMP5318 final is closed book with a non-programmable calculator, so you should be able to write the recurrences from memory: initialise with initial-probability times emission, iterate by combining the previous column (via transitions) and multiplying by the current emission, and terminate by summing (Forward) or taking the argmax and tracing back (Viterbi). Practise on a small two-state, two-observation example until the trellis is automatic. Always confirm the exam format and date on the Canvas exam timetable.
Can AI help me with Markov models and HMMs in COMP5318?
Yes, as a study aid within the University's academic-integrity rules. A tutor like Sia can explain the Markov assumption step by step, walk you through a Forward or Viterbi trellis on a practice HMM, and check where your working went wrong - which builds the skill you need for the closed-book exam. What it must not do is hand you answers to submitted assessment or sit the exam for you, and any AI tool used in assessable work must be acknowledged. Use it to understand the method, not to bypass it.
Studying with AI? Sia — free AI machine learning tutor works through COMP5318 step by step.
Exam move
Treat this chapter as three small, repeatable computations rather than theory. First, drill the Markov-chain sequence rule (one initial factor, K-1 transitions) until reading a transition in the correct row=from, column=to direction is automatic. Next, learn the four HMM parts and keep transitions (A) and emissions (E) in separate columns of your working so you never multiply the wrong table. Then practise the Forward and Viterbi recurrences on tiny two-state, two-observation examples by hand: they share the same trellis, so the only thing to master is sum (Forward, gives a probability) versus max-plus-back-pointers (Viterbi, gives a path). Show every factor you multiply - method marks are awarded for the correct set-up even when a single product slips. Budget about one minute per mark, and confirm the exam format and date on the Canvas exam timetable.