COMP5318 · Machine Learning and Data Mining
Naïve Bayes & Model Evaluation
This chapter of University of Sydney COMP5318 Machine Learning and Data Mining pairs a probabilistic classifier with the tools used to judge every model. Naïve Bayes turns Bayes' theorem into a fast, hand-computable classifier by assuming features are independent given the class. Model evaluation shows how to estimate error honestly with cross-validation and how to read a confusion matrix through accuracy, precision, recall and F1. Both halves are recurring short-calculation targets on the closed-book final.
What this chapter covers
- 01 State Bayes' theorem and name the prior, likelihood, posterior and evidence
- 02 Apply the Naïve Bayes rule: classify to the class with the largest prior times product of likelihoods
- 03 Estimate nominal likelihoods as training-count fractions
- 04 Spot the zero-frequency problem and fix it with Laplace (add-one) smoothing
- 05 Use the Gaussian density with per-class mean and sample standard deviation for a numeric feature
- 06 Choose between holdout, k-fold (stratified) cross-validation and leave-one-out to estimate true error
- 07 Keep tuning off the test set: validate hyperparameters on a validation set or by cross-validation
- 08 Build a 2x2 confusion matrix (TP, FP, FN, TN) from a scenario
- 09 Compute accuracy, precision, recall and F1, and know when accuracy misleads
Confusion-matrix metrics for a spam filter
- +1 Build the four counts. TP = 90 (flagged and truly spam). FP = 100 - 90 = 10 (flagged but legitimate). FN = 120 - 90 = 30 (spam but missed). TN = 380 - 10 = 370 (legitimate and correctly left alone).
- +1 Precision = TP/(TP+FP) = 90/100 = 0.90. Of everything flagged as spam, 90% really was spam.
- +1 Recall = TP/(TP+FN) = 90/120 = 0.75. Of all the actual spam, the filter caught 75%.
- +1 F1 = 2PR/(P+R) = 2(0.90)(0.75)/(0.90+0.75) = 1.35/1.65 = 0.818. Cross-check with 2*TP/(2*TP+FP+FN) = 180/220 = 0.818.
- +1 Accuracy = (TP+TN)/total = (90+370)/500 = 460/500 = 0.92.
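The five steps above can be checked with a few lines of Python. This is only a sketch for self-checking practice answers: the function name `confusion_metrics` is made up for illustration, and the counts are the ones from the worked example.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Return precision, recall, F1 and accuracy from the four confusion-matrix counts."""
    precision = tp / (tp + fp)          # of everything flagged, how much was right
    recall = tp / (tp + fn)             # of all actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Counts from the spam-filter example: TP = 90, FP = 10, FN = 30, TN = 370.
spam = confusion_metrics(tp=90, fp=10, fn=30, tn=370)
print({name: round(value, 3) for name, value in spam.items()})
```

The sketch does not guard against a zero denominator (for example, a filter that flags nothing), so treat it as a calculator for scenarios where every count is positive.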
Key terms
- Bayes' theorem
- P(H|E) = P(E|H) P(H) / P(E): it flips a conditional so you can get the posterior probability of a class H from the prior P(H) and the likelihood P(E|H).
- Naïve Bayes assumption
- Features are treated as conditionally independent given the class, so the joint likelihood factors into a product of per-feature likelihoods; the classifier picks the class maximising prior times that product.
- Zero-frequency problem
- A feature value that never co-occurs with a class in training gives a likelihood of 0, which zeroes the whole product; fixed by Laplace (add-one) smoothing.
- Laplace smoothing
- Add 1 to each likelihood numerator and m (the number of possible feature values) to the denominator so no probability is ever exactly zero.
- Gaussian Naïve Bayes
- For a numeric feature, use the normal density f(x) = 1/(sigma*sqrt(2*pi)) * exp(-(x-mu)^2 / (2*sigma^2)) with the mean and sample standard deviation estimated separately for each class.
- Cross-validation
- Split the data into k equal folds; each fold is the test set once while the others train, and the k scores are averaged. Stratified folds keep each class in its original proportion; 10-fold is the standard choice.
- Confusion matrix
- A 2x2 table of truth against prediction with counts TP, FP, FN, TN, from which accuracy, precision, recall and F1 are all computed.
- Precision vs recall
- Precision = TP/(TP+FP) is how many flagged positives were right; recall = TP/(TP+FN) is how many actual positives were found; F1 = 2PR/(P+R) is their harmonic mean.
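To make the cross-validation entry concrete, here is a minimal sketch of plain k-fold splitting in pure Python. The helper names `k_fold_indices` and `cross_validate` and the `score_fold` callback are hypothetical, and the sketch assumes the data has already been shuffled; it does not stratify, so it shows the basic procedure rather than the stratified variant.

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds whose sizes differ by at most one."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)   # the first `extra` folds get one more item
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(n, k, score_fold):
    """Use each fold as the test set once and average the k scores."""
    folds = k_fold_indices(n, k)
    scores = []
    for i, test_idx in enumerate(folds):
        # Training indices are everything outside the current test fold.
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        scores.append(score_fold(train_idx, test_idx))
    return sum(scores) / k


folds = k_fold_indices(10, 3)
print([len(f) for f in folds])
```

In real use, `score_fold` would train a model on `train_idx` and return its score on `test_idx`; here it is left as a callback so the splitting logic stays visible.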
Naïve Bayes & Model Evaluation FAQ
Why is Naïve Bayes called "naïve", and does the assumption break it?
It is naïve because it assumes the features are independent of each other once you know the class, which is rarely true of real data. In practice the classifier still performs well and is very cheap: one pass to count the training data, then a product of likelihoods to predict. The assumption is what lets you multiply the per-feature likelihoods instead of estimating a full joint distribution, so it is a deliberate simplification rather than a flaw.
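The "count once, then multiply" idea can be sketched for nominal features as follows. Everything here is illustrative: the function names, the `alpha` parameter and the tiny weather-style dataset are invented for the example, and `alpha=1` is add-one (Laplace) smoothing while `alpha=0` gives raw count fractions.

```python
from collections import Counter, defaultdict


def train_nb(rows, labels):
    """One pass over the training data: class counts and per-class value counts."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, class) -> counts of each value
    feature_values = defaultdict(set)    # feature index -> distinct values seen
    for row, c in zip(rows, labels):
        for i, v in enumerate(row):
            value_counts[(i, c)][v] += 1
            feature_values[i].add(v)
    return class_counts, value_counts, feature_values


def predict_nb(model, row, alpha=1):
    """Score each class as prior times product of (smoothed) likelihoods; return the best."""
    class_counts, value_counts, feature_values = model
    n = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        score = n_c / n                      # prior P(class)
        for i, v in enumerate(row):
            m = len(feature_values[i])       # number of possible values of feature i
            score *= (value_counts[(i, c)][v] + alpha) / (n_c + alpha * m)
        scores[c] = score
    return max(scores, key=scores.get), scores


# Hypothetical training data: (outlook, temperature) -> play?
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
        ("rainy", "cool"), ("overcast", "hot")]
labels = ["no", "no", "yes", "yes", "yes"]
model = train_nb(rows, labels)

label, scores = predict_nb(model, ("overcast", "mild"))            # add-one smoothing
_, raw_scores = predict_nb(model, ("overcast", "mild"), alpha=0)   # no smoothing
print(label, scores, raw_scores)
```

In this toy data "overcast" never occurs with class "no", so the unsmoothed score for "no" collapses to zero, which is the zero-frequency problem that smoothing avoids. The scores are unnormalised; dividing each by their sum would give posterior probabilities.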
When does accuracy mislead, and what should I report instead?
Accuracy misleads on imbalanced classes. If 95% of cases are negative, a model that always predicts negative scores 95% accuracy yet catches no positives at all. Report precision, recall and F1 in that situation: precision exposes false alarms, recall exposes missed positives, and F1 (their harmonic mean) is high only when both are good. Choose the metric that matches the cost of each error type.
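The always-negative case is easy to reproduce. The sketch below uses made-up labels (1000 cases, 5% positive) purely to illustrate the point; no real dataset is implied.

```python
# Hypothetical imbalanced data: 50 positives (1) and 950 negatives (0).
y_true = [1] * 50 + [0] * 950
y_pred = [0] * len(y_true)  # a "model" that always predicts negative

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))

accuracy = (tp + tn) / len(y_true)
recall = tp / (tp + fn)
# Precision is undefined when nothing is flagged, so report None instead of dividing by zero.
precision = tp / (tp + fp) if (tp + fp) > 0 else None

print(f"accuracy={accuracy:.2f} recall={recall:.2f} precision={precision}")
```

Accuracy comes out at 0.95 while recall is 0, which is exactly the gap the paragraph above warns about.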
Can AI help me with Naïve Bayes and evaluation metrics in COMP5318?
Yes, for understanding rather than for handing in answers. A study tool like Sia can explain, step by step, why a single zero likelihood needs Laplace smoothing, walk you through a Gaussian Naïve Bayes calculation on practice numbers, or check your reasoning about precision versus recall. Use it to make the method automatic so you can reproduce it under closed-book exam conditions; do not use it to obtain answers for assessed quizzes or assignments, and always acknowledge any AI tools you use, as the University's academic-integrity policy requires.
Exam move
Treat this chapter as two drills. First, make Naïve Bayes mechanical: write the priors, then the per-feature likelihoods (count fractions for nominal features, the normal density with each class's own mean and sample standard deviation for a numeric one), multiply prior by the product of likelihoods for each class, and take the largest. Rehearse the zero-frequency case so that Laplace smoothing is a reflex. Second, make the metrics automatic: from any scenario, build TP, FP, FN and TN before touching a formula. Precision, recall, F1 and accuracy then each become a single division. Be ready to say when accuracy misleads on imbalanced data. Keep cross-validation and the validation-versus-test distinction clear, because a one-line concept question on it is easy marks. On a 2-hour (120-minute) paper, budget about one minute per mark, so a 5-mark confusion-matrix question is roughly 5 minutes. Keep the two hurdles in view: you need at least 40% in the final exam to avoid your mark being capped at 45, and at least 50% overall to pass. Confirm the exact exam date on the Canvas exam timetable.
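For the numeric-feature half of the first drill, the following sketch shows one way to rehearse the Gaussian step on practice numbers. The temperatures, priors and function names are invented for illustration; `statistics.stdev` is used because it computes the sample (n - 1) standard deviation.

```python
import math
from statistics import mean, stdev


def gaussian_pdf(x, mu, sigma):
    """Normal density f(x) = 1/(sigma*sqrt(2*pi)) * exp(-(x-mu)^2 / (2*sigma^2))."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def fit_gaussian(values_by_class):
    """Estimate a separate mean and sample standard deviation for each class."""
    return {c: (mean(v), stdev(v)) for c, v in values_by_class.items()}


def classify(x, params, priors):
    """Score each class as prior times density and return the largest."""
    scores = {c: priors[c] * gaussian_pdf(x, mu, sigma) for c, (mu, sigma) in params.items()}
    return max(scores, key=scores.get), scores


# Hypothetical practice data: temperature readings grouped by class.
temps = {"yes": [20.0, 22.0, 24.0], "no": [28.0, 30.0, 32.0]}
priors = {"yes": 0.5, "no": 0.5}  # three training examples in each class

params = fit_gaussian(temps)
label, scores = classify(23.0, params, priors)
print(params, label)
```

With more than one feature, the density for each numeric feature is simply one more factor in the product for each class, alongside the count fractions for any nominal features.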