PUBH5010 · Epidemiology Methods And Uses
Measurement Error and Misclassification
If you mis-measure exposure or outcome, the number you report is the wrong number — and the key exam skill is predicting which way it goes wrong. Measurement quality has two faces: validity (does the instrument measure the truth, captured for a binary classification by its sensitivity and specificity) and reliability (does it give the same answer on repeat). Error in a categorical variable is misclassification, and its effect splits sharply by whether it is the same in the groups compared. Non-differential misclassification — error unrelated to the other variable, e.g. exposure mis-measured equally in cases and controls — characteristically biases a binary exposure’s estimate toward the null, diluting a real association (its predictability is why examiners love it). Differential misclassification — error that differs between groups, the classic being recall bias where cases remember exposures more keenly — can bias in either direction and is far harder to call. The chapter also separates exposure from outcome misclassification and ties instrument accuracy back to the sensitivity/specificity language you meet again in screening. The takeaway sentence: name the type, then commit to a direction (and admit when differential error makes the direction unknowable).
What this chapter covers
- 01 Validity vs reliability of a measurement
- 02 Misclassification of a categorical variable
- 03 Non-differential error and the bias toward the null
- 04 Differential error and recall bias: direction unpredictable
- 05 Exposure vs outcome misclassification
- 06 Sensitivity and specificity of a measurement instrument
- 07 Stating which way the reported measure is wrong
Worked example: predicting the direction of non-differential error
- [+1] (a) Classify. The misclassification is the same in cases and controls (unrelated to outcome), so it is non-differential misclassification of exposure.
- [+2] (b) Direction. For a binary exposure, non-differential error biases the estimate toward the null: the observed OR will be below the true 2.0, closer to 1.
- [+1] (c) Mechanism. Mislabelling some truly-exposed people as unexposed (and vice versa) blurs the contrast between the groups, shrinking the apparent difference and so the measured association.
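The dilution in (b) and (c) can be checked numerically. The sketch below is illustrative only: the case and control counts, the sensitivity of 0.8 and the specificity of 0.9 are all assumed values, chosen so that the true OR is 2.0. They are not taken from the exam question. The same sensitivity and specificity are applied to cases and controls, which is what makes the error non-differential.

```python
def observed_counts(exposed, unexposed, sensitivity, specificity):
    """Expected (exposed, unexposed) counts after imperfect exposure classification."""
    obs_exposed = exposed * sensitivity + unexposed * (1 - specificity)
    obs_unexposed = exposed * (1 - sensitivity) + unexposed * specificity
    return obs_exposed, obs_unexposed


def odds_ratio(case_exp, case_unexp, ctrl_exp, ctrl_unexp):
    """Exposure odds ratio from a case-control 2x2 table."""
    return (case_exp * ctrl_unexp) / (case_unexp * ctrl_exp)


# Hypothetical true counts (exposed, unexposed), chosen so the true OR is 2.0.
TRUE_CASES = (400, 600)
TRUE_CONTROLS = (250, 750)

# Assumed instrument accuracy, identical in cases and controls (non-differential).
SENSITIVITY = 0.8
SPECIFICITY = 0.9

true_or = odds_ratio(*TRUE_CASES, *TRUE_CONTROLS)
obs_cases = observed_counts(*TRUE_CASES, SENSITIVITY, SPECIFICITY)
obs_controls = observed_counts(*TRUE_CONTROLS, SENSITIVITY, SPECIFICITY)
observed_or = odds_ratio(*obs_cases, *obs_controls)

print(f"true OR = {true_or:.2f}, observed OR = {observed_or:.2f}")
```

With these assumed numbers the observed OR comes out at about 1.62: still above 1, but pulled toward the null from the true 2.0.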
Key terms
- Validity: How well an instrument measures the true value. For a binary classification it is summarised by sensitivity (detecting true positives) and specificity (detecting true negatives). A valid measure is accurate on average; reliability is a separate property (consistency on repeat).
- Misclassification: Measurement error in a categorical variable, that is, putting people in the wrong exposure or outcome category. Its effect on the estimate depends critically on whether it is the same in the groups compared (non-differential) or different (differential).
- Non-differential misclassification: Error unrelated to the other variable, e.g. exposure mis-measured equally among cases and controls. For a binary exposure it characteristically biases the measure of association toward the null, diluting a true effect; this predictable direction is heavily examined.
- Differential misclassification: Error that differs between the groups being compared, most famously recall bias, where cases recall past exposures more thoroughly than controls. It can bias the estimate either away from or toward the null, so its direction cannot be assumed.
- Recall bias: A differential misclassification of exposure in case-control studies: people with the outcome search their memory harder for possible causes than those without it, systematically over-reporting exposure among cases and exaggerating the association.
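The sensitivity and specificity named under Validity can be computed from a validation study that compares the instrument against a gold standard. The sketch below is a minimal illustration; the function name and the validation counts are hypothetical.

```python
def sensitivity_specificity(true_pos, false_neg, true_neg, false_pos):
    """Return (sensitivity, specificity) from validation counts against a gold standard."""
    sensitivity = true_pos / (true_pos + false_neg)  # truly exposed, classified exposed
    specificity = true_neg / (true_neg + false_pos)  # truly unexposed, classified unexposed
    return sensitivity, specificity


# Hypothetical validation study: 100 truly exposed and 200 truly unexposed people.
sens, spec = sensitivity_specificity(true_pos=80, false_neg=20, true_neg=180, false_pos=20)

print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Non-differential error means these two numbers are the same in each group being compared; differential error means they differ between groups.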
Measurement Error and Misclassification FAQ
What's the difference between validity and reliability?
Validity is about accuracy — does the instrument measure the true value (for a yes/no measure, captured by sensitivity and specificity)? Reliability is about consistency — does it give the same answer when repeated? A scale can be reliable but invalid (consistently 2 kg too high) or valid on average but unreliable (right on average, noisy each time). Epidemiological bias mostly flows from invalidity, especially misclassification.
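The scale example can be made concrete by separating average error (a validity problem) from spread on repeat (a reliability problem). This is a small sketch with made-up readings; the true weight of 70 kg and both sets of readings are assumptions for illustration.

```python
from statistics import mean, stdev

TRUE_WEIGHT_KG = 70.0  # hypothetical true value

# Made-up repeat readings from two hypothetical scales.
reliable_but_invalid = [72.0, 72.1, 71.9, 72.0, 72.0]   # tight, but about 2 kg too high
valid_but_unreliable = [67.0, 73.0, 70.5, 69.5, 70.0]   # right on average, noisy


def summarise(readings, truth):
    """Return (bias, spread): mean error against the truth and SD across repeats."""
    bias = mean(readings) - truth
    spread = stdev(readings)
    return bias, spread


bias_a, spread_a = summarise(reliable_but_invalid, TRUE_WEIGHT_KG)
bias_b, spread_b = summarise(valid_but_unreliable, TRUE_WEIGHT_KG)

print(f"reliable but invalid: bias = {bias_a:+.2f} kg, SD = {spread_a:.2f} kg")
print(f"valid but unreliable: bias = {bias_b:+.2f} kg, SD = {spread_b:.2f} kg")
```

A large bias with a small SD signals an invalid but reliable instrument; a bias near zero with a large SD signals the reverse.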
Why does non-differential misclassification bias toward the null?
Because mislabelling some exposed people as unexposed (and vice versa) equally in both compared groups blurs the contrast between them. The exposed and unexposed groups become more alike than they truly are, so the measured difference — and therefore the RR or OR — shrinks toward the no-effect value. For a binary exposure this direction is reliable enough to state with confidence.
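The blurring can be written out for exposure prevalence. As a sketch, assume the same sensitivity $Se$ and specificity $Sp$ apply in both groups, and let $p_1$ and $p_0$ be the true exposure prevalences in cases and controls (these symbols are introduced here for illustration). The observed prevalences $p_1^{*}$ and $p_0^{*}$ are then:

```latex
\begin{align*}
p_i^{*} &= Se \, p_i + (1 - Sp)(1 - p_i), \qquad i \in \{0, 1\} \\
p_1^{*} - p_0^{*} &= Se\,(p_1 - p_0) - (1 - Sp)(p_1 - p_0) \\
                  &= (Se + Sp - 1)(p_1 - p_0)
\end{align*}
```

For any instrument that is better than chance but imperfect, $0 < Se + Sp - 1 < 1$, so the observed contrast in exposure prevalence is a shrunken copy of the true contrast with the same sign. That is the sense in which the two groups look more alike than they truly are.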
Why is differential error harder to deal with?
Because the error is tied to group membership, it can either inflate or deflate the association depending on the pattern, so you cannot predict the direction without details. Recall bias, for instance, usually exaggerates a case-control association by over-reporting exposure in cases, but other differential patterns push the other way. The exam expects you to flag that the direction is not automatic.
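A numerical sketch shows why the direction depends on the pattern. All numbers here are assumptions for illustration: the true counts give a true OR of 2.0, specificity is held at 0.9 in both groups, and only the sensitivity of exposure reporting differs between cases and controls.

```python
def observed_counts(exposed, unexposed, sensitivity, specificity):
    """Expected (exposed, unexposed) counts after imperfect exposure classification."""
    obs_exposed = exposed * sensitivity + unexposed * (1 - specificity)
    obs_unexposed = exposed * (1 - sensitivity) + unexposed * specificity
    return obs_exposed, obs_unexposed


def odds_ratio(case_exp, case_unexp, ctrl_exp, ctrl_unexp):
    """Exposure odds ratio from a case-control 2x2 table."""
    return (case_exp * ctrl_unexp) / (case_unexp * ctrl_exp)


# Hypothetical true counts (exposed, unexposed); true OR = 2.0.
TRUE_CASES = (400, 600)
TRUE_CONTROLS = (250, 750)
SPECIFICITY = 0.9  # assumed equal in both groups


def observed_or(case_sensitivity, control_sensitivity):
    """Observed OR when reporting sensitivity differs between cases and controls."""
    cases = observed_counts(*TRUE_CASES, case_sensitivity, SPECIFICITY)
    controls = observed_counts(*TRUE_CONTROLS, control_sensitivity, SPECIFICITY)
    return odds_ratio(*cases, *controls)


# Recall-bias pattern: cases report true exposure more completely than controls.
or_recall_bias = observed_or(case_sensitivity=0.95, control_sensitivity=0.70)

# Opposite pattern: controls report true exposure more completely than cases.
or_reverse = observed_or(case_sensitivity=0.70, control_sensitivity=0.95)

print(f"recall-bias pattern: observed OR = {or_recall_bias:.2f}")
print(f"reverse pattern:     observed OR = {or_reverse:.2f}")
```

Under these assumptions the recall-bias pattern gives an observed OR of about 2.36 (away from the null), while the reverse pattern gives about 1.13 (toward the null), so the same true association can be exaggerated or diluted depending on who reports better.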
Does it matter whether exposure or outcome is misclassified?
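Yes. Both distort the estimate, but they do not behave identically. Non-differential misclassification of a binary exposure biases toward the null. Outside that case the rule is less dependable: with three or more exposure categories, non-differential error can push the estimate for an intermediate category away from the null, and non-differential outcome misclassification with perfect specificity leaves the risk ratio unbiased while still shrinking the risk difference. The reliable exam result is the binary-exposure, non-differential, toward-the-null case; for anything else, reason carefully and state your assumptions.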
Exam move
Drive every measurement item to two answers: name the type (non-differential vs differential; exposure vs outcome) and commit to a direction. The bankable result is non-differential misclassification of a binary exposure → bias toward the null — learn the dilution mechanism so you can explain it, not just assert it. For differential error (recall bias the headline case), state that the direction depends on the pattern and is not automatic. Keep validity vs reliability straight, and connect instrument accuracy to the sensitivity/specificity language you reuse in the screening chapter.