MKTG90011 · Marketing Research
Principal Component Analysis
Principal Component Analysis (PCA) is a data-reduction technique: it turns many correlated metric items into a few uncorrelated components, keeping as much of the original variance as possible. It assumes those correlations are driven by an underlying latent factor, so a long battery of survey items can be summarised by a handful of dimensions you then name.
The workflow is fixed. First, check that the data suit PCA: enough cases (rule of thumb n > 10× the number of items), most correlations ≥ .30, the KMO measure > .50, and Bartlett's test of sphericity significant (Sig. < .05). Then decide how many components to keep, using the eigenvalue > 1 rule (Kaiser) or the scree-plot elbow. Finally, read the (rotated) component matrix to see which items load on each component, and name each component from the items that load highly on it.
The exam treats PCA as read-the-output items (is it suitable? how many components? which items load where?) and as a which-technique trigger: “reduce many correlated items to underlying dimensions → PCA”. It is a valid choice for project H6 and a fix for multicollinearity in H5.
What this chapter covers
- 01 11.1 What PCA does: many correlated items → a few components
- 02 The latent-factor assumption
- 03 Suitability checks: KMO > .50 and Bartlett's Sig. < .05
- 04 Extraction: components ordered by variance (PC1, PC2, …)
- 05 How many to keep: eigenvalue > 1 and the scree-plot elbow
- 06 Loadings and naming components from the rotated matrix
- 07 Out of scope: oblique rotation, principal-axis factoring, CFA
Worked example: decide suitability and number of components
- +1 (a) Sample size. n = 300 with 12 items easily clears the n > 10×k rule (120), so there are enough cases.
- +1 (a) Suitability statistics. KMO = .84 (> .50, in fact ‘meritorious’) and Bartlett's Sig. = .000 (< .05): both pass, so the data are suitable for PCA.
- +1 (b) Apply the eigenvalue rule. Keep components with eigenvalue > 1: the first three (4.1, 2.3, 1.4) qualify; the fourth (0.7) does not.
- +1 (b) Conclude. Retain three components; the scree-plot elbow should fall after the third, confirming the choice. Then name each from its high-loading items.
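The arithmetic behind these steps can be checked with a short Python sketch. It uses only the figures quoted in the worked example; the variance shares assume the items are standardised (PCA on the correlation matrix), so that total variance equals the number of items. The variable names are illustrative.

```python
# Figures taken from the worked example above.
n_cases = 300
n_items = 12
eigenvalues = [4.1, 2.3, 1.4, 0.7]  # only the first four are quoted in the example

# Rule of thumb: n should exceed 10 times the number of items.
min_cases = 10 * n_items
enough_cases = n_cases > min_cases

# Kaiser rule: retain components whose eigenvalue exceeds 1.
retained = [ev for ev in eigenvalues if ev > 1]

# Assumption: standardised items, so each item contributes variance 1
# and the total variance equals n_items.
shares = [ev / n_items for ev in retained]
cumulative = sum(shares)

print(f"minimum cases: {min_cases}, enough: {enough_cases}")
print(f"components retained: {len(retained)}")
print("variance shares:", [f"{s:.1%}" for s in shares])
print(f"cumulative share: {cumulative:.1%}")
```

Under that assumption the three retained components account for 7.8 of 12 units of variance, about 65%.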
Key terms
- Principal Component Analysis (PCA)
- A data-reduction technique that transforms many correlated metric items into a smaller set of uncorrelated components, retaining as much variance as possible, on the assumption of an underlying latent factor.
- KMO measure
- The Kaiser-Meyer-Olkin measure of sampling adequacy — it checks whether the variables share enough common variance for PCA. A value above .50 is the minimum; higher is better.
- Bartlett's test of sphericity
- A test of whether the correlation matrix differs from an identity matrix (i.e. whether variables are correlated enough to reduce). A significant result (Sig. < .05) is needed for PCA to be worthwhile.
- Eigenvalue
- The amount of total variance captured by a component. The Kaiser rule retains components with an eigenvalue greater than 1 — those that explain more variance than a single original item.
- Loading
- The correlation between an original item and a component. High loadings in the rotated component matrix tell you which items define a component, which is how you name it.
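To make the eigenvalue and loading definitions concrete, here is a minimal NumPy sketch. The 3-item correlation matrix is invented purely for illustration, and the loadings shown are unrotated (rotation redistributes them across components). It checks three standard identities: eigenvalues sum to the number of items, the squared loadings on a component sum to its eigenvalue, and each item's squared loadings sum to 1 when every component is kept.

```python
import numpy as np

# Hypothetical 3-item correlation matrix, invented purely for illustration.
R = np.array([
    [1.0, 0.6, 0.5],
    [0.6, 1.0, 0.4],
    [0.5, 0.4, 1.0],
])

# eigh handles symmetric matrices and returns eigenvalues in ascending
# order, so reorder to put the largest component (PC1) first.
eigenvalues, eigenvectors = np.linalg.eigh(R)
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Unrotated loadings: each eigenvector scaled by the square root of its eigenvalue.
loadings = eigenvectors * np.sqrt(eigenvalues)

# Total variance of standardised items equals the number of items.
total_variance = eigenvalues.sum()

# Column sums of squared loadings reproduce the eigenvalues.
ss_loadings = (loadings ** 2).sum(axis=0)

# Row sums of squared loadings (communalities) are 1 when all components are kept.
communalities = (loadings ** 2).sum(axis=1)

print("eigenvalues:", np.round(eigenvalues, 3))
print("sum of eigenvalues:", round(float(total_variance), 3))
print("squared loadings per component:", np.round(ss_loadings, 3))
print("communalities:", np.round(communalities, 3))
```

This is why an eigenvalue above 1 means the component carries more variance than any single standardised item.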
Principal Component Analysis FAQ
What is PCA used for in marketing research?
To reduce a long battery of correlated survey items into a few interpretable dimensions — e.g. collapsing 20 loyalty questions into three components like ‘trust’, ‘commitment’ and ‘advocacy’. It simplifies analysis, removes redundancy, and can fix multicollinearity before a regression by replacing correlated predictors with components.
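The multicollinearity point can be illustrated with a small NumPy sketch. The data are simulated (three hypothetical predictors sharing one common driver), so the numbers carry no substantive meaning. It only shows that component scores are uncorrelated with one another even when the original predictors are highly correlated.

```python
import numpy as np

# Simulated data: three hypothetical predictors that share a common driver.
rng = np.random.default_rng(1)
n = 200
common = rng.normal(size=n)
X = np.column_stack([common + 0.3 * rng.normal(size=n) for _ in range(3)])

# Standardise, then decompose the correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = np.corrcoef(Z, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(R)
order = np.argsort(eigenvalues)[::-1]

# Component scores: standardised data projected onto the eigenvectors.
scores = Z @ eigenvectors[:, order]

predictor_corr = np.corrcoef(X, rowvar=False)
score_corr = np.corrcoef(scores, rowvar=False)

print("predictor correlations:\n", np.round(predictor_corr, 2))
print("component-score correlations:\n", np.round(score_corr, 2))
```

In a regression you would typically enter only the retained component scores as predictors, at the cost of coefficients that refer to components instead of the original items.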
How do I know my data are suitable for PCA?
Check four things: enough cases (n greater than about 10 times the number of items), most inter-item correlations at least .30, the KMO measure above .50, and Bartlett's test significant (Sig. < .05). If KMO is too low or Bartlett is not significant, the items don't share enough common variance to reduce.
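SPSS reports KMO and Bartlett's test directly, but a sketch of the textbook formulas may help show what the two numbers measure. The code below assumes NumPy and SciPy are available and runs on simulated data (six hypothetical items driven by two latent factors); the function names `bartlett_sphericity` and `kmo_overall` are illustrative, not from any course material.

```python
import numpy as np
from scipy.stats import chi2


def bartlett_sphericity(R, n):
    """Bartlett's test that the correlation matrix R is an identity matrix."""
    p = R.shape[0]
    _, log_det = np.linalg.slogdet(R)
    statistic = -(n - 1 - (2 * p + 5) / 6) * log_det
    df = p * (p - 1) / 2
    return statistic, df, chi2.sf(statistic, df)


def kmo_overall(R):
    """Overall KMO: squared correlations relative to squared partial correlations."""
    inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)          # partial correlations
    off = ~np.eye(R.shape[0], dtype=bool)    # off-diagonal mask
    r2 = (R[off] ** 2).sum()
    p2 = (partial[off] ** 2).sum()
    return r2 / (r2 + p2)


# Simulated data: six hypothetical items, three per latent factor.
rng = np.random.default_rng(0)
n = 300
factors = rng.normal(size=(n, 2))
pattern = np.zeros((6, 2))
pattern[:3, 0] = 0.8
pattern[3:, 1] = 0.8
X = factors @ pattern.T + 0.6 * rng.normal(size=(n, 6))

R = np.corrcoef(X, rowvar=False)
kmo = kmo_overall(R)
stat, df, p_value = bartlett_sphericity(R, n)

print(f"KMO = {kmo:.2f} (suitable if > .50)")
print(f"Bartlett chi-square = {stat:.1f}, df = {df:.0f}, Sig. = {p_value:.3f}")
```

KMO approaches 1 when partial correlations are small compared with the raw correlations, which is what you would expect if the items share common variance.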
How many components should I keep?
Use the eigenvalue > 1 rule (Kaiser): keep every component that explains more variance than a single original item. Cross-check against the scree plot — retain components above the ‘elbow’ where the curve flattens. The two usually agree; if not, favour interpretability.
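As a rough illustration of both criteria, the following NumPy sketch extracts eigenvalues from the correlation matrix of simulated data (six hypothetical items built from two latent factors) and applies the Kaiser rule. The "largest drop" line is only a crude stand-in for eyeballing the scree elbow, which is an assumption of this sketch and not a formal rule.

```python
import numpy as np

# Simulated data: six hypothetical items, three per latent factor.
rng = np.random.default_rng(42)
n_cases, n_items = 300, 6
factors = rng.normal(size=(n_cases, 2))
pattern = np.zeros((n_items, 2))
pattern[:3, 0] = 0.8
pattern[3:, 1] = 0.8
X = factors @ pattern.T + 0.6 * rng.normal(size=(n_cases, n_items))

# Eigenvalues of the correlation matrix, largest first.
R = np.corrcoef(X, rowvar=False)
eigenvalues = np.linalg.eigvalsh(R)[::-1]

# Kaiser rule: count eigenvalues above 1.
n_kaiser = int((eigenvalues > 1).sum())

# Crude elbow proxy (assumption): keep the components before the largest drop.
drops = -np.diff(eigenvalues)
n_elbow = int(drops.argmax()) + 1

# Share of total variance per component (total = number of standardised items).
explained = eigenvalues / n_items

for i, (ev, share) in enumerate(zip(eigenvalues, explained), start=1):
    print(f"PC{i}: eigenvalue = {ev:.2f}, variance = {share:.1%}")
print(f"Kaiser rule keeps {n_kaiser}; largest drop suggests {n_elbow}")
```

Plotting `eigenvalues` against the component number gives the scree plot itself.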
How do I name the components?
From the loadings in the rotated component matrix. Look at which items load most strongly (highest absolute loading) on each component, then give the component a label that captures what those items have in common. A component is only useful if its high-loading items tell a coherent story.
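The grouping step can be sketched as follows. The item names and rotated loadings are hypothetical values made up for illustration, and the .50 cut-off for a "high" loading is an assumption of this sketch, not a figure from the course.

```python
import numpy as np

# Hypothetical items and rotated loadings, invented for illustration only.
items = ["trust_1", "trust_2", "trust_3",
         "advocacy_1", "advocacy_2", "advocacy_3"]
rotated = np.array([
    [0.82, 0.10],
    [0.78, 0.15],
    [0.74, -0.05],
    [0.12, 0.80],
    [0.08, 0.76],
    [-0.20, 0.69],
])
threshold = 0.50  # assumed cut-off for a "high" loading

# For each item, find the component with the largest absolute loading.
strongest = np.abs(rotated).argmax(axis=1)

# Group items under that component if the loading clears the threshold.
groups = {}
for item, comp, row in zip(items, strongest, rotated):
    if abs(row[comp]) >= threshold:
        groups.setdefault(int(comp) + 1, []).append(item)

for comp, members in sorted(groups.items()):
    print(f"Component {comp}: {', '.join(members)}")
```

The label for each component still comes from reading what its grouped items have in common; the code only does the sorting.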
Exam move
Treat PCA as a checklist the exam asks you to run on SPSS output: suitability (KMO > .50, Bartlett Sig. < .05), how many (eigenvalue > 1 and the scree elbow), which loads where (the rotated component matrix), then name each component. Memorise the which-technique trigger — “reduce many correlated items to underlying dimensions” → PCA — and the out-of-scope list (oblique rotation, principal-axis factoring and CFA are flagged ‘not in the exam’). Remember PCA can also rescue a multicollinear regression (H5) by replacing correlated predictors, and it is a valid choice for project H6.