ECON20003 · Quantitative Methods 2
Cross-Validation, Ridge & LASSO Regression
Cross-Validation, Ridge & LASSO Regression is the conceptual machine-learning corner of the subject, focused on the bias-variance trade-off and overfitting. A model that is too flexible fits the training data beautifully but predicts new data badly; k-fold cross-validation estimates out-of-sample error by training on most folds and testing on the held-out fold, then averaging, which is how you tune the penalty strength λ. Ridge regression (an L2 penalty) shrinks coefficients toward zero but never exactly to zero, while LASSO (an L1 penalty) shrinks and can set some coefficients exactly to zero, performing variable selection.
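The trade-off can be made concrete with the standard decomposition of expected squared prediction error at a new point x₀. The notation is an assumption for illustration and is not taken from the chapter: y = f(x) + ε, where ε has mean zero and variance σ² and is independent of the fitted model f̂.

```latex
\mathbb{E}\big[(y_0 - \hat{f}(x_0))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x_0)] - f(x_0)\big)^2}_{\text{bias}^2}
  + \underbrace{\operatorname{Var}\big(\hat{f}(x_0)\big)}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Raising λ typically increases the bias term and reduces the variance term, while σ² does not depend on the model at all.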
What this chapter covers
- 01 Bias-variance trade-off and overfitting
- 02 k-fold cross-validation to estimate out-of-sample error and tune λ
- 03 Ridge (L2): minimise SSE + λΣβⱼ²; shrinks toward 0, never exactly 0
- 04 LASSO (L1): minimise SSE + λΣ|βⱼ|; shrinks AND selects (some β̂ⱼ = 0)
- 05 Choosing λ by cross-validation
Tuning λ with 5-fold cross-validation
- 1 mark: Recall the procedure: for each fold, train on the other four folds and record the prediction error on the held-out fold; the cross-validated error is the average across folds.
- 2 marks: Sum the five fold errors: 12 + 10 + 11 + 13 + 9 = 55.
- 1 mark: Average over k = 5 folds: CV error = 55/5 = 11.
- 1 mark: Repeat for every candidate λ to get one CV error per λ.
- 1 mark: Choose the λ with the lowest cross-validated error (some practitioners use the slightly larger one-standard-error λ for a simpler model). Larger λ shrinks coefficients more and, for LASSO, sets more of them to zero.
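The arithmetic above can be sketched in a few lines of Python. The fold errors 12, 10, 11, 13, 9 come from the worked example; the λ values and the fold errors attached to the other two candidates are hypothetical, made up only to show the selection step.

```python
from statistics import mean

# Fold errors for one candidate lambda, taken from the worked example.
fold_errors = [12, 10, 11, 13, 9]
cv_error = mean(fold_errors)  # 55 / 5

# Hypothetical lambda labels and fold errors for the other candidates
# (illustrative only; not from the source).
fold_errors_by_lambda = {
    0.1: [15, 14, 16, 13, 17],
    1.0: fold_errors,
    10.0: [13, 12, 14, 15, 11],
}

# One averaged CV error per candidate lambda, then pick the smallest.
cv_by_lambda = {lam: mean(errs) for lam, errs in fold_errors_by_lambda.items()}
best_lambda = min(cv_by_lambda, key=cv_by_lambda.get)
print(cv_error, best_lambda)
```

Under these made-up numbers the middle candidate wins because its average held-out error is the lowest of the three.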
Key terms
- Bias-variance trade-off: Simple models have high bias (they systematically miss) but low variance; flexible models have low bias but high variance (they overfit noise). Good prediction balances the two to minimise total out-of-sample error.
- k-fold cross-validation: Split the data into k folds, train on k − 1 and test on the held-out fold, rotate through all folds, and average the test errors. It estimates out-of-sample error and is used to tune the penalty λ.
- Ridge regression (L2): Minimises SSE + λΣβⱼ², adding a penalty on the squared coefficients. It shrinks all coefficients toward zero to reduce variance but never sets any exactly to zero, so it does not select variables.
- LASSO regression (L1): Minimises SSE + λΣ|βⱼ|, penalising the absolute coefficients. It both shrinks coefficients and can set some exactly to zero, performing automatic variable selection alongside regularisation.
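The k-fold procedure defined above can be sketched as follows. To stay self-contained, this sketch assumes a deliberately simple setting that is not from the chapter: one predictor, no intercept, simulated data, and the closed-form ridge slope Σxy / (Σx² + λ). The function names and the candidate λ values are hypothetical.

```python
import random

def ridge_slope(xs, ys, lam):
    """Minimise SSE + lam * b**2 for the no-intercept model y = b * x."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

def k_fold_cv_error(xs, ys, lam, k=5):
    """Average held-out mean squared error across k folds."""
    n = len(xs)
    folds = [list(range(i, n, k)) for i in range(k)]  # interleaved folds
    fold_errors = []
    for test_idx in folds:
        held_out = set(test_idx)
        # Train on the other k - 1 folds.
        train_x = [xs[i] for i in range(n) if i not in held_out]
        train_y = [ys[i] for i in range(n) if i not in held_out]
        b = ridge_slope(train_x, train_y, lam)
        # Test on the held-out fold.
        mse = sum((ys[i] - b * xs[i]) ** 2 for i in test_idx) / len(test_idx)
        fold_errors.append(mse)
    return sum(fold_errors) / k

# Simulated data (an assumption for illustration): true slope 2 plus noise.
rng = random.Random(0)
xs = [rng.gauss(0, 1) for _ in range(50)]
ys = [2 * x + rng.gauss(0, 1) for x in xs]

candidates = [0.0, 1.0, 10.0, 100.0]
cv_errors = {lam: k_fold_cv_error(xs, ys, lam) for lam in candidates}
best_lam = min(cv_errors, key=cv_errors.get)
print(cv_errors, best_lam)
```

Every observation is used for testing exactly once, and the fitted slope is never evaluated on data it was trained on.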
Cross-Validation, Ridge & LASSO Regression FAQ
What's the key difference between ridge and LASSO?
Both add a penalty on the size of the coefficients to control overfitting, but ridge (L2) shrinks every coefficient toward zero without eliminating any, while LASSO (L1) can drive some coefficients exactly to zero, so LASSO doubles as a variable-selection method. The penalty strength λ is tuned by cross-validation in both.
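A one-predictor sketch shows why only LASSO reaches exactly zero. It assumes a no-intercept model y = βx, for which both penalised problems have closed-form solutions: ridge divides by a larger denominator, while LASSO subtracts λ/2 from |Σxy| and stops at zero (soft-thresholding). The summary statistics and function names are hypothetical.

```python
def ridge_slope(sxy, sxx, lam):
    # Minimiser of SSE + lam * b**2: shrinks, but stays nonzero when sxy != 0.
    return sxy / (sxx + lam)

def lasso_slope(sxy, sxx, lam):
    # Minimiser of SSE + lam * |b|: soft-thresholding at lam / 2.
    magnitude = max(abs(sxy) - lam / 2, 0.0)
    sign = 1.0 if sxy > 0 else -1.0
    return sign * magnitude / sxx

# Hypothetical summary statistics: sum(x*y) = 4 and sum(x*x) = 10,
# so the unpenalised (OLS) slope is 0.4.
sxy, sxx = 4.0, 10.0
for lam in [0.0, 2.0, 8.0, 20.0]:
    print(lam, ridge_slope(sxy, sxx, lam), lasso_slope(sxy, sxx, lam))
```

With these numbers the LASSO slope is exactly zero once λ reaches 8, whereas the ridge slope keeps shrinking but remains positive for every finite λ.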
Why do we need cross-validation at all?
Because training error keeps falling (or at least never rises) as a model becomes more flexible, it cannot tell you whether the model will predict new data well. Cross-validation simulates out-of-sample testing by holding folds out in turn, giving an honest error estimate with which to pick the λ (or model) that generalises best.
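A small sketch illustrates why training error cannot be used to tune λ. It assumes a one-predictor, no-intercept ridge fit on a tiny made-up dataset (the data and function names are hypothetical): training SSE is smallest at λ = 0 and only grows as λ increases, so training error would always favour no penalty.

```python
# Made-up data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1]

def ridge_slope(xs, ys, lam):
    # Closed-form minimiser of SSE + lam * b**2 for y = b * x.
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

def training_sse(xs, ys, b):
    # Sum of squared errors on the same data used to fit b.
    return sum((y - b * x) ** 2 for x, y in zip(xs, ys))

lams = [0.0, 1.0, 10.0, 100.0]
sse = [training_sse(xs, ys, ridge_slope(xs, ys, lam)) for lam in lams]
for lam, s in zip(lams, sse):
    print(lam, round(s, 3))
```

Held-out error need not behave this way, which is why the choice of λ is made on cross-validated error instead.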
Exam move
This chapter is conceptual with no R-printout reading, so focus on crisp definitions: state the bias-variance trade-off, explain the k-fold procedure, and contrast ridge versus LASSO in one sentence each. Be ready to compute a simple averaged CV error and to say how λ is chosen.