MAST90139 · Statistical Modelling For Data Science
Statistical Modelling for Data Science teaches one engine — the generalised linear model (GLM) — and then watches it bend to every response type: binary, grouped-binomial, count, contingency-table and multicategory (nominal and ordinal) data, all fitted in R. The exam is a read-the-R-output gauntlet: given a glm() / multinom() / polr() printout you must name the model and link, do the deviance / ΔD-vs-χ² arithmetic by hand, and translate a coefficient into an odds / rate / cumulative-odds statement. This guide teaches each family to that standard, so you can read any printout and say exactly what it means.
What MAST90139 covers
Eight model families, one exam-ready map: they are all the same generalised linear model.
How MAST90139 is assessed
| Component | Weight | Format |
|---|---|---|
| Assignment 1: logistic regression | Not stated; confirm | Binary / grouped logistic, odds-ratio interpretation · submitted to Gradescope (due ~early April) |
| Assignment 2: binomial dose-response | Not stated; confirm | Beetle-mortality dose-response, link comparison · Gradescope (due ~early May) |
| Assignment 3: ordinal / multinomial | Not stated; confirm | 3-category ordinal response with polr · Gradescope (due ~end May) |
| Final examination | Not stated; confirm | Read-the-R-output questions across the whole GLM arc. Weight, length and open- or closed-book status are not in the supplied source; confirm the exact split and format in your subject guide |
Reading a glm() printout — logistic coefficient to odds ratio, mark by mark
- +2 (a) Slope → odds ratio: in logistic regression the coefficient is on the log-odds scale, so e^β is the odds ratio: e^0.80 ≈ 2.23. Each one-unit rise in x multiplies the odds of the event by about 2.2.
- +1 (b) Linear predictor at x = 2.5: η = −2.0 + 0.80 × 2.5 = 0.
- +1 (b) Inverse logit: π = 1 / (1 + e^(−η)) = 1 / (1 + e^0) = 0.5, so the curve crosses 0.5 exactly where η = 0.
- +1 (c) Likelihood-ratio test: ΔD = D_null − D_resid = 180 − 150 = 30, on 99 − 98 = 1 df.
- +1 (c) Compare to χ²: 30 ≫ χ²(1 df, 0.05) = 3.84, so reject H₀ and conclude that x is a highly significant predictor.
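The mark-by-mark arithmetic above can be checked numerically. The following is a minimal sketch in Python (standard library only), not R output: the intercept, slope and deviances are the ones quoted in the worked example, while the variable names are illustrative. The p-value uses a closed form that holds only for a χ² on 1 df.

```python
import math

# Values read off the worked example above.
intercept, slope = -2.0, 0.80
null_dev, null_df = 180.0, 99
resid_dev, resid_df = 150.0, 98

# (a) Slope on the log-odds scale -> odds ratio.
odds_ratio = math.exp(slope)

# (b) Linear predictor and inverse logit at x = 2.5.
x = 2.5
eta = intercept + slope * x
prob = 1.0 / (1.0 + math.exp(-eta))

# (c) Likelihood-ratio test: drop in deviance on the df difference.
delta_d = null_dev - resid_dev
delta_df = null_df - resid_df
# For 1 df only, the chi-square upper tail equals erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(delta_d / 2.0))

print(f"odds ratio = {odds_ratio:.2f}")
print(f"eta = {eta:.2f}, pi = {prob:.2f}")
print(f"delta D = {delta_d:.0f} on {delta_df} df, p = {p_value:.1e}")
```

In the exam the comparison is made against the tabulated critical value 3.84 rather than a computed p-value; the p-value here is only a cross-check on the same conclusion.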
Key terms
- Generalised linear model (GLM)
- A model with three parts: a random component (a response from an exponential-family distribution), a linear predictor η = Xβ, and a link function g that connects them, g(μ) = η. Linear, logistic, Poisson and log-linear regression are all special cases.
- Link function
- The function g that maps the mean μ onto the linear-predictor scale, g(μ) = η = Xβ. The logit link gives logistic regression (binary data) and the log link gives Poisson regression (counts). Each exponential-family distribution has its own “canonical” link: identity for the normal, logit for the binomial, log for the Poisson.
- Deviance
- The GLM analogue of the residual sum of squares: twice the log-likelihood gap between the saturated model and your fitted model. Smaller deviance means better fit; the drop in deviance between nested models is the likelihood-ratio test statistic.
- Odds ratio
- e raised to a logistic coefficient, e^β: the factor by which the odds of the event multiply for a one-unit rise in the predictor. Greater than 1 raises the odds, less than 1 lowers them; it is the headline output of any logistic fit.
- Overdispersion
- When count or grouped-binomial data vary more than the Poisson or binomial model allows (residual deviance far exceeds its degrees of freedom). It is fixed by estimating a dispersion parameter φ and refitting a quasi-likelihood model, then testing with an F test rather than χ².
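To make the deviance and overdispersion definitions concrete, here is a small Python sketch (standard library only) for an intercept-only Poisson model. The counts are invented for illustration and do not come from the subject material. The dispersion estimate shown, Pearson X² divided by the residual df, is one common choice and is assumed here rather than taken from the source.

```python
import math

# Hypothetical counts, invented for illustration only.
y = [0, 1, 1, 2, 3, 5, 9, 11]
n = len(y)
p = 1  # parameters in the fitted model (intercept only)

# An intercept-only Poisson fit gives every case the same fitted mean.
mu = sum(y) / n

def poisson_unit_deviance(obs, fitted):
    # y * log(y / mu) is taken as 0 when y == 0 (its limiting value).
    term = obs * math.log(obs / fitted) if obs > 0 else 0.0
    return 2.0 * (term - (obs - fitted))

# Deviance: twice the log-likelihood gap to the saturated model.
deviance = sum(poisson_unit_deviance(yi, mu) for yi in y)

# Pearson chi-square and the dispersion estimate built from it.
pearson_x2 = sum((yi - mu) ** 2 / mu for yi in y)
resid_df = n - p
phi_hat = pearson_x2 / resid_df

print(f"deviance = {deviance:.2f} on {resid_df} df")
print(f"Pearson X2 = {pearson_x2:.2f}, phi_hat = {phi_hat:.2f}")
```

With these made-up counts the deviance is several times its degrees of freedom and the estimated φ is well above 1, which is the pattern the overdispersion entry describes.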
MAST90139 FAQ
Is MAST90139 hard?
It is procedural once the GLM framework clicks, but it is dense. The trick is realising that logistic, Poisson, log-linear and ordinal regression are the same three-part model with a different distribution and link — learn that engine cold and the rest is pattern-matching. The difficulty is reading R output fast and interpreting coefficients correctly under exam time.
How is MAST90139 assessed?
The supplied source confirms three written assignments submitted to Gradescope (logistic, binomial dose-response, and an ordinal/multinomial study) plus a final examination. The exact weights, exam length and whether the exam is open- or closed-book are not stated in the supplied material — confirm them in the official subject guide / Handbook, as they shift between cohorts.
What is on the MAST90139 final exam?
A read-the-R-output gauntlet across the whole GLM arc: given a glm(), multinom() or polr() printout, name the model and link, do the deviance / ΔD-vs-χ² arithmetic by hand, and translate a coefficient into an odds ratio (logit), a rate ratio (log) or a cumulative odds ratio (proportional odds).
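The "translate a coefficient" step is the same exponentiation under each of these links; only the wording of the sentence changes. A minimal Python sketch follows, where the function name `interpret`, the link labels and the second coefficient (−0.35) are hypothetical choices made for illustration.

```python
import math

def interpret(beta, link):
    """Turn a coefficient into the multiplicative effect of a one-unit rise in x."""
    labels = {
        "logit": "odds of the event",
        "log": "expected count (rate)",
        "cumulative logit": "cumulative odds",
    }
    if link not in labels:
        raise ValueError(f"unsupported link: {link}")
    factor = math.exp(beta)
    return f"a one-unit rise in x multiplies the {labels[link]} by {factor:.2f}"

print(interpret(0.80, "logit"))
print(interpret(-0.35, "log"))
print(interpret(0.80, "cumulative logit"))
```

For the proportional-odds case the direction of the effect depends on the parameterisation. MASS's polr fits logit P(Y ≤ j) = ζ_j − η, so under that convention a positive coefficient shifts the response toward higher categories; check the sign convention before writing the sentence.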
What maths and software do I need for MAST90139?
You need matrix algebra, likelihood and the normal/exponential-family distributions, plus comfort reading regression output. All fitting is done in R: glm comes with base R, polr with the MASS package and multinom with the nnet package, used alongside the faraway package. You are not asked to invert matrices by hand, but you are asked to read and interpret what R prints.
Is using AskSia for MAST90139 cheating?
No. AskSia is a study reference written in our own words — we host none of your lecturer's files, and Sia teaches you the method to read a printout and earn the marks; it does not complete or sit your assessments.
How to study for the exam
Build everything on Chapter 2, the GLM framework (random component, linear predictor and link), because every later model is that template with a new distribution and link. Then drill the three recurring exam moves until they are automatic: Name (response type → distribution → canonical link), Test (ΔD = D₀ − D₁ against χ² with the df difference for nested models; D against χ²(n−q) for grouped goodness-of-fit), and Interpret (e^β as an odds ratio, rate ratio, cross-product ratio or cumulative OR). Practise on real R printouts so you can do the deviance arithmetic and the coefficient sentence at speed; that triple is the whole paper.
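The Name and Test moves can be rehearsed as a lookup plus one subtraction. The sketch below is a Python study aid under stated assumptions: the dictionary and function names are hypothetical, the critical values are the standard upper 5% χ² table entries for 1 to 5 df, and the lookup covers only response types whose canonical link is unambiguous.

```python
# Upper 5% critical values of chi-square for small df (standard table values).
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}

# Response type -> (distribution, canonical link).
CANONICAL = {
    "continuous": ("normal", "identity"),
    "binary": ("binomial", "logit"),
    "grouped binomial": ("binomial", "logit"),
    "count": ("Poisson", "log"),
}

def name_model(response_type):
    """Name move: look up the distribution and its canonical link."""
    return CANONICAL[response_type]

def nested_test(d0, df0, d1, df1):
    """Test move: smaller model (d0, df0) against a larger nested one (d1, df1)."""
    delta_d = d0 - d1
    delta_df = df0 - df1
    reject = delta_d > CHI2_CRIT_05[delta_df]
    return delta_d, delta_df, reject

print(name_model("count"))
print(nested_test(180, 99, 150, 98))
```

The second call reuses the null and residual deviances from the worked printout earlier in this guide, so it should agree with the hand calculation there.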