DATA1001 · Foundations Of Data Science
Foundations of Data Science teaches statistical thinking end to end: how data are produced (study design and causation), how to describe them (exploratory data analysis), how to model them (the Normal curve and regression), and how to reason from a sample to a conclusion (probability, sampling distributions and hypothesis testing). The final exam is worth 60% of your mark in a single 2-hour sitting, and it is the universal backstop for almost every other component. It is conceptual and interpretive, not a coding exam: you read a study, choose the right method, run the logic and say what it means in plain English. Nearly every inference question runs the same engine, the standardised distance (OV−EV)/SE (observed value minus expected value, divided by the standard error), scaffolded by the HATPC steps: Hypotheses, Assumptions, Test statistic, P-value, Conclusion. This guide teaches each topic to that standard.
What DATA1001 covers
The course is organised into seven exam topics that follow the course pipeline Exploring → Modelling → Sampling → Deciding.
How DATA1001 is assessed
| Component | Weight | Format |
|---|---|---|
| Final exam | 60% | One 2-hour written paper · conceptual & interpretive (not coding) · the universal backstop |
| Project 2 | 20% | Individual — EDA + client report, parts across the semester |
| Project 1 | 10% | Group reproducible report |
| Evaluate Quizzes | 5% | Weekly online — best 8 of 10 plus an Early task |
| Workshop participation | 5% | All weeks — attend and take part |
A proportion test by HATPC: the signature inference, mark by mark (a coin is spun 100 times and lands heads 60 times; is it fair?)
- +1H — Hypotheses: H₀: p = 0.5 (fair); H₁: p ≠ 0.5 (two-sided).
- +1A — Assumptions: independent spins, np₀ = 50 and n(1−p₀) = 50 both ≥ 10, so the Normal approximation holds.
- +2T — Test statistic: OV = p̂ = 60/100 = 0.6; EV = 0.5; SE = √(0.5×0.5/100) = 0.05; z = (0.6 − 0.5)/0.05 = 2.0.
- +1P — P-value: a two-sided z = 2.0 gives p ≈ 2×0.0228 = 0.046, and z = 2.0 falls just inside the rejection region |z| > 1.96.
- +1C — Conclusion in context: 0.046 < 0.05, so reject H₀. There is evidence the coin is not fair, with the sample leaning toward heads (though the evidence is borderline).
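The T and P steps above can be checked numerically. The sketch below is only an illustration in Python (the course itself uses R, and the exam does not ask for code); the function names `normal_cdf` and `proportion_z_test` are made up for this example, and the inputs are the 60 heads in 100 spins from the worked example.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard Normal cumulative probability P(Z <= z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def proportion_z_test(successes: int, n: int, p0: float):
    """Two-sided one-sample proportion z-test (the T and P steps of HATPC)."""
    ov = successes / n                    # observed value: the sample proportion
    ev = p0                               # expected value if H0 is true
    se = math.sqrt(p0 * (1.0 - p0) / n)   # standard error computed under H0
    z = (ov - ev) / se                    # standardised distance (OV - EV) / SE
    p_value = 2.0 * (1.0 - normal_cdf(abs(z)))  # area in both tails
    return z, p_value


# Numbers from the worked example: 60 heads in 100 spins, H0: p = 0.5.
n, heads, p0 = 100, 60, 0.5

# A step: both expected counts under H0 must be at least 10.
assert n * p0 >= 10 and n * (1 - p0) >= 10

z, p_value = proportion_z_test(heads, n, p0)
print(f"z = {z:.2f}, two-sided p-value = {p_value:.4f}")
```

The code only reproduces the arithmetic. The H, A and C steps still have to be written out in words, and the conclusion has to be stated in context.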
Key terms
- Confounder
- A lurking third variable linked to both the exposure and the outcome, so the raw association between them is untrustworthy. It is why an observational study cannot license the word ‘causes’ — only randomisation balances confounders, known and unknown.
- Resistance
- Robustness of a summary to outliers. The median and IQR are resistant (a few wild points barely move them); the mean and SD are not. The rule: skewed or outlier-heavy data → report the median and IQR.
- Standard units (z-score)
- How many SDs a value sits from its mean: z = (x − x̄)/s. It puts any value on a common scale and lets you read areas off the Normal curve. It is also the same move as the inference engine: a distance from the expected value is divided by a measure of spread, so the result is unitless.
- Standard error (SE)
- The SD of a statistic across repeated samples — how much a sample mean or proportion would wobble from sample to sample. It shrinks like 1/√n, and it is the denominator of the (OV−EV)/SE engine.
- P-value
- The probability, if the null hypothesis were true, of getting a test statistic at least as extreme as the one observed. A small p-value means the data are surprising under the null; it is not the probability that the null is true.
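Two of the key terms above, resistance and standard error, are easy to see with numbers. The following is a minimal Python sketch, not course material: the data values are hypothetical, `iqr` and `se_of_mean` are helper names invented here, and the formula SE = SD/√n for a sample mean is the standard one assumed for the 1/√n behaviour.

```python
import math
import statistics


def iqr(values):
    """Interquartile range, using the statistics module's default quartile method."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


def se_of_mean(sd: float, n: int) -> float:
    """Standard error of a sample mean: SD divided by the square root of n."""
    return sd / math.sqrt(n)


# Hypothetical data: nine ordinary values, then the same values plus one wild point.
clean = [10, 11, 12, 12, 13, 13, 14, 15, 16]
with_outlier = clean + [100]

# Resistance: compare how far each summary moves when the outlier is added.
for label, data in (("clean", clean), ("with outlier", with_outlier)):
    print(
        f"{label:>12}: mean={statistics.mean(data):.1f} "
        f"sd={statistics.stdev(data):.1f} "
        f"median={statistics.median(data):.1f} "
        f"iqr={iqr(data):.1f}"
    )

# Standard error: quadrupling the sample size halves the SE (hypothetical SD of 10).
for n in (25, 100, 400):
    print(f"n={n:>3}: SE of the mean = {se_of_mean(10.0, n):.2f}")
```

When you run it, compare the two rows: the mean and SD shift substantially once the wild point is added, while the median stays where it was.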
DATA1001 FAQ
Is DATA1001 hard?
Conceptually approachable but interpretation-dense: most marks reward reading a study correctly, picking the right method, and writing the one-sentence conclusion in context — not heavy maths or memorised code. The pressure is concentrated because the final exam is 60% in one sitting and backstops almost everything else.
How is DATA1001 assessed?
The final exam is 60% in a single 2-hour written sitting and is the universal backstop. The rest is Project 2 (about 20%, individual), Project 1 (about 10%, a group reproducible report), weekly Evaluate Quizzes (about 5%, best 8 of 10 plus an Early task) and workshop participation (about 5%). Confirm this year's exact weights on your own Canvas.
Is the DATA1001 exam a coding exam?
No. It is conceptual and interpretive: you will not write R from a blank screen. You are given studies, plots, summaries and small datasets and asked to choose the right method, run the logic and interpret in context. The same skeleton — (OV−EV)/SE read against a Normal or t curve — powers nearly every inference question.
Do I need to be good at R or maths for DATA1001?
You learn R in the Coding Milestones and Projects, but the exam tests statistical reasoning, not coding fluency. The maths is light: standard units, areas under a curve, a slope, and the (OV−EV)/SE ratio. The skill that earns marks is choosing the right tool and reading the answer correctly.
Is using AskSia for DATA1001 cheating?
No. AskSia is a study reference written in our own words — we host none of your lecturer's files, and Sia teaches you the method to earn the marks; it does not complete or sit your assessments.
How to study for the exam
Because the exam is 60% and the universal backstop — and nothing backstops the exam — over-invest in exam-style reasoning, and treat the projects as exam practice with a longer deadline. Drill the two recurring chains until they are automatic: read the study design → say what conclusion is legal (observational = association only; randomised = causation licensed), and state HATPC → compute (OV−EV)/SE → read the p-value or CI without the classic misreads. Every test — proportion z-test, t-test, slope test, chi-square — is the same standardised distance with a different EV, SE and reference curve, so master the one engine and fresh exam numbers cannot surprise you. The exam pays for the in-context sentence, not the arithmetic.
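To see the "one engine" idea concretely, here is a hedged Python sketch in which a single function serves two different tests. The function name `standardised_distance` and the sample of eight measurements are hypothetical, chosen only for illustration; the proportion numbers are the 60 heads in 100 spins used earlier in this guide. The chi-square test is left out of the sketch.

```python
import math
import statistics


def standardised_distance(ov: float, ev: float, se: float) -> float:
    """The shared engine: how many standard errors OV sits from EV."""
    return (ov - ev) / se


# Proportion z-test: 60 heads in 100 spins, H0: p = 0.5.
# EV and SE both come from the null value p0; reference curve is the Normal.
n_spins, p0 = 100, 0.5
z_stat = standardised_distance(
    ov=60 / n_spins,
    ev=p0,
    se=math.sqrt(p0 * (1 - p0) / n_spins),
)

# One-sample t-test: hypothetical measurements, H0: mu = 50.
# EV is the null mean, SE is the sample SD over sqrt(n);
# reference curve is the t curve with n - 1 degrees of freedom.
sample = [52, 48, 55, 51, 49, 53, 54, 50]
n = len(sample)
t_stat = standardised_distance(
    ov=statistics.mean(sample),
    ev=50,
    se=statistics.stdev(sample) / math.sqrt(n),
)

print(f"proportion test: z = {z_stat:.2f} (compare with the Normal curve)")
print(f"mean test:       t = {t_stat:.2f} (compare with the t curve, {n - 1} df)")
```

Only the three inputs and the reference curve change between the two tests. The statistic is always the same ratio, which is why the in-context conclusion is where the marks are earned.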