University of Sydney · S1 2026 · FACULTY OF SCIENCE

DATA1001 · Foundations of Data Science

- one subject, every graph, every model, every mark
50% final exam · hurdle · 7 Chapters · 43-page Bible
Our own words - no uploaded lecturer files
Built to mirror S1 2026 · updated this semester
The Complete Exam Bible · S1 2026

Foundations of Data Science

— one course, one engine, every method, every mark

Foundations of Data Science teaches statistical thinking end to end — how data are produced (study design and causation), how to describe them (exploratory data analysis), how to model them (the Normal curve and regression), and how to reason from a sample to a conclusion (probability, sampling distributions and hypothesis testing). The final exam is 60% of your mark in a single 2-hour sitting, and the universal backstop for almost every other component. It is conceptual and interpretive, not a coding exam: you read a study, choose the right method, run the logic and say what it means in plain English. Nearly every inference question runs the same engine — the standardised distance (OV−EV)/SE — scaffolded by HATPC. This guide teaches each topic to that standard.

Assessment

How DATA1001 is assessed

Component | Weight | Format
Final exam | 60% | One 2-hour written paper · conceptual & interpretive (not coding) · the universal backstop
Project 2 | 20% | Individual: EDA + client report, parts across the semester
Project 1 | 10% | Group reproducible report
Evaluate Quizzes | 5% | Weekly online: best 8 of 10 plus an Early task
Workshop participation | 5% | All weeks: attend and take part
Worked example · free

A proportion test by HATPC — the signature inference, mark by mark

Q [6 marks]. A coin is spun n = 100 times and lands heads 60 times. Test, at the 5% level, whether the coin is fair (p = 0.5). Use HATPC and the (OV−EV)/SE engine, and interpret the result in context.
[Figure: null curve (z) centred at 0, with the critical values −1.96 and +1.96 and the observed z = 2.0 marked]
  • [+1] H (Hypotheses): H₀: p = 0.5 (fair); H₁: p ≠ 0.5 (two-sided).
  • [+1] A (Assumptions): independent spins, and np₀ = 50 and n(1−p₀) = 50 are both ≥ 10, so the Normal approximation holds.
  • [+2] T (Test statistic): OV = p̂ = 60/100 = 0.6; EV = 0.5; SE = √(0.5×0.5/100) = 0.05; z = (0.6 − 0.5)/0.05 = 2.0.
  • [+1] P (P-value): a two-sided z = 2.0 gives P-value ≈ 2×0.0228 = 0.046, because z = 2.0 falls just inside the rejection region |z| > 1.96.
  • [+1] C (Conclusion in context): 0.046 < 0.05, so reject H₀: there is evidence the coin is biased toward heads (though the result is borderline).
Answer: z = (0.6 − 0.5)/0.05 = 2.0, two-sided P-value ≈ 0.046 < 0.05, so we reject the fair-coin hypothesis: the data give (borderline) evidence the coin is biased toward heads.
Sia tip — The last mark is the in-context sentence, not the arithmetic. ‘Reject H₀’ alone is incomplete — you must say what it means for the coin, and a careful answer flags that p = 0.046 is only just below 0.05, so the evidence is weak.
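The arithmetic in the worked example can be checked with a short script. This is a minimal sketch in Python using only the standard library; the function name `proportion_z_test` is our own illustrative choice, not something from the course, and the exam itself does not ask for code.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z_test(successes, n, p0):
    """Two-sided one-sample proportion z-test via (OV - EV) / SE.

    Hypothetical helper for illustration only.
    """
    ov = successes / n              # observed value: the sample proportion
    ev = p0                         # expected value under H0
    se = sqrt(p0 * (1 - p0) / n)    # standard error, computed under H0
    z = (ov - ev) / se              # the standardised distance
    # Two-sided P-value: area in both tails beyond |z| on the standard Normal curve
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# The coin from the worked example: 60 heads in 100 spins, H0: p = 0.5
z, p_value = proportion_z_test(successes=60, n=100, p0=0.5)
print(f"z = {z:.2f}, two-sided P-value = {p_value:.4f}")
```

The script reproduces z = 2.0 and a P-value of about 0.046, matching the T and P steps above. It cannot supply the H, A and C steps: stating the hypotheses, checking the assumptions and writing the in-context sentence.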
Glossary

Key terms

Confounder
A lurking third variable linked to both the exposure and the outcome, so the raw association between them is untrustworthy. It is why an observational study cannot license the word ‘causes’ — only randomisation balances confounders, known and unknown.
Resistance
Robustness of a summary to outliers. The median and IQR are resistant (a few wild points barely move them); the mean and SD are not. The rule: skewed or outlier-heavy data → report the median and IQR.
Standard units (z-score)
How many SDs a value sits from its mean: z = (x − x̄)/s. It puts any value on a common, unitless scale and lets you read areas off the Normal curve. The (OV−EV)/SE engine is the same idea applied to a statistic: a distance from the expected value, divided by the relevant spread.
Standard error (SE)
The SD of a statistic across repeated samples — how much a sample mean or proportion would wobble from sample to sample. It shrinks like 1/√n, and it is the denominator of the (OV−EV)/SE engine.
P-value
The probability, if the null hypothesis were true, of getting a test statistic at least as extreme as the one observed. A small p-value means the data are surprising under the null; it is not the probability that the null is true.
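Three of these terms (resistance, standard units and standard error) can be seen in a few lines of Python. This is a rough sketch using only the standard library. The data values, the numbers passed to `z_score`, and the helper names `z_score` and `se_proportion` are made up for illustration.

```python
from math import sqrt
from statistics import mean, median

# Resistance: add one wild point and compare how far each summary moves.
data = [2, 3, 3, 4, 5]            # made-up values, illustration only
with_outlier = data + [40]        # the same data plus one outlier

mean_shift = mean(with_outlier) - mean(data)        # the mean is dragged upward
median_shift = median(with_outlier) - median(data)  # the median barely moves
print(f"mean shift = {mean_shift:.1f}, median shift = {median_shift:.1f}")

# Standard units: how many spreads a value sits from the centre.
def z_score(x, centre, spread):
    return (x - centre) / spread

print(z_score(70, centre=60, spread=5))  # hypothetical value, mean and SD

# Standard error of a sample proportion: shrinks like 1 / sqrt(n).
def se_proportion(p, n):
    return sqrt(p * (1 - p) / n)

se_100 = se_proportion(0.5, 100)
se_400 = se_proportion(0.5, 400)  # 4 times the sample size
print(f"SE ratio = {se_100 / se_400:.1f}")  # halves when n is quadrupled
```

One outlier shifts the mean by about 6 but the median by only 0.5, and quadrupling n halves the SE rather than quartering it.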
FAQ

DATA1001 FAQ

Is DATA1001 hard?

Conceptually approachable but interpretation-dense: most marks reward reading a study correctly, picking the right method, and writing the one-sentence conclusion in context — not heavy maths or memorised code. The pressure is concentrated because the final exam is 60% in one sitting and backstops almost everything else.

How is DATA1001 assessed?

The final exam is 60% in a single 2-hour written sitting and is the universal backstop. The rest is Project 2 (about 20%, individual), Project 1 (about 10%, a group reproducible report), weekly Evaluate Quizzes (about 5%, best 8 of 10 plus an Early task) and workshop participation (about 5%). Confirm this year's exact weights on your own Canvas.

Is the DATA1001 exam a coding exam?

No. It is conceptual and interpretive: you will not write R from a blank screen. You are given studies, plots, summaries and small datasets and asked to choose the right method, run the logic and interpret in context. The same skeleton — (OV−EV)/SE read against a Normal or t curve — powers nearly every inference question.

Do I need to be good at R or maths for DATA1001?

You learn R in the Coding Milestones and Projects, but the exam tests statistical reasoning, not coding fluency. The maths is light: standard units, areas under a curve, a slope, and the (OV−EV)/SE ratio. The skill that earns marks is choosing the right tool and reading the answer correctly.

Is using AskSia for DATA1001 cheating?

No. AskSia is a study reference written in our own words — we host none of your lecturer's files, and Sia teaches you the method to earn the marks; it does not complete or sit your assessments.

Study strategy

How to study for the exam

Because the exam is 60% and the universal backstop — and nothing backstops the exam — over-invest in exam-style reasoning, and treat the projects as exam practice with a longer deadline. Drill the two recurring chains until they are automatic: read the study design → say what conclusion is legal (observational = association only; randomised = causation licensed), and state HATPC → compute (OV−EV)/SE → read the p-value or CI without the classic misreads. Every test — proportion z-test, t-test, slope test, chi-square — is the same standardised distance with a different EV, SE and reference curve, so master the one engine and fresh exam numbers cannot surprise you. The exam pays for the in-context sentence, not the arithmetic.
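The "one engine" claim can be made concrete with a small sketch: one function, with only OV, EV and SE changing between tests. The Python below uses only the standard library. The function name `standardised_distance`, the sample values and the null mean of 10 are assumptions for illustration, not course material.

```python
from math import sqrt
from statistics import mean, stdev

def standardised_distance(ov, ev, se):
    """The shared engine: (observed value - expected value) / standard error."""
    return (ov - ev) / se

# Proportion z-test: 60 heads in 100 spins, H0: p = 0.5.
# SE comes from the null proportion; the reference curve is the standard Normal.
z = standardised_distance(ov=60 / 100, ev=0.5, se=sqrt(0.5 * 0.5 / 100))

# One-sample t-test on a made-up sample, H0: population mean = 10 (hypothetical).
# SE comes from the sample SD; the reference curve is t with n - 1 degrees of freedom.
sample = [12, 9, 11, 13, 10, 14, 8, 11]
n = len(sample)
t = standardised_distance(ov=mean(sample), ev=10, se=stdev(sample) / sqrt(n))

print(f"z = {z:.2f} (compare with the Normal curve)")
print(f"t = {t:.2f} (compare with the t curve, {n - 1} df)")
```

The function body is the same for both tests. What differs is how the SE is built and which curve the statistic is read against. Those are the choices the exam asks you to justify.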
