University of Sydney · S1 2026 · FACULTY OF SCIENCE

DATA1001 · Foundations of Data Science

Q: Is DATA1001 hard?

Conceptually approachable but interpretation-dense: most marks reward reading a study correctly, picking the right method, and writing the one-sentence conclusion in context — not heavy maths or memorised code. The pressure is concentrated because the final exam is 60% in one sitting and backstops almost everything else.

Q: How is DATA1001 assessed?

The final exam is 60% in a single 2-hour written sitting and is the universal backstop. The rest is Project 2 (about 20%, individual), Project 1 (about 10%, a group reproducible report), weekly Evaluate Quizzes (about 5%, best 8 of 10 plus an Early task) and workshop participation (about 5%). Confirm this year's exact weights on your own Canvas.

Q: Is the DATA1001 exam a coding exam?

No. It is conceptual and interpretive: you will not write R from a blank screen. You are given studies, plots, summaries and small datasets and asked to choose the right method, run the logic and interpret in context. The same skeleton — (OV−EV)/SE read against a Normal or t curve — powers nearly every inference question.

Q: Do I need to be good at R or maths for DATA1001?

You learn R in the Coding Milestones and Projects, but the exam tests statistical reasoning, not coding fluency. The maths is light: standard units, areas under a curve, a slope, and the (OV−EV)/SE ratio. The skill that earns marks is choosing the right tool and reading the answer correctly.

Q: Is using AskSia for DATA1001 cheating?

No. AskSia is a study reference written in our own words — we host none of your lecturer's files, and Sia teaches you the method to earn the marks; it does not complete or sit your assessments.

- one subject, every graph, every model, every mark

60% final exam · hurdle · 7 Chapters · 43-page Bible

Our own words - no uploaded lecturer files

Built to mirror S1 2026 · updated this semester

The Complete Exam Bible · S1 2026

Foundations of Data Science

— one course, one engine, every method, every mark

Foundations of Data Science teaches statistical thinking end to end — how data are produced (study design and causation), how to describe them (exploratory data analysis), how to model them (the Normal curve and regression), and how to reason from a sample to a conclusion (probability, sampling distributions and hypothesis testing). The final exam is 60% of your mark in a single 2-hour sitting, and the universal backstop for almost every other component. It is conceptual and interpretive, not a coding exam: you read a study, choose the right method, run the logic and say what it means in plain English. Nearly every inference question runs the same engine — the standardised distance (OV−EV)/SE — scaffolded by HATPC. This guide teaches each topic to that standard.

DATA1001 · University of Sydney

Contents · the whole subject, one map

What DATA1001 covers

Seven exam topics → one exam-ready map, walking the course pipeline Exploring → Modelling → Sampling → Deciding. Each links to its free chapter guide.

01	Study Design	Data types · observational vs randomised · confounding · bias · sampling
02	Exploratory Data Analysis	Mean vs median · SD vs IQR · resistance · histogram & skew · boxplot & 1.5·IQR
03	The Normal Distribution	z-scores · 68–95–99.7 · reading areas & percentiles · measurement error
04	Regression	Correlation r · SD line vs regression line · regression to the mean · r² · residuals
05	Probability and the Box Model	Add for ‘or’ · multiply for ‘and’ · the box model · EV & SE · the law of large numbers
06	Sampling Distributions	Parameter vs statistic · the central limit theorem · confidence intervals · the bootstrap
07	Hypothesis Testing	HATPC · the (OV−EV)/SE engine · z / t / slope / chi-square · p-value & CI literacy

Assessment

How DATA1001 is assessed

Component	Weight	Format
Final exam	60%	One 2-hour written paper · conceptual & interpretive (not coding) · the universal backstop
Project 2	20%	Individual — EDA + client report, parts across the semester
Project 1	10%	Group reproducible report
Evaluate Quizzes	5%	Weekly online — best 8 of 10 plus an Early task
Workshop participation	5%	All weeks — attend and take part

Worked example · free

A proportion test by HATPC — the signature inference, mark by mark

Q [6 marks]. A coin is spun n = 100 times and lands heads 60 times. Test, at the 5% level, whether the coin is fair (p = 0.5). Use HATPC and the (OV−EV)/SE engine, and interpret the result in context.

+1 H (Hypotheses): H₀: p = 0.5 (fair); H₁: p ≠ 0.5 (two-sided).
+1 A (Assumptions): independent spins, and np₀ = 50 and n(1−p₀) = 50 are both ≥ 10, so the Normal approximation holds.
+2 T (Test statistic): OV = p̂ = 60/100 = 0.6; EV = 0.5; SE = √(0.5×0.5/100) = 0.05; z = (0.6 − 0.5)/0.05 = 2.0.
+1 P (P-value): a two-sided z = 2.0 gives p ≈ 2×0.0228 = 0.046; z = 2.0 only just exceeds the 5% critical value of 1.96.
+1 C (Conclusion in context): 0.046 < 0.05, so reject H₀. There is evidence the coin is biased toward heads, though the evidence is borderline.

z = (0.6 − 0.5)/0.05 = 2.0, two-sided p ≈ 0.046 < 0.05, so we reject the fair-coin hypothesis: the data give (borderline) evidence the coin is biased toward heads.

Sia tip — The last mark is the in-context sentence, not the arithmetic. ‘Reject H₀’ alone is incomplete — you must say what it means for the coin, and a careful answer flags that p = 0.046 is only just below 0.05, so the evidence is weak.
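
The arithmetic in the worked example can be checked with a few lines of code. This is a minimal sketch in Python, used here purely for illustration (the course itself teaches R, and the exam needs no code at all). The function names `normal_upper_tail` and `proportion_z_test` are hypothetical, and the Normal tail area comes from the standard library's `math.erfc` rather than a printed table.

```python
import math

def normal_upper_tail(z):
    """Area under the standard Normal curve to the right of z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def proportion_z_test(successes, n, p0):
    """Two-sided z-test for a proportion in the (OV - EV) / SE form."""
    ov = successes / n                  # observed value: the sample proportion
    ev = p0                             # expected value if H0 is true
    se = math.sqrt(p0 * (1 - p0) / n)   # standard error, computed under H0
    z = (ov - ev) / se
    p_value = 2 * normal_upper_tail(abs(z))  # double one tail for a two-sided test
    return z, p_value

# The coin from the worked example: 60 heads in 100 spins, H0: p = 0.5
z, p_value = proportion_z_test(successes=60, n=100, p0=0.5)
print(f"z = {z:.2f}, two-sided p = {p_value:.4f}")
```

The code stops at T and P on purpose: the hypotheses, the assumption check and the in-context conclusion still have to be written by you.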

Glossary

Key terms

Confounder: A lurking third variable linked to both the exposure and the outcome, so the raw association between them is untrustworthy. It is why an observational study cannot license the word ‘causes’ — only randomisation balances confounders, known and unknown.
Resistance: Robustness of a summary to outliers. The median and IQR are resistant (a few wild points barely move them); the mean and SD are not. The rule: skewed or outlier-heavy data → report the median and IQR.
Standard units (z-score): How many SDs a value sits from its mean: z = (x − x̄)/s. It puts any value on a common scale, lets you read areas off the Normal curve, and is the OV−EV part of the inference engine made unitless.
Standard error (SE): The SD of a statistic across repeated samples — how much a sample mean or proportion would wobble from sample to sample. It shrinks like 1/√n, and it is the denominator of the (OV−EV)/SE engine.
P-value: The probability, if the null hypothesis were true, of getting a test statistic at least as extreme as the one observed. A small p-value means the data are surprising under the null; it is not the probability that the null is true.
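
Three of these terms are easy to see with small numbers. The sketch below is a Python illustration under stated assumptions: the data values and the helper names `z_score` and `se_of_mean` are made up for this example, and the SE shown is the standard formula for a sample mean, SD/√n.

```python
import math
import statistics

# Resistance: add one wild point and see which summary moves.
data = [2, 3, 3, 4, 5]          # hypothetical values
with_outlier = data + [50]      # the same data plus one outlier

mean_shift = statistics.mean(with_outlier) - statistics.mean(data)
median_shift = statistics.median(with_outlier) - statistics.median(data)
print(f"mean moves by {mean_shift:.2f}, median moves by {median_shift:.2f}")

def z_score(x, mean, sd):
    """Standard units: how many SDs x sits from the mean."""
    return (x - mean) / sd

def se_of_mean(sd, n):
    """Standard error of a sample mean: SD / sqrt(n)."""
    return sd / math.sqrt(n)

# A value of 130 on a scale with mean 100 and SD 15 (hypothetical numbers)
print(z_score(130, mean=100, sd=15))

# Quadrupling the sample size only halves the SE
print(se_of_mean(sd=10, n=25), se_of_mean(sd=10, n=100))
```

The mean jumps by several units while the median barely moves, which is the reason to report the median and IQR for skewed or outlier-heavy data.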

Study strategy

How to study for the exam

Because the exam is 60% and the universal backstop — and nothing backstops the exam — over-invest in exam-style reasoning, and treat the projects as exam practice with a longer deadline. Drill the two recurring chains until they are automatic: read the study design → say what conclusion is legal (observational = association only; randomised = causation licensed), and state HATPC → compute (OV−EV)/SE → read the p-value or CI without the classic misreads. Every test — proportion z-test, t-test, slope test, chi-square — is the same standardised distance with a different EV, SE and reference curve, so master the one engine and fresh exam numbers cannot surprise you. The exam pays for the in-context sentence, not the arithmetic.
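
The "read the p-value or CI" step of the second chain can be practised on the coin from the worked example (60 heads in 100 spins). This is a minimal Python sketch, not course code: `proportion_ci_95` is a hypothetical helper, and it uses the usual Normal-approximation interval p̂ ± 1.96 × SE, with the SE estimated from the sample proportion rather than from the null value.

```python
import math

def proportion_ci_95(successes, n):
    """Approximate 95% confidence interval for a proportion."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # SE estimated from the sample
    margin = 1.96 * se                       # 1.96 SEs covers the middle 95%
    return p_hat - margin, p_hat + margin

low, high = proportion_ci_95(successes=60, n=100)
print(f"95% CI for p: ({low:.3f}, {high:.3f})")
print("0.5 inside the interval?", low <= 0.5 <= high)
```

The interval narrowly excludes 0.5, which tells the same story as a p-value just under 0.05. The in-context sentence is about the plausible range for the coin's true heads rate, not about the probability that any single hypothesis is true.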

A+ · Everything unlocked

Unlocks this Bible + all 25 of your University of Sydney subjects - and 1,000+ Bibles across every Australian university.

Sia - your DATA1001 tutor, unlimited, worked the way the exam marks it

The full 43-page Bible + practice bank with worked solutions

Chrome extension - sync your LMS so Sia knows your deadlines

Bilingual EN / Chinese on every Bible and every Sia answer

$25 / month

30-day money-back · cancel in one tap · how it works