DATA1001, the University of Sydney's foundational data-science unit, hangs on one number: the final exam is worth 60% of the grade. The other 40% is split across seven smaller tasks, from a 1% early-feedback quiz to a 17% data-analysis report, and almost all of it is done in R.
The unit carries 6 credit points, has no formal prerequisites, and assumes only Year 10 mathematics. It runs in Semester 1 and Semester 2 at Camperdown/Darlington, and it is the gateway to DATA2002 and the wider Data Science major.
It is a statistics unit wearing a data-science name. Most of the work is statistical reasoning, not coding for its own sake.
What Is DATA1001 at USYD?
DATA1001 develops statistical thinking through real problems, such as whether mobile-phone use raises brain-tumour risk or how the public reacts to shark culling. Students work with data from the physical, health, life, and social sciences, mostly in teams.
The unit is run by the School of Mathematics and Statistics, not a computing department. That framing matters. The skills assessed are study design, summarising data, regression, chance, and hypothesis testing, not software engineering. Unlike a coding-first introduction such as Monash's FIT1043, DATA1001 leads with statistical reasoning.
In 2026 the coordinator is Yeeka Yau, with Ellis Patrick lecturing. The optional reference text is Freedman, Pisani, and Purves' Statistics, 4th edition, which shapes the unit's box-model approach to chance.
How Is DATA1001 Assessed?
The grade splits into eight tasks. One dominates. The 2-hour final exam, sat in the formal exam period, tests statistical thinking using R output and counts for 60%.
The next-largest piece is the second data project. Its written report alone is worth 17%, and its earlier exploratory-data-analysis stage adds 3%. Project 1, done in groups, contributes 10% across a presentation and a report.
Quizzes are low-stakes by design. The best 8 of 10 "Evaluate" quizzes count for just 4% combined, at 0.5% each, and Quiz 3 is a separate 1% early-feedback task due in Week 3.
Workshop contribution is worth 5%, earned at 0.5% per session for actively joining the coding and project milestones. Miss the final exam and you receive an automatic AF grade, regardless of your other marks.
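The weights above can be tallied in a few lines. This is a minimal sketch in Python, used here only for the arithmetic (the unit itself works in R). The task labels are informal shorthand, not the outline's official names, and Project 1 is kept as a single 10% entry because the guide does not say how that 10% divides between presentation and report.

```python
# Assessment weights in percent, as quoted in this guide.
# Labels are informal shorthand, not official task names.
weights = {
    "Final exam": 60.0,
    "Project 2 report": 17.0,
    "Project 2 EDA stage": 3.0,
    "Project 1 (presentation + report)": 10.0,
    "Evaluate quizzes (best 8 of 10)": 8 * 0.5,  # 0.5% each
    "Quiz 3 (early feedback)": 1.0,
    "Workshop contribution": 5.0,
}

total = sum(weights.values())
non_exam = total - weights["Final exam"]

# Print the tasks from heaviest to lightest, then the totals.
for task, pct in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{task:<36} {pct:>5.1f}%")
print(f"{'Total':<36} {total:>5.1f}%")
print(f"{'Everything except the exam':<36} {non_exam:>5.1f}%")
```

The tally confirms that the listed tasks account for the full 100%, with 40% sitting outside the exam.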
Because 60% rides on one supervised paper, exam rehearsal matters more here than in continuously assessed units. AskSia's Mock Exam mode generates adaptive practice in the unit's format, auto-graded with rationale, so you can see how R output maps to written interpretation before exam day.
Why Does DATA1001 Run on R?
Every graphical and numerical summary in DATA1001 is produced in base R and ggplot, named explicitly in the learning outcomes. There is no Python pathway. R is the unit's working language from Week 2 onward.
This trips up students who expected a programming unit. The coding is a means to statistical ends: making a boxplot, fitting a linear model, running a hypothesis test. Syntax errors, not statistics, are the most common early frustration.
That is where one-on-one help compresses time. AskSia's AI tutor explains the same R error or p-value three different ways until one lands, in voice or text, which helps when a single misplaced bracket breaks an entire analysis the night before a report is due.
The payoff is real fluency. By Week 12, students are running tests for a mean and tests for a relationship on live multivariate data, the same techniques used across research and industry.
What Does Each Week Cover?
The 12-week schedule moves from designing studies to testing relationships, building one layer at a time. The first half is description and modelling. The second half is chance and inference.
Week 7 is a Project Week with no new theory. It is a deliberate stop to consolidate the first six topics before inference begins.
The dependencies are not obvious from a week list. AskSia's Concept Map renders the unit as a navigable tree, so you can see how the normal model in Week 4 feeds the box model in Week 8 and the hypothesis tests in Weeks 10–12.
How Hard Is DATA1001?
The official workload is 120–150 hours across the semester, in line with the standard 1.5–2 hours per credit point per week for a 6-credit-point unit. Spread evenly, that is roughly 9 to 12 hours a week.
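The weekly figure can be checked two ways: from the per-credit-point rule, and from the semester total. This is a rough sketch in Python, used only for arithmetic. The 13-week semester length is an assumption of this example, not a figure from the guide.

```python
CREDIT_POINTS = 6
HOURS_PER_CP_PER_WEEK = (1.5, 2.0)   # low and high end of the standard rule
TOTAL_HOURS = (120, 150)             # official semester workload range
SEMESTER_WEEKS = 13                  # assumption: adjust for your session

# Route 1: apply the per-credit-point rule directly.
weekly_from_rule = tuple(rate * CREDIT_POINTS for rate in HOURS_PER_CP_PER_WEEK)

# Route 2: spread the semester total evenly over the assumed weeks.
weekly_from_total = tuple(round(h / SEMESTER_WEEKS, 1) for h in TOTAL_HOURS)

print(weekly_from_rule)   # (9.0, 12.0)
print(weekly_from_total)  # (9.2, 11.5)
```

Both routes land in the same band, so 9 hours a week is the floor of the official range, not the midpoint.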
Difficulty is bimodal. Students comfortable with senior statistics often find the theory gentle but underestimate R. Students strong in calculus-heavy units like UniMelb's MAST10006 sometimes struggle more here, because DATA1001 rewards interpretation over computation.
The grade scale is standard: 50–64 is a pass, 65–74 a credit, 75–84 a distinction, and 85 or above a high distinction. The pass mark is 50, but the final exam is compulsory and must be attempted.
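The scale and the exam rule can be written as one small lookup. This is an illustrative sketch in Python, not the University's own logic. The function name, the band labels, and the `sat_exam` flag are hypothetical, and the sketch assumes marks are already whole numbers.

```python
def grade_band(mark: float, sat_exam: bool = True) -> str:
    """Map a final mark (0-100) to the grade bands quoted in this guide."""
    if not sat_exam:
        # Missing the compulsory final exam overrides the raw total.
        return "AF"
    if not 0 <= mark <= 100:
        raise ValueError("mark must be between 0 and 100")
    if mark >= 85:
        return "High Distinction"
    if mark >= 75:
        return "Distinction"
    if mark >= 65:
        return "Credit"
    if mark >= 50:
        return "Pass"
    return "Fail"


print(grade_band(72))                   # Credit
print(grade_band(72, sat_exam=False))   # AF
```

The second call shows why the exam rule matters: the same raw total becomes an AF if the exam is not attempted.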
The most common failure mode is not the maths. It is leaving the two data projects, worth 30% combined once Project 2's 3% EDA stage is counted, until the deadline, when reproducible R reports cannot be rushed.
For students arriving from foundational maths units such as Monash's MAT9004, the statistics vocabulary is the steeper climb. AskSia's Flashcards build spaced-repetition decks for R functions and test-selection rules, tuned to your exam date.
Where Does DATA1001 Lead?
DATA1001 is a prerequisite for DATA2002, the next unit in the Data Science major, so a pass here is a gate, not an endpoint. It also satisfies the statistics requirement in many science and commerce degrees.
For students aiming higher, the unit has an advanced twin.
DATA1901 covers the same foundations at greater depth, with masterclasses, and is built for high achievers heading into the major. You cannot take both: each prohibits the other, along with units like MATH1005 and ECMT1010.
Whichever you choose, the unit appears on the DATA1001 hub and within AskSia's wider University of Sydney unit guides.
Frequently Asked Questions
What are the prerequisites for DATA1001?
DATA1001 has no formal prerequisites and assumes only Year 10 mathematics or equivalent, which makes it accessible to almost any first-year student. What it does have is a long prohibition list: you cannot count DATA1001 toward your degree if you have already passed DATA1901, MATH1005, MATH1905, MATH1015, MATH1115, ENVX1002, ECMT1010, or BUSS1020, because all eight cover overlapping statistics. The unit is also open to study-abroad and exchange students. If you are unsure whether a prior unit clashes, the prohibition list on the official DATA1001 unit page is the authoritative source. Check it against your transcript in Sydney Student before you enrol, since the system will block a prohibited combination at registration.
What's the difference between DATA1001 and DATA1901?
Both units are 6 credit points and cover the same foundations: study design, data summaries, the normal and linear models, chance, and hypothesis testing. DATA1901 is the advanced version, taught at greater depth with additional masterclasses, and aimed at students with a stronger mathematical background heading into the Data Science major. You cannot enrol in both, because each is listed as a prohibition for the other. For most students, DATA1001 is the default and is enough to progress to DATA2002. Choose DATA1901 only if you are confident with senior-level statistics and want the more rigorous treatment. The two outlines list identical learning outcomes, so the gap is in pace and challenge, not curriculum. Compare both unit pages directly before locking in your choice.
Can you use AI in DATA1001 assessments?
Partly. The Semester 1 2026 outline allows generative AI on the open assessments: both data projects, the EDA task, the quizzes, and workshop contribution, provided you acknowledge its use. AI is prohibited in the final exam, which is a secure, supervised paper worth 60% of the grade. Using AI without acknowledgment, or in a banned task, can breach the University's Academic Integrity Policy. The practical line is that AI can help you learn and draft the project work, while the unit confirms your understanding under exam conditions where no tools are allowed. Check the "Use of AI" column for each task in Canvas, since the rules are set per assessment, not unit-wide.
When is DATA1001 offered each year?
DATA1001 runs in both Semester 1 and Semester 2, normal day mode, at the Camperdown/Darlington campus, with a Semester 1 offering also at Westmead. An Intensive January session has appeared in recent handbook listings for students who want to finish it before the standard year begins. The Semester 1 2026 census date is 31 March, the last day to drop the unit without academic or financial penalty. Teaching consists of three hours of lectures a week plus a two-hour workshop, starting in Week 1. To confirm the exact timetable and any campus changes for your intake, check the unit availability table on the official DATA1001 page before locking your enrolment.
What mark do you need to pass DATA1001?
The pass mark is 50 on the University's standard scale: 50–64 is a pass, 65–74 a credit, 75–84 a distinction, and 85 or above a high distinction. One rule overrides your raw total: the final exam is compulsory and must be attempted, or you receive an automatic AF (absent fail) regardless of your other marks. The unit also applies "better mark" principles, so a strong exam can lift a weak quiz total. Because the two data projects are worth 30% combined, including Project 2's 3% EDA stage, and cannot be rushed, the safest path to a comfortable pass is steady project work plus deliberate exam practice. Map your target grade backward from the 60% exam weight to see how many marks each task must contribute.
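Mapping a target backward from the exam weight is a one-line rearrangement. This is a hedged sketch in Python, used only for arithmetic. The function name and parameters are hypothetical, and the sketch ignores the "better mark" adjustment because the guide does not spell out how it is calculated.

```python
def exam_mark_needed(target: float, coursework_points: float,
                     exam_weight: float = 60.0) -> float:
    """Exam percentage needed to reach `target` overall.

    `coursework_points` is what you have earned out of the non-exam 40.
    A result above 100 means the target is out of reach on the exam alone.
    """
    needed = (target - coursework_points) / exam_weight * 100
    return max(0.0, needed)  # a negative result means the target is already met


# Example: 30 of the 40 coursework points banked, aiming for a credit (65).
print(round(exam_mark_needed(65, 30), 1))  # 58.3
# Same coursework, aiming only for a pass (50).
print(round(exam_mark_needed(50, 30), 1))  # 33.3
```

Even when the arithmetic says a low exam mark is enough, the exam must still be attempted to avoid an AF.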
What Might Change Before You Enrol?
Unit details are reviewed annually. The 60/40 assessment split and project structure described here are from the Semester 1 2026 outline, and the University divided Project 2 into two parts this year. A future offering could rebalance the weights or change the project format.
Outlines are published roughly two weeks before teaching starts, so Semester 2 details may differ slightly. Treat the official DATA1001 unit page as the single source of truth, and confirm the assessment table for your specific session before you plan your semester around it.