DATA1001

Jun 1, 2026


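I got hold of an exam-prep bible for a course I'm taking! Can you break the course down for me and tell me what to focus on when revising? My final is coming up, so the more detail the better: formulas, definitions, core exam topics, I need all of it. Please save me.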

First, a one-line survival summary: the DATA1001 final is a "concepts + interpretation" exam, not an R-coding exam. You will be given studies, plots, summaries and small datasets, and you will be tested on whether you can read the design → choose the right method → compute the key quantity → state the conclusion clearly in one sentence. [1]

[1] Source: asksia-bible-data1001-bilingual.pdf

Independent study companion. Not affiliated with or endorsed by the University of Sydney. Corrections: takedowns@asksia.ai

PREFACE - HOW TO USE THIS BOOK

Method, not memory; context, not code. The exam is conceptual & interpretive: read a study, pick a tool, say what it means.

This is not a transcript of the lecture slides or a re-run of the R labs. It is a self-contained course in the statistical thinking DATA1001 examines: each idea stated plainly, each method shown on a worked example with real numbers, each classic misread flagged. You learn R in the Coding Milestones and Projects; the exam tests whether you can read a study, choose the right method, run the logic and interpret the answer in context. That is what these pages drill.

- 1 · LEARN. You haven't done the topic yet. Read a chapter top to bottom. Every idea opens with a one-line TL;DR, then define → picture → method → worked example → trap. The diagrams are original schematics of standard statistics; learn the picture cold.
- 2 · DRILL. You've seen lectures and a workshop. Cover the worked steps and re-do each one by hand, then write the one-sentence interpretation in context. The exam pays for the sentence, not the arithmetic.
- 3 · EXAM. It's the revision lecture / study week. The TL;DRs, the trap boxes and the recurring (OV-EV)/SE pattern are your map. The blueprint overleaf shows the weights, the backstop machinery and the question template.

The single engine that runs the back half of the course

Master one calculation and the whole inference half collapses into a pattern. Every test (proportion test, z-test, t-test, slope test) is the same standardised distance; only the EV, the SE and the reference curve change. Wrapped around it is HATPC, the course's literal exam scaffold that graders reward line by line. Internalise the engine and the scaffold and fresh exam numbers cannot surprise you.

DATA1001 · Foundations of Data Science · AskSia Library

THE SPINE: test statistic = (OV - EV) / SE, wrapped in HATPC

[4] Source: asksia-bible-data1001-bilingual.pdf

- Project 2 (individual; EDA + client report): 20%; parts due ~Wk 9 & 11
- Project 1 (group reproducible report): 10%; ~Wk 6, present Wk 7
- Evaluate Quizzes (weekly online): 5%; best 8 of 10 + Early task
- Workshop participation: 5%; all weeks, attend + take part

The better-mark / progress-mark machinery

| Rule | What it does |
| --- | --- |
| Project progress-mark | A better Project 2 mark replaces Project 1 |
| Quiz better-mark | If exam % > quiz %, exam % replaces quiz score |
| Spec-con adjustment | Missed work can be pushed onto the exam weight |
| Net effect | The exam is the universal backstop, and ungated by anything |

The exam format - conceptual & interpretive, NOT coding

One 2-hour written paper. You will not be asked to write R from a blank screen. You will be given studies, plots, summaries and small datasets and asked to pick the right method, run the logic and interpret in context. The same skeleton, (OV-EV)/SE read against a Normal or t curve, powers nearly every inference question. Walk the module pipeline on a fresh dataset and you have walked the exam.
The core idea of your exam-prep bible/cheatsheet is that the entire back half of the course is driven by a single engine:
- Test statistic: $$\text{stat}=\frac{OV-EV}{SE}$$
- Every test the source lists (proportion, z, t, slope) just swaps the EV, the SE and the reference curve; the skeleton stays the same. The chi-square test follows the same observed-versus-expected logic, but its statistic sums squared differences across categories rather than being a single (OV-EV)/SE. [1][10][16]

[10] Source: asksia-bible-data1001-bilingual.pdf

The exam scaffold - write all five letters, every time

| Step | What goes here | Marks reward |
| --- | --- | --- |
| H: Hypotheses | $H_0$ (=, "due to chance") vs $H_1$ (>, < or ≠). Decide one- vs two-sided. | Correct symbols & direction |
| A: Assumptions | Independence; Normality / large n; equal variance. State and justify each. | Checking, not just listing |
| T: Test statistic | Plug into (OV-EV)/SE with the right EV and SE under $H_0$. | Right EV, right SE, arithmetic |
| P: P-value | $P(\text{stat as/more extreme} \mid H_0)$ from the reference curve. Double for two-sided. | Correct tail, correct doubling |
| C: Conclusion | Statistical (vs α) and scientific (in context). | Both layers, in plain English |

Why one engine covers four tests

OV = the statistic you computed (a proportion $\hat{p}$, a mean $\bar{x}$, a difference, a slope $\hat{\beta}_1$). EV = what $H_0$ says that statistic should be. SE = the standard error: the likely size of the chance error in that statistic, from the box model / sampling distribution. The only things that change across the test zoo are the formula for SE and which reference curve you read the p-value from. Learn the skeleton once and the rest is bookkeeping.

The two-layer conclusion graders look for

A bare "reject $H_0$" rarely gets full marks. Write both: (1) statistical: "p = 0.013 < 0.05, so we reject $H_0$ at the 5% level"; then (2) scientific: "there is evidence the new tutoring program raises mean exam scores." And never say "accept $H_0$"; you retain it (absence of evidence ≠ evidence of absence).

[16] Source: asksia-cheatsheet-data1001.pdf

17 · HATPC (Topic 9 · The scaffold)

The exam scaffold for every test:
- P-value: see col 5.
- Conclusion: two layers, statistical (vs α) + scientific (in context).

$H_0$ always carries the = ("the difference is due to chance"); $H_1$ carries >, < or ≠. Decide one- vs two-sided before seeing the data, from the research question, not to chase significance.

Assumptions checklist: independence (from the design), Normality or large n (histogram / QQ-plot / Shapiro-Wilk), equal variance for a pooled two-sample test (compare spreads). If they fail: transform the data or use a different test. State each; graders award the "A".

Write the conclusion in both registers: "p = 0.03 < 0.05, so we reject $H_0$ (statistical); there is evidence the new method raises the mean score (scientific, in context)." Dropping the context half loses marks. Never write "accept $H_0$", only "retain", since absence of evidence is not evidence of absence.

Compiled by AskSia · mapped to the DATA1001 syllabus · asksia.ai/cheatsheet/usyd-data1001

18 · THE ENGINE (One stat, four tests)

THE MASTER TEST STATISTIC: stat = (OV - EV) / SE = (observed - what $H_0$ predicts) / SE

How many SEs the data sit from $H_0$. Large |stat| ⇒ evidence against $H_0$; near 0 ⇒ consistent with chance. Proportion, z, t and slope tests are all this one calculation; only EV, SE and the reference curve change. Once you can read EV and SE off the box model, every test is the same three keystrokes.

| Test | SE / EV | Curve |
| --- | --- | --- |
| Proportion | $\sqrt{p_0(1-p_0)/n}$ | $N(0,1)$ |
| z ($\sigma$ known) | $\sigma/\sqrt{n}$ | |
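To make the (OV - EV) / SE engine concrete, here is a minimal Python sketch of the T and P steps of HATPC for a one-sample proportion test. The course itself uses R and the exam does not ask for code, so this is only an illustration; the function name `proportion_z_test` and the data (60 successes in 100 trials, tested against $p_0 = 0.5$) are hypothetical and not taken from the source.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(successes, n, p0):
    """Two-sided one-sample proportion test via (OV - EV) / SE."""
    ov = successes / n              # OV: the observed sample proportion
    ev = p0                         # EV: what H0 says the proportion should be
    se = sqrt(p0 * (1 - p0) / n)    # SE computed under H0
    stat = (ov - ev) / se           # how many SEs the data sit from H0
    # Two-sided p-value: double the upper-tail area of the N(0, 1) curve
    p_value = 2 * (1 - NormalDist().cdf(abs(stat)))
    return stat, p_value


# Hypothetical data: 60 successes in 100 trials, H0: p = 0.5
stat, p_value = proportion_z_test(successes=60, n=100, p0=0.5)
print(f"stat = {stat:.2f}, p-value = {p_value:.4f}")
```

With these made-up numbers the statistic is about 2 SEs from $H_0$ and the two-sided p-value is about 0.0455, so the two-layer conclusion would read: "p ≈ 0.046 < 0.05, so we reject $H_0$ at the 5% level; there is evidence the true proportion differs from 0.5."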
The most important revision strategy: for every question you do, force yourself to write the one-sentence interpretation in context; the exam pays for that sentence, not the arithmetic. [1][3]

[3] Source: asksia-bible-data1001-bilingual.pdf

The strategy this dictates

Because the exam backstops quizzes, Project 1 and missed pieces, but nothing backstops the exam, the dominant move is to over-invest in exam-style reasoning. Treat the projects as exam practice with a longer deadline: the EDA, the method choice, HATPC and interpret-for-a-client are exactly what the exam rewards. Drill the engine; write the in-context sentence every time.

What the exam is really testing

Four recurring chains carry most marks:
- read the study design → say what conclusion is legal;
- summarise & plot → describe shape/centre/spread;
- state HATPC → compute (OV-EV)/SE → p-value;
- interpret the p-value / CI without the classic misreads.

Every chapter in this book is built to make those chains automatic.

CONTENTS - Four modules, one pipeline

Exploring → Modelling → Sampling → Deciding, and one engine under it all

| Ch | Topic | Core ideas |
| --- | --- | --- |
| | Module 1 · Exploring data (Weeks 1-3) | |
| 1 | Design & data types | categorical vs quantitative · observational vs experiment · confounding · bias · sampling |
| 2 | Exploratory data analysis | mean/median · SD/IQR · resistance · histogram & skew · boxplot & 1.5·IQR |
| | Module 2 · Modelling data (Weeks 4-5) | |
| 3 | The Normal model | z-scores · 68-95-99.7 · measurement error · pnorm/qnorm |
| 4 | The linear model | correlation · regression line · SD line · regression to the mean · r² |
| | Module 3 · Sampling data (Weeks 6-9) | |
| 5 | Chance & the box model | probability rules · binomial · EV & SE · the CLT |
| 6 | Surveys & confidence intervals | parameter vs statistic · bias · 0-1 box · CI · bootstrap |
| | Module 4 · Decisions with data (Weeks 10-12) | |
| 7 | Testing: HATPC & the engine | (OV-EV)/SE · proportion / z / t / slope · p-value & CI literacy |
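As an illustration of the "interpret the CI" chain, the sketch below computes a large-sample 95% confidence interval for a proportion from a 0-1 box, using the standard form $\hat{p} \pm z^{*}\cdot SE$ with $SE=\sqrt{\hat{p}(1-\hat{p})/n}$. This is an assumption about which CI recipe applies, not a quotation from the source; the function name `proportion_ci` and the survey numbers (220 "yes" out of 400) are hypothetical. It is written in Python purely for illustration; the course labs use R.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, level=0.95):
    """Large-sample confidence interval for a population proportion."""
    p_hat = successes / n                            # sample proportion (the statistic)
    se = sqrt(p_hat * (1 - p_hat) / n)               # SE estimated from the sample
    z_star = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 when level = 0.95
    return p_hat - z_star * se, p_hat + z_star * se


# Hypothetical survey: 220 "yes" responses out of 400
low, high = proportion_ci(successes=220, n=400)
print(f"95% CI for the proportion: ({low:.3f}, {high:.3f})")
```

With these made-up numbers the interval runs from about 0.501 to 0.599. The in-context sentence would be: "We are 95% confident that the population proportion of 'yes' lies between about 50% and 60%." The 95% describes how often the procedure captures the parameter; it is not the probability that this particular interval contains it.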

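5) Give me 3 pieces of information and I can turn this into your personal revision checklist plus a daily practice plan
When is your final (the exact date)?
Which three areas worry you most (pick 3):
- A Design / confounding / bias (causal judgement)
- B EDA (boxplots, IQR, outliers, skew, mean vs median)
- C Normal model + z-scores
- D Regression / correlation + slope test (LINE assumptions)
- E Probability / binomial / "at least one"
- F Box model + CLT + SE (SD vs SE)
- G HATPC + t-test / chi-square / CI interpretation
How would you like me to work with you:
- "I give you a template, you write your own answers and I mark them", or "I give you a full model answer first and then you write one in the same style"?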