USyd · BUSS6002 · Data Science in Business

BUSS6002: pass the exams, not just read the notes

Your complete guide to University of Sydney's data science in business unit. See where the marks are, work real practice questions, and study with an AI tutor that knows BUSS6002.

6 credit points · Postgraduate coursework · Offered S1 · ~70% exams · Discipline of Business Analytics

Learn with AskSia · Explore the AskSia library

Sia generates BUSS6002 practice questions, walks through knowledge discovery and linear algebra I step by step, and quizzes you on the material the exam weights most heavily.

Try a real exam-style question

Worked example

Multiple choice · solution revealed after you answer

In BUSS6002 the inner product of two vectors x and y in R^n is defined as the sum over i of x_i times y_i. Let x = [2, -1, 3] and y = [5, -2, -3]. What is the inner product of x and y?

Worked solution

Apply the definition: the inner product is the sum of the element-wise products, (2)(5) + (-1)(-2) + (3)(-3).

First term: 2 times 5 = 10.
Second term: (-1) times (-2) = +2, because a negative times a negative is positive.
Third term: 3 times (-3) = -9.
Add them: 10 + 2 − 9 = 3.

The trap: mishandling the signs. Writing the second term as -2 instead of +2 gives 10 − 2 − 9 = -1, and summing the magnitudes while ignoring the signs gives 10 + 2 + 9 = 21. A negative times a negative is positive, so the correct answer is 3. This is a classic slip.
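A quick way to check the hand calculation is to reproduce it in NumPy, the library the unit uses for scientific computing. This is a minimal sketch: the variable names are illustrative and are not taken from the course materials.

```python
import numpy as np

# Vectors from the worked example
x = np.array([2, -1, 3])
y = np.array([5, -2, -3])

# Inner product: sum over i of x_i * y_i
elementwise = x * y              # the three element-wise products
inner = int(np.dot(x, y))        # their sum

# The two wrong answers described in the trap, reproduced for comparison
sign_slip = 10 - 2 - 9                            # second term written as -2
magnitudes = int(np.sum(np.abs(x) * np.abs(y)))   # signs ignored

print(elementwise, inner, sign_slip, magnitudes)
```

In the exam you would write the three products out by hand first; the code only confirms the arithmetic.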

Generate 10 more like this with Sia

Where your grade comes from

Exams 70% · Reports 30%

Mid-semester 25% · Assignment 30% · Final 45%

One exam decides 45% of your grade, and it covers the entire semester (all weeks). This whole page is built around that.

Overview

What BUSS6002 is, and where it sits

BUSS6002 is the University of Sydney Business School's postgraduate gateway to data science. It bridges the gap between introductory statistics and advanced analytics, blending three disciplines (Business Analytics, Marketing and Business Information Systems) into a single 13-week core. You move from data fundamentals and exploratory data analysis, through vectors, matrices and linear regression, into clustering, classification, model selection, maximum likelihood estimation and big data.

The defining feature is that you learn the maths behind each method, implement it in Python (NumPy, scikit-learn), and frame every technique against a real business decision. About 70% of the grade is two closed-book pen-and-paper exams that test maths by hand (matrix algebra, regression inference, bias-variance, MLE derivations) plus hand-written Python, so a method you can only run in code but cannot derive will cost you marks.

It is a foundational postgraduate core: it assumes basic probability, linear algebra and calculus, and it feeds the more advanced QBUS analytics units. The cohort is mixed, from confident coders to people opening Python for the first time, which is part of what makes the unit feel demanding.

How it differs from its related units. QBUS5001 is the corequisite analytics foundation and sits beside BUSS6002 rather than overlapping it; BUSS1020 is the undergraduate quantitative-methods grounding (descriptive stats, probability, regression) that BUSS6002 assumes and then pushes into matrix algebra and machine learning; DATA3404 is the data-engineering and scalable-storage angle, where BUSS6002 stays on the modelling and decision side. BUSS6002's signature is doing the maths by hand on paper, not only calling a library.

Official outline: sydney.edu.au · BUSS6002 outline. Always treat the official outline and the exam timetable as authoritative.

Difficulty & time commitment

Is BUSS6002 hard, and how much time does it take?

BUSS6002 is manageable if you keep a weekly rhythm and treat the back half as the main event. Across student reviews the pattern is consistent: it starts gently and steepens, and the heaviest assessment is the part that separates grades.

Difficulty

3.67 / 5

Hard. Gentle early, demanding back half. Hard to fail with steady work; an HD takes consistent practice.

Exam load

70%

The exams decide most of the grade. The heaviest single component is 45%.

Weekly time

~10 hrs

The standard load for a 6-credit-point unit, around 1.5 hours per credit point per week including class.

A read across student reviews and course feedback. See what students say ↓

Weeks 1 to 6 · vocabulary build

Weeks 7 to 11 · steep

The difficulty curve and the assessment weighting point the same way: the back half is harder and worth more. Concentrating your effort there is the highest-return decision in the unit.

Is this unit for you

Who tends to do well, and who tends to struggle

You will likely do well if

You are comfortable doing matrix algebra and calculus by hand, not just calling a library
You can write and read Python (NumPy, scikit-learn) and trace code on paper
You keep up weekly, because the formula spine compounds and a missed week on vectors hurts again at regression and MLE
You build and rehearse your one A4 note sheet early rather than the night before

You may struggle if

You rely on running code and never practise the derivations the exams demand by hand
You are rusty on probability, linear algebra and calculus (the assumed knowledge) and do not shore it up in Weeks 1 to 4
You treat the 30% assignment as the whole game, when 70% of the mark is still on pen-and-paper exams
You cram, because bias-variance, MLE and the log-transform bias result need spaced practice to stick

What HD students do differently

Convert every module's worked example into a hand-written drill and redo it without notes
Curate the A4 sheet as a living document from Week 1, every formula and every step in the course's own notation, because the exam requires that notation
Practise the three exam genres separately: timed multiple choice, show-all-working short answer, and hand-written Python
Be able to derive and explain, not just state, the least-squares solution, the bias-variance decomposition, and why logistic regression needs gradient descent
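For the bias-variance item, it may help to see the standard decomposition written out once. This is a sketch under stated assumptions: the symbols (f for the true function, f-hat for the fitted model, x_0 for a new input, sigma squared for the noise variance) follow common textbook usage such as ISLR and are not confirmed as the course's notation, and the noise is assumed to have mean zero and to be independent of the fitted model.

```latex
\begin{aligned}
y_0 &= f(x_0) + \varepsilon, \qquad \mathbb{E}[\varepsilon] = 0, \quad \operatorname{Var}(\varepsilon) = \sigma^2 \\
y_0 - \hat f(x_0) &= \varepsilon
  + \big(f(x_0) - \mathbb{E}[\hat f(x_0)]\big)
  + \big(\mathbb{E}[\hat f(x_0)] - \hat f(x_0)\big) \\
\mathbb{E}\big[(y_0 - \hat f(x_0))^2\big]
  &= \underbrace{\sigma^2}_{\text{irreducible error}}
  + \underbrace{\big(\mathbb{E}[\hat f(x_0)] - f(x_0)\big)^2}_{\text{bias}^2}
  + \underbrace{\operatorname{Var}\big(\hat f(x_0)\big)}_{\text{variance}}
\end{aligned}
```

The cross terms vanish when the square is expanded because the noise has mean zero and is independent of the fitted model, and because the deviation of f-hat from its own expectation has mean zero. Rewrite the result in the course's own notation before it goes on your A4 sheet.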

Syllabus

The 13 topics, week by week

The exam-weight marker on each topic shows where the marks concentrate. Topics marked "High exam weight" carry the most marks.

T1 · Introduction to data and data science

Lecturer-authored module; Python practice questions

History and state of data in marketing, big data vs small data, how data connects to decisions, and Python basics (variables, types, lists, indexing, loops, if-else).

Lower exam weight

T2 · Analytical capabilities and data fundamentals

Lecturer-authored module; data-types notes

Organisational structures and the data lifecycle, decision environments, descriptive vs predictive vs prescriptive analytics, variable types (nominal, ordinal, discrete, continuous), and file formats.

Lower exam weight

T3 · Knowledge discovery and exploratory data analysis

NIST/SEMATECH e-Handbook; Behrens (1997)

Process models (KDDA, CRISP-DM, Snail Shell), the four sample moments, sample quantiles, histogram, boxplot, Q-Q plot, scatter plot, sample correlation, and data quality (missing data, outliers).

High exam weight · Quiz me on knowledge discovery →

T4 · Linear algebra I and scientific computing I

Lecturer-authored linear algebra notes

Vectors and vector operations, the inner product, the 2-norm, Euclidean distance between vectors, and NumPy for scientific computing.

High exam weight · Quiz me on linear algebra I →

T5 · Clustering and customer segmentation

Bishop, Pattern Recognition and Machine Learning, Ch 9.1

Unsupervised learning, clusters and centroids, the within-cluster sum of squares objective, and the k-means algorithm (assign, update centroids, repeat to convergence).

High exam weight · Quiz me on clustering →

T6 · Linear algebra II and regression

James, Witten, Hastie & Tibshirani (ISLR)

Matrices and matrix operations, simple and multiple linear regression, the least-squares solution, regression inference (standard error, confidence interval, t-test, R squared), and residual diagnostics.

High exam weight · Quiz me on linear algebra II →

T7 · Feature engineering and ethics

Lecturer-authored module

Data ethics (ownership, transparency, privacy, intention, outcomes and disparate impact), responsible AI, feature transformations, and the log-transform back-transform bias via Jensen's inequality.

High exam weight · Quiz me on feature engineering →

T8 · Classification

Lecturer-authored module; ISLR

Binary classification, the decision threshold, logistic regression and the logit link, the confusion matrix, and metrics (accuracy, recall, specificity, precision, F1) including the imbalanced-data case.

High exam weight · Quiz me on classification →

T9 · Model evaluation and selection

Lecturer-authored module; ISLR

Measuring campaign success, linear basis function models, underfitting vs overfitting, the bias-variance tradeoff and decomposition, train/validation/test splits, and expected prediction error.

High exam weight · Quiz me on model evaluation →

T10 · Model fitting and optimisation (MLE)

Lecturer-authored maths-foundations handout

Objective functions, likelihood and log-likelihood, maximum likelihood estimation, analytic vs iterative solutions, and gradient descent (and why logistic regression has no closed form).

High exam weight · Quiz me on model fitting →

T11 · Big data

Lecturer-authored module

Big-data issues and consequences, algorithm time complexity, wide vs tall data, and big-data strategies (dimension reduction, sampling, distributed compute).

Lower exam weight

T12 · Consolidation and exam-style practice

Weekly practice questions

Tutorial and review week with no new module. Timed practice across Weeks 1 to 11 in the three exam genres.

Lower exam weight

T13 · Final revision and consultation

Consultation and revision sessions

Week-by-week review, the maths-foundations handout (MLE, optimisation, gradient descent), and building the single A4 double-sided note sheet.

Lower exam weight

How it's assessed

Assessment structure

Component	Weight	Format & timing
Mid-semester exam	25%	In-person, on-campus, pen-and-paper. Multiple choice (12 marks) plus short answer (15 marks) plus Python code (8 marks), 35 marks total, 1 hour with no reading time. Permitted: non-programmable calculator, one A4 single-sided handwritten note sheet, physical translation dictionary. Around mid-semester (S1 2026 sitting: 19 April 2026). Covers Weeks 1 to 6 inclusive.
Individual assignment	30%	Individual written report on a predictive-modelling task in Python (for example logistic regression with scikit-learn). Markers must be able to follow and reproduce your code and reasoning. GenAI is permitted with a required GenAI reflection. Late penalty 5% per day, zero after 10 days. Released around Week 10. Applied modelling drawing on EDA, regression and classification.
Final exam	45%	In-person, on-campus, pen-and-paper. Multiple choice (12 marks) plus short answer (27 marks, show all working) plus Python code (6 marks), 45 marks total, 2 hours writing plus 10 minutes reading. Permitted: non-programmable calculator, one A4 double-sided handwritten note sheet, physical translation dictionary. Answers in blue or black pen. End-of-semester exam period (S1 2026 sitting: 17 June 2026). Covers the entire semester (all weeks).

Pass requirement: obtain at least 50% overall. There is no component hurdle. A separate 65% progression rule applies for some later QBUS units, but it is not a hurdle for passing BUSS6002.
Exam format: both exams have three sections: multiple choice, short answer (show all working), and hand-written Python code. The mid-semester allows a single-sided A4 note sheet; the final allows a double-sided A4 note sheet.
Calculator policy: a non-programmable calculator is permitted in both exams; a physical translation dictionary is also allowed.

If you read nothing else

This is an exam-driven unit. With the exams at 70% of the grade and the final exam alone at 45%, your result is overwhelmingly decided by how well you perform under time pressure. The final covers the entire semester (all weeks).

Final exam timing: 17 June 2026 (S1 2026 sitting). The date changes each semester, so confirm the exact date and venue on the official exam timetable.

How to actually pass it

A weekly rhythm, two checklists, and the traps to avoid

The unit rewards consistency over cramming, and practice over re-reading. Here is the loop that works, then what to have nailed before each exam.

The weekly loop

During the week

Work through the module content and the orienting examples, then re-derive each formula by hand rather than reading it.

Same week

Do that week's practice questions while the material is fresh, and re-implement at least one worked example in Python to keep coding fluency warm.

After each module

Add anything exam-worthy to your A4 note sheet immediately, in the course's exact notation.

When a derivation will not click

Book a consultation or drop-in before the gap compounds into later weeks.

Before the mid-semester · checklist

Master Weeks 1 to 6 cold: Python, data types, the four moments, vectors and norms, k-means, and regression with its assumptions
Build the single-sided A4 sheet for the mid-semester in the course's own notation
Drill the inner product, 2-norm and Euclidean distance until they are automatic
Practise one full iteration of k-means by hand: assign points, then recompute centroids
Do timed practice across all three sections (multiple choice, short answer, hand-written Python)
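The "one full iteration of k-means by hand" item above can be rehearsed against a small NumPy sketch. The five points and the two starting centroids are hypothetical toy values chosen so the arithmetic is easy to do on paper; they do not come from the course materials.

```python
import numpy as np

# Hypothetical toy data: five points in R^2 and two starting centroids
X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [8.0, 8.0], [9.0, 8.0]])
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])

# Step 1 (assign): squared Euclidean distance from every point to every centroid
diffs = X[:, None, :] - centroids[None, :, :]   # shape (5, 2, 2)
sq_dist = np.sum(diffs ** 2, axis=2)            # shape (5, 2)
labels = np.argmin(sq_dist, axis=1)             # index of the nearest centroid

# Step 2 (update): each centroid becomes the mean of its assigned points
new_centroids = np.array(
    [X[labels == k].mean(axis=0) for k in range(len(centroids))]
)

# Within-cluster sum of squares after the update
wcss = sum(
    np.sum((X[labels == k] - new_centroids[k]) ** 2)
    for k in range(len(centroids))
)

print(labels, new_centroids, wcss)
```

Work the distances out by hand first, then compare. The sketch assumes every cluster keeps at least one point; an empty cluster would need separate handling.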

Before the final · heaviest topics

Rebuild the A4 sheet double-sided, because the final allows double-sided and covers the whole semester
Drill the bias-variance decomposition and the log-transform (Jensen) bias result, because both appear as multiple-choice traps
Practise reading residual plots to judge whether a model is correctly specified
Rehearse hand-written Python: a grid search, NumPy dot products, and a fit/predict loop
Be able to derive and explain the least-squares solution and why logistic regression needs gradient descent
Sit full timed past-style papers across all three sections under exam conditions
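The hand-written Python item above (a grid search, NumPy dot products, and a fit/predict loop) could look something like the following sketch. Everything specific here is an assumption for illustration: the simulated quadratic data, the 40/20 train/validation split, the grid of polynomial degrees 1 to 8, and the helper name design are not from the course.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the sketch is repeatable

# Hypothetical data: a noisy quadratic relationship
x = np.linspace(-2, 2, 60)
y = 1.0 + 2.0 * x - 1.5 * x ** 2 + rng.normal(scale=0.5, size=x.size)

# Train/validation split: shuffle the indices, keep 40 for training
idx = rng.permutation(x.size)
train, val = idx[:40], idx[40:]

def design(x, degree):
    """Polynomial basis with columns 1, x, x^2, ..., x^degree."""
    return np.vander(x, degree + 1, increasing=True)

results = {}
for degree in range(1, 9):                      # the grid of candidate models
    X_train = design(x[train], degree)
    beta, *_ = np.linalg.lstsq(X_train, y[train], rcond=None)   # fit
    train_mse = np.mean((y[train] - X_train @ beta) ** 2)
    val_pred = design(x[val], degree) @ beta                     # predict
    val_mse = np.mean((y[val] - val_pred) ** 2)
    results[degree] = (train_mse, val_mse)

# Select on validation MSE, not training MSE
best_degree = min(results, key=lambda d: results[d][1])
```

Training MSE can only fall as the degree grows, because each model nests the previous one, which is why the selection uses the validation MSE instead.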

The mistakes that cost marks

Treating the report as the whole grade. The 30% report cannot save a weak exam, because 70% of the mark is two closed-book pen-and-paper exams.

One note sheet for both exams. The mid-semester allows a single-sided A4 sheet but the final allows a double-sided sheet. Prepare two different sheets.

Assuming the log-linear prediction is unbiased. Under log(y) = Xb + e, the naive prediction exp(Xb-hat), the exponential of the fitted values, is biased downward as a prediction of y by Jensen's inequality. This is a classic multiple-choice trap.

Picking the model with the lowest training error. Lowest training MSE does not mean the best model. The optimal model minimises expected prediction error, not training error.

Using outside notation or methods. You must show all working in short answer and use the course's own notation. Outside notation or outside methods can lose marks.
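The log-linear trap in the list above follows from a short argument. As a sketch in the same symbols, log(y) = Xb + e, with two assumptions that the page does not state: e is independent of X, and for the last line e is normal with mean zero and variance sigma squared.

```latex
\begin{aligned}
\mathbb{E}[y \mid X] &= \exp(Xb)\,\mathbb{E}[\exp(e)] \\
\mathbb{E}[\exp(e)] &\ge \exp\big(\mathbb{E}[e]\big) = 1
  \qquad \text{(Jensen's inequality, since } \exp \text{ is convex)} \\
\mathbb{E}[y \mid X] &= \exp(Xb)\,\exp\!\big(\sigma^2 / 2\big)
  \qquad \text{if } e \sim N(0, \sigma^2)
\end{aligned}
```

So the naive prediction targets exp(Xb), which sits below the conditional mean of y. Under the normal-error assumption the correction factor is exp(sigma squared over 2), with sigma squared estimated from the residuals of the log-scale regression.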

Build a study plan with Sia → Drill the back-half topics →

Teaching team

Who teaches BUSS6002

The bios below are factual. The star ratings are not ours: they are impressions from students who have taken the unit, so you can hear from people who sat in the lectures.

Unit coordinator

Dr Yuning Zhang

Lecturer in the Discipline of Business Analytics and lead of the BUSS6002 teaching team. Researches Bayesian inference in actuarial and insurance analytics. Staff profile

Student rating: no student ratings yet

Lecturer (Business Analytics)

Dr Wilson Ye Chen

Senior Lecturer in the Discipline of Business Analytics, contributing the course materials and assessment. Researches efficient computational methods for Bayesian inference and statistical tools for financial time-series analysis. Staff profile

Student rating: no student ratings yet

Lecturer (Marketing strand)

Dr Jiang Qian

Lecturer in the Discipline of Marketing. Researches quantitative models and machine-learning methods for marketing insight from large-scale structured and unstructured data. Staff profile

Student rating: no student ratings yet

Lecturer (Business Information Systems strand)

Associate Professor Manoj A. Thomas

Associate Professor in the Discipline of Business Information Systems. Researches emerging technologies, data science and social computing applied to public health, education and e-commerce. Staff profile

Student rating: no student ratings yet

Teaching team as listed in the unit materials reviewed. AskSia does not rate lecturers; star ratings are submitted by students who have taken BUSS6002.

Formula & concept sheet

The vocabulary and formulas you must own

Exploratory data analysis (EDA): Discovering patterns, spotting anomalies and checking assumptions using summary statistics and graphics such as histograms, boxplots, Q-Q plots and scatter plots.
Four sample moments: Mean, variance, skewness and kurtosis, the standard numerical summary of a variable's distribution (location, spread, symmetry, tail behaviour).
Sample quantile: The value below which a given fraction of the sample falls; underlies the boxplot, the quartiles and the percentiles.
Inner product: The sum over i of x_i times y_i, the dot product of two vectors; the basis for projections, norms and correlation.
2-norm: The square root of the inner product of a vector with itself, the Euclidean length of the vector; the 2-norm of the difference x − y is the Euclidean distance used in k-means clustering.
k-means: An unsupervised clustering algorithm that minimises the within-cluster sum of squares by iterating: assign each point to its nearest centroid, then recompute centroids, until convergence.
Least-squares estimator: The analytic solution for the coefficients of a linear regression model, given by (X transpose X) inverse times X transpose y.
Residual diagnostics: Checking model assumptions via residuals: zero conditional mean and constant variance; a patterned residual plot signals misspecification.
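The least-squares estimator on the sheet, (X transpose X) inverse times X transpose y, can be checked numerically with a few lines of NumPy. This is a minimal sketch on hypothetical data (five observations, one feature); solving the normal equations with np.linalg.solve is a choice made here for numerical stability and gives the same estimator as forming the inverse.

```python
import numpy as np

# Hypothetical data: five observations of one feature and one response
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.0, 4.0, 8.0, 9.0])

# Design matrix with a leading column of ones for the intercept
X = np.column_stack([np.ones_like(x), x])

# Normal equations: (X^T X) beta = X^T y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # [intercept, slope]

# Residuals: these are what a residual plot would display
residuals = y - X @ beta_hat

print(beta_hat, residuals)
```

By construction the residuals are orthogonal to every column of X, so with an intercept they sum to zero. That is a useful sanity check on a hand calculation.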

Log-transform back-transform bias: Under log(y) = Xb + e, the naive prediction exp(Xb-hat) is downward biased by Jensen's inequality, so a bias-correction factor is needed.
Logistic regression: A probabilistic classifier modelling the probability of the positive class via the logit link, fitted by maximising the Bernoulli likelihood (equivalently, minimising the cross-entropy loss), with no closed-form solution.
Confusion matrix: The two-by-two table of true positives, false positives, false negatives and true negatives, from which accuracy, recall, specificity, precision and F1 are computed.
Bias-variance tradeoff: Expected prediction error decomposes into irreducible error plus bias squared plus variance; the optimal model balances bias against variance, not training error.
Expected prediction error: The expected squared difference between the outcome and the prediction, the quantity the optimal model minimises; validation MSE is its empirical approximation.
Maximum likelihood estimation (MLE): Choosing parameters that maximise the log-likelihood; for a Gaussian linear model the MLE coincides with least squares.
Gradient descent: An iterative optimiser that updates the parameters in the direction of the negative gradient, repeated to convergence. A step size that is too large can overshoot and diverge; one that is too small converges very slowly. Used where no closed form exists, as in logistic regression.
Wide vs tall data: Wide data has many features relative to rows; tall data has many rows relative to features. Each calls for different big-data strategies (dimension reduction vs sampling or distributed compute).
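To connect the logistic regression, MLE and gradient descent entries above, here is a minimal sketch of gradient descent on the negative Bernoulli log-likelihood. The six data points, the step size of 0.1 and the 500 iterations are illustrative assumptions, not course-prescribed values; the classes deliberately overlap so that the maximum likelihood estimate is finite.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical toy data: one feature plus an intercept column
x = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
y = np.array([0, 0, 1, 0, 1, 1])
X = np.column_stack([np.ones_like(x), x])

def neg_log_likelihood(beta):
    p = sigmoid(X @ beta)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

beta = np.zeros(2)
step_size = 0.1                      # illustrative choice
losses = [neg_log_likelihood(beta)]
for _ in range(500):
    gradient = X.T @ (sigmoid(X @ beta) - y)   # gradient of the negative log-likelihood
    beta = beta - step_size * gradient          # move against the gradient
    losses.append(neg_log_likelihood(beta))

final_gradient = X.T @ (sigmoid(X @ beta) - y)
```

Setting the gradient to zero gives equations that are nonlinear in beta, which is why no closed-form solution exists and an iterative method is used. Re-running the loop with a much larger step size is a quick way to see the overshoot described in the gradient descent entry.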

Common acronyms: EDA · IQR · MLE · MSE · EPE · TPR · FPR · TNR · ISLR · CRISP-DM.
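The confusion-matrix metrics on the sheet, including the TPR, FPR and TNR acronyms, can be computed directly from the four cell counts. The labels and predictions below are hypothetical values for ten cases, chosen only to make the counts easy to verify by hand.

```python
import numpy as np

# Hypothetical true labels and predictions (1 = positive class)
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 1, 0, 0, 0, 0, 0])

tp = int(np.sum((y_true == 1) & (y_pred == 1)))   # true positives
fp = int(np.sum((y_true == 0) & (y_pred == 1)))   # false positives
fn = int(np.sum((y_true == 1) & (y_pred == 0)))   # false negatives
tn = int(np.sum((y_true == 0) & (y_pred == 0)))   # true negatives

accuracy = (tp + tn) / (tp + fp + fn + tn)
recall = tp / (tp + fn)          # TPR, also called sensitivity
specificity = tn / (tn + fp)     # TNR
fpr = fp / (fp + tn)             # FPR, equal to 1 - specificity
precision = tp / (tp + fp)
f1 = 2 * precision * recall / (precision + recall)

print(tp, fp, fn, tn, accuracy, recall, specificity, precision, f1)
```

With imbalanced data, accuracy can look high while recall is poor, so it is worth practising all of these from the same table.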

Drill these as flashcards → Map them with Sia →

What students say

What students actually say about BUSS6002

Recurring themes from student reviews, paraphrased in our own words.

How students revise

A revision-heavy unit where consolidated notes are sought after
High demand for mid-semester and exam summaries against a large enrolled cohort

Make your own notes and flashcards →

Before the exams

Students actively share past assignment guidelines and practice or exam materials

Get instant walkthroughs →

Recurring student opinions, paraphrased and aggregated, not official course information.

Set texts

The prescribed reading

The syllabus references map straight onto these.

Reference · regression, classification, model selection

An Introduction to Statistical Learning (ISLR)

James, G., Witten, D., Hastie, T. & Tibshirani, R. Publisher page

Reference · clustering

Pattern Recognition and Machine Learning, Ch 9.1 (k-means)

Bishop, C. M.

Reference · EDA

NIST/SEMATECH e-Handbook of Statistical Methods

NIST/SEMATECH. Publisher page

Where it fits

Prerequisites, related units & why it matters

No formal prerequisites. Corequisite: QBUS5001 or QBUS5002. Assumed knowledge: basic mathematics including probability, linear algebra and calculus. The unit starts from Python basics, so prior Python is helpful but not assumed in detail.

QBUS5001 · Foundations of Data Analytics for Business · Corequisite analytics foundation taken alongside BUSS6002
BUSS1020 · Quantitative Business Analysis · Undergraduate quantitative-methods grounding this unit assumes
DATA3404 · Scalable Data Management · Complementary data-engineering and scalable-storage angle
Explore · All Business & Economics units · USyd discipline hub

Why it matters beyond the grade. BUSS6002 turns a business student into someone who can both run a model in Python and explain why it works. The matrix algebra, regression inference, bias-variance and MLE you build here are the maths you carry into later quantitative units and into analytics, data-science and marketing-analytics roles.

Your BUSS6002 study toolkit

Study the unit with Sia, not just read about it

Each tool already knows BUSS6002: your syllabus, your texts, and where the marks are. Grouped by how you study, from first contact to exam week.

1 · Learn it · understand the material

💬 AI tutor: Ask anything about BUSS6002 and get step-by-step answers.
📤 Explain my notes: Upload your slides or lecture and Sia breaks them down.
📑 Topic summariser: Condense a week into the essentials you actually need.

2 · Practise it · test yourself

📝 Practice quiz generator: Unlimited exam-style MCQs and short-answer on any topic.
📊 Past paper analysis: See what the exams actually test, topic by topic.
✓ Assignment & problem help: Work through problem sets one step at a time.

3 · Revise & cram · lock it in before the exam

🃏 Flashcards: Key concepts and formulas as a spaced-repetition deck.
📋 Cheatsheet maker (new): Auto-build a one-page exam cheatsheet for the unit.
🧠 Mindmap generator: See how the topics connect on one visual map.

4 · Discuss it · compare notes

👥 Community Q&A: Ask other BUSS6002 students and share what worked.

FAQ

Frequently asked questions

How is BUSS6002 assessed?

Three pieces: a mid-semester exam (25%, covers Weeks 1 to 6), an individual Python report (30%, released around Week 10), and a final exam (45%, covers the whole semester). You pass on 50% overall with no component hurdle.

Are the exams open-book?

No. Both exams are closed-book, in-person and pen-and-paper. You may bring a non-programmable calculator and one A4 handwritten note sheet, single-sided for the mid-semester and double-sided for the final, plus a physical translation dictionary.

Do I need to know Python beforehand?

It helps but is not assumed in detail, because the unit starts from Python basics. The cohort is mixed, from confident coders to first-timers. What is assumed is basic probability, linear algebra and calculus, and you will write Python by hand in both exams.

How much maths is there?

A lot, and you do it by hand: vectors and norms, matrix algebra, the least-squares solution, regression inference, the bias-variance decomposition, and maximum likelihood estimation with gradient descent. Every formula is expressible in standard notation, but you are expected to derive and apply it without a computer.

What is the hardest part?

The back half (classification, model selection and MLE optimisation), combined with the fact that the 45% final tests everything at once on paper. Bias-variance, the log-transform (Jensen) bias result, and gradient descent are common stumbling points and frequent exam material.

What should I put on my A4 sheet?

Every formula in the course's own notation: the four moments, the inner product and 2-norm, the k-means update, the least-squares solution with its assumptions, the confusion-matrix metrics, the bias-variance decomposition, and the gradient-descent update rule. Build it from Week 1 and make two versions, single-sided for the mid-semester and double-sided for the final.

Study BUSS6002 with Sia

Work through knowledge discovery, linear algebra I, clustering and the rest of the unit with a tutor that knows it and quizzes you on the topics the assessments weight most heavily.

Start studying with Sia