University of Sydney · S1 2026 · FACULTY OF BUSINESS & ECONOMICS

BUSS6002 · Data Science In Business

- one subject, every graph, every model, every mark

45% final exam · 50% pass hurdle · 11 chapters · 120-page Bible

Our own words - no uploaded lecturer files

Built to mirror S1 2026 · updated this semester

The Complete Exam Bible · S1 2026


BUSS6002 Data Science in Business is a University of Sydney postgraduate unit that deliberately weaves together business analytics, marketing and information systems into one data-science spine, taking you from the business framing of data through the statistical and linear-algebra machinery behind clustering, regression, classification and model selection — with hands-on Python. Across eleven examinable chapters you cover big-data thinking and the CRISP-DM knowledge-discovery process, EDA, vectors and matrices, K-means segmentation, linear and logistic regression, feature engineering, data ethics, the bias–variance tradeoff, maximum likelihood and gradient descent, and big-data scalability. The catch is that it looks like a business unit but is examined like a quantitative-methods unit: a 25% mid-semester exam, a 30% coded assignment, and a 45% closed-restricted final with hand-written Python and short-answer derivations. The exam rewards three things — doing the maths by hand in the course's exact notation, diagnosing residual plots and selecting models via bias–variance, and writing runnable NumPy/sklearn from memory — while still paying out the easy framework marks that under-prepared students drop. A result of 65 (not just the 50% pass) is the real bar for anyone continuing to QBUS6810/6840.

Contents · the whole subject, one map

What BUSS6002 covers

BUSS6002 runs as eleven examinable chapters that build from the business framing of data through the linear-algebra and statistical machinery that the closed-book exams actually test — here is the full map.

01 Foundations of Data Science in Business: History of data in marketing · big vs small data · data as a decision tool · data-science perspectives
02 Organisational Data & Analytical Capabilities: Data lifecycle · learning loops · DS teams · decision environments · analytical-capability types · data types & dataset structure
03 Knowledge Discovery & Exploratory Data Analysis: KDDA · CRISP-DM · Snail Shell process · EDA techniques · data quality (missing data & outliers)
04 Linear Algebra I: Vectors & Scientific Computing: Vectors · vector operations · geometric intuition · NumPy scientific computing
05 Clustering & Customer Segmentation: Unsupervised learning · K-means algorithm · distance/similarity · customer segmentation application
06 Matrices & Linear Regression: Matrix operations · OLS in matrix form · interpretation · regression diagnostics · scientific computing II
07 Feature Engineering & Data Ethics: Ethical & legal issues · responsible AI · privacy/fairness · feature transformations · unstructured/text data
08 Classification & Logistic Regression: Classifiers · evaluation metrics · logistic regression · log-odds & sigmoid · predictive analytics in marketing
09 Model Evaluation & Selection: Measuring marketing-campaign success · bias–variance tradeoff · validation MSE · selecting the optimal model
10 Maximum Likelihood & Optimisation: Maximum Likelihood Estimation · log-likelihood · analytic vs iterative solutions · gradient descent
11 Big Data Solutions: Wide vs tall data · algorithm complexity · scalable computational approaches

Assessment

How BUSS6002 is assessed

Component	Weight	Format
Mid-semester exam	25%	In-person pen-and-paper, 1 hour, Weeks 1–6; MCQ 12 + Short answer 15 + Python code 8; 1 A4 single-sided handwritten note allowed
Individual assignment	30%	Individual Python/sklearn coding task + written report; GenAI reflection required if AI used
Final exam	45%	In-person pen-and-paper, 2h + 10min reading, whole semester; MCQ 12 + Short answer 27 + Python code 6 = 45 marks; 1 A4 double-sided handwritten note allowed
Pass requirement	Hurdle	Must obtain at least 50% overall; no component hurdle ('There are no other requirements')

Worked example · free

Logistic regression: predicted probability and the decision threshold

Q [4 marks]. A telecom firm fits a logistic regression for whether a customer will churn next quarter (y = 1) versus stay (y = 0): P(y = 1 | x) = 1 / (1 + exp[−(β₀ + β₁x₁ + β₂x₂)]), with β₀ = −2.0, β₁ = 2.0 and β₂ = 1.0. Here x₁ is the standardised months since last engagement and x₂ is the number of unresolved support tickets. For a customer with x₁ = 1.5 and x₂ = 1, compute the predicted probability of churn and state the predicted class under the default decision threshold τ = 0.5.

+1 State the model and compute the linear predictor (the log-odds): η = β₀ + β₁x₁ + β₂x₂ = −2.0 + (2.0)(1.5) + (1.0)(1) = −2.0 + 3.0 + 1.0 = 2.0. This score lives on the log-odds scale, not the probability scale.
+1 Map the log-odds to a probability with the sigmoid (the inverse-logit activation): P̂(y = 1 | x) = 1 / (1 + exp(−η)) = 1 / (1 + exp(−2.0)).
+1 Evaluate numerically: exp(−2.0) ≈ 0.135, so P̂(y = 1 | x) = 1 / (1 + 0.135) = 1 / 1.135 ≈ 0.88.
+1 Apply the default threshold τ = 0.5: because 0.88 > 0.5, classify the customer as y = 1 (predicted to churn). Equivalently, a positive log-odds (η > 0) always gives P̂ > 0.5.

P̂(y = 1 | x) ≈ 0.88, so the customer is classified as a churn risk (ŷ = 1).

Sia tip — Always compute the log-odds η first, then push it through the sigmoid — never plug raw xᵀβ in as if it were already a probability. Sign check: positive log-odds ⇒ P̂ > 0.5, the more positive the closer to 1. Memorise exp(−2.0) ≈ 0.135 for the calculator-restricted exam.
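The arithmetic in this worked example can be checked with a few lines of NumPy. This is a minimal sketch rather than course-supplied code: the names `sigmoid`, `beta`, `x` and `tau` are illustrative, and putting a leading 1 in `x` to carry the intercept is an assumed convention.

```python
import numpy as np

def sigmoid(eta):
    """Inverse logit: maps a log-odds value to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-eta))

# Coefficients from the question: beta_0, beta_1, beta_2
beta = np.array([-2.0, 2.0, 1.0])

# Customer features, with a leading 1 for the intercept (assumed convention)
x = np.array([1.0, 1.5, 1.0])

eta = x @ beta          # linear predictor (log-odds): 2.0
p_hat = sigmoid(eta)    # predicted churn probability: about 0.88

tau = 0.5               # default decision threshold
y_hat = int(p_hat > tau)

print(f"log-odds = {eta:.2f}, P(churn) = {p_hat:.2f}, class = {y_hat}")
```

Writing the intercept into the feature vector keeps the whole calculation as a single inner product, which matches the xᵀβ notation used in the worked solution.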

Glossary

Key terms

Big data (4 V's): Data characterised by Volume, Variety, Velocity and Veracity, in contrast to deliberately collected small data; the unit stresses that value comes from analysis, not from size alone.
CRISP-DM: The canonical knowledge-discovery process model with six phases: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation and Deployment. Knowing the order and purpose of each phase is a recurring exam ask.
Exploratory Data Analysis (EDA): Iteratively summarising and visualising data (sample moments, quantiles, histograms, box-plots, Q-Q plots) to find patterns, spot anomalies and test assumptions before modelling.
Inner product and Euclidean norm: The dot product ⟨x, y⟩ = Σ xᵢyᵢ = xᵀy returns a scalar; the 2-norm ‖x‖₂ = √⟨x, x⟩ measures length. Distance between points is ‖x − y‖₂ — the building block for K-means and OLS.
K-means clustering: An unsupervised algorithm that partitions points into k groups by alternating an assignment step (nearest centroid) and an update step (recompute centroid means), minimising within-cluster sum of squares. It converges to a local, not global, optimum.
OLS estimator: The ordinary-least-squares coefficient vector in matrix form, β̂ = (XᵀX)⁻¹Xᵀy — the closed-form minimiser of the residual sum of squares for linear regression.
Logistic regression (sigmoid and logit): A GLM for a Bernoulli response: the logit link sets log(p/(1−p)) = xᵀβ, and the sigmoid activation p = 1/(1 + exp(−xᵀβ)) maps the log-odds back to a probability. Fitted by maximum likelihood, not least squares.
Confusion matrix and metrics: The 2×2 table of TP, FP, FN, TN from which accuracy, precision, recall (TPR), specificity (TNR), FPR and F1 are computed. Accuracy is misleading under class imbalance.
Bias–variance tradeoff: Expected prediction error decomposes as irreducible σ² + Bias² + Variance. Flexible models cut bias but raise variance (overfit); simple models do the reverse (underfit). Model selection balances the two.
Validation MSE: The mean squared error on a held-out validation set, used as the practical estimate of expected prediction error to choose model complexity; the test set gives the final unbiased estimate. Training MSE is never a selection criterion.
Maximum likelihood and gradient descent: MLE chooses parameters that maximise the (log-)likelihood of the observed data. When there is no closed form (e.g. logistic regression), gradient descent iterates θ_{k+1} = θ_k − α∇f(θ_k) to minimise the negative log-likelihood.
Wide vs tall data and Big-O: Wide data has large p (many predictors) → reduce dimensions or simplify the model; tall data has large n (many observations) → block it or use stochastic gradient descent. Algorithmic cost is reasoned about with Big-O notation.
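As an illustration of the K-means entry above, the assignment and update steps might be written in NumPy roughly as follows. The six points and the choice of initial centroids are hypothetical, and the sketch assumes no cluster ever becomes empty, which a fuller implementation would need to handle.

```python
import numpy as np

# Hypothetical 2-D data: two visually separate groups of three points
X = np.array([[1.0, 2.0], [2.0, 1.0], [2.0, 3.0],
              [8.0, 8.0], [9.0, 10.0], [10.0, 9.0]])

# Assumed initialisation: use the first and fourth points as starting centroids
centroids = X[[0, 3]].copy()

for _ in range(20):
    # Assignment step: Euclidean distance from every point to every centroid
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)

    # Update step: each centroid becomes the mean of its assigned points
    # (assumes every cluster keeps at least one point)
    new_centroids = np.array([X[labels == j].mean(axis=0)
                              for j in range(len(centroids))])

    if np.allclose(new_centroids, centroids):
        break  # centroids stopped moving, so assignments are stable
    centroids = new_centroids

# Within-cluster sum of squares, the objective K-means minimises
wcss = float(((X - centroids[labels]) ** 2).sum())
print(labels, centroids, wcss)
```

Because the result depends on the starting centroids, rerunning with a different initialisation is the usual way to see the local-optimum behaviour the glossary entry mentions.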

FAQ

BUSS6002 FAQ

How is BUSS6002 assessed?

Three components: a 25% in-person mid-semester exam (1 hour, covering Weeks 1–6), a 30% individual Python/sklearn assignment with a written report, and a 45% in-person final exam covering the whole semester. The pass requirement is 50% overall with no individual-component hurdle.

Is there a final exam?

Yes — a 45% in-person, pen-and-paper final (2 hours writing plus 10 minutes reading) over all weeks. It mixes 12 marks of multiple choice, 27 marks of short answer with full working, and 6 marks of hand-written Python code. You may bring one A4 double-sided handwritten note sheet, a non-programmable calculator and a dictionary.

What part of the course is hardest?

The trap is that BUSS6002 looks like a business unit but is examined like a quantitative-methods unit. Students over-focus on the linear algebra, OLS, MLE and gradient descent and then throw away the easy framework marks (CRISP-DM, decision environments, data ethics). The marquee tricky results are residual-plot diagnosis, the bias–variance tradeoff, and the log-linear retransformation bias.
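To make the bias–variance point concrete, here is a small sketch of choosing model complexity by validation MSE. Everything in it is an assumption for illustration: the synthetic sine-shaped data, the 40/20 train/validation split, and the use of polynomial degree as the complexity knob.

```python
import numpy as np

# Hypothetical synthetic data with a fixed seed for reproducibility
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=60)
y = np.sin(3.0 * x) + rng.normal(scale=0.3, size=60)

# Assumed split: first 40 points for training, last 20 for validation
x_tr, y_tr = x[:40], y[:40]
x_va, y_va = x[40:], y[40:]

def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return float(np.mean((y_true - y_pred) ** 2))

degrees = list(range(1, 7))
train_mse, val_mse = [], []
for d in degrees:
    coefs = np.polyfit(x_tr, y_tr, deg=d)   # least-squares polynomial fit
    train_mse.append(mse(y_tr, np.polyval(coefs, x_tr)))
    val_mse.append(mse(y_va, np.polyval(coefs, x_va)))

# Select the degree with the lowest validation MSE, never the lowest training MSE
best_degree = degrees[int(np.argmin(val_mse))]

for d, tr, va in zip(degrees, train_mse, val_mse):
    print(f"degree {d}: train MSE = {tr:.3f}, validation MSE = {va:.3f}")
print("selected degree:", best_degree)
```

Training MSE can only fall as the degree rises because each model nests the previous one, which is why it cannot be used to pick complexity; the validation curve is the one that can turn back up when variance starts to dominate.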

How should I prepare for the exams?

Practise doing the maths by hand in the course's exact notation, since notation is graded. Drill the recurring question types — vector/matrix arithmetic, K-means assignment, OLS by hand, confusion-matrix metrics, logistic probabilities, MLE derivations and Big-O reasoning — and rehearse writing runnable NumPy/sklearn from memory. Build your A4 note sheet around the formulas you reach for most.
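One of the recurring question types listed above, confusion-matrix metrics, is easy to rehearse in plain Python. The counts below are hypothetical and chosen only so the fractions are simple; the variable names are illustrative.

```python
# Hypothetical confusion-matrix counts for a churn classifier
TP, FP, FN, TN = 40, 10, 20, 130

total = TP + FP + FN + TN

accuracy = (TP + TN) / total          # share of all predictions that are correct
precision = TP / (TP + FP)            # of predicted positives, how many are real
recall = TP / (TP + FN)               # true positive rate (TPR)
specificity = TN / (TN + FP)          # true negative rate (TNR)
fpr = FP / (FP + TN)                  # false positive rate = 1 - specificity
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy    = {accuracy:.3f}")     # 0.850
print(f"precision   = {precision:.3f}")    # 0.800
print(f"recall      = {recall:.3f}")       # 0.667
print(f"specificity = {specificity:.3f}")  # 0.929
print(f"FPR         = {fpr:.3f}")          # 0.071
print(f"F1          = {f1:.3f}")           # 0.727
```

Here accuracy looks strong at 0.85 even though a third of actual churners are missed, which is the kind of gap between accuracy and recall that class imbalance produces.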

How much maths and coding does it really involve?

A lot of both. Expect vector and matrix algebra, OLS in matrix form, Euclidean distance, logistic log-odds and sigmoid, the bias–variance decomposition, maximum likelihood and gradient-descent update rules — plus hand-written Python (NumPy/sklearn) on the exam and a coded assignment.
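The gradient-descent update rule mentioned above can be sketched for logistic regression, where maximum likelihood has no closed form. The six observations, the step size `alpha = 0.5` and the iteration count are all assumptions picked for illustration; the objective is the mean negative log-likelihood.

```python
import numpy as np

def sigmoid(z):
    """Inverse logit: maps log-odds to a probability."""
    return 1.0 / (1.0 + np.exp(-z))

def nll(theta, X, y):
    """Mean negative log-likelihood of a Bernoulli (logistic) model."""
    p = sigmoid(X @ theta)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def grad(theta, X, y):
    """Gradient of the mean negative log-likelihood with respect to theta."""
    return X.T @ (sigmoid(X @ theta) - y) / len(y)

# Hypothetical data: intercept column plus one predictor, classes overlap
x1 = np.array([-2.0, -1.0, 0.0, 0.5, 1.0, 2.0])
X = np.column_stack([np.ones_like(x1), x1])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])

theta = np.zeros(2)   # starting point
alpha = 0.5           # assumed learning rate
history = [nll(theta, X, y)]

for k in range(2000):
    # Update rule: theta_{k+1} = theta_k - alpha * gradient at theta_k
    theta = theta - alpha * grad(theta, X, y)
    history.append(nll(theta, X, y))

print("theta:", theta)
print("NLL at start:", history[0], "NLL at end:", history[-1])
```

The classes overlap on purpose: with perfectly separable data the likelihood has no finite maximiser and the coefficients would keep growing instead of settling.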

What mark do I need to continue to QBUS6810 or QBUS6840?

BUSS6002 is a feeder unit: a result of 65 is required to progress to QBUS6810 and QBUS6840, so treat 65 — not just the 50% pass mark — as the real bar if you plan to continue in business analytics.

Is this guide official or affiliated with the University of Sydney?

No. This is an independent AskSia study resource for BUSS6002. It is not produced, endorsed by or affiliated with the University of Sydney; always confirm assessment details, dates and policies against your official Canvas unit outline.

Can I use generative AI in the assignment?

The assignment permits AI use but requires a generative-AI reflection if you used it. Follow the unit's declaration rules exactly, and remember the exams are closed-restricted and hand-written, so your underlying skills still have to be exam-ready.

Study strategy

How to study for the exam

Treat BUSS6002 as two courses sharing one exam. First, lock the framework chapters (Foundations, Organisational Data, Knowledge Discovery/EDA, and Feature Engineering & Data Ethics) early — they are the cheap, high-yield MCQ and short-answer marks that quant-anxious students leave on the table; memorise the 4 V's, the CRISP-DM phases, the four analytics types, the Cynefin contexts and the five ethics principles as crisp checklists. Second, build genuine fluency in the quantitative spine by hand: vector and matrix operations, the OLS estimator (XᵀX)⁻¹Xᵀy, K-means, the logistic log-odds/sigmoid, the bias–variance decomposition, validation MSE, MLE derivations and gradient descent, all written in the course's exact notation because notation is graded. Drill past-style questions until the recurring patterns (residual-plot diagnosis, retransformation bias, model-selection true/false, Big-O ranking) are automatic, rehearse writing runnable NumPy/sklearn from memory, and curate your single A4 note sheet around the formulas and one-line definitions you reach for most. Aim for 65, not 50, if you intend to progress to QBUS6810/6840.
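For the OLS estimator named above, a short NumPy sketch helps connect the by-hand matrix steps to runnable code. The five data points are hypothetical, and solving the normal equations with `np.linalg.solve` rather than forming the inverse explicitly is a choice made here for numerical stability, not something the unit is known to require.

```python
import numpy as np

# Hypothetical data: one predictor, five observations
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Design matrix with a column of ones for the intercept
X = np.column_stack([np.ones_like(x), x])

# Normal equations: (X^T X) beta = X^T y, i.e. beta = (X^T X)^{-1} X^T y
XtX = X.T @ X
Xty = X.T @ y
beta_hat = np.linalg.solve(XtX, Xty)

# Fitted values, residuals and residual sum of squares
y_hat = X @ beta_hat
residuals = y - y_hat
rss = float(residuals @ residuals)

print("beta_hat:", beta_hat)   # intercept 2.2, slope 0.6
print("RSS:", rss)             # 2.4
```

A useful self-check on a hand calculation is that Xᵀ times the residual vector is zero at the least-squares solution, which is easy to confirm with `X.T @ residuals`.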
