University of Sydney · S1 2026 · FACULTY OF BUSINESS & ECONOMICS

QBUS5001 · Foundation In Data Analytics For Business

One subject, every graph, every model, every mark.

50% final exam · hurdle · 14 chapters · 7-page Bible

Written in our own words: no uploaded lecturer files

Built to mirror S1 2026 · updated this semester

Chapter 2 of 11 · QBUS5001

Descriptive Statistics & Association

Module 1 builds the descriptive toolkit: measures of centre (mean, median, mode), measures of spread (range, variance, standard deviation, IQR, coefficient of variation) and measures of association (covariance and correlation). The headline idea is that covariance tells you the direction of a linear relationship but its size is scale-dependent, whereas correlation standardises it onto [−1, 1] so you can read off strength.
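As a quick illustration of the three measures of centre, here is a minimal Python sketch on a hypothetical sample of weekly sales in which one store is an outlier. It shows why the median is often preferred for skewed data: the outlier drags the mean upward but leaves the median untouched. The data values are invented for illustration.

```python
import statistics

# Hypothetical sample: weekly sales (units) for 7 stores; 95 is an outlier.
sales = [12, 15, 15, 18, 20, 22, 95]

mean_sales = statistics.mean(sales)      # arithmetic mean x̄, pulled upward by the outlier
median_sales = statistics.median(sales)  # middle value of the sorted data, robust to the outlier
mode_sales = statistics.mode(sales)      # most frequent value

print(f"mean={mean_sales:.2f}, median={median_sales}, mode={mode_sales}")
# prints: mean=28.14, median=18, mode=15
```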

These statistics are the raw materials for everything later: the standard error is built from the SD, regression is built from covariance, and confidence intervals are centred on the sample mean. The Excel functions CORREL and COVARIANCE.S do the arithmetic.

In this chapter

What this chapter covers

01. Mean: population μ vs sample x̄
02. Median, mode and when each is preferred
03. Variance and standard deviation (note the n−1 divisor for samples)
04. Range and interquartile range (IQR = Q₃ − Q₁)
05. Coefficient of variation: CV = s/x̄ for unit-free comparison
06. Sample covariance and its direction-only meaning
07. Sample correlation r and the [−1, 1] scale
08. Scale-dependence: why correlation standardises covariance

Worked example

Standard deviation, CV and correlation reasoning

Q [5 marks]. Two delivery routes are timed (minutes per trip). Route A has mean 30 with sample SD 6; Route B has mean 80 with sample SD 10. (a) Which route is relatively more variable? (b) If the sample covariance between trip time and distance is positive but small in raw units, what does the correlation coefficient add that covariance does not?

1 mark · (a) Compute the coefficient of variation for each route: CV(A) = 6/30 = 0.20 (20%).
1 mark · CV(B) = 10/80 = 0.125 (12.5%).
1 mark · Compare: although Route B has the larger absolute SD (10 > 6), Route A is relatively more variable because its CV (20%) exceeds B's (12.5%).
1 mark · (b) The sign of the covariance signals direction (here positive: longer distances tend to go with longer trip times), but its magnitude depends on the units (minutes × km).
1 mark · Correlation r divides the covariance by the product of the two SDs, giving a unit-free number in [−1, 1] that measures the strength of the linear relationship and can be compared across datasets.
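Part (a) is a two-line calculation, but a small sketch makes the comparison rule explicit: compute CV = s/x̄ for each route from the summary figures in the question and pick the larger. The function name `coefficient_of_variation` is just an illustrative helper.

```python
def coefficient_of_variation(sd, mean):
    """CV = s / xbar: spread relative to the mean, with no units."""
    return sd / mean

# (mean, sample SD) in minutes, taken from the question.
routes = {"A": (30, 6), "B": (80, 10)}

cvs = {name: coefficient_of_variation(sd, mean) for name, (mean, sd) in routes.items()}
for name, cv in cvs.items():
    print(f"Route {name}: CV = {cv:.3f} ({cv:.1%})")

more_variable = max(cvs, key=cvs.get)  # route with the larger relative spread
print("Relatively more variable:", more_variable)
# prints:
# Route A: CV = 0.200 (20.0%)
# Route B: CV = 0.125 (12.5%)
# Relatively more variable: A
```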

Route A is relatively more variable (CV 20% vs 12.5%) despite a smaller absolute SD. Correlation adds a standardised, unit-free measure of strength on [−1, 1], whereas covariance only gives a scale-dependent direction.

Sia tip — Use CV whenever the question compares variability across series with different means or units — raw SD alone can mislead, exactly as it does here.

Glossary

Key terms

Coefficient of variation (CV): CV = s/x̄ (or σ/μ), a unit-free measure of relative spread used to compare variability across datasets with different means or units.
Sample covariance: s(X,Y) = (1/(n−1))Σ(xᵢ−x̄)(yᵢ−ȳ), measuring the direction of a linear relationship; its magnitude depends on the variables' units.
Sample correlation (r): r = s(X,Y)/(sₓ s_y), a standardised covariance lying in [−1, 1]: near +1 strong positive, near −1 strong negative, near 0 no linear relationship.
Interquartile range (IQR): IQR = Q₃ − Q₁, the spread of the middle 50% of the data; robust to outliers and used to flag them in a box-and-whisker plot.
Sample variance (s²): s² = (1/(n−1))Σ(xᵢ−x̄)²; the n−1 divisor (Bessel's correction) makes it an unbiased estimator of the population variance σ².
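The IQR entry mentions flagging outliers in a box-and-whisker plot. A widely used convention, assumed here and possibly different from the exact rule in your course, flags points lying more than 1.5 × IQR below Q₁ or above Q₃. Quartile definitions also vary between textbooks and software, so this sketch's choice of `method="inclusive"` in `statistics.quantiles` is an assumption; another method can give slightly different quartiles. The data are hypothetical.

```python
import statistics

# Hypothetical sample with one suspiciously large value.
data = [12, 15, 15, 18, 20, 22, 95]

# Quartiles (Q1, median, Q3); the interpolation method is an assumption.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1                       # spread of the middle 50% of the data

lower_fence = q1 - 1.5 * iqr        # assumed 1.5 × IQR rule
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in data if v < lower_fence or v > upper_fence]

print(f"Q1={q1}, Q3={q3}, IQR={iqr}, fences=({lower_fence}, {upper_fence})")
print("flagged:", outliers)
```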

FAQ

Descriptive Statistics & Association FAQ

Why divide by n−1 instead of n for the sample variance?

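Because the sample mean x̄ is itself estimated from the same data, the deviations (xᵢ − x̄) are on average smaller than the deviations from the true mean μ, so dividing their squares by n would underestimate σ². Dividing by n−1 (the degrees of freedom) corrects this downward bias and makes s² an unbiased estimator of σ². The population variance, computed from all N values around the known μ, divides by N.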
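A short simulation makes the bias visible. The sketch draws many samples of size n = 5 from a hypothetical Normal population with σ² = 4 and averages two versions of the variance: `statistics.pvariance` (which, with `mu` omitted, measures deviations from the sample mean and divides by n) and `statistics.variance` (divides by n−1). The seed, population and sample size are arbitrary assumptions; in theory the n-divisor version averages σ²(n−1)/n = 3.2 here.

```python
import random
import statistics

rng = random.Random(5001)   # fixed (arbitrary) seed so the run is reproducible
mu, sigma = 50.0, 2.0       # hypothetical population N(50, 2^2), so sigma^2 = 4
n, reps = 5, 10_000         # small samples make the bias easy to see

divide_by_n, divide_by_n_minus_1 = [], []
for _ in range(reps):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    divide_by_n.append(statistics.pvariance(sample))         # sum of squares / n
    divide_by_n_minus_1.append(statistics.variance(sample))  # sum of squares / (n - 1)

avg_n = statistics.fmean(divide_by_n)
avg_n_minus_1 = statistics.fmean(divide_by_n_minus_1)
print(f"true sigma^2:              {sigma ** 2}")
print(f"average with n divisor:    {avg_n:.3f}  (theory {sigma ** 2 * (n - 1) / n})")
print(f"average with n-1 divisor:  {avg_n_minus_1:.3f}")
```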

When should I report correlation rather than covariance?

Whenever you want to judge the strength of a relationship or compare relationships across different variable pairs. Covariance is scale-dependent, so its magnitude cannot be interpreted on its own; correlation is unit-free and bounded in [−1, 1].

Does a correlation near zero mean the variables are unrelated?

It means no linear relationship. Two variables can have r near 0 yet a strong non-linear (e.g. U-shaped) relationship, which a scatter plot would reveal. Correlation only captures the linear component.

Study strategy

Exam move

Compute every descriptive measure by hand on one small dataset, then reproduce it with Excel (AVERAGE, STDEV.S, CORREL, COVARIANCE.S) so you trust both routes under exam conditions. Internalise the slogan "covariance = direction, correlation = strength", because regression in Module 10 reuses exactly the covariance-over-variance structure for the slope: b₁ = s(X,Y)/sₓ².
