University of Sydney · S1 2026 · FACULTY OF BUSINESS & ECONOMICS

ECMT1010 · Introduction To Economic Statistics

50% final exam · hurdle · 14 Chapters · 7-page Bible
Built to mirror S1 2026 · updated this semester
Chapter 2 of 11 · ECMT1010

Describing Data: Centre, Spread & Shape

Week 2 is the toolkit for summarising one or two variables: centre (mean vs median and resistance to outliers), spread (SD, range, IQR and the five-number summary), shape (symmetric vs skewed histograms), location (z-scores) and association (the correlation r). It is examined through multiple-choice questions in the Week-7 test and through short-answer calculations: compute x̄ and s, build a five-number summary, flag outliers with the 1.5×IQR rule, and interpret a z-score and an r in plain English.

In this chapter

What this chapter covers

  • 1. Categorical vs quantitative variables, and choosing the right summary
  • 2. Histograms and shape: symmetric vs left/right-skewed; how skew pulls the mean
  • 3. Centre: mean x̄ vs median; the median is resistant to outliers, the mean is not
  • 4. Spread: standard deviation s, range, IQR and the five-number summary
  • 5. Boxplots and the 1.5×IQR rule for flagging outliers
  • 6. Standardisation: the z-score zᵢ = (xᵢ − x̄)/s as 'distance from the mean in SDs'
  • 7. The 95% rule: for bell-shaped data ~95% of values lie within x̄ ± 2s
  • 8. Correlation r: unit-free, −1 ≤ r ≤ 1, measures only linear association and is not resistant
Worked example

Mean, standard deviation and a z-score

Q [6 marks]. A small shop records daily sales (in $00s) over six days: 18, 26, 22, 14, 33, 21. Find the sample mean and sample standard deviation, and give the z-score of the best day.
  • [1 mark] Compute the mean: x̄ = (18 + 26 + 22 + 14 + 33 + 21)/6 = 134/6 ≈ 22.33 ($00s).
  • [2 marks] Find the deviations from the mean: −4.33, 3.67, −0.33, −8.33, 10.67, −1.33; square and sum them: 18.78 + 13.44 + 0.11 + 69.44 + 113.78 + 1.78 ≈ 217.33.
  • [2 marks] Use the sample divisor (n − 1) = 5: s² = 217.33/5 ≈ 43.47, so s = √43.47 ≈ 6.59 ($00s).
  • [1 mark] Standardise the best day (33): z = (33 − 22.33)/6.59 ≈ 10.67/6.59 ≈ +1.62, so the best day is about 1.6 SD above the mean.
Answer: x̄ ≈ 22.33 ($00s), s ≈ 6.59 ($00s); the best day's z-score is about +1.62, i.e. roughly 1.6 standard deviations above average and inside the 95% (±2s) band.
Sia tip — Always divide by (n − 1) for a sample standard deviation, not n — that is the single most common arithmetic mark-loser here. A z-score has no units and tells you how unusual a value is; |z| above 2 starts to look like an outlier under the 95% rule.
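
The arithmetic in this worked example can be checked with a short Python sketch. It uses only the standard-library `statistics` module, whose `stdev` function applies the (n − 1) divisor; the variable names are illustrative and not part of the course materials.

```python
from statistics import mean, stdev

# Daily sales in $00s, taken from the worked example.
sales = [18, 26, 22, 14, 33, 21]

x_bar = mean(sales)                  # sample mean
s = stdev(sales)                     # sample SD, divisor (n - 1)
z_best = (max(sales) - x_bar) / s    # z-score of the best day (33)

# Standardise every value: the z-scores should have mean 0 and SD 1.
z_scores = [(x - x_bar) / s for x in sales]

print(f"mean = {x_bar:.2f}, s = {s:.2f}, z(best day) = {z_best:.2f}")
# prints: mean = 22.33, s = 6.59, z(best day) = 1.62
print(abs(mean(z_scores)) < 1e-9, abs(stdev(z_scores) - 1) < 1e-9)
```

The last line checks, up to floating-point rounding, that standardising the whole dataset gives mean 0 and SD 1.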
Glossary

Key terms

Mean vs median
The mean is the arithmetic average (Σxᵢ)/n; the median is the middle of the ordered data. The median is resistant to outliers, while the mean is pulled toward the long tail of a skewed distribution.
Standard deviation (s)
A measure of spread, s = √[Σ(xᵢ − x̄)²/(n − 1)] for a sample. It is the typical distance of a value from the mean and uses the (n − 1) divisor for sample data.
Five-number summary & IQR
The five-number summary is {min, Q1, median, Q3, max}; the interquartile range IQR = Q3 − Q1 captures the middle 50% of the data and is the basis of the boxplot and the outlier rule.
1.5×IQR outlier rule
A value is flagged as an outlier if it is below Q1 − 1.5·IQR or above Q3 + 1.5·IQR. It is the standard rule used to draw whiskers and points on a boxplot.
z-score
The standardised value zᵢ = (xᵢ − x̄)/s, the distance of a value from the mean measured in standard deviations. Standardising a dataset gives it mean 0 and SD 1, making different variables comparable.
Correlation (r)
A unit-free measure of the strength and direction of a linear association, with −1 ≤ r ≤ 1. It is symmetric in x and y and is not resistant to outliers. r = 0 means only that there is no linear association; it does not mean there is no relationship, and it does not rule out a causal link.
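
Two of the properties of r listed above, unit-free and symmetric, can be seen in a minimal Python sketch. The function `pearson_r` and the study-hours data are hypothetical, written out by hand for illustration; they are not course code or course data.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation r from sums of products of deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: study hours and test marks for five students.
hours = [1, 2, 3, 4, 5]
marks = [52, 58, 61, 70, 74]
minutes = [60 * h for h in hours]    # same variable, different units

r_hours = pearson_r(hours, marks)      # about 0.99
r_minutes = pearson_r(minutes, marks)  # unchanged: r is unit-free
r_swapped = pearson_r(marks, hours)    # unchanged: r is symmetric

print(round(r_hours, 3), round(r_minutes, 3), round(r_swapped, 3))
```

Changing hours to minutes rescales x but leaves r the same, and swapping the roles of x and y makes no difference either.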
FAQ

Describing Data: Centre, Spread & Shape FAQ

When should I report the median instead of the mean?

Use the median when the data are skewed or contain outliers, because it is resistant: it depends only on the middle of the ordered data, not on how extreme the tail values are. The mean is pulled toward the long tail, so for right-skewed data like incomes or house prices the mean overstates the 'typical' value. For roughly symmetric data the mean and median are close and the mean is fine.
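
A small Python sketch makes the resistance of the median concrete. It reuses the six sales values from the worked example and then swaps in one extreme value; the 330 is a made-up data-entry error used purely for illustration.

```python
from statistics import mean, median

sales = [18, 26, 22, 14, 33, 21]           # $00s, from the worked example
with_outlier = [18, 26, 22, 14, 330, 21]   # hypothetical: 33 mistyped as 330

print(f"original:     mean = {mean(sales):.2f}, median = {median(sales)}")
print(f"with outlier: mean = {mean(with_outlier):.2f}, median = {median(with_outlier)}")
# prints:
# original:     mean = 22.33, median = 21.5
# with outlier: mean = 71.83, median = 21.5
```

One extreme value more than triples the mean while the median does not move at all.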

Why do I divide by n − 1 and not n for the standard deviation?

Because you are computing a sample standard deviation that estimates the population SD. Deviations measured from the sample mean x̄ are, on average, a little smaller than deviations from the true population mean, so dividing their squared sum by n would underestimate the spread. Dividing by (n − 1) corrects this and makes the sample variance s² an unbiased estimator of the population variance. In ECMT1010 you almost always have a sample, so use (n − 1).
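
The effect of the divisor can be illustrated with a tiny exhaustive check in Python. This is a sketch under stated assumptions: the three-value population is made up, and samples of size 2 are drawn with replacement so that every possible sample can be listed. In the standard library, `statistics.variance` divides by (n − 1) and `statistics.pvariance` divides by n.

```python
from itertools import product
from statistics import mean, pvariance, variance

population = [2, 4, 6]                 # hypothetical population
sigma2 = pvariance(population)         # true population variance, 8/3

# Every possible ordered sample of size 2, drawn with replacement.
samples = list(product(population, repeat=2))

# Average each estimator over all nine samples.
avg_n_minus_1 = mean(variance(smp) for smp in samples)   # divisor n - 1
avg_n = mean(pvariance(smp) for smp in samples)          # divisor n

print(round(sigma2, 4), round(avg_n_minus_1, 4), round(avg_n, 4))
# prints: 2.6667 2.6667 1.3333
```

Averaged over all samples, the (n − 1) version recovers the population variance, while the n version comes out too small.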

What does the 95% rule say and when does it apply?

For roughly bell-shaped (symmetric, unimodal) data, about 68% of values lie within one SD of the mean, about 95% within two SDs (x̄ ± 2s), and about 99.7% within three SDs. It lets you judge quickly whether a value is unusual: anything beyond ±2s is in the outer 5%. It only applies to bell-shaped data, so check the histogram first.
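
The rule can be tried out on simulated bell-shaped data. The sketch below is illustrative only: the mean of 50, the SD of 10, the sample size and the seed are arbitrary choices, and the values come from Python's `random.gauss`, not from any course dataset.

```python
import random
from statistics import mean, stdev

rng = random.Random(42)                        # fixed seed for repeatability
data = [rng.gauss(50, 10) for _ in range(10_000)]

x_bar, s = mean(data), stdev(data)

def share_within(k):
    """Proportion of values within k sample SDs of the sample mean."""
    return sum(abs(x - x_bar) <= k * s for x in data) / len(data)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.3f}")
```

The three proportions should land close to 0.68, 0.95 and 0.997. Running the same check on strongly skewed data would generally give different shares, which is why the histogram check comes first.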

What does r = 0 actually mean?

It means there is no linear association between the two variables — but there could still be a strong non-linear (for example U-shaped) relationship that r cannot detect. Always plot the scatterplot first. Also remember r is not resistant, so a single outlier can drag it up or down, and a strong r still never proves causation.
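
Both warnings can be demonstrated in a few lines of Python. This is a minimal sketch: `pearson_r` is a hand-written helper and all the data points are made up for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation r from sums of products of deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# 1. A perfect U-shape: y is completely determined by x, yet r = 0.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]
r_u_shape = pearson_r(xs, ys)

# 2. A weak association, then the same points plus one extreme outlier.
x_weak = [1, 2, 3, 4, 5]
y_weak = [3, 1, 4, 2, 3]
r_weak = pearson_r(x_weak, y_weak)
r_with_outlier = pearson_r(x_weak + [20], y_weak + [20])

print(round(r_u_shape, 3), round(r_weak, 3), round(r_with_outlier, 3))
```

The U-shaped data give r = 0 despite a perfect relationship, and the single added point lifts a weak r of about 0.14 to above 0.9.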

Study strategy

Exam move

Build a fixed routine for any single dataset: order the values, write the five-number summary, compute x̄ and s (with the n − 1 divisor), then apply the 1.5×IQR rule before you trust the mean. Practise reading shape off a histogram and predicting whether the mean sits above or below the median from the direction of the skew; examiners love this conceptual MCQ. Learn the z-score as a portable 'how unusual' ruler and the 95% rule as its companion. For association questions, internalise the five facts about r (unit-free, bounded by ±1, symmetric, not resistant, linear-only) and always pair an r value with a one-sentence plain-English interpretation, because the interpretation earns marks a bare number does not.
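
The single-dataset routine can be sketched as one Python function. Two assumptions are worth flagging: the function name `describe` is hypothetical, and the quartiles are computed as the medians of the lower and upper halves of the ordered data (leaving out the middle value when n is odd). That is one common textbook convention; software packages and your unit's notes may use a slightly different quartile rule, so check which one is expected.

```python
from statistics import mean, median, stdev

def describe(values):
    """Five-number summary, mean, sample SD and 1.5*IQR outliers.

    Assumes at least two values. Quartiles are medians of the halves.
    """
    xs = sorted(values)                          # step 1: order the values
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half:] if n % 2 == 0 else xs[half + 1:]
    q1, q3 = median(lower), median(upper)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "five_number": (xs[0], q1, median(xs), q3, xs[-1]),
        "mean": mean(xs),
        "s": stdev(xs),                          # divisor (n - 1)
        "fences": (low_fence, high_fence),
        "outliers": [x for x in xs if x < low_fence or x > high_fence],
    }

sales = [18, 26, 22, 14, 33, 21]                 # $00s, worked example data
summary = describe(sales)
print(summary["five_number"], summary["fences"], summary["outliers"])
# prints: (14, 18, 21.5, 26, 33) (6.0, 38.0) []

# Adding a hypothetical seventh day of 60 trips the upper fence.
print(describe(sales + [60])["outliers"])        # prints: [60]
```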
