| Measure | Formula | Notes |
|---|---|---|
| Mean | Σx / n | sensitive to outliers |
| Median | middle value | robust |
| Mode | most frequent | can be 0, 1, or many |
| Variance s² | Σ(x − x̄)² / (n − 1) | units² |
| Std dev s | √s² | same units as data |
| IQR | Q3 − Q1 | middle 50%, robust spread |
| Coefficient of variation | s / x̄ | compare across scales |
Empirical rule (normal): 68% within 1σ, 95% within 2σ, 99.7% within 3σ.

Outliers: commonly defined as more than 1.5·IQR beyond Q1 or Q3. Don't blindly delete; investigate first. Outliers can be data errors OR meaningful signals.
For income data, the mean is much higher than the median because of a few extreme earners. Reporting 'average' income hides the typical experience. Use median for skewed data; use mean only when distribution is roughly symmetric.
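A minimal sketch of these summaries in Python (NumPy only; the sample values are made up to mimic right-skewed income data): it computes mean, median, sample standard deviation (n − 1), CV, IQR, and the 1.5·IQR outlier fences.

```python
import numpy as np

# Hypothetical right-skewed "income-like" sample (illustration only)
x = np.array([32, 35, 38, 40, 41, 44, 47, 52, 60, 250], dtype=float)

mean = x.mean()
median = np.median(x)
s = x.std(ddof=1)                      # sample std dev: divides by n - 1
cv = s / mean                          # coefficient of variation

q1, q3 = np.percentile(x, [25, 75])
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr           # common outlier rule: beyond 1.5*IQR
upper_fence = q3 + 1.5 * iqr
outliers = x[(x < lower_fence) | (x > upper_fence)]

print(f"mean={mean:.1f}, median={median:.1f}, s={s:.1f}, CV={cv:.2f}")
print(f"IQR={iqr:.1f}, fences=({lower_fence:.1f}, {upper_fence:.1f}), outliers={outliers}")
```

Note how the single extreme value pulls the mean far above the median, which is exactly the skew trap described above.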
CI = point estimate ± (critical value) × (standard error)

| Parameter | Estimator | SE formula |
|---|---|---|
| μ (σ known) | x̄ | σ/√n |
| μ (σ unknown) | x̄ | s/√n (use t) |
| p | p̂ | √(p̂(1−p̂)/n) |
| μ₁ − μ₂ | x̄₁ − x̄₂ | √(s₁²/n₁ + s₂²/n₂) |
Margin of error: ME = z* · SE. Width of the CI: 2·ME.

What a CI means: if we repeated the sampling process, 95% of constructed CIs would contain the true parameter. NOT "there's a 95% chance the parameter is in this CI" (the true parameter is fixed, not random).
Sample size for desired ME: n = (z*·σ/ME)². Halving ME requires 4× the sample size.
'95% CI: [40, 60].' Don't say '95% probability the mean is between 40 and 60.' Correct: 'we are 95% confident the interval [40, 60] captures the true mean' — emphasizing the procedure's reliability, not a probability of the parameter.
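A sketch of a 95% t-interval for a mean, plus the sample-size formula above (NumPy/SciPy; the data and the target margin of error are made-up assumptions):

```python
import numpy as np
from scipy import stats

x = np.array([48, 52, 55, 43, 60, 49, 51, 58, 46, 54], dtype=float)  # hypothetical sample
n, xbar, s = len(x), x.mean(), x.std(ddof=1)

se = s / np.sqrt(n)                       # standard error of the mean
t_star = stats.t.ppf(0.975, df=n - 1)     # critical value for 95% confidence
me = t_star * se                          # margin of error
print(f"95% CI for mu: ({xbar - me:.1f}, {xbar + me:.1f}), ME = {me:.2f}")

# Sample size for a target ME (z-based; using s as a planning value for sigma)
sigma, target_me = s, 1.0
z_star = stats.norm.ppf(0.975)
n_needed = int(np.ceil((z_star * sigma / target_me) ** 2))
print(f"n needed for ME <= {target_me}: {n_needed}")
```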
Axioms: 0 ≤ P(A) ≤ 1; P(S) = 1; P(A^c) = 1 − P(A)

| Rule | Formula | When |
|---|---|---|
| Addition | P(A∪B) = P(A) + P(B) − P(A∩B) | any two events |
| Multiplication | P(A∩B) = P(A)·P(B|A) | any two events |
| Independence | P(A∩B) = P(A)·P(B) | only if independent |
| Conditional | P(A|B) = P(A∩B)/P(B) | given B occurred |
Bayes: P(A|B) = P(B|A)·P(A) / P(B)

Total probability: P(B) = Σ P(B|A_i)·P(A_i). Bayes "flips the conditional around." Used heavily in business, e.g. P(default | applicant) built from credit segments.
Mutually exclusive: P(A∩B) = 0. Independent: P(A∩B) = P(A)·P(B). These are not the same; in fact mutually exclusive events (with nonzero probabilities) are strongly dependent, since knowing one occurred rules the other out. Don't conflate them.
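A small total-probability/Bayes sketch in the credit-segment spirit of the note above; every number here is a made-up assumption for illustration:

```python
# Hypothetical segments: applicant share and default rate per segment
priors = {"prime": 0.60, "near_prime": 0.30, "subprime": 0.10}            # P(segment)
p_default_given = {"prime": 0.01, "near_prime": 0.05, "subprime": 0.20}   # P(default | segment)

# Total probability: P(default) = sum of P(default | segment) * P(segment)
p_default = sum(p_default_given[s] * priors[s] for s in priors)

# Bayes: P(segment | default) = P(default | segment) * P(segment) / P(default)
posterior = {s: p_default_given[s] * priors[s] / p_default for s in priors}

print(f"P(default) = {p_default:.3f}")
for s, p in posterior.items():
    print(f"P({s} | default) = {p:.3f}")
```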
1. State H₀ (status quo) and H_a (claim).
2. Pick α (typically 0.05). Identify test statistic + distribution.
3. Calculate test statistic from data.
4. Compare to critical value OR compute p-value.
5. Decision: reject H₀ if p < α.
| Test | Use when | Statistic |
|---|---|---|
| z (one mean) | σ known, n large | (x̄ − μ₀)/(σ/√n) |
| t (one mean) | σ unknown | (x̄ − μ₀)/(s/√n) |
| z (proportion) | np₀(1−p₀) ≥ 10 | (p̂ − p₀)/√(p₀(1−p₀)/n) |
| 2-sample t | compare 2 means | (x̄₁ − x̄₂)/SE |
| χ² | fit to expected freqs | Σ(O − E)²/E |
Statistical vs practical significance: with huge n, even tiny effects become statistically significant. Always report effect size, not just p-value.
p-value = P(data at least this extreme | H₀ true). It is NOT P(H₀ true | data). They're conceptually different: the frequentist framework treats H₀ as fixed (either true or false), not random.
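A sketch of steps 2–5 for a one-sample t-test with SciPy (hypothetical data; H₀: μ = 50, α = 0.05), with an effect size to keep statistical vs practical significance separate:

```python
import numpy as np
from scipy import stats

x = np.array([52, 55, 49, 61, 53, 57, 50, 58, 54, 56], dtype=float)  # hypothetical sample
mu0, alpha = 50.0, 0.05

# SciPy computes t = (xbar - mu0) / (s / sqrt(n)) and the two-sided p-value
t_stat, p_value = stats.ttest_1samp(x, popmean=mu0)

print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")

# Effect size (Cohen's d): how large the difference is in SD units
d = (x.mean() - mu0) / x.std(ddof=1)
print(f"Cohen's d = {d:.2f}")
```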
ŷ = b₀ + b₁x, where b₁ = r · (s_y / s_x) and b₀ = ȳ − b₁x̄

| Quantity | What it tells |
|---|---|
| r (correlation) | strength + direction (−1 to 1) |
| R² (coefficient of det) | fraction of variance explained |
| SSE / RMSE | residual error |
| p-value of slope | is b₁ significantly ≠ 0? |
Multiple regression: ŷ = b₀ + b₁x₁ + b₂x₂ + … Each b_i is the change in y per unit x_i, holding others constant. Multicollinearity inflates SE of coefficients.
R² = 1 − SS_res / SS_tot. Adjusted R² penalizes adding variables.

Predicting: for a new observation, use a prediction interval (wider than the CI for the mean response).
High R² doesn't mean the model is right. Always check residual plots for non-linearity, outliers, heteroscedasticity. R² of 0.95 with patterned residuals is misleading; R² of 0.6 with random residuals is more trustworthy.
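A sketch of simple linear regression with scipy.stats.linregress on made-up x/y data, tying the output back to the quantities in the table (slope, intercept, r, R², slope p-value) and to the residual check above:

```python
import numpy as np
from scipy import stats

# Hypothetical data: advertising spend (x) vs sales (y)
x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([2.1, 2.9, 3.6, 4.8, 5.1, 6.3, 6.8, 8.2])

res = stats.linregress(x, y)            # slope b1, intercept b0, r, p-value of slope, SE
print(f"y-hat = {res.intercept:.2f} + {res.slope:.2f}*x")
print(f"r = {res.rvalue:.3f}, R^2 = {res.rvalue**2:.3f}, slope p-value = {res.pvalue:.4g}")

# Always inspect residuals, not just R^2
y_hat = res.intercept + res.slope * x
print("residuals:", np.round(y - y_hat, 2))
```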
| Dist | Use | Mean | Variance |
|---|---|---|---|
| Bernoulli | single yes/no | p | p(1−p) |
| Binomial | n trials, k successes | np | np(1−p) |
| Poisson | rare events / time | λ | λ |
Binomial: P(X = k) = C(n,k) · p^k · (1−p)^(n−k)

Poisson: P(X = k) = e^(−λ) · λ^k / k!

Z = (X − μ) / σ: standardize to use the z-table

Empirical rule: for normal data, ~68% within 1σ, 95% within 2σ, 99.7% within 3σ. The 'six sigma' quality standard corresponds to ~3.4 defects per million (with the conventional 1.5σ shift).
For continuous distributions, P(X = single value) = 0. Only intervals have non-zero probability: P(a < X < b). Don't compute exact-equality probabilities for normal distributions — that's for discrete (binomial, Poisson).
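The formulas above evaluated with scipy.stats; the parameter values are arbitrary examples:

```python
from scipy import stats

# Binomial: P(X = 3) with n = 10 trials, p = 0.2
print(stats.binom.pmf(3, n=10, p=0.2))        # C(10,3) * 0.2**3 * 0.8**7

# Poisson: P(X = 2) with rate lambda = 4 events per interval
print(stats.poisson.pmf(2, mu=4))             # e**-4 * 4**2 / 2!

# Normal: P(X < 85) for X ~ N(mu=70, sigma=10), via standardization
z = (85 - 70) / 10
print(stats.norm.cdf(z))                      # same as stats.norm.cdf(85, loc=70, scale=10)

# For continuous X, only intervals have probability; e.g. the 68% of the empirical rule
print(stats.norm.cdf(1) - stats.norm.cdf(-1))
```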
μ_x̄ = μ, σ_x̄ = σ / √n (standard error of the mean)

The sampling distribution describes how sample means vary across repeated samples. As n grows, σ_x̄ shrinks like 1/√n.
| If sample size n is… | Sampling dist of x̄ |
|---|---|
| any (population normal) | exactly normal |
| large (n ≥ 30) | approximately normal regardless of pop shape |
| small (n < 30, non-normal pop) | not necessarily normal |
SE_p̂ = √(p(1−p)/n). The normal approximation for a proportion applies when np(1−p) ≥ 10 (normal approx to the binomial).

Z = (x̄ − μ) / (σ/√n) uses the CLT for inference about μ.

Sample size for a desired SE: n = (σ/SE_target)². To halve the SE, quadruple n; the marginal benefit of larger samples decreases rapidly.
CLT says sample means are approximately normal — NOT that the original data are normal. A skewed distribution still has normal sample means with large n. Many students confuse 'data is normal' with 'sampling dist is normal.'
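A quick CLT simulation with NumPy, showing that sample means of a skewed population become approximately normal and that their spread matches σ/√n; the exponential population is an arbitrary choice for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
pop_sd = 1.0                        # exponential(scale=1): skewed, mean 1, sd 1

for n in (5, 30, 200):
    # 10,000 repeated samples of size n; take the mean of each
    sample_means = rng.exponential(scale=1.0, size=(10_000, n)).mean(axis=1)
    print(f"n={n:3d}: mean of x-bar = {sample_means.mean():.3f}, "
          f"SD of x-bar = {sample_means.std():.3f}, "
          f"sigma/sqrt(n) = {pop_sd / np.sqrt(n):.3f}")
```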
| Question says… | Use § from | Approach |
|---|---|---|
| 'find mean, median, SD' | § ① | direct formulas; n−1 for sample variance |
| 'is data skewed?' | § ① | compare mean vs median; right-skewed if mean > median |
| 'P(A and B)' | § ② | multiplication rule: P(A)·P(B|A); = P(A)·P(B) if independent |
| 'P(A or B)' | § ② | addition: P(A) + P(B) − P(A∩B) |
| 'P(disease | positive test)' | § ② | Bayes' theorem |
| 'binomial probability of k' | § ③ | C(n,k)·p^k·(1-p)^(n-k) |
| 'normal probability' | § ③ | Z = (X−μ)/σ; z-table or 68-95-99.7 |
| 'rare events per time' | § ③ | Poisson: λ^k·e^(-λ)/k! |
| 'sample mean distribution' | § ④ | CLT: N(μ, σ/√n) |
| 'sample size for SE' | § ④ | n = (σ/SE_target)² |
| 'test μ = a vs μ ≠ a' | § ⑤ | z or t test (depends on σ known/unknown) |
| 'compare 2 group means' | § ⑤ | 2-sample t-test; pooled if equal var |
| 'test proportion' | § ⑤ | z-test for p with √(p₀(1-p₀)/n) |
| 'p-value < α?' | § ⑤ | reject H₀ if smaller |
| '95% CI' | § ⑥ | estimate ± z*·SE (or t*·SE if σ unknown) |
| 'CI for proportion' | § ⑥ | p̂ ± z*·√(p̂(1−p̂)/n) |
| 'CI for difference of means' | § ⑥ | (x̄₁−x̄₂) ± t*·SE_diff |
| 'predict y from x' | § ⑦ | linear regression; ŷ = b₀ + b₁x |
| 'R² interpretation' | § ⑦ | fraction of variance explained |
| 'is slope significant?' | § ⑦ | t-test on b₁; p-value |
| 'multiple regression' | § ⑦ | each b_i: change in y per unit x_i, holding others constant |
'p < 0.001' looks impressive on an exam, but effect size matters just as much. A diet program whose 'significant' effect is a 0.1 kg average loss in n = 10,000 isn't practically useful. Always pair the p-value with the magnitude of the effect.
Strong correlation: associative, not causal. To prove causation, need controlled experiment with random assignment. Observational data alone allows multiple causal explanations.