| Measure | Formula | Notes |
|---|---|---|
| Mean | Σx / n | sensitive to outliers |
| Median | middle value | robust |
| Mode | most frequent | can be 0, 1, or many |
| Variance s² | Σ(x − x̄)² / (n − 1) | units² |
| Std dev s | √s² | same units as data |
| IQR | Q3 − Q1 | middle 50%, robust spread |
| Coefficient of variation | s / x̄ | compare across scales |
Empirical rule (normal): 68% within 1σ, 95% within 2σ, 99.7% within 3σ.

Outliers: commonly defined as more than 1.5·IQR beyond Q1 or Q3. Don't blindly delete; investigate first. Outliers can be data errors OR meaningful signals.
For income data, the mean is much higher than the median because of a few extreme earners. Reporting 'average' income hides the typical experience. Use median for skewed data; use mean only when distribution is roughly symmetric.
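A minimal sketch of these summaries in Python (NumPy only; the sample values are made up to mimic right-skewed income data): it computes mean, median, sample standard deviation (n − 1), CV, IQR, and the 1.5·IQR outlier fences.

```python
import numpy as np

# Hypothetical right-skewed "income-like" sample (illustration only)
x = np.array([32, 35, 38, 40, 41, 44, 47, 52, 60, 250], dtype=float)

mean = x.mean()
median = np.median(x)
s = x.std(ddof=1)                      # sample std dev: divides by n - 1
cv = s / mean                          # coefficient of variation

q1, q3 = np.percentile(x, [25, 75])
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr           # common outlier rule: beyond 1.5*IQR
upper_fence = q3 + 1.5 * iqr
outliers = x[(x < lower_fence) | (x > upper_fence)]

print(f"mean={mean:.1f}, median={median:.1f}, s={s:.1f}, CV={cv:.2f}")
print(f"IQR={iqr:.1f}, fences=({lower_fence:.1f}, {upper_fence:.1f}), outliers={outliers}")
```

Note how the single extreme value pulls the mean far above the median, which is exactly the skew trap described above.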
CI = point estimate ± (critical value) × (standard error)

| Parameter | Estimator | SE formula |
|---|---|---|
| μ (σ known) | x̄ | σ/√n |
| μ (σ unknown) | x̄ | s/√n (use t) |
| p | p̂ | √(p̂(1−p̂)/n) |
| μ₁ − μ₂ | x̄₁ − x̄₂ | √(s₁²/n₁ + s₂²/n₂) |
Margin of error: ME = z* · SE. Width of the CI: 2·ME.

What a CI means: if we repeated the sampling process, 95% of constructed CIs would contain the true parameter. NOT "there's a 95% chance the parameter is in this CI" (the true parameter is fixed, not random).
Sample size for desired ME: n = (z*·σ/ME)². Halving ME requires 4× the sample size.
'95% CI: [40, 60].' Don't say '95% probability the mean is between 40 and 60.' Correct: 'we are 95% confident the interval [40, 60] captures the true mean' — emphasizing the procedure's reliability, not a probability of the parameter.
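A sketch of a 95% t-interval for a mean, plus the sample-size formula above (NumPy/SciPy; the data and the target margin of error are made-up assumptions):

```python
import numpy as np
from scipy import stats

x = np.array([48, 52, 55, 43, 60, 49, 51, 58, 46, 54], dtype=float)  # hypothetical sample
n, xbar, s = len(x), x.mean(), x.std(ddof=1)

se = s / np.sqrt(n)                       # standard error of the mean
t_star = stats.t.ppf(0.975, df=n - 1)     # critical value for 95% confidence
me = t_star * se                          # margin of error
print(f"95% CI for mu: ({xbar - me:.1f}, {xbar + me:.1f}), ME = {me:.2f}")

# Sample size for a target ME (z-based; using s as a planning value for sigma)
sigma, target_me = s, 1.0
z_star = stats.norm.ppf(0.975)
n_needed = int(np.ceil((z_star * sigma / target_me) ** 2))
print(f"n needed for ME <= {target_me}: {n_needed}")
```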
Axioms: 0 ≤ P(A) ≤ 1; P(S) = 1; P(A^c) = 1 − P(A)

| Rule | Formula | When |
|---|---|---|
| Addition | P(A∪B) = P(A) + P(B) − P(A∩B) | any two events |
| Multiplication | P(A∩B) = P(A)·P(B|A) | any two events |
| Independence | P(A∩B) = P(A)·P(B) | only if independent |
| Conditional | P(A|B) = P(A∩B)/P(B) | given B occurred |
Bayes: P(A|B) = P(B|A)·P(A) / P(B)

Total probability: P(B) = Σ P(B|A_i)·P(A_i). Bayes "flips the conditional around." Used heavily in business, e.g. P(default | applicant) built from credit segments.
Mutually exclusive: P(A∩B) = 0. Independent: P(A∩B) = P(A)·P(B). These are not the same; in fact mutually exclusive events (with nonzero probabilities) are strongly dependent, since knowing one occurred rules the other out. Don't conflate them.
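A small total-probability/Bayes sketch in the credit-segment spirit of the note above; every number here is a made-up assumption for illustration:

```python
# Hypothetical segments: applicant share and default rate per segment
priors = {"prime": 0.60, "near_prime": 0.30, "subprime": 0.10}            # P(segment)
p_default_given = {"prime": 0.01, "near_prime": 0.05, "subprime": 0.20}   # P(default | segment)

# Total probability: P(default) = sum of P(default | segment) * P(segment)
p_default = sum(p_default_given[s] * priors[s] for s in priors)

# Bayes: P(segment | default) = P(default | segment) * P(segment) / P(default)
posterior = {s: p_default_given[s] * priors[s] / p_default for s in priors}

print(f"P(default) = {p_default:.3f}")
for s, p in posterior.items():
    print(f"P({s} | default) = {p:.3f}")
```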
1. State H₀ (status quo) and H_a (claim).
2. Pick α (typically 0.05). Identify test statistic + distribution.
3. Calculate test statistic from data.
4. Compare to critical value OR compute p-value.
5. Decision: reject H₀ if p < α.
| Test | Use when | Statistic |
|---|---|---|
| z (one mean) | σ known, n large | (x̄ − μ₀)/(σ/√n) |
| t (one mean) | σ unknown | (x̄ − μ₀)/(s/√n) |
| z (proportion) | np₀(1−p₀) ≥ 10 | (p̂ − p₀)/√(p₀(1−p₀)/n) |
| 2-sample t | compare 2 means | (x̄₁ − x̄₂)/SE |
| χ² | fit to expected freqs | Σ(O − E)²/E |
Statistical vs practical significance: with huge n, even tiny effects become statistically significant. Always report effect size, not just p-value.
p-value = P(data at least this extreme | H₀ true). It is NOT P(H₀ true | data). They're conceptually different: the frequentist framework treats H₀ as fixed (either true or false), not random.
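A sketch of steps 2–5 for a one-sample t-test with SciPy (hypothetical data; H₀: μ = 50, α = 0.05), with an effect size to keep statistical vs practical significance separate:

```python
import numpy as np
from scipy import stats

x = np.array([52, 55, 49, 61, 53, 57, 50, 58, 54, 56], dtype=float)  # hypothetical sample
mu0, alpha = 50.0, 0.05

# SciPy computes t = (xbar - mu0) / (s / sqrt(n)) and the two-sided p-value
t_stat, p_value = stats.ttest_1samp(x, popmean=mu0)

print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")

# Effect size (Cohen's d): how large the difference is in SD units
d = (x.mean() - mu0) / x.std(ddof=1)
print(f"Cohen's d = {d:.2f}")
```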
ŷ = b₀ + b₁x, where b₁ = r · (s_y / s_x) and b₀ = ȳ − b₁x̄

| Quantity | What it tells |
|---|---|
| r (correlation) | strength + direction (−1 to 1) |
| R² (coefficient of det) | fraction of variance explained |
| SSE / RMSE | residual error |
| p-value of slope | is b₁ significantly ≠ 0? |
Multiple regression: ŷ = b₀ + b₁x₁ + b₂x₂ + … Each b_i is the change in y per unit x_i, holding others constant. Multicollinearity inflates SE of coefficients.
R² = 1 − SS_res / SS_tot. Adjusted R² penalizes adding variables.

Predicting: for a new observation, use a prediction interval (wider than the CI for the mean response).
High R² doesn't mean the model is right. Always check residual plots for non-linearity, outliers, heteroscedasticity. R² of 0.95 with patterned residuals is misleading; R² of 0.6 with random residuals is more trustworthy.
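A sketch of simple linear regression with scipy.stats.linregress on made-up x/y data, tying the output back to the quantities in the table (slope, intercept, r, R², slope p-value) and to the residual check above:

```python
import numpy as np
from scipy import stats

# Hypothetical data: advertising spend (x) vs sales (y)
x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([2.1, 2.9, 3.6, 4.8, 5.1, 6.3, 6.8, 8.2])

res = stats.linregress(x, y)            # slope b1, intercept b0, r, p-value of slope, SE
print(f"y-hat = {res.intercept:.2f} + {res.slope:.2f}*x")
print(f"r = {res.rvalue:.3f}, R^2 = {res.rvalue**2:.3f}, slope p-value = {res.pvalue:.4g}")

# Always inspect residuals, not just R^2
y_hat = res.intercept + res.slope * x
print("residuals:", np.round(y - y_hat, 2))
```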
| Dist | Use | Mean | Variance |
|---|---|---|---|
| Bernoulli | single yes/no | p | p(1−p) |
| Binomial | n trials, k successes | np | np(1−p) |
| Poisson | rare events / time | λ | λ |
Binomial: P(X = k) = C(n,k) · p^k · (1−p)^(n−k)

Poisson: P(X = k) = e^(−λ) · λ^k / k!

Z = (X − μ) / σ: standardize to use the z-table

Empirical rule: for normal data, ~68% within 1σ, 95% within 2σ, 99.7% within 3σ. The 'six sigma' quality standard corresponds to ~3.4 defects per million (with the conventional 1.5σ shift).
For continuous distributions, P(X = single value) = 0. Only intervals have non-zero probability: P(a < X < b). Don't compute exact-equality probabilities for normal distributions — that's for discrete (binomial, Poisson).
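The formulas above evaluated with scipy.stats; the parameter values are arbitrary examples:

```python
from scipy import stats

# Binomial: P(X = 3) with n = 10 trials, p = 0.2
print(stats.binom.pmf(3, n=10, p=0.2))        # C(10,3) * 0.2**3 * 0.8**7

# Poisson: P(X = 2) with rate lambda = 4 events per interval
print(stats.poisson.pmf(2, mu=4))             # e**-4 * 4**2 / 2!

# Normal: P(X < 85) for X ~ N(mu=70, sigma=10), via standardization
z = (85 - 70) / 10
print(stats.norm.cdf(z))                      # same as stats.norm.cdf(85, loc=70, scale=10)

# For continuous X, only intervals have probability; e.g. the 68% of the empirical rule
print(stats.norm.cdf(1) - stats.norm.cdf(-1))
```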
μ_x̄ = μ, σ_x̄ = σ / √n (standard error of the mean)

The sampling distribution describes how sample means vary across repeated samples. As n grows, σ_x̄ shrinks like 1/√n.
| If sample size n is… | Sampling dist of x̄ |
|---|---|
| any (population normal) | exactly normal |
| large (n ≥ 30) | approximately normal regardless of pop shape |
| small (n < 30, non-normal pop) | not necessarily normal |
SE_p̂ = √(p(1−p)/n). The normal approximation for a proportion applies when np(1−p) ≥ 10 (normal approx to the binomial).

Z = (x̄ − μ) / (σ/√n) uses the CLT for inference about μ.

Sample size for a desired SE: n = (σ/SE_target)². To halve the SE, quadruple n; the marginal benefit of larger samples decreases rapidly.
CLT says sample means are approximately normal — NOT that the original data are normal. A skewed distribution still has normal sample means with large n. Many students confuse 'data is normal' with 'sampling dist is normal.'
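A quick CLT simulation with NumPy, showing that sample means of a skewed population become approximately normal and that their spread matches σ/√n; the exponential population is an arbitrary choice for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
pop_sd = 1.0                        # exponential(scale=1): skewed, mean 1, sd 1

for n in (5, 30, 200):
    # 10,000 repeated samples of size n; take the mean of each
    sample_means = rng.exponential(scale=1.0, size=(10_000, n)).mean(axis=1)
    print(f"n={n:3d}: mean of x-bar = {sample_means.mean():.3f}, "
          f"SD of x-bar = {sample_means.std():.3f}, "
          f"sigma/sqrt(n) = {pop_sd / np.sqrt(n):.3f}")
```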
| Question says… | Use § from | Approach |
|---|---|---|
| 'find mean, median, SD' | § ① | direct formulas; n−1 for sample variance |
| 'is data skewed?' | § ① | compare mean vs median; right-skewed if mean > median |
| 'P(A and B)' | § ② | multiplication rule: P(A)·P(B|A); = P(A)·P(B) if independent |
| 'P(A or B)' | § ② | addition: P(A) + P(B) − P(A∩B) |
| 'P(disease | positive test)' | § ② | Bayes' theorem |
| 'binomial probability of k' | § ③ | C(n,k)·p^k·(1-p)^(n-k) |
| 'normal probability' | § ③ | Z = (X−μ)/σ; z-table or 68-95-99.7 |
| 'rare events per time' | § ③ | Poisson: λ^k·e^(-λ)/k! |
| 'sample mean distribution' | § ④ | CLT: N(μ, σ/√n) |
| 'sample size for SE' | § ④ | n = (σ/SE_target)² |
| 'test μ = a vs μ ≠ a' | § ⑤ | z or t test (depends on σ known/unknown) |
| 'compare 2 group means' | § ⑤ | 2-sample t-test; pooled if equal var |
| 'test proportion' | § ⑤ | z-test for p with √(p₀(1-p₀)/n) |
| 'p-value < α?' | § ⑤ | reject H₀ if smaller |
| '95% CI' | § ⑥ | estimate ± z*·SE (or t*·SE if σ unknown) |
| 'CI for proportion' | § ⑥ | p̂ ± z*·√(p̂(1−p̂)/n) |
| 'CI for difference of means' | § ⑥ | (x̄₁−x̄₂) ± t*·SE_diff |
| 'predict y from x' | § ⑦ | linear regression; ŷ = b₀ + b₁x |
| 'R² interpretation' | § ⑦ | fraction of variance explained |
| 'is slope significant?' | § ⑦ | t-test on b₁; p-value |
| 'multiple regression' | § ⑦ | each b_i: change in y per unit x_i, holding others constant |
'p < 0.001' looks impressive on an exam, but effect size matters just as much. A diet program whose 'significant' effect is a 0.1 kg average loss in n = 10,000 isn't practically useful. Always pair the p-value with the magnitude of the effect.
Strong correlation: associative, not causal. To prove causation, need controlled experiment with random assignment. Observational data alone allows multiple causal explanations.