BUSS6002: Data Science in Business: BUSS6002 W6: Linear Algebra and Linear Regression

Mar 26, 2026


It looks like you've uploaded a file, but didn't include a message or question.
If you need help with something in your document (BUSS6002 W6-Lec.pdf), please let me know what you'd like to do. Here are some common options:

  • Summarize the contents of a section or the whole file
  • Explain a specific term, equation, or concept
  • Answer practice questions or work through an example
  • Check assumptions or explain outputs from a regression
  • Anything else you need for study or revision

Just type your request or a keyword—I'm here to assist!

summarize_document

BUSS6002 W6: Linear Algebra and Linear Regression

This document summarizes key concepts in Linear Algebra and Linear Regression for Ella Luo's students.

I. Linear Algebra

A. Matrices

  • Definition: A matrix is a rectangular array of numbers.
    • a_ij represents the element in the i-th row and j-th column of matrix A.
    • The dimension of a matrix A is denoted as n x m (n rows by m columns).
    • Matrices are typically represented by bold upper-case letters (e.g., A).
  • Matrix Equality: Two matrices A and B are equal (A = B) if they have the same dimensions and all corresponding elements are equal (a_ij = b_ij for all i and j).

B. Matrix Operations

  • Matrix Addition:
    • Matrices must have the same dimensions to be added.
    • Commutative Property: A + B = B + A
    • Associative Property: (A + B) + C = A + (B + C)
  • Matrix Scalar Multiplication: Multiplying each element of a matrix by a scalar.
    • Example: 0.5 * [[1, 2], [3, 4]] = [[0.5, 1], [1.5, 2]]
  • Matrix Multiplication:
    • Given an n x m matrix A and an m x r matrix B, their product AB results in an n x r matrix C.
    • The element c_ij is calculated by the dot product of the i-th row of A and the j-th column of B.
    • Example: [[4, 9, 6], [1, 5, 8]] * [[1, 2], [8, 3], [5, 7]] = [[(4*1 + 9*8 + 6*5), (4*2 + 9*3 + 6*7)], [(1*1 + 5*8 + 8*5), (1*2 + 5*3 + 8*7)]] = [[106, 77], [81, 73]]
    • Properties:
      • Not Commutative: AB ≠ BA (generally)
      • Associative Property: (AB)C = A(BC)
      • Distributive Property: A(B + C) = AB + AC
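The operations above can be checked numerically. This sketch assumes NumPy (not mentioned in the notes) and reuses the matrices from the examples; `@` is NumPy's matrix-multiplication operator.

```python
import numpy as np

# Matrices from the examples above
A = np.array([[4, 9, 6],
              [1, 5, 8]])        # 2 x 3
B = np.array([[1, 2],
              [8, 3],
              [5, 7]])           # 3 x 2

# Scalar multiplication: every element is multiplied by 0.5
S = 0.5 * np.array([[1, 2], [3, 4]])

# Matrix multiplication: (2 x 3) @ (3 x 2) -> 2 x 2
C = A @ B

# Element c_11 is the dot product of row 1 of A and column 1 of B
c11 = A[0, :] @ B[:, 0]

# Non-commutativity: B @ A is 3 x 3, so it cannot equal the 2 x 2 product A @ B
BA = B @ A

print(S)
print(C)
print(c11, C.shape, BA.shape)
```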

C. Special Matrices

  • Square Matrix: A matrix where the number of rows equals the number of columns (n = m).
  • Diagonal Matrix: A square matrix where all off-diagonal elements are zero.
  • Identity Matrix (I): A diagonal matrix with ones on the main diagonal.
    • Useful Property: AI = IA = A

D. Inverse of a Matrix

  • An n x n square matrix A is invertible if there exists an n x n matrix B such that AB = BA = I.
  • The matrix B is called the inverse of A and is denoted as A⁻¹.
  • A square matrix is singular (not invertible) if and only if its determinant is zero.
  • For a 2x2 matrix [[a, b], [c, d]], the inverse is (1 / (ad - bc)) * [[d, -b], [-c, a]].
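A minimal sketch of the 2x2 inverse formula, assuming NumPy. The matrix `A` is a hypothetical example chosen so that its determinant is nonzero, and the helper `inverse_2x2` is illustrative; the result is compared against `np.linalg.inv` and the identity-matrix property.

```python
import numpy as np

def inverse_2x2(M):
    """Inverse of a 2x2 matrix via (1 / (ad - bc)) * [[d, -b], [-c, a]]."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular (determinant is zero)")
    return np.array([[d, -b], [-c, a]]) / det

A = np.array([[4.0, 7.0],
              [2.0, 6.0]])        # det = 4*6 - 7*2 = 10

A_inv = inverse_2x2(A)
I = np.eye(2)

# Check the defining property A A^{-1} = A^{-1} A = I
print(np.allclose(A @ A_inv, I), np.allclose(A_inv @ A, I))
# Compare with NumPy's general-purpose inverse
print(np.allclose(A_inv, np.linalg.inv(A)))
# AI = IA = A for the identity matrix
print(np.allclose(A @ I, A), np.allclose(I @ A, A))
```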

II. Linear Regression

A. Introduction to Linear Regression

  • Purpose: To describe the relationship between a continuous response variable (Y) and a set of predictor variables (X).
    • Y = f(X) + ε
    • Y: Response, dependent variable, target.
    • X: Predictors, features, independent variables, covariates.
    • ε: Error term.
  • Supervised Learning: The model learns from observed feature-target pairs, i.e., each training example comes with a known (labelled) target value.
  • Why Use Linear Regression:
    • Useful for predictions.
    • Easy to interpret (not a "black-box").
    • A good starting point for more complex models.
  • Visualization: Pairwise scatter plots can help visualize the relationship between predictors and the response.

B. Simple Linear Regression (SLR)

  • Model: Involves only one predictor (x) and assumes a linear relationship: y = β₀ + β₁x + ε.
    • β₀: Intercept (estimated as β̂₀).
    • β₁: Slope (estimated as β̂₁).
    • ε: Error term with zero mean and constant variance.
  • Prediction: Predicted value ŷ = β̂₀ + β̂₁x.
  • Estimating Coefficients (Least Squares):
    • Minimize the Residual Sum of Squares (RSS): RSS = Σ(yᵢ - ŷᵢ)² = Σ(yᵢ - (β̂₀ + β̂₁xᵢ))².
    • The optimal β̂₀ and β̂₁ are found using the Least Squares method.
    • Optimal Solutions:
      • β̂₁ = Σ(xᵢ - x̄)(yᵢ - ȳ) / Σ(xᵢ - x̄)²
      • β̂₀ = ȳ - β̂₁x̄
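The closed-form least squares estimates can be computed directly. This sketch assumes NumPy and uses a small hypothetical dataset; `np.polyfit` serves only as a cross-check.

```python
import numpy as np

# Hypothetical toy data (not from the lecture)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

x_bar, y_bar = x.mean(), y.mean()

# Slope: sum of cross-deviations over sum of squared x-deviations
beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
# Intercept: the fitted line passes through (x_bar, y_bar)
beta0_hat = y_bar - beta1_hat * x_bar

y_hat = beta0_hat + beta1_hat * x
rss = np.sum((y - y_hat) ** 2)

print(beta0_hat, beta1_hat, rss)
```

By hand: x̄ = 3, ȳ = 4, Σ(xᵢ - x̄)(yᵢ - ȳ) = 6 and Σ(xᵢ - x̄)² = 10, so β̂₁ = 0.6 and β̂₀ = 4 - 0.6 × 3 = 2.2.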

C. Interpreting a Linear Regression Model

  • β₁ (Slope): Represents the average change in y for a one-unit increase in x.
  • β₀ (Intercept): Represents the expected value of y when x is 0.
  • ε (Error Term): Represents the variation in y not explained by the model.

D. Accuracy of Coefficients

  • True Model vs. Estimated Model: The true model describes the population, while estimates are derived from a sample.
  • Estimates as Random Variables: Parameter estimates (β̂₀, β̂₁) are random variables with their own mean and standard deviation.
  • Unbiasedness: An estimator is unbiased if its expected value equals the true parameter value. Under the model assumptions, the least squares estimates are unbiased: E(β̂₀) = β₀ and E(β̂₁) = β₁.
  • Standard Error (SE): The standard deviation of an estimator.
    • SE(β̂₁)² = σ² / Σ(xᵢ - x̄)² and SE(β̂₀)² = σ²[1/n + x̄² / Σ(xᵢ - x̄)²], where σ² is the variance of the error term ε. Since σ² is unknown, it is estimated from the residuals as RSS / (n - 2).
  • Confidence Interval (CI): A range of values likely to contain the true parameter.
    • Approximate 95% CI for β₁: [β̂₁ - 2 * SE(β̂₁), β̂₁ + 2 * SE(β̂₁)]. A more precise interval replaces 2 with the 97.5% quantile of the t-distribution with n-2 degrees of freedom.
  • Hypothesis Testing:
    • Null Hypothesis (H₀): No relationship between x and y (β₁ = 0).
    • Alternative Hypothesis (H₁): There is a relationship (β₁ ≠ 0).
    • Rejection: H₀ is rejected at the 5% level if the 95% CI does not contain 0, or equivalently if the t-statistic t = β̂₁ / SE(β̂₁) falls in the rejection (critical) region, i.e., |t| exceeds the critical value of the t-distribution with n-2 degrees of freedom.
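A sketch of the standard errors, t-test, and confidence interval for the slope, assuming NumPy and SciPy (`scipy.stats.t`) and reusing a small hypothetical dataset. The unknown σ² is replaced by its usual estimate RSS / (n - 2).

```python
import numpy as np
from scipy import stats

# Hypothetical toy data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
n = len(x)

sxx = np.sum((x - x.mean()) ** 2)
beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / sxx
beta0_hat = y.mean() - beta1_hat * x.mean()
residuals = y - (beta0_hat + beta1_hat * x)

# sigma^2 is unknown; estimate it by RSS / (n - 2)
sigma2_hat = np.sum(residuals ** 2) / (n - 2)
se_beta1 = np.sqrt(sigma2_hat / sxx)
se_beta0 = np.sqrt(sigma2_hat * (1 / n + x.mean() ** 2 / sxx))

# t-statistic for H0: beta1 = 0, with n - 2 degrees of freedom
t_stat = beta1_hat / se_beta1
t_crit = stats.t.ppf(0.975, df=n - 2)
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)

# Exact 95% CI (t quantile) and the rough "+/- 2 SE" version
ci_exact = (beta1_hat - t_crit * se_beta1, beta1_hat + t_crit * se_beta1)
ci_rough = (beta1_hat - 2 * se_beta1, beta1_hat + 2 * se_beta1)

# Reject H0 when |t| exceeds the critical value
reject_h0 = abs(t_stat) > t_crit
print(se_beta0, se_beta1, t_stat, p_value, ci_exact, ci_rough, reject_h0)
```

With only five points, the slope is not significant here: |t| ≈ 2.12 is below the critical value of about 3.18 for 3 degrees of freedom, and the exact CI contains 0.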

E. Goodness of Fit

  • R-squared (R²):
    • Measures the proportion of variability in y explained by the model.
    • R² = 1 - (RSS / TSS)
    • TSS (Total Sum of Squares): Σ(yᵢ - ȳ)²
    • RSS (Residual Sum of Squares): Σ(yᵢ - ŷᵢ)²
    • Ranges from 0 to 1. Higher values indicate a better fit to the training data.
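A short sketch of the R² formula, assuming NumPy; the data are hypothetical. For simple linear regression with an intercept, R² also equals the squared sample correlation between x and y.

```python
import numpy as np

def r_squared(y, y_hat):
    """R^2 = 1 - RSS / TSS."""
    rss = np.sum((y - y_hat) ** 2)          # unexplained variation
    tss = np.sum((y - np.mean(y)) ** 2)     # total variation around the mean
    return 1 - rss / tss

# Hypothetical data and the least squares line fitted to it
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
slope, intercept = np.polyfit(x, y, 1)
r2 = r_squared(y, intercept + slope * x)
print(r2)
```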

F. Multiple Linear Regression

  • Model: y = β₀ + β₁x₁ + ... + βₚxₚ + ε
    • β: Vector of coefficients [β₀, β₁, ..., βₚ]ᵀ.
    • X: Matrix of predictors (including a column of ones for the intercept).
  • Interpretation of βⱼ: The average change in y for a one-unit increase in xⱼ, holding all other predictors constant.
  • Least Squares: Minimize RSS = ||y - Xβ||² = (y - Xβ)ᵀ(y - Xβ).
    • Solution: β̂ = (XᵀX)⁻¹Xᵀy (requires XᵀX to be invertible).
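The matrix form can be sketched with NumPy (an assumption; the notes give only the formula). The data are simulated from hypothetical true coefficients. Solving the normal equations with `np.linalg.solve` avoids forming the inverse explicitly, which is generally more numerically stable; the literal inverse formula and `np.linalg.lstsq` are included for comparison.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: n = 100 observations, p = 2 predictors
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
beta_true = np.array([1.0, 2.0, -0.5])            # [beta0, beta1, beta2]

# Design matrix with a leading column of ones for the intercept
X = np.column_stack([np.ones(n), x1, x2])
y = X @ beta_true + rng.normal(scale=0.3, size=n)

# Normal equations (X^T X) beta = X^T y, solved without an explicit inverse
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Literal formula from the notes, for comparison
beta_hat_inv = np.linalg.inv(X.T @ X) @ X.T @ y

# Reference least squares solver
beta_hat_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

rss = np.sum((y - X @ beta_hat) ** 2)              # ||y - X beta_hat||^2
print(beta_hat, rss)
```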

G. Residual Diagnostics

  • Purpose: To check if the assumptions of the linear regression model are met, which is crucial for valid statistical inference.
  • Assumptions:
    • Linearity: The relationship between predictors and the response is linear.
    • Independence: Errors (ε) are independent of each other.
    • Normality: Errors are normally distributed.
    • Equal Variance (Homoscedasticity): Errors have a constant variance across all levels of predictors.
  • Diagnostic Tools:
    • Residual Plots: Plotting residuals against fitted values (ŷ).
      • Patterns suggest non-linearity or unequal variance (heteroscedasticity).
    • Squared Residuals vs. Fitted Values: Helps detect heteroscedasticity.
    • Q-Q Plot (Quantile-Quantile Plot): Compares the distribution of residuals to a theoretical normal distribution. Points should roughly follow a straight line for normality.
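The diagnostic plots are built from fitted values and residuals. This sketch assumes NumPy and SciPy and uses simulated hypothetical data; it computes the quantities behind each plot (including the Q-Q pairs from `scipy.stats.probplot`) and leaves the plotting itself, which needs matplotlib, as comments. The numeric summaries are rough aids, not a substitute for looking at the plots.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical data generated to satisfy the model assumptions
n = 200
x = rng.uniform(0, 10, size=n)
y = 3.0 + 1.5 * x + rng.normal(scale=1.0, size=n)

X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta_hat
residuals = y - fitted

# Residuals vs fitted: OLS residuals are always uncorrelated with the fitted
# values, so this number is ~0 by construction; look for curvature in the plot
print(np.corrcoef(fitted, residuals)[0, 1])

# Squared residuals vs fitted: a clear trend would suggest heteroscedasticity
print(np.corrcoef(fitted, residuals ** 2)[0, 1])

# Q-Q data: theoretical normal quantiles vs ordered residuals;
# r close to 1 means the points lie close to a straight line
(theoretical_q, ordered_resid), (slope, intercept, r) = stats.probplot(residuals, dist="norm")
print(r)

# To draw the plots (requires matplotlib):
# import matplotlib.pyplot as plt
# plt.scatter(fitted, residuals); plt.axhline(0); plt.show()
# plt.scatter(fitted, residuals ** 2); plt.show()
# stats.probplot(residuals, dist="norm", plot=plt); plt.show()
```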

Note: The provided text also includes practice questions and examples related to these topics.

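Chinese Explanation

The following is a summary and explanation of the key points in "BUSS6002 W6-Lec.pdf":


I. Linear Algebra Fundamentals

1. Matrix Basics

2. Matrix Operations

3. Special Matrices

4. Matrix Inverse


II. Linear Regression Fundamentals

1. The Basic Linear Regression Model

2. Why Use Linear Regression

3. Simple Linear Regression (SLR)

4. Estimating and Interpreting Regression Parameters

5. Hypothesis Testing and Significance

6. Goodness of Fit and Diagnostics

7. Multiple Linear Regression


If you need a detailed derivation, a worked example, or an explanation of a formula for any part, just ask!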

summarize_document

BUSS 6002 Final Review Summary

This document provides a comprehensive review for BUSS 6002, covering Big Data, Data Handling, Machine Learning algorithms, Model Selection, and Marketing Applications.


Part 1. Introduction to Big Data

1.1 Analytical Capabilities

Big Data analytics can be categorized into four types:

  • Descriptive Analytics: What happened?
  • Diagnostic Analytics: Why did it happen?
  • Predictive Analytics: What will happen?
  • Prescriptive Analytics: What should we do? How can we make it happen?

Prescriptive analytics often involves Optimization and Foresight.

1.2 CRISP-DM and Snail Shell Model

These are process models for knowledge discovery; KDDA stands for Knowledge Discovery via Data Analytics.

  • CRISP-DM (Cross-Industry Standard Process for Data Mining):

    • Business Understanding: Identify business problems, determine objectives, output a project plan.
    • Data Understanding: Collect initial data, explore data, check quality, examine metadata.
    • Data Preparation: Select, clean, construct, integrate, and format data.
    • Modeling: Select appropriate techniques, generate test design (train, test, validation), build models.
    • Evaluation: Evaluate model performance against business objectives, determine next steps.
    • Deployment: Deploy the model, plan monitoring, and create a final report.
  • Snail Shell KDDA Process Model: This model also outlines a process for data analytics, with similar phases to CRISP-DM.


Part 2. Data Handling

2.1 Data Quality Issue

  • Missing Data:

    • Missing Completely at Random (MCAR): Missingness is independent of all variables (observed and unobserved). Causes no bias but is rare.
    • Missing at Random (MAR): Missingness depends on observed variables. Can cause bias.
    • Not Missing at Random (NMAR): Missingness depends on unobserved variables. Common but hard to identify and address.
    • Handling: Delete or impute missing data.
  • Outliers: Data points significantly different from others.

2.2 Exploratory Data Analysis (EDA)

  • Univariate:

    • Non-Graphical:
      • Categorical: Counts, proportions.
      • Numerical: Mean, median, spread.
    • Graphical:
      • Categorical: Bar chart.
      • Numerical: Histogram, box plot, QQ plot.
  • Multivariate:

    • Non-Graphical:
      • Numerical vs. Numerical: Covariance matrix.
    • Graphical:
      • Numerical vs. Numerical: Correlation heatmap, scatterplot.
      • Categorical vs. Numerical: Box plot.
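A minimal pandas sketch of the non-graphical EDA summaries listed above, on a small hypothetical dataset. The graphical counterparts (bar chart, histogram, box plot, heatmap) would typically use plotting calls such as `df.boxplot(column="sales", by="region")`, which require matplotlib and are omitted here.

```python
import pandas as pd

# Hypothetical toy dataset
df = pd.DataFrame({
    "region": ["North", "South", "South", "East", "North", "South"],
    "price":  [10.0, 12.5, 11.0, 9.5, 10.5, 13.0],
    "sales":  [200, 150, 170, 220, 205, 140],
})

# Univariate, categorical: counts and proportions
print(df["region"].value_counts())
print(df["region"].value_counts(normalize=True))

# Univariate, numerical: mean, median, spread
print(df["price"].describe())
print(df["price"].median())

# Multivariate, numerical vs numerical: covariance and correlation matrices
print(df[["price", "sales"]].cov())
print(df[["price", "sales"]].corr())

# Categorical vs numerical: summary of a numeric column within each group
print(df.groupby("region")["sales"].describe())
```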

2.3 Feature Engineering

  • Structured Data Feature Engineering:

    • Standardization: Scales data to have a mean of 0 and a variance of 1.
    • Normalization: Scales data into a range, typically [0, 1].
    • Log Transformation: Useful for addressing right-skewed data (an exponential transformation has the opposite effect and can help with left-skewed data).
    • Linear Regression Feature Engineering:
      • Interpreting log-transformed variables.
      • Creating dummy variables.
      • Polynomial Regression.
  • Text Data Feature Engineering:

    • Bag-of-Words: Extracts features based on word occurrence within a document. Creates a vocabulary and measures word presence. Does not capture word order.
    • TF-IDF (Term Frequency-Inverse Document Frequency): Translates word counts into a measure of importance.
      • Formula: TFIDF = (N of token t in document d) / (N of tokens in document d) * ln(N of documents / N of documents containing token t)
    • Stemming or Lemmatization: Reducing words to their root form.
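A sketch of the feature-engineering steps, assuming NumPy and pandas. The `tf_idf` function is a hypothetical hand-written implementation of the formula above with a naive whitespace tokenizer; libraries such as scikit-learn use a smoothed IDF by default, so their numbers differ.

```python
import math
from collections import Counter

import numpy as np
import pandas as pd

x = np.array([1.0, 2.0, 4.0, 8.0, 100.0])          # hypothetical right-skewed values

z = (x - x.mean()) / x.std()                       # standardization: mean 0, variance 1
x_01 = (x - x.min()) / (x.max() - x.min())         # min-max normalization to [0, 1]
x_log = np.log(x)                                  # log transform (requires x > 0)

# Dummy variables; drop_first avoids the dummy-variable trap when an intercept is used
dummies = pd.get_dummies(pd.Series(["red", "blue", "red", "green"]), drop_first=True)

def tf_idf(docs):
    """TF-IDF(t, d) = (count of t in d / tokens in d) * ln(N / docs containing t)."""
    tokenized = [doc.lower().split() for doc in docs]   # naive whitespace tokenizer
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each token
    doc_freq = Counter(t for tokens in tokenized for t in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)                         # bag-of-words counts
        scores.append({t: (c / len(tokens)) * math.log(n_docs / doc_freq[t])
                       for t, c in counts.items()})
    return scores

docs = ["the cat sat", "the dog sat", "the cat ran"]
scores = tf_idf(docs)
print(scores)
```

A word that appears in every document ("the") gets ln(3/3) = 0, so TF-IDF downweights terms that carry no discriminating information.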

Part 4. Machine Learning Algorithms

4.1 Unsupervised Learning - Clustering

  • Goal: Partition data into clusters where points within a cluster are similar, and points in different clusters are dissimilar.
  • K-Means Algorithm:
    1. Initialize cluster centroids (randomly).
    2. Assign each data point to the nearest centroid.
    3. Recalculate the centroid of each cluster.
    4. Repeat steps 2-3 until centroids no longer change.
  • K-Means Appropriateness:
    • Highly dependent on initial centroid selection.
    • Elbow Method: Used to select the optimal number of clusters (k).
    • Advantage: Fast computational speed.
    • Limitation: Assumes clusters are convex and isotropic (circular).
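The four K-means steps map onto a short NumPy loop. This is a minimal sketch of Lloyd's algorithm with simulated data; the function names `kmeans` and `inertia` are hypothetical. `inertia` computes the within-cluster sum of squares that is plotted against k in the elbow method.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's k-means: returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Step 1: initialise centroids as k distinct, randomly chosen data points
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 2: assign each point to its nearest centroid (squared Euclidean distance)
        labels = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        # Step 3: recompute each centroid as the mean of its assigned points
        # (an empty cluster keeps its previous centroid)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 4: stop once the centroids no longer change
        converged = np.allclose(new_centroids, centroids)
        centroids = new_centroids
        if converged:
            break
    # Final assignment consistent with the returned centroids
    labels = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
    return labels, centroids

def inertia(X, labels, centroids):
    """Within-cluster sum of squares, the quantity used in the elbow method."""
    return float(((X - centroids[labels]) ** 2).sum())

# Hypothetical data: two well-separated blobs of 50 points each
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(loc=(0.0, 0.0), scale=0.5, size=(50, 2)),
               rng.normal(loc=(5.0, 5.0), scale=0.5, size=(50, 2))])
labels, centroids = kmeans(X, k=2)

# Elbow method: inertia for k = 1..5; look for the "bend" in this sequence
wcss = [inertia(X, *kmeans(X, k)) for k in range(1, 6)]
print(labels, centroids, wcss)
```

Changing `seed` changes the initial centroids, which is one way to see the sensitivity to initialization noted above.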

4.2 Supervised Learning

4.2.1 Regression - Linear Regression
  • Model: $Y = X\beta + \epsilon$
  • Solution (OLS): $\hat{\beta} = (X^T X)^{-1} X^T y$
  • Residual Diagnostics:
    • Linearity: Plot residuals against fitted values.
    • Equal Variance (Homoscedasticity): Plot squared residuals against fitted values.
4.2.2 Classification - Logistic Regression
  • Model: A generalized linear model (GLM).

  • Forecasting Probability: $P(Y=1 \mid x) = \frac{1}{1 + e^{-x^T \beta}}$

  • Decision Boundary: Linear.

  • Interpreting Coefficients:

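    • $\beta_0$: Log-odds of class 1 when all $x$ are zero (so $\exp(\beta_0)$ is the odds).
    • $\beta_i$: For a 1-unit increase in $x_i$, holding the other predictors constant, the odds change by $(\exp(\beta_i) - 1) \times 100\%$ (an increase if $\beta_i > 0$, a decrease if $\beta_i < 0$).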
  • Likelihood Function: Measures how likely the observed data is given the model parameters.

  • Maximum Likelihood Estimation (MLE): Finds parameters that maximize the likelihood function. $\hat{\beta} = \arg\max_{\beta} L(x \mid \beta)$.

  • Classification Model Evaluation:

    • Accuracy: $(TP + TN) / (TP + TN + FP + FN)$
    • True Positive Rate (Recall, Sensitivity): $TP / (TP + FN)$
    • True Negative Rate (Specificity): $TN / (TN + FP)$
    • False Positive Rate: $FP / (FP + TN) = 1 - Specificity$
    • False Negative Rate: $FN / (TP + FN)$
    • Precision: $TP / (TP + FP)$
    • False Discovery Rate: $FP / (TP + FP) = 1 - Precision$
    • F1-Score: $2 * (Precision * Recall) / (Precision + Recall)$
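A sketch of the logistic probability, the odds interpretation, and the evaluation metrics, assuming NumPy. The coefficients and labels are hypothetical, and `classification_metrics` is an illustrative helper that does not guard against division by zero (e.g., when there are no predicted positives).

```python
import numpy as np

def sigmoid(z):
    """Logistic function: P(Y = 1 | x) = 1 / (1 + exp(-x^T beta))."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical fitted coefficients [beta0, beta1] for one predictor
beta = np.array([-1.0, 0.8])
x_new = np.array([1.0, 2.0])            # leading 1 for the intercept
p = sigmoid(x_new @ beta)               # predicted probability of class 1

# Odds interpretation: a 1-unit increase in x1 multiplies the odds by exp(beta1)
pct_change_in_odds = (np.exp(beta[1]) - 1) * 100

def classification_metrics(y_true, y_pred):
    """Confusion-matrix counts and the metrics listed above (labels are 0/1)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "recall": recall,
        "specificity": tn / (tn + fp),
        "fpr": fp / (fp + tn),
        "fnr": fn / (tp + fn),
        "precision": precision,
        "fdr": fp / (tp + fp),
        "f1": 2 * precision * recall / (precision + recall),
    }

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(p, pct_change_in_odds)
print(metrics)
```

In this example TP = 3, TN = 3, FP = 1, FN = 1, so accuracy, precision, recall, and F1 are all 0.75.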

Part 5. Analytical Methods & Optimization

5.3.1 Analytical Methods

  • Approach: Solve for parameters by setting partial derivatives of the loss function to zero.
  • Problem: Matrix inversion can be computationally expensive and infeasible for large datasets. Some loss functions may not have a unique solution.

5.3.2 Gradient Descent

  • Basic Idea: Iteratively move towards the minimum of a function by taking steps in the direction of the negative gradient.
  • Steps:
    1. Initialize parameters ($\beta^0$).
    2. Iterate: $\beta^{t+1} = \beta^t - \alpha \nabla L(\beta^t)$, where $\alpha$ is the step size (learning rate).
    3. Stop when the update is below a threshold.
  • Step Size (Learning Rate): Controls the speed of convergence. Too large can overshoot, too small can lead to slow convergence.
  • Convexity:
    • Convex Function: Guaranteed to find the global minimum.
    • Non-Convex Function: May find a local minimum; requires trying different initializations.
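A minimal gradient-descent sketch for least squares, assuming NumPy and simulated data. It minimizes the mean squared error (RSS divided by n), which has the same minimizer as RSS but a gradient whose scale does not grow with n; the helper name `gradient_descent_ols`, the step size `alpha=0.1`, and the tolerance are illustrative choices, not values from the notes.

```python
import numpy as np

def gradient_descent_ols(X, y, alpha=0.1, tol=1e-8, max_iter=10_000):
    """Minimise L(beta) = (1/n) ||y - X beta||^2 by batch gradient descent."""
    n, p = X.shape
    beta = np.zeros(p)                              # step 1: initialise beta^0
    for _ in range(max_iter):
        grad = -(2 / n) * X.T @ (y - X @ beta)      # gradient of the mean squared error
        step = alpha * grad
        beta = beta - step                          # step 2: beta^{t+1} = beta^t - alpha * grad
        if np.linalg.norm(step) < tol:              # step 3: stop when the update is tiny
            break
    return beta

# Hypothetical simulated data with an intercept column
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([0.5, -1.0, 2.0]) + rng.normal(scale=0.1, size=n)

beta_gd = gradient_descent_ols(X, y)
beta_exact = np.linalg.solve(X.T @ X, X.T @ y)      # analytic solution for comparison
print(beta_gd, beta_exact)
```

Because the least squares loss is convex, gradient descent and the analytic solution agree up to the stopping tolerance.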

5.3.3 Comparison between Analytic and Gradient Descent

  • Analytic Solution: Mathematically simpler, easier to implement for some problems.
  • Gradient Descent: Lower memory and per-iteration computation requirements (no matrix inversion), more scalable for large datasets.
  • For convex losses such as least squares, both methods reach the same solution (up to the stopping tolerance). Gradient descent is often preferred for large-scale problems.

Part 6. Model Selection

  • Overfitting vs. Underfitting:
    • Overfitting: Complex model, low bias, high variance. Performs well on training data but poorly on unseen data.
    • Underfitting: Simple model, high bias, low variance. Performs poorly on both training and unseen data.
  • MSE Trend: Training MSE decreases with increasing complexity. Test MSE first decreases, then increases. The optimal model minimizes test error.
  • Data Splitting:
    • Training Set: Used for EDA, model building, and parameter estimation.
    • Validation Set: Used for model selection and hyperparameter tuning.
    • Test Set: Used for final, unbiased evaluation of the selected model.
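The train/validation/test workflow can be sketched with polynomial regression of increasing degree, assuming NumPy and data simulated from a hypothetical cubic model. Training MSE keeps falling as the degree grows, the degree is chosen by validation MSE, and the test set is used once at the end.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: a cubic signal plus noise
n = 300
x = rng.uniform(-1, 1, size=n)
y = 1 - 2 * x + 3 * x ** 3 + rng.normal(scale=0.3, size=n)

# Random 60 / 20 / 20 split into training, validation and test sets
idx = rng.permutation(n)
train, val, test = idx[:180], idx[180:240], idx[240:]

def design(x, degree):
    """Polynomial features [1, x, x^2, ..., x^degree]."""
    return np.vander(x, degree + 1, increasing=True)

def mse(y, y_hat):
    return float(np.mean((y - y_hat) ** 2))

results = {}
for degree in range(1, 11):
    beta, *_ = np.linalg.lstsq(design(x[train], degree), y[train], rcond=None)
    results[degree] = (
        mse(y[train], design(x[train], degree) @ beta),   # training MSE
        mse(y[val], design(x[val], degree) @ beta),       # validation MSE
        beta,
    )

# Model selection uses the validation set only
best_degree = min(results, key=lambda d: results[d][1])
# The test set is used once, for the final estimate of the chosen model
test_mse = mse(y[test], design(x[test], best_degree) @ results[best_degree][2])
print(best_degree, test_mse)
```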

Part 7. Applications in Marketing

2. Customer Analytics

  • Product-Centric Marketing: Often uses Collaborative Filtering.
  • Customer-Centric Marketing: Focuses on individual customer value. Uses models like:
    • Natural Propensity Models: Predicts the likelihood of a customer taking a specific action (e.g., purchasing a product) regardless of marketing intervention.
    • Campaign Response Models: Predicts the likelihood of a customer responding to a specific marketing campaign.
    • Uplift Models: Estimate the causal impact of a marketing intervention by comparing the outcome for treated vs. untreated customers. Categories include:
      • "Lost Causes" (treated vs. untreated: No difference; would not buy either way)
      • "Sleeping Dogs" (treated vs. untreated: Negative impact)
      • "Persuadable" (treated vs. untreated: Positive impact)
      • "Sure Thing" (treated vs. untreated: No difference, but would have bought anyway)
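In a randomized campaign, the average uplift can be estimated as the difference in response rates between treated and untreated customers. This sketch assumes NumPy and uses hypothetical data; an uplift model goes further by predicting this difference for each individual customer.

```python
import numpy as np

# Hypothetical randomized campaign: treated = 1 received the offer, bought = 1 purchased
treated = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
bought  = np.array([1, 1, 0, 1, 0, 0, 1, 0, 0, 0])

# Uplift = response rate with treatment - response rate without treatment
rate_treated = bought[treated == 1].mean()
rate_control = bought[treated == 0].mean()
uplift = rate_treated - rate_control
print(rate_treated, rate_control, uplift)
```

Here 3 of 5 treated customers bought versus 1 of 5 untreated, so the estimated uplift is 0.6 - 0.2 = 0.4.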

3. Customer Analytics - Measuring Success

  • Good Success Metrics: Should be business-aligned, measure uplift, have appropriate timing, and be at the right level of aggregation.
  • Common Pitfalls: Focusing solely on engagement metrics (likes, shares) without linking them to business outcomes.

Part 8. Big Data Solutions

  • Algorithm Running Time: Expressed in big-O notation by keeping the fastest-growing term and dropping constant coefficients (e.g., $2n^3 + 5n^2$ is $O(n^3)$).
  • Tall Data Solutions:
    • Scalable Algorithms: e.g., Stochastic Gradient Descent (SGD).
    • Parallelization: Divide and Conquer approaches.
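Divide and conquer works for OLS because XᵀX and Xᵀy are sums over rows, so they can be computed chunk by chunk and then added. This is a minimal single-machine sketch, assuming NumPy and simulated data; in practice each chunk would be processed by a separate worker.

```python
import numpy as np

# Hypothetical "tall" dataset: many rows, few columns
rng = np.random.default_rng(0)
n, p = 100_000, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 0.5, -2.0, 3.0]) + rng.normal(size=n)

# Divide: split the rows into chunks (in practice, one chunk per machine or file)
chunks = np.array_split(np.arange(n), 10)

# Conquer: each chunk contributes X_k^T X_k and X_k^T y_k independently
partials = [(X[idx].T @ X[idx], X[idx].T @ y[idx]) for idx in chunks]

# Combine: X^T X = sum_k X_k^T X_k and X^T y = sum_k X_k^T y_k
XtX = sum(a for a, _ in partials)
Xty = sum(b for _, b in partials)
beta_chunked = np.linalg.solve(XtX, Xty)

# Same result as fitting on all rows at once
beta_full = np.linalg.solve(X.T @ X, X.T @ y)
print(np.allclose(beta_chunked, beta_full))
```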

Key Concepts & Questions Addressed in Practice Problems

  • Data Quality: Identifying missing data types (MCAR, MAR, NMAR), handling duplicates.
  • EDA: Univariate and multivariate analysis techniques.
  • Feature Engineering: Standardization, normalization, log transformations, dummy variables, TF-IDF, stemming/lemmatization.
  • Clustering (K-Means): Algorithm steps, sensitivity to initialization and outliers, selecting 'k' (elbow method), limitations (cluster shape).
  • Linear Regression: OLS solution, residual diagnostics, interpreting coefficients, feature transformations, bias-variance trade-off.
  • Logistic Regression: Probability forecasting, interpreting coefficients, MLE, classification metrics (Accuracy, Precision, Recall, F1-Score).
  • Optimization: Analytic solutions vs. Gradient Descent, learning rate, convexity.
  • Model Selection: Overfitting/underfitting, train/validation/test splits.
  • Marketing Analytics: Customer-centric vs. product-centric, campaign response models, uplift models, defining success metrics.
  • Big Data: Scalability, parallelization.
  • Text Analysis: Bag-of-Words, TF-IDF.
  • CRISP-DM: Phases and their purpose.
  • Ethics in Data Science: Principles of responsible AI.
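Combined Week 6 Summary

Summary of key concepts for BUSS6002 Week 6, "Linear Algebra and Linear Regression", combined with the full set of review materials:


I. Linear Algebra Fundamentals (review slides + final review document)

1. Matrix Basics

2. Special Matrices and Matrix Inverses


II. Overall Structure of Linear Regression

1. The Essence of the Linear Regression Model

2. Simple Linear Regression (SLR): Derivation and Interpretation

3. Multiple Linear Regression

4. Goodness of Fit


III. Supplement: Data Engineering and Feature Processing


IV. Summary of Typical Key Points and Worked Practice Problems (Selected)


Key takeaways: Week 6 focuses on linear algebra (matrix concepts, operations, and inverses) plus the complete linear regression workflow (modelling, estimation, inference, diagnostic plots, hypothesis testing), together with an integrated understanding of the full pipeline that links practical data engineering to model selection.

If you would like a detailed derivation of the formulas in any section or a worked exercise, feel free to ask!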

generate_study_guide
