QBUS5001 · Foundation In Data Analytics For Business
Data, Visualisation & Ethics
Module 0 sets the foundation before any calculation: how to source, classify, visualise and communicate business data responsibly. You learn to tell structured from unstructured data, primary from secondary sources, and categorical from numerical variables. That classification dictates which charts and which later techniques are valid.
The module then covers visualisation principles (after Tufte: clarity, a high data-ink ratio, no distortion) and the ethics of analytics, including algorithmic bias from biased design or biased training data. It is light on formulas but heavy on judgement, and the vocabulary recurs in the group assignment.
What this chapter covers
- 01 Data types: structured vs unstructured, primary vs secondary
- 02 Variable classification: categorical vs numerical, cross-section vs time-series
- 03 Univariate charts: bar, column, histogram, pie, box-and-whisker
- 04 Bivariate charts: the scatter plot
- 05 Tufte principles: clarity, data-ink ratio, avoiding distortion
- 06 Data communication: matching the chart to the variable type
- 07 Ethics: transparency about limitations
- 08 Algorithmic bias: design bias and training-data bias
Classifying variables and choosing a chart
- (1 mark) Delivery region is categorical (nominal): labels with no natural order.
- (1 mark) Parcels per day is numerical (a discrete count) and daily fuel cost is numerical (continuous); the on-time flag is categorical (binary).
- (1 mark) (a) For the distribution of a single numerical variable, use a histogram (or a box-and-whisker plot to show centre, spread and outliers).
- (1 mark) (b) For the relationship between two numerical variables, use a scatter plot with fuel cost on one axis and parcels on the other.
- (1 mark) Note an ethics point: if region correlates with outcomes, reporting region-level performance without context could lead to misleading or biased conclusions, so disclose the data's limitations.
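The marking points above amount to a lookup from variable type to chart. The sketch below is one way to write that lookup down, not part of the course materials: the column names (`region`, `parcels_per_day`, `daily_fuel_cost`, `on_time`) are hypothetical stand-ins for the variables in the question, and the bar or column chart for a single categorical variable is the standard choice from the chart list earlier in the chapter rather than something this answer states.

```python
from enum import Enum


class VarType(Enum):
    CATEGORICAL = "categorical"
    NUMERICAL = "numerical"


# Hypothetical column names; the types follow the marking points above.
DELIVERY_VARIABLES = {
    "region": VarType.CATEGORICAL,          # nominal labels
    "parcels_per_day": VarType.NUMERICAL,   # discrete count
    "daily_fuel_cost": VarType.NUMERICAL,   # continuous
    "on_time": VarType.CATEGORICAL,         # binary flag
}


def suggest_chart(*columns: str) -> str:
    """Suggest a chart for one or two columns based on their variable types."""
    types = [DELIVERY_VARIABLES[name] for name in columns]
    if types == [VarType.NUMERICAL]:
        return "histogram or box-and-whisker plot"
    if types == [VarType.CATEGORICAL]:
        return "bar or column chart"
    if types == [VarType.NUMERICAL, VarType.NUMERICAL]:
        return "scatter plot"
    raise ValueError(f"no rule in this sketch for columns: {columns}")
```

Calling `suggest_chart("parcels_per_day")` covers part (a), and `suggest_chart("daily_fuel_cost", "parcels_per_day")` covers part (b). Combinations the answer does not discuss raise an error rather than guess.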
Key terms
- Structured data
- Data organised into rows and columns with a fixed schema (e.g. a spreadsheet or relational table), as opposed to unstructured data such as free text, images or audio.
- Primary vs secondary source
- Primary data is collected first-hand for the question at hand (a survey you run); secondary data is reused from an existing source (government statistics, a vendor dataset).
- Data-ink ratio
- Tufte's principle that the proportion of a chart's ink devoted to displaying actual data should be maximised — strip away non-informative decoration.
- Algorithmic bias
- Systematic unfairness in a model's outputs arising from biased design choices or biased training data, which can produce discrimination against groups.
- Categorical variable
- A variable whose values are labels or categories (nominal or ordinal) rather than meaningful quantities; summarised by counts and proportions, not means.
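The data-ink ratio in the key terms above can be written as a fraction. This follows Tufte's usual statement of the idea; the chapter gives it only in words, so treat the notation as an illustration.

```latex
\[
\text{data-ink ratio}
  = \frac{\text{data-ink}}{\text{total ink used to print the graphic}}
  = 1 - \bigl(\text{proportion of the graphic that can be erased without losing data-information}\bigr)
\]
```

As a made-up example, a chart that spends 60 of every 100 units of ink on bars and axis values, and the other 40 on shading and decorative borders, has a ratio of 0.6. Removing the decoration pushes the ratio towards 1.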
Data, Visualisation & Ethics FAQ
Is there much maths in Module 0?
No — it is conceptual. The value is in correctly classifying variables (which drives every later technique) and in the vocabulary of visualisation and ethics, which the group assignment and short-answer exam questions draw on.
Why does data classification matter for the rest of the course?
Because the valid technique depends on it: numerical variables get means, variances and regression; categorical variables get proportions and the chi-squared test. Misclassifying a variable leads to the wrong test.
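A minimal sketch of that split, using only the Python standard library: the categorical variable is summarised by proportions, the numerical one by mean and sample variance. The values and function names are made up for illustration.

```python
from collections import Counter
from statistics import mean, variance


def summarise_categorical(values):
    """Proportion of observations in each category."""
    counts = Counter(values)
    total = len(values)
    return {label: count / total for label, count in counts.items()}


def summarise_numerical(values):
    """Mean and sample variance (n - 1 denominator) of a numerical variable."""
    return {"mean": mean(values), "variance": variance(values)}


# Made-up data: a categorical variable and a numerical variable.
regions = ["North", "South", "North", "East"]
parcels_per_day = [120, 135, 128, 141]

region_summary = summarise_categorical(regions)
parcel_summary = summarise_numerical(parcels_per_day)
```

Taking the mean of `regions` would not even run, which is the point: the variable type decides which summaries make sense before any test is chosen.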
What is the difference between cross-section and time-series data?
Cross-section data captures many units at one point in time (sales across 50 stores this month); time-series data tracks one unit over many periods (one store's monthly sales over five years). Time-series data raises issues such as autocorrelation, which are covered later in the course.
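The two shapes can be sketched in Python as follows. All numbers and store names are made up, and the lag-1 autocorrelation function is only a preview of the idea under the usual sample formula, not the course's own treatment.

```python
from statistics import mean

# Cross-section: many units observed in ONE period (made-up store sales).
# The order of the entries carries no meaning.
cross_section = {"store_01": 410.0, "store_02": 385.5, "store_03": 452.0}

# Time series: ONE unit observed over consecutive periods (made-up monthly sales).
# The order matters: each value follows the previous one in time.
time_series = [10.0, 12.0, 14.0, 16.0, 18.0]


def lag1_autocorrelation(series):
    """Sample lag-1 autocorrelation: how closely each value tracks the one before it."""
    centre = mean(series)
    deviations = [x - centre for x in series]
    numerator = sum(a * b for a, b in zip(deviations[1:], deviations[:-1]))
    denominator = sum(d * d for d in deviations)
    return numerator / denominator


r1 = lag1_autocorrelation(time_series)
```

A value of `r1` well above zero suggests neighbouring periods move together. The function only makes sense for the time series, because the cross-section has no ordering to lag.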
Exam move
Make a small reference card matching each variable type to its summary statistics and its chart, because every downstream module assumes you can do this instantly. Memorise the two sources of algorithmic bias (design vs training data) and one concrete business example of each, since ethics questions tend to be short-answer and reward a crisp, applied definition over a vague one.