University of Sydney · S1 2026 · FACULTY OF BUSINESS & ECONOMICS

QBUS5001 · Foundation In Data Analytics For Business

- one subject, every graph, every model, every mark

50% final exam · hurdle · 14 Chapters · 8-page Bible

Our own words - no uploaded lecturer files

Built to mirror S1 2026 · updated this semester

Chapter 1 of 11 · QBUS5001

Data, Visualisation & Ethics

Module 0 sets the foundation before any calculation: how to source, classify, visualise and communicate business data responsibly. You learn to tell structured from unstructured data, primary from secondary sources, and categorical from numerical variables — the classification that dictates which chart and which later technique is valid.

The module then covers visualisation principles (after Tufte: clarity, a high data-ink ratio, no distortion) and the ethics of analytics, including algorithmic bias from biased design or biased training data. It is light on formulas but heavy on judgement, and the vocabulary recurs in the group assignment.

In this chapter

What this chapter covers

01 Data types: structured vs unstructured, primary vs secondary
02 Variable classification: categorical vs numerical, cross-section vs time-series
03 Univariate charts: bar, column, histogram, pie, box-and-whisker
04 Bivariate charts: the scatter plot
05 Tufte principles: clarity, data-ink ratio, avoiding distortion
06 Data communication: matching the chart to the variable type
07 Ethics: transparency about limitations
08 Algorithmic bias: design bias and training-data bias

Worked example · free

Classifying variables and choosing a chart

Q [5 marks]. A logistics firm has a dataset with these fields: delivery region (North/South/East/West), parcels delivered per day, on-time flag (Yes/No), and daily fuel cost in dollars. Classify each variable and state one appropriate chart for (a) the distribution of daily parcels and (b) the relationship between fuel cost and parcels delivered.

1 mark · Delivery region is categorical (nominal): labels with no natural order.
1 mark · Parcels per day (a discrete count) and daily fuel cost (continuous, in dollars) are numerical; the on-time flag is categorical (binary).
1 mark · (a) For the distribution of a single numerical variable, use a histogram (or a box-and-whisker plot to show centre, spread and outliers).
1 mark · (b) For the relationship between two numerical variables, use a scatter plot with fuel cost on one axis and parcels on the other.
1 mark · Note an ethics point: if region correlates with outcomes, reporting region-level performance without context could lead to misleading or biased conclusions, so disclose the data's limitations.

Region = categorical nominal; on-time flag = categorical binary; parcels/day and fuel cost = numerical. Use a histogram for the parcels distribution and a scatter plot for the fuel-cost vs parcels relationship.

Sia tip — Chart choice follows variable type: one numerical → histogram/boxplot; one categorical → bar/pie; two numerical → scatter. Stating the variable type first makes the chart answer automatic.
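The rule in the tip can be sketched as a small lookup table. This is an illustration only: the function name `suggest_chart`, the type labels and the field names are assumptions made for this example, not part of the course material.

```python
# Hypothetical helper: maps the types of the variables being plotted to a
# chart, following the rule of thumb in the tip above.
CHART_RULES = {
    ("numerical",): "histogram or box-and-whisker plot",
    ("categorical",): "bar or pie chart",
    ("numerical", "numerical"): "scatter plot",
}


def suggest_chart(*variable_types):
    """Return the chart suggested for the given variable types.

    Raises ValueError for combinations the rule of thumb does not cover.
    """
    key = tuple(variable_types)
    if key not in CHART_RULES:
        raise ValueError(f"no rule for variable types {key}")
    return CHART_RULES[key]


# Variable types from the logistics worked example (field names are made up).
variables = {
    "delivery_region": "categorical",
    "parcels_per_day": "numerical",
    "on_time_flag": "categorical",
    "daily_fuel_cost": "numerical",
}

# (a) one numerical variable, (b) two numerical variables
part_a = suggest_chart(variables["parcels_per_day"])
part_b = suggest_chart(variables["daily_fuel_cost"], variables["parcels_per_day"])
print(part_a)
print(part_b)
```

Classifying the variable first and looking up the chart second mirrors the order the marking scheme rewards: the type is stated, then the chart follows from it.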

Glossary

Key terms

Structured data: Data organised into rows and columns with a fixed schema (e.g. a spreadsheet or relational table), as opposed to unstructured data such as free text, images or audio.
Primary vs secondary source: Primary data is collected first-hand for the question at hand (a survey you run); secondary data is reused from an existing source (government statistics, a vendor dataset).
Data-ink ratio: Tufte's principle that the proportion of a chart's ink devoted to displaying actual data should be maximised — strip away non-informative decoration.
Algorithmic bias: Systematic unfairness in a model's outputs arising from biased design choices or biased training data, which can produce discrimination against groups.
Categorical variable: A variable whose values are labels or categories (nominal or ordinal) rather than meaningful quantities; summarised by counts and proportions, not means.

FAQ

Data, Visualisation & Ethics FAQ

Is there much maths in Module 0?

No — it is conceptual. The value is in correctly classifying variables (which drives every later technique) and in the vocabulary of visualisation and ethics, which the group assignment and short-answer exam questions draw on.

Why does data classification matter for the rest of the course?

Because the valid technique depends on it: numerical variables get means, variances and regression; categorical variables get proportions and the chi-squared test. Misclassifying a variable leads to the wrong test.
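A minimal sketch of that split, using only Python's standard library. The values are made up for illustration and are not course data; the variable names are assumptions borrowed from the logistics worked example.

```python
from collections import Counter
from statistics import mean, variance

# Made-up illustrative values, not course data.
parcels_per_day = [120, 135, 128, 150, 142]       # numerical (discrete count)
on_time_flag = ["Yes", "Yes", "No", "Yes", "No"]  # categorical (binary)

# Numerical variable: the mean and sample variance are meaningful.
parcels_mean = mean(parcels_per_day)
parcels_var = variance(parcels_per_day)  # sample variance (divides by n - 1)

# Categorical variable: summarise with counts and proportions, not a mean.
counts = Counter(on_time_flag)
proportions = {label: n / len(on_time_flag) for label, n in counts.items()}

print(parcels_mean, parcels_var)
print(dict(counts), proportions)
```

Taking the mean of the on-time labels would be meaningless, which is why the classification has to come before the calculation.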

What is the difference between cross-section and time-series data?

Cross-section data captures many units at one point in time (sales across 50 stores this month); time-series data tracks one unit over many periods (one store's monthly sales over five years). Time-series data raises issues such as autocorrelation, which come up later in the course.
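The difference shows up in how the data is laid out. The sketch below uses made-up sales figures and hypothetical store names purely for illustration.

```python
# Made-up figures for illustration only.

# Cross-section: many units (stores) observed in a single period.
cross_section = {
    "store_A": 41_200,
    "store_B": 38_750,
    "store_C": 45_100,
}  # sales for one month

# Time series: one unit observed over many periods, kept in time order.
time_series = [
    ("2025-01", 41_200),
    ("2025-02", 39_900),
    ("2025-03", 43_050),
    ("2025-04", 42_300),
]  # monthly sales for store_A

# Order is irrelevant in a cross-section but essential in a time series:
# period-to-period changes only make sense for ordered observations.
sales = [value for _, value in time_series]
changes = [later - earlier for earlier, later in zip(sales, sales[1:])]
print(changes)
```

Shuffling the stores in the cross-section loses nothing, whereas shuffling the months in the time series destroys the information that makes it a time series.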

Study strategy

Exam move

Make a small reference card matching each variable type to its summary statistics and its chart, because every downstream module assumes you can do this instantly. Memorise the two sources of algorithmic bias (design vs training data) and one concrete business example of each, since ethics questions tend to be short-answer and reward a crisp, applied definition over a vague one.
