University of Sydney · S1 2026 · FACULTY OF BUSINESS & ECONOMICS

BUSS6002 · Data Science In Business

- one subject, every graph, every model, every mark
50% final exam · hurdle · 14 Chapters · 7-page Bible
Our own words - no uploaded lecturer files
Built to mirror S1 2026 · updated this semester
Chapter 1 of 11 · BUSS6002

Foundations of Data Science in Business

Week 1 of BUSS6002 sets the conceptual spine the rest of the unit hangs on. The founding idea is that data is collected, not handed to us by nature: every dataset carries the choices of whoever gathered it, so value comes from analysis and interpretation, never from size alone. You learn to separate big data (defined by the four V's: Volume, Variety, Velocity, Veracity) from small data (bounded, deliberately collected for a purpose), and you meet the three-perspective model, in which data science lives where Analytics, Domain knowledge and IT overlap.

It looks like the easiest week, and its content is examined as cheap MCQ and short-answer marks, yet quant-anxious students routinely lose them. This chapter locks in the definitions and the four classic traps (correlation ≠ causation, missing context, quality > quantity, big-in-bytes ≠ big data) that recur in both the mid-semester and final exams.

In this chapter

What this chapter covers

  • 1. Data is collected, not given: datasets reflect the choices of whoever gathered them, which seeds the later bias, quality and ethics chapters
  • 2. Data science already reshapes business, most visibly in marketing (customer internet behaviour now drives strategy)
  • 3. Value comes from analysis and interpretation, not from data itself: 'big data comes with big responsibility'
  • 4. Big data = the four V's: Volume, Variety, Velocity, Veracity
  • 5. Small data: bounded, human-comprehensible, deliberately collected for a specific present-day decision
  • 6. The examinable axis: how data ARISES (passive/continuous vs deliberate/once-off), not its byte count
  • 7. The three perspectives: Analytics (algorithms), Domain knowledge (problems & consequences), IT (storage/ETL)
  • 8. Data-driven decisions and the tightening decision–feedback loop vs slow pre-digital marketing
  • 9. The four traps: correlation ≠ causation, context/big-picture, quality > quantity, big-in-bytes ≠ big data
Worked example · free

Classify the approach: big data or small data?

Q [3 marks]. A streaming-music service is deciding whether to add a cheaper ad-supported tier. The product team proposes four evidence sources. Which one is a 'big data' approach as the unit defines it, and why? (A) a 500-listener online survey on price sensitivity analysed with a t-test; (B) continuously streamed play, skip, search and device logs from tens of millions of listeners, mined for behaviour patterns; (C) the quarterly subscriber-revenue report; (D) a 9-person listening lab session.
  • +1 State the test. Big data is judged against the four V's. The three that separate these options are high Volume, Velocity (continuously generated) and Variety (heterogeneous formats); Veracity, the fourth V, concerns trustworthiness. Judge by how the data arises, not by how many bytes it occupies.
  • +1 Eliminate the small-data options. A, C and D are all bounded, deliberately collected, low-velocity single sources. They are classic small data, however rigorous the survey in A looks.
  • +1 Select B and justify. It streams continuously (Velocity), spans play/skip/search/device events (Variety) across tens of millions of listeners (Volume), so it is the only big-data approach.
Option B — the continuously streamed multi-source listener logs — is the big-data approach, because it is the only option exhibiting Volume, Velocity and Variety together. The survey (A) is the trap: small data is not the same as low quality.
Sia tip — The well-designed survey is always the bait because it feels 'scientific'. The course tests the generation mechanism (passive & continuous vs deliberate & once-off), not rigour and not byte count — read every option through the four-V lens before answering.
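The marking logic above can be written down as a tiny rule. What follows is only an illustrative sketch, not anything supplied by the unit: the `EvidenceSource` class, its three boolean fields and the `classify` function are hypothetical names, and the True/False values are our reading of options A to D. Byte size is deliberately not a field, because the test is about how the data arises.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceSource:
    """Hypothetical record describing how an evidence source arises."""
    name: str
    continuous: bool    # Velocity: generated passively and continuously?
    many_formats: bool  # Variety: heterogeneous event types or formats?
    large_scale: bool   # Volume: very many users or records?


def classify(source: EvidenceSource) -> str:
    """Label a source 'big data' only if Volume, Velocity and Variety all hold."""
    if source.continuous and source.many_formats and source.large_scale:
        return "big data"
    return "small data"


# Our reading of the four options in the worked example (an assumption).
OPTIONS = [
    EvidenceSource("A: 500-listener survey", False, False, False),
    EvidenceSource("B: streamed listener logs", True, True, True),
    EvidenceSource("C: quarterly revenue report", False, False, False),
    EvidenceSource("D: 9-person lab session", False, False, False),
]

labels = {source.name: classify(source) for source in OPTIONS}
for name, label in labels.items():
    print(f"{name} -> {label}")
```

Only option B satisfies all three conditions, which matches the model answer. In an exam you apply the same check by reading the scenario, not by estimating file size.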
Glossary

Key terms

Big data (four V's)
Data characterised by Volume (size), Variety (many formats), Velocity (high-speed generation) and Veracity (trustworthiness). The defining feature is how it arises — passively and continuously from many sources — not its size in bytes.
Small data
Data small enough for direct human comprehension, deliberately collected for a specific purpose, and able to impact a decision in the present. It can be large in bytes yet still small data if it fails the four-V test; small ≠ low quality.
Veracity
The trustworthiness or reliability of data — the fourth V. Poor veracity (e.g. inconsistent early-pandemic case counts) yields confident but wrong evidence-based decisions, which is why quality of insight beats quantity.
The three perspectives
The complementary capabilities every data-science project needs: Analytics (algorithms and tools that inform decisions), Domain/business knowledge (identify problems and own the consequences) and Information Technology (storage, ETL, speed of access). Data science sits at their intersection.
ETL (extract–transform–load)
The IT-perspective pipeline that extracts data from sources, transforms it into a usable shape and loads it into a store — the plumbing that makes data accessible at the speed analytics needs.
Data-driven decision
A decision where the choice between actions is determined primarily by evidence extracted from data (patterns, predictions, estimated effects) rather than by intuition, hierarchy or precedent alone.
Correlation ≠ causation
A pattern between two variables does not establish that one causes the other; a hidden confounder can drive both. Only a randomised experiment licenses a causal claim — a striking pattern in big data still does not prove cause.
Marketing fatigue
The decline in consumer responsiveness caused by message overload; a domain-knowledge concept that shows why more data-driven outreach is not automatically better outreach.
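To make the ETL entry above concrete, here is a minimal sketch of the three steps using only Python's standard library. The sample rows, column names and the `events` table are invented for illustration, and a real pipeline in the IT perspective would be far larger; the point is only the extract, transform, load order.

```python
import csv
import io
import sqlite3

# Hypothetical raw export: stray whitespace, mixed casing, one unusable row.
RAW_CSV = """listener_id,event,seconds
 101 ,PLAY,212
102,skip,7
103,Play,
"""


def extract(text):
    """Extract: read rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: tidy each field and drop rows with no duration."""
    clean = []
    for row in rows:
        seconds = row["seconds"].strip()
        if not seconds:
            continue  # a row without a duration cannot be analysed
        clean.append(
            (int(row["listener_id"].strip()), row["event"].strip().lower(), int(seconds))
        )
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into a queryable store."""
    conn.execute("CREATE TABLE events (listener_id INTEGER, event TEXT, seconds INTEGER)")
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
stored = conn.execute(
    "SELECT listener_id, event, seconds FROM events ORDER BY listener_id"
).fetchall()
print(stored)
```

The transform step is where veracity problems are caught or missed: the row with no duration is dropped here, before any analysis sees it.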
FAQ

Foundations of Data Science in Business FAQ

Is Week 1 actually examinable, or just background?

It is directly examinable. Foundations shows up as 1-mark MCQ classification items (big vs small data) and definition-style short-answer questions in both the 25% mid-semester and the cumulative 45% final. They are among the cheapest marks in the unit, which is exactly why under-prepared students drop them.

What is the difference between big data and small data?

Big data is defined by the four V's — Volume, Variety, Velocity and Veracity — and arises passively and continuously from many sources. Small data is bounded, human-comprehensible and deliberately collected for a specific purpose. The examinable distinction is how the data is generated, not how many bytes it occupies, so a petabyte-scale single survey is still small data.

What are the three perspectives I have to memorise?

Analytics (the algorithms, models and tools), Domain/business knowledge (identifying the right problems and owning the consequences) and Information Technology (storage, ETL and speed of access). Data science is interdisciplinary — it lives where all three overlap, and a common short-answer asks you to name each with its role.

Why does the course insist 'correlation is not causation'?

Because a pattern in data — even a striking one in a huge dataset — can be produced by a hidden confounder that drives both variables, so the association is partly spurious. Only a randomised experiment, which balances confounders, lets you attribute a change in the outcome to the treatment.
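A small simulation can make the confounder argument tangible. The sketch below is an assumption-laden toy, not course material: the variable names, the noise levels and the fixed seed are all hypothetical. A hidden confounder drives both x and y, and x has no effect on y at all. In the observational data the two still correlate strongly; once x is assigned by coin flip, independent of the confounder, the correlation disappears.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient, computed from first principles."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Hidden confounder (for example, how engaged a customer already is).
confounder = [rng.gauss(0, 1) for _ in range(n)]

# Observational data: x and y both follow the confounder; x never affects y.
x_observed = [z + rng.gauss(0, 0.5) for z in confounder]
y_observed = [z + rng.gauss(0, 0.5) for z in confounder]

# Randomised experiment: x is a coin flip, independent of the confounder.
x_randomised = [rng.choice([0, 1]) for _ in range(n)]
y_randomised = [z + rng.gauss(0, 0.5) for z in confounder]

r_observed = pearson(x_observed, y_observed)
r_randomised = pearson(x_randomised, y_randomised)
print(f"observational r = {r_observed:.2f}")
print(f"randomised r    = {r_randomised:.2f}")
```

Because random assignment breaks the link between x and the confounder, any association that survives in the randomised arm can be attributed to x itself. Here nothing survives, since x was built to have no effect.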

Is this guide official or affiliated with the University of Sydney?

No. This is an independent AskSia study resource for BUSS6002. It is not produced, endorsed by or affiliated with the University of Sydney; always confirm definitions, dates and assessment details against your official Canvas unit outline.

Study strategy

Exam move

Treat Foundations as guaranteed, low-effort marks and bank them early. Memorise three crisp checklists — the four V's (Volume, Variety, Velocity, Veracity), the three perspectives (Analytics / Domain knowledge / IT, each with what it owns) and the four traps (correlation ≠ causation, missing context, quality > quantity, big-in-bytes ≠ big data) — and practise classifying short business scenarios as big or small data by how the data arises rather than by its size. In short-answer questions, signpost every mark explicitly (definition / contrast / change) because graders scan for the keyword pivot, which for data-driven decisions is the tightening of the feedback loop. Finally, carry the single sentence 'data is collected, not given' forward: it is the seed of the data-quality, ethics and big-data chapters later in the unit.
