USyd · DATA3404 · Scalable Data Management

DATA3404: pass the exams, not just read the notes

Your complete guide to University of Sydney's scalable data management unit. See where the marks are, work real practice questions, and study with an AI tutor that knows DATA3404.

6 credit points Level 3 undergrad Offered S1 ~60% exams School of Computer Science

Sia generates DATA3404 practice questions, walks through storage engines and tree-based indexing (b+-trees) step by step, and quizzes you on the material the exam weights most heavily.

Spot the bug

Find what is wrong

Multiple choice · the fix is revealed after you answer

A team runs DATA3404 Assignment 2 in PySpark. A small lookup DataFrame `regions` (a few KB) is joined to a huge `trips` DataFrame, then the count is printed. The job triggers a full shuffle of `trips` across the cluster and runs far slower than expected. Which single change fixes the performance bug?

    regions = spark.read.parquet('regions')   # tiny
    trips   = spark.read.parquet('trips')     # billions of rows
    joined  = trips.join(regions, 'region_id')
    result  = joined.filter(joined.fare > 100)
    print(result.count())
The fix

By default Spark uses a shuffle sort-merge join, which shuffles BOTH inputs across the cluster on the join key. Shuffling a billions-row table is the expensive step here.

When one side is tiny (below spark.sql.autoBroadcastJoinThreshold), the right strategy is a broadcast join: send the small table to every node and do a local hash join, so the huge table never moves.
Wrapping the small DataFrame in broadcast(regions) forces this strategy: trips.join(broadcast(regions), 'region_id') removes the shuffle of trips.
cache() only stores regions in memory, it does not stop the shuffle of trips. repartition adds more shuffling, not less. Pushing the filter down is a real optimisation but the fare>100 filter is on trips after the join and Catalyst already pushes such filters, so it does not remove the join shuffle; the join strategy is the bug.

The trap: Reaching for cache() or repartition() because the table is reused or skewed. Neither removes the shuffle of the huge table. The fix is choosing the broadcast join strategy so only the tiny table is moved. classic slip!

your whole grade
Where your grade comes from Exams 60% · Projects 24% · Quizzes 6% · Participation 5% · Presentations 5%

One exam decides 60% of your grade. Hurdle task: at least 40% required on this exam. This whole page is built around that.

Overview

What DATA3404 is, and where it sits

DATA3404 is a third-year database-systems-internals and big-data unit where you stop using databases and start understanding how they are built. You look under the hood of the engines that store, index, and query large data collections. The single-node half covers storage layout, buffer management, B+-tree, hash and bitmap indexing, external sorting, the three join algorithms, and cost-based query optimisation. The distributed half covers partitioning, replication, the CAP theorem, HDFS and MapReduce, Apache Spark (RDD, DataFrame and DAG execution), data-stream processing, and NoSQL.

The practical assignments use PostgreSQL, the SimpleDB educational engine, and Spark/Databricks, but the heavily weighted final exam does not assess your code. It tests hand-worked I/O cost models: join cost counting, external-sort pass formulas, B+-tree height and insertion mechanics, extendible-hashing splits, buffer-replacement traces, and selectivity-based query-optimisation costing. The lecturer motto is to learn the principles, not the software.

It is a 6-credit-point Level 3 unit in the School of Computer Science. It builds on an intermediate databases unit (DATA2001, DATA2901, ISYS2120, INFO2120 or INFO2820), is prohibited against INFO3504 and INFO3404, and feeds data-engineering, data-science and systems pathways.

How it differs from its first-year siblings. Where QBUS5001 and BUSS6002 use data tools to answer business questions, DATA3404 is about the machinery underneath: it asks why a query is slow and how the engine executes it, costed by hand in page I/Os, rather than how to model or interpret the data.

Official outline: sydney.edu.au · DATA3404 outline. Always treat the official outline and the exam timetable as authoritative.

Difficulty & time commitment

Is DATA3404 hard, and how much time does it take?

DATA3404 is manageable if you keep a weekly rhythm and treat the back half as the main event. Across student reviews the pattern is consistent: it starts gently and steepens, and the heaviest assessment is the part that separates grades.

Difficulty
3.5 / 5
Hard. Gentle early, demanding back half. Hard to fail with steady work; an HD takes consistent practice.
Exam load
60%
The exams decide most of the grade. The heaviest single component is 60%.
Weekly time
~10 hrs
The standard load for a 6-credit-point unit, around 1.5 hours per credit point per week including class.

A read across student reviews and course feedback. See what students say ↓

Weeks 1 to 4 (storage, indexing)building
Weeks 5 to 7 (sorting, joins, optimisation)steep
Weeks 8 to 13 (distributed, Spark, NoSQL)reuses join cost reasoning per node

The difficulty curve and the assessment weighting point the same way: the back half is harder and worth more. Front-loading effort there is the highest-return decision in the unit.

Is this unit for you

Who tends to do well, and who tends to struggle

You will likely do well if

  • You enjoy working cost formulas by hand and stay precise with ceilings, floors, page counts and buffer sizes under exam time pressure
  • You build a one-page cost-formula reference early (join formulas, external-sort passes, B+-tree height, extendible-hashing rules) and drill the worked examples until plugging in is automatic
  • You treat the weekly quizzes and worksheets as exam rehearsal rather than box-ticking
  • You can coordinate a group of three across two practical assignments without letting the SQL and Spark work crowd out exam-theory revision

You may struggle if

  • You rely on coding ability to carry you, because the 60% exam explicitly does not assess code, it assesses the maths behind the engine
  • You leave the cost models until swotvac, because they compound (joins reuse sorting, distributed joins reuse single-node join cost) and cannot be crammed cold
  • You coast on group assignments and skip the individual quizzes and worksheets that rehearse the exam
  • You are shaky on the prerequisite SQL, relational algebra or basic programming the unit assumes
do this ↘
What HD students do differently
  • Memorise and can re-derive every cost formula: NLJ b_R + |R|*b_S, BNLJ b_R + ceil(b_R/(M-2))*b_S, INLJ b_R + |R|*(1+h+matches), sort-merge, hash 3*(b_R+b_S), and external-sort passes = 1 + ceil(log_(B-1) ceil(N/B))
  • Always pick the smaller relation as the outer in a nested-loops join and explain why
  • Practise the full chain without notes: a B+-tree insertion trace, an extendible-hashing build, a CLOCK or GCLOCK buffer trace, and a Car/Trip-style optimisation costing
  • Carry single-node join reasoning into the distributed setting (per-node cost, then divide by the number of nodes) and pick the right distributed-join strategy
  • Confirm the exact semi-open-book permitted materials with the coordinator and build your allowed notes accordingly

Syllabus

The 13 topics, week by week

The exam-weight marker on each topic shows where the marks concentrate. The amber topics carry the highest exam weight.

W1

T1 · Organisation, admin and introduction

Hellerstein, Stonebraker & Hamilton (2007), Architecture of a Database System

Unit-in-brief and the learn-the-principles-not-the-software motto; storage hierarchy from registers to disk and the access gap; pages as the unit of disk I/O; MB (decimal) versus MiB (binary).

Lower exam weight
W2

T2 · Storage engines and physical data organisation

Ramakrishnan & Gehrke (storage); Petrov, Database Internals

Heap versus sorted file cost table; fixed and variable-length records; slotted page and TID; PostgreSQL TOAST; row versus column store and PAX; the buffer manager (pin count, dirty bit); replacement policies FIFO, LRU, MRU, LFU, CLOCK and GCLOCK.

W3

T3 · Tree-based indexing (B+-trees)

Ramakrishnan & Gehrke (tree indexing)

Index as an access path via a search key; ISAM versus B+-tree; entries only in leaves with sibling pointers; search cost = height + 1; insertion (copy up) and deletion (redistribute or merge); index classification (primary/secondary, clustered/unclustered, dense/sparse).

W4

T4 · Hash-based and bitmap indexing

Ramakrishnan & Gehrke (hash indexing)

Static hashing and overflow chains; extendible hashing with a directory of size 2^(global depth), local versus global depth, and the directory-doubling rule; bitmap indexes for low-cardinality attributes; bitmap versus Bloom filter; the EGSR index-choice heuristic and covering indexes.

W5

T5 · Query processing, relational algebra and external sorting

Ramakrishnan & Gehrke (external sorting, query evaluation)

SQL to relational algebra to physical plan; the six basic RA operators; access paths and CNF matching; materialisation versus pipelining; external merge sort with pass-0 runs = ceil(N/B), merging B-1 runs at a time, passes = 1 + ceil(log_(B-1) ceil(N/B)) and total I/O = 2N times passes.

W6

T6 · Query execution and join algorithms

Ramakrishnan & Gehrke (join algorithms)

The exam gold. Join types (theta, equi, natural, outer); simple, page and block nested-loops join; index nested-loops join; sort-merge join; grace/hash join at 3*(b_R+b_S); the canonical Student-Enrolled cost chain; choosing the smaller relation as outer.

W7

T7 · Query optimisation and cost estimation

Ramakrishnan & Gehrke (query optimisation)

Cost-based optimisation; RA equivalence rules (cascade, commute, push selections and projections down); selectivity = 1/V on V distinct values; left-deep join trees and System-R dynamic programming; statistics and the uniform-distribution assumption; the Car/Trip selectivity chain. Assignment 1 due.

W8

T8 · Distributed data management: partitioning and replication

Lecture notes on distributed data management

Parallel architectures (shared-memory, shared-disk, shared-nothing); speed-up versus scale-up; the CAP theorem; horizontal (sharding) versus vertical partitioning; round-robin, hash and range placement; co-partitioning on the join key; synchronous versus asynchronous replication; cluster MTBF scaling.

Lower exam weight
W9

T9 · Distributed query and data processing; MapReduce

Lecture notes; Apache Spark / MapReduce material

Distributed join approaches (local reference, broadcast, distributed-shuffle, fragment-and-replicate); distributed parallel hash join; the MapReduce model (map, shuffle and sort, reduce) and WordCount; data locality and shared-nothing execution; the 9-node R-join-S cost comparison.

Lower exam weight
W10

T10 · Distributed dataflow platforms: Spark and Flink

Apache Spark / Databricks documentation

RDD (immutable, partitioned, in-memory, fault-tolerant via lineage); lazy transformations versus actions and the DAG; the DataFrame API and .explain(); Spark join strategies (broadcast, shuffle sort-merge, shuffle hash); the Catalyst optimiser and Adaptive Query Execution; HDFS with NameNode and DataNodes, 64 MB blocks and 3x replication.

Lower exam weight
W11

T11 · Data stream management systems

Lecture notes on data stream processing

Stream versus batch processing; Spark Streaming; the pipelined dataflow runtime (Flink) for streams; continuous, unbounded data and the motivation for windowing.

Lower exam weight
W12

T12 · NoSQL stores

Lecture notes on NoSQL stores

NoSQL as non-relational or Not-Only-SQL; the drivers of scalability and schema flexibility; families (key-value, wide-column, document, graph); decoupled lakehouse architecture; MongoDB sharding and asynchronous primary-copy replication. Assignment 2 due Week 13.

Lower exam weight
W13

T13 · Unit review and Assignment 2 due

Revision across all lectures

Synthesis across storage, indexing, query processing, optimisation and the distributed/Spark half; exam-style cost-model revision. Assignment 2 due.

Lower exam weight

How it's assessed

Assessment structure

ComponentWeightFormat & timing
Weekly homework quizzes6%Individual Canvas quizzes on database-engine concepts; once started, a single timed submission with no resubmission. Weekly from Week 2. Closed-knowledge concept tests; together with the exam, about 66% of the unit is individually examined under time pressure.
Weekly participation5%Submit a completed (unmarked) worksheet, or take part in the Week 12 or 13 tutorial peer activity. Mere attendance does not count. Marked as a percentage of the possible weekly participations. Weekly from Week 2.
Practical Assignment 1: Database Programming and Testing12%Groupwork in threes: extend the SimpleDB educational engine in Python by implementing one chosen component such as a buffer replacer, external sort, a join algorithm, or an index structure. Due Week 7.
Practical Assignment 2: Complex Querying and Performance Tuning12%Groupwork in threes: complex querying and performance tuning in SQL and Apache Spark (PostgreSQL versus Databricks/Spark) with query-plan analysis and a report. Due Week 13.
Presentation of DB Concepts5%Groupwork in threes: a roughly 6-minute video explaining a data-processing concept, marked by tutors and peers. Generative AI is allowed for research and design but not to generate the video or audio. Across multiple weeks.
Final Exam60%Paper-based, 2 hours, semi-open book; mostly short answer with possibly some MCQ; does not assess coding. Covers lectures, labs, homework and assignments. Formal exam period. Hurdle task: at least 40% required on this exam.
Weekly homework quizzes6%
Individual Canvas quizzes on database-engine concepts; once started, a single timed submission with no resubmission.
Weekly participation5%
Submit a completed (unmarked) worksheet, or take part in the Week 12 or 13 tutorial peer activity. Mere attendance does not count. Marked as a percentage of the possible weekly participations.
Practical Assignment 1: Database Programming and Testing12%
Groupwork in threes: extend the SimpleDB educational engine in Python by implementing one chosen component such as a buffer replacer, external sort, a join algorithm, or an index structure.
Practical Assignment 2: Complex Querying and Performance Tuning12%
Groupwork in threes: complex querying and performance tuning in SQL and Apache Spark (PostgreSQL versus Databricks/Spark) with query-plan analysis and a report.
Presentation of DB Concepts5%
Groupwork in threes: a roughly 6-minute video explaining a data-processing concept, marked by tutors and peers. Generative AI is allowed for research and design but not to generate the video or audio.
Final Exam60%
Paper-based, 2 hours, semi-open book; mostly short answer with possibly some MCQ; does not assess coding. Covers lectures, labs, homework and assignments.
  • You must achieve at least 40% in the final exam AND an overall final mark of 50 or more. A student who misses either may be given a maximum final mark of no more than 45 regardless of their average. The final exam is flagged a hurdle task.
  • Paper-based, 2 hours, semi-open book, mostly short answer with possibly some MCQ, testing hand-worked engine internals and cost models rather than coding.
  • Calculator policy: Not specified in the available course pages; confirm with the unit coordinator.
read this! If you read nothing else

This is an exam-cram unit. With the exams at 60% of the grade and the final exam alone at 60%, your result is overwhelmingly decided by how well you perform under time pressure. Hurdle task: at least 40% required on this exam.

Final exam timing: Formal examination period, Semester 1 2026 (exact date set by the USyd exams timetable). Confirm the exact date and venue on the official exam timetable.

How to actually pass it

A weekly rhythm, two checklists, and the traps to avoid

The unit rewards consistency over cramming, and practice over re-reading. Here is the loop that works, then what to have nailed before each exam.

The weekly loop

During the lecture
Attend or watch the 2-hour Tuesday lecture and note the week's cost formula or mechanic (buffer trace, B+-tree rule, join formula, selectivity).
Same week, by hand
Do the lab worksheet by hand (not just reading the solution) and submit it for the 5% participation while the cost method is fresh.
Same week, timed
Sit the Canvas quiz seriously as timed exam rehearsal for the 6%.
End of each week
Add the week's cost formula plus one worked example to a running master one-pager, and re-do one earlier topic's calculation so the mechanics do not decay.

Before the mid-semester checklist

  • Do each weekly worksheet by hand and submit it for the 5% participation
  • Sit each weekly quiz as timed exam rehearsal (6%)
  • After each lecture add its cost formula and one worked example to a master one-pager
  • Lock in your SimpleDB component for Assignment 1 (Week 7) well before the deadline
  • Drill the single-node mechanics early: B+-tree height and insertion, extendible-hashing splits, external-sort passes, and the storage cost table

Before the final heaviest topics

  • Re-derive every join cost formula from memory and know which join works for equi versus non-equi conditions
  • Drill B+-tree insertion and deletion and height estimation; extendible-hashing splits and the directory-doubling rule; LRU, CLOCK and GCLOCK traces; external-sort passes and total I/O
  • Work a Car/Trip-style optimisation chain: selectivity = 1/V, push selections down, pick the index pair, compare plan costs
  • Practise distributed joins: compute per-node cost for local-reference, broadcast, shuffle-hash and fragment-and-replicate and compare to centralised
  • Confirm the exact semi-open-book permitted materials with the coordinator and prepare your allowed notes accordingly
  • Remember the hurdle: target well above 40% on the exam and 50% overall, since failing either caps you at 45

The mistakes that cost marks

01

Counting on your code to carry you. The 60% exam does not assess your code. Strong assignment marks will not save you if you cannot hand-compute cost models. Skew revision to cost estimation, join I/O counting, indexing maths, external-sort passes and query-optimisation rewriting.

02

Assuming the exam is fully open book. It is described as semi-open book, but the available course pages do not specify which materials are permitted. Do not assume you can bring everything; confirm the allowed materials with the coordinator first.

03

Using the wrong block size in BNLJ. This course's worksheets use M-2 (one input page and one output page reserved), not M-1. Mixing them up changes every block nested-loops answer.

04

Forgetting to put the smaller relation as the outer. In nested-loops joins the outer relation should be the smaller one. Forgetting this gives a much larger and wrong cost.

05

Trusting the published week numbering. The Unit Schedule page's week numbers can disagree with the quiz and worksheet ordering. Follow the lecture and worksheet sequence: sorting, then joins, then optimisation, then distributed.

06

Leaving the cost models until swotvac. They compound. Joins reuse sorting, distributed joins reuse single-node join cost, so they cannot be crammed cold. Build the master one-pager from Week 2.

Teaching team

Who teaches DATA3404

The bios below are factual. The star ratings are not ours: they are impressions from students who have taken the unit, so you can hear from people who sat in the lectures.

Unit coordinator and lecturer

Uwe Roehm

Associate Professor in the School of Computer Science. Research in database systems and data management, scalable data science and big data, cloud computing, machine learning with database systems, data management on multicore hardware, and replication and caching. Organises the School's Database Research Group. Staff profile

Student ratingNo student ratings yet

Teaching team as listed in the unit materials reviewed. AskSia does not rate lecturers; star ratings are submitted by students who have taken DATA3404.

Formula & concept sheet

The vocabulary and formulas you must own

Slotted page
A page layout for variable-length records: a header plus a slot directory of (offset, length) entries growing from one end and records from the other. A record can move within the page without changing its TID.
TID (Tuple ID)
A row identifier of the form (page id, slot number); the slot-directory indirection lets a record move within its page without the TID changing.
Buffer manager
The component that brings disk pages into memory frames, tracks a pin count and dirty bit per frame, and evicts unpinned frames by a replacement policy (writing back dirty pages first).
CLOCK (second chance)
A buffer-replacement policy using a circular frame list and a reference bit: on eviction a set bit is cleared and the hand advances; a clear bit, if unpinned, is evicted.
B+-tree
A dynamic, balanced, multi-level index with entries only in the leaves and sibling pointers between leaves for range scans; search cost is about height + 1.
Clustered index
An index whose data records are stored in the same order as the index entries; at most one per table. It makes range scans cheap because matches sit on consecutive pages.
Extendible hashing
A dynamic hash index using a directory of size 2^(global depth); a bucket overflow splits the bucket and doubles the directory only when the bucket's local depth equals the global depth.
External merge sort
Sorting data larger than memory: a pass-0 sort phase producing ceil(N/B) runs, then merge passes combining B-1 runs at a time.
Selectivity
The fraction of input tuples a predicate keeps; for an equality on an attribute with V distinct values it is estimated as 1/V, assuming a uniform distribution.
Left-deep join tree
A join order in which the right input of every join is a base relation, enabling fully pipelined plans; System-R's dynamic-programming optimiser considers only these.
Shared-nothing
A parallel architecture where each node has its own CPU, memory and disk and communicates only by network; it scales to thousands of nodes and is the practice standard for big data.
CAP theorem
In a distributed data system you can guarantee at most two of Consistency, Availability and Partition tolerance.
RDD (Resilient Distributed Dataset)
Spark's immutable, partitioned, in-memory collection that is fault-tolerant via lineage: a lost partition is rebuilt from its derivation.
Lazy evaluation
In Spark, transformations only record a DAG of operations; nothing executes until an action is called.

Common acronyms: B+-tree search cost = ceil(log_F B) + 1 = index height + 1 · External sort passes = 1 + ceil(log_(B-1) ceil(N/B)); total I/O = 2N * passes · Simple nested-loops join = b_R + |R| * b_S (smaller relation as outer) · Page nested-loops join = b_R + b_R * b_S · Block nested-loops join = b_R + ceil(b_R/(M-2)) * b_S · Index nested-loops join = b_R + |R| * c, with c = 1 + h + matches · Sort-merge join = b_R + b_S (sorted), else sort(R) + sort(S) + b_R + b_S · Grace/hash join = 3 * (b_R + b_S) · Equality selectivity on V distinct values = 1/V; matches = |R|/V · Cluster MTBF for k identical nodes = MTBF_1 / k.

What students say

What students actually say about DATA3404

Recurring themes from student reviews, paraphrased in our own words.

On difficulty
  • No first-hand student difficulty rating is currently available for this unit, so the difficulty read is editorial: a 60% semi-open-book final, a double hurdle, and hand-worked cost models place it in the harder quant/CS cluster
Practise these topics with Sia →
How students revise
  • A substantial library of student-shared materials exists (lecture notes, topic summaries, tutorial and worksheet solutions, and final-exam study notes), pointing to heavy demand for revision support in this unit
  • Organised, cost-model-focused revision notes are what students reach for, with at least one widely viewed compiled note set described by its author as having achieved a high distinction
Make your own notes and flashcards →
Before the exams
  • The most-shared materials cluster around the exam-heavy mechanics: buffer management and page structures, the storage layer, query processing, and distributed data management
Get instant walkthroughs →

Recurring student opinions, paraphrased and aggregated, not official course information.

Set texts

The prescribed reading

The syllabus references map straight onto these.

Core reference paper

Architecture of a Database System

Hellerstein, J. M., Stonebraker, M. & Hamilton, J. (2007), Foundations and Trends in Databases. Publisher page

Recommended

Database Management Systems (3rd ed.)

Ramakrishnan, R. & Gehrke, J. (2003), McGraw-Hill.

Recommended

Database Internals

Petrov, A. (2019), O'Reilly.

Recommended

Database Systems: The Complete Book (2nd ed.)

Garcia-Molina, H., Ullman, J. D. & Widom, J.

Where it fits

Prerequisites, related units & why it matters

Formal prerequisite: DATA2001 or DATA2901 or ISYS2120 or INFO2120 or INFO2820. The unit assumes you already know SQL, schema design, relational algebra, and can program in Python or Java. It is prohibited against INFO3504 and INFO3404.

Why it matters beyond the grade. The cost-model fluency this unit builds (join I/O, index access, external sort, Spark execution plans) is exactly what data-engineering and backend-performance roles screen for. It turns a SQL user into someone who understands why a query is slow and how to engineer data systems at scale.

FAQ

Frequently asked questions

How is DATA3404 assessed?

A 60% paper-based, 2-hour semi-open-book final exam, plus two 12% group practical assignments (a SimpleDB engine extension due Week 7 and SQL plus Apache Spark performance tuning due Week 13), a 5% group video presentation, 6% weekly Canvas quizzes, and 5% weekly participation.

What is the hurdle to pass DATA3404?

You must score at least 40% on the final exam AND at least 50% overall. If you miss either, you may be capped at a maximum final mark of 45 regardless of your average. The exam is officially flagged a hurdle task.

Is the DATA3404 exam open book?

It is described as semi-open book and is paper-based, 2 hours, mostly short answer. However, the available course pages do not specify which materials are permitted, so confirm the exact allowed materials with the unit coordinator before relying on bringing notes.

Does the DATA3404 exam test coding?

No. The final exam explicitly does not assess coding. Coding is assessed in the two group practical assignments. The exam tests hand-worked engine internals: cost models, indexing, sorting and join calculations, and query-optimisation reasoning.

What are the most important things to drill for the DATA3404 exam?

The join I/O cost formulas (nested-loops, block nested-loops, index nested-loops, sort-merge and hash), external-merge-sort pass counting, B+-tree mechanics, extendible-hashing splits and directory doubling, buffer-replacement traces (LRU, CLOCK and GCLOCK), and selectivity-based query-optimisation costing.

What background do I need for DATA3404?

The formal prerequisite is DATA2001 or DATA2901 or ISYS2120 or INFO2120 or INFO2820. The unit assumes you already know SQL, schema design and relational algebra, and can program in Python or Java. It is prohibited against INFO3504 and INFO3404.

Is there a required textbook for DATA3404?

There is no single required textbook. The course leans on the paper Architecture of a Database System (Hellerstein, Stonebraker and Hamilton, 2007) and recommends references including Ramakrishnan and Gehrke's Database Management Systems and Petrov's Database Internals.

Study DATA3404 with Sia

Work through storage engines, tree-based indexing (b+-trees), hash-based and the rest of the unit with a tutor that knows it and quizzes you on the topics the assessments weight most heavily.

Start studying with Sia