University of Sydney · FACULTY OF COMPUTER SCIENCE

COMP9120 · Database Management Systems

Q: Why does WHERE price < 900 quietly drop rows whose price is NULL?

Because a comparison with NULL evaluates to unknown, not true or false, and WHERE keeps a row only when its condition is true — it treats unknown as false. So a NULL price makes price = 900 still drops NULL prices, since both halves are unknown; you must add OR price IS NULL explicitly to include them.

- one subject, every graph, every model, every mark

Computer Science14 Chapters8-page Bible

Our own words - no uploaded lecturer files

Updated for this semester

Chapter 6 of 12 · COMP9120

Advanced SQL

Advanced SQL is the Week 5 topic of COMP9120 Database Management Systems at the University of Sydney, extending the basic SELECT-FROM-WHERE surface into exam-grade querying. It covers nested queries and set comparison (IN, EXISTS, ANY/SOME, ALL), aggregation with GROUP BY and HAVING, views as reusable virtual tables, and the sign-critical corner of the whole unit — NULL and three-valued logic. These are the tools the in-class quiz and the paper-based final exam ask you to write and hand-evaluate on a small table instance.

In this chapter

What this chapter covers

01Write nested (sub-)queries and set-comparison predicates: v [NOT] IN R, [NOT] EXISTS R, v op ANY/SOME R, v op ALL R
02Express the ‘highest / cheapest’ idiom with >= ALL and <= ALL, and know both keep tied rows
03Tell correlated sub-queries (reference an outer column, re-evaluated per row) from uncorrelated ones
04Summarise rows with COUNT, SUM, AVG, MAX and MIN, and know COUNT(*) counts rows while COUNT(col) skips NULLs
05Partition rows with GROUP BY and filter groups with HAVING, keeping WHERE (rows) and HAVING (groups) in their correct roles
06Apply the true clause evaluation order: FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY
07Define a view as a virtual table that stores no data and is re-evaluated against the base tables on every use
08Reason with NULL and three-valued logic: comparisons give unknown, WHERE treats unknown as false, test with IS NULL
09Know how aggregates treat NULL (all except COUNT(*) ignore it) and why AVG can divide by fewer rows than COUNT(*)
10Avoid the recurring mark-losers: NOT IN with a NULL returning zero rows, bare columns outside GROUP BY, and = NULL

Worked example · free

Grouped aggregation with a hidden NULL

Q [6 marks]. Given Listing(id, suburb, bedrooms, price) = {(1, Glebe, 2, 800), (2, Glebe, 3, 1000), (3, Glebe, 2, NULL), (4, Newtown, 1, 600), (5, Newtown, 2, 700), (6, Camperdown, 3, 900)}, evaluate: SELECT suburb, COUNT(*) AS n, AVG(price) AS avg_price FROM Listing WHERE bedrooms >= 2 GROUP BY suburb HAVING COUNT(*) >= 2; Give the exact result relation. Prices are in dollars per week.

+1WHERE runs first, before grouping. Keep rows with bedrooms >= 2: rows 1, 2, 3, 5, 6. Row 4 (Newtown, 1 bedroom) is dropped, leaving 5 rows.
+1GROUP BY suburb partitions the 5 surviving rows: Glebe = {1, 2, 3}, Newtown = {5}, Camperdown = {6}.
+1HAVING COUNT(*) >= 2 filters the groups: Glebe has 3 rows (keep), Newtown has 1 (drop), Camperdown has 1 (drop). Only Glebe survives.
+1COUNT(*) for Glebe counts all 3 rows, including row 3 whose price is NULL, because COUNT(*) counts rows and not values, so n = 3.
+1AVG(price) ignores the NULL: it averages only {800, 1000} and divides by 2, not 3: (800 + 1000) / 2 = 900.
+1Assemble the result relation: a single row (Glebe, 3, 900).

One row: (Glebe, 3, 900). The subtlety is that COUNT(*) = 3 but AVG divides by 2, because AVG skips the NULL price while COUNT(*) still counts its row. Dividing the sum by 3 would wrongly give 600.

Sia tip — Whenever a question plants a NULL in an aggregated column, separate the two counts: COUNT(*) counts rows, but SUM/AVG/MIN/MAX only see non-NULL values. To average treating a missing price as 0 you would need AVG(COALESCE(price, 0)) — a different question.

Glossary

Key terms

Sub-query (nested query): A SELECT written inside another query, usually in the WHERE clause, that returns a single value or a set the outer query compares against. It is uncorrelated if it can be computed once on its own, and correlated if it references a column from the outer query and must be re-evaluated for each outer row.
ALL / ANY (SOME): Set-comparison quantifiers. v op ALL R is true when v op r holds for every element r of R; v op ANY R (equivalently SOME) is true when it holds for at least one. The ‘highest’ idiom is salary >= ALL (...), the ‘cheapest’ idiom is price <= ALL (...); both keep tied rows.
EXISTS: A predicate that is true if and only if its sub-query returns at least one row; it tests emptiness, not the row contents. NOT EXISTS is NULL-safe, which makes it the preferred alternative to NOT IN when the sub-query may contain a NULL.
GROUP BY: Partitions the rows produced by FROM/WHERE into groups sharing the same grouping-list values, then applies each aggregate within a group to yield one summary row per group. Every column in SELECT or HAVING must be either an aggregate or listed in GROUP BY.
HAVING vs WHERE: WHERE filters individual rows before grouping and cannot mention an aggregate; HAVING filters whole groups after aggregating and is the only place a group-level condition such as COUNT(*) >= 2 can go. A query may use both.
Aggregate functions: COUNT, SUM, AVG, MAX and MIN collapse many rows into one value. All except COUNT take one attribute; COUNT(*) counts rows. Every aggregate except COUNT(*) ignores NULLs, and returns NULL (or 0 for COUNT) when all inputs are NULL. Aggregate calls cannot be nested.
View: A virtual table defined by a query. It stores no data of its own; the DBMS re-evaluates its defining query against the current base tables each time the view is used, giving abstraction, reuse and security. A later query against the view always sees the latest base rows.
NULL and three-valued logic: NULL means ‘unknown or not applicable’, not zero or blank. Any arithmetic with NULL yields NULL and any comparison with NULL yields unknown, so SQL has three truth values (true, unknown, false) with AND = min, OR = max, NOT = 1 - B. Test for NULL with IS NULL / IS NOT NULL, never = NULL.

FAQ

Advanced SQL FAQ

Why does WHERE price < 900 quietly drop rows whose price is NULL?

Because a comparison with NULL evaluates to unknown, not true or false, and WHERE keeps a row only when its condition is true — it treats unknown as false. So a NULL price makes price < 900 unknown and the row is discarded. Even the ‘covers everything’ test price < 900 OR price >= 900 still drops NULL prices, since both halves are unknown; you must add OR price IS NULL explicitly to include them.

When do I need HAVING instead of WHERE?

Use WHERE for conditions on individual rows and HAVING for conditions on whole groups. Because grouping happens after WHERE, an aggregate such as COUNT(*) >= 2 does not exist yet at WHERE time, so a group-level condition must go in HAVING. A common pattern uses both: WHERE first removes unwanted rows, then GROUP BY forms the groups, then HAVING removes unwanted groups — matching the evaluation order FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY.

Can AI help me with Advanced SQL in COMP9120?

Yes, for understanding. Sia is an AI tutor that explains the method step by step — it can walk you through why NOT IN with a NULL returns no rows, how a correlated sub-query finds the top earner per department, or why AVG divides by fewer rows than COUNT(*), generate fresh practice on the same pattern, and check your own working line by line. It never hands over an answer key and never completes your COMP9120 assignment or exam for you, and it cannot promise a mark or a pass; the aim is to help you learn to write and hand-evaluate queries yourself, which is exactly what the paper-based exam tests. Always follow the University of Sydney's academic-integrity rules on permitted assistance.

Studying with AI? Sia — free AI data science tutor works through COMP9120 step by step.

Study strategy

Exam move

Advanced SQL is examined by writing queries and hand-evaluating them on a small instance, so practise both directions rather than memorising syntax. Drill the four sub-query forms (IN, EXISTS, ANY/SOME, ALL) alongside their aggregate equivalents until the ‘highest / cheapest’ idiom with >= ALL and <= ALL is automatic, and always trace a query in its true clause order — FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY — so you never put a group condition in WHERE. Treat NULL as the trap it is: rehearse that comparisons give unknown, that WHERE treats unknown as false, that NOT IN with a NULL returns nothing, and that every aggregate except COUNT(*) ignores NULLs. Because SQL is examined in both the in-class quiz (24%) and the final exam (50%, two hours, closed book), and evaluate-on-data sub-parts carry only a few marks each, budget well under a minute per mark, write the query skeleton first, and move on if a sub-part stalls so the method marks are banked. Remember the unit-wide hurdle: you must score at least 40% in the final exam and at least 50% overall, or the final mark is capped at 45.

Keep going — explore the course

A+Everything unlocked

Unlocks this Bible + all 25 of your University of Sydney subjects - and 1,000+ Bibles across every Australian university.

Sia - your COMP9120 tutor, unlimited, worked the way the exam marks it

The full 8-page Bible + practice bank with worked solutions

Chrome extension - sync your LMS so Sia knows your deadlines

Bilingual EN / Chinese on every Bible and every Sia answer

$25/ month

30-day money-back · cancel in one tap · how it works