COMP9120 · Database Management Systems
SQL Fundamentals
SQL Fundamentals is the core query chapter of COMP9120 Database Management Systems at the University of Sydney, introduced in the Week 4 lecture on Relational Algebra & SQL. It covers the SELECT-FROM-WHERE block and how it maps to relational algebra, DISTINCT and ORDER BY, the full join family, and the set operators — the exact query surface the in-class quiz and the final exam ask you to write and evaluate by hand.
What this chapter covers
- 01Write a SELECT-FROM-WHERE query and map it to relational algebra (SELECT↔projection, FROM↔cross-product, WHERE↔selection)
- 02Apply the true clause evaluation order: FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY
- 03Understand why a SQL result is a bag and use DISTINCT to turn it back into a set
- 04Sort output with ORDER BY (ASC default, DESC, multiple keys) and know it has no relational-algebra equivalent
- 05Choose between implicit, INNER, NATURAL, USING and OUTER joins, and know when NULL padding matters
- 06Use table aliases for self-joins and to disambiguate columns across tables
- 07Combine union-compatible queries with UNION, INTERSECT and EXCEPT
- 08Apply the ALL multiplicity rules: UNION ALL = m+n, INTERSECT ALL = min(m,n), EXCEPT ALL = max(0, m-n)
- 09Match string constants correctly (single-quoted, case-sensitive) and pattern-match with LIKE (% and _)
- 10Spot the common mark-losing traps: missing DISTINCT, join with no ON, inner-vs-outer, swapped min/max
Distinct home countries of enrolled students, alphabetical
- +1Write the SFW skeleton: SELECT DISTINCT s.country FROM Student s JOIN Enrolled e ON s.sid = e.sid ORDER BY s.country ASC;
- +1FROM/join: pair each Student with its matching Enrolled rows on sid. The 5 enrolment rows give 5 joined rows; student 101 matches twice, so (Mei, AUS) appears twice: the country bag is {AUS, AUS, IND, AUS, BRA}.
- +1SELECT country before DISTINCT is a bag with AUS x3, IND x1, BRA x1; DISTINCT collapses it to the set {AUS, IND, BRA}.
- +1ORDER BY country ASC sorts alphabetically, giving the final ordered result.
Key terms
- SELECT-FROM-WHERE (SFW)
- The basic SQL query block. It is equivalent to the relational-algebra expression projection(selection(cross-product)): SELECT is projection, WHERE is selection, FROM is the cross-product (or join).
- DISTINCT
- A keyword after SELECT that removes duplicate output rows, applied to the whole select-list together. It converts the bag a SQL query returns into a set, mirroring the duplicate-elimination that projection does in relational algebra.
- ORDER BY
- Sorts the result rows; evaluated last, so it may name a SELECT alias. ASC (ascending) is the default; DESC reverses; multiple keys sort left-to-right. It has no relational-algebra equivalent because relations are unordered.
- Inner join
- A join (written implicitly with a comma in FROM plus a WHERE predicate, or explicitly with JOIN ... ON) that keeps only rows with a matching partner. Unmatched rows are dropped.
- Natural join
- An equi-join on all columns that share a name across the two tables, keeping one copy of each shared column. JOIN ... USING(cols) is a safer explicit variant that lists exactly which shared columns to match on.
- Outer join
- LEFT, RIGHT or FULL OUTER JOIN keeps unmatched rows from one or both sides and pads the missing columns with NULL. Use it when the answer must include rows that have no partner (e.g. students with no enrolments).
- Set operators
- UNION, INTERSECT and EXCEPT combine two union-compatible query results and remove duplicates by default. The ALL variants keep multiplicities: UNION ALL = m+n, INTERSECT ALL = min(m,n), EXCEPT ALL = max(0, m-n).
- Union-compatible
- The requirement for set operators: both query results must have the same number of columns with matching domains/types, so their rows can be treated as members of the same relation.
SQL Fundamentals FAQ
Why does my query return duplicate rows when relational algebra wouldn't?
A SQL result is a bag (multiset), not a set: projecting to non-key columns, or joining where one row matches several rows on the other side, produces repeats. Add SELECT DISTINCT to eliminate duplicate output rows and recover set behaviour. Projection in pure relational algebra removes duplicates automatically, which is why the two can differ.
What is the difference between UNION and UNION ALL (and the other ALL operators)?
UNION, INTERSECT and EXCEPT remove duplicates by default. The ALL versions keep multiplicities: for a value appearing m times on the left and n on the right, UNION ALL keeps m+n copies, INTERSECT ALL keeps min(m, n), and EXCEPT ALL keeps max(0, m-n). EXCEPT is also directional, so P EXCEPT Q is not the same as Q EXCEPT P.
Can AI help me with SQL in COMP9120?
Yes, for understanding. Sia is an AI tutor that explains the method step by step — it can walk you through how a SELECT-FROM-WHERE block maps to relational algebra, why a join produces duplicates, or how the INTERSECT ALL rule works, generate fresh practice on the same pattern, and check your own working line by line. It never hands over an answer key and never completes your COMP9120 assignment or exam for you, and it cannot promise a mark or a pass; the goal is to help you learn to write and evaluate queries yourself, which is exactly what the paper-based exam tests. Always follow the University of Sydney's academic-integrity rules on permitted assistance.
Studying with AI? Sia — free AI data science tutor works through COMP9120 step by step.
Exam move
Treat SQL as a language you write, not one you memorise: the exam makes you compose queries by hand and evaluate them on a small instance, so practise both directions. Start by drilling the SFW-to-relational-algebra mapping until it is automatic, then always write the SFW skeleton first and fill in the clauses in their true evaluation order (FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY). Rehearse the join family on a two-table instance where one side fans out, so you feel when DISTINCT and when an OUTER join are needed, and memorise the three ALL multiplicity rules (m+n, min, max(0, m-n)) as a single block. In the practice papers the combined SQL and relational-algebra question carried about 24 marks; with the final exam running two hours, budget well under a minute per mark — roughly 20 minutes — and move on if a sub-part stalls, jotting the skeleton so you can return to it.