COMP9120 · Database Management Systems
Relational Algebra
Relational algebra is the procedural query language of the relational model, and in COMP9120 Database Management Systems at the University of Sydney it is the algebra every SQL query is compiled into before it runs. Introduced in the Week 4 lecture on Relational Algebra & SQL, this chapter covers the six basic operators, the derived join and intersection, the set operations and expression equivalence — the exact toolkit the combined SQL & Relational Algebra question on the final paper tests.
What this chapter covers
- 01Name the six basic operators — selection σ, projection π, cross-product ×, union ∪, set-difference −, rename ρ — and know that intersection and join are derived
- 02Write and read selection (rows) and projection (columns), and why σ must sit inside π when it filters on a column you are about to drop
- 03Remember that projection removes duplicate rows, because relational algebra works on sets
- 04Build the join family from the basics: theta join, equi-join and natural join, plus the schema a natural join produces
- 05Apply the three set operations under the union-compatibility rule (same arity and matching domains)
- 06Use set-difference for the 'none / never' pattern and get its direction right (R − S is not S − R)
- 07Translate a SELECT-FROM-WHERE query into relational algebra (SELECT↔π, FROM↔×, WHERE↔σ)
- 08Rewrite an expression into a cheaper but equivalent one by pushing selections and projections down the tree
Artists who never released on the Indie label (set-difference)
- +1Get every artist name as a single column: πaname(Artist) = {Mia, Leo, Ada, Sam}. This will be the left side of a set-difference, so it must be union-compatible (one column) with the right side.
- +1Isolate the Indie tracks with a selection: σlabel='Indie'(Track) keeps tid 10, 12 and 14 (aids 1, 1 and 4).
- +1Find the artists who DID release on Indie: join those tracks to Artist on the shared aid and project the name — πaname( Artist ⋈ σlabel='Indie'(Track) ) = {Mia, Sam}.
- +1Subtract: πaname(Artist) − πaname( Artist ⋈ σlabel='Indie'(Track) ). Both sides are single aname columns, so the set-difference is valid.
- +2Evaluate: {Mia, Leo, Ada, Sam} − {Mia, Sam} = {Leo, Ada}. Note the direction matters — writing the operands the other way round would compute the wrong set.
Key terms
- Relational algebra
- A procedural query language over relations: a small set of operators, each taking relations and returning a relation, that can be composed into an expression. Every SQL query is internally mapped to an equivalent relational-algebra expression before it is optimised and run.
- Selection (σ)
- σcondition(R) returns the rows of R that satisfy the condition; the schema is unchanged. The condition is a Boolean combination of comparisons (< > ≤ ≥ = ≠) joined by AND (∧) and OR (∨). Think of it as a horizontal cut of the table.
- Projection (π)
- πlist(R) keeps only the listed columns, drops the rest, then eliminates duplicate rows; the result schema is exactly the list. A vertical cut. Because it removes duplicates, it can return fewer rows than the input.
- Cross-product (×)
- R × S pairs every tuple of R with every tuple of S, giving |R|·|S| rows and a schema that concatenates both. Rarely useful alone; it is the raw material a join filters down with a selection.
- Natural join (⋈)
- A derived operator: the equi-join on all commonly-named attributes, followed by projecting away the duplicate columns so each shared attribute appears once. If the two relations share no attribute names it degenerates to a cross-product.
- Union-compatibility
- The requirement that two relations have the same arity (number of columns) and matching domains column-by-column before union (∪), intersection (∩) or set-difference (−) can be applied to them.
- Set-difference (−)
- R − S returns the rows in R that are not in S; the inputs must be union-compatible. It is directional: R − S is generally not S − R. It is the standard way to express 'find the X with no / never Y'.
- Expression equivalence
- Two relational-algebra expressions are equivalent if they return the same relation on every database instance. Equivalent expressions can cost very differently, which is why optimisers rewrite queries — typically pushing selections and projections down to shrink the joins.
Relational Algebra FAQ
Can AI help me with relational algebra in COMP9120?
Yes, as a study aid. Sia is an AI tutor that explains relational algebra step by step: it can walk you through why a selection sits inside a projection, how a natural join builds its schema, or how to turn a 'never' query into a set-difference, using its own practice examples. Use it to understand the method and check your reasoning — it will not hand you answers to graded assignments, quizzes or the exam, and it cannot promise a particular mark. The point is to learn the operators well enough to write them yourself under exam conditions.
Is relational algebra actually on the final exam?
Yes. In the example papers relational algebra appears together with SQL in one applied, problem-based question (worth roughly 24 marks in those papers). You may be asked to write relational-algebra expressions for English queries, read an expression back into English, translate a SELECT-FROM-WHERE query into algebra, or give the schema of a result. Confirm the exact structure and weighting for your offering on Canvas.
What is the difference between relational algebra and SQL?
Relational algebra is procedural — it says how to compute the answer, operator by operator. SQL is declarative — it says what you want and lets the engine decide how. The two are closely linked: the database maps each SQL query to an equivalent relational-algebra expression, then rewrites that expression into a cheaper equivalent before running it, which is exactly why the equivalence rules in this chapter matter.
Studying with AI? Sia — free AI data science tutor works through COMP9120 step by step.
Exam move
Treat relational algebra as something you write, not something you memorise: the exam gives you a small schema and makes you compose expressions by hand, so practise both directions — English to algebra and algebra back to English — and translate a few SELECT-FROM-WHERE queries into π/σ/join form. Drill the two rules that lose the most marks: apply a selection before you project away the column it needs, and use a directional set-difference (not a not-equal selection) for 'never / none' queries, projecting both sides to the same column so they are union-compatible. Finish by rehearsing one push-down rewrite so you can show an equivalent, cheaper expression. Because marks follow method, write each operator as a separate step. The final exam is 50% of the unit, paper-based, 2 hours and closed book; across roughly 131 marks that is a little under a minute per mark, so budget about 22 minutes for the combined SQL & relational-algebra question. Remember the hurdle: you need at least 40% in the final exam and at least 50% overall, or the final mark is capped at no more than 45.