COMP9120 · Database Management Systems
Schema Refinement & Normalisation
Schema refinement is the COMP9120 topic where a workable relational design becomes a good one: the University of Sydney's Database Management Systems unit teaches you to detect redundancy formally through functional dependencies, grade a relation by normal form (1NF through BCNF and 4NF), and split a bad relation into good ones without losing information. It sits between ER→relational mapping and physical design, and on the paper it is a self-contained method question where every closure and every split earns marks.
What this chapter covers
- 01Why redundancy is the enemy: update, insertion and deletion anomalies
- 02Functional dependencies X→Y, and why an instance can disprove but never prove one
- 03Armstrong's axioms (reflexivity, augmentation, transitivity) plus the derived union and decomposition rules
- 04Attribute closure X+ as the one algorithm that tests keys and derives FDs
- 05Superkeys, candidate keys (minimal superkeys), and prime vs non-prime attributes
- 06The nested normal forms 1NF ⊇ 2NF ⊇ 3NF ⊇ BCNF ⊇ 4NF and how to name the highest one
- 07The 3NF-vs-BCNF gap and multivalued dependencies with 4NF
- 08Lossless-join and dependency-preserving decomposition, and the BCNF decomposition algorithm
Candidate key by closure, highest normal form, and a BCNF split
- +1Forced key attributes: StudentID and UnitCode never appear on any right-hand side, so both must be in every candidate key.
- +2Closure test: (StudentID, UnitCode)+ = {StudentID, UnitCode} → +Tutor → +UnitName (UnitCode→UnitName) → +TutorRoom (Tutor→TutorRoom) = all 5 attributes, so it is a superkey.
- +1Minimality: StudentID+ = {StudentID} and UnitCode+ = {UnitCode, UnitName} — neither half reaches everything, so the key is minimal. Candidate key = (StudentID, UnitCode).
- +1Highest normal form: the key is composite and UnitCode (a proper part) determines non-prime UnitName — a partial dependency — so the relation is 1NF only (not even 2NF).
- +1Decompose on UnitCode→UnitName: pull out Unit(UnitCode, UnitName); the remainder keeps UnitCode as the link.
- +2Decompose on Tutor→TutorRoom: pull out Tutor(Tutor, TutorRoom); what remains is Assigned(StudentID, UnitCode, Tutor) with key (StudentID, UnitCode). Every determinant is now a key of its table, so all relations are in BCNF.
Key terms
- Functional dependency (X→Y)
- A rule that whenever two tuples agree on all attributes in X they must also agree on all attributes in Y; it is decided by the meaning of the data, and an instance can only disprove one, never prove it.
- Armstrong's axioms
- A sound and complete rule set for deriving FDs: reflexivity (Y⊆X gives X→Y), augmentation (X→Y gives XZ→YZ) and transitivity (X→Y, Y→Z gives X→Z), plus the derived union and decomposition rules.
- Attribute closure X+
- Every attribute that X functionally determines under a set of FDs; X is a superkey exactly when X+ is all attributes, and X→Y holds exactly when Y is a subset of X+.
- Candidate key
- A minimal superkey — an attribute set whose closure is all attributes and from which removing any attribute stops it being a superkey. A relation may have several; one is chosen as the primary key.
- Prime vs non-prime attribute
- A prime attribute is a member of some candidate key; a non-prime attribute belongs to no candidate key. These labels drive the 2NF and 3NF tests.
- BCNF
- Boyce-Codd Normal Form: for every non-trivial FD X→Y the determinant X must be a superkey — 'every arrow comes out of a key'. It is stricter than 3NF, which also allows an FD when its right-hand side is prime.
- Lossless-join decomposition
- A split of R into R1 and R2 whose natural join returns exactly R with no spurious tuples; guaranteed when the shared attributes R1∩R2 form a key of at least one component.
- Dependency-preserving decomposition
- A split where the union of the FDs projected onto each component has the same closure as the original set, so every constraint can be checked on a single table without a join. It cannot always be achieved together with BCNF.
Schema Refinement & Normalisation FAQ
What is the difference between 3NF and BCNF?
Both require that for every non-trivial functional dependency X→Y the determinant X should ideally be a superkey. BCNF demands exactly that with no exceptions. 3NF is slightly weaker: it also accepts an FD whose determinant is not a superkey as long as the right-hand side Y is a prime attribute (part of some candidate key). That loophole means a 3NF relation can still hold a little redundancy that BCNF removes. The practical catch is that forcing BCNF can sometimes cost dependency preservation, whereas a lossless, dependency-preserving 3NF decomposition is always achievable, so 3NF is occasionally the right stopping point.
How do I find a candidate key quickly?
Start with the shortcut: any attribute that never appears on the right-hand side of any FD must be in every candidate key, because nothing can determine it. Seed your key with those attributes, compute their attribute closure, and add attributes until the closure reaches every attribute (a superkey). Then check minimality by removing one attribute at a time — if the closure still reaches everything, that attribute was not needed. A minimal superkey is a candidate key; repeat with different seeds to find any others.
Can AI help me with schema refinement and normalisation in COMP9120?
Yes, as a study aid. Sia is an AI tutor that explains concepts step by step — it can walk you through an attribute-closure calculation, show why a partial dependency breaks 2NF, or trace a BCNF decomposition on a practice relation so you see the method. It does not sit your assessments or hand you assignment or exam answers, and it cannot promise any grade; the goal is to help you understand the technique so you can compute closures and decompositions yourself under exam conditions. Always follow the University of Sydney's academic-integrity rules on permitted tools.
Studying with AI? Sia — free AI data science tutor works through COMP9120 step by step.
Exam move
Make attribute closure your reflex, because every other question reduces to it. Practise a fixed order of attack on each relation: first mark the attributes that appear on no right-hand side (they are in every candidate key), then compute the closure of your candidate to confirm it reaches all attributes and is minimal, then name the highest normal form by testing each FD's left-hand side for superkey-ness (and its right-hand side for prime-ness), and finally decompose by peeling off one violating FD at a time as X∪Y and R−Y, checking that the shared attributes are a key of one side so the join is lossless. Since the normalisation question is marked step by step, always write the closure out in full and name the exact FD each split removes — a correct method with a small slip still earns most of the marks. Confirm the exact mark split and exam timetable on Canvas.