BCNF & 3NF
Contents:
1. Relations
2. Functional dependencies
3. Keys
4. BCNF
5. Decomposing
6. 3NF
7. Why decomposing into 3NF doesn't work
8. FD basis
9. Synthesis
1. Relations
Relational databases are so called because they are based on the mathematical notion of a relation, which is simply a set of tuples. One of the more prominent occurrences of relations is in the equivalence relation.
There are two terminologies used for relational databases.
Programming Mathematical table relation metadata schema row tuple column attribute
Mathematically, we describe relations using a schema. For example, you can imagine a relation named orders described as follows.
orders(customer_name, shipping_addr, product_name, quantity, time_ordered, time_shipped)
We'll soon need to make some assumptions about how this relation works.
- Each customer has a distinct name, and each product has a disticnt name.
- An order may consist of multiple products; each product within the order is listed as an additional tuple within the relation. An order does not list the same product multiple times — instead, the quantity attribute is used to represent this.
- No customer places two orders at the same time.
- All items in an order are to be shipped to the same address.
2. Functional dependencies
Definition: A1 A2 … An → B1 B2 … Bm is an FD (a functional dependency) for a relation R if whenever two tuples in R agree on A1, A2, … An, they must necessarily agree on B1, B2, … Bm.
Example: For the orders relation with assumptions as listed above, two FDs are:
customer_name product_name time_ordered → quantity time_shipped
Since the left side is allowed more information than is necessary, another FD is
R is said to satisfy an FD, and A1, A2, … An are said to determine B1, B2, … Bm.
An FD is trivial if { B1, B2, … Bm } is a subset of { A1, A2, … An }. For example, product_name quantity → quantity is a trivial FD.
3. Keys
The set { A1, A2, … An } is a superkey for a relation R if it functional determines all attributes for a relation. In other words, any two tuples of R must necessarily agree on A1, A2, … An.
{ A1, A2,
… An } is a
key for a relation R if it is a
minimal superkey
: That is, the set is a superkey but no
proper subset is a superkey.
Example: For the orders relation, { customer_name, product_name, time_ordered } is a key.
4. BCNF
A relation R is in Boyce-Codd Normal Form (BCNF) if for every nontrivial FD A1 A2 … An → B1 B2 … Bm satisfied by R, the set { A1, A2, … An } is a superkey for R.
Our definition of orders is not in BCNF, since it satisfies the FD customer_name time_ordered → shipping_addr is an FD, but it is nontrivial and the left side { customer_name, time_ordered } is not a superkey for orders.
5. Decomposing
It's desirable for each database to have all relations in BCNF. So, given a relation not in BCNF, we would normally attempt to decompose it into two or more BCNF relations.
This is simple: We find a violating FD (i.e., a nontrivial FD for which the left side is not a superkey), expand the right side to include as many attributes as possible (called computing the closure of the left side), and then split the relation into:
- one relation containing all attributes of this expanded FD, and
- another relation containing all the attributes of the FD's left side and all attributes that were missing from the FD.
If any of the resulting relations aren't in BCNF, we recurse on it (them).
Take orders as an example. We already decided that customer_name time_ordered → shipping_addr is an FD that causes orders to violate BCNF. We observe that the right side can be expanded by adding customer_name and time_ordered to it; but the remaining attributes (product_name, quantity, time_shipped) don't follow — assuming that an order can contain multiple products, and that these products might be shipped out at varying times. Thus, we'll split into two relations:
- orders(customer_name, time_ordered, shipping_addr)
- order_contents(customer_name, time_ordered, product_name, quantity, time_shipped)
Both are in BCNF given our presumptions, so there's no more work to do; sometimes, though, one or both will need to be further decomposed.
6. Third normal form
Decomposing leads to a potential problem: In some
situations, we can lose
FDs through splitting them
apart. Let us suppose our orders relation also had a
restriction that no product can be shipped out to two
locations simultaneously; this would lead to the new FD
The FD customer_name time_ordered → shipping_addr still holds, so we might attempt to decompose the relation as we did. But after splitting the relation as in our example, we would have no way to express our new FD in the resulting relation. This seems problematic — we have split closely related information apart — and so this leads to a slightly looser normal form called third normal form.
To define third normal form, we must first define an attribute as prime if it is part of some key for its relation. Given this definition, a relation is in 3NF (third normal form) if every FD satisfies one of these three conditions:
- It is trivial.
- The left side is a superkey.
- Every attribute on the right side either appears on that FD's left side or is prime.
Note that BCNF has stricter restrictions on what FDs it allows, so any relation that is in BCNF is also in 3NF. In practice, well-designed relations are almost always in BCNF; but occassionally a non-BCNF relation is still well-designed (and is in 3NF).
7. Why decomposing into 3NF doesn't work
Decomposing a relation into 3NF leads to potential problems. Suppose the following schema.
students(first_name, hall, sex)
Suppose that all first names are either distinctively male or female, and that our campus had only single-sex halls. Then our FDs would be:
first_name → sex
hall → sex
Based on this, we'd conclude that the only key for this relation is {first_name, hall}. Note that both FDs violate 3NF.
Decomposition would propose that we would divide this relation into two relations based on one of the violating FDs. Suppose we chose to decompose based on hall → sex.
halls(hall, sex)
students(first_name, hall)
Notice that we have again split up an existing FD (first_name → hall). This is exactly what was wrong with BCNF. What have we gained?
8. FD Basis
There is another algorithm for breaking a non-3NF relation into 3NF relations. We'll get to that, but first we need to cover a new topic: the basis of a set of FDs. (For any who've studied Linear Algebra, this usage of the word basis is very similar to its usage in that field.)
A basis for a set F of FDs is a set G of FDs in which every FD of F is implied by the FDs in G. That is, for each F FD A → B, if we take the closure of A using G, we'd get a set of attributes that includes all attributes of B.
For example: In our students example above, suppose we have the following FDs.
a. first_name → sex
b. hall → sex
c. first_name → hall sex
Then the following constitutes a basis.
b. hall → sex
c. first_name → hall sex
We don't need a. in our basis, since it's implied by c..
Given a set F of FDs, a minimal basis is a set G of FDs for which:
- Every FD in G has just one attribute on its right side.
- G is a basis for F.
- Removing any FD from G results in a set of FDs that is not a basis for F.
- For any FD from G, removing an attribute from its left side results in a set of FDs that is not a basis for F.
Our basis example from above is not minimal, since one of the FDs has multiple attributes ot its right side. We can split it into two pieces.
b. hall → sex
d. first_name → sex
e. first_name → hall
However, this is still not minimal because we can remove d. and still have a basis for our original set of FDs. The set {hall → sex, first_name → hall} does constitute a minimal basis.
9. Synthesis
Here is our algorithm for splitting a non-3NF relation into 3NF relations while not splitting apart any FDs.
- Find a minimal basis for the set of known FDs.
- For each FD in the minimal basis, create a relation consisting of the attributes in that FD.
- If none of the relations have a set of attributes that forms a superkey for the non-3NF relation, select a key for the non-3NF relation, and create a relation with the attributes from that key.
- If any relation includes only a subset of attributes found in another relation, delete that smaller relation.
For our students example with FDs a., b., and c., we'd get two relations from Step 2:
halls(hall, sex)
names(first_name, hall)
Step 3 would delete no relations. Step 4 would add no relations, since {first_name,hall} is a superkey.
What would happen if students had only the FDs a. and b.? We'd get three relations.
halls(hall, sex)
names(first_name, sex)
students(first_name, hall)
I leave it for you to figure out how this came out of the synthesis algorithm.

