Building Candidate Keys
Functional Dependencies
A functional dependency is a functional relationship between sets of attributes which belong to the same table and whose definition may or may not change over time.
    Emp# ---> Salary
is an example of a functional dependency. For each employee (identified by an emp#) there is one and only one salary.

There are a number of different ways to describe a functional dependency; among them

    A ---> B
    A determines B
    B is dependent upon A
    if you know A then you know B
All these are equivalent to one another.

NOTATION: We will use the following notation in these notes. Capital letters A, B, C, etc near the beginning of the alphabet represent single attributes. Capital letters X, Y, Z, etc near the end of the alphabet represent sets of attributes. Capital letters R, S, T, etc represent relations and these same letters written with an underscore, R, S, T, etc, represent corresponding relation schemas.

The following rules can be used in reasoning about functional dependencies. In applying these rules we always assume that all attributes and attribute sets belong to the same schema.

  1. Composition/Decomposition of FDs: X --> A and X --> B is equivalent to X --> A,B.
  2. Identity: X --> X
  3. Transitivity: If A --> B and B --> C then A --> C.
  4. Trivial: X --> is true
  5. Augmentation: If X --> A then X, Y --> A
We say a functional dependency, X --> A is full if there does not exist a proper subset Y of X such that Y --> A.
We want to use the above rules to prove an additional rule we call Augmentation+.

If X --> A and Y --> B then X, Y --> A, B

Given X --> A we can conclude X, Y --> A by Augmentation. (i)

Given Y --> B we can conclude X, Y --> B by Augmentation. (ii)

Hence X, Y --> A, B from (i) and (ii) and Composition

Candidate Keys
Recall that for a table R its schema R consists of all attributes of R. We say X C R is a key to R if X --> R .

The primary use of functional dependencies is in discovering the candidate keys of a given relation. A candidate key is a set of attributes, X of a schema R with the property that

  X --> R \ X     
and no proper subset of X satisfies the same property. A superkey to R and any subset Y of R ( Y C R ) that itself contains a candidate key.

EXAMPLE 1: Suppose we start with the relation R whose schema is R = {A, B, C, D, E} and whose functional dependencies are

  1. A,C --> D
  2. B --> E
  3. D,A --> B
The idea is to build a functional dependency using all attributes from the schema of R, R. I always start with the functional dependency
 A, B, C, D, E --> 
Which says that "the empty set of attributes is known if you know anything else".

Now I can modify this functional dependency by moving E to the right-hand-side. My argument is that B --> E and adding A, C and D to the left-hand-side will still give me a functional dependency by Rules 5 and 6.

   B --> E          (i)       // FD 2
   A,B,C,D --> E    (ii)      // Rule 5
   A,B,C --> D,E    (iii)     // FD 1, FD 2 and  Rule 6
   A,C --> A, D     (iv)      // FD 1, Rule 2 and Rule 6
   A,C --> B        (v)       // (iv), FD 3 and Rule 3
   A,C --> E        (vi)      // (v), FD 2 and Rule 3
   A,C --> B,D,E    (vii)     // FD1, (v), (vi) and Rule 1
Now an observation is in order. It is that if an attribute never appears on the right-hand-side of any given functional dependency then you will never be able to move it to the right-hand-side of any functional dependency you create. Hence if you ever reach a functional dependency of the form
  candidate key --> all other attributes
such an attribute must be on the left-hand-side and so be part of the candidate key. In other words, any attribute which does not appear on the right-hand-side of any given functional dependency belongs to every candidate key.

Since neither A nor C appear on the right-hand-side of any of our three given functional dependencies both belong to every candidate key. This tells us that the functional dependency labeled (vii) above can not be further modified by pushing another attribute from the left-hand-side to the right-hand-side. Hence we conclude that {A,C} is a candidate key of the above relation.

By my reasoning about A and C, any other candidate key will have to be of the form A,C,X since both A and C must belong to it. However, if X is non-empty then this key will be larger than a candidate key we have already found and so it is not possible for it to be a candidate key. Kence X must be empty which says that in this case, the only candidate key for the table R is A,C

EXAMPLE 2: Suppose we start with the relation R whose schema is R = {A, B, C, D, E} and whose functional dependencies are

  1. A,B --> D
  2. C --> E
  3. E --> A
  4. D --> B
Since C does not appear on the right-hand-side of any functional dependency we can assume C belongs to every candidate key. To find the first candidate key we reason as follows:
    A,B,C --> D, E        (i)       // FD 1, FD 2 and Rule 3
Hence {A, B, C} forms a super-key. We know that C can not be removed to the right-hand-side so it remains to try to move A or B.
    C --> A,E             (ii)      // FD 2, FD 3, and Rule 4
    C,B --> A,B           (iii)     //  (ii), Rule 2 and Rule 3
    C,B --> D             (iv)      // (iii), FD 1 and Rule 4
    C,B --> A,D,E         (v)       // (ii), (iv) and Rule 3
We have now reached the point where we know we can not remove C from the left-hand-side and if we remove B then it must be possible to prove C --> B. Since B only depends on D and D, in part, depends on B we will not be able to put both of these attributes on the right-hand-side so B must remain where it is. Hence {B, C} determines all other attributes and no subset of this set can do the same thing so we conclude that {B, C} is a candidate key.

To find another candidate key we can start over again and look to put new attributes on the right-hand-side or we can use the observation that if X is a candidate key and Y --> X then Y is a super-key (and so contains a candidate key). We reason as follows:

    X --> all other attributes              // since X is a CK
    X --> X                                 // Rule 2
    X --> all attributes                    // Rule 1
    Y --> X                                 // given
    Y --> all attributes                    // Rule 4
    Hence Y is a super-key.

Following this reasoning we try to solve

      ? --> B,C
This is easy, since C must appear in any candidate key we can automatically put C on the left-hand-side and the fourth functional dependency gives us something else.
    D,C --> B,C           // FD 4, Rule 2 and Rule 3
Hence {C, D} is a super key. Using the same reasoning we applied above that C must belong to every candidate key and removing D means replacing it with B (which is the first candidate key we already found) we can conclude that {C, D} is a candidate key. No further candidate keys will be found.