Here are a couple common representations of data sets:
username | realname | phone
jdavis | Jeff Davis | 555-1212
jsmith | John Smith | 555-2323
We naturally think of the former as representing a relation, but both representations are logically equivalent, even if the latter is overly verbose. In both cases, we’ve essentially just labeled the data, by which I mean we’ve briefly documented the meaning of individual properties, independent of the other properties.
The predicate behind this data (call it P1) is:
There exists a user with user name [username] and real name [realname] and phone number [phone].
Notice that it’s just several simpler predicates connected with AND. We can freely omit parts of the predicate, such as phone, and still have true statements (although, as explained below, this is different from relational projection).
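As a minimal sketch of this reading, the data set can be modeled as a set of tuples, where each tuple stands for one true instantiation of the predicate. The names `User`, `users`, and `proposition` are illustrative assumptions, not part of any library.

```python
from typing import NamedTuple


class User(NamedTuple):
    username: str
    realname: str
    phone: str


# The relation: each tuple asserts one true proposition.
users = {
    User("jdavis", "Jeff Davis", "555-1212"),
    User("jsmith", "John Smith", "555-2323"),
}


def proposition(u: User) -> str:
    """Instantiate the predicate with the values of one tuple."""
    return (
        f"There exists a user with user name {u.username} "
        f"and real name {u.realname} and phone number {u.phone}."
    )


for u in sorted(users):
    print(proposition(u))
```

The attribute names only label the slots; the meaning lives in the sentence that `proposition` builds.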
However, this does not hold true in the general case, where a predicate is more complicated than just a collection of simple predicates connected with AND. For example, let’s add a new property, called during, that represents the time interval over which the predicate is (or was) true. Now the predicate looks something like this (call this P2):
There exists (or existed) a user with user name [username] and real name [realname] and phone number [phone] during the interval of time [during].
Now, we can no longer freely remove the during portion of the predicate, because the statement will no longer be true.
The logical reason for this is that the relational projection operator doesn’t merely eliminate a part of the predicate; it turns that part into a bound variable by quantifying it. If we use projection to eliminate the phone attribute from P1, we get a new predicate (call it P3):
There exists a phone number [phone] such that there exists a user with user name [username] and real name [realname] and phone number [phone].
In this case, the variable [phone] is bound, and the tuples that satisfy that predicate only have two attributes: username and realname. This predicate is very similar to just removing phone from the predicate entirely, as though it never existed. However, this predicate does tell you that the user has a phone number (although not what the number is), so it is not identical.
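The equivalence between projection and existential quantification can be sketched in a few lines. This is an illustration only: the relation is modeled as a set of plain tuples, and `project_away_phone` and `exists_phone` are hypothetical helper names.

```python
# Relation satisfying the three-attribute predicate (username, realname, phone).
users = {
    ("jdavis", "Jeff Davis", "555-1212"),
    ("jsmith", "John Smith", "555-2323"),
}


def project_away_phone(relation):
    """Keep (username, realname); phone becomes a bound variable."""
    return {(username, realname) for (username, realname, _phone) in relation}


def exists_phone(relation, username, realname):
    """True if some phone value makes the original predicate true."""
    return any(u == username and r == realname for (u, r, _phone) in relation)


projected = project_away_phone(users)

# A pair is in the projection exactly when a matching phone number exists.
assert ("jdavis", "Jeff Davis") in projected
assert exists_phone(users, "jdavis", "Jeff Davis")
assert ("nobody", "No One") not in projected
assert not exists_phone(users, "nobody", "No One")
```

The projected tuples carry no phone value, yet membership still depends on one existing, which is the difference from simply deleting phone from the predicate.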
However, if we use projection to eliminate the during attribute from P2, we get the predicate (call it P4):
There exists some time interval [during] such that there exists (or existed) a user with user name [username] and real name [realname] and phone number [phone] during the interval of time [during].
In this case, the resulting predicate is very different from the predicate where during is removed entirely. That is, P4 is different from P1, even though the tuples that satisfy P4 are of the same form (3 attributes) as those that satisfy P1.
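To make the difference concrete, here is a small sketch using made-up history data. Everything in it is an assumption for illustration: the extra phone number and all dates are invented, `during` is modeled as a (start, end) pair with an exclusive end, and the present-tense predicate is approximated by "true as of a given day".

```python
from datetime import date

# Hypothetical history: (username, realname, phone, during).
history = {
    ("jdavis", "Jeff Davis", "555-1212", (date(2005, 1, 1), date(2008, 1, 1))),
    ("jdavis", "Jeff Davis", "555-9999", (date(2008, 1, 1), date(9999, 12, 31))),
    ("jsmith", "John Smith", "555-2323", (date(2006, 6, 1), date(9999, 12, 31))),
}


def project_away_during(relation):
    """'There exists some interval [during] such that ...'."""
    return {(u, r, p) for (u, r, p, _during) in relation}


def as_of(relation, day):
    """Tuples whose interval contains `day`: the present-tense reading."""
    return {(u, r, p) for (u, r, p, (start, end)) in relation if start <= day < end}


at_some_time = project_away_during(history)   # projection over during
currently = as_of(history, date(2010, 1, 1))  # during removed entirely

old_number = ("jdavis", "Jeff Davis", "555-1212")
assert old_number in at_some_time   # true at some point in time
assert old_number not in currently  # but not a true present-tense statement
```

Both results contain three-attribute tuples, but they answer different questions, so treating one as the other produces false statements.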
The point of all this is that the simple labeling of data has the underlying assumption that the attributes are independent, i.e., you can simply omit attributes of the data and still have accurate information. This is not true in the general case, because attributes are not independent except in the simplest cases. And the meaning of an individual attribute is much more complex than the “obvious” meaning that a label might convey.
Predicates, which form propositions, are a much more complete way to represent the meaning of data, and there is no requirement that the attributes be independent. Attribute names (i.e. labels) are better used as a reminder than as a representation of the actual meaning. This is where the relational model succeeds and XML fails: the relational model provides operators that act on predicates with concrete logical meaning, whereas XML relies on data labels, which don’t accurately represent complex meaning.