I think it is important to distinguish between the contexts in which NULLs appear. Most of the time, NULLs are a problem with the math, not the logic per se. If they appear in a numeric column, certain rules apply; in a logical context, others apply. And if they are generated by the engine, say from a JOIN where no matching rows exist, that is yet another form of them.

If anyone is interested, please chime in on a SQL Server post I started called The Logic, Mathematics, and Utility of NULLs. Here is the link:

http://www.sqlservercentral.com/Forums/Topic970493-374-1.aspx.

Not so unique. You might be interested in the way R handles NULL values. It is fairly similar (even though many of the examples are not applicable). For instance:

> sum(NULL, 3)

[1] 3

> NULL + 3

numeric(0)

http://pugs.postgresql.org/node/404

http://www.mail-archive.com/dbi-users@perl.org/msg31943.html

> Set operations can appear to be very similar to “normal” mathematical operations, but since they may be missing key properties of that operation (say closure), they act in what appears to be a non-intuitive fashion.

What set operations lack closure? As far as I know set operations are a well-defined mathematical system, and the empty set is just a value in that system. Sets are closed over the operations UNION, INTERSECT, DIFFERENCE, etc.

Set operations are nothing like NULL semantics, which involve 3VL and strange exceptions. You certainly can’t explain all of my examples by just saying “NULL is the empty set”.

> {NULL} != NULL (i.e. the set containing NULL != NULL alone)

If you do “SELECT * FROM foo WHERE NULL = NULL”, which NULL is “{NULL}” and which NULL is “NULL”? They look indistinguishable to me, but the predicate does not evaluate to TRUE.
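To make that concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name foo is from the query above; its single column and single row are assumptions made up for illustration, and other engines may differ in details.

```python
import sqlite3

# In-memory SQLite database. The column x and the one row are
# hypothetical; only the table name foo comes from the discussion.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (x INTEGER)")
conn.execute("INSERT INTO foo VALUES (1)")

# NULL = NULL evaluates to NULL (unknown); Python receives it as None.
null_eq_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

# WHERE keeps only rows whose predicate is TRUE, so nothing comes back.
rows_eq = conn.execute("SELECT * FROM foo WHERE NULL = NULL").fetchall()

# IS NULL is the predicate that actually tests for NULL.
rows_is = conn.execute("SELECT * FROM foo WHERE NULL IS NULL").fetchall()

print(null_eq_null, rows_eq, rows_is)
```

The two NULLs are written identically, yet the equality comparison filters out every row while IS NULL keeps them.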

> Does that help anyone?

I think that this line of reasoning will lead to mistakes. For instance, how do you explain the fact that COUNT(*) doesn’t ignore NULLs? Or that “x IS NOT NULL” is not the same as “NOT x IS NULL” for all x?
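A small sketch of the COUNT question, again with Python's sqlite3 module. The table t and its contents are assumptions for illustration. It does not attempt the IS NOT NULL versus NOT ... IS NULL case.

```python
import sqlite3

# Hypothetical table t with one NULL among three rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (None,), (3,)])

# COUNT(*) counts rows, including the one where x is NULL;
# COUNT(x) counts only the rows where x is not NULL.
count_star, count_x = conn.execute(
    "SELECT COUNT(*), COUNT(x) FROM t"
).fetchone()

print(count_star, count_x)
```

If aggregates simply ignored NULL elements, both counts would agree; here they come back as 3 and 2.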

NULL is the empty set.

All operations in SQL are based on set theory, not algebra as we normally consider it. Set theory is the *basis* for (ordinary) mathematics and mathematical operations. Set operations can appear to be very similar to “normal” mathematical operations, but since they may be missing key properties of that operation (say closure), they act in what appears to be a non-intuitive fashion.

So the definition of NULL is always consistent; what varies is the definition of operations over a set, which may treat NULL elements in the set differently.

What is often the case is that what appears to be a case of …

NULL == NULL

is really a case of …

{NULL} != NULL (i.e. the set containing NULL != NULL alone)

For example, aggregates such as SUM() often appear to be mathematical operations, but SUM() is really defined as an operation over a set of elements that are either numbers or convertible to numbers. And when converting a NULL to a number for the purposes of a “SUM”, it is easy to say that its numeric value is zero (which is the same as ignoring it). Otherwise, SUM would be largely useless as an operation.

As such the SUM *operator* is not “+” it is a different (albeit similar) operator, and it inherently ignores NULL elements in the set in its computation. It isn’t inconsistent with “+”, because it was never meant to be the same as “+”.
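The difference between SUM and “+” can be checked with a short sketch in Python's sqlite3 module. The table t and its values are assumptions for illustration. The last query is worth noting: in SQLite, SUM over rows that are all NULL returns NULL rather than zero, so “treat NULL as zero” and “ignore NULL” are not quite the same thing.

```python
import sqlite3

# Hypothetical table t holding 1, NULL, 2.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (None,), (2,)])

# SUM skips the NULL element and adds the rest.
sum_x = conn.execute("SELECT SUM(x) FROM t").fetchone()[0]

# The + operator propagates NULL instead of skipping it.
plus_result = conn.execute("SELECT 1 + NULL + 2").fetchone()[0]

# With only NULL inputs, SUM yields NULL, not 0.
sum_only_nulls = conn.execute(
    "SELECT SUM(x) FROM t WHERE x IS NULL"
).fetchone()[0]

print(sum_x, plus_result, sum_only_nulls)
```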

Does that help anyone?

Unfortunately, I don’t think that’s a universal solution. (a) NULLs will be generated anyway, by outer joins and aggregates; and (b) most SQL implementations don’t effectively optimize physical designs that avoid NULLs, so there may be a significant performance penalty.
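Point (a) is easy to see in a sketch with Python's sqlite3 module. The customers and orders tables are hypothetical, and every column is declared NOT NULL, yet the results still contain NULLs.

```python
import sqlite3

# Hypothetical schema with no nullable columns at all.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, "
    "customer_id INTEGER NOT NULL, amount INTEGER NOT NULL)"
)
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "ann"), (2, "bob")])
conn.execute("INSERT INTO orders VALUES (1, 1, 10)")

# The outer join fills in NULL for the customer with no orders.
joined = conn.execute(
    "SELECT c.name, o.amount FROM customers c "
    "LEFT JOIN orders o ON o.customer_id = c.id ORDER BY c.id"
).fetchall()

# An aggregate over zero rows also produces NULL.
bob_total = conn.execute(
    "SELECT SUM(amount) FROM orders WHERE customer_id = 2"
).fetchone()[0]

print(joined, bob_total)
```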

> #3 – When you have a nullable field, go all over your client code and make sure you coalesce your comparisons, your strings concatenation, etc.

That’s certainly good practice. When you have a query, look at any fields that can possibly be NULL, consider how those might affect your application differently, and handle them appropriately.
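As a rough illustration of that kind of handling, here is a sketch with Python's sqlite3 module. The people table, its columns, and the choice of an empty string as the fallback are all assumptions; the right fallback depends on what NULL means in your application.

```python
import sqlite3

# Hypothetical table with a nullable middle name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (first TEXT, middle TEXT, last TEXT)")
conn.execute("INSERT INTO people VALUES ('Ada', NULL, 'Lovelace')")

# Concatenating with a NULL makes the whole result NULL.
raw_name = conn.execute(
    "SELECT first || ' ' || middle || ' ' || last FROM people"
).fetchone()[0]

# COALESCE supplies a fallback so the rest of the string survives.
safe_name = conn.execute(
    "SELECT first || COALESCE(' ' || middle, '') || ' ' || last FROM people"
).fetchone()[0]

# A plain comparison against NULL is unknown, so the row is filtered out;
# coalescing first makes the comparison two-valued.
raw_matches = conn.execute(
    "SELECT COUNT(*) FROM people WHERE middle <> 'X'"
).fetchone()[0]
safe_matches = conn.execute(
    "SELECT COUNT(*) FROM people WHERE COALESCE(middle, '') <> 'X'"
).fetchone()[0]

print(raw_name, safe_name, raw_matches, safe_matches)
```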

> #5 – Never use null as a meaningful value. e.g. if you want to store the date value “infinite time in the past”, use the lowest date available in the system, not “null value”.

I think that’s a good rule. Don’t use NULL as an arbitrary “special value”.
