Sunday 10 June 2012

Avoiding NULLs

On the one hand I understand the added value of the concept of a "NULL" value - meaning that no value has been stored in a column in a row in a database table. On the other hand I have always felt that allowing NULL values in a column leads to more trouble of one form or another down the line. And recent experience with another database design has again reinforced my concerns. For another description about some of the problems with NULL values and what they do or do not mean, you can also read Much Ado About Nothing?, which is quite informative.

If I had to summarise my position on allowing NULL values, then I think of them similar to how I think of a loaded gun - not directly harmful itself, but a there is a reasonable chance of it going wrong and resulting in serious damages. If you need to use them, and you have a clear reason why, then use them. But otherwise avoid them as much as possible.

Allowing NULL values in a column can be bad for all kind of direct and indirect reasons.
  • Oracle does not store NULL values in normal B-Tree indexes, which limits when the indexes can be used on columns that allow NULL values
    • Such an index cannot be used to satisfy an "IS NULL" query constraint
    • The index also cannot be used to satisfy a "column != value" query constraint
  • Foreign keys that allow NULL are "optional", but again have index and optimizer side effects.
  • Outer joins may become necessary, which again limit the choices available to the Optimizer.
  • Many programming languages have no concept of NULL, being a value outside of the normal value range by definition.
    • Either NULL values require special programming, or require some kind of shadow boolean variable to indicate if it is really NULL (no real value in the main variable).
  • Many programmers struggle with the difference between NULL and empty or blank, which can all mean different things within a programming language e.g. a zero length string, or the number zero.
  • NULL cannot be using in normal equality and range tests. As it is outside the normal value range it cannot be greater than or less than a real value. And it cannot be equal to any other value in a column, including another NULL. You must use the special constraint "IS NULL" to test for a NULL value, and never "= NULL".
  • Calculations involving a NULL value evaluate to NULL i.e. unknown. 
So NULLs are definitely useful, but each case should be considered carefully before using them.  Generally it is easier to not allow NULL values in columns, as all operations are easier.  Allowing NULL values requires some thought to make sure that it ends up working as you hoped it would.

6 comments:

Jeffrey Kemp said...

Nice post, agreed with what you're saying. NULLs are dangerous in a physical database design mainly because they are difficult to deal with correctly - bugs are easy to introduce and it takes a lot of care and experience to be aware of them and code defensively; and even then it's easy to get it wrong :)

Just one minor correction: an index on a column can be used by the CBO with a != predicate, just as it can with a = predicate. It's perhaps not often likely to, but it is possible.

John Brady said...

Jeff,

I don't think an index can be used for a != test when the column being tested allows NULL values. This is because the NULL values are not stored in the index - only non-NULL values. So the only way to find all the other values in this case - including the NULL values - would be to do a full table scan.

I've not proved this (no test cases I'm afraid), but I'm sure I have seen this stated by other people. And it was one reason why I was not getting an expected execution plan from a complex query (other factors involved as well).

Charles Hooper said...

A composite B*tree index may be used to search for NULL values, as long as at least one column value in the index definition is not nullable. A constant number such as 0 or 1 may be used as the second column in an index definition to satisfy this restriction, for example:
CREATE INDEX IND_T1_C3 ON T1(C3,0);
With such an index, the optimizer may select to use the index when the following is found in the WHERE clause:
WHERE C3 IS NULL

An index may also be used to satisfy the != test, but the optimizer is very unlikely to select that access path without adding an INDEX hint to the SQL statement.

Narendra said...

John,

Looks like your post is "inspired" by your question on forum recently.
Anyways, you said in response to Jeff that So the only way to find all the other values in this case - including the NULL values - would be to do a full table scan.
But you can not find all other values, including NULL values, using != operator. Because a NULL value is neither equal to nor not equal to any other value. See below

SQL> with src as (select level as l from dual connect by level <= 5 union all select null from dual)
2 select * from src where l != 3 ;

L
----------
1
2
4
5

4 rows selected.

John Brady said...

Narenda,

I am writing a follow up post, and have just hit exactly the issue you mentioned - NULL is neither equal to nor unequal to any real value. I ran a "!=" query and got different row counts whether the column allowed NULL values or not. Thanks for pointing out yet another distinction about NULL values.

Jeffrey Kemp said...

An index most certainly can be used for a != predicate, proof:

create table jefftest (n number);
create index jefftesti on jefftest (n);
insert into jefftest values (null);
insert into jefftest values (1);
insert into jefftest values (2);
commit;
explain plan for select * from jefftest where n != 2;

--------------------------------------
| Id | Operation | Name |
--------------------------------------
| 0 | SELECT STATEMENT | |
|* 1 | INDEX FULL SCAN | JEFFTESTI |
--------------------------------------