Tests Overview
I'll create two simple tables: a master table of codes ("reference", with 10,000 rows), and a larger table that refers to the master table via a foreign key ("transaction", with 1,000,000 rows). This means that there are 100 rows per Reference Identifier value in Transaction. The Transaction table will have two such foreign key columns: one without NULLs (ref_id_nn), and one with NULLs (ref_id_n). Indexes will be created on these columns, and some example queries run using either reference column to show whether the indexes are being used or not. We can also look at the statistics on the indexes themselves to tell us whether NULL values are stored in them. I provide all the SQL to create and populate these two tables at the end of this post, as I assume most people are interested in the results first.
Index Statistics
Immediately we can see that NULL values are not stored in an index by looking at the statistics for the indexes on the Transaction table:

col index_name heading 'Index' format a20
col lb heading 'Leaf|Blocks' format 999,999
col dk heading 'Distinct|Keys' format 999,999,999
col cf heading 'Clustering|Factor' format 999,999,999
col nr heading 'Num Rows' format 999,999,999
--
select i.index_name, i.leaf_blocks lb, i.num_rows nr,
       i.distinct_keys dk, i.clustering_factor cf
  from user_ind_statistics i
 where i.table_name = 'TRANSACTION'
/

The results of this query are:
                         Leaf                  Distinct   Clustering
Index                  Blocks     Num Rows        Keys       Factor
-------------------- -------- ------------ ------------ ------------
PK_TRANSACTION          1,875    1,000,000    1,000,000       13,147
IX1_TRANSACTION_NN      2,090    1,000,000       10,000    1,000,000
IX1_TRANSACTION_N       2,090      999,900        9,999      999,900
Look at the "Num Rows" and "Distinct Keys" values for the two indexes on the two foreign key columns. The index on the column allowing NULLs has one less distinct key value, because the NULL value has not been stored in the index, whereas the index on the column without any NULLs has the full 10,000 distinct key values in it. This is also reflected in the number of rows covered by the index: 100 fewer for the index on the column with NULL values than for the index on the column without. So already we have evidence that NULL values are not stored within a B-Tree index, i.e. they are ignored by the index itself.
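As a conceptual sketch only (this is not Oracle's internal structure, and the names are illustrative), the behaviour the statistics reveal can be modelled as an index that simply never records entries for NULL keys:

```python
# Conceptual model of a single-column index that, like an Oracle B-Tree
# index, does not store entries for NULL keys. Python's None stands in
# for SQL NULL; the table/column names are made up for illustration.

def build_index(rows, key):
    """Map each non-None key value to the list of row ids holding it."""
    index = {}
    for rowid, row in enumerate(rows):
        k = row[key]
        if k is None:          # NULL keys are skipped entirely
            continue
        index.setdefault(k, []).append(rowid)
    return index

# 10 rows, one of which has a NULL foreign key value
rows = [{"ref_id_n": None if i == 0 else i % 3} for i in range(10)]
idx = build_index(rows, "ref_id_n")

# The index covers one row fewer than the table holds, and the NULL key
# has no entry at all - mirroring the num_rows / distinct_keys statistics.
num_rows_indexed = sum(len(v) for v in idx.values())
print(num_rows_indexed)   # 9, not 10
print(None in idx)        # False - an "IS NULL" probe finds nothing here
```

This is why the index statistics above show 999,900 rows and 9,999 distinct keys for IX1_TRANSACTION_N: the 100 NULL rows and the NULL "key" simply have no presence in the index.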
Queries
In my previous blog post I made two claims about indexes on columns allowing NULL values not being used:
- Such an index cannot be used to satisfy an "IS NULL" query constraint
- The index also cannot be used to satisfy a "column != value" query constraint
"IS NULL" Constraint
The query is:

select count (*) from transaction where ref_id_n is null ;

When executed, the query has the following execution plan (as from dbms_xplan.display_cursor immediately after executing the query):

SQL_ID  cwm2cmgn8q09n, child number 0
-------------------------------------
select count (*) from transaction where ref_id_n is null

Plan hash value: 185056574

----------------------------------------------------------------------------------
| Id  | Operation          | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |             |       |       |  3578 (100)|          |
|   1 |  SORT AGGREGATE    |             |     1 |     4 |            |          |
|*  2 |   TABLE ACCESS FULL| TRANSACTION |     1 |     4 |  3578   (1)| 00:00:43 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("REF_ID_N" IS NULL)

So the index was ignored and a full table scan was done, even though there were only 100 rows to retrieve.
Unfortunately we cannot run this query using the other column, because it does not allow NULL values, so the Optimizer knows that the query cannot return any rows at all. (It actually did an Index Fast Full Scan when I ran it, and returned a count of 0 rows.) Instead we can change the query to compare to a real value, rather than NULL, and see that it now uses the index for the execution, even though it still matches only 100 rows out of the 1,000,000 in the table.
SQL_ID  7burxn278qj8b, child number 0
-------------------------------------
select count (*) from transaction where ref_id_n = 5

Plan hash value: 1807025728

---------------------------------------------------------------------------------------
| Id  | Operation         | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                   |       |       |     3 (100)|          |
|   1 |  SORT AGGREGATE   |                   |     1 |     4 |            |          |
|*  2 |   INDEX RANGE SCAN| IX1_TRANSACTION_N |   101 |   404 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("REF_ID_N"=5)

This shows that an index on a column allowing NULL values cannot be used when executing a query with an "IS NULL" constraint on that column.

"!=" Constraint
I've since realised that the Optimizer would be more likely to choose a Full Table Scan for a "not equals" constraint whether the column allowed NULLs or not, simply because of the number of rows being returned. If 100 rows match a value, then 999,900 do not match it, and a Full Table Scan will be the optimal access method regardless of whether the column allows NULL values.

However, if the query only does a "count (*)" and does not retrieve any data columns from the table, then potentially an index could be used, with the Index Fast Full Scan access method. Consider the following performed on each of the two foreign key columns:
select count (*) from transaction where ref_id_nn != 5 ;

When I execute this I get 999,900 for the non-NULL column, and 999,800 for the NULL-allowed column. Whoops! Something is not right somewhere. Again, we are back to the point that NULL is not a "normal value" within the value space of the data type of the column, and a NULL value is treated differently by Oracle rather than as just another value. So although NULL is never equal to another real value, it is also never not equal to another real value. Even if I force a full table scan via the "full (transaction)" hint, I still get a count of 999,800 for the "ref_id_n != 5" constraint. (The execution plan from dbms_xplan.display_cursor showed that "TABLE ACCESS FULL" was used on TRANSACTION.) [This very point about "!=" was also mentioned in a comment by Narenda on my previous post, and I had independently stumbled upon this as a result of this series of tests.]

So even though we have not been able to show one of my claims about NULL values and indexes, we have instead stumbled upon another issue with NULL values: a NULL value is never equal to a real value, and also never not equal to a real value. I am sure that I have seen Jonathan Lewis discuss this in a presentation when talking about rewriting SQL statements into their equivalents. If a column allows NULL values then you must be careful when rewriting equality constraints into equivalent inequality constraints.
Further testing threw up another anomaly. Rather than test "!=" I thought I would test "not in", but contrived so that there is only one value in the "not in" list. For the NULL allowed column the Optimizer chose a full table scan:
SQL> select count (*) from transaction where ref_id_n not in (select 5 from dual) ;

  COUNT(*)
----------
    999800

SQL> @xplastexec

SQL_ID  0f3nh5h11yu6p, child number 0
-------------------------------------
select count (*) from transaction where ref_id_n not in (select 5 from dual)

Plan hash value: 297680891

-----------------------------------------------------------------------------------
| Id  | Operation           | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |             |       |       | 23425 (100)|          |
|   1 |  SORT AGGREGATE     |             |     1 |     4 |            |          |
|*  2 |   FILTER            |             |       |       |            |          |
|   3 |    TABLE ACCESS FULL| TRANSACTION |   998K|  3900K|  3578   (1)| 00:00:43 |
|*  4 |    FILTER           |             |       |       |            |          |
|   5 |     FAST DUAL       |             |     1 |       |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter( IS NULL)
   4 - filter(LNNVL(:B1<>5))
While for the non-NULL column it used the index:
SQL> select count (*) from transaction where ref_id_nn not in (select 5 from dual) ;

  COUNT(*)
----------
    999900

SQL> @xplastexec

SQL_ID  fw2svuam6vzb1, child number 0
-------------------------------------
select count (*) from transaction where ref_id_nn not in (select 5 from dual)

Plan hash value: 1550919231

---------------------------------------------------------------------------------------------
| Id  | Operation              | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |                    |       |       | 20422 (100)|          |
|   1 |  SORT AGGREGATE        |                    |     1 |     4 |            |          |
|*  2 |   FILTER               |                    |       |       |            |          |
|   3 |    INDEX FAST FULL SCAN| IX1_TRANSACTION_NN |   998K|  3900K|   575   (2)| 00:00:07 |
|*  4 |    FILTER              |                    |       |       |            |          |
|   5 |     FAST DUAL          |                    |     1 |       |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter( IS NULL)
   4 - filter(LNNVL(:B1<>5))

So I think that this example does show that there are times when the Optimizer will not use an index on a column allowing NULL values, but would otherwise use an index if NULLs were not allowed.
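The "not in" counting behaviour itself (as opposed to the index choice, which is Oracle's) also reproduces in any engine with standard NULL semantics; a small sqlite3 sketch with illustrative names:

```python
import sqlite3

# "NOT IN" is subject to the same three-valued logic as "!=": a NULL
# column value satisfies neither the IN nor the NOT IN side.
conn = sqlite3.connect(":memory:")
conn.execute("create table t (ref_id integer)")
# 10 rows: values 1..9 plus one NULL
conn.executemany("insert into t values (?)",
                 [(i,) for i in range(1, 10)] + [(None,)])

total,  = conn.execute("select count(*) from t").fetchone()
not_in, = conn.execute(
    "select count(*) from t where ref_id not in (select 5)").fetchone()

# 8, not 9: both the row with value 5 AND the NULL row drop out,
# just as 999,800 rather than 999,900 was counted above.
print(total, not_in)   # 10 8
```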
NULL as "Anything"
Another pattern I have seen is where a NULL value is used to mean "any value", rather than "no value". So in a system that processes transactions according to type, with each type handled by a different handler, a NULL value means that the transaction can be processed by any available handler. This leads to a query similar to the following:

select ... from transaction where (ref_id_n = 5 or ref_id_n is NULL) ...

The execution plan for this cannot use the index, because NULL values are not stored in it. Hence you get an execution plan of:
SQL_ID  4tsn8vbt3s3gh, child number 0
-------------------------------------
select count (*) from transaction where ref_id_n = 5 or ref_id_n is null

Plan hash value: 185056574

----------------------------------------------------------------------------------
| Id  | Operation          | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |             |       |       |  3580 (100)|          |
|   1 |  SORT AGGREGATE    |             |     1 |     4 |            |          |
|*  2 |   TABLE ACCESS FULL| TRANSACTION |   101 |   404 |  3580   (1)| 00:00:43 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(("REF_ID_N"=5 OR "REF_ID_N" IS NULL))

However, if the database design were different, and the value '0' were used as a special case meaning "any handler", with a corresponding row inserted into the "reference" table, then the query could be changed to the following and Oracle would use the index on the column:
SQL_ID  d2v9fgscm8nrq, child number 0
-------------------------------------
select count (*) from transaction where ref_id_nn = 5 or ref_id_nn = 0

Plan hash value: 1215059433

-----------------------------------------------------------------------------------------
| Id  | Operation           | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |                    |       |       |     4 (100)|          |
|   1 |  SORT AGGREGATE     |                    |     1 |     4 |            |          |
|   2 |   INLIST ITERATOR   |                    |       |       |            |          |
|*  3 |    INDEX RANGE SCAN | IX1_TRANSACTION_NN |   201 |   804 |     4   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access(("REF_ID_NN"=0 OR "REF_ID_NN"=5))

Again, the presence of NULL values results in sub-optimal query execution, and a different design can provide a better solution, allowing the Optimizer to use an index for much faster access (a cost of 4 versus 3580).
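The redesign above can be sketched in miniature: replace the "any handler" NULL with a real sentinel value 0 (backed by a dummy reference row), so the OR-with-IS-NULL predicate becomes a plain IN-list that a B-Tree index can serve. A sqlite3 sketch with illustrative names, showing that the rewritten query returns the same rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (ref_id integer)")
conn.executemany("insert into t values (?)",
                 [(5,), (5,), (3,), (None,)])

# Original design: NULL means "any handler"
before, = conn.execute(
    "select count(*) from t where ref_id = 5 or ref_id is null").fetchone()

# Redesign: map NULL to the sentinel value 0
conn.execute("update t set ref_id = 0 where ref_id is null")
after, = conn.execute(
    "select count(*) from t where ref_id in (0, 5)").fetchone()

# Same rows either way, but the second predicate is index-friendly
print(before, after)   # 3 3
```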
Conclusion
For now I think I have written enough in one post about NULLs and indexes. My main point was to provide some real evidence that NULL values are not stored in B-Tree indexes in Oracle, and that if NULL values are allowed in columns it can affect whether the Optimizer will use an index or not. And I think I have done that. I'm sure that people experienced with Oracle already know this, but I wanted to provide some proof points for anybody who doubted it for any reason. As ever, the benefit of some real tests is that you get to either verify any assumptions you have made, or show them up to be wrong and correct them. In this case I have been able to correct some weak assumptions I had, and have learnt some more about how the Oracle Optimizer works, specifically when handling NULL values.

Table Creation SQL
Here is the SQL to create the tables, populate them with data and index them. Note the two foreign key columns in "transaction", and that one value is mapped to NULL when generating the data. So there are 10,000 rows in Reference, and 1,000,000 rows in Transaction, meaning there are 100 rows per Reference Identifier value in Transaction.

--
-- Create the tables necessary for the NULL values investigation
-- Create tables, load in generated data, index, gather statistics
--
prompt Creating tables
--
-- Reference table - a lookup of master values
--
create table reference (
    ref_id      number (*,0)   not null,
    group_id    number (*,0)   not null,
    description varchar2 (512) not null,
    constraint pk_reference primary key (ref_id)
) ;
--
-- Transaction table
--
create table transaction (
    trans_id    number (*,0)   not null,
    ref_id_nn   number (*,0)   not null,
    ref_id_n    number (*,0)           ,
    location_id number (*,0)   not null,
    padding     varchar2 (512),
    constraint pk_transaction primary key (trans_id)
) ;
--
alter table transaction add constraint fk_ref_nn
    foreign key (ref_id_nn) references reference (ref_id) ;
--
alter table transaction add constraint fk_ref_n
    foreign key (ref_id_n) references reference (ref_id) ;
--
prompt Loading data
--
-- SPECIAL DUMMY ROW!! Keep in sync with any newly added columns!!
insert into reference values (0, 0, 'Unknown - Dummy entry for referential integrity purposes') ;
--
insert into reference
with generator as
(select rownum r
   from (select rownum r from dual connect by rownum <= 1000) a,
        (select rownum r from dual connect by rownum <= 1000) b,
        (select rownum r from dual connect by rownum <= 1000) c
  where rownum <= 1000000
)
select r,
       mod (r, 1000),
       'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz' || to_char (r, '00000000')
  from generator g
 where g.r <= 10000
/
commit ;
--
insert into transaction
with generator as
(select rownum r
   from (select rownum r from dual connect by rownum <= 1000) a,
        (select rownum r from dual connect by rownum <= 1000) b,
        (select rownum r from dual connect by rownum <= 1000) c
  where rownum <= 1000000
)
select r,
       mod (r, 10000),
       decode (mod (r, 10000), 0, null, mod (r, 10000) + 1),
       mod (r, 1999),
       'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789' || to_char (r, '000000000')
  from generator g
 where g.r <= 1000000
/
commit ;
--
prompt Creating indexes
--
create index ix1_transaction_nn on transaction (ref_id_nn) ;
create index ix1_transaction_n on transaction (ref_id_n) ;
--
prompt Gathering Statistics
--
exec dbms_stats.gather_schema_stats ('JB')
--
2 comments:
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 1 (100)| |
|* 1 | INDEX RANGE SCAN| T1_IDX | 4 | 20 | 1 (0)| 00:00:15 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("N1">2 AND "N2" IS NULL)
filter("N2" IS NULL)
The previous poster shows that indexes CAN be used to find NULLs in some circumstances. I'll do it here as well using your tables.
SQL> create index ix1_transaction_n_loc on transaction (ref_id_n,location_id);
Index created.
SQL> select count(*) from transaction where ref_id_n is null;
Execution Plan
----------------------------------------------------------
Plan hash value: 1786339511
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4 | 3 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
|* 2 | INDEX RANGE SCAN| IX1_TRANSACTION_N_LOC | 100 | 400 | 3 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("REF_ID_N" IS NULL)