Previously we saw that if we had an index on the ref_id_n column in the transaction
table, which allowed NULL values, then the Optimizer would not use the index for an
"IS NULL" constraint:

SQL_ID  cwm2cmgn8q09n, child number 0
-------------------------------------
select count (*) from transaction where ref_id_n is null

Plan hash value: 185056574

----------------------------------------------------------------------------------
| Id  | Operation          | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |             |       |       |  3578 (100)|          |
|   1 |  SORT AGGREGATE    |             |     1 |     4 |            |          |
|*  2 |   TABLE ACCESS FULL| TRANSACTION |     1 |     4 |  3578   (1)| 00:00:43 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("REF_ID_N" IS NULL)

The index on ref_id_n is being ignored, as the NULL values are not stored in that index.
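For reference, here is a minimal sketch of the kind of test setup these plans imply. The transaction table and the ref_id_n and ref_id_nn columns come from the output in this post; the remaining column names, datatypes and the ix1_transaction index name are assumptions for illustration only:

create table transaction
( txn_id     number        not null     -- assumed primary key column
, ref_id_nn  number        not null     -- reference column that never allows NULL
, ref_id_n   number                     -- reference column that allows NULL
, padding    varchar2(100)              -- filler to give the table some bulk
, constraint pk_transaction primary key (txn_id)
) ;

-- single-column index on the NULLable column: rows where ref_id_n is NULL
-- have no entry in it, so it cannot be used to satisfy "ref_id_n IS NULL"
create index ix1_transaction on transaction (ref_id_n) ;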
However, if we create another index, with ref_id_n as the first column followed by
another column that does not allow NULL values, then the index will contain entries
where ref_id_n is NULL. With such an index, the Optimizer can now use it for an
"IS NULL" constraint:

SQL> create index ix2_transaction on transaction (ref_id_n, ref_id_nn) ;

Index created.

SQL> select count (*) from transaction where ref_id_n is null ;

  COUNT(*)
----------
       100

SQL> @xplastexec

SQL_ID  8tubzdty7vdnv, child number 0
-------------------------------------
select count (*) from transaction where ref_id_n is null

Plan hash value: 176942238

-------------------------------------------------------------------------------------
| Id  | Operation         | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                 |       |       |     2 (100)|          |
|   1 |  SORT AGGREGATE   |                 |     1 |     4 |            |          |
|*  2 |   INDEX RANGE SCAN| IX2_TRANSACTION |     1 |     4 |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("REF_ID_N" IS NULL)

So multi-column indexes can store NULL values in them, providing at least one column
does not allow NULL values. The benefit of such an index can be significant - in this
test case the cost came down from 3,578 to 2, simply because the number of NULL rows
was so few. But an index on only the
ref_id_n column itself is of no use for this query, and is ignored by the Optimizer.
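As a quick sanity check, a sketch using the standard data dictionary views (nothing here is specific to this test case beyond the object names): compare the row count recorded for the table with the entry count recorded for the index. With the NOT NULL ref_id_nn column in the key, the two figures should match, confirming that even the rows with a NULL ref_id_n have an entry in ix2_transaction:

-- gather statistics, then compare table rows with index entries
exec dbms_stats.gather_table_stats(user, 'TRANSACTION', cascade => true)

select num_rows from user_tables  where table_name = 'TRANSACTION' ;
select num_rows from user_indexes where index_name = 'IX2_TRANSACTION' ;

-- NUM_ROWS for an index counts its entries; because ref_id_nn is NOT NULL,
-- every table row has an entry, so both queries should return the same figure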
Potentially you can also gain some benefit from an index where the ref_id_n column is
not the leading column, as the index may be smaller in size than the table, and the
Optimizer may choose an Index Fast Full Scan rather than a Full Table Scan. And that is
the case with the test data set I have been using:

SQL> drop index ix2_transaction ;

Index dropped.

SQL> create index ix2_transaction on transaction (ref_id_nn, ref_id_n) ;

Index created.

SQL> select count (*) from transaction where ref_id_n is null ;

  COUNT(*)
----------
       100

SQL> @xplastexec

SQL_ID  2j64r2n1nq4xm, child number 0
-------------------------------------
select count (*) from transaction where ref_id_n is null

Plan hash value: 1095380460

-----------------------------------------------------------------------------------------
| Id  | Operation             | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                 |       |       |   726 (100)|          |
|   1 |  SORT AGGREGATE       |                 |     1 |     4 |            |          |
|*  2 |   INDEX FAST FULL SCAN| IX2_TRANSACTION |     1 |     4 |   726   (1)| 00:00:09 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("REF_ID_N" IS NULL)
The Index Fast Full Scan is costed at 726, compared to the 3,578 of the Full Table Scan.
Being a "count (*)" only, no other data columns are needed from the table itself.

So if you are allowing NULL values in columns within your database design, and you want
to find those rows that have a NULL value stored, then you cannot use an index on just
that column alone. You will need a multi-column index that includes another column which
does not allow NULL values. However, it may be that the only column that does not allow
NULL values in your database design is the primary key column(s) itself, if you simply
allow NULLs for every column by default.
2 comments:
Hi,
If you don't want to include another (non null) column, then you can simply add a constant, as in:
create index ix2_transaction on transaction (ref_id_n, 0) ;
this is just one additional byte in the index.
Regards,
Franck.
"So multi-column indexes can store NULL values in them, providing at least one column does not allow NULL values."
You're absolutely correct - in the scenario where at least one column has a NOT NULL constraint, the index will store an entry for every row in the table, even if a row has some NULLs in the other (NULLable) columns.
You might want to note also, however, that in the scenario where a multi-column index is across columns that are ALL NULLable (i.e. there are no NOT NULL constraints), you can still get NULLs stored in the index. This is because the index will store entries for each row where at least one of the columns happens to have a non-NULL value. Of course, in this scenario it's not guaranteed that EVERY row in the table will be represented in the index.
So, it's more accurate to say that all indexes (whether single or multi-column) will store entries for every row that has at least one non-NULL value for a column in the index. It doesn't matter whether the columns have NOT NULL constraints or not.
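That behaviour is easy to demonstrate with a small, self-contained sketch (the table and column names here are purely illustrative):

-- two NULLable columns, no NOT NULL constraints anywhere
create table null_demo (a number, b number) ;

insert into null_demo values (1, null) ;       -- at least one non-NULL key value
insert into null_demo values (null, 2) ;       -- at least one non-NULL key value
insert into null_demo values (null, null) ;    -- entirely NULL key
commit ;

create index ix_null_demo on null_demo (a, b) ;

exec dbms_stats.gather_table_stats(user, 'NULL_DEMO', cascade => true)

select num_rows from user_tables  where table_name = 'NULL_DEMO' ;     -- 3 rows
select num_rows from user_indexes where index_name = 'IX_NULL_DEMO' ;  -- 2 entries

-- only the row where both a and b are NULL has no entry in the index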