Tuesday 15 September 2009

Reported Execution Plan Anomalies

In doing some tests of the effects of different indexes on execution plans, I came across a small anomaly with the way the execution plan was being reported. I was using the Autotrace feature of SQL*Plus to report the execution plan after a SQL query was executed. I noticed that the costs reported for each step were not completely true, although the overall execution total cost was correct.

The SQL query I was playing with was:

select count (*) from (
select t1.i1, t1.i2, t1.c1, t2.i1, t2.i2, t2.c1
from t1 join t2 on t1.i3 = t2.i1
where t1.i2 = 111
and t2.i4 = 11
)

With indexes on the I1 columns in both tables, and the I3 column of table T1 (foreign key). Oracle will therefore scan T2 and join to T1 using the index on I3, as seen in this reported plan:

--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 18 | 1236 (1)| 00:00:15 |
| 1 | SORT AGGREGATE | | 1 | 18 | | |
|* 2 | TABLE ACCESS BY INDEX ROWID| T1 | 1 | 9 | 102 (0)| 00:00:02 |
| 3 | NESTED LOOPS | | 10 | 180 | 1236 (1)| 00:00:15 |
|* 4 | TABLE ACCESS FULL | T2 | 20 | 180 | 195 (1)| 00:00:03 |
|* 5 | INDEX RANGE SCAN | I2_T1 | 100 | | 2 (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("T1"."I2"=111)
4 - filter("T2"."I4"=11)
5 - access("T1"."I3"="T2"."I1")

As can be seen, the access to the data row of table T1 is performed after the join to T2 is complete i.e. outside and after the NESTED LOOP join has completed. However, the reported cost of 1236 for the Nested Loop is incorrect. It should be 236, based on the other reported values for cost and cardinality (row count). The cost of a Nested Loop join is the cost of the inner, first table access, plus the cost of the outer, second table access multiplied by the number of rows retrieved from the inner, first table (cardinality):

195 + (20 * 2) + CPU = 195 + 40 + 1 = 236.

Likewise the row count of the T1 table access is incorrectly shown as 1, when it should be 10, the same as the Nested Loop count. The cost of accessing data rows in T1 that match a single value in the I2_T1 index is about 100. (This can be verified by looking at the values reported in the 10053 trace file for Clustering Factor and Filter Factor, or deriving these manually). The cost for accessing all necessary data rows in T1 as a result of the Nested Loop is the single access cost times the cardinality of the number of times it is performed i.e. 100 * 10 = 1000.

Thus the total cost for the two table join query is 236 (NL) + 1000 (T1 data) = 1236. And indeed the total cost of the query is correctly reported as this. It is just the 2 inner steps of NESTED LOOPS and TABLE ACCESS BY INDEX ROWID that have their costs and row count incorrectly reported, causing this anomaly.

I guess this anomaly is caused by the execution plan moving the table access to outside the nested loop, rather than inside where it most often occurs. If the table access was instead inside the nested loop, between steps 4 and 5 as the parent of 5, then the reported costs would be correct. Most probably the Optimizer arrived at the original plan, then modified the execution steps when it realised it could move the table access outside of the nested loop. Unfortunately, it did not adjust the cost values for this, and so they are mis-reported.

No comments: