Monday 2 July 2012

All Outer Joins Are Not Equal

When executing SQL involving ANSI syntax OUTER JOIN's Oracle first converts this to its own native syntax before executing the SQL, and sometimes this conversion is not totally transparent thereby affecting and restricting the execution plan produced.

This started when I had a long running query, and it happened to involve outer joins between most of the tables (but not all of them). From the execution plan it was clear that the main problem was a full table scan on a very large table - out of a total cost of 35,000, 28,000 was from this full table scan alone. So replacing the full table scan by a more efficient index access, if possible, could lead to a significant improvement.

After some analysis I created an index on all of the referenced columns in the table. This meant that all necessary data for the query could be obtained from the index, with no need to visit the data rows in the table at all. As the data row was quite wide (180 bytes), the index was much narrower and much smaller in size. Also, because the index stores data in sorted order by the leading columns, it meant that the rows the query wanted would all be stored together within the index structure, further reducing the number of disk I/Os needed to retrieve it all. But, the Oracle Optimizer would not use this index at all. It just seemed to ignore it, and always do a full table scan.

At this point I tried another of my techniques for understanding how a query gets executed - remove tables from the query one at a time, until you get a more "sensible" execution plan, and then build it back up again to the final query by adding a table at a time, monitoring the execution plan after each change.

What I noticed was that when I got down to 3 tables from the original 7, the Optimizer would now choose to use the new index (doing an Index Fast Full Scan, at a lower cost of 4,000). But when I added back in a fourth table, it went back to doing a Full Table Scan at a cost of 28,000.

I did a 10053 trace on the execution of each query (3 table and 4 table queries) and noted that in the SINGLE TABLE ACCESS PATH section of the 3 table query it was costing an Index Fast Full Scan (IndexFFS), whereas in the 4 table query it was not costing this access path, and only had the Full Table Scan costed.

At this point while I could see what was going on (in one case the Optimizer knows about the Index Fast Full Scan as an option, but in the other case it doesn't) I could not explain why it was happening, nor what to do about it. So I turned to the Oracle Forums and raised a question about Why Optimizer ignoring Index Fast Full Scan? While waiting for replies I also continued to tinker with the queries in the hope of gaining some extra understanding of what was going on.

The cause of the problem was identified by Dom Brooks as being the fact that Oracle internally rewrites ANSI syntax OUTER JOIN's to its own native syntax (using "(+)") before then optimizing and executing the query, and that this can result in some complex SQL that cannot be optimised as you might otherwise expect. Also see his earlier reply which gives more background information and links to other explanations about this.

When it rewrites the ANSI syntax JOIN's to Oracle native syntax, it wraps each join up in its own in-line view, to ensure that it is equivalent to the ANSI meaning. It then tries to merge these inline views back into the main query. For a single outer join, and other simple cases, this view merging happens smoothly. But in other cases it does not, and the Optimizer is left with some complex SQL with the joins being performed within the inline views. And this restricts the options available to it when building the final execution plan.

There is more information on this at the Oracle Optimizer Blog - Outer Joins in Oracle
"In Oracle, ANSI left and right outerjoins are internally expressed in terms of left outerjoined lateral views. In many cases, a left outerjoined lateral view can be merged and the ANSI left (or right) outerjoin can be expressed entirely in terms of Oracle native left outerjoin operator. (A lateral view is an inline view that contains correlation referring to other tables that precede it in the FROM clause.)"
So for a simple single outer join such as the following:
SELECT T1.d, T2.c
FROM T1 LEFT OUTER JOIN T2 ON (T1.x = T2.x);
it first becomes this on initial conversion to a lateral inline view:
SELECT T1.d, LV.c
FROM T1,  LATERAL (SELECT T2.C FROM T2 WHERE T1.x = T2.x)(+) LV;
which then becomes this when the view is merged:
SELECT T1.d, T2.c
FROM T1, T2
WHERE T1.x = T2.x (+) ;
However, if the joins are more complex, or there are other issues e.g. NULL values allowed within indexed columns, then it may not be possible to collapse the lateral views back into a single level query. In which case the options then open to the Optimizer are less than they would be otherwise, as it must treat the unmerged views on their own.

So if you have a query involving OUTER JOIN's that is not getting as good an execution plan as you think it could get, it may be due to you having used ANSI join syntax. And when Oracle rewrites this to its native syntax it is not able to merge back together the inline views for one reason or another. In my case it seems to have been due to the fact that almost all columns allowed NULL values to be stored in them (except the primary key columns), and NULL values are not stored within indexes on disk. As a result, Oracle was not able to utilise this extra index when more tables were added to the query that were also outer joined.

No comments: