Wednesday 27 May 2009

Optimizer Costing of Index Scans

Previously I gave a brief overview of how the optimizer costs each potential execution plan for a SQL query and then chooses the one with the lowest cost, and described how it costs a Full Table Scan. Now I will carry on and look at the cost calculations for accessing data in a table using an Index. Again, most of this is based on the information in A Look Under The Hood Of CBO: The 10053 Event by Wolfgang Breitling, and on various tests I have done to confirm the details.

Note that I am concentrating here on the I/O cost, as that is the largest component of the cost of an execution plan. There is also a CPU cost, which is not shown here. Typically this is far smaller than the I/O cost, as it is in the cases discussed so far. For some execution plans the CPU cost can be significant, and this will increase the total cost of such a plan.

Just as Oracle can scan all blocks in a table, so it can scan all leaf blocks in an index. This scan can potentially be performed in two different ways - using single block reads, or using multi-block reads as for a table scan. The former is termed an Index Full Scan, while the latter is termed an Index Fast Full Scan. The cost calculation of each is based on the number of Leaf Blocks in that particular index.

For a Full Scan the cost formula seems to be:
  • #Levels + #Leaf Blocks
For a Fast Full Scan the cost formula uses MBRC as for a Full Table Scan:
  • #Levels + (#Leaf Blocks * MREADTIM / (MBRC * SREADTIM))
Testing with different queries and different indexes shows this to be the case.
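As a sanity check, both costs can be estimated directly from the index statistics in the data dictionary. The following query is only a sketch: MY_INDEX is a hypothetical index name, the BLEVEL statistic is treated as #Levels, and the MREADTIM of 26 ms, SREADTIM of 12 ms and MBRC of 8 are the default-derived values worked out in the Full Table Scan post below - substitute your own System Statistics values if they differ.

  SELECT blevel + leaf_blocks                    AS full_scan_cost,
         blevel + (leaf_blocks * 26) / (8 * 12)  AS fast_full_scan_cost
    FROM user_indexes
   WHERE index_name = 'MY_INDEX';  -- hypothetical index name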

Note that I have seen Oracle report both access methods as an "INDEX FAST FULL SCAN" operation in an execution plan (displayed via DBMS_XPLAN.DISPLAY_CURSOR). I wonder whether this is a bug in the labelling of operations within the Plan Table used?

Normally however indexes are used to implement a predicate condition, and restrict the number of rows being returned. To do this the index is traversed from the root block down through the levels of the index to the leaf blocks that contain the matching values. Not all leaf blocks will be visited - only those containing a matching entry for the predicate condition. The Oracle Optimizer terms this restriction on the number of matching entries the Filter Factor. It is the proportion of rows in the whole table that will match the predicate condition i.e. a value always less than 1. Thus the cost of using only the index itself is:
  • #Levels + (#Leaf Blocks * Filter Factor)
In many situations some data needed for the query is not present in the index, and so the data row needs to be read from the table itself. Thus the cost increases by the cost of reading these data rows, which is essentially the number of rows matching the predicate condition:
  • #Rows * Filter Factor
There are however several refinements which need to be made to these formulae to reflect how the Optimizer actually costs these access paths.

First, it is possible that matching data rows are stored in the same data block. In such cases the data block is only read once, even though multiple data rows are used from it. Such clustering of similar values will reduce the number of data blocks that need to be read i.e. the read of one data block from disk can satisfy multiple entries from the same leaf block. Oracle calculates this clustering as a statistic for each index, and uses this to report the number of data blocks that would need to be read – the Clustering Factor. This is visible in the CLUSTERING_FACTOR column in the USER_INDEXES and USER_IND_STATISTICS data dictionary views.

For many indexes it will be the case that the Clustering Factor is the same value as the number of Rows in the table – no co-location of data values for that particular column. But for some indexes, the Clustering Factor will be much lower, such as an index on the date the data was loaded, or an incrementing sequential account number. The Clustering Factor Statistic factors in the benefits of the reduced disk I/O from this co-location of Index and Data entries, to better estimate the real disk I/O that would actually take place.

The final full index cost formula is thus:
  • #Levels + (#Leaf Blocks * Filter Factor) + (Clustering Factor * Filter Factor)
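This formula can be evaluated for a specific index straight from the data dictionary. A minimal sketch, assuming a hypothetical index named MY_C3_INDEX and an assumed Filter Factor of 0.001 supplied as a literal - how the Filter Factor itself is derived is covered next:

  SELECT blevel
         + (leaf_blocks       * 0.001)   -- #Leaf Blocks * Filter Factor
         + (clustering_factor * 0.001)   -- Clustering Factor * Filter Factor
           AS index_access_cost
    FROM user_indexes
   WHERE index_name = 'MY_C3_INDEX';  -- hypothetical index name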
Second, the Filter Factor is calculated in different ways for different types of predicate conditions within a query. For now I will restrict myself to equality conditions (=), as they are the most common. Please read Wolfgang's Under The Hood paper for information on the Filter Factor for other conditions.

For a column with no Histograms (this is important), the Filter Factor for an equality condition will be the density of data values within that column - a measure of the proportion of data rows that have the same value for that column. This is true whether the condition uses a literal value or a bind variable. In turn the density value is simply the inverse of the number of distinct values in that column i.e. 1 / NDV. This count is available in the DISTINCT_KEYS column of USER_IND_STATISTICS and USER_INDEXES.
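So for an equality condition the Filter Factor can be read straight from the dictionary as 1 / DISTINCT_KEYS. A quick sketch, again using the hypothetical MY_C3_INDEX:

  SELECT distinct_keys,
         1 / distinct_keys AS equality_filter_factor  -- density = 1 / NDV
    FROM user_indexes
   WHERE index_name = 'MY_C3_INDEX';  -- hypothetical index name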

The Filter Factor can also be used to calculate the number of matching rows produced by the predicate condition i.e. the Cardinality of the result set. This is used by the Optimizer when multiple conditions are combined, either on the same table or joined tables.

Filter Factors on multiple conditions in a query are combined together according to the standard rules for probabilities:
  • c1 AND c2 ==> FF1 * FF2
  • c1 OR c2 ==> FF1 + FF2 – (FF1 * FF2) i.e. exclude duplicate matches
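For example, assuming two conditions with Filter Factors of 0.02 and 0.05, the combined values work out as follows:

  SELECT 0.02 * 0.05                  AS ff_c1_and_c2,  -- = 0.001
         0.02 + 0.05 - (0.02 * 0.05)  AS ff_c1_or_c2    -- = 0.069
    FROM dual;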
If the column has a Histogram on it, the Density value will be calculated from the Density values of each bucket within the Histogram that fall within the query range. This might be close to 1 / NDV for the whole table, or it might be different. That is one way in which Histograms can have such a big impact on query execution plans.

Examples


Using the index cost formula of:
  • #Levels + (#Leaf Blocks * Filter Factor) + (Clustering Factor * Filter Factor)
For a query with an equality condition on column C3, which has an index with a Density of 0.001002, 250 Leaf Blocks and 1 Level, on a table with 100,000 rows and a Clustering Factor equal to that row count, the cost should be:
  • 1 + 250 * 0.001002 + 100000 * 0.001002 = 1 + 0.2505 + 100.2 = 101.4505
The I/O cost reported by Oracle was 102, which agrees very closely with our calculated value. The difference is probably due to rounding within the calculation. The expected cardinality is 100,000 * 0.001002 = 100.2 rows.
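The same calculation can be reproduced from the dictionary statistics for any index, using 1 / DISTINCT_KEYS as the equality Filter Factor. A sketch with the hypothetical index name MY_C3_INDEX; for the statistics above (1 Level, 250 Leaf Blocks, Clustering Factor of 100,000, Density of 0.001002) it returns approximately 101.45:

  SELECT blevel
         + (leaf_blocks       * (1 / distinct_keys))
         + (clustering_factor * (1 / distinct_keys)) AS estimated_io_cost
    FROM user_indexes
   WHERE index_name = 'MY_C3_INDEX';  -- hypothetical index name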

Friday 8 May 2009

Oracle Optimizer Plan Costing - Full Table Scans

Hopefully this is the start of a series on how the Oracle Cost Based Optimizer determines the cost of a particular execution plan for a SQL statement. Knowing how it arrives at the cost of each possible access path to a data set, we can better understand why a particular access path was chosen, and why an alternative access path had a higher cost. This information was kick-started by A Look Under The Hood Of CBO: The 10053 Event by Wolfgang Breitling, supplemented by information in Troubleshooting Oracle Performance by Christian Antognini, and refined by various tests I did. The Under the Hood paper is very good, and has a good description of how the Optimizer goes about costing different access paths to data sets. Unfortunately it seems to be based on Oracle 9i, and things have changed slightly in Oracle 10g. The principles are the same, but the trace information output by the optimizer has changed slightly.

For this post I will focus on how the optimizer works generally, and specifically on how a Full Table Scan is costed. These will not be full, exhaustive explanations of the inner workings of the Oracle Cost Based Optimizer, but short and to the point descriptions of the key costings.

Plan Costing


The Optimizer itself costs all execution plans in terms of the number of disk I/Os involved, because these are generally the largest component of the execution plan cost. Other costs such as CPU costs are also included in the total plan cost, after being converted to an I/O count via a conversion value. When the lowest cost plan has been produced, this can be easily converted to an elapsed time value by simply multiplying the cost by an average I/O time value.

When producing an execution plan for a SQL query (queries are the most common and also the most complicated things that the Optimizer has to deal with), the Optimizer simply tries to produce all possible execution plans and cost each of them. From these it chooses the plan with the lowest cost. Clearly costing every possible plan in full could be very time consuming. So what the Optimizer actually does is fully cost one execution plan, and then only partially cost the subsequent execution plans.

Each subsequent plan is costed as each step of the plan is determined by the Optimizer. If the partial cost of such a plan so far is greater than the current best plan, then it is abandoned. This early pruning avoids the optimizer wasting too much time on plans that are too costly. Likewise if a new plan is produced with a total cost less than the current best plan, then the new plan becomes the current best plan.

After considering all possible execution plans the optimizer will have the lowest cost execution plan for the SQL query.

The execution plans themselves are formed in an iterative fashion. First, for each table referenced in the SQL query the optimizer costs each potential access path to the data in that table. Indexes are considered if they can be used directly to implement a predicate - a filter condition within the SQL query restricting the rows returned. Then the optimizer considers each possible join combination between the tables in the SQL query, using each possible join method - Nested Loop, Sort Merge or Hash Join. These joins use some of the costs for accessing each of the tables determined earlier, combined according to how that join method works. Thus each possible join order (A to B to C, A to C to B, B to A to C, etc.) is costed in turn using each of the 3 possible join methods, to arrive at the lowest cost execution plan for the SQL query.

Full Table Scan


This is one access method to data in a table that is always available to the optimizer. Logically the cost is the number of disk I/Os required to read every data block in the table. The number of data blocks in a table is obtained from the BLOCKS column in USER_TABLES. It is as accurate as the last time you gathered statistics on that table. Oracle 10g has an automatic job that runs every night by default to update the statistics on tables that have changed significantly, to minimise the drift between actual table sizes and the statistics stored about them in the database.

One enhancement Oracle has is the ability to issue multi-block reads, which read more than one block from disk in a single read request. This reduces the total number of disk I/Os needed. Thus the number of disk I/Os needed is reduced by the multi-block read count (MBRC), according to the formula:
  • Blocks / MBRC

The Optimizer wants all costs to be in units of equal disk I/Os i.e. in units of single block reads. However, the time for a multi-block read is likely to be different – generally slightly longer given that more data is transferred. To convert a multi-block read cost into units of a single block read, the multi-block read cost is divided by the time for a multi-block read and multiplied by the time for a single block read. This adjustment means that the “cost” is now in units of cost of single block reads, as used by other access paths such as indexes.

Therefore the cost of a Full Table Scan is actually:
  • (Blocks * MREADTIM) / (MBRC * SREADTIM)

MREADTIM and SREADTIM, as well as MBRC, are part of a set of statistics Oracle has about the relative performance of the computer system it is running on. These are termed the System Statistics, and are visible in sys.aux_stats$ or via the dbms_stats.get_system_stats procedure. Out of the box following installation, Oracle only has a minimal set of default values for some of these statistics - IOSEEKTIM, IOTFRSPEED and CPUSPEEDNW. All the other system statistics have no value. They can be set in a number of ways - the intended way being to get Oracle to measure these values from the system itself while it is executing some representative workload (these are termed Workload Statistics) - but they can be set manually if desired, or left unset.
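You can see which System Statistics actually have values on your own system with a simple query - the Workload Statistics show a NULL PVAL1 when they have never been gathered:

  SELECT pname, pval1
    FROM sys.aux_stats$
   WHERE sname = 'SYSSTATS_MAIN';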

Oracle uses those System Statistics that it has values for. For those with no value Oracle uses various substitution formulae to calculate values to use instead.

If there is a value for MBRC in the System Statistics then the Optimizer uses that value. Otherwise, it uses the value of the db_file_multiblock_read_count (DBFMBRC) initialization parameter.

If MREADTIM and SREADTIM do not have any values in the System Statistics, then the following substitutions are made:
  • SREADTIM = IOSEEKTIM + (DB_BLOCK_SIZE / IOTFRSPEED)
  • MREADTIM = IOSEEKTIM + (MBRC * DB_BLOCK_SIZE / IOTFRSPEED)

This is probably the general case on most systems - no Workload System Statistics are present (MBRC, SREADTIM, MREADTIM and others) - with only the default values for IOSEEKTIM (10 milliseconds) and IOTFRSPEED (4 KB per millisecond). For an 8 KB block size and DBFMBRC of 8 this gives values for SREADTIM of 12 ms and for MREADTIM of 26 ms, and a ratio of MREADTIM to SREADTIM of 2.167.
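These default-derived values are easy to verify with a quick calculation. This sketch assumes the default IOSEEKTIM of 10 ms, the default IOTFRSPEED of 4096 bytes per millisecond, an 8 KB block size and an MBRC of 8:

  SELECT 10 + (8192 / 4096)      AS sreadtim_ms,  -- IOSEEKTIM + DB_BLOCK_SIZE / IOTFRSPEED = 12
         10 + (8 * 8192 / 4096)  AS mreadtim_ms   -- IOSEEKTIM + (MBRC * DB_BLOCK_SIZE) / IOTFRSPEED = 26
    FROM dual;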

A simple test shows this to be the case - a SELECT on a table with no indexes. The table I used had 1252 blocks of 8 KB each, and DBFMBRC was set to 8. No System Statistics had been collected, so only the defaults exist for IOSEEKTIM and IOTFRSPEED. Using the substituted formulae for MREADTIM and SREADTIM the calculated cost should be:
  • (1252 * 26) / (8 * 12) = 32552 / 96 = 339.08

The cost reported by Oracle was 341, which is very close. The difference is due to various factors, such as rounding within the calculation, the optimizer probably costing an extra read for the table header (+1), and the optimizer adding in a CPU cost for processing the data.
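The whole calculation can be reproduced for any table directly from its statistics. A sketch, with MY_TABLE as a hypothetical table name and the default-derived MREADTIM of 26 ms, SREADTIM of 12 ms and MBRC of 8 hard-coded - substitute your own System Statistics values if they differ:

  SELECT blocks,
         ROUND((blocks * 26) / (8 * 12), 2) AS full_scan_io_cost
    FROM user_tables
   WHERE table_name = 'MY_TABLE';  -- returns 339.08 for the 1252 block table above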

Using these formulae and the values of the System Statistics, we can calculate for ourselves the expected cost of a Full Table Scan of any table.