Note that I am concentrating here on the I/O cost as that is the largest component of the cost of an execution plan. However, there is also a CPU cost, which is not shown here. Typically this is far less than the I/O cost, and is so in the cases discussed so far. It is possible for some execution plans that the CPU cost is significant, and this will increase the total cost of such an execution plan.
Just as Oracle can scan all blocks in a table, so it can scan all leaf blocks in an index. This scan can be performed in two different ways potentially - as a scan using single block reads, or a scan using multi-block reads as for a table scan. The former is termed an Index Full Scan, while the latter is termed an Index Fast Full Scan. The cost calculations of each are based on the number of Leaf Blocks in that particular index.
For a Full Scan the cost formula seems to be:
- #Levels + #Leaf Blocks
- #Levels + (#Leaf Blocks * MREADTIM / (MBRC * SREADTIM))
Note that I have seen Oracle report both access methods as an "INDEX FAST FULL SCAN" operation in an EXPLAIN PLAN (DBMS_STATS.DISPLAY_CURSOR). I wonder whether this is a bug within the labelling of operations within the Plan Table used?
Normally however indexes are used to implement a predicate condition, and restrict the number of rows being returned. To do this the index is traversed from the root block down through the levels of the index to the leaf blocks that contain the matching values. Not all leaf blocks will be visited - only those containing a matching entry for the predicate condition. The Oracle Optimizer terms this restriction on the number of matching entries the Filter Factor. It is the proportion of rows in the whole table that will match the predicate condition i.e. a value always less than 1. Thus the cost of using only the index itself is:
- #Levels + (#Leaf Blocks * Filter Factor)
- #Rows * Filter Factor
First, it is possible that matching data rows are stored in the same data block. In such cases the data block is only read once, even though multiple data rows are used from it. Such clustering of similar values will reduce the number of data blocks that need to be read i.e. the read of one data block from disk can satisfy multiple entries from the same leaf block. Oracle calculates this clustering as a statistic for each index, and uses this to report the number of data blocks that would need to be read – the Clustering Factor. This is visible in the CLUSTERING_FACTOR column in the USER_INDEXES and USER_IND_STATISTICS data dictionary views.
For many indexes it will be the case that the Clustering Factor is the same value as the number of Rows in the table – no co-location of data values for that particular column. But for some indexes, the Clustering Factor will be much lower, such as an index on the date the data was loaded, or an incrementing sequential account number. The Clustering Factor Statistic factors in the benefits of the reduced disk I/O from this co-location of Index and Data entries, to better estimate the real disk I/O that would actually take place.
The final full index cost formula is thus:
- #Levels + (#Leaf Blocks * Filter Factor) + (Clustering Factor * Filter Factor)
For a column with no Histograms (this is important), the Filter Factor for an equality condition will be the density of data values within that column - a measure of the proportion of data rows that have the same value for that column. This is true whether the condition uses a literal value or a bind variable. In turn the density value is simply the inverse of the number of distinct values in that column i.e. 1 / NDV. This count is available in the DISTINCT_KEYS column of USER_IND_STATISTICS and USER_INDEXES.
The Filter Factor can also be used to calculate the number of matching rows produced by the predicate condition i.e. the Cardinality of the result set. This is used by the Optimizer when multiple conditions are combined, either on the same table or joined tables.
Filter Factors on multiple conditions in a query are combined together according to the standard rules for probabilities:
- c1 AND c2 ==> FF1 * FF2
- c1 OR c2 ==> FF1 + FF2 – (FF1 * FF2) i.e. exclude duplicate matches
Examples
Using the index cost formula of:
- #Levels + (#Leaf Blocks * Filter Factor) + (Clustering Factor * Filter Factor)
- 1 + 250 * 0.001002 + 100000 * 0.001002 = 1 + 0.2505 + 100.2 = 101.4505