Tuesday 1 March 2016

Full Table Scan - Friend or Foe?

[Or Don't be afraid of Full Table Scans]

UPDATE 30 March 2016
I've come to realise that there are some mistakes and inaccuracies in this post [see this post for more details], so I've edited this post and corrected what I can.  Rather than leave behind incorrect claims, I've replaced them by the corrected text, so hopefully now this post no longer has its previous faults.

ORIGINAL POST:
Many people consider a Full Table Scan (FTS) in a query execution plan to be a bad thing - reading every record from a table to find only those records the query needs. This is indicated by a "TABLE ACCESS FULL" in an execution plan. But is it really all that bad? Is it actually sometimes the right tool for the job? Can a Full Table Scan sometimes have a lower cost than using an Index?

My view is that the Oracle Optimizer will only choose a FTS under the following two conditions:
  • There is no other possible access method to get the needed data from the table
    • No other access method is currently available, but another access method might be a lower cost if it was available
  • A FTS is a lower cost (and so should be faster) than all other possible access methods
    • The FTS is the "best" access method of all available and the Optimizer is correct to use it
In other words, when you see a "TABLE ACCESS FULL" in an execution plan you should not jump to the conclusion that this is wrong and "needs to be fixed", because maybe it is correct and it is indeed the best access method for that particular step of the query operation. If you have concerns about the FTS then you need to double check things to find out whether it is the right thing for the Optimizer to be choosing, or whether a better access method would reduce the execution cost. Jumping to the wrong conclusion can lead you down a dead end when trying to improve a query's performance.

So when you see a Full Table Scan in an execution plan you should check the query and all other relevant factors to see if a FTS is indeed the "best" access method to get that data because it has the lowest cost.  Don't assume that the Optimizer is wrong - normally it isn't. It is just as likely to be your query that is affecting the execution plan chosen.

Calculating the Cutover Point

The actual percentage or fraction of rows in the table that is the cutover point between the Optimizer using a Full Table Scan or an available Index is itself mainly dependent on the average number of rows stored per data block. The other main factor is the value being used by the Optimizer for Multi-Block Read Count (MBRC), which is part of the System Statistics stored in the database. It is possible to calculate this cutover point yourself on a given table to see when a FTS really is cheaper and when an Index might help.

For a specific query involving an equality filter on a column, the other factor is what fraction of the data rows in the table have the same value stored in them. Oracle maintains this as a statistic on each column named "Density". By comparing the Density value for a column against the fraction of rows in the table needed by the query, you can see whether a FTS would be cheaper or not for a filter on that column. Remember that a percentage is just a fractional decimal value multiplied by 100 - so a Density of 0.0025 means 0.25%.

First the value used for MBRC (Multi-Block Read Count). This is stored in the AUX_STAT$ table owned by SYS. Its value only gets set when you gather system statistics. If set, then that particular value is used. If not set then a default value of 8 is used (note that it seems to ignore the value of the initialization parameter db_file_multiblock_read_count). The Optimizer also uses the values for MREADTIM and SREADTIM (Multi-block read time and Single block read time) also in the AUX_STAT$ table. Again, if these have not been set then it will use a default formula to derive them from the values for IOSEEKTIM and IOTFRSPEED.

To try and keep this explanation short we can jump to the following formulae used when you have not gathered system statistics.
SREADTIM = IOSEEKTIM + (DB_BLOCK_SIZE / IOTFRSPEED)  
MREADTIM = IOSEEKTIM + (MBRC * DB_BLOCK_SIZE / IOTFRSPEED) 
For the default values of MBRC (8), IOSEEKTIM (10) and IOTFRSPEED (4096) and a DB_BLOCK_SIZE of 8192 (8 KB) you get an SREADTIM value of 12 milliseconds, and a MREADTIM value of 26 milliseconds.

When a Full Table Scan occurs the Optimizer knows that it will be doing multi-block reads. However, it needs to cast or convert the cost of these reads into units of single block reads. This is because all other I/O costs are in terms of single block reads internal to the Optimizer. And the "cost" of a multi-block read is not the same as the "cost" of a single block read - a multi-block read should take longer given the greater number of blocks being transferred.

Instead of dividing the number of blocks in the table by the full MBRC value, it "adjusts" this value by the ratio of SREADTIM to MREADTIM, and then uses this value in the cost calculation.

The cost of a FTS would then be calculated as (note the brackets):
FTS Cost = #Blocks in Table / (MBRC * SREADTIM / MREADTIM)
FTS Cost = (#Blocks in Table * MREADTIM) / (MBRC * SREADTIM)
For an Index lookup to be cheaper than this FTS cost, we can calculate the fraction of rows in the table where the cost of using an Index would be slightly less than this FTS cost. For this we need to know the following values from the statistics Oracle has on the table and the particular column used in the equality filter:
  • Number of Rows in the table - NUM_ROWS in USER_TABLES or USER_TAB_STATISTICS
  • Number of Blocks used for the table - BLOCKS in USER_TABLES or USER_TAB_STATISTICS
  • Density of the column - DENSITY in USER_TAB_COLS or USER_TAB_COL_STATISTICS
A Full Table Scan is cheaper than using an Index lookup when:
Density > (BLOCKS * MREADTIM) / (MBRC * SREADTIM * NUM_ROWS)
When the Density of a column is less than this value then an Index lookup would be cheaper.

Conclusions

Although a Full Table Scan can seem a "brute force" approach to finding some matching records in a table, it can sometimes be the better way of doing it though. It depends on several factors including how many rows you want back from the table, the number of blocks for the table, and the Clustering Factor of any indexes.

It is possible that for even low percentages of data being retrieved from a table that a Full Table Scan can be the cheapest and best access method. Trying to force the Optimizer into using another access method in this circumstance is a waste of time, because all other access methods will be more expensive. The only exception might be another Full Scan type access, such as an Index Full Scan. But even then the gains (reduction in cost) will only be marginal i.e. not a full order of magnitude less.

When you see a FTS in a query execution plan you should check many things including, but not limited to, the number of rows in the table, the number of blocks used for the table, the Clustering Factor for each possible index, and the estimated row count for the filters being used. If the estimate is correct then a FTS is the lowest cost access method and the Optimizer is right to choose it. You should also check if your query is correct, or if there is something wrong with the filter conditions in it.

Tests

Lets show whether this holds true with some tests. There is quite a lot of output here, but I want to provide everything so that anyone else can reproduce these tests on their own systems.

Database version - 12.1.0.2.0
Operating System - Oracle Linux 7.2

Scan Test table - 10 million rows, with columns of different repeated values:
drop table scantest ;
--
prompt Loading data ....
create table scantest 
tablespace testdata 
as 
select r pkid
     , 1 one -- a constant, which forces actual data row access
     , mod (r, 10)     pct10   -- 10 values     = 10% of data in table
     , mod (r, 20)     pct5    -- 20 values     = 5% of data in table
     , mod (r, 50)     pct2    -- 50 values     = 2% of data in table
     , mod (r, 100)    pct1    -- 100 values    = 1% of data in table
     , mod (r, 200)    pct05   -- 200 values    = 0.5% of data in table
     , mod (r, 500)    pct02   -- 500 values    = 0.2% of data in table
     , mod (r, 1000)   pct01   -- 1,000 values  = 0.1% of data in table
     , mod (r, 2000)   pct005  -- 2,000 values  = 0.05% of data in table
     , mod (r, 5000)   pct002  -- 5,000 values  = 0.02% of data in table
     , mod (r, 10000)  pct001  -- 10,000 values = 0.01% of data in table
     , mod (r, 20000)  pct0005 -- 20,000 values = 0.005% of data in table
     , mod (r, 50000)  pct0002 -- 50,000 values = 0.002% of data in table
  from (select rownum r
          from (select rownum r from dual connect by level <= 1000) a,
               (select rownum r from dual connect by level <= 1000) b,
               (select rownum r from dual connect by level <= 1000) c
         where rownum <= 10000000) ;
--
prompt Gathering Statistics ....
exec dbms_stats.gather_table_stats ('JOHN', 'SCANTEST')
--
prompt Creating Indexes ....
create unique index ix_scan_pkid on scantest (pkid) ;
create index ix_scan_pct2 on scantest (pct2) ;
create index ix_scan_pct1 on scantest (pct1) ;
create index ix_scan_pct05 on scantest (pct05) ;
create index ix_scan_pct02 on scantest (pct02) ;
create index ix_scan_pct01 on scantest (pct01) ;
This produces a table with the following statistics:
                                TABLE STATISTICS

Table                           %F  IT           In Ext         Next Ext  %I
------------------------------ --- --- ---------------- ---------------- ---
SCANTEST                        10   1           65,536        1,048,576

                                     Avg Spc
    Num Rows       Blocks E Blocks  Free/Blk Chains Avg Row Len
------------ ------------ -------- --------- ------ -----------
  10,000,000       80,951        0       .00      0      53.000

Table                           Num Extents       Blocks   Avg Blocks
------------------------------ ------------ ------------ ------------
SCANTEST                               151        81,920          543

                                  Leaf         Distinct       Clustering
Index                  Height   Blocks             Keys           Factor
-------------------- -------- -------- ---------------- ----------------
IX_SCAN_PKID                3   22,132       10,000,000           80,528
IX_SCAN_PCT2                3   19,503               50        4,026,368
IX_SCAN_PCT1                3   19,518              100        8,052,718
IX_SCAN_PCT05               3   20,212              200       10,000,000
IX_SCAN_PCT02               3   20,629              500       10,000,000
IX_SCAN_PCT01               3   20,768            1,000       10,000,000

Full Table Scan cost is calculated as follows:
FTS Cost = (BLOCKS * MREADTIM) / (MBRC * SREADTIM)
On my system MREADTIM and SREADTIM are not set, and the others have default values in the system statistics in AUX_STAT$ (IOSEEKTIM = 10, IOTFRSPEED = 4096), so using the formula from before this gives SREADTIM of 12 ms and MREADTIM of 26 ms.

Plugging these values into the previous formula gives about 22,000 as the cost for the FTS:
(80951 * 26) / (8 * 12) = 2104726 / 96 = 21,924.23 
We can calculate the cutover point up to which a FTS would be cheaper than an Index Scan using the formula given before:
Density > (BLOCKS * MREADTIM) / (MBRC * SREADTIM * NUM_ROWS)
Density > (80,951 * 26) / (8 * 12 * 10,000,000) = 0.002192 or 0.22% approximately
Remember that this is based on several assumptions (Index Clustering Factor) and simplifications (ignoring CPU costs). Thus the cutover point will not be a precise value of 0.0022 (0.22%) but something around this value.

Having calculated the FTS cost at about 22,000 we can also calculate the expected Index Scan costs, and see whether they do drop below the FTS cost under 0.22% of the data in the table.

An Index Scan cost has 2 components:
Index Access Cost = Levels + (Leaf Blocks * Filter Factor)
 Data Access Cost = Clustering Factor * Filter Factor
The Filter Factor is the selectivity of a single value for an equality predicate filter, being the Density of the column, or one over the Number of Distinct Values. In real terms it is the percentage of rows being retrieved, which in our test table is indicated by the column name.

For all indexes the number of Levels is 3 and the number of Leaf Blocks is about 20,500. Yes, it does vary but it is not significantly different for the tests we are doing here. Also from index PCT05 onwards the Clustering Factor is always 10 million i.e. it is not clustered and it is the row count in the table.

Column Name% of Rows returnedFraction of RowsIndex CostData CostTotal Cost
pct11%0.013 + (20,500 * 0.01) = 3 + 205 = 2088,000,000 * 0.01 = 80,00080,208
pct050.5%0.0053 + (20,500 * 0.005) = 3 + 102.5 = 105.510M * 0.005 = 50,00050,106
pct020.2%0.0023 + (20,500 * 0.002) = 3 + 41 = 4410M * 0.002 = 20,00020,044
pct010.1%0.0013 + 20.5 = 23.510M * 0.001 = 10,00010,024

Our calculations confirm that the expected cost of an Index Scan should drop below that of a Full Table Scan when less than 0.22% of the data in the table is being selected.

Does this bear out in practise? Will Oracle switch from a Full Table Scan to an Index Scan when the fraction of rows requested drops below 0.0022? Lets see.

Note: The buffer cache and shared pool were flushed from a SYSDBA session before each query execution, so the buffer cache was empty in each case. And we are only interested in the I/O statistics from the query execution - other statistics such as Redo and SQL*Net bytes sent are not relevant and have been removed.

Query 1:
set autotrace on statistics 
--
select sum (one) from scantest where pct2 = 1 ;

  SUM(ONE)
----------
    200000

Statistics
----------------------------------------------------------
      80704  consistent gets
      80541  physical reads
          2  SQL*Net roundtrips to/from client
          6  sorts (memory)
          0  sorts (disk)
          1  rows processed
Execution Plan (from dbms_xplan.display_cursor for the SQL_ID):
select sum (one) from scantest where pct2 = 1

Plan hash value: 1745049784

-------------------------------------------------------------------------------
| Id  | Operation          | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |          |       |       | 22028 (100)|          |
|   1 |  SORT AGGREGATE    |          |     1 |     6 |            |          |
|*  2 |   TABLE ACCESS FULL| SCANTEST |   200K|  1171K| 22028   (1)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("PCT2"=1)
Note:
  • 2% of 10 million rows is 200 thousand, which the Optimizer has correctly estimated.
  • Full table scan cost is 22,028. This is very close to my own estimate of about 22,000. The missing cost component would be for CPU work to filter each row.
  • 80,541 physical reads occurred, which is very close to the 80,951 blocks reported in the table.

Query 2:
Repeat this with a lower percentage column - pct05
select sum (one) from scantest where pct05 = 1 ;

  SUM(ONE)
----------
     50000

1 row selected.

Statistics
----------------------------------------------------------
      80704  consistent gets
      80541  physical reads
          2  SQL*Net roundtrips to/from client
          6  sorts (memory)
          0  sorts (disk)
          1  rows processed

select sum (one) from scantest where pct05 = 1

Plan hash value: 1745049784

-------------------------------------------------------------------------------
| Id  | Operation          | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |          |       |       | 22040 (100)|          |
|   1 |  SORT AGGREGATE    |          |     1 |     7 |            |          |
|*  2 |   TABLE ACCESS FULL| SCANTEST | 50000 |   341K| 22040   (1)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("PCT05"=1)
Note:
  • Rows estimate is now 50,000, as expected
  • Full table scan cost is slightly different for some unknown reason, but still about 22,000.
  • Same number of physical reads occurred - 80,541.

Query 3:
Continue down to the next lower percentage column - pct02
select sum (one) from scantest where pct02 = 1 ;

  SUM(ONE)
----------
     20000

1 row selected.

Statistics
----------------------------------------------------------
  20212  consistent gets
  20060  physical reads
      2  SQL*Net roundtrips to/from client
      6  sorts (memory)
      0  sorts (disk)
      1  rows processed

select sum (one) from scantest where pct02 = 1

Plan hash value: 3458310886

------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |               |       |       | 20049 (100)|          |
|   1 |  SORT AGGREGATE                      |               |     1 |     7 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| SCANTEST      | 20000 |   136K| 20049   (1)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | IX_SCAN_PCT02 | 20000 |       |    44   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("PCT02"=1)
Note:
  • Now 0.2% of data in this table, which is low enough that an Index Scan should be cheaper
  • Index cost is indeed lower than a Full Table Scan, as expected - 20,049 versus 22,040
  • Index cost is very close to our own calculation of 20,044 - again CPU cost is the difference
  • Only 20,060 physical reads now - Index branch blocks + Leaf blocks + Data rows

Query 4:
Repeat this with a lower percentage column - pct01
select sum (one) from scantest where pct01 = 1 ;

  SUM(ONE)
----------
     10000

1 row selected.

Statistics
----------------------------------------------------------
      10192  consistent gets
      10038  physical reads
          2  SQL*Net roundtrips to/from client
          6  sorts (memory)
          0  sorts (disk)
          1  rows processed

select sum (one) from scantest where pct01 = 1

Plan hash value: 1707962624

------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |               |       |       | 10025 (100)|          |
|   1 |  SORT AGGREGATE                      |               |     1 |     7 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| SCANTEST      | 10000 | 70000 | 10025   (1)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | IX_SCAN_PCT01 | 10000 |       |    23   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("PCT01"=1)
Note:
  • Index cost is halved because row count is halved - 10,025 versus 10,024 calculated
  • Only 10,038 physical reads now - Index branch blocks + Leaf blocks + Data rows

Query 5:
If we force the use of an index for the query on the PCT05 column using a hint we get the following:
select /*+ index (scantest (pct05)) */ sum (one) from scantest where pct05 = 1 ;

  SUM(ONE)
----------
     50000

Statistics
----------------------------------------------------------
  50271  consistent gets
  50133  physical reads
      2  SQL*Net roundtrips to/from client
      6  sorts (memory)
      0  sorts (disk)
      1  rows processed

select /*+ index (scantest (pct05)) */ sum (one) from scantest where pct05 = 1

Plan hash value: 3145193111

------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |               |       |       | 50116 (100)|          |
|   1 |  SORT AGGREGATE                      |               |     1 |     7 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| SCANTEST      | 50000 |   341K| 50116   (1)| 00:00:02 |
|*  3 |    INDEX RANGE SCAN                  | IX_SCAN_PCT05 | 50000 |       |   104   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("PCT05"=1)
Note how the estimated cost is 50,116, which is far greater than the 22,028 of a full table scan, and agrees with my earlier calculation of 50,106 for this index. Instead of the 80,000 physical disk reads done before, only 50,000 have been done. However, the 80,000 disk reads were really multi-block disk reads i.e. nearer 10,000 real disk reads would have been done, each for 8 disk blocks at once. The 50,000 disk reads for the index execution plan would be single block disk reads.

No comments: