Monday 21 September 2009

Data Modelling & Scalability

For what I call Enterprise Class Applications, scalability is an important factor. Scalability can occur in multiple dimensions - increases in user count, transaction volume, or database size - and these can occur individually or in combination. The application must continue to perform well as any of these grows, without performance degrading severely. For this to be true, scalability must be designed into the application from the beginning. You cannot simply add scalability onto an application later on, much as you cannot just add security on. There is no magical sticking plaster that you can apply to an existing application and suddenly have it scale wonderfully. If there were, then everyone would be doing it.

No, scalability must be designed into the application and the way it works. This leads to the corollary that only good designs scale well, and bad ones do not. Skipping the database design phase in any application development can only result in a poor design, with correspondingly poor scalability. At some point, as the workload on the application increases, performance will hit the dreaded knee: throughput will level off and response times will climb.

Poor performance results from things like a lack of primary and foreign keys, general purpose tables with a type field indicating which data fields are relevant, key fields that have only a few distinct values ('Y' or 'N' flags are typical of this), and outer joins between tables where the child records may not exist. Any of these can result in weak SQL when used in a query, and poor execution as a result.
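
As an illustration, here is a minimal sketch of the "general purpose table" anti-pattern alongside a conventional design (all table and column names here are invented for the example):

-- Anti-pattern: one general purpose table holding several entity types,
-- with a type field indicating which of the generic columns are relevant
create table general_data (
    id        number primary key,
    rec_type  varchar2 (1),     -- 'C' = customer, 'O' = order, etc.
    value1    varchar2 (100),   -- meaning depends on rec_type
    value2    varchar2 (100),
    value3    number
) ;

-- Conventional alternative: one table per entity, with real primary and
-- foreign keys, giving the Optimizer accurate statistics and selective indexes
create table customers (
    customer_id  number primary key,
    name         varchar2 (100) not null
) ;

create table orders (
    order_id     number primary key,
    customer_id  number not null references customers,
    order_value  number
) ;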

So if the scalability of your application is important to you - and it may not be important for everyone - then make sure you design your database properly, rather than just implementing the first set of tables that comes into someone's mind. Such a design requires capturing information about all of the entities that need to be stored, and modelling their relationships, first at a high, conceptual level and then at a detailed, physical level.

Ensuring that all entities and their relationships are captured clearly requires knowledge of all of the entities within the application. But in an iterative development, as typified by Agile, not all of the entities are known at the beginning, when the first version of the database design is needed. So how does Agile deal with this? It is the one thing lacking from the documentation I have seen on Agile so far - how do you produce a good, scalable database design when development is iterative and the full requirements and data entities are not yet known?

I have heard from other database architects and administrators of Agile projects delivering poor database designs, and I know of one specific case at the moment where no database skilled person has been involved at all.

For me this is a key sticking point with what I have seen of Agile Development, especially the practice of delaying decisions for as long as possible. There seems to be a perception that the database design "can be finished off later on", after more of the application has been developed. But by the time the database design becomes an issue and needs to be changed, it is too late: the time and effort involved in changing the database and rewriting the application to reflect those changes is too great.

Given that a scalable application requires a good database design to store its data in, why does database design seem to be missing from most descriptions of Agile Development? It may get mentioned every once in a while in general lists of development tasks, but I have seen very little on the specifics of what I would term "Agile Database Design".

My concern is that ignoring database design within Agile Development will, more often than not, result in a poor database design, and so in poor application scalability.

Either way, it seems to me that to ensure you end up with a good, scalable database design, you need a Data Modeller within the development team who is responsible for the database design. All database changes should then be routed through the Data Modeller, rather than developers making them directly themselves.

Tuesday 15 September 2009

Reported Execution Plan Anomalies

While testing the effects of different indexes on execution plans, I came across a small anomaly in the way an execution plan was being reported. I was using the Autotrace feature of SQL*Plus to report the execution plan after a SQL query was executed, and I noticed that the costs reported for the individual steps were not completely true, although the total cost reported for the whole plan was correct.
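
For reference, this is roughly how such a test can be run (Autotrace is a standard SQL*Plus feature; the traceonly explain option reports the plan without displaying the query results):

set autotrace traceonly explain
-- run the query; SQL*Plus then reports its execution plan and predicates
set autotrace off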

The SQL query I was playing with was:

select count (*) from (
    select t1.i1, t1.i2, t1.c1, t2.i1, t2.i2, t2.c1
    from   t1 join t2 on t1.i3 = t2.i1
    where  t1.i2 = 111
    and    t2.i4 = 11
)
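
(For context, the tables are simple test tables. A minimal sketch of the kind of setup implied is below; the column datatypes and the names of the two indexes on the I1 columns are my assumptions, as only I2_T1 appears in the plan.)

create table t1 (i1 number, i2 number, i3 number, i4 number, c1 varchar2 (100)) ;
create table t2 (i1 number, i2 number, i3 number, i4 number, c1 varchar2 (100)) ;

create index i1_t1 on t1 (i1) ;
create index i1_t2 on t2 (i1) ;
create index i2_t1 on t1 (i3) ;   -- the foreign key column used in the join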

There are indexes on the I1 columns in both tables, and on the I3 column of table T1 (a foreign key). Oracle will therefore scan T2 and join to T1 using the index on I3, as seen in this reported plan:

--------------------------------------------------------------------------------------
| Id  | Operation                     | Name  | Rows | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |    1 |    18 |  1236   (1)| 00:00:15 |
|   1 |  SORT AGGREGATE               |       |    1 |    18 |            |          |
|*  2 |   TABLE ACCESS BY INDEX ROWID | T1    |    1 |     9 |   102   (0)| 00:00:02 |
|   3 |    NESTED LOOPS               |       |   10 |   180 |  1236   (1)| 00:00:15 |
|*  4 |     TABLE ACCESS FULL         | T2    |   20 |   180 |   195   (1)| 00:00:03 |
|*  5 |     INDEX RANGE SCAN          | I2_T1 |  100 |       |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("T1"."I2"=111)
4 - filter("T2"."I4"=11)
5 - access("T1"."I3"="T2"."I1")

As can be seen, the access to the data rows of table T1 is performed after the join to T2 is complete, i.e. outside of and after the NESTED LOOPS join. However, the reported cost of 1236 for the Nested Loops step is incorrect. It should be 236, based on the other reported values for cost and cardinality (row count). The cost of a Nested Loops join is the cost of the first (outer, driving) table access, plus the cost of each access to the second (inner) table or index multiplied by the number of rows retrieved from the first table (its cardinality):

195 + (20 * 2) + CPU = 195 + 40 + 1 = 236.

Likewise, the row count of the T1 table access is incorrectly shown as 1, when it should be 10, the same as the Nested Loops row count. The cost of accessing the data rows in T1 that match a single value in the I2_T1 index is about 100. (This can be verified from the values reported in the 10053 trace file for the Clustering Factor and Filter Factor, or by deriving these manually.) The cost of accessing all the necessary data rows in T1 as a result of the Nested Loops join is therefore this single access cost multiplied by the number of times it is performed, i.e. 100 * 10 = 1000.
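
(If you want to check such costings yourself, the Optimizer's calculations can be captured with the 10053 trace event; the tracefile_identifier setting is just a label to make the resulting trace file easier to find.)

alter session set tracefile_identifier = 'nl_cost' ;
alter session set events '10053 trace name context forever, level 1' ;
-- hard parse the query (e.g. explain plan for ...) to produce the trace,
-- which is written to the user_dump_dest directory
alter session set events '10053 trace name context off' ;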

Thus the total cost for the two table join query is 236 (Nested Loops) + 1000 (T1 data access) = 1236, and indeed the total cost of the query is correctly reported as this. It is just the two steps NESTED LOOPS and TABLE ACCESS BY INDEX ROWID that have their costs and row counts incorrectly reported, causing this anomaly.

I guess this anomaly is caused by the table access being moved outside the nested loop in the execution plan, rather than sitting inside it, where it most often occurs. If the table access were instead inside the nested loop, between steps 4 and 5 as the parent of step 5, then the reported costs would be correct. Most probably the Optimizer arrived at the original plan, then modified the execution steps when it realised it could move the table access outside of the nested loop. Unfortunately, it did not adjust the cost values to match, and so they are misreported.

Monday 7 September 2009

Index Selectivity Analysis

Why does Oracle sometimes not use an index when executing a SQL query, even when the index includes most of the columns from a table explicitly referenced in the query? The obvious answer is that using the index does not produce a lower cost than other access methods, such as a full table scan. The Optimizer has calculated that the number of disk I/Os needed to use the index (the depth of the index plus the matching leaf blocks) and then fetch the data blocks (one per matching row) would be greater than the number of disk I/Os needed to do a full table scan with multi-block reads.

In a case like this, such an index will almost never be used by Oracle, even though it includes columns directly referenced in the WHERE clause of a query, because the estimated number of disk I/Os to use it will always be greater than that for a full table scan. There might be exceptions where the data is very skewed, you have a histogram on the columns in the index, and you are selecting a value with a low occurrence count. But generally such an index is unlikely to be used by any queries you run.
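
You can see this effect for yourself with a small test. Here is a sketch, assuming a scratch schema you can create objects in (all object names are invented):

-- A table where the indexed column has only 2 distinct values
create table flags as
select rownum id,
       case when mod (rownum, 2) = 0 then 'Y' else 'N' end flag,
       rpad ('x', 100) padding
from dual connect by level <= 100000 ;

create index flags_flag on flags (flag) ;

exec dbms_stats.gather_table_stats (user, 'FLAGS')

set autotrace traceonly explain
select count (padding) from flags where flag = 'Y' ;
-- expect TABLE ACCESS FULL: half the rows match, so the index is ignored
set autotrace off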

Which raises the question: how can we find out which indexes in our database are selective enough to be used by a query, before we actually run those queries? Or conversely, which indexes are not selective enough, given the data in the database, and so will never be used? It is a bit too late to find out while we are running a query and Oracle is doing a full table scan, and taking a long time over it. Of course we could do an EXPLAIN PLAN of each query beforehand, but execution plans can change over time, dependent on the volume of data in a table and the distribution of the data values.

We would like a query that we can run against a complete database, and which will tell us which indexes are not selective given the actual data in the tables. This would flag weak indexes to us regardless of which queries might be run, i.e. indexes that are unlikely to be chosen by the Optimizer even when all the columns in them are referenced in a query.

How would such a query work? Well, we know how the Optimizer costs the different access methods to data in a table (see previous posts on Full Table Scans and Index Scans). We can calculate each of these for a table and an index, and see if the table scan cost is lower than the index access cost. In fact, using a bit of maths we can manipulate the formulae for these two costs into a more direct comparison. If the maths is too much for you, just skip ahead to the final formula, followed by the SQL that implements it.

The costs of each type of access are:
  • FTS Cost = (Blocks * mreadtim) / (MBRC * sreadtim)
  • Index Cost = Levels + (Leaf Blocks + Clustering Factor) * Filter Factor
And an index will be ignored when:
  • Index Cost > Full Table Scan Cost
Assuming that the table is large, the (Leaf Blocks + Clustering Factor) term will be far greater than the depth, or number of levels, of the index tree, so we can ignore the levels. This gives the relationship that an index will be ignored when (approximately):
  • (Leaf Blocks + Clustering Factor) * Filter Factor > (Blocks * mreadtim) / (MBRC * sreadtim)
Dividing both sides by (Leaf Blocks + Clustering Factor) gives:
  • Filter Factor > (Blocks * mreadtim) / (MBRC * sreadtim * (Leaf Blocks + Clustering Factor))
Now, when there is no histogram on the columns and no data skew, the Filter Factor will simply be the inverse of the Number of Distinct Values (Keys), i.e.
  • Filter Factor = 1 / Distinct Keys
Inverting both sides, which also reverses the inequality, gives us:
  • Distinct Keys < (MBRC * sreadtim * (Leaf Blocks + Clustering Factor)) / (Blocks * mreadtim)
All of these values are stored in the Data Dictionary of the database in one place or another. So we can write a single SQL query that can calculate both sides of the comparison directly from the database itself. Note that this assumes that the statistics on your tables and indexes are up to date, and reflect the true data distributions in the tables.

To simplify things we can group together the values that are independent of any particular table and index, finally giving us that, for an index to be ignored, the following must be true:
  • Distinct Keys < (MBRC * sreadtim / mreadtim) * (Leaf Blocks + Clustering Factor) / Blocks
We can calculate the expression "(MBRC * sreadtim / mreadtim)" once, and then use it with the per-table and per-index values for every index in the database. We can then list all the indexes that fail the test, i.e. those that will be ignored because their cost of access is greater than that of a full table scan.

For a system with no System Statistics gathered, which is actually the majority of cases, the following calculations are used for the disk read times:
  • sreadtim = ioseektim + (db_block_size / iotfrspeed)
  • mreadtim = ioseektim + (db_file_multiblock_read_count * db_block_size / iotfrspeed)
By default ioseektim will be 10 milliseconds and iotfrspeed will be 4 KB / millisecond.
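
Plugging in some typical values shows how this works out. For an 8,192 byte block size and a db_file_multiblock_read_count of 8 (my example numbers, matching the defaults suggested in the script prompts below):
  • sreadtim = 10 + (8192 / 4096) = 12 milliseconds
  • mreadtim = 10 + (8 * 8192 / 4096) = 26 milliseconds
  • MBRC * sreadtim / mreadtim = 8 * 12 / 26 = 3.69 (approximately)
So an index with, say, 1,000 leaf blocks and a clustering factor of 20,000 on a 10,000 block table would need more than 3.69 * (1,000 + 20,000) / 10,000 = about 8 distinct key values before it would beat a full table scan.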

This means that the only values in the fully expanded formula that are not table or index dependent are db_block_size and db_file_multiblock_read_count. These can only be obtained directly from the Oracle database if the user has SELECT permission on the dynamic performance views (V$PARAMETER specifically), which is generally not true for normal users. To work around this, the SQL script below simply prompts the user to enter these two values.
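
(If you do have the necessary privileges, the two values can be queried directly instead:)

select name, value
from v$parameter
where name in ('db_block_size', 'db_file_multiblock_read_count') ;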

The SQL script then calculates the derived values, and uses them to test all indexes on all tables, listing those that fail the test, i.e. where the number of distinct values in the index is too low.

As mentioned earlier, we can only ignore the depth of the index when the table is large enough, so the SQL includes an additional constraint on the number of blocks in a table, which is hardcoded at 100 blocks. You could easily change this to another value if desired.

This is a SQL*Plus script and must be run via SQL*Plus, as it uses SQL*Plus features such as substitution variables. And remember that your statistics must be up to date on each table and index in the database.

--
-- Check which indexes on all tables are better or worse than a full table scan
-- When the number of distinct values in a column is low, a full table scan
-- is more efficient (fewer I/Os) than using an index.
--
set define on
set verify off
set heading off
--
prompt
prompt Report to show which indexes are selective enough and which are not.
prompt Non-selective indexes will be ignored and full table scans preferred.
prompt

define small_table_threshold = 100

accept block_size prompt 'Enter database block size in bytes (e.g. 8192) : '
accept mbrc prompt 'Enter db_file_multiblock_read_count value (e.g. 8): '

prompt Using following database wide configuration settings:
prompt : Database Block size is &&block_size (bytes)
prompt : db_file_multiblock_read_count is &&mbrc
prompt : Threshold for a small table is &&small_table_threshold (blocks)
prompt : (Smaller tables than this are ignored in this report)
prompt : [If these are incorrect, edit this script and re-run]

column sreadtim new_value sreadtim
column mreadtim new_value mreadtim
column effective_mbrc new_value effective_mbrc

set termout off
-- Hide from screen internal calculations of derived values
-- (10 = default ioseektim in ms, 4096 = default iotfrspeed in bytes/ms,
-- as per the no System Statistics formulae given earlier)
select 10 + (&&block_size / 4096) sreadtim from dual ;
select 10 + (&&mbrc * &&block_size / 4096) mreadtim from dual ;
select &&mbrc * &&sreadtim / &&mreadtim effective_mbrc from dual ;
set termout on

prompt
prompt Assuming that system wide statistics have these derived values:
prompt : Single Block Read Time used is &&sreadtim (milliseconds)
prompt : Multi Block Read Time used is &&mreadtim (milliseconds)

set heading on
column table_name heading 'Table' format a20
column index_name heading 'Index Name' format a20
column distinct_keys heading 'Number of|Distinct|Values' format 999,999
column min_ndv heading 'Min|NDV' format 999,999
column selectivity heading 'Selectivity'
column sel heading 'Sel?'
column blocks heading 'Blocks' format 999,999
column mb heading 'Table|MB' format 999,999

select table_name, index_name, distinct_keys, min_ndv,
       case
           when (distinct_keys < min_ndv) then 'N'
           else 'Y'
       end sel,
       (blocks * &&block_size / (1024 * 1024)) mb
  from (select t.table_name, t.blocks, t.num_rows num_trows,
               i.index_name, i.num_rows num_irows, i.distinct_keys,
               &&effective_mbrc *
                   ((i.leaf_blocks + i.clustering_factor) / t.blocks) min_ndv
          from user_ind_statistics i, user_tables t
         where t.table_name = i.table_name
           and i.num_rows != 0
           and t.blocks != 0    -- guard against divide by zero on empty tables
       )
 where distinct_keys < min_ndv
   and blocks >= &&small_table_threshold
 order by table_name, index_name ;

prompt The Sel? column indicates whether an index is selective enough to be
prompt chosen in a query, given the values of the database settings entered.
prompt
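
Saved to a file and run from SQL*Plus, a session would start something like this (the file name is my own; the values are the examples suggested in the prompts), with the report of any non-selective indexes following:

SQL> @index_selectivity.sql
Enter database block size in bytes (e.g. 8192) : 8192
Enter db_file_multiblock_read_count value (e.g. 8): 8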

Running this should list the indexes that will NOT generally be used by the Optimizer for any query you might run that references that table. There might be exceptional cases where the Optimizer uses such an index anyway, e.g. for an Index Fast Full Scan, but generally such indexes will not be used as expected for direct access to the matching rows.

As ever, remember that higher values of db_file_multiblock_read_count lower the cost of a full table scan, and so raise the number of distinct keys an index needs before it becomes cheaper than a full table scan.