Monday, 28 October 2013

AWR Summary Reporting #5 - System Statistics

So far our AWR report query shows overall duration between two consecutive snapshots, database time (i.e. doing work), SQL execution time (may be lower), total wait time, wait time by class, and some common specific wait events. We know how busy the database was and how much time it spent waiting and on what. Now I want to add in some more data captured by the AWR on system activity. This series is dragging on a bit, so I'm going to try and keep this one shorert and just get it posted out.

AWR snapshots V$SYSSTAT into DBA_HIST_SYSSTAT. This contains individual system statistics with cumulative running totals for each. These are stored as Name / Value pairs in each table. Most statistics are just counts of event occurrences e.g. "physical reads", "physical writes", but some have been added that are time values. This lets you calculate average time per event when you have the corresponding event count as well. Beware that many of the "time" values are in odd units, such as hundredths of seconds (1/100 second), and will need to be adjusted if you want to report in standard units of time.

Statistics recorded include the following, which I've tried to group together into related sets. The details don't really matter on these, as the point of these AWR Summary Reports is to pull out as much as we can at once from a database for later analysis. Some of these statistics might only be relevant in specific scenarios - but by pulling them out at the same time as others, we already have them when they are needed.
  • SQL statement execution - how many? Elapsed Time is in the Database Time Model data set
    • "execute count" - SQL statements executed, both user and recursive
    • "user calls" = Number of user calls such as login, parse, fetch, or execute.
    • "recursive calls"
  • Transactions = units of work = groups of related SQL statements
    • "user commits" = commit count
    • "user rollbacks" = rollback count = abandoned transactions
  • Average Redo Write Time - how long to write to the redo log?
    • "redo write time" = Total write time in 100'ths of seconds aka. 10's of milliseconds
    • "redo writes" = Number of redo writes that occurred
    • Also "redo blocks written" - Divide by "redo writes" to get average blocks per write
    • Can break redo writes down into those due to a "commit" and others (log buffer full enough to warrant a write)
    • Commit == Redo Synch -> "redo synch time" / "redo synch writes"
  • Disk I/O - how many disk operations? Elapsed time is in the Wait Events data set
    • "physical reads" = Physical Reads
    • "physical writes" - Physical Writes
  • Logical reads = "consistent gets" + "db block gets"
    • Note that "session logical reads" is defined as 'The sum of "db block gets" plus "consistent gets"', but might also include logical reads from 'process private memory'.
Because the statistics are stored as Name / Value pairs, each statistic is in a separate row rather than a separate column in the same row as other statistics. We can use a sub-query that retrieves each change in statistic value for each AWR snapshot, and then use this within the main query in an in-line view for each statistic with an appropriate alias name.
, sysstats as 
-- One row per System Statistic with change in value between snapshots
(select sysstat.snap_id
      , sysstat.dbid
      , sysstat.stat_name
      , sum (sysstat.value - psysstat.value) value
         from dba_hist_sysstat sysstat, dba_hist_sysstat psysstat
        where sysstat.snap_id = psysstat.snap_id + 1
          and sysstat.dbid = psysstat.dbid
          and sysstat.instance_number = psysstat.instance_number
          and sysstat.stat_id = psysstat.stat_id
-- Assume if stat_id the same so is the stat_name
        group by sysstat.snap_id, sysstat.dbid, sysstat.stat_name
) -- sysstats

select to_char (snaps.end_snap_time, 'DD/MM/YY HH24:MI') snap_time
...
     , execs.value executions
     , user_calls_st.value user_calls
     , user_commits_st.value user_commits
     , redo_write_st.value redo_writes
     , redo_time_st.value / 100 redo_write_time
     , phys_reads.value physical_reads
     , phys_writes.value physical_writes
     , table_scans_st.value table_scans
from
...
     join (select * from sysstats where stat_name = 'execute count') execs
       on snaps.snap_id = execs.snap_id and snaps.dbid    = execs.dbid
     join (select * from sysstats where stat_name = 'user calls') user_calls_st
       on snaps.snap_id = user_calls_st.snap_id and snaps.dbid = user_calls_st.dbid
     join (select * from sysstats where stat_name = 'user commits') user_commits_st
       on snaps.snap_id = user_commits_st.snap_id and snaps.dbid  = user_commits_st.dbid
     join (select * from sysstats where stat_name = 'redo writes') redo_write_st
       on snaps.snap_id = redo_write_st.snap_id and snaps.dbid  = redo_write_st.dbid
     join (select * from sysstats where stat_name = 'redo write time') redo_time_st
       on snaps.snap_id = redo_time_st.snap_id and snaps.dbid  = redo_time_st.dbid
     join (select * from sysstats where stat_name = 'physical reads') phys_reads
       on snaps.snap_id = phys_reads.snap_id and snaps.dbid    = phys_reads.dbid
     join (select * from sysstats where stat_name = 'physical writes') phys_writes
       on snaps.snap_id = phys_writes.snap_id and snaps.dbid  = phys_writes.dbid
     join (select * from sysstats where stat_name = 'table scans (long tables)') table_scans_st
       on snaps.snap_id = table_scans_st.snap_id and snaps.dbid  = table_scans_st.dbid
...
This means that we can now correlate overall system performance (Database Time and Wait Time) against other activities on the system and drill down further. Using the template given above you can extract any System Statistic you are interested in, and it is easy to add other ones in the future when you need them.

Tuesday, 15 October 2013

AWR Summary Reporting #4 - More Waits

Given the previous sub-query for total wait times including class level waits from the AWR snapshot data, we are going to extend that particular sub-query in order to add some extra wait event data items to it. This will give us a further breakdown of what the major wait events were, and how much time was spent waiting versus doing real work. The upside to these changes are that we will be able to pull out some other wait event times from the AWR snapshots in the same query, for little extra cost or increase in complexity. The downside is that the number of data items being selected for output will increase a lot, and this may start to look like a "mother of all database monitoring queries". However, the query itself is not more complex, and the intention is to get as much data as we can from each AWR snapshot in one hit to allow us to analyse it later outside Oracle (typically in a spreadsheet).

But before we do that I want to refactor that sub-query to make some of the later additions to it easier and simpler. Remember that we had just one sub-query, with the wait time always being calculated by subtracting the current snapshot's wait time from the previous snapshot's time. I want to refactor this by pushing this subtraction down into a nested sub-query, so that the rest of the wait sub-query becomes simpler.

So before we had elements like:
, sum (sysevent.time_waited_micro - psysevent.time_waited_micro) all_wait_time
, sum (decode (sysevent.wait_class, 'Commit', (sysevent.time_waited_micro - psysevent.time_waited_micro), 0)) commit_time
The refactor changes this to:
, syswaits as 
-- One row for total wait time, plus break down into major wait classes, and events
(select syswaitevents.snap_id
      , syswaitevents.dbid
      , sum (syswaitevents.time_waited_micro) all_wait_time
      , sum (decode (syswaitevents.wait_class, 'Commit', syswaitevents.time_waited_micro, 0)) commit_time
...
   from 
        (select sysevent.snap_id
              , sysevent.dbid
              , sysevent.wait_class
              , sysevent.event_name
              , sum (sysevent.time_waited_micro - psysevent.time_waited_micro) time_waited_micro
              , sum (sysevent.total_waits - psysevent.total_waits) wait_count
           from dba_hist_system_event sysevent, dba_hist_system_event psysevent
          where sysevent.snap_id = psysevent.snap_id + 1
...
            group by sysevent.snap_id, sysevent.dbid, sysevent.wait_class, sysevent.event_name
        ) syswaitevents
  group by syswaitevents.snap_id
         , syswaitevents.dbid
) -- syswaits
The key points are:
  • The inner sub-query groups by wait class and individual event name, as well as by snapshot and database id, and sums the wait time to this granularity.
  • It also sums the total occurrences of each wait event, for further use in the outer query.
  • The outer sub-query can now just refer to "syswaitevents.time_waited_micro" as the elapsed wait time, as the subtraction between the current and previous snapshot values are done in the inner sub-query.
  • The outer sub-query still does the "decode" of the class name to extract individual sums for each class desired
Previously I said that there were too many individual events to report them all - around 100 individual wait events. However, experienced DBA's find that they do tend to run across a very common set of wait events that occur most often on "slow" systems. There is a strong enough argument for extracting and reporting the wait times for some individual wait events that you consider to be the "most common" wait events.

The benefit is that if any one of these particular wait event times is very high, then we already have extracted it and reported it for analysis and can see it in the output. If all of these particular wait event times are low, then there was almost no extra cost involved in retrieving them at the same time as the other wait event times (there is no extra disk I/O, only some extra CPU to process the wait time data). It avoids the need to run another query afterwards to drill down into the wait event data, and if the extra wait data is irrelevant then we can just ignore it. And the mechanism is the same one as for the wait classes, using "decode" against the event name.

The top wait events that I am going to include are listed below. It's not a large list as I want to focus on the query, not the specific wait events. You can disagree with some of these, or want to add some others of your own. You can do this just by modifying the query I give below and changing the event names used in the "decode" functions.
  • db file sequential read
  • db file scattered read
  • direct path read
  • log file sync
  • log file parallel write
  • gc buffer busy
  • gc cr block busy
As well as the total wait time for each event, we would also want to know the number of waits that occurred so we can calculate the average time per wait. This is important as it might not be the number of waits that is the problem but the average time per wait. We can achieve a "count" by doing a "sum" using one (1) when the wait event name matches, and zero (0) otherwise. This keeps it uniform with the wait time per event.

The only changes to the refactored sub-query given earlier are the addition of these extra data items to the "select" output list:
      , sum (decode (syswaitevents.event_name, 'db file sequential read', syswaitevents.time_waited_micro, 0)) db_file_sqrd_time
      , sum (decode (syswaitevents.event_name, 'db file sequential read', 1, 0)) db_file_sqrd_count
      , sum (decode (syswaitevents.event_name, 'db file scattered read',  syswaitevents.time_waited_micro, 0)) db_file_sctrd_time
      , sum (decode (syswaitevents.event_name, 'db file scattered read',  1, 0)) db_file_sctrd_count
      , sum (decode (syswaitevents.event_name, 'direct path read',        syswaitevents.time_waited_micro, 0)) direct_path_read_time
      , sum (decode (syswaitevents.event_name, 'direct path read',        1, 0)) direct_path_read_count
      , sum (decode (syswaitevents.event_name, 'log file sync',           syswaitevents.time_waited_micro, 0)) log_file_sync_time
      , sum (decode (syswaitevents.event_name, 'log file sync',           1, 0)) log_file_sync_count
      , sum (decode (syswaitevents.event_name, 'log file parallel write', syswaitevents.time_waited_micro, 0)) log_file_prl_wr_time
      , sum (decode (syswaitevents.event_name, 'log file parallel write', 1, 0)) log_file_prl_wr_count
      , sum (decode (syswaitevents.event_name, 'gc buffer busy',          syswaitevents.time_waited_micro, 0)) gc_buff_busy_time
      , sum (decode (syswaitevents.event_name, 'gc buffer busy',          1, 0)) gc_buff_busy_count
      , sum (decode (syswaitevents.event_name, 'gc cr block busy',        syswaitevents.time_waited_micro, 0)) gc_cr_blk_busy_time
      , sum (decode (syswaitevents.event_name, 'gc cr block busy',        1, 0)) gc_cr_blk_busy_count
Then we need to add these extra columns to the output of the main "select", remembering to divide times in microseconds by one million:
     , syswaits.db_file_sqrd_time     / 1000000 db_file_sqrd_time
     , syswaits.db_file_sctrd_time    / 1000000 db_file_sctrd_time
     , syswaits.direct_path_read_time / 1000000 direct_path_read_time
     , syswaits.log_file_sync_time    / 1000000 log_file_sync_time
     , syswaits.log_file_prl_wr_time  / 1000000 log_file_prl_wr_time
     , syswaits.gc_buff_busy_time     / 1000000 gc_buff_busy_time
     , syswaits.gc_cr_blk_busy_time   / 1000000 gc_cr_blk_busy_time
     , syswaits.db_file_sqrd_count     db_file_sqrd_count
     , syswaits.db_file_sctrd_count    db_file_sctrd_count
     , syswaits.direct_path_read_count direct_path_read_count
     , syswaits.log_file_sync_count    log_file_sync_count
     , syswaits.log_file_prl_wr_count  log_file_prl_wr_count
     , syswaits.gc_buff_busy_count     gc_buff_busy_count
     , syswaits.gc_cr_blk_busy_count   gc_cr_blk_busy_count
Note that I have chosen to keep the times together and the counts together, rather than alternate between them for each event in turn. This particular ordering makes it easier to copy an "average wait time" formula from one cell to another in a spreadsheet, as each referenced column shifts by one column in each case.

With these columns added, the output from my AWR summary report becomes:
SNAP                SNAP       DB SQL EXEC ALL WAIT  BG WAIT   COMMIT  CLUSTER CONCRNCY  NETWORK   SYS IO  USER IO  DBF SEQ DF SCTRD DRCT PTH  LOG FILE LG FL PR  GC BUFF GC CRBLK    DBF SEQ   DF SCTRD   DRCT PTH   LOG FILE   LG FL PR    GC BUFF   GC CRBLK
TIME            DURATION     TIME     TIME     TIME     TIME  WAIT TM  WAIT TM  WAIT TM  WAIT TM  WAIT TM  WAIT TM  RD TIME  RD TIME  RD TIME SYNC TIME  WR TIME BSY TIME BSY TIME     RD CNT     RD CNT     RD CNT   SYNC CNT     WR CNT    BSY CNT    BSY CNT
--------------- -------- -------- -------- -------- -------- -------- -------- -------- -------- -------- -------- -------- -------- -------- --------- -------- -------- -------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
11/09/13 06:00     3,615    1,594    1,292      894      154      104       61        6      482       52      114       85        3       24       104       22        0       18     20,096        880     31,531     45,162     47,067         18      1,390
11/09/13 07:00     3,591    1,638    1,352      907      190       71       38        8      491       98      119       53        9       53        71       49        0        2     13,934      2,584     51,890     47,190     49,327         30        232
11/09/13 08:00     3,589    2,149    1,869    1,324      188       64       70        8      503       86      518      388       14       98        64       41        0        2    199,512      5,722     74,116     49,184     51,518         28        849
11/09/13 09:00     3,640    5,516    5,062    3,344      223       83      382       24      582       94    2,081    1,681       38      218        83       50      146        3    635,825     10,174    147,400     80,669     85,350     65,098      2,136 
11/09/13 10:00     3,558   11,193   10,150    6,625      326      264    1,040      146      670      161    4,007    2,980      227      710       264       87      106       34    744,366     90,168    415,035    129,443    136,572    116,076      8,471
11/09/13 11:00     3,630   16,981   15,578   10,260      486      371    1,294      116      865      255    7,116    5,804      368      495       371      144       14       18  1,346,338    162,856    274,090    191,730    201,510        834      4,778
11/09/13 12:00     3,609   15,954   14,349    9,516      497      508    1,456      143      876      231    5,887    4,979      185      579       508      125       70       68  1,230,115     92,290    317,980    186,878    197,001      3,207      8,931
11/09/13 13:00     3,602   10,800    9,651    6,013      383      278      774       81      747      181    3,665    3,191       87      323       278      102       18       21    954,932     32,045    200,258    173,877    184,167        661      9,443
11/09/13 14:00     3,562    8,532    7,562    4,359      319      203      488       55      698      158    2,566    2,364       25      149       203       92       22       12    553,064      6,607    133,380    159,407    169,369        388      6,391
11/09/13 15:00     3,601   11,486   10,210    6,721      402      374      899      111      751      189    4,102    3,485      157      304       374      109       42       50    814,112     79,733    214,011    172,358    181,824      1,944      7,554
As I said at the beginning this is a lot more columns of data than before. But the whole point is that we can now cut and paste this into a spreadsheet and do further analysis of the data to identify high values of one form or another. Now we not only have a break down of wait time by wait class, we also have the times and counts for the most common wait events.

Previously we saw that from 9am to 10am the major wait time was User I/O at 4,007 seconds out of 6,625 seconds for all waits. Now we can see that the one wait event of "db file sequential read" accounted for almost 3,000 seconds (2,980) out of the 4,000 for User I/O. We can also see that the next longest wait is probably "direct path read" at 710 seconds. So we now know which wait event contributed the most wait time, and how many times it occurred. And this lets us calculate the average "db file sequential read" for that period as being 0.004 seconds i.e. 4 milliseconds. Depending on your particular situation you may view this as being a high average time, or normal and acceptable. If the average wait time is acceptable then you would look at the number of such reads and try and find which SQL statements are causing them and tune those.