Monday, 28 October 2013

AWR Summary Reporting #5 - System Statistics

So far our AWR report query shows overall duration between two consecutive snapshots, database time (i.e. doing work), SQL execution time (may be lower), total wait time, wait time by class, and some common specific wait events. We know how busy the database was and how much time it spent waiting and on what. Now I want to add in some more data captured by the AWR on system activity. This series is dragging on a bit, so I'm going to try and keep this one shorert and just get it posted out.

AWR snapshots V$SYSSTAT into DBA_HIST_SYSSTAT. This contains individual system statistics with cumulative running totals for each. These are stored as Name / Value pairs in each table. Most statistics are just counts of event occurrences e.g. "physical reads", "physical writes", but some have been added that are time values. This lets you calculate average time per event when you have the corresponding event count as well. Beware that many of the "time" values are in odd units, such as hundredths of seconds (1/100 second), and will need to be adjusted if you want to report in standard units of time.

Statistics recorded include the following, which I've tried to group together into related sets. The details don't really matter on these, as the point of these AWR Summary Reports is to pull out as much as we can at once from a database for later analysis. Some of these statistics might only be relevant in specific scenarios - but by pulling them out at the same time as others, we already have them when they are needed.
  • SQL statement execution - how many? Elapsed Time is in the Database Time Model data set
    • "execute count" - SQL statements executed, both user and recursive
    • "user calls" = Number of user calls such as login, parse, fetch, or execute.
    • "recursive calls"
  • Transactions = units of work = groups of related SQL statements
    • "user commits" = commit count
    • "user rollbacks" = rollback count = abandoned transactions
  • Average Redo Write Time - how long to write to the redo log?
    • "redo write time" = Total write time in 100'ths of seconds aka. 10's of milliseconds
    • "redo writes" = Number of redo writes that occurred
    • Also "redo blocks written" - Divide by "redo writes" to get average blocks per write
    • Can break redo writes down into those due to a "commit" and others (log buffer full enough to warrant a write)
    • Commit == Redo Synch -> "redo synch time" / "redo synch writes"
  • Disk I/O - how many disk operations? Elapsed time is in the Wait Events data set
    • "physical reads" = Physical Reads
    • "physical writes" - Physical Writes
  • Logical reads = "consistent gets" + "db block gets"
    • Note that "session logical reads" is defined as 'The sum of "db block gets" plus "consistent gets"', but might also include logical reads from 'process private memory'.
Because the statistics are stored as Name / Value pairs, each statistic is in a separate row rather than a separate column in the same row as other statistics. We can use a sub-query that retrieves each change in statistic value for each AWR snapshot, and then use this within the main query in an in-line view for each statistic with an appropriate alias name.
, sysstats as 
-- One row per System Statistic with change in value between snapshots
(select sysstat.snap_id
      , sysstat.dbid
      , sysstat.stat_name
      , sum (sysstat.value - psysstat.value) value
         from dba_hist_sysstat sysstat, dba_hist_sysstat psysstat
        where sysstat.snap_id = psysstat.snap_id + 1
          and sysstat.dbid = psysstat.dbid
          and sysstat.instance_number = psysstat.instance_number
          and sysstat.stat_id = psysstat.stat_id
-- Assume if stat_id the same so is the stat_name
        group by sysstat.snap_id, sysstat.dbid, sysstat.stat_name
) -- sysstats

select to_char (snaps.end_snap_time, 'DD/MM/YY HH24:MI') snap_time
...
     , execs.value executions
     , user_calls_st.value user_calls
     , user_commits_st.value user_commits
     , redo_write_st.value redo_writes
     , redo_time_st.value / 100 redo_write_time
     , phys_reads.value physical_reads
     , phys_writes.value physical_writes
     , table_scans_st.value table_scans
from
...
     join (select * from sysstats where stat_name = 'execute count') execs
       on snaps.snap_id = execs.snap_id and snaps.dbid    = execs.dbid
     join (select * from sysstats where stat_name = 'user calls') user_calls_st
       on snaps.snap_id = user_calls_st.snap_id and snaps.dbid = user_calls_st.dbid
     join (select * from sysstats where stat_name = 'user commits') user_commits_st
       on snaps.snap_id = user_commits_st.snap_id and snaps.dbid  = user_commits_st.dbid
     join (select * from sysstats where stat_name = 'redo writes') redo_write_st
       on snaps.snap_id = redo_write_st.snap_id and snaps.dbid  = redo_write_st.dbid
     join (select * from sysstats where stat_name = 'redo write time') redo_time_st
       on snaps.snap_id = redo_time_st.snap_id and snaps.dbid  = redo_time_st.dbid
     join (select * from sysstats where stat_name = 'physical reads') phys_reads
       on snaps.snap_id = phys_reads.snap_id and snaps.dbid    = phys_reads.dbid
     join (select * from sysstats where stat_name = 'physical writes') phys_writes
       on snaps.snap_id = phys_writes.snap_id and snaps.dbid  = phys_writes.dbid
     join (select * from sysstats where stat_name = 'table scans (long tables)') table_scans_st
       on snaps.snap_id = table_scans_st.snap_id and snaps.dbid  = table_scans_st.dbid
...
This means that we can now correlate overall system performance (Database Time and Wait Time) against other activities on the system and drill down further. Using the template given above you can extract any System Statistic you are interested in, and it is easy to add other ones in the future when you need them.

No comments: