- Duration - real clock time that occurred between the AWR 2 snapshots being reported on
- Database Time - time spent by the database being active
- SQL Execution Time - time spent by the database executing SQL statements
- Wait event data - individual wait events and wait time, and wait classes
- System statistics - as normally seen in
V$SYSSTAT
as cumulative counters - SQL execution statistics for individual SQL statements
WITH
clause so that they produce one row of data values per snapshot, and
then joining to these named sub-queries in the main query and outputting
the data values we want. This keeps the main query simpler, and pushes
issues about grouping and summarising into the sub-query. It also lets
us refer to the same sub-query multiple times if needed - which is the
case for some of the data captured by AWR e.g. System Statistics are
captured as sets of Name / Value pairs per snapshot.For now lets add on wait event data telling us how long the system waited on different types of events. As we already have two different times for database activity (DB Time & SQL Execution Time), we would most like to know the total wait time within that database activity. This would let us calculate the actual time doing real SQL execution work, and see wait time as a percentage of total database time. The basic query for this extracts the wait times in microseconds from
DBA_HIST_SYSTEM_EVENT
, and would be:Note that I exclude "Idle" waits, which are waits unrelated to the execution of a SQL statement., syswaits as (select sysevent.snap_id , sysevent.dbid , sum (sysevent.time_waited_micro - psysevent.time_waited_micro) all_wait_time from dba_hist_system_event sysevent, dba_hist_system_event psysevent where sysevent.snap_id = psysevent.snap_id + 1 and sysevent.dbid = psysevent.dbid and sysevent.instance_number = psysevent.instance_number and sysevent.event_id = psysevent.event_id -- Ignore Idle wait events and sysevent.wait_class != 'Idle' group by sysevent.snap_id , sysevent.dbid ) -- syswaits
The next logical thing would be to break down the total wait time somehow into various major components, to show which type of waits occurred more often than the others. There are 3 major ways to break down wait events:
- Foreground versus background waits i.e. waits experienced directly by sessions for connected users, and waits experienced by permanent background server sessions
- By Wait Class - these group individual events into common classes such as Concurrency, User I/O, Commit, Network, Cluster.
- By individual wait event - unfortunately there are around 100 of these, and outputting these as individual columns in the query would be too cumbersome.
DBA_HIST_BG_EVENT_SUMMARY
. So the query for this would be:This gives the total background wait time in each snapshot for each database instance. The main query can then join to this as it does to the other sub-queries., sysbgwait as -- One row per System Wait Event with change in value between snapshots (select sysevent.snap_id , sysevent.dbid , sum (sysevent.time_waited_micro - psysevent.time_waited_micro) time_waited_micro from dba_hist_bg_event_summary sysevent, dba_hist_bg_event_summary psysevent where sysevent.snap_id = psysevent.snap_id + 1 and sysevent.dbid = psysevent.dbid and sysevent.instance_number = psysevent.instance_number and sysevent.event_id = psysevent.event_id and sysevent.wait_class != 'Idle' group by sysevent.snap_id, sysevent.dbid ) -- sysbgwait
For the wait classes we now have two main choices for how to write such sub-queries:
- Have separate sub-queries for each class
- Collapse the sub-queries into just one sub-query using the "decode" function to sum values over different criteria
The classes can be broken out using the Oracle "
DECODE
" function as an "IF" function to test the class name.The principle is that if the wait class name matches the target class name then the corresponding wait time is used in the sum, otherwise zero is used meaning that particular wait time is not included in the sum for that wait class., syswaits as (select sysevent.snap_id , sysevent.dbid , sum (sysevent.time_waited_micro - psysevent.time_waited_micro) all_wait_time , sum (decode (sysevent.wait_class, 'Commit', (sysevent.time_waited_micro - psysevent.time_waited_micro), 0)) commit_time , sum (decode (sysevent.wait_class, 'Cluster', (sysevent.time_waited_micro - psysevent.time_waited_micro), 0)) cluster_time , sum (decode (sysevent.wait_class, 'Concurrency', (sysevent.time_waited_micro - psysevent.time_waited_micro), 0)) concurrency_time , sum (decode (sysevent.wait_class, 'Network', (sysevent.time_waited_micro - psysevent.time_waited_micro), 0)) network_time , sum (decode (sysevent.wait_class, 'System I/O', (sysevent.time_waited_micro - psysevent.time_waited_micro), 0)) system_io_time , sum (decode (sysevent.wait_class, 'User I/O', (sysevent.time_waited_micro - psysevent.time_waited_micro), 0)) user_io_time from dba_hist_system_event sysevent, dba_hist_system_event psysevent where sysevent.snap_id = psysevent.snap_id + 1 and sysevent.dbid = psysevent.dbid and sysevent.instance_number = psysevent.instance_number and sysevent.event_id = psysevent.event_id -- Ignore Idle wait events and sysevent.wait_class != 'Idle' group by sysevent.snap_id , sysevent.dbid ) syswaits
Adding this to our existing query gives the following - note the divides by one million to convert microseconds to seconds:
And again, it would be good to define some nice column formats when running this in SQL*Plus:with snaps as ... , systimes as ... , sysbgwait as ... , syswaits as ... select to_char (snaps.end_snap_time, 'DD/MM/YY HH24:MI') snap_time , extract (second from snaps.snap_interval) + (extract (minute from snaps.snap_interval) + (extract (hour from snaps.snap_interval) + (extract (day from snaps.snap_interval) * 24) ) * 60 ) * 60 snap_duration , dbtime.value / 1000000 db_time , sqlexectime.value / 1000000 sql_exec_time , syswaits.all_wait_time / 1000000 all_wait_time , sysbgwait.time_waited_micro / 1000000 bg_wait_time , syswaits.commit_time / 1000000 commit_time , syswaits.cluster_time / 1000000 cluster_time , syswaits.concurrency_time / 1000000 concurrency_time , syswaits.network_time / 1000000 network_time , syswaits.system_io_time / 1000000 system_io_time , syswaits.user_io_time / 1000000 user_io_time from snaps join (select * from systimes where stat_name = 'DB time') dbtime on snaps.snap_id = dbtime.snap_id and snaps.dbid = dbtime.dbid join (select * from systimes where stat_name = 'sql execute elapsed time') sqlexectime on snaps.snap_id = sqlexectime.snap_id and snaps.dbid = sqlexectime.dbid join syswaits on snaps.snap_id = syswaits.snap_id and snaps.dbid = syswaits.dbid join sysbgwait on snaps.snap_id = sysbgwait.snap_id and snaps.dbid = sysbgwait.dbid order by snaps.end_snap_time /
When run the output is something like this:col snap_time format a15 heading 'SNAP|TIME' col snap_duration format 999,999 heading 'SNAP|DURATION' col db_time format 999,990 heading 'DB|TIME' col sql_exec_time format 999,990 heading 'SQL EXEC|TIME' col all_wait_time format 999,990 heading 'ALL WAIT|TIME' col bg_wait_time format 999,990 heading 'BG WAIT|TIME' col commit_time format 999,990 heading 'COMMIT|WAIT TM' col cluster_time format 999,990 heading 'CLUSTER|WAIT TM' col concurrency_time format 999,990 heading 'CONCURNCY|WAIT TM' col network_time format 999,990 heading 'NETWORK|WAIT TM' col system_io_time format 999,990 heading 'SYS IO|WAIT TM' col user_io_time format 999,990 heading 'USER IO|WAIT TM'
As before, this particular database is relatively inactive before 9am, then becomes more active. Now we can see that the wait time is significant e.g. about 60% of Database Time is actually Wait Time in the hour up to 10am. And we can see that the largest component of this Wait Time is User I/O. We can also see that the next highest Wait Time component is Cluster wait time.SNAP SNAP DB SQL EXEC ALL WAIT BG WAIT COMMIT CLUSTER CONCURNCY NETWORK SYS IO USER IO TIME DURATION TIME TIME TIME TIME WAIT TM WAIT TM WAIT TM WAIT TM WAIT TM WAIT TM --------------- -------- -------- -------- -------- -------- -------- -------- --------- -------- -------- -------- 11/09/13 06:00 3,615 1,594 1,292 894 154 104 61 6 482 52 114 11/09/13 07:00 3,591 1,638 1,352 907 190 71 38 8 491 98 119 11/09/13 08:00 3,589 2,149 1,869 1,324 188 64 70 8 503 86 518 11/09/13 09:00 3,640 5,516 5,062 3,344 223 83 382 24 582 94 2,081 11/09/13 10:00 3,558 11,193 10,150 6,625 326 264 1,040 146 670 161 4,007
What next? We could either try and drill down into individual wait events, or maybe look at the SQL statements being executed and the waits they experience. More in future blog posts.
1 comment:
Great post, thanks.
Two minor misprints:
sum (decode (sysevent.wait_class), 'Commit', (sysevent.time_waited_micro - psysevent.time_waited_micro), 0)) commit_time
-> sum (decode (sysevent.wait_class, 'Commit', (sysevent.time_waited_micro - psysevent.time_waited_micro), 0)) commit_time
) syswaits
-> ) -- syswaits
Post a Comment