But before we do that I want to refactor that sub-query to make some of the later additions to it easier and simpler. Remember that we had just one sub-query, with each wait time calculated by subtracting the previous snapshot's cumulative wait time from the current snapshot's value. I want to refactor this by pushing that subtraction down into a nested sub-query, so that the rest of the wait sub-query becomes simpler.
So before we had elements like:

```sql
, sum (sysevent.time_waited_micro - psysevent.time_waited_micro) all_wait_time
, sum (decode (sysevent.wait_class, 'Commit', (sysevent.time_waited_micro - psysevent.time_waited_micro), 0)) commit_time
```

The refactor changes this to:

```sql
, syswaits as
-- One row for total wait time, plus break down into major wait classes, and events
(select syswaitevents.snap_id
      , syswaitevents.dbid
      , sum (syswaitevents.time_waited_micro) all_wait_time
      , sum (decode (syswaitevents.wait_class, 'Commit', syswaitevents.time_waited_micro, 0)) commit_time
...
   from (select sysevent.snap_id
              , sysevent.dbid
              , sysevent.wait_class
              , sysevent.event_name
              , sum (sysevent.time_waited_micro - psysevent.time_waited_micro) time_waited_micro
              , sum (sysevent.total_waits - psysevent.total_waits) wait_count
           from dba_hist_system_event sysevent
              , dba_hist_system_event psysevent
          where sysevent.snap_id = psysevent.snap_id + 1
...
          group by sysevent.snap_id, sysevent.dbid, sysevent.wait_class, sysevent.event_name
        ) syswaitevents
  group by syswaitevents.snap_id
         , syswaitevents.dbid
) -- syswaits
```

The key points are:
- The inner sub-query groups by wait class and individual event name, as well as by snapshot and database id, and sums the wait time to this granularity.
- It also sums the total occurrences of each wait event, for further use in the outer query.
- The outer sub-query can now just refer to "syswaitevents.time_waited_micro" as the elapsed wait time, as the subtraction between the current and previous snapshot values is done in the inner sub-query.
- The outer sub-query still does the "decode" of the class name to extract individual sums for each wait class desired.
The benefit is that if any one of these particular wait event times is very high, then we have already extracted and reported it, and can see it directly in the output. If all of these wait event times are low, then there was almost no extra cost involved in retrieving them at the same time as the other wait event times (there is no extra disk I/O, only some extra CPU to process the wait time data). It avoids the need to run another query afterwards to drill down into the wait event data, and if the extra wait data is irrelevant then we can just ignore it. And the mechanism is the same one as for the wait classes, using "decode" against the event name.
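This "decode" pivot is just conditional aggregation: one pass over the event rows, with selected event names folded into their own output columns. A small hypothetical sketch using SQLite from Python (SQLite has no decode(), so the equivalent CASE expression stands in for Oracle's sum(decode(event_name, 'log file sync', time_waited_micro, 0)); the data values are invented):

```python
# Conditional-aggregation pivot: total wait time plus per-event columns,
# all computed in a single pass over the rows.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table waits (snap_id integer, event_name text, time_waited_micro integer)")
conn.executemany("insert into waits values (?, ?, ?)",
                 [(2, 'db file sequential read', 3000),
                  (2, 'log file sync',            300),
                  (2, 'direct path read',         700)])

row = conn.execute("""
select snap_id
     , sum (time_waited_micro) all_wait_time
     , sum (case event_name when 'log file sync'    then time_waited_micro else 0 end) log_file_sync_time
     , sum (case event_name when 'direct path read' then time_waited_micro else 0 end) direct_path_read_time
  from waits
 group by snap_id
""").fetchone()

print(row)   # (2, 4000, 300, 700): the total, plus one column per picked-out event
```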
The top wait events that I am going to include are listed below. It's not a large list, as I want to focus on the query rather than on the specific wait events. You may disagree with some of these, or want to add others of your own; to do so, just modify the query I give below and change the event names used in the "decode" functions.
- db file sequential read
- db file scattered read
- direct path read
- log file sync
- log file parallel write
- gc buffer busy
- gc cr block busy
The only changes to the refactored sub-query given earlier are the addition of these extra data items to the "select" output list:
```sql
, sum (decode (syswaitevents.event_name, 'db file sequential read', syswaitevents.time_waited_micro, 0)) db_file_sqrd_time
, sum (decode (syswaitevents.event_name, 'db file sequential read', syswaitevents.wait_count, 0)) db_file_sqrd_count
, sum (decode (syswaitevents.event_name, 'db file scattered read', syswaitevents.time_waited_micro, 0)) db_file_sctrd_time
, sum (decode (syswaitevents.event_name, 'db file scattered read', syswaitevents.wait_count, 0)) db_file_sctrd_count
, sum (decode (syswaitevents.event_name, 'direct path read', syswaitevents.time_waited_micro, 0)) direct_path_read_time
, sum (decode (syswaitevents.event_name, 'direct path read', syswaitevents.wait_count, 0)) direct_path_read_count
, sum (decode (syswaitevents.event_name, 'log file sync', syswaitevents.time_waited_micro, 0)) log_file_sync_time
, sum (decode (syswaitevents.event_name, 'log file sync', syswaitevents.wait_count, 0)) log_file_sync_count
, sum (decode (syswaitevents.event_name, 'log file parallel write', syswaitevents.time_waited_micro, 0)) log_file_prl_wr_time
, sum (decode (syswaitevents.event_name, 'log file parallel write', syswaitevents.wait_count, 0)) log_file_prl_wr_count
, sum (decode (syswaitevents.event_name, 'gc buffer busy', syswaitevents.time_waited_micro, 0)) gc_buff_busy_time
, sum (decode (syswaitevents.event_name, 'gc buffer busy', syswaitevents.wait_count, 0)) gc_buff_busy_count
, sum (decode (syswaitevents.event_name, 'gc cr block busy', syswaitevents.time_waited_micro, 0)) gc_cr_blk_busy_time
, sum (decode (syswaitevents.event_name, 'gc cr block busy', syswaitevents.wait_count, 0)) gc_cr_blk_busy_count
```

Then we need to add these extra columns to the output of the main "select", remembering to divide the times in microseconds by one million to get seconds:
```sql
, syswaits.db_file_sqrd_time / 1000000 db_file_sqrd_time
, syswaits.db_file_sctrd_time / 1000000 db_file_sctrd_time
, syswaits.direct_path_read_time / 1000000 direct_path_read_time
, syswaits.log_file_sync_time / 1000000 log_file_sync_time
, syswaits.log_file_prl_wr_time / 1000000 log_file_prl_wr_time
, syswaits.gc_buff_busy_time / 1000000 gc_buff_busy_time
, syswaits.gc_cr_blk_busy_time / 1000000 gc_cr_blk_busy_time
, syswaits.db_file_sqrd_count db_file_sqrd_count
, syswaits.db_file_sctrd_count db_file_sctrd_count
, syswaits.direct_path_read_count direct_path_read_count
, syswaits.log_file_sync_count log_file_sync_count
, syswaits.log_file_prl_wr_count log_file_prl_wr_count
, syswaits.gc_buff_busy_count gc_buff_busy_count
, syswaits.gc_cr_blk_busy_count gc_cr_blk_busy_count
```

Note that I have chosen to keep the times together and the counts together, rather than alternating between them for each event in turn. This particular ordering makes it easier to copy an "average wait time" formula from one cell to another in a spreadsheet, as each referenced column shifts by one column in each case.
With these columns added, the output from my AWR summary report becomes:
```
SNAP                SNAP       DB      SQL ALL WAIT  BG WAIT   COMMIT  CLUSTER CONCRNCY  NETWORK   SYS IO  USER IO  DBF SEQ DF SCTRD DRCT PTH  LOG FILE LG FL PR  GC BUFF GC CRBLK    DBF SEQ   DF SCTRD   DRCT PTH   LOG FILE   LG FL PR    GC BUFF   GC CRBLK
TIME            DURATION     TIME     TIME     TIME     TIME  WAIT TM  WAIT TM  WAIT TM  WAIT TM  WAIT TM  WAIT TM  RD TIME  RD TIME  RD TIME SYNC TIME  WR TIME BSY TIME BSY TIME     RD CNT     RD CNT     RD CNT   SYNC CNT     WR CNT    BSY CNT    BSY CNT
--------------- -------- -------- -------- -------- -------- -------- -------- -------- -------- -------- -------- -------- -------- -------- --------- -------- -------- -------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
11/09/13 06:00     3,615    1,594    1,292      894      154      104       61        6      482       52      114       85        3       24       104       22        0       18     20,096        880     31,531     45,162     47,067         18      1,390
11/09/13 07:00     3,591    1,638    1,352      907      190       71       38        8      491       98      119       53        9       53        71       49        0        2     13,934      2,584     51,890     47,190     49,327         30        232
11/09/13 08:00     3,589    2,149    1,869    1,324      188       64       70        8      503       86      518      388       14       98        64       41        0        2    199,512      5,722     74,116     49,184     51,518         28        849
11/09/13 09:00     3,640    5,516    5,062    3,344      223       83      382       24      582       94    2,081    1,681       38      218        83       50      146        3    635,825     10,174    147,400     80,669     85,350     65,098      2,136
11/09/13 10:00     3,558   11,193   10,150    6,625      326      264    1,040      146      670      161    4,007    2,980      227      710       264       87      106       34    744,366     90,168    415,035    129,443    136,572    116,076      8,471
11/09/13 11:00     3,630   16,981   15,578   10,260      486      371    1,294      116      865      255    7,116    5,804      368      495       371      144       14       18  1,346,338    162,856    274,090    191,730    201,510        834      4,778
11/09/13 12:00     3,609   15,954   14,349    9,516      497      508    1,456      143      876      231    5,887    4,979      185      579       508      125       70       68  1,230,115     92,290    317,980    186,878    197,001      3,207      8,931
11/09/13 13:00     3,602   10,800    9,651    6,013      383      278      774       81      747      181    3,665    3,191       87      323       278      102       18       21    954,932     32,045    200,258    173,877    184,167        661      9,443
11/09/13 14:00     3,562    8,532    7,562    4,359      319      203      488       55      698      158    2,566    2,364       25      149       203       92       22       12    553,064      6,607    133,380    159,407    169,369        388      6,391
11/09/13 15:00     3,601   11,486   10,210    6,721      402      374      899      111      751      189    4,102    3,485      157      304       374      109       42       50    814,112     79,733    214,011    172,358    181,824      1,944      7,554
```

As I said at the beginning, this is a lot more columns of data than before. But the whole point is that we can now cut and paste this into a spreadsheet and do further analysis of the data to identify high values of one form or another. Now we not only have a break down of wait time by wait class, we also have the times and counts for the most common wait events.
Previously we saw that from 9am to 10am the major wait time was User I/O, at 4,007 seconds out of 6,625 seconds for all waits. Now we can see that the single wait event "db file sequential read" accounted for almost 3,000 seconds (2,980) of the 4,000 for User I/O. We can also see that the next longest wait is "direct path read" at 710 seconds. So we now know which wait event contributed the most wait time, and how many times it occurred. And this lets us calculate the average "db file sequential read" wait for that period as 0.004 seconds, i.e. 4 milliseconds. Depending on your particular situation you may view this as being a high average time, or as normal and acceptable. If the average wait time is acceptable, then you would look at the number of such reads, and try to find which SQL statements are causing them and tune those.
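That average is just the time column divided by the count column, both taken from the 10:00 row of the report above:

```python
# Average-wait arithmetic for "db file sequential read", 09:00-10:00:
# 2,980 seconds of wait time spread over 744,366 individual waits.
db_file_sqrd_time = 2980      # DBF SEQ RD TIME column, already in seconds
db_file_sqrd_count = 744366   # DBF SEQ RD CNT column

avg_wait_secs = db_file_sqrd_time / db_file_sqrd_count
print(round(avg_wait_secs, 3))   # 0.004 seconds, i.e. about 4 milliseconds
```

The same formula, copied across the paired time and count columns in a spreadsheet, gives the average wait for each of the other events.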