Assuming that has all been addressed, then the first thing I do with an AWR Report is a few high level checks - a kind of sanity check to get a feel for whether there might be some kind of performance issue there or not.
A performance problem is normally when a task takes too long to complete, and that is affected by the various resources it uses while doing its work. The key measurements are therefore both sides of this - the input work requests, and the system resource usage while doing that work. I basically want to check:
- The resource usage as a percentage of system capacity i.e. utilisation
- The amount of work requested / done (assuming they are the same), which is really SQL statements executed for a database
- Amount of "wait time" within "SQL execution time" i.e. wait as percentage of work time
- Top wait events to see how these correlate with the previous observations
Lets use the following AWR Report to show what I mean:
WORKLOAD REPOSITORY report for DB Name DB Id Instance Inst Num Startup Time Release RAC ------------ ----------- ------------ -------- --------------- ----------- --- O12DB 3429470280 o12db 1 05-Dec-17 06:59 12.1.0.2.0 NO Host Name Platform CPUs Cores Sockets Memory(GB) ---------------- -------------------------------- ---- ----- ------- ---------- xxxxxxxxxx.local Linux x86 64-bit 2 2 1 7.80 Snap Id Snap Time Sessions Curs/Sess --------- ------------------- -------- --------- Begin Snap: 79 05-Dec-17 09:53:43 142 3.0 End Snap: 80 05-Dec-17 10:03:44 142 4.1 Elapsed: 10.03 (mins) DB Time: 0.39 (mins) Top 10 Foreground Events by Total Wait Time ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Total Wait Wait % DB Wait Event Waits Time (sec) Avg(ms) time Class ------------------------------ ----------- ---------- ---------- ------ -------- DB CPU 12.4 53.8 log file sync 2,727 9.3 3.42 40.4 Commit db file sequential read 177 1.2 6.82 5.2 User I/O control file sequential read 827 .1 0.12 .4 System IThe snapshot duration was 10 minutes, and the system had 2 CPU Cores, so there were 20 minutes of CPU capacity available. The database processing time is reported as "DB Time" and is 0.39 minutes, which is about 1.95% resource utilisation. From this I can see that the database was doing very little work at all, so there is not a database wide performance problem worth investigating. Even though the Top Wait Events seem to show that 40% of the time was spent waiting on the "
log file sync
"
event, the actual amount of Time is very trivial - 9.3 seconds of
waiting out of a 600 second period (10 minutes). Such low workloads and
resource utilisation can lead to various measurement anomalies, so it
is not worth trying to drill down further into the wait event details.
Any actual problem is probably specific to a single session, and should
be investigated using session specific tools.Another AWR Report:
WORKLOAD REPOSITORY report for DB Name DB Id Instance Inst Num Startup Time Release RAC ------------ ----------- ------------ -------- --------------- ----------- --- O12DB 3429470280 o12db 1 05-Dec-17 06:59 12.1.0.2.0 NO Host Name Platform CPUs Cores Sockets Memory(GB) ---------------- -------------------------------- ---- ----- ------- ---------- xxxxxxxxx.local Linux x86 64-bit 2 2 1 7.80 Snap Id Snap Time Sessions Curs/Sess --------- ------------------- -------- --------- Begin Snap: 71 05-Dec-17 07:55:35 39 1.8 End Snap: 72 05-Dec-17 08:05:39 40 2.0 Elapsed: 10.06 (mins) DB Time: 9.35 (mins) Load Profile Per Second Per Transaction Per Exec Per Call ~~~~~~~~~~~~~~~ --------------- --------------- --------- --------- Redo size (bytes): 3,470,831.7 5,403.2 Logical read (blocks): 61,507.4 95.8 Block changes: 20,563.9 32.0 User calls: 495.9 0.8 Parses (SQL): 302.6 0.5 Executes (SQL): 13,287.8 20.7 Transactions: 642.4 Top 10 Foreground Events by Total Wait Time ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Total Wait Wait % DB Wait Event Waits Time (sec) Avg(ms) time Class ------------------------------ ----------- ---------- ---------- ------ -------- DB CPU 372.9 66.5 log file sync 272,633 196.8 0.72 35.1 Commit db file sequential read 472 2.2 4.75 .4 User I/OAgain, same database instance and a 10 minute period of time but a different workload. Now we see that the resource utilisation is up at 46.47%, being 9.35 minutes of DB Time out of 20 minutes of potential CPU capacity. From the Load Profile we can see that the database was executing over 13,000 SQL statements per second. From the wait events we see that "DB CPU" is 66.5% of the "DB Time" i.e. doing real work. So waiting as a percentage of SQL execution time would be about 33.5%. And we can see that almost all of this wait time is due to a single wait event - "
log file sync
".Again, this does not directly tell us whether there is a performance problem or not. In all databases doing work there will be some wait events occurring, and one of these will always be top when ordered by wait time. That does not mean that it is necessarily a problem, or that it can be "fixed" in any way. In this case the workload is doing a lot of inserts and updates in small transactions, so there is a high frequency of commits which all involve writes to the redo logs. And we would expect the "
log file sync
"
wait event to be there in the set of top 10 wait events in this
scenario i.e. it is not a "problem" but a "result" of the workload
running on the database.
No comments:
Post a Comment