Thursday 23 January 2014

RAC high block sharing causes high log sync times

Some time ago I dealt with a system that was experiencing high "log file sync" times, as well as other RAC related waits ("gc buffer busy" and "gc cr block busy"). Initially I assumed that the "log file sync" waits and the RAC Global Cache waits were independent of each other, but it turned out that they were interlinked. This post is mainly to make other people aware that on a RAC system there is a relationship between high inter-node block sharing traffic (Global Cache) and "log file sync", and that the "log file sync" waits can be a direct side effect of that high inter-node block sharing. I'll also provide some evidence to support this in the form of AWR reports from the system.

The system was a 2 node RAC cluster running Oracle 10.2.0.4.0. An AWR report from an hour of peak daytime activity showed:
Elapsed:   59.58 (mins)    
DB Time:  469.97 (mins)
So a database busy time of nearly 470 minutes in an elapsed period of nearly 60 minutes, meaning an average of almost 8 active sessions during that period. This is high for a system with only 4 CPUs: theoretically there are only 240 minutes of CPU capacity available on a 4 CPU node in a 60 minute period.

The top 5 wait events were:
Event                     Waits  Time(s)  Avg Wait(ms) % Total Call Time  Wait Class
CPU time                          8,450                              30.0  
log file sync            77,234   4,930         64                   17.5  Commit
gc buffer busy           45,532   4,902        108                   17.4  Cluster
db file sequential read 269,971   1,587          6                    5.6  User I/O
gc cr block busy         45,388   1,401         31                    5.0  Cluster
8,450 seconds is about 140 minutes. This is just over half of the 240 minutes of CPU capacity available on the 4 CPU node, indicating that waiting is excessive, i.e. the total Database Time is made up of more waiting than actively executing SQL statements. And the other 4 top wait events add up to over 12,800 seconds, which is far more than the CPU time - again, more waiting than doing real work.

As I said, I initially focussed on trying to tune the "log file sync" events, but this led nowhere. However, a blog post I read when Googling this combination (sorry, I cannot seem to find the link now) stated that the "log file sync" waits are in fact a consequence of the "gc buffer busy" waits, and that there are other clues in the RAC specific section of the AWR report. And indeed, on looking in the "RAC Statistics" under "Global Cache and Enqueue Services - Workload Characteristics" I saw the following:
Avg global enqueue get time (ms):                       0.6
Avg global cache cr block receive time (ms):           11.2
Avg global cache current block receive time (ms):       7.2
Avg global cache cr block build time (ms):              0.0
Avg global cache cr block send time (ms):               0.2
Global cache log flushes for cr blocks served %:       10.8
Avg global cache cr block flush time (ms):             72.4
Avg global cache current block pin time (ms):           0.1
Avg global cache current block send time (ms):          0.2
Global cache log flushes for current blocks served %:   0.5
Avg global cache current block flush time (ms):       227.5
The standout statistic is the last one - 227.5 milliseconds on average to flush a "global cache current block". I would expect an inter-node network message to take of the order of a few milliseconds, and anything involving disk writes to take of the order of tens of milliseconds, so an average of hundreds of milliseconds is way too high.
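These AWR averages come from cumulative counters that you can also watch between snapshots. As a minimal sketch (the LIKE filter is deliberately loose, since the exact statistic names vary slightly between Oracle versions), something like this against GV$SYSSTAT shows the raw flush counters and timers on each instance - sample it twice and difference the values to get figures for an interval:
-- Cumulative global cache flush related statistics per instance since startup
SELECT inst_id, name, value
  FROM gv$sysstat
 WHERE name LIKE 'gc%flush%'
 ORDER BY inst_id, name;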

Further digging around revealed that in some cases a Global Cache request for a block from another node can cause that node to flush its redo generated so far to its redo log before the block is copied across to the requesting node, i.e. a Global Cache buffer request from one node can cause a "log file sync" to occur on the other node. And I found this confirmed in this Pythian Blog post on Tuning "log file sync" Event Waits:
In a RAC cluster, this problem is magnified, since both CR- and CUR-mode buffers need the log flush to complete before these blocks can be transferred to another instance’s cache.
At this point I switched focus onto the causes of those Global Cache requests and the high inter-node traffic.

In the "SQL Statistics" section of the AWR report is "SQL Ordered by Cluster Wait Time", and I could see a set of SQL statements all acting on the same table and with very high Cluster Wait Times as a percentage of elapsed execution time (CWT%). The percentage cluster wait time was around 85% for all of these related SQL statements i.e. it was spending almost 6 times longer waiting than doing the actual work of the SQL statement.

All of these SQL statements acted on one table that was being both heavily inserted into and selected from. This is the classic "hot table" or "hot block" problem seen in some OLTP applications on RAC databases. It was further confirmed by the "Segment Statistics" section of the AWR report, in "Segments by Global Cache Buffer Busy": the only objects mentioned were the one table and some of the indexes on it.
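Again this can be checked between AWR snapshots from the segment statistics views. A minimal sketch (in 10.2 the statistic is simply named 'gc buffer busy'; later versions split it into acquire and release, hence the LIKE):
-- Segments with the most global cache buffer busy waits since instance startup
SELECT owner, object_name, object_type, value
  FROM v$segment_statistics
 WHERE statistic_name LIKE 'gc buffer busy%'
   AND value > 0
 ORDER BY value DESC;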

In this case there were 2 suggestions I could make:
  1. Make the primary key unique index a Reverse Key index, so that new records would not be inserted into the same last index leaf block.
    • The primary key on this table was a generated sequential number, so it was always increasing in value, and new entries were always inserted into the last leaf block of a normal index.
  2. Create a new multi-column index on the table that would better satisfy most of the queries being run.
    • The current execution plan used a less selective index and then also had to retrieve a number of data blocks from the table to further filter the data according to the constraints in the "WHERE" clause.
These 2 changes reduced both the number of data blocks accessed in the table for each query execution and the contention on the last leaf block of the primary key index. The net effect was to reduce the total number of blocks being continually shared between the nodes in the Global Cache for these SQL statements.
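For reference, the two changes would look something like the following - the table, column and index names here are made up, not the real ones from that system:
-- 1. Rebuild the primary key index as a Reverse Key index, so consecutive
--    sequence generated key values no longer all go into the same last leaf block
ALTER INDEX orders_pk REBUILD REVERSE;
-- 2. Add a multi-column index matching the common WHERE clause predicates,
--    so each execution visits far fewer table data blocks
CREATE INDEX orders_cust_status_date_ix ON orders (customer_id, status, order_date);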

Suffice it to say that performance improved significantly and the users were happy after these changes. The "Avg global cache current block flush time" came down to under 5 milliseconds, the original high wait events all but disappeared (disk I/O became the top wait event), and the total Database Time went down considerably because sessions were no longer waiting so much and so were executing more efficiently.

The key lesson for me was that on a RAC database high inter-node global cache traffic can give rise to redo log buffer flushes and consequently high "log file sync" wait times. And this is primarily due to a high volume of blocks being continually shared between the nodes in the cluster. Rather than trying to directly address the log file sync event, you instead need to address the global cache buffer sharing and reduce the total number of blocks being shared somehow. In my case I was able to create a better index that enabled the execution to access fewer data blocks in total, reducing the global cache traffic volume and contention for hot blocks.

Tuesday 14 January 2014

Oracle VM on VirtualBox

I needed to "play" with Oracle VM and learn how to use it for some upcoming work involving virtual machines. Oracle VM (not VirtualBox) is virtualization software that installs directly onto a system i.e. it installs onto "bare metal" and does not need a host operating system. The problem I had was that I had no "spare" unused system on which I could install Oracle VM. I really wanted to be able to install it within a host environment as a kind of test sandpit. Which lead me to thinking about VirtualBox - Oracle's other virtualization software. VirtualBox runs inside a host operating system and creates virtual machines in which you can then install other operating systems i.e. it provides virtual machines that each look like an x86 system. Which means that in principle I could use VirtualBox to create a virtual machine, in which I would them install Oracle VM.

And indeed this does work, but there are a couple of important gotchas to be aware of:
  • I was using Oracle VM 3.2 which also needs a separate system for Oracle VM Manager to run on
    • So I created 2 virtual machines in VirtualBox
  • The Oracle VM virtual machine should be quite large e.g. 8 GB of memory
    • As you will then want to create virtual machines within this
  • The Oracle VM virtual machine will need additional disk storage which can be used for its virtual machines
    • I created the VirtualBox VM with a 5 GB disk for installing Oracle VM itself onto
    • And a second 100 GB virtual disk to use for Oracle VM virtual machines later on
  • Oracle VM and Oracle VM Manager must be on the same network as each other for communication
    • I added a second network adaptor to each virtual machine on an Internal Network (vmnet)
  • This second network adaptor must have Promiscuous Mode enabled, i.e. set to "Allow All"
    • The default of "Deny" will not allow Oracle VM to communicate successfully with Oracle VM Manager
  • VirtualBox does not emulate the Intel VT-x virtualization instructions in its virtual machines
    • So within Oracle VM you can only do Paravirtualized Machines (PVM)
    • So I used Oracle Enterprise Linux as it has suitable paravirtualized drivers in it for use in a PVM
    • You cannot create a Hardware Virtualized Machine (HVM) as this requires Intel VT-x support in the CPU used
    • This is a limitation of VirtualBox, and nothing to do with the real CPU in your physical system on which your host operating system is running
  • If you want a virtual machine running under Oracle VM to be able to network to the outside world you need to provide routing and name resolution from somewhere else on the network. There are different options for doing this, depending on what you want and how much change you can make at your network level.
    • You could use the Bridged Adaptor type of network adaptor in VirtualBox, but this means that each virtual machine is visible on your public network and its IP address and hostname must not clash with any other existing system.
    • If you only want internet access for web browsing and downloading operating system updates then you can use the Oracle VM Manager system as an intermediary. It may seem complicated but it worked for me. There are 2 parts (a sketch of the commands follows this list):
      • Configure and run some DNS (Domain Name System) software on the Oracle VM Manager system, to let it act as a DNS server to the Oracle VM virtual machines. I used "Dnsmasq", which is simple and straightforward and can forward queries on to another, upstream DNS server. Then configure the Oracle VM virtual machine to use the Oracle VM Manager system as its DNS server.
      • Configure the Oracle VM Manager system to do NAT (Network Address Translation), which involves a few steps to configure "iptables" to do packet forwarding. Then configure the Oracle VM virtual machine to use the Oracle VM Manager system as its network gateway (using "route").
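As a rough sketch of those 2 parts - the interface names and IP addresses here are examples only (eth0 being the externally facing adaptor on the Oracle VM Manager system and eth1 the Internal Network one), so substitute your own:
# On the Oracle VM Manager system - DNS forwarding with Dnsmasq
yum install dnsmasq
echo "interface=eth1"     >> /etc/dnsmasq.conf   # listen on the internal network
echo "server=192.168.1.1" >> /etc/dnsmasq.conf   # forward to an upstream DNS server
service dnsmasq start

# On the Oracle VM Manager system - NAT with iptables
echo 1 > /proc/sys/net/ipv4/ip_forward
iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
iptables -A FORWARD -i eth1 -o eth0 -j ACCEPT
iptables -A FORWARD -i eth0 -o eth1 -m state --state RELATED,ESTABLISHED -j ACCEPT

# On the Oracle VM virtual machine - use the Manager as gateway and DNS server
route add default gw 192.168.56.10               # the Manager's internal network IP
echo "nameserver 192.168.56.10" >> /etc/resolv.conf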

The most important point is probably the one about the network adaptor setting in VirtualBox. This caused me no end of problems, as I could successfully install both Oracle VM and Oracle VM Manager in the VirtualBox virtual machines, and I could get Oracle VM Manager to discover and see the Oracle VM system, which meant that the network between the two systems was definitely working. But whenever I tried to create a new virtual machine in Oracle VM and boot it, it always failed at the Oracle Enterprise Linux installation stage after entering a URL to install from. There is something special done at that point of the installation procedure that requires the network adaptors to be in promiscuous mode. After some digging I came across the "Promiscuous Mode" setting and tried its alternative value of "Allow All", and it all worked then.
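If you prefer the command line to the VirtualBox GUI, the same setting can be changed with VBoxManage while the virtual machines are powered off - the virtual machine names here are examples, and adaptor 2 is the Internal Network one described above:
# Allow promiscuous mode on the second network adaptor of each VirtualBox VM
VBoxManage modifyvm "OracleVM-Server"  --nicpromisc2 allow-all
VBoxManage modifyvm "OracleVM-Manager" --nicpromisc2 allow-all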