Wednesday, 26 March 2014

Red Gate Source Control for Oracle

The background to this - I was "given" a free license to Red Gate Source Control for Oracle some months ago as part of some marketing and awareness activity Red Gate were doing. I've been busy with other things, so I've only now got around to trying to understand what the product does and see if it can be of any benefit to me. Hence this review.

Before I start my review I want to make two things clear. The first thing is about how you treat Databases in a software development world. Databases are fundamentally different to Application Software. Whereas Application Software is normally rebuilt each time by compiling it from its source code, Databases are never rebuilt - they are always modified in place. A customer's data cannot be thrown away and replaced by a new empty database. Changes to the database must happen in place, in an incremental way. To put it another way, Application Software evolves between each release, with each release being a next generation that replaces the one before it. Databases though metamorphose between releases, with changes being applied directly to the database so that it changes in place and existing data is retained.

This means that a Database must be maintained through a series of "change scripts" that make incremental changes to the database, and these change scripts must be run in a controlled sequence when upgrading the database from one version of its associated Application Software to another version. I would expect any "Database Source Code Control" tool to help in the production of these "database change scripts", and to help manage their check in to the source code tree.

The second point is that I am only interested in source code control tools that truly track "changes" to source code files. I want to use something that knows what changed in a file, and can use this to do things like merges between different branches in a sensible way. I don't want something that just tracks the final contents of each file, and can reverse engineer what changed between two versions by working out the differences after the event. I've used Mercurial and I know that it works this way. And I'm also aware that Git works in a similar way by tracking changes to files. These are the main tools that truly track changes (there may be a few other ones as well), but it means that most other source code control tools do not track changes in the same way. Tracking changes to files is a very powerful way of doing source code control, and enables a whole bunch of advanced features around propagating changes between versions, and merging changes together.

Red Gate Source Control for Oracle

Before I tried to use the product I thought I would read up on it to understand exactly what it thought it was capable of, and how I would need to configure it to get out of it what I wanted. The first disappointment was the slightly weak product information on their web site, which was of the generic "we can do it" type without being specific about what "it" actually was that it did. But then I guess that is marketing for you.

What I did get from the product web page was that it only works with the source code control tools Subversion (SVN) or Team Foundation Server (TFS) from Microsoft. This is the first major shortcoming for this product. I'm not going to stop using Mercurial, which I find to be really good at what it does.

After that I did manage to find documentation for the product online, via the Support link on the Red Gate web site, which was good enough at describing how to use it. It seems that it will generate a set of SQL script files containing the DDL SQL statements to create the database you are using i.e. it reverse engineers from your current database the SQL that would be needed to recreate it. And having generated these SQL files it will then check them into your source code tree for you. When you change the structure of your database in any way, you can make it do a refresh and it will re-generate these SQL files, and re-upload them to your source code control tool. This means that it will update those SQL files that have changed in some way since the last source code update.

This was pretty much confirmed in a review by Lewis Cunningham, which described using Red Gate Source Control for Oracle and the series of SQL files it generated.

Which brings me to the second major shortcoming of the product - as Lewis notes "It generated create scripts rather than alters". Remember the point I made at the very start about deployed production databases must metamorphose in place, keeping the existing data inside them. This means that databases must be maintained through a series of "change scripts" that alter existing database structures to update them. We do not want just another set of "create" scripts that will create a new empty database. This is confirmed in a comment in Lewis's post by a Red Gate employee:-
I am one of the developers ... You were wondering ... how you might get the alter scripts. ... You can checkout a version of the database from SVN and Schema Compare can compare it to a live database and generate the alters you need to deploy the new version.
He is saying that Red Gate Source Code Control for Oracle will not do the "alters" for you, but Red Gate have another product called Schema Compare that will produce the "alters" for you.

Which leads on to the third shortcoming with this tool. How did my local test database get "changed" in the first place, so that Red Gate Source Control for Oracle could detect this change? If the Red Gate tool did not produce the SQL that I ran to change my local database, then it must have come from somewhere else. Most likely it came from another tool, or from SQL statements I wrote myself manually and ran against the database.

Now, if I've already got the SQL statements that "alter" the database structure, and I am sensible enough to put these into a file before running them against my local test database, then I already have the necessary "change script" that updates my database in place. And I could just check in this "change script" directly to my source code tree, and not bother with Red Gate Source Control for Oracle at all.

Summary

To summarise what Red Gate Source Control for Oracle does:
  • Red Gate Source Control for Oracle only works with SVN and TFS
  • It will not produce "alter" SQL scripts to run against a database - only "create" SQL scripts
  • You must manually or otherwise create these "alter" SQL scripts yourself to change your local database
  • It only detects changes to a database after you have already run your own "alter" SQL to change it
As a result of its limitations I didn't get around to using it because it would not work with my preferred source code control tool (Mercurial), and it would not deliver what I really needed anyway (database "change scripts").

The "holy grail" of database development tools for me would be one that helped produce the "alter" SQL statements in the first place, as a series of "change scripts". I could then test these "change scripts" locally in my test database, and if happy with the results check them into the source code tree. Unfortunately this tool from Red Gate does not do this at all, and does something completely different - only generating a complete set of "create" SQL scripts all the time, and only after you have already changed the structure of your local test database somehow.

Furthermore, I'm not sure what this tool really delivers beyond running "DBMS_METADATA.GET_DDL" (or the equivalent of it) in a loop to get the SQL DDL statements for each object in the database. I've already got my own set of scripts that do just this. The only thing Red Gate seems to add is a nice GUI in front of it all, and some integration to two source code control tools.

Tuesday, 11 March 2014

Speeding up Imports

There are a number of techniques you can use to speed up an Oracle Import, some of which I'll describe here. This is not any attempt at a comprehensive list, just some of the main techniques I have used that can really speed up some parts of an import. I've seen a standard import come down from 2.5 hours to about 0.75 hours using these techniques.

The first thing to realise is that an exported dump file is a "logical" dump of the database - it contains the SQL statements necessary to recreate all the database structures, and the data records to be loaded into each table. The import works by executing these SQL statements against the target database, thus recreating the contents of the original database that was exported. We can leverage this to our advantage in various ways.

The objects and their corresponding SQL are in a certain order in the exported dump file, and they are imported in a strict sequence by the "impdp" utility. This order is roughly the following for the main objects (it is not a complete or absolute list, as I have simplified some of the details a bit):
  • Sequences
  • Tables - Creation, but no data
  • Table Data
  • Indexes
  • Constraints - Check & Referential
  • Packages, Functions, Procedures
  • Views
  • Triggers
  • Functional and Bitmap Indexes
  • Materialized Views
When you run a full import, each of these sets of objects is done in turn, one after the other. This means that the import is not really parallelised. There is a PARALLEL option, but this only really affects the "Table Data" section, and allows multiple slave readers of the table data to run INSERT statements i.e. different tables can be loaded at the same time, but that is all. The PARALLEL option does not affect other options, such as index builds, which is a shame.

 

Its all about Disk Bandwidth

Before I get into specific techniques, I want to point out that an Oracle Import is fundamentally a disk I/O bound task. The techniques I'll be describing will be using parallelism of various forms to get more work done at the same time on the system, to reduce the total elapsed time. And this in turn will produce more disk I/O operations running at the same time. So you will ultimately be limited by the system's ability to get data off and on to the disks, rather than being limited by the amount of CPU or Memory it has in it.

Consider how each of the object types in turn is created within the database during the import:
  • The initial table data load is mainly disk I/O - read all the data records from the dump file and INSERT
  • Index creation is mainly disk I/O - read all the data in the table, sort it for the index, write the index to disk
  • Constraints are mainly disk I/O - read all the data in one table and verify value or existence in another table
  • Materialized Views involve executing the SELECT to populate the MV on disk
Yes, some CPU will be used (certainly for data load and index creation), but the CPU's must each be fed with data from disk to keep them busy. So CPU utilisation can only go up by increasing the disk bandwidth used.

In my experience it is the index creation and the referential integrity constraints that take the longest elapsed time during imports, due to their need to always read all the data records in a table.

 

Divide And Conquer

The main technique I use to speed up an import is to import each object type separately via its own "impdp" command, using the "INCLUDE" option to only do those type of objects. While this does not directly speed up the import itself, it sets the foundation for being able to do so. Now we have a series of individual import commands to import the whole database, we can then sub-divide these up within themselves.

So instead of loading data into all the tables in one command, which does them sequentially one after the other, we can run separate import commands at the same time for each of the largest tables, and run another import command for all the other tables (using the EXCLUDE option to omit the largest tables). Now we have a high level of parallelism within the table data load, which we can fine tune based on the relative sizes of our tables and the disk I/O bandwidth available on the system.

 

Parallelizing Indexes

We could use a similar technique on indexes of doing different tables at the same time, but this can have other issues to deal with e.g. an increase in the amount of temporary tablespace storage used. Also you will probably be limited by the time taken for all of the indexes on the largest table in the database, which would be built serially one after the other.

Instead what I do is use the "SQLFILE" option of impdp with the "INCLUDE=INDEX" option to get the "CREATE INDEX" statements into a single SQL file. Then I edit this file and add a "PARALLEL" clause to the end of each "CREATE INDEX". In fact, import actually puts a "PARALLEL 1" on the end of each "CREATE INDEX" already (I'm using 11.2.0.3.0), so I only need to do a global replace of this by the degree of parallelism I want. You probably want something like the same as the number of disks you have, but you can experiment to get the best value.

Once edited I can then create the indexes by running this SQL script from SQL*Plus, after all the tables have had their data loaded in. As each index is being created using parallelism, each will be maximising the resources on the system, and we can just let it build one index at a time. There may not be any benefit from trying to split the indexes into two separate SQL files and running them both at the same time.

Note that you can create and edit this index SQL script as soon as you have the dump file you want to import. You can do this while the table data is being loaded, so that the index SQL script is ready before the tables have finished loading. This way there is no need for a gap between the end of the table data being loaded and the index creation starting.

 

Skipping Constraints

As mentioned, constraints involve reading all of the data in a table to validate that it is correct against the specified constraint. If we are importing a complete database from another system, and we know for certain that all of the data satisfies all of the constraints in that database, then we can basically skip validating the constraints completely. The speed up from this is quite significant, as we completely avoid enormous amounts of disk I/O for every table in the database, but can still end up with all the constraints in place.

As for indexes I use the SQLFILE option to produce a file containing the SQL for the constraints. I then edit this file and add "NOVALIDATE" to the end of each constraint statement. This means that Oracle will create the constraint, which is what we want, but it will not validate the data i.e. it will assume that the data is valid and not explicitly check it, which is also what we want. In fact each constraint SQL statement ends in the word "ENABLE", so we can do a global replace of this by "ENABLE NOVALIDATE", which is easy.

Again, as for indexes, we can produce and edit this SQL script as soon as we have the dump file, and then run it at the appropriate point in the steps being used to import the database.

 

Export

I've discussed the import side of things so far, as that is where you often want the speed up. But you can apply similar principles to the export of the database, using divide and conquer to run multiple exports at the same time. Again this will increase the disk I/O operations occurring at the same time, but will reduce the total elapsed time of the export. Some of the techniques include the following:
  • Do an export of just the metadata i.e. just the SQL to create the objects, using "CONTENT=METADATA_ONLY"
  • Export the data from the biggest tables individually to their own dump files, using the "TABLES" option to list each table
  • Export the data from all the other tables to a dump file, using the "EXCLUDE=TABLE:" option
    • Beware that the "EXCLUDE" option has a slightly weird syntax where you need to embed quotes around the list of table names

 

Full Database Migration

If you are using export and import to migrate a production database from one system to another, then you can combine all of these techniques together to minimise the total elapsed time before the second database is up and available.
  • Once the metadata export has finished, transfer it to the target system over the network
    • Then create the empty tables, and produce and edit the SQL scripts for indexes and constraints
  • As each table data export finishes, start its transfer over the network to the target system
    • And as each data dump file arrives on the target system start its load
  • Once data load has completed, start creating the indexes, followed by the other steps
This overlaps execution of the export, transfer of the dump files over the network, and import on all of the systems at the same time.

Monday, 17 February 2014

Packaging Python With My Application

To try and keep a longer story as short as possible I needed to package up the Python run time environment to ship along with a Python based application I had written. And in this case the target platform was Windows, though the solution will also work for Linux or any other platform (however most Linux distributions will already have Python on them). I needed to ship Python itself with my application to guarantee that the application would be able to run (had everything it needed), and to avoid complications of requiring the customer to download and install Python themselves (potential issues over version compatibility).

Through a number of blog posts by other people about different "packaging" techniques (see References later) I came up with the following solution that works. This is not the only method of packaging a Python application, and indeed it is quite surprising how many different techniques there are. But this worked for me, and was what I wanted i.e. including Python itself with my Python based application. One of the neat benefits of this for me is that the whole Python run time I need is only 7.5 MB in size, and the main ZIP file of the run time environment is only 2.5 MB in size, which shows how compressable it all is.

Packaging Python with my application

First I create a directory (folder for Windows people) for my application, and put all my application's Python files in there.

Then I create a sub-directory in this to put Python itself into e.g. Python33_Win.

Into this I put the following files:
_socket.pyd
cx_Oracle.pyd
msvcr100.dll
pyexpat.pyd
python.exe
python33.dll
python33.zip
LICENSE.txt
Note that "cx_Oracle.pyd" is needed because my application makes a connection to an Oracle database to do its work. Also "msvcr100.dll" is technically a Microsoft DLL that is needed by programs written in C, which the Python interpreter is. Microsoft allows this DLL to be copied for the purpose of running such C based programs.

The "python33.zip" is something I created, and into which I put the rest of the necessary Python run time files. There are quite a lot of these, all taken from the directory where you installed Python on your own system:
  • All the ".py" files in the top level Python folder
  • The following folders including any sub-folders:-
    • collections
    • concurrent
    • ctypes
    • curses
    • dbm
    • distutils
    • email
    • encodings
    • html
    • http
    • importlib
    • logging
    • pydoc_data
    • site-packages
    • unittest
    • urllib
    • venv
    • wsgiref
    • xml
Then I wrote a wrapper script to run my application via the Python run time included. In this case it is a Windows batch script, and it exists in the folder above my application source code. My application needs two command line arguments provided e.g. user name and password.
MyAppName\Python33_Win\python MyAppName\myappname.py %1 %2
That's it, and it works.

How does it work.

Built into the Python interpreter i.e. into "python.exe", is clearly the functionality to dynamically load into itself various libraries it needs at run time. An example of these are the "*.pyd" files explicitly included in the Python directory. However, it also has the functionality to open up a ZIP file and look inside that for the libraries it needs. Thus we can take most of the Python run time environment files and put them into a PYTHON33.ZIP file, and Python will look in here to find the files it needs. Which makes packaging up Python pretty simple.

The exceptions to this are the "python33.dll" and Microsoft C DLL, and a few PYD files (which are Python Dynamic libraries, a bit like Windows DLL's). These will not be found in the ZIP file, as they seem to be needed before Python gets around to opening such a ZIP file.

Further Notes

  • These files and directories are what I needed to make my application work. The particular set needed can be different for each Python based application, depending on what modules you import into your program at run time.
  • I tried using the "modulefinder" module to report on which files were actually being referenced at run time when my application was being run. This helped reduce down the total number of files I needed to include in the ZIP file.
  • The ZIP file is named "python33.zip" because I am using Python version 3.3, and its DLL is named "python33.dll" i.e. it looks for a ZIP file with a matching name of its version.

References

The main articles I found that helped me and pointed out this way of packaging up Python were:
  • Distributing a Python Embedding Program which states that everything can be put into a ZIP file, except for some special files, and gives an example where those files are listed.
  • How to Distribute Commercial Python Applications which describes the different packaging options available for Python based applications, and concludes that "For complex scenarios, my recommendation is to go with bundling CPython yourself and steer clear of the freezers".
    • It does not however tell you how to "bundle CPython yourself" - it just states that it is possible to do it.

Thursday, 23 January 2014

RAC high block sharing causes high log sync times

Some time ago I dealt with a system that was experiencing high "log file sync" times, as well as other RAC related waits ("gc buffer busy" and "gc cr block busy"). Initially I assumed that the "log file sync" waits and the RAC Global Cache waits were independent of each other, but it turned out that they were interlinked. This post is mainly to make other people aware that on a RAC system there is a relationship between high inter-node block sharing traffic (Global Cache) and "log file sync", and that the log file sync events are a direct side effect of the high inter-node block sharing that is occurring. I'll also provide some evidence to support this in terms of AWR reports from the system.

The system was a 2 node RAC cluster running 10.2.0.4.0. An AWR report from an hour of peak day time activity showed:
Elapsed:   59.58 (mins)    
DB Time:  469.97 (mins)
So a database busy time of nearly 480 minutes in an elapsed period of nearly 60 minutes, meaning an average of almost 8 active sessions during that period. This is high for a system with only 4 CPUs. Theoretically there would only be 240 minutes of CPU capacity available on a 4 CPU node in a 60 minute period.

The top 5 wait events were:
Event                     Waits  Time(s)  Avg Wait(ms) % Total Call Time  Wait Class
CPU time                          8,450                              30.0  
log file sync            77,234   4,930         64                   17.5  Commit
gc buffer busy           45,532   4,902        108                   17.4  Cluster
db file sequential read 269,971   1,587          6                    5.6  User I/O
gc cr block busy         45,388   1,401         31                    5.0  Cluster
8,450 seconds is about 140 minutes. This is just over half of the 240 minutes of CPU capacity available on the 4 CPU node, indicating that waiting is excessive i.e. the total Database Time is made up of more waiting than actively executing SQL statements. And we can see that the other 4 top wait events add to over 12,800 seconds, which is far more than the CPU time - again more waiting than doing real work.

As I said I initially focussed on trying to tune the "log file sync" events, but this lead nowhere. However a blog post I read when Googling this combination (sorry I cannot seem to find the link now) stated that in fact the log file sync is a consequence of the "gc buffer busy" event, and that there are other clues in the RAC specific section of the AWR report. And indeed on looking in the "RAC Statistics" under "Global Cache and Enqueue Services - Workload Characteristics" I saw the following:
Avg global enqueue get time (ms):                       0.6
Avg global cache cr block receive time (ms):           11.2
Avg global cache current block receive time (ms):       7.2
Avg global cache cr block build time (ms):              0.0
Avg global cache cr block send time (ms):               0.2
Global cache log flushes for cr blocks served %:       10.8
Avg global cache cr block flush time (ms):             72.4
Avg global cache current block pin time (ms):           0.1
Avg global cache current block send time (ms):          0.2
Global cache log flushes for current blocks served %:   0.5
Avg global cache current block flush time (ms):       227.5
The stand out statistic is the last one - 227.5 milliseconds on average to flush a "global cache current block". This is way too high. I would expect an inter-node network message to be of the order of a few milliseconds, and anything involving disk writes to be of the order of tens of milliseconds. So hundreds of milliseconds is way too high.

Further digging around revealed that in some cases a Global Cache request for a block from another node may cause that node to flush its work so far to its redo log before the block is copied across to the other node i.e. a "gc buffer" request from one node may cause a "log file sync" to occur on the other node. And I found this confirmed in this Pythian Blog post on Tuning "log file sync" Event Waits
In a RAC cluster, this problem is magnified, since both CR- and CUR-mode buffers need the log flush to complete before these blocks can be transferred to another instance’s cache.
At this point I switched focus onto the causes of those Global Cache requests and the high inter-node traffic.

In the "SQL Statistics" section of the AWR report is "SQL Ordered by Cluster Wait Time", and I could see a set of SQL statements all acting on the same table and with very high Cluster Wait Times as a percentage of elapsed execution time (CWT%). The percentage cluster wait time was around 85% for all of these related SQL statements i.e. it was spending almost 6 times longer waiting than doing the actual work of the SQL statement.

All of these SQL statements acted on one table that was being both heavily inserted into, and selected from. This is kind of the classic "hot table" or "hot block" problem seen in some OLTP applications on RAC databases. This was further confirmed by the "Segments Statistics" section of the AWR report, in "Segments by Global Cache Buffer Busy". The only objects mentioned are the one table and some of the indexes on it.

In this case there were 2 suggestions I could make:
  1. Make the primary key unique index a Reverse Key index, so that new records would not be inserted into the same last index leaf block.
    • The primary key on this table was a generated sequential number, so it will always be increasing in value, and inserted at the end of a normal index.
  2. Create a new multi-column index on the table that would better satisfy most of the queries being run.
    • The current execution plan used a less selective index and had to also retrieve a number of data blocks from the table to further filter the data according to the constraints in the "WHERE" clause.
These 2 changes reduced both the number of data blocks being accessed in the table for each query execution, and reduced specific contention on the last block in the primary key index. The net effect was to reduce the total number of blocks being continually shared between the nodes in the Global Cache for these SQL statements.

Suffice it to say that performance improved significantly and the users were happy after these changes. The "Avg global cache current block flush time" came down to under 5 milliseconds, which is a significant reduction. The original high wait events all but disappeared, so that disk I/O became the top wait event, and the total Database Time went down significantly because it was no longer waiting so much and so was executing more efficiently.

The key lesson for me was that on a RAC database high inter-node global cache traffic can give rise to redo log buffer flushes and consequently high "log file sync" wait times. And this is primarily due to a high volume of blocks being continually shared between the nodes in the cluster. Rather than trying to directly address the log file sync event, you instead need to address the global cache buffer sharing and reduce the total number of blocks being shared somehow. In my case I was able to create a better index that enabled the execution to access fewer data blocks in total, reducing the global cache traffic volume and contention for hot blocks.

Tuesday, 14 January 2014

Oracle VM on VirtualBox

I needed to "play" with Oracle VM and learn how to use it for some upcoming work involving virtual machines. Oracle VM (not VirtualBox) is virtualization software that installs directly onto a system i.e. it installs onto "bare metal" and does not need a host operating system. The problem I had was that I had no "spare" unused system on which I could install Oracle VM. I really wanted to be able to install it within a host environment as a kind of test sandpit. Which lead me to thinking about VirtualBox - Oracle's other virtualization software. VirtualBox runs inside a host operating system and creates virtual machines in which you can then install other operating systems i.e. it provides virtual machines that each look like an x86 system. Which means that in principle I could use VirtualBox to create a virtual machine, in which I would them install Oracle VM.

And indeed this does work, but there are a couple of important gotchas to be aware of:
  • I was using Oracle VM 3.2 which also needs a separate system for Oracle VM Manager to run on
    • So I created 2 virtual machines in VirtualBox
  • The Oracle VM virtual machine should be quite large e.g. 8 GB of memory
    • As you will then want to create virtual machines within this
  • The Oracle VM virtual machine will need additional disk storage which can be used for its virtual machines
    • I created the VirtualBox VM with a 5 GB disk for installing Oracle VM itself onto
    • And a second 100 GB virtual disk to use for Oracle VM virtual machines later on
  • Oracle VM and Oracle VM Manager must be on the same network as each other for communication
    • I added a second network adaptor to each virtual machine on an Internal Network (vmnet)
  • This second network adaptor must have Promiscuous Mode enabled i.e set to "Allow All"
    • The default of "Deny" will not allow Oracle VM to communicate successfully with Oracle VM Manager
  • VirtualBox does not emulate the Intel VT-x virtualization instructions in its virtual machines
    • So within Oracle VM you can only do Paravirtualized Machines (PVM)
    • So I used Oracle Enterprise Linux has it has suitable paravirtualized drivers in it for use in a PVM
    • You cannot create a Hardware Virtualized Machine (HVM) as this requires the Intel VT-x in the CPU used
    • This is a limitation of VirtualBox, and nothing to do with the real CPU in your physical system on which your host operating system is running
  • If you want a virtual machine running under Oracle VM to be able to network to the outside world you need to configure something else in the network somewhere. There are different options for doing this, depending on what you want and how much change you can do at your network level.
    • You could use the Bridged Adaptor type of network adaptor in VirtualBox, but this means that each virtual machine is visible on your public network and its IP address and hostname must not clash with any other existing system.
    • If you only want internet access for web browsing and downloading operating system updates then you can use the Oracle VM Manager system as an intermediate. It may seem complicated but it worked for me. There are 2 parts:
      • Configure and run some DNS (Domain Name System) software on the Oracle VM Manager system, to let it act as a DNS server to the Oracle VM virtual machines. I used "Dnsmasq" which is simple and straightforward, and can act as a bridge out to cascade out to another DNS server. On the Oracle VM virtual machine configure it to use your Oracle VM Manager system as its DNS server.
      • Configure the Oracle VM Manager to do NAT (Network Address Translation), which involves a few steps to configure "iptables" to do packet forwarding. Then configure the Oracle VM virtual machine to use the Oracle VM Manager as a network gateway (using "route").

The most important point is probably the one about the network adaptor setting in VirtualBox. This caused me no ends of problems as I could successfully install both Oracle VM and Oracle VM Manager on the VirtualBox virtual machines, and I could get Oracle VM Manager to discover and see the Oracle VM system. Which meant that the network between the two systems was definitely working. But whenever I tried to create a new virtual machine in Oracle VM and boot it, it always failed at the Oracle Enterprise Linux installation stage after entering a URL to install from. There is something special done at that point of the installation procedure that requires the network adaptors to be in promiscuous mode. After some digging I came across this "Promiscuous Mode" and tried its alternative setting, and it all worked then.

Thursday, 21 November 2013

Basic Performance Analysis using AWR data

I've previously covered a number of queries that can be combined together to pull out key statistics from the Oracle AWR data for each snapshot it makes, which you could use to analyse the performance of a system. One example would be to use this to look at how the workload changes during the day, and to identify the period of peak activity. Another use would be to see if there were any periods experiencing significantly higher waits than at other times.

I'll now go through how you can use some of the AWR data to do an overall performance analysis of a system. But before we do that, we need to review some basics on performance tuning, just to clarify some things and avoid potential confusion.

Performance Tuning Basics

I'm of the same school of tuning as people like Cary Millsap and Jeff Holt and others - generally speaking you should be tuning specific performance problems as experienced by individual Oracle sessions, and not tuning the system as a whole. This is because any "improvements" you make to the system that do not benefit the slow sessions are completely wasted and pointless. For instance, the system may be experiencing slow disk I/O's, but it might be that the reported slow interactive response time by some users is due to very high CPU usage resulting from a poor execution plan for the particular SQL statement that session is executing. Making the disk I/O's happen quicker might "improve" things for the system as a whole, but it will have no impact at all on the CPU bound SQL statement that is the cause of slow response times on an end user's screen.

That said, there are times where system level tuning is appropriate or when you can only do system level tuning, and that is where AWR can help as it gathers system wide activity data. For session level tuning you need to look elsewhere - such as ASH (Active Session History) or using SQL Trace.

There are some big assumptions being made when we do system level tuning:
  • All sessions are executing the same or similar SQL statements
    • So a system wide "average" view across all sessions is valid
  • Any "poor performance" on the system impacts all sessions equally
    • So even one rogue SQL statement can slow down all other SQL statements being executed from other sessions
  • The system is under a high enough load that performance problems are due to interactions between the SQL statements
    • It is assumed that poorly performing individual SQL statements would have been identified earlier under initial testing
Providing these assumptions or similar are true, then you can do system level tuning.

System Level Tuning Measurements

A computer system exists to do work, and its performance is really about how long it takes to to that work and which resources it uses while doing it. Performance is an issue when a task takes too long to complete.

We need to measure both sides of this – the input work requests, and the output usage of the system resources that results. We can only do performance tuning properly when we know both of these, as this lets us see the impact of any changes we make - does resource usage go down or not for the same input workload?

Measuring the input workload is best done at business or application level transactions. And this is best achieved by instrumenting the application software itself to record both the type of transaction it is performing and the elapsed time it takes when submitted. However, very few applications if any are instrumented this way.

This leaves a database level measurement as the only viable and consistently available way of recording the input workload, and in my view the best available measurement is the number of SQL statements executed per second. While this is very imperfect as a measure of the input workload, it is the nearest we are going to get to it, and all input application transactions result in SQL statements being executed. So it does correlate with input workload.

For a measurement of workload on the system - the output resulting from the execution of the SQL statements - the best measurement is probably Average Active Sessions. AAS is simply the Database Time ("DB Time" from System Time Model) divided by elapsed time for the period over which Database Time was measured. The Database Time value is a measurement of the amount of elapsed time sessions were active executing SQL statements, and includes both active execution time (running on a CPU) and waiting time (for disk I/O or any other wait event). AAS indicates how many sessions were active at the same time executing SQL statements i.e. the level of concurrency on your system.

AAS correlates with the number of CPU's on your system - if all SQL statements executed with no waiting at all they would only use up CPU resource and your transaction rate would be limited by the number of CPU's in the system. If AAS is less than the number of CPU's in your system then you are not reaching capacity. If AAS is close to or more than the number of CPU's then you have an overloaded system and waiting of some form or another is occurring.

To make real use of the AAS value we also want to know the percentage of time spent waiting during each period i.e. total wait time as a percentage of database active time (DB time again). The Percentage Wait (which I label Wait%) indicates the efficiency of the SQL statement execution - if this waiting was eliminated then that wait time would be removed from the elapsed time of each SQL statement executed. This is only relevant when AAS is high enough. When the level of concurrency is very low you can get various anomalies, and there will always be some waiting somewhere. Again remember the assumptions I stated earlier - the system must be under a high enough load to experience performance problems due to the combined workload itself.

That's it - 3 simple measurements should be enough to tell you whether your system is overloaded or not, and whether it is performing efficiently. These can be easily extracted from AWR (see later in this post), and can graphed in your favourite spreadsheet tool to spot trends and anomalies.

Using these measurements

If you have a system wide bottleneck / capacity limit then checking on the total system resource usage during a period of time will hopefully identify any such bottlenecks. As stated, if AAS is high enough (I would say more than half of the number of CPU's), and the Wait% is also high (say 50% or more), then you have poor performance and it can be improved.

At this point I would use the expanded version of the AWR query that I have built up over the previous posts to get all of the data out of AWR for each snapshot, and again into a spreadsheet or something else. Within this AWR data is a breakdown of the Oracle Wait Time by Wait Class, which tells you which "type" of wait was consuming most of the wait time. This can either primarily be Disk (I/O), Network, Cluster, Commit (Redo), or Concurrency (locking).

Knowing the wait class you can then query AWR for SQL statement execution and sum them over the wait class identified earlier, and sort by the wait class time summed. I showed how to do this in my previous post, summing the number of disk reads per SQL statement executed in each snapshot. This way you easily get to see the top SQL by that wait type, and see how much time they spent waiting relative to the total wait time.

Equally you could just do all SQL statements by their total wait time, to see those that experienced the worst waits whatever type of wait they were. Wait time for a SQL statement would be calculated as its Elapsed time minus the CPU time.

AWR System Performance Query

Here is a query that gives you just the 3 key measurements mentioned before for a system from the AWR data. It reports all the measurements for yesterday - modify the date range constraint to report on different periods.
set feedback off
set verify off
set linesize 1000
set trimout on
set trimspool on
--
col snap_time           format a15      heading 'SNAP TIME'
col user_calls_sec      format 9,990    heading 'UCALL/S'
col aas                 format 90.0     heading 'AAS'
col wait_pct            format 990.0    heading 'WAIT%'
--
with
snaps as 
(select snap_id
      , dbid
      , end_snap_time
      , snap_interval
      , extract (second from snap_interval) 
       + (extract (minute from snap_interval) 
          + (extract (hour from snap_interval)
             + (extract (day from snap_interval) * 24)
             ) * 60
          ) * 60 snap_duration
  from (select csnaps.snap_id
             , csnaps.dbid
             , min (csnaps.end_interval_time) end_snap_time
             , min (csnaps.end_interval_time) - min (csnaps.begin_interval_time) snap_interval
          from dba_hist_snapshot csnaps
         group by csnaps.snap_id, csnaps.dbid
       )
) -- snaps
, systimes as 
-- One row per Database Time Model with change in value between snapshots
(select systime.snap_id
      , systime.dbid
      , systime.stat_name
      , sum (systime.value - psystime.value) value
         from dba_hist_sys_time_model systime, dba_hist_sys_time_model psystime
        where systime.snap_id = psystime.snap_id + 1
          and systime.dbid = psystime.dbid
          and systime.instance_number = psystime.instance_number
          and systime.stat_id = psystime.stat_id
-- Assume if stat_id the same so is the stat_name
        group by systime.snap_id, systime.dbid, systime.stat_name
) -- systimes
, sysstats as 
-- One row per System Statistic with change in value between snapshots
(select sysstat.snap_id
      , sysstat.dbid
      , sysstat.stat_name
      , sum (sysstat.value - psysstat.value) value
         from dba_hist_sysstat sysstat, dba_hist_sysstat psysstat
        where sysstat.snap_id = psysstat.snap_id + 1
          and sysstat.dbid = psysstat.dbid
          and sysstat.instance_number = psysstat.instance_number
          and sysstat.stat_id = psysstat.stat_id
-- Assume if stat_id the same so is the stat_name
        group by sysstat.snap_id, sysstat.dbid, sysstat.stat_name
) -- sysstats
, syswaits as 
-- One row for total wait time, plus break down into major wait classes, and events
(select sysevent.snap_id
      , sysevent.dbid
      , sum (sysevent.time_waited_micro - psysevent.time_waited_micro) time_waited_micro
      , sum (sysevent.total_waits - psysevent.total_waits) wait_count
   from dba_hist_system_event sysevent, dba_hist_system_event psysevent
  where sysevent.snap_id = psysevent.snap_id + 1
    and sysevent.dbid = psysevent.dbid
    and sysevent.instance_number = psysevent.instance_number
    and sysevent.event_id = psysevent.event_id
    and sysevent.wait_class != 'Idle'  -- Ignore Idle wait events
  group by sysevent.snap_id
      , sysevent.dbid
) -- syswaits
select to_char (snaps.end_snap_time, 'DD/MM/YY HH24:MI') snap_time
     , (user_calls_st.value / snaps.snap_duration)              user_calls_sec
     , (dbtime.value / 1000000)       / snaps.snap_duration     aas
     , (100 * syswaits.time_waited_micro / dbtime.value)        wait_pct
  from snaps
     join (select * from systimes where stat_name = 'DB time') dbtime
       on snaps.snap_id = dbtime.snap_id and snaps.dbid = dbtime.dbid
     join syswaits
       on snaps.snap_id = syswaits.snap_id and snaps.dbid = syswaits.dbid
     join (select * from sysstats where stat_name = 'user calls') user_calls_st
       on snaps.snap_id = user_calls_st.snap_id and snaps.dbid = user_calls_st.dbid
 where snaps.end_snap_time between 
       (trunc (sysdate) - 1) and (trunc (sysdate))
 order by snaps.end_snap_time
/
--
set feedback on
set lines 80

Wednesday, 6 November 2013

AWR Reporting #6 - SQL Statements

This is the last of the main set of posts on this topic. As for the last post, I'll try and keep this brief and post the SQL involved.

Another data set we can look at is on SQL statements i.e. statistics collected by AWR on individual SQL statements, not overall totals. AWR snapshots SQL statements from V$SQLSTAT to DBA_HIST_SQLSTAT for per SQL statement statistics.

As before we could use the technique of subtracting the values from the previous snapshot from the values of the current snapshot to get the change in value between the snapshots. Unfortunately this runs into issues as not all SQL statements may be present in both snapshots, and SQL statements can be flushed out of the library cache in the SGA and then added back in again later so their statistics have been reset in-between.

Luckily Oracle has solved this problem for us and provided a set of DELTA columns for the most important statistics giving us just what we want. Also this avoids the need to join back to the previous snapshot to calculate the change between the snapshots - the DELTA columns are already the change in value from the previous snapshot.

The main query then to extract useful statistics per SQL statement per database per snapshot is:
select sqlstat.snap_id
     , sqlstat.dbid
     , sqlstat.sql_id
     , sum (sqlstat.elapsed_time_delta)  sql_time
     , sum (sqlstat.executions_delta) sql_execs
     , sum (sqlstat.disk_reads_delta) sql_reads
  from dba_hist_sqlstat sqlstat
 group by sqlstat.snap_id
     , sqlstat.dbid
     , sqlstat.sql_id
We cannot use this query directly in our current main AWR summary query as we are getting multiple data rows per snapshot - one per SQL statement captured by that snapshot. The obvious thing to do is to sum the values over all of the SQL statements executed in that snapshot to produce one set of values per snapshot. Unfortunately this doesn't really give us anything that useful - we already have data values from AWR for these statistics across the whole system (SQL execution time, number of SQL statements executed, and number of disk reads), and when added up across all the SQL statements it doesn't help us see if there are any anomalies within the SQL statement themselves.

A better use for this query is to run it separately within another query that instead groups by SQL_ID and sums over multiple snaphots e.g. all snapshots in one day. This is a useful way of seeing which particular SQL statements put a greater load on the system than other SQL statements. "Load" could be number of executions, CPU used, disk reads, or some other measurement.

For example, one system I was looking at recently was doing a lot of disk reads and I could see that there were a number of table scans occurring. So I wanted to identify the SQL statements causing these table scans i.e. the SQL statements with the highest disk reads. By summarising over a whole day I could ensure that I was looking at the worst offenders who were executed multiple times during the day, and not a bad query only executed once.

The following query reports SQL statements captured by AWR yesterday ("sysdate - 1" truncated to midnight) sorted by total number of disk reads. The "sql_reads > 100000" is a filter so that not all SQL statements are listed, only those with a significant number of disk reads - you can increase or decrease this threshold based on how active your system is. You may want to start higher at a million and then reduce it by a factor of 10 until you get enough SQL statements listed.
with 
snaps as 
(select csnaps.snap_id
      , csnaps.dbid
      , min (csnaps.end_interval_time) end_snap_time
      , min (csnaps.end_interval_time) - min (csnaps.begin_interval_time) snap_interval
   from dba_hist_snapshot csnaps
  group by csnaps.snap_id, csnaps.dbid
) -- snaps
, sqlstats as
(
[insert previous query here]
)
select sqlstats.sql_id
     , sum (sqlstats.sql_reads)  sql_reads
     , sum (sqlstats.sql_execs)  sql_execs
     , sum (sqlstats.sql_time) / 1000000 sql_time
  from snaps
     join sqlstats
       on snaps.snap_id = sqlstats.snap_id and snaps.dbid = sqlstats.dbid
 where snaps.end_snap_time between 
       (trunc (sysdate) - 1) and (trunc (sysdate))
   and sql_reads > 100000
 group by sqlstats.sql_id
 order by sql_reads desc
/
Note that the time values are in microseconds and so must be divided by one million to output them as seconds.

A further optimization to this query is to restrict it to the main working hours, say 8am to 6pm:
   and extract (hour from snaps.end_snap_time) between 8 and 17 -- 8:00 to 17:59
This avoids issues with any overnight batch jobs you may be running or the Oracle default job to update stale statistics on database objects (which does do a lot of disk reads).

When I used this I was able to identify about 4 SQL statements responsible for about 25% of the disk reads on the system, each having a relatively low execution count - under 100 each for millions of disk reads in total. Each was clearly doing full table scans, which was validated by checking the execution plans of each - there is an AWR report supplied with Oracle that reports this information for a SQL statement across a number of snapshots (awrsqrpt.sql).

Now that I knew the top contributors to disk reads on the system I was able to investigate each SQL statement individually and work out changes to improve their execution. This was a mixture of extra indexes, SQL rewrites and other database changes.