Monday, 25 January 2016

What's Going On? Oracle Activity Monitoring Views

How do you find out what is happening now inside an Oracle database instance when you only have SQL level access to the database instance, and no nice GUI management tool? I'll cover how to start doing this with some SQL query examples.

A user is complaining that the application is running slowly, or a report someone scheduled is taking too long to finish - where do you look to find out what is going on inside the Oracle database right now? Well if you have a nice GUI based tool such as Enterprise Manager then go there first, because that will have summary screens of what is going on, and also drill downs into the details behind what is going on. Such tools are easier and quicker to use, and mean you don't have to remember the names of internal Oracle dynamic performance views.

But what if you don't have a nice GUI based tool? Or you want to work out for yourself what is really going on in Oracle? Or something has happened to the nice GUI tool and you cannot use it for one reason or another? Then you need to know about Dynamic Performance Views in Oracle, also known as the V$ tables, and which ones are the most relevant ones. The Dynamic Performance Views are views defined internal to Oracle that in fact map onto memory structures inside the SGA (System Global Area). They enable you to see in real time various data within the SGA using normal SQL queries.

When investigating performance problems you should generally narrow it down to a specific problem and the users being affected by it. In Oracle each connected user has a corresponding "session" inside the Oracle database instance, and information on each session is made available in the V$SESSION dynamic performance view.

This is both an advantage and disadvantage of the Dynamic Performance Views - they are a snapshot of current values in the SGA, and these values can be changing all of the time. Look again and you will get different results. Some views show cumulative values rather than current values, but they are always increasing over time and also constantly changing.

But with that in mind, you can use the V$SESSION view to look at what specific sessions are doing at the moment, and look for any slow, long running SQL statements. We can group the data columns into the following related groups:
  • Session Connection Information - User name, Client Application software, Logon time
  • Current Activity Information - Status, SQL statement executing
  • Delays - both normal Waits and abnormal Blocks i.e. small delays or completely blocked
    • Waits are delays that will definitely finish e.g. read a block from disk
    • Blocks are delays that are outside the control of the session and can potentially go on forever
      • e.g. blocked waiting for a row lock held by another session
For a full list of the columns in V$SESSION look in the "Oracle Database Reference" Manual under "Dynamic Performance Views", where you will get a brief description of each column.

All Active Sessions

Want a list of all active sessions and whether they are waiting or not? Try this query from SQL*Plus:
-- List the active sessions and break down key data
--
column sid              format 999999
column db_user          format a8      heading 'USER'
column db_login_time    format a6      heading 'LOGON'
column program          format a10     heading 'APP_N'
column active_for       format     999 heading 'ACTV|FOR'
column state            format a5      heading 'STATE'
column waiting_for      format     999 heading 'WAIT|FOR'
column event            format a12     heading 'WAIT ON'
column sql_id           format a13     heading 'SQL ID'
--
select s.sid, 
       nvl (s.username, 'SYS') db_user, 
       to_char (s.logon_time, 'hh24:mi') db_login_time,
       substr (s.program, 1, 10) program,
       s.last_call_et active_for,
       decode (s.state, 
                'WAITING', 'WAIT',
                'WAITED KNOWN TIME', 'NW L',
                'NW S') state,
       decode (s.wait_time, 0, seconds_in_wait, 0) waiting_for,
       substr (s.event, 1, 12) event, 
       nvl (s.sql_id, 'Not Executing') sql_id
  FROM v$session s
 WHERE s.type = 'USER'
   AND s.status = 'ACTIVE'
   AND s.wait_class != 'Idle'
/
--
Note the following about this query:
  • It restricts the sessions to only those from real users (not internal background sessions) that are active and experiencing non-idle waits.
    • Idle waits are generally waits external to Oracle, such as waiting for the next SQL statement to execute
  • The number of columns of data output is kept low to fit within an 80 column wide terminal screen
    • If you have a wider output format then you could add extra data columns to the "select"
  • The "decode" of "state" is meant to encode the following combinations of possibilities:
    • WAIT = WAITING, NW = NOT WAITING; PREV WAIT:- S = SHORT, L = LONG
    • i.e. it combines both whether the session is currently waiting or not, with whether the previous wait was a short or long wait
  • It is assumed that users have all connected today, so only the time of connection is shown
    • If connections are long lived, then change the date format used in the "to_char" to include the day
From this you can see the following:
  • How many sessions are active, and how many are currently waiting on something?
    • Sessions that are active and not waiting are therefore running on a CPU
  • How many sessions are running the same SQL statement? Look at the values of SQL_ID
  • How many sessions are experiencing the same kind of wait event?
  • Are sessions experiencing long or short waits most of the time?
 

One Session's Details

If you know the SID (Session ID) of one session you are interested in and want to drill down further, then you could run the following set of queries from within SQL*Plus:
--
set newpage none
--
set define on
undefine SESSION_ID
--
accept SESSION_ID prompt 'Enter Session ID > '
--
column sid          FORMAT 99999
column username     FORMAT a12
column logt         FORMAT a12 HEADING 'Logon Time'
column sql_start    format a9  heading 'SQL Start'
column last_call_et            heading 'Time at Status'
column module       FORMAT a16 heading 'Module'
column client_info  format a16 heading 'Client Info'
column command      format 999 heading 'cmd'
column taddr        format a8  heading 'TX Addr'
column server                  heading 'Server|Type'
column schemaname   format a12 heading 'Schema|Name'
column type                    heading 'Session|Type'
column sql_id                  heading 'Curr SQL ID'
column prev_sql_id             heading 'Prev SQL ID'
column lockwait     format a8  heading 'Wait Lock|Addr'
column event        format a25 heading 'Waiting For'
column wait_class   format a12 heading 'Wait Class'
column row_wait_obj#           heading 'Object|Waiting On'
column wait_time               heading 'Last Wait Time|(0=Waiting)'
column seconds_in_wait         heading 'Elapsed From|Last Wait'
column blocking_session        heading 'Blocking|Session ID'
column blocking_session_status heading 'Blocking|Sess Status'
--
select sid, 
       username, 
       to_char (logon_time, 'dd/mm hh24:mi') logt, 
       status,
       last_call_et,
       to_char (sql_exec_start, 'HH24:MI:SS') sql_start
from v$session
where sid = &&SESSION_ID
/
--
select sid, 
       substr (module, 1, 24) module, 
       substr (client_info, 1, 30) client_info,
       server, schemaname, type
from v$session
where sid = &&SESSION_ID
/
--
select sid, 
       sql_id, prev_sql_id, 
       event,
       substr (wait_class, 1, 20) wait_class
from v$session
where sid = &&SESSION_ID
/
--
select sid, 
       lockwait, 
       row_wait_obj#,
       wait_time, seconds_in_wait,
       blocking_session, blocking_session_status
from v$session
where sid = &&SESSION_ID
/
--
column sql_text format a62
--
select sql_id, sql_text from v$sql
where sql_id = (select sql_id from v$session where sid = &&SESSION_ID)
/
--
undefine SESSION_ID
--
set newpage 1

This gives you a lot more information from V$SESSION for the one session, broken down into related sets of data, as described before. Note that the final query selects from V$SQL which is probably the other main Dynamic Performance View, which has an entry for each SQL statement currently in the library cache in the SGA. Remember that a SQL statement in the library cache can be shared between many sessions.

One Session Detailed Drill Down

If you really want to drill down more into what a single session or a set of sessions is doing over a period of time then you could use Tanel Poder's snapper.sql. This snapshots the contents of V$SESSION and other Dynamic Performance views over a period of time and reports on what it captured. Mainly this will be the wait events experienced by the sessions, but a lot of other activity statistics are reported too.

Snapper is written in PL/SQL and uses two in-memory PL/SQL tables for each snapshot, and then lists out anything that has changed between them. It does not write anything to the database, and only needs minimal permissions - SELECT_CATALOG_ROLE and execute on DBMS_LOCK.SLEEP. Thus it is safe to run on any database.

Conclusion

I've shown how to list all the currently active sessions on a database including the SQL they are executing and what they are waiting on, and how to get more details on what just one session is doing.

If you want even more information on what a single session is doing then you can SQL Trace it, which creates a trace file on the database server of all the SQL statements executed by that session and all the waits it experienced while tracing was enabled. Nothing is missed out because every executed SQL statement from that session is traced. You can then use utilities such as "tkprof" to post process the trace file and report on the SQL statements executed, their elapsed times, and the waits they experienced.

If the slow down is not currently happening but was very recent then you can look back at the recent past using ASH (Active Session History), but note that ASH is an extra cost option - you must have the Diagnostic Pack of the Enterprise Edition of Oracle. I'll write something about using ASH in a future post.

Tuesday, 29 December 2015

GRUB, os-prober and Red Hat / Oracle Linux

I've been successfully using VirtualBox to have test environments to run Oracle Linux and Oracle Database in from some time, but there are limitations to what you can do. So I decided that I wanted to install Oracle Linux onto another disk partition on my PC so I could dual boot into it for some more advanced Oracle Database tests. Well the Oracle Linux installation itself went ahead trouble free - I just had to remember to disable GRUB bootloader installation as GRUB was already installed from my other Linux (Arch Linux) - but then I ran into some problems trying to get this newly installed Linux added properly to the main GRUB boot menu I was using. This post describes why this occurred (after much digging around on my part), and a very quick and simple solution for it.

Detecting other Linux installations on other disk partitions to add to the GRUB boot menu is done on Arch Linux by installing the "os-prober" package which adds some extra scripts used by "grub-mkconfig". The specific problem is that while "os-prober" did detect the Oracle Linux kernels, they were being added to the GRUB menu in the wrong order - it looked like an alphabetical ordering rather than a numeric ordering by kernel version number. This meant that the first Oracle Linux kernel listed in the GRUB menu was not the latest one installed, and in fact more likely to be the oldest one instead or a rescue kernel.

To cut a long story short the problem is due to a combination of the code in the "/usr/lib/linux-boot-probes/mounted/40grub2" detection script and the contents of the "/boot/grub2/grub.cfg" file in the Oracle Linux installation. The "grub.cfg" file in the Oracle Linux installation uses some keywords that are not detected by the "40grub2" script in Arch Linux, so the bootable Linux kernels are not listed in the same order as they are in the source "grub.cfg" file. Instead it is the "90fallback" script that detects the bootable Linux kernels when it is run afterwards by "os-prober". Actually it is run by "linux-boot-prober" and it does a direct listing of Linux kernel files in the "/boot" directory of the other Linux disk partition, and adds each of these to the local GRUB configuration file. And the result of this is that the other Linux kernels are detected and added in alphabetical order.

Details on the Problem

The "40grub2" script works by opening the "/boot/grub2/grub.cfg" file from another Linux installation and looking for the entries for bootable Linux kernels. The idea is that "40grub2" will find Linux kernels in the same order they are in the "grub.cfg" on the other Linux installation, and they will be added to the local "grub.cfg" file in the same order. The benefit of this method is that the first Linux kernel listed for this other installation in the main GRUB boot menu will be the same one as listed by the other Linux installation itself. Which in turn means that if it sorts the Linux kernels in any way or puts a specific Linux kernel first as the default in its "grub.cfg" configuration file, then this is also reflected in the local GRUB configuration file of my main Linux installation.

The "40grub2" script works by opening the "/boot/grub2/grub.cfg" file of the other Linux installation and then reads each line in turn looking for ones that begin "menuentry", "linux" or "initrd". I believe that these are "standard" keywords that GRUB should be using. Unfortunately Oracle Linux is using keywords of "linux16" and "initrd16" instead, which means that these lines are not matched at all by the "40grub2" script and no bootable Linux kernels are matched at all. It seems that Red Hat on which Oracle Linux is based uses these keywords for some obscure, historical reason or other. Either way, they are used and they do not match what "40grub2" is looking for.

Instead the bootable Linux kernels are detected by the "90fallback" script when it runs afterwards, and they are detected in alphabetical naming order as mentioned before.

Solutions

There is a quick, easy and good enough solution you can do yourself, and then there is a more official solution.

First, you can just manually edit your local "40grub2" file and change two lines in it. Add a "16" variation to the lines in the "case" block that test for "linux" and "initrd". Here is the output from "diff" showing the before (<) and after (>) versions of the two lines I changed.

67c67
<    linux)
---
>    linux | linux16 )
80c80
<    initrd)
---
>    initrd | initrd16 )
Once edited run "grub-mkconfig" again to regenerate your "grub.cfg" file, and it should correctly pick up those entries from the other Linux installation now.

Second, it does not look like there is actually an official solution, which can often be the case with open source software. I found some bug reports about this problem but there was some finger pointing going on both ways between the GRUB people and the Red Hat people. It looked like the GRUB people felt that the official keywords were "linux" and "initrd", so it was a Red Hat problem to solve; while the Red Hat people felt that "linux16" and "initrd16" were valid in a GRUB configuration file and did work so it was a GRUB problem with the "40grub2" script.

One person did raise the question on how the main Linux that is adding these entries to its local "grub.cfg" file should be treating these entries with the "16" suffix from the other Linux. Should it ignore them and just use the normal keywords in its own "grub.cfg" file, or should it use exactly the same keywords? The latter solution is a problem because the keywords found in the other "grub.cfg" file are NOT returned back to the "os-prober" script i.e. it is assumed they are only "linux" and "initrd". Making "40grub2" return these extra keywords as extra data fields would need a lot of changes in other places - both "40grub2" and "os-prober" at least, and possibly others too if there is a common format used for passing around information on bootable Linux kernels.

So you can see how something that looks simple can grow into something much bigger, and could have significant changes to something as important as GRUB. And GRUB is a very critical piece of software used at system boot time, so no "obvious solution" should be rushed through without a lot of extra thought and testing. Unfortunately I don't know when we will get any kind of "official solution" to this.

Monday, 28 December 2015

Oracle In Memory and CPU Execution Efficiency

Tanel Poder has been doing a series of posts on "RAM is the new disk" trying to show how the new "Oracle Database In-Memory" feature in Oracle 12c uses memory in a different and more efficient way than the normal, traditional buffer cache does. He hasn't finished the series of posts yet, but I started to draw some of my own conclusions from the data he published in his last post and I'd thought I'd publish my thoughts.

I know a reasonable amount about how CPU's work internally, so I assumed that the main performance improvement of Oracle Database In-Memory would be from a better memory access pattern that would significantly reduce the number of execution stalls inside the CPU. Tanel is one of the good Oracle bloggers, and he has been sharing the details of the tests he has been doing in this series of blog posts as well as the results from them so I can see if my hypothesis on the new Database In-Memory holds up.

Tanels' Tests

The main results are in this post where he gives the 6 SQL queries he ran and their results in terms of elapsed time and CPU execution cycles - some against the traditional buffer cache and some against the new In-Memory cache. No disk accesses were involved as all the data was already in memory in the Oracle SGA.

I'm only interested in the main 2 SQL queries that are the same as each other, except the first one gets its data from the traditional data block based buffer cache in memory while the second gets its data from the new In Memory column storage based cache i.e. both execute the same "SELECT" against the same table and all of the data needed is already in memory in the SGA.
3. FULL TABLE SCAN BUFFER CACHE (NO INMEMORY)
SELECT /*+ MONITOR FULL(c) NO_INMEMORY */ COUNT(cust_valid)  
FROM customers_nopart c WHERE cust_id > 0;

4. FULL TABLE SCAN IN MEMORY WITH WHERE cust_id > 0
SELECT /*+ MONITOR FULL(c) INMEMORY */ COUNT(cust_valid) 
FROM customers_nopart c WHERE cust_id > 0;

Tanel has also made all the low level CPU execution measurements available in a Google online spreadsheet. Here is a cut and paste of the CPU measurements for just the 2 queries I am interested in:

MetricTABLE BUFCACHETABLE INMEMORY PRED
task-clock-ms273741578
cycles864286530404573793724
instructions321154128777080326242
branches7386220210940579984
branch-misses220563974637243
stalled-cycles-frontend766970494202251325295
stalled-cycles-backend586273933951328333827
cache-references25644038411507915
cache-misses2220369817316366
LLC-loads2343611899712269
LLC-load-misses2185702947272805
LLC-stores184935821697666
LLC-store-misses323323127797
L1-dcache-loads73249460421069917316
L1-dcache-load-misses30527634185368159
L1-dcache-prefetches3689030225169253

The most obvious observation is that the In-Memory query execution is 17.34 times faster than the traditional buffer cache query execution (elapsed task-clock-ms time ratio of 27374 / 1578). I'm most interested in trying to explain what has caused this difference in elapsed time.

My Hypothesis & CPU / Memory Access

At the beginning of this post I stated that I assumed that the main performance improvement would be a result of a reduction in the number of execution stalls inside the CPU. Let me try and explain what I mean by that.

In the very old days of Intel CPU's (from the original 8086 up to the first '486 I think) the internal clock of the CPU (the rate at which it executed instructions) was the same as the external clock of the system which also governed memory access times. A read from the CPU of a memory location would take something like 3 or 4 external clock cycles (put the address on the system bus, the memory to process that to its output, put the data out on the bus to the CPU). During these clock cycles the CPU cannot proceed with the execution of that instruction i.e. it stalls until the data it needs from memory has been read into the CPU.

During the '486 lifetime Intel introduced technology that allowed the internal clock of the CPU to run faster than the external clock i.e. at a multiple of the external system clock. Now we have external clocks of 100 MHz typically and CPU internal clocks of 3 GHz and more i.e. the CPU is running internally at 30 or more times the external system clock rate.

A side effect of this change is that now when a CPU "stalls" to read needed data from memory, the stall is for much longer in terms of CPU instruction cycles. As the external clock is now 30 or more times slower than the internal clock of the CPU, the CPU may end up waiting doing nothing for over 100 execution cycles. NB. This is a simplification of what really happens, but it does describe a real effect.

We can see this effect in the separately reported statistics for "CPU cycles" and "CPU instructions completed" - the first one will be much higher than the second one. The "efficiency" of how a CPU executed a given piece of program code can be seen from the calculated "instructions per cycle" value i.e. "CPU instructions completed" divided by "CPU cycles". The closer this value is to 1 the more efficient the execution has been within the CPU. The lower this value is then the more "stalls" occurred within the CPU during execution that stopped it completing execution of an instruction every clock cycle.

Of course CPU designers have tried to work around or reduce the effect of such stalls in various ways, the details of which are not relevant here. It just means that a) there is various technology in the CPU to try and minimise the occurrence and impact of such "memory load stalls", and b) the net impact of such "memory load stalls" is much less than the 100 or so wasted execution cycles I've described. In fact, and this is also important, modern CPU's can actually issue more than one instruction at the same time for execution within the same clock cycle e.g. an integer operation and a floating point operation. This can result in an "instructions completed per clock cycle" value larger than 1, which can occur when executing well optimised program code.

If my hypothesis is correct then we should see a difference in the number of "stalls" in the CPU execution statistics between the 2 SQL query executions, and the ratio between these 2 sets of "stall" statistics should be close to the observed ratio in execution times.

My Initial Conclusions

To cut to the chase a bit, the observed difference in execution times between the 2 SQL queries is not wholly explained by just a reduction in the number of "stalls" inside the CPU, but the reduction in "stalls" is a significant contributor to this reduction in elapsed time. The reduction in elapsed time is roughly a 50:50 split between a reduction in the total number of instructions executed (code efficiency) and CPU memory access stalls (memory access efficiency).

We can see from the measurements that Tanel has reported that the number of instructions executed by the CPU decreased by a factor of 4.54 (32115412877 / 7080326242) to process the same amount of data (rows in the table). This is a significant improvement in code efficiency to process this volume of data - probably a result of the In-Memory cache using far simpler internal data structures than the traditional buffer cache that can in turn be processed much easier.

This leaves an improvement factor of 3.82 (17.34 / 4.54) that needs to be explained by something else. I propose that this other factor is due to a reduction in the number of "stalls" within the CPU when executing these instructions as described earlier. This will have the net effect of increasing the "instructions per cycle" value during the execution of the SQL query, and indeed we can see this from the measurements reported by Tanel:
  • Buffer cache SQL query execution has CPU instructions per cycle of 32115412877 / 86428653040 = 0.37
  • In-Memory cache SQL query execution has CPU instructions per cycle of 7080326242 / 4573793724 = 1.55
  • This gives an improvement ratio of 1.55 / 0.37 = 4.17
And this value is very close to the 3.82 value for the other performance improvement factor.

Trying to drill down further into this difference in "instructions per cycle" and identifying "CPU memory stalls" due to data access is difficult if not impossible for a number of reasons:
  • Above the L1 cache on the CPU the other CPU caches are shared by both instructions and data. We cannot tell which cache reads and which cache misses were due to fetching an instruction to execute or fetching a piece of data needed by an instruction being executed. Both lead to CPU stalls but for different reasons.
  • The In-Memory cache SQL query is executing a completely different code path within Oracle with almost 78% fewer instructions being executed in total (100 / 4.54). So we cannot assume that the instruction execution profile and also instruction cache misses are similar in both cases.
We can however see other evidence that this other factor for the improvement in performance is due to a reduction in CPU stalls. Tanel has reported two other CPU execution statistics - "stalled-cycles-frontend" and "stalled-cycles-backend". Tanel gives descriptions for these in his part 2 post, which I will summarise as follows:
  • Stalled cycles front end occurs before an instruction gets executed by the CPU and is generally due to fetching an instruction from memory
  • Stalled cycles back end occurs during instruction execution and is generally due to fetching data from memory needed by that instruction
So the most relevant statistic for stalls due to reading data from memory is "stalled-cycles-backend". We cannot compare these directly between the two SQL query executions because they executed a different number of instructions (remember the 4.54 reduction factor). What we can do is normalise these to a "stalled cycle per cycle" value and then compare them:
  • Buffer cache SQL query execution has backend stalls per cycle of 58627393395 / 86428653040 = 0.68
  • In-Memory cache SQL query execution has backend stalls per cycle of 1328333827 / 4573793724 = 0.29
  • This gives an improvement ratio of 0.68 / 0.29 = 2.34
Again, this is a significant improvement and shows that the number of stalls due to memory accesses by the CPU for data has been reduced.

Why Have Stalls Reduced?

I hope that I have shown that part of the improvement in the performance of the In-Memory cache has been the reduction in the number of "memory access stalls" in the CPU when executing the SQL query i.e. less time wasted waiting to read data in from main memory with a corresponding increase in the number of instructions executed per clock cycle and less total elapsed to execute those instructions. But how has Oracle actually managed to achieve this? How has it managed to reduce the number of "memory access stalls" while executing the same SQL query i.e. if it needs to read the same "set of data" to perform the same SQL query, how has it managed to read it more efficiently?

I am guessing that this is due to two specific factors:
  1. Data layout in memory - the layout is more "compact" which decreases the total amount of memory that must be read
  2. Sequential access to that memory, causing the CPU to "prefetch" the next memory line into the CPU cache, so it is already there when it is needed i.e. overlapping the processing of one memory line with the fetching of the next memory line
Note that this is based on what I have read on In-Memory as I've not yet had a chance to do any real testing of it yet. Again, thanks to Tanel for both doing some real testing and for sharing all the results he got from those tests.

In the traditional buffer cache a data block will span multiple memory lines (assuming a 64 byte memory cache line, as Tanel states) and this will contain data for all the columns in a row. When you only need the data from one column it means that you are skipping about in memory quite a lot, jumping from row to row, and not using most of the data in each block or memory line. In the new In-Memory cache the data is stored by column, so all the data for one column is stored together, next to each other in memory. This immediately reduces the total amount of memory that must be read to access the data for one column across all the rows in the table.

It also will increase the cache efficiency within the CPU, because one memory line in the cache will contain data for multiple rows i.e. the CPU can process the data for that column from multiple rows before it needs to read in another memory line into cache to get the next set of rows' data.

If the data for a column is physically laid out sequentially and contiguously in the system's memory, then it means that Oracle will be doing a series of sequential memory reads to process all the data for one column. Modern CPU's can detect this kind of sequential memory access and have an optimisation to "prefetch" the next data line from memory while the previous data line is still being processed. The net result is that stalls are either eliminated because the next line of data is already in the CPU cache or the impact of stalls is significantly reduced because the data is already on its way from memory to the CPU cache.

We can see evidence for this occurring in the "L1-dcache-prefetches" CPU statistic. The "L1 data cache" is the first on chip cache immediately used by the CPU core, and is small but very fast. All data needed by the CPU goes through this cache. Again, normalising to a "prefetch per instruction executed" we can see that the number of data prefetches by the CPU increased significantly during the In-Memory cache SQL query execution:
  • Buffer cache SQL query execution has L1 dcache prefetch per instruction executed of 36890302 / 32115412877 = 0.00115
  • In-Memory cache SQL query execution has L1 dcache prefetch per instruction executed of 25169253 / 7080326242 = 0.00355
  • This gives an improvement ratio of 0.00355 / 0.00115 = 3.09

The End

Well that's been a longer post than I wanted, but I wanted to get everything together into one post as I'm sure Tanel will finish his series of posts soon and probably make similar observations. And I wanted to see if he would come to exactly the same conclusions or slightly different ones, and doing this post was the only way to do that. Thanks again to Tanel for doing the tests and sharing all the data with us.

Monday, 14 December 2015

Python for the DBA (2) - Data Unloading

I've mentioned before that I like the Python programming language. Not only is it a "clean" language for writing programs in with a rich set of data types and structures for manipulating data, it also has a standard API for database access making it "database neutral". This makes it relatively easy to get data that is in a database into and out of your Python code where you can use the power of Python for whatever kind of data manipulation you need.

To show how easy it is to get data into a Python program I'll show a little utility to unload data from one table in a database to a flat, text file as one record per line with each data field separated by special delimiters.

Assuming the following:
  • Your Python program already has a database connection
    • This is relatively easy, but does vary by specific database driver being used
  • And it has opened the output file to unload to i.e. an open file handle and not the file name
  • And it has the name of the table, the field delimiter characters and the record terminator characters in local variables
Then you can write a simple function to call that will unload the data from that table to the specified file:
def unload_table_to_file (db_conn, table_name, out_data_file, field_separator, record_terminator) :
    # Preparation steps before doing the real work
    # Replace literals of special characters in the separator and terminator
    record_terminator = record_terminator.replace ('\\n', '\n')
    field_separator = field_separator.replace ('\\t', '\t')
    
    # Simplest SQL to get the data, assume column ordering is correct
    select_SQL = "select * from " + table_name
    
    # Now do the work - read in all records and output them
    print ("=== Unloading . . .")
    select_cursor = db_conn.cursor ()
    select_cursor.execute (select_SQL)
    for data_row in select_cursor.fetchall () :
        # Convert Python native data types to simple string representations
        output_fields = []
        for data_field in data_row :
            output_fields.append (str(data_field))
        out_data_file.write (field_separator.join (output_fields) + record_terminator)
    print ("=== Finished")
The main lines are:
  • First make sure that if any "escaped characters" are in the delimiters then they are the real escape characters
    • "\t" means the TAB character, and "\n" means the NEWLINE character
  • We just use a "select * from table" to get all the data into the Python code
    • The String concatenate operator is just the "+" operator i.e. "+" is overloaded depending on data type
  • And we execute this "select" via a Cursor created from the database connection passed in
  • The real work happens in 2 nested loops - for each data row, for each data field in each data row
    • The fields from one record are put into a list (essentially equivalent to an array)
    • This is so we can use a single action at the end to concatenate them all together with each data field value separated by the "field separator"
    • Then we "write" out this concatenated list of fields to the output file
  • The "str" function will return the String representation of a data value e.g. convert an integer to its String representation
    • This assumes that there exists a data type specific "str" method for each possible data type
  • The "join" method of a String variable iterates over a list of data, and returns one String of the list contents concatenated together with the value of the String variable between each one.
I think this shows some of the power and elegance of Python that reading all the data from a table can be done with so few lines of code, and that the code itself is so readable. And this code is flexible enough to deal with different values for the field separator characters and the record terminator characters. You could just do a classic comma separated list (',' as the field separator) or something more complicated to avoid issues where a comma may appear in the data itself e.g. "|__|" as the field separator (vertical bar, underscore, underscore, vertical bar). And this flexibility is all handled by this Python code.

Caveats

This is probably the most simple Python code to achieve the goal of reading all the data in a table and writing out to a flat, text file. There are several things it does not do at all, and other things that could be enhanced with it.
  • It is assumed that all data types being used can be cleanly converted to valid String representations using "str()"
    • This may not be true for some data types, or you may wish to handle some data types differently e.g. Dates
  • Dates should probably be handled differently, to ensure the full date and time values are output
    • You could execute an "ALTER SESSION" SQL statement to set the value of NLS_DATE_FORMAT to something suitable
    • Python has a native Date / Time data type, "datetime", which Oracle DATE columns are mapped to, and you could test for this
  • You could add extra functionality such as debugging / tracing messages (use the "logging" module) or other progress messages
    • You may wish to add extra counters to count the number of records and fields per record

Tuesday, 24 November 2015

Use Bind Variables in Application SQL for Oracle

If you don't already know, then you should almost always be using bind variables in all SQL statements used in any applications you write that run against Oracle. Bind variables are place holders within your SQL statements that get replaced by real data values at run time when the SQL statement is executed i.e. at run time the real data value is "bound" to the corresponding place holder variable in the SQL statement. So rather than using unique SQL statements with explicit data values embedded in them:
select col1, col2, col3 from mytable where col4 = 'A Value' ;
select col1, col2, col3 from mytable where col4 = 'Another Value' ;
select col1, col2, col3 from mytable where col4 = 'Yet Another Value' ;
use one common SQL statement for all executions instead, binding a different value for each execution:
select col1, col2, col3 from mytable where col4 = :b1 ;
Note that there are a couple of circumstances to be aware of where it might make sense to not use bind variables e.g. data warehouses running ad-hoc queries with histograms present on many skewed data columns. But for the vast majority of circumstances on all OLTP type Oracle databases the applications should be using bind variables.

The most commonly mentioned benefit of using bind variables is quicker SQL statement execution because an existing execution plan in memory is simply reused. There is also an associated reduction in CPU load by avoiding the need to fully parse and validate every SQL statement when executed. However, there are many other benefits that come from using bind variables, which makes the case for using them even stronger. Even if your system does not have high CPU utilisation and performance is adequate today, it is still worth exploring switching to use bind variables for the other benefits you gain.

Here are some of the other benefits, in no particular order.

Library Cache

The memory needed for your library cache where SQL statements and their execution plans are stored in the SGA is significantly reduced. Reusing an existing SQL execution plan for another execution of the same SQL statement but with different run time data values for the bind variables means that fewer SQL statements and execution plans in total need to be stored in the library cache. The main benefit is from the storage for the SQL statement itself which is now stored once only in the library cache and reused, rather than once per execution with different embedded run time data values.

Latch Contention

Latch contention internal to Oracle is reduced. Parsing a SQL statement involves actions such as allocating storage slots in the library cache for the SQL statement and the execution plan, and reading parts of the data dictionary cache while checking table and column names, etc. All of this involves latches - internal locks on Oracle data structures to stop them being used or changed by another session. If you don't need to parse and validate a SQL statement from scratch then the number of latches that get locked during the execution of each SQL statement is significantly reduced.

Better SQL Statement Execution Statistics

When a SQL statement is executed Oracle measures and maintains a number of statistics on the execution of that SQL statement, such as total number of executions, CPU time used, disk reads, etc.

When not using bind variables each SQL statement is unique in the library cache. Eventually when another unique SQL statement is executed it will force one of the other SQL statements to be aged out of the library cache to release its slot. When this happens all of the execution statistics for that SQL statement get destroyed and lost too.

When using bind variables then any existing entry for that SQL statement in the library cache is reused, and the execution statistics for this particular execution are added on to those already accumulated so far. Given that such shared SQL execution plans will stay in the library cache longer, then the execution statistics that Oracle keeps will be accumulated with each execution. And in turn this means that these execution statistics better reflect the true execution profile of all SQL statements being executed against that database.

More Useful AWR Reports

In turn these SQL statement execution statistics are captured and stored in the Automatic Workload Repository (AWR) when the regular snapshots are taken (every hour by default). When using bind variables you get more complete and accurate statistics on total SQL statement execution captured for the reasons given previously. When not using bind variables the churn of SQL statements in the library cache mean that very few are still present in the library cache when the regular AWR snapshot is taken, and their execution statistics are missing from the AWR.

In the SQL Statistics section of an AWR report the top SQL statements are displayed by their cumulative resource usage such as number of executions, CPU usage, disk reads, etc. The idea is that it is trying to help you identify the SQL statements that are having the most impact on the performance of your system. This means that when not using bind variables you are probably missing most of the SQL statement executions from this part of the AWR report because individually their resource usage is very low.

The AWR report helps you spot such an anomaly by telling you the total SQL execution count covered by the statistics being reported at the start of each table of resource usage by SQL statement:
Captured SQL account for ##.#% of Total [Statistic Name]
If the percentage is too low then it implies that there are a lot of SQL statements being executed that are not being captured and reported on by AWR. Even if it is over 50% it still means that up to 50% of the SQL workload on your system is not being captured and reported on by AWR. And you don't know whether you can ignore this or not because you don't know how many SQL statements are involved, how often they are executed, or how much resource they use while executing.

Not using bind variables is an obvious candidate for causing this "missing" set of SQL statements, but there are other possibilities too. So investigate further either way if this percentage for captured SQL is not over 80%. What is causing so much executed SQL to not be captured by AWR?

What about third party applications?

What if you don't have control over the application, and have no ability to change any of the SQL in it? Well you can still benefit from the use of bind variables by forcing Oracle to rewrite all SQL statements dynamically to use bind variables instead. This feature was introduced in Oracle 10g precisely to help with this kind of problem.

There is an initialization parameter "CURSOR_SHARING" which defaults to a value of "EXACT" i.e. an existing cursor in the library cache is only shared and reused if the text of the new SQL statement exactly matches that of the SQL statement that created that cursor originally. If bind variables are not being used, then this means that each SQL statement with an embedded data value will be different and will have a new cursor allocated for it in the library cache.

You can set "CURSOR_SHARING" to "FORCE" which causes Oracle to dynamically rewrite each SQL statement being executed and replace embedded data values by bind variables i.e. Oracle ends up executing the equivalent SQL statement with bind variables. This results in greater reuse of execution plans within the library cache and all of the benefits of using bind variables outlined so far.

Note that there was also a value of "SIMILAR" you could set it to but this was deprecated in 11.2.0.3, so you shouldn't use this setting any more.

Monday, 9 November 2015

Data Driven or Application Driven Development?

I'm the kind of person who believes that Data is really important and needs to be treated and dealt with properly. Of course we also need Applications that use that Data - one without the other is no use at all. But a battle I often have with clients now is that Data needs to be modelled and designed at the same stage as the Application is. Ignore the data model and database design and you will likely end up with a mess that will only come back to haunt you later on. (More on this in other blog posts to come).

I've just read a reasonably interesting blog post from Pythian on being Data Driven where this distinction is mentioned explicitly. At one point one of the panel mentions how some people continue to ignore the importance of Data and its design (my emphasis within this quote):
I recently I talked with someone who felt extremely behind the curve and they only recently did things like looking at data design as integral to the design phase of their system development. They had always, of course, reviewed design of the applications before implementing them, but data was always an afterthought. Applications create data, so what?

Now that they’re trying to get better at data, they had to actually explicitly make sure you got a data person in there designing the data, looking at the impact of what this application’s doing with the landscape or the information architecture they have and reviewing that before going off and actually writing a code.

That was new for them.
This is the kind of thing I have been on about for years - the Data is as important as the Application and needs to be designed properly up front. In fact, you can argue that the Data is more important in the long run. Why? Because Applications come and go but Data lives forever. Think about it - you often replace one Application suite with another, but you will almost always migrate over your existing Data i.e. you might throw away the Application software but you keep the Data itself.

And yet I continue to see software development organisations adopt Object Oriented and Agile Development methodologies wholesale and treat the Data as an "afterthought". Why, when the Data is at least as important as the Application?

Monday, 26 October 2015

JSON, Python & Trello

JSON is a simple data storage format intended to allow easy data interchange between programs and systems. Data is stored in a text format as objects of "name : value" pairs. And that's about it, other than objects can be nested (a value could be another whole object) and multiple objects can occur one after another in the data set (a list or array of such objects). This makes it slightly database like because you have records (objects) made up of named fields holding data values, with nested records inside other records for any kind of relationship or hierarchy.

The "name : value" storage structure is often termed a Dictionary where a given name is associated with its value - other terms such as Hash Map are used in other programming languages. This makes a JSON data set an almost perfect match for processing in Python which has almost directly corresponding data types of dict for the "name : value" pairs and list for multiple objects of the same type. And Python offers a standard "json" module (library) for importing and exporting such JSON data sets into and out of Python objects (the corresponding methods are called "load" and "dump" for consistency with the naming in the existing "pickle" module). Generally speaking when you load in a JSON data set it is converted to a corresponding Python dict object, that may in turn contain other embedded list and dict objects according to the structure of the JSON data being loaded in.

Trello

A real example of this in use is with Trello, where you can export your Do Lists to a local file as a JSON data set. You could then open this file and read in the JSON data set in Python, and process it one way or another. Note that you can also directly access the live data in a Trello List using a suitable URL - Google the Trello API for more on this.

Trello structures its data as follows in a hierarchy where each thing has a name and an internal, normally hidden identifier:
  • A Board is the top level structure - each Board is separate from each other
  • A Board contains multiple Lists
  • A List contains Cards - normally these are the "actions" you want to do
  • A Card can contain a few subsidiary other data items, such as a Checklist, or Labels (categories)
With the data from one Trello Board now read into a Python dict object it becomes very easy to navigate through this and pull out individual members of a list for further processing. An obvious example is to do a simple reformat of the lists for printing purposes - Trello only lets you print the whole board or an individual card, and a board can run to many, many pages when you have lots of lists and cards (items) in it given the default spacing between items.

Trello does not actually store the data in a strict hierarchy, but instead more in a relational format, where child items will contain the identifier ("id") of the parent item they are part of. Thus the top level JSON data for a Trello Board contains data of:
  • name - name of the board
  • id - identifier of this board, a long hexadecimal number
  • desc - description of the board
  • lists - all the Lists in this Board
  • cards - all the individual Cards in all the Lists
  • checklists - all the Checklists used in different Cards
i.e. most things are stored directly linked off a Board, and not stored nested within each other. This is clearly to allow easy moving of Cards between Lists - only the parent identifier within the Card needs to be updated, not the Lists involved or the Board itself.

The main data sets of lists, cards and checklists come into Python as a list (array) of multiple members, each member being a dict. Each data record for these child data sets contains an "id" field for the parent i.e. "idBoard" in a List, and "idList" in a Card (Checklists are slightly different as the Card does contain a list of the Checklist identifiers).

Example 1 - Printing the names of the lists in a board

We just need to iterate over the "lists" data set of the Board, printing out the "name" of each List. We can also check if each list is open i.e. not closed. Note that I am using Python 3, hence "print" is a true function now.
import json
import sys

NAME = "name"
ID = "id"
DESC = "desc"
CLOSED = "closed"
LISTS = "lists"

json_file = open (sys.argv [0])
trello_board = json.load (json_file)
print ("Board: " + trello_board [NAME])
print ("Lists:")
# trello_board [LISTS] is a Python list, each member being a Python dict for each List
# Loop through all the Lists i.e. one List at a time
for trello_list in trello_board [LISTS] :
    if (not trello_list [CLOSED]) :
        print (trello_list [NAME])
This assumes that the JSON file name is given as the first argument on the command line to the Python program.

Example 2 - Printing out the cards in one list

Assuming that the second command line argument is the first part of the name of a List in the Board, then we can use this to find that List and then print out just its Cards.
import json
import sys

NAME = "name"
ID = "id"
DESC = "desc"
CLOSED = "closed"
LISTS = "lists"
CARDS = "cards"
CHECKLISTS = "checklists"
IDBOARD = "idBoard"
IDLIST = "idList"
IDCHECKLISTS = "idChecklists"

json_file = open (sys.argv [0])
trello_board = json.load (json_file)
print ("Board: " + trello_board [NAME])

# Loop through all Lists in the Board, comparing its name against the input name
for trello_list in trello_board [LISTS] :
    # Only do a partial match on name on leading part of name
    if (trello_list [NAME] [:len(sys.argv [1])] == sys.argv [1] and not trello_list [CLOSED]) :
        print ("List: " + trello_list [NAME])
        # Loop through all Cards in all Lists, checking the parent ID of the Card
        for trello_card in trello_board [CARDS] :
            if (trello_card [IDLIST] == trello_list [ID] and not trello_card [CLOSED]) :
                print (trello_card [NAME])
                if (trello_card [DESC]) :
                    print (trello_card [DESC])

Note that in reality you would have extra code to check that the command line arguments were present, and better error handling for things like the file not existing.

Also I actually used "textwrap" in Python to word wrap long lines properly in the output, and indent wrapped lines for better readability - I've just used "print" directly in these examples to keep it simple.

Summary

That's it for JSON and Python using Trello as an example. You can load in a JSON data set from a file into a corresponding Python data structure using just one method call (json.method), and it is very easy to traverse that data structure finding and processing the data elements you want to using the "field name" as the index into the dictionary of "name : value" pairs in the Python data structure.