<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-7302956329008495023</id><updated>2012-02-01T03:21:49.964Z</updated><category term='install'/><category term='density'/><category term='bottleneck'/><category term='virtualbox'/><category term='introduction'/><category term='histograms'/><category term='scalability'/><category term='java'/><category term='contention'/><category term='agile database design'/><category term='holistic'/><category term='multi-threaded CPU'/><category term='optimizer'/><category term='index'/><category term='performance'/><category term='statistics'/><category term='opensolaris'/><category term='measure'/><category term='solaris'/><category term='SQL cost'/><category term='date'/><category term='10g'/><category term='queuing theory'/><category term='oracle'/><title type='text'>Databases and Performance</title><subtitle type='html'>All about Performance of Oracle on UNIX based systems.  How to measure it, analyse it, tune it, and manage it over time.</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://databaseperformance.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://databaseperformance.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>John Brady</name><uri>http://www.blogger.com/profile/08719865814813032071</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='25' height='32' src='http://2.bp.blogspot.com/_NEp2JRhM8nE/SqURShDkFoI/AAAAAAAAAAM/5edJ6ALbvKU/S220/JB1.jpg'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>35</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-7302956329008495023.post-6846926513043424157</id><published>2011-09-20T19:53:00.002+01:00</published><updated>2011-09-20T20:24:42.440+01:00</updated><title type='text'>Blowing my own trumpet - #1</title><content type='html'>On the one hand I don't want to boast, but on the other hand there are a couple of things I am proud of at the moment and thought I would take the opportunity to share.  If only to record them for posterity.&lt;br /&gt;&lt;br /&gt;First, I am presenting at the &lt;a href="http://www.ukoug.org/home/"&gt;UK Oracle User Group&lt;/a&gt; &lt;a href="http://www.ukoug.org/events/5544-unix-sig/"&gt;UNIX SIG&lt;/a&gt; in a few days time (Thu 22 Sept) on &lt;a href="http://databaseperformance.blogspot.com/2009/01/queuing-theory-resource-utilisation.html"&gt;Queueing Theory&lt;/a&gt;.  I thought I would volunteer for something in the spirit of giving back, and was pleasantly surprised when they accepted my submission.  Nothing revolutionary in the presentation itself, but its nice to know that I have something useful to offer others.&lt;br /&gt;&lt;br /&gt;Second, &lt;a href="http://richardfoote.wordpress.com/2011/09/14/big-tables-sorts-and-indexes-quiz-candidate/#comment-33423"&gt;I answered a question&lt;/a&gt; posed on &lt;a href="http://richardfoote.wordpress.com/"&gt;Richard Foote's blog&lt;/a&gt; the other day and was again pleasantly surprised when he said I got the answer right.  "Spot on" was Richard's reply.  Which was a nice surprise.  I expected to be close but I also expected to have missed something out somewhere.  That's the danger when you quickly reply to a question posted on the web, which becomes obvious when someone else points your mistake later.  But this time I got it right.&lt;br /&gt;&lt;br /&gt;Back to normal posts later,&lt;br /&gt;John&lt;br /&gt;&lt;br /&gt;&lt;div id="ui-datepicker-div" style="display: none;"&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7302956329008495023-6846926513043424157?l=databaseperformance.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://databaseperformance.blogspot.com/feeds/6846926513043424157/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7302956329008495023&amp;postID=6846926513043424157' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/6846926513043424157'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/6846926513043424157'/><link rel='alternate' type='text/html' href='http://databaseperformance.blogspot.com/2011/09/blowing-my-own-trumpet-1.html' title='Blowing my own trumpet - #1'/><author><name>John Brady</name><uri>http://www.blogger.com/profile/08719865814813032071</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='25' height='32' src='http://2.bp.blogspot.com/_NEp2JRhM8nE/SqURShDkFoI/AAAAAAAAAAM/5edJ6ALbvKU/S220/JB1.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7302956329008495023.post-332927366891029380</id><published>2011-07-19T19:08:00.002+01:00</published><updated>2011-07-20T12:44:43.238+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='performance'/><category scheme='http://www.blogger.com/atom/ns#' term='multi-threaded CPU'/><category scheme='http://www.blogger.com/atom/ns#' term='scalability'/><title type='text'>Beware: CPU Threads are not the same as CPU Cores #1</title><content type='html'>Craig Shallhamer recently did a number of posts about reporting &lt;a href="http://shallahamer-orapub.blogspot.com/2011/04/core-vs-threadcpu-utilization-part-1.html"&gt;CPU usage&lt;/a&gt; from the &lt;a href="http://shallahamer-orapub.blogspot.com/2011/05/cores-vs-threads-util-differencespart-2.html"&gt;data Oracle was gathering&lt;/a&gt;, and in this he made a distinction between CPU Cores and CPU Threads.  His focus was on the correlation between the values Oracle was reporting and those reported by other operating system tools - were they the same and could you directly use the values reported by Oracle? (His conclusion was that they were the same).  What he didn't go into was the difference between CPU Cores and CPU Threads, and what the "real" CPU capacity of each type is.  I think this is an important issue for anyone dealing with performance on modern computer systems, as the two types of "multi-CPU" are not the same, and they exhibit radically different scalability behaviour.  And what compounds this is that most operating systems do not distinguish between the two types of CPU, and so will misreport (or lie if you want to) how much CPU capacity you have and how much is being used.&lt;br /&gt;&lt;br /&gt;The purpose of this post is to try and bring out this distinction between the two types of "multi-CPU" in a single processor chip, and show that multi-threaded CPUs do not behave as expected, leading to unexpected and negative scalability results under increased workloads.  The wrong time to discover that the reported "spare, unused CPU capacity" in your system is wrong is when you experience an increase in your transaction volume, and everything just runs slower rather maintaining the same response times.&lt;br /&gt;&lt;br /&gt;Some definitions:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;A CPU Core or Execution Core is a physical piece of silicon in the CPU chip that actually executes instructions on data values.&lt;/li&gt;&lt;li&gt;A CPU Thread is a virtualized thing that in reality sits on top of an underlying CPU Core.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;The key difference is that because of the virtualization, multiple CPU Threads will share a single CPU core.  Thus I might have 2 CPU Threads but only 1 CPU Core.  If I have 2 CPU Cores then they can execute 2 instructions simultaneously, independently and in parallel.  I have 2 times the capacity of a single CPU core.  If I have 2 CPU Threads sharing a single CPU Core, then it can still only execute 1 instruction at a time.  This is true regardless of the number of Threads being virtualized on the same CPU core - there is only one CPU core and it can only execute one instruction at a time.  The virtualization layer may let it switch between the different Threads very often, so that they all have some of their instructions executed, but it is always only executing one instruction at a time.&lt;br /&gt;&lt;br /&gt;If I have 2 CPU Cores I can get twice the work done in a period of time.  But if I have 2 CPU Threads sharing the same physical CPU Core then I can get no more work done in a period of time than if there were only 1 CPU Thread.  And I will show this in a moment.&lt;br /&gt;&lt;br /&gt;Most operating systems though don't distinguish between these two types of multi-CPU flavours.  They report CPU Cores and CPU Threads as if they each had an equal capacity to do work.  Which means that most operating systems are over reporting the available CPU capacity if the underlying CPUs use CPU Threads on shared CPU Cores.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;font-size:130%;" &gt;&lt;span style="font-family:times new roman;"&gt;Tests&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;A solid example.  I have a laptop with an Intel Core i3 M350 CPU in it.  This has 2 physical CPU Cores, and on each of these it has 2 CPU Threads.  This gives a total of 4 exposed CPUs, but on only 2 CPU cores.  The laptop is running Linux (XUbuntu) and I have installed SAR via the sysstat package.  /proc/cpuinfo shows 4 CPUs (processors 0 to 3), and clearly this is used by SAR in calculating CPU capacity and CPU utilisation.&lt;br /&gt;&lt;br /&gt;I have a simple CPU test using the &lt;span style="font-weight: bold; font-style: italic;"&gt;factor &lt;/span&gt;utility, which prints out the prime factors of any number.  Any CPU only test will do, and &lt;span style="font-weight: bold; font-style: italic;"&gt;factor&lt;/span&gt; is a CPU bound utility.  Feed it a list of very big numbers and it will use up some CPU testing prime factors.  By using the same numbers I can repeat this any time I want to.  How will this behave when I run more instances of this test at the same time?  The system says that it can see 4 CPUs.  But I know that the Core i3 has only 2 physical CPU Cores to execute instructions on.&lt;br /&gt;&lt;br /&gt;The only fly in the ointment is that because this laptop is running X Windows and I am typing in my results to it there is a small constant CPU overhead of around 5%.  So the system is not 100% idle when I run these tests.  I mention this to be as honest and open as I can be about the tests I did.&lt;br /&gt;&lt;br /&gt;Running &lt;span style="font-weight: bold; font-style: italic;"&gt;factor &lt;/span&gt;once against a list of large numbers (see elsewhere) and using "&lt;span style="font-weight: bold; font-style: italic;"&gt;time&lt;/span&gt;" to measure its elapsed time, takes 7.779 seconds and produces the following SAR output:&lt;br /&gt;&lt;pre class="source-code"&gt;&lt;code&gt;12:49:44        CPU     %user     %nice   %system   %iowait    %steal     %idle&lt;br /&gt;12:49:46        all      0.74      0.00      0.74      0.00      0.00     98.51&lt;br /&gt;12:49:48        all      1.74      0.00      0.62      0.00      0.00     97.64&lt;br /&gt;12:49:50        all     10.88      0.00      1.64      0.00      0.00     87.49&lt;br /&gt;12:49:52        all     27.66      0.00      2.78      0.00      0.00     69.57&lt;br /&gt;12:49:54        all     27.64      0.00      2.67      0.48      0.00     69.21&lt;br /&gt;12:49:56        all     27.79      0.00      2.55      0.73      0.00     68.93&lt;br /&gt;12:49:58        all     15.12      0.00      2.03      0.00      0.00     82.85&lt;br /&gt;12:50:00        all      3.29      0.00      3.06      0.94      0.00     92.71&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;As expected, because Linux believes that there are 4 CPUs in the system and I have run one CPU bound task it reports that it is using 25% of the CPU capacity.  "&lt;span style="font-weight: bold; font-style: italic;"&gt;time&lt;/span&gt;" reported that 7.440 seconds of user CPU time was used. System CPU time is always negligible in these tests.&lt;br /&gt;&lt;br /&gt;When I run 2 of these &lt;span style="font-weight: bold; font-style: italic;"&gt;factor &lt;/span&gt;tests at the same time they take 8.034 seconds - about the same elapsed time allowing for other factors affecting this measurement.&lt;br /&gt;&lt;pre class="source-code"&gt;&lt;code&gt;12:52:16        CPU     %user     %nice   %system   %iowait    %steal     %idle&lt;br /&gt;12:52:18        all      2.09      0.00      1.72      0.00      0.00     96.19&lt;br /&gt;12:52:20        all      0.61      0.00      0.49      0.00      0.00     98.90&lt;br /&gt;12:52:22        all     25.73      0.00      3.88      0.85      0.00     69.54&lt;br /&gt;12:52:24        all     52.32      0.00      3.58      0.72      0.00     43.38&lt;br /&gt;12:52:26        all     51.49      0.00      3.93      0.00      0.00     44.58&lt;br /&gt;12:52:28        all     51.57      0.00      3.13      0.96      0.00     44.34&lt;br /&gt;12:52:30        all     22.18      0.00      1.72      0.00      0.00     76.10&lt;br /&gt;12:52:32        all      1.25      0.00      0.25      1.50      0.00     97.01&lt;br /&gt;12:52:34        all      1.10      0.00      0.74      0.00      0.00     98.16&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;As expected, SAR is reporting 50% CPU utilization - double our 25% before - and still around 45% CPU capacity unused (idle).&lt;br /&gt;&lt;br /&gt;Note also that "&lt;span style="font-weight: bold; font-style: italic;"&gt;time&lt;/span&gt;" reports that 15.465 seconds of user CPU time was used by these 2 factor tests.  This ratio of 2:1 between CPU used and elapsed time shows that the 2 &lt;span style="font-weight: bold; font-style: italic;"&gt;factor&lt;/span&gt;s were indeed running simultaneously in parallel.&lt;br /&gt;&lt;br /&gt;Next I run 4 of the factor tests together.  How long will they take - still 8 seconds?  And what will SAR report?&lt;br /&gt;&lt;br /&gt;In fact they took 13.555 seconds, with &lt;span style="font-weight: bold; font-style: italic;"&gt;time &lt;/span&gt;reporting 47.991 user CPU seconds.  SAR reported 100% utilisation:&lt;br /&gt;&lt;pre class="source-code"&gt;&lt;code&gt;&lt;br /&gt;12:56:40        CPU     %user     %nice   %system   %iowait    %steal     %idle&lt;br /&gt;12:56:42        all      0.70      0.00      0.58      0.00      0.00     98.72&lt;br /&gt;12:56:44        all      0.12      0.00      0.74      1.74      0.00     97.39&lt;br /&gt;12:56:46        all     22.10      0.00      1.85      0.12      0.00     75.93&lt;br /&gt;12:56:48        all     96.62      0.00      3.38      0.00      0.00      0.00&lt;br /&gt;12:56:50        all     96.88      0.00      3.12      0.00      0.00      0.00&lt;br /&gt;12:56:52        all     96.62      0.00      3.38      0.00      0.00      0.00&lt;br /&gt;12:56:54        all     97.00      0.00      3.00      0.00      0.00      0.00&lt;br /&gt;12:56:56        all     96.00      0.00      4.00      0.00      0.00      0.00&lt;br /&gt;12:56:58        all     96.88      0.00      3.12      0.00      0.00      0.00&lt;br /&gt;12:57:00        all     26.37      0.00      1.74      0.87      0.00     71.02&lt;br /&gt;12:57:02        all      0.25      0.00      0.62      0.99      0.00     98.14&lt;br /&gt;12:57:04        all      0.62      0.00      0.86      0.00      0.00     98.52&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;Now, if there were 4 real CPU cores I would not expect the elapsed time to increase from about 8 to 13.5 seconds (over 50% increase).  Furthermore, something is definitely wrong with the reported CPU usage - 47.991 seconds (nearly 48) for 4 compared to 15.465 (15.5) for 2 &lt;span style="font-style: italic;"&gt;&lt;span style="font-weight: bold;"&gt;factors&lt;/span&gt;&lt;/span&gt;. I'll come back to this in another post (the operating system is actually measuring allocation of CPU time, not real usage of CPU cycles).  The fact that one &lt;span style="font-weight: bold; font-style: italic;"&gt;factor&lt;/span&gt; used 7.440 seconds of CPU means that four &lt;span style="font-weight: bold; font-style: italic;"&gt;factors&lt;/span&gt; should not use more than about 30 seconds (7.5 * 4).  So 48 seconds is way off the mark.&lt;br /&gt;&lt;br /&gt;For completeness 3 factors together take 10.946 seconds, with 29.738 CPU seconds, and SAR said about 75% utilisation (75% for factor + 5% for other background processes):&lt;br /&gt;&lt;pre class="source-code"&gt;&lt;code&gt;&lt;br /&gt;13:00:24        CPU     %user     %nice   %system   %iowait    %steal     %idle&lt;br /&gt;13:00:26        all      1.24      0.00      0.99      0.12      0.00     97.65&lt;br /&gt;13:00:28        all      1.70      0.00      0.73      0.85      0.00     96.72&lt;br /&gt;13:00:30        all     21.98      0.00      2.56      0.00      0.00     75.47&lt;br /&gt;13:00:32        all     79.15      0.00      4.49      0.62      0.00     15.73&lt;br /&gt;13:00:34        all     79.72      0.00      3.50      0.00      0.00     16.77&lt;br /&gt;13:00:36        all     79.75      0.00      3.88      0.00      0.00     16.38&lt;br /&gt;13:00:38        all     74.63      0.00      2.74      0.00      0.00     22.64&lt;br /&gt;13:00:40        all     52.11      0.00      3.49      1.32      0.00     43.08&lt;br /&gt;13:00:42        all     12.07      0.00      1.52      0.00      0.00     86.40&lt;br /&gt;13:00:44        all      2.31      0.00      0.85      1.33      0.00     95.51&lt;br /&gt;13:00:46        all      1.43      0.00      0.95      0.00      0.00     97.62&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;From this I draw the following conclusions:&lt;br /&gt;&lt;br /&gt;First, as I described,  CPU Threads are virtual and the  real measure of available CPU capacity is CPU cores.  That is the "real  CPU capacity available", regardless of the number of threads on top of  the physical CPU cores.&lt;br /&gt;&lt;br /&gt;This is shown by the increase in elapsed  time when running more than 2 &lt;span style="font-weight: bold; font-style: italic;"&gt;factors&lt;/span&gt; together.  If there really were  50% CPU capacity unused and idle then there should be no significant  increase in elapsed time for 3 or 4 &lt;span style="font-weight: bold; font-style: italic;"&gt;factors &lt;/span&gt;together compared to only 1 or 2.&lt;br /&gt;&lt;br /&gt;Interestingly  8 &lt;span style="font-weight: bold; font-style: italic;"&gt;factors &lt;/span&gt;took 26.790 seconds and "&lt;span style="font-weight: bold; font-style: italic;"&gt;time&lt;/span&gt;" reported 1 minute 37.194 user  CPU seconds (97.194 seconds).  Both almost perfectly double the 4 &lt;span style="font-weight: bold; font-style: italic;"&gt; factors &lt;/span&gt;test results, because the CPUs are fully saturated.&lt;br /&gt;&lt;br /&gt;Second, the operating system and hence SAR is lying when it says that there is 50% CPU capacity unused and available when I ran 2 &lt;span style="font-weight: bold; font-style: italic;"&gt;factors &lt;/span&gt;together.  A "CPU Thread" is unfortunately not a real CPU, but is rather sharing a real CPU with other "CPU Threads".&lt;br /&gt;&lt;br /&gt;Third, this is also important to Oracle based systems.  A "perfect" or "ideal" Oracle system is both well tuned and running a well designed and written application. It will minimise disk I/Os because they are the slowest operation, and it will scale well as both workload and system capacity are increased.  This means that in reality such an Oracle system is CPU bound - transactions are limited by how fast the CPUs are, and not by how fast the disks are.  Adding more transactions by increasing the workload will require more CPU capacity to maintain current response times.&lt;br /&gt;&lt;br /&gt;Using CPUs that expose multiple CPU Threads on shared CPU execution cores will result in poor scalability under increasing workloads, and a significant increase in response times when the used CPU capacity reported equals the number of physical CPU cores.  This is exactly what happened in my tests - up to 50% reported CPU utilisation the elapsed times of &lt;span style="font-weight: bold; font-style: italic;"&gt;factor &lt;/span&gt;remained constant, but beyond 50% reported CPU utilisation the elapsed times increased in proportion to the number of concurrent factor's running.&lt;br /&gt;&lt;br /&gt;Personally I would either avoid all such multi-threaded CPUs, or would switch off all the extra Threads leaving only one CPU Thread per CPU Core.  That way I know that the operating system is reporting "real" CPU capacity and utilisation, and that there is no hidden knee in the response time as I increase the CPU usage under increasing workloads.&lt;br /&gt;&lt;br /&gt;&lt;hr /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;font-size:130%;" &gt;Source Code&lt;/span&gt;&lt;br /&gt;The &lt;span style="font-weight: bold; font-style: italic;"&gt;factor&lt;/span&gt; test is just running &lt;span style="font-weight: bold; font-style: italic;"&gt;factor&lt;/span&gt; with input redirected from a list of very large prime numbers:&lt;br /&gt;&lt;pre class="source-code"&gt;&lt;code&gt;factor &amp;lt; factors &lt;/code&gt;&lt;/pre&gt;I can run this and &lt;span style="font-weight: bold; font-style: italic;"&gt;time&lt;/span&gt; how long it takes.  I run multiple factors by putting the same command into a file and running that.  Each factor runs at the same time (&amp;amp; runs it as a separate, child process).&lt;br /&gt;&lt;pre class="source-code"&gt;&lt;code&gt;&lt;br /&gt;#!/bin/sh&lt;br /&gt;#&lt;br /&gt;factor &amp;lt; factors &amp;amp;&lt;br /&gt;factor &amp;lt; factors &amp;amp;&lt;br /&gt;#&lt;br /&gt;wait &lt;/code&gt;&lt;/pre&gt;The list of factors is the same set of large prime numbers repeated 10 times in the file:&lt;br /&gt;&lt;pre class="source-code"&gt;&lt;code&gt;999999999989&lt;br /&gt;99999999999973&lt;br /&gt;99999999999971&lt;br /&gt;99999999999959&lt;br /&gt;99999999999931&lt;br /&gt;99999999999929&lt;br /&gt;99999999999923&lt;br /&gt;99999999999853&lt;br /&gt;99999999999829&lt;br /&gt;99999999999821&lt;br /&gt;99999999999797&lt;br /&gt;99999999999791&lt;br /&gt;99999999999701&lt;br /&gt;99999999999673&lt;br /&gt;99999999999503&lt;br /&gt;99999999999481&lt;br /&gt;99999999999469&lt;br /&gt;99999999999467&lt;br /&gt;99999999999463&lt;br /&gt;99999999999457&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;&lt;div id="ui-datepicker-div" style="display: none;"&gt;&lt;/div&gt;&lt;div id="ui-datepicker-div" style="display: none;"&gt;&lt;/div&gt;&lt;div id="ui-datepicker-div" style="display: none;"&gt;&lt;/div&gt;&lt;div id="ui-datepicker-div" style="display: none;"&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7302956329008495023-332927366891029380?l=databaseperformance.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://databaseperformance.blogspot.com/feeds/332927366891029380/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7302956329008495023&amp;postID=332927366891029380' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/332927366891029380'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/332927366891029380'/><link rel='alternate' type='text/html' href='http://databaseperformance.blogspot.com/2011/07/beware-cpu-threads-are-not-same-as-cpu.html' title='Beware: CPU Threads are not the same as CPU Cores #1'/><author><name>John Brady</name><uri>http://www.blogger.com/profile/08719865814813032071</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='25' height='32' src='http://2.bp.blogspot.com/_NEp2JRhM8nE/SqURShDkFoI/AAAAAAAAAAM/5edJ6ALbvKU/S220/JB1.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7302956329008495023.post-4996462647107704705</id><published>2011-06-28T20:23:00.000+01:00</published><updated>2011-06-28T20:23:23.721+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='histograms'/><category scheme='http://www.blogger.com/atom/ns#' term='density'/><category scheme='http://www.blogger.com/atom/ns#' term='optimizer'/><title type='text'>Histograms are not what I assumed</title><content type='html'>One of the themes I'll keep coming back to is how things are often not how you have assumed they are, and how there are always opportunities to learn something new about Oracle.  I am quite familiar with histograms in Oracle from reading the manuals and white papers, but have always left it to Oracle to decide what columns histograms were needed.  This is mainly due to the databases I have worked on not being large or complicated enough to need further analysis so far.  But knowing the day would come when I would have to do something I thought I would review what I knew, and in doing so learnt a few new things - or rather realised that some of my assumptions had been incorrect.&lt;br /&gt;&lt;br /&gt;Rather than re-read the manual I decided to go to a better source of knowledge rather than just raw information - Jonathan Lewis's &lt;a href="http://www.amazon.co.uk/Cost-Based-Oracle-Fundamentals-Experts-Voice/dp/1590596366/ref=wl_it_dp_o?ie=UTF8&amp;amp;coliid=I29DQ177L8QLLH&amp;amp;colid=CGRKE4OKZ11N"&gt;Cost Based Oracle Fundamentals&lt;/a&gt;.  I already knew that Oracle had 2 types of histogram - Frequency and Height Balanced.  And I knew that Frequency stored a row count per individual data value, while a Height Balanced stores a row count per range of values (strictly the row count is the same per range, and it is the data range size that varies).  And I knew that the Optimizer in Oracle used this extra row count data in the histogram to calculate better estimates for row counts that would match constraints in queries.  In reading the chapter on Histograms and working through the examples and doing my own tests I realised that some of my assumptions had been wrong about how Oracle uses these histograms.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Assumption 1&lt;/span&gt; - Oracle uses all the entries in a histogram to calculate its row count estimate&lt;br /&gt;&lt;br /&gt;I knew this to be true for a Frequency Histogram, and had assumed something similar for a Height Balanced one.  But I was wrong.  In a height balanced histogram although Oracle may have up to 254 buckets, it only uses the ones that contain the same end point values to identify "popular values" - those that occur more frequently than the width of a single bucket, and so span more than one bucket in size.  For all other values it ignores the histogram completely and uses the column level &lt;span style="font-style: italic;"&gt;Density&lt;/span&gt; statistic in the same manner as if there were no histogram.&lt;br /&gt;&lt;br /&gt;In other words there is a threshold, being the number of rows covered by a single bucket.  For data values that occur more often than this and are also recorded in two or more buckets in a height balanced histogram, the Optimizer uses the number of buckets to calculate the estimated row count, along with the number of rows per bucket.  For all other values Oracle assumes a uniform distribution and uses the column level &lt;span style="font-style: italic;"&gt;Density&lt;/span&gt; statistic.  This will not be the same as one over the Number of Distinct Values (NDV), but is calculated differently to remove the effect of the rows for those popular values i.e. the &lt;span style="font-style: italic;"&gt;Density&lt;/span&gt; is lowered to the average of the "unpopular" values.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Assumption 2&lt;/span&gt; - A Frequency Histogram has an entry for all data values occurring in the table&lt;br /&gt;&lt;br /&gt;This is true if either the table is relatively small, or you force Oracle to read all the data rows when gathering statistics.  If you leave Oracle to use its default sampling method and you have a large table then some values may not be sampled, and they will be missing from the Frequency Histogram produced.&lt;br /&gt;&lt;br /&gt;What Oracle does is to calculate and store a &lt;span style="font-style: italic;"&gt;Density&lt;/span&gt; value for the column that is half that of the least frequent occurring value in the histogram.  Values that appear in query constraints that do not appear in the Frequency Histogram are therefore assumed to occur at half the row count of the value with the smallest row count in the histogram.  So again, the Optimizer may end up not using a histogram that exists and instead use the &lt;span style="font-style: italic;"&gt;Density&lt;/span&gt; statistic of the column when executing a particular query.&lt;br /&gt;&lt;br /&gt;None of this is new, and I was able to double check this via various other sources.  It was just another thing to add to the list of assumptions I've made in the past that turn out not to be true.  Even with histograms in place there is a reasonable chance that the Optimizer will actually not be using the histogram itself but instead the &lt;span style="font-style: italic;"&gt;Density&lt;/span&gt; statistic value of the column.  And also changing the value of the &lt;span style="font-style: italic;"&gt;Density&lt;/span&gt; for the column can have an impact on queries, even when there is a histogram on that column.&lt;br /&gt;&lt;br /&gt;&lt;div id="ui-datepicker-div" style="display: none;"&gt;&lt;/div&gt;&lt;div id="ui-datepicker-div" style="display: none;"&gt;&lt;/div&gt;&lt;div id="ui-datepicker-div" style="display: none;"&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7302956329008495023-4996462647107704705?l=databaseperformance.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://databaseperformance.blogspot.com/feeds/4996462647107704705/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7302956329008495023&amp;postID=4996462647107704705' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/4996462647107704705'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/4996462647107704705'/><link rel='alternate' type='text/html' href='http://databaseperformance.blogspot.com/2011/06/histograms-are-not-what-i-assumed.html' title='Histograms are not what I assumed'/><author><name>John Brady</name><uri>http://www.blogger.com/profile/08719865814813032071</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='25' height='32' src='http://2.bp.blogspot.com/_NEp2JRhM8nE/SqURShDkFoI/AAAAAAAAAAM/5edJ6ALbvKU/S220/JB1.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7302956329008495023.post-155930667182738555</id><published>2011-06-16T19:42:00.003+01:00</published><updated>2011-06-16T19:59:56.175+01:00</updated><title type='text'>Hinting - Lessons Learnt</title><content type='html'>Following on from my &lt;a href="http://databaseperformance.blogspot.com/2011/06/customers-who-wont-wait-to-be-helped.html"&gt;previous post about some SQL that needed to be tuned&lt;/a&gt;, I thought I'd summarise some additional important lessons I've learnt about using hints.  I already knew quite a lot about hints in Oracle  from reading different sources, and to avoid them as much as possible.  In spite of this I have still learnt a few new things about using hints in practise.  I know that others have said most of this before, but it bears repeating because it really does affect whether any hints do work or not.&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Name all your query blocks using the qb_name hint.  This clarifies things both in terms of how Oracle itself reports back your execution plan, and in terms of how you specify your hints.&lt;/li&gt;&lt;li&gt;Qualify all object references with query block names e.g. t1@mainq.  Again, this is how Oracle reports object references, and it is precise in terms of where the hint should be applied.&lt;/li&gt;&lt;li&gt;Check your hints are being used with the 'outline' option to dbms_xplan.display_cursor.  If the hints being listed are the same ones that you used, then all is well.  If not, then it is likely that some of your hints are actually being ignored.&lt;/li&gt;&lt;li&gt;Test the hints individually.  This follows on from the previous point about proving the hint is recognised and honoured by Oracle.  It is possible that a similar hint is being automatically produced by the Optimizer as a result of another hint you are using.  In my case it looks like a USE_NL hint was being ignored, but a Nested Loop was produced anyway because of an INDEX hint.&lt;/li&gt;&lt;li&gt;Include all other relevant hints, such as LEADING and UNNEST.  Previously I would have assumed that these would be produced by the Optimzer automatically but &lt;a href="http://jonathanlewis.wordpress.com/2011/06/08/how-to-hint-1/"&gt;Jonathan Lewis includes them in his hint sets&lt;/a&gt; so they must be relevant.&lt;br /&gt;&lt;/li&gt;&lt;/ol&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7302956329008495023-155930667182738555?l=databaseperformance.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://databaseperformance.blogspot.com/feeds/155930667182738555/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7302956329008495023&amp;postID=155930667182738555' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/155930667182738555'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/155930667182738555'/><link rel='alternate' type='text/html' href='http://databaseperformance.blogspot.com/2011/06/hinting-lessons-learnt.html' title='Hinting - Lessons Learnt'/><author><name>John Brady</name><uri>http://www.blogger.com/profile/08719865814813032071</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='25' height='32' src='http://2.bp.blogspot.com/_NEp2JRhM8nE/SqURShDkFoI/AAAAAAAAAAM/5edJ6ALbvKU/S220/JB1.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7302956329008495023.post-8117353096906191383</id><published>2011-06-09T19:51:00.002+01:00</published><updated>2011-06-09T20:24:04.288+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='performance'/><title type='text'>Customers who won't wait to be helped</title><content type='html'>[&lt;span style="font-style: italic;"&gt;I'm back.  I haven't posted for a long while due to a combination of being busy and somewhat losing focus on the blogging side, but I hope to post more frequently in the future.  I was going to start with a post on histograms in Oracle but ...&lt;/span&gt;]&lt;br /&gt;&lt;br /&gt;I work for a software house, whose heavyweight application runs on top of Oracle.  Earlier this week I was dealing with a customer who had poor performance on just one SQL query after having updated their statistics.  I spent a day analysing the SQL and the data - what was it trying to do and why - and another day trying different solutions to make it use a different execution plan and run faster for them.  I then sent the results to the support people dealing with this customer so that the solution could be implemented.  Imagine my surprise then when I see today that Jonathan Lewis has a &lt;a href="http://jonathanlewis.wordpress.com/2011/06/08/how-to-hint-1/"&gt;blog post about the same SQL statement&lt;/a&gt;, and makes reference to &lt;a href="http://forums.oracle.com/forums/thread.jspa?threadID=2234783&amp;amp;tstart=0"&gt;a post on one of the Oracle Technical Forums&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;I'm surprised that a customer would post SQL from a third party application on the web like that, and that they would somehow expect a better answer to come back than from the software vendor themselves.  I was even more surprised because they were asking for a set of hints to change the execution plan, and I had just come up with such a set of hints just 2 days earlier.  Okay, maybe this had not been forwarded to the customer yet, because our Support department was doing further testing on it first.  So it is highly likely that the customer had not seen my solution yet.  But I'm still amazed that a customer would be willing to put into production some recommendation they had got off a web forum, and on the basis that they could not work out a better solution themselves - the poster did not know how to do hints properly.&lt;br /&gt;&lt;br /&gt;In terms of the problem itself, the cause of the poor performance is actually the extreme skew within the data in this table.  On average most of the date values occur thousands of times, but the particular value used in the query only occurs about ten times in the table.  Hence another execution plan is faster for this particular value.&lt;br /&gt;&lt;br /&gt;Which brings me back to histograms, which I was planning on doing a completely different post on anyway.  There is a histogram on the constrained column, but it is a height balanced histogram because there are over 254 distinct values, and so there are no statistics on row counts of individual values other than the most popular values.  The value used in the query is unpopular, and the average number of rows per value across all the unpopular values is in the tens of thousands.  Hence the choice of an execution plan using a HASH JOIN, as that scales in a different way to higher row counts than a NESTED LOOP join does.&lt;br /&gt;&lt;br /&gt;Maybe Oracle should introduce a new type of histogram?  One that records both the most popular values, and the least popular values, and then stores a single average for all the values in between (the density).  That would have helped here.  It certainly seems the case that Oracle does not handle heavily skewed data sets well, though of course you should never expect a "one size fits all" solution to be perfect for all possible scenarios.  What the Optimizer does is try and produce an execution plan that is the best for most of the possible input values.  Which is what it did here, based on the statistics available for the average number of rows per date value via the density statistic.&lt;br /&gt;&lt;br /&gt;Another viable solution is to make the Optimizer believe that the date constraint will match far fewer rows and so choose the NESTED LOOP join method itself.  This is the approach put forward by Wolfgang Breitling in his paper on &lt;a href="http://www.centrexcc.com/Tuning%20by%20Cardinality%20Feedback.pdf"&gt;Tuning by Cardinality Feedback&lt;/a&gt;, in which he suggests changing the Density statistic stored for the column.  And indeed reducing the density value by a factor of 100 has this effect.  The upside of this approach is that it avoids the need to change the SQL or use hints, which do not always work as intended.&lt;br /&gt;&lt;br /&gt;&lt;div id="ui-datepicker-div" style="display: none;"&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7302956329008495023-8117353096906191383?l=databaseperformance.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://databaseperformance.blogspot.com/feeds/8117353096906191383/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7302956329008495023&amp;postID=8117353096906191383' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/8117353096906191383'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/8117353096906191383'/><link rel='alternate' type='text/html' href='http://databaseperformance.blogspot.com/2011/06/customers-who-wont-wait-to-be-helped.html' title='Customers who won&apos;t wait to be helped'/><author><name>John Brady</name><uri>http://www.blogger.com/profile/08719865814813032071</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='25' height='32' src='http://2.bp.blogspot.com/_NEp2JRhM8nE/SqURShDkFoI/AAAAAAAAAAM/5edJ6ALbvKU/S220/JB1.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7302956329008495023.post-8014081655153637671</id><published>2011-03-07T19:44:00.003Z</published><updated>2011-03-07T20:20:21.282Z</updated><title type='text'>I hate Agile Development</title><content type='html'>I really hate, loath and detest "Agile Development".  At least I hate how they use it where I work.  For the record I actually agree with most of the principles and points of Agile Development.  I'm pretty sure I have said so before.  But what they do in the name of Agile Development where I work is really terrible.&lt;br /&gt;&lt;br /&gt;The list of "crimes" conducted by the development teams against databases and development in general by their "use" of Agile methods continues to get longer and longer.  Lately I have had to deal with a number of such crimes and their consequences.&lt;br /&gt;&lt;ul&gt;&lt;li&gt;One group went ahead with development against nothing more than a strawman design, that in turn was for only verbally stated requirements.  No one could agree in a meeting I attended what the actual requirements were.  The number of times someone said "I assumed he meant ..." was incredible.&lt;/li&gt;&lt;li&gt;Another group designed and implemented their own tables, but now have no time to have their database design reviewed before the release deadline.  It "must" be delivered as it is, because it is too late to change anything.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Another group simply did not bother to document anything.  When asked at  how they arrived at their final solution and the analysis they went  through, they could not explain or justify anything.&lt;/li&gt;&lt;li&gt;All of the groups have used the phrase "we left that for another sprint" for some major, critical piece of functionality.  They are simply ignoring anything that is too difficult.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Iteration speed is more important than getting it right or meeting all the requirements.  The fact that the speed of iteration is introducing more errors that need to be corrected by future sprints never gets flagged by anyone.  I am sure the quality of what is being delivered is going down, but everyone else is happy that we now have some more buttons to click on the user facing screens and forms. &lt;br /&gt;&lt;/li&gt;&lt;li&gt;Design and code reviews are simply ignored. The only measurement seems to be the volume of exposed and  visible functionality delivered.  Whether it works or not is never  really considered.  By that time the developers have moved onto the next sprint, and the next piece of functionality to be delivered.  The push is for additional functionality added to the application software at all costs.&lt;/li&gt;&lt;li&gt;Many application level requirements are simply ignored in early development sprints, just to deliver some usable functionality as early as possible.  The fact that the initial design and development cannot be extended in the future to meet the ignored requirements is itself simply ignored.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;I really want to like Agile, but the things done in the name of Agile are truly horrible.    The attitude of simply ignoring critical dependencies and then  expecting the rest of the world to reorganise itself around their  particular problem and solve it for them is amazing.  And if you don't put everything down immediately and jump , you are accussed of being the blocker who is holding up development and stopping delivery of the wonderful software they have all developed.  The fact that it is seriously crippled and will never work properly is somehow never their problem.  And the most common answer to my any question I ask about the development they did is "we never had time to do that (so we didn't)".&lt;br /&gt;&lt;br /&gt;I now term Agile the "ignore it today, it will be someone else's problem tomorrow" development methodology.   I really do think that many of the developer's believe in a world of magic fairies or pixies that will sprinkle magic Agile dust everywhere in the middle of the whole development, and all of their ignored issues will just disappear and go away.  Unfortunately, everyone else is living and dealing with the real world.  Have you ever tried living in a house built and delivered one room at a time, all with separately laid foundations, and installed plumbing, electrics, windows and doors?  It is a mess, and it gets worse over time as more and more rooms are added on, because there was never an overall design, and no one ever bothered to think about future requirements until they had to deliver them.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7302956329008495023-8014081655153637671?l=databaseperformance.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://databaseperformance.blogspot.com/feeds/8014081655153637671/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7302956329008495023&amp;postID=8014081655153637671' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/8014081655153637671'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/8014081655153637671'/><link rel='alternate' type='text/html' href='http://databaseperformance.blogspot.com/2011/03/i-hate-agile-development.html' title='I hate Agile Development'/><author><name>John Brady</name><uri>http://www.blogger.com/profile/08719865814813032071</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='25' height='32' src='http://2.bp.blogspot.com/_NEp2JRhM8nE/SqURShDkFoI/AAAAAAAAAAM/5edJ6ALbvKU/S220/JB1.jpg'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7302956329008495023.post-5338000120519378414</id><published>2010-05-25T19:52:00.003+01:00</published><updated>2010-05-25T20:08:43.614+01:00</updated><title type='text'>Date Columns at the end of Indexes</title><content type='html'>A generally good rule of thumb is that date columns should appear at the end of a multi-column index and not at the start or the middle.  Why?  Because generally date columns are used in range comparisons in queries a lot more often than equality comparisons.  The effect of putting such a date column in the middle of an index is to distribute values of the following columns across the different dates within the index.&lt;br /&gt;&lt;br /&gt;What this means is that if you have an index on 3 columns - A, D, Z say - where D is a date and A and Z are not (either numeric or a character string) and you execute a query constraining on all 3 columns but with a range constraint on D, the database server will have to read and check a great many index entries&lt;br /&gt;&lt;pre class="source-code"&gt;&lt;code&gt;&lt;br /&gt;select ... from T where A = 123 and D &gt; to_date ('20090101', 'YYYYMMDD') and Z = 456 ;&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;An index is a tree like structure organized by the columns in it.  So at the highest levels we branch out for different values of A.  Then lower down, for each value of A it branches out for each value of D within that particular value of A.  And underneath that we have each value of Z that occurs for each value of D.  The value 456 of Z may only occur a few times in the whole table, but it could occur on any value date of D, and so could be in different parts of the index.&lt;br /&gt;&lt;br /&gt;This means that when executing the query, the database server will traverse the index, find the branch for A with the value 123 and then all the values for D underneath it.  Due to the date range constraint, it will have to check many different values of D to see which ones also have Z with a value of 456.&lt;br /&gt;&lt;br /&gt;If the index was on A, Z, D instead, then fewer index entries would need to be checked when executing the query.  First it would traverse the index to 123 for A.  Then it would traverse down the next level to 456 for Z.  Then it would traverse down the next level to D and all values greater than the specified date (1/1/2009).  At this point &lt;span style="font-weight: bold;"&gt;all&lt;/span&gt; such entries in the index match the 3 constraints, and so can be retrieved.&lt;br /&gt;&lt;br /&gt;This alternate index leads to far fewer index entries being read and checked when executing this particular query.&lt;br /&gt;&lt;br /&gt;I am not saying that dates should always go at the end of indexes.  If Z was not constrained in the query, only A and D, then the second index would not be much use, and the first index would be much better.  There may also be cases where dates are used with equality constraints rather than ranges.  But the general rule is still a good one.  Put date columns at the end of an index, &lt;span style="font-style: italic;"&gt;unless&lt;/span&gt; you know the queries being executed would benefit from that date column appearing earlier in the index.  As ever the best indexes are those that are most useful to the queries you execute and cover the referenced columns.&lt;br /&gt;&lt;br /&gt;An example of this I have just come across resulted in a query execution time coming down from 10,000 ms (10 seconds) by an order of magnitude to 20 ms - a major improvement I think you'll agree.  To me this just shows the greater "efficiency" of the index - far fewer index blocks need to be visited to satisfy the query, and those fewer blocks are more likely to be cached in memory too.  Hence the orders of magnitude reduction in the elapsed time of the query.&lt;br /&gt;&lt;br /&gt;Furthermore this query is executed in a loop some 200+ times within the application I was looking at.  So the net effect is not just 10 seconds once down to less than 1 second once, but really 2,000 seconds or 33 minutes down to less than 33 seconds across the overall job each day it is run.  A worthwhile improvement from having an alternate index with the date column at the end of it.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7302956329008495023-5338000120519378414?l=databaseperformance.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://databaseperformance.blogspot.com/feeds/5338000120519378414/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7302956329008495023&amp;postID=5338000120519378414' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/5338000120519378414'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/5338000120519378414'/><link rel='alternate' type='text/html' href='http://databaseperformance.blogspot.com/2010/05/date-columns-at-end-of-indexes.html' title='Date Columns at the end of Indexes'/><author><name>John Brady</name><uri>http://www.blogger.com/profile/08719865814813032071</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='25' height='32' src='http://2.bp.blogspot.com/_NEp2JRhM8nE/SqURShDkFoI/AAAAAAAAAAM/5edJ6ALbvKU/S220/JB1.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7302956329008495023.post-1020296655325931462</id><published>2010-03-29T18:47:00.002+01:00</published><updated>2010-03-29T18:52:45.251+01:00</updated><title type='text'>DTrace - Great if you can get it to work at all</title><content type='html'>On the face of it I think &lt;a href="http://www.sun.com/bigadmin/content/dtrace/index.jsp"&gt;DTrace&lt;/a&gt; is a phenomenal piece of technology from Sun.  That on the fly you can turn on instrumentation on all kernel events and system calls, and write your own scripts to count calls, sum elapsed time, or look at which files had the most I/Os to them, and much more.  All implemented with no external change to Solaris itself - all existing commands and utilities run the same - and minimal impact on performance when not enabled.  Truly amazing.&lt;br /&gt;&lt;br /&gt;For the record I should confess that I was a Sun employee when DTrace was initially announced.  However I was an Oracle Consultant and had nothing to do with Solaris kernel development or DTrace.  My viewpoint was and remains that of a user of Solaris, and I did not gain any inside knowledge about DTrace one way or another.&lt;br /&gt;&lt;br /&gt;I did see the demonstrations that are commonly repeated, where they start off a rogue process and then use DTrace scripts to see what is going on, and which process is causing all the extra work on the system.  It looks so simple and so direct - a few iterations and you have identified the process in question and got some of the details of what it is doing.&lt;br /&gt;&lt;br /&gt;I have been waiting since then to use DTrace in anger on some real life investigations.  It has taken some time for Solaris 10 to be fully rolled out by most organizations. And that opportunity just occurred the other day.  A chance to use DTrace to see what an application was doing, and in which functions it was spending most of its time.&lt;br /&gt;&lt;br /&gt;A quick peruse of the Web finds the DTrace Toolkit, in which I find something called &lt;span style="font-style:italic;"&gt;dappprof.d&lt;/span&gt; which does something similar to what I want - elapsed time per function call.  I write my own D script for what I want, using similar probes and timestamp calculations, and am ready to look inside this application.  Problem solved?  No!&lt;br /&gt;&lt;br /&gt;Unfortunately, DTrace does not work!  Or at least that is what seems to be happening when I try and use it.  Which is why I am writing this blog post. Sorry to be so negative, but either it does what it is supposed to do or it does not.  And unfortunately it simply does not do what it is supposed to.  And I've just spent 2 days banging my head against a brick wall trying to get DTrace to do what it should be doing.  The good news is that there is an "answer" to the problem I had, which I'll get to later on.  But this could all have been avoided if some of the "details" of DTrace had been done properly by Sun - like useful and helpful error messages, and configuration options that made sense.&lt;br /&gt;&lt;br /&gt;First, unlike other software products which tell you WHY something went wrong so that you can fix it, DTrace simply says "I'm not going to do that" and stops.  Shades of HAL from 2001: A Space Odyssey spring to mind.  Generally whenever something goes wrong with Oracle or a programming language compiler I get a specific message - "syntax error at line xxx", "missing keyword expected", "out of memory".  With DTrace you get a message that amounts to "I did not like that so I am stopping" one way or another. &lt;br /&gt;&lt;br /&gt;My first error message was:&lt;br /&gt;&lt;pre class="source-code"&gt;&lt;code&gt;dtrace: processing aborted: Abort due to systemic unresponsiveness&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;I managed to find a &lt;a href="http://blogs.sun.com/jonh/entry/the_dtrace_deadman_mechanism"&gt;blog entry by Jon Haslam of Sun&lt;/a&gt;, that says that there is nothing you can do about this, other than disabling certain protections within DTrace.  And generally speaking those protections are there for a reason, so it is not recommended to disable them.&lt;br /&gt;&lt;br /&gt;Okay, maybe I was sampling too often, or matched too many probes, or something.  As I say, DTrace never tells you WHAT caused the failure.  So I cut down my script and instead of 20,000 probes I now match less than 100.  Just 33 in fact. Does it run properly?  No!  Still the same problem:&lt;br /&gt;&lt;pre class="source-code"&gt;&lt;code&gt;dtrace: script './gpsmintrace.d' matched 33 probes&lt;br /&gt;dtrace: processing aborted: Abort due to systemic unresponsiveness&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;The script only has 4 probes as far as I am concerned, even though DTrace ends up matching more than this:&lt;br /&gt;&lt;pre class="source-code"&gt;&lt;code&gt;pid$target:a.out::entry&lt;br /&gt;pid$target:a.out::return&lt;br /&gt;profile:::tick-5sec&lt;br /&gt;dtrace:::END&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;So the entry and exit points of functions in the program file itself (a.out), every 5 seconds, and finally at the end of the script when it stops.  How can these be causing "systemic unresponsiveness?".&lt;br /&gt;&lt;br /&gt;I try different combinations - a different process, take out the tick-5sec probe, name the functions to match so only "11 probes" matched - and always the same result:&lt;br /&gt;&lt;pre class="source-code"&gt;&lt;code&gt;dtrace: processing aborted: Abort due to systemic unresponsiveness&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;At this point the only conclusion I can come to is that DTrace is useless.  What else can I conclude?  Every time I run my minimal D script it sits there with no output messages and then aborts.  Why do I not see the output from the 5 second ticks?  Why does it never work?&lt;br /&gt;&lt;br /&gt;As you can imagine I am just about ready to give up on DTrace - forever.  I have been through the trouble of designing and writing an initial D script to help investigate my application behaviour, and have requested the appropriate privileges from the system administrators - dtrace_user and dtrace_proc - but have achieved nothing from 2 days of continuous trying.  After 2 days I have got zero information out of DTrace, useful or not.  &lt;br /&gt;&lt;br /&gt;I'm ready to throw in the towel and give up on DTrace as another piece of great but fundamentally flawed Sun technology when I have one more thought.  Why are the 5 second tick probes never firing?  Why do I never see any output from them?  What if the tick probes never fired?  Wouldn't my running totals in the aggregations get very large?  Could this be causing some kind of overflow and DTrace to simply give up?&lt;br /&gt;&lt;br /&gt;A quick Google on the terms "dtrace tick probe privilege" turns up an entry at http://www.mail-archive.com/dtrace-discuss@opensolaris.org/msg02621.html with the same symptoms - the tick probe never fires.  And the solution is confirmed in the reply http://www.mail-archive.com/dtrace-discuss@opensolaris.org/msg02622.html - you need the "dtrace_kernel" privilege for the tick probes to fire and output anything.  So I make another request to the system administrators for this privilege, and voila! My D script now works as it should do.&lt;br /&gt;&lt;br /&gt;So it seems that to do anything useful with DTrace you will need the maximum privilege level of dtrace_kernel, regardless of what the manual tells you differently.  Yes, to "use" the tick probe of the profile provider in a D script you need the "dtrace_user" privilege.  However, such a defined tick probe will never fire unless you have the "dtrace_kernel" privilege.  The "dtrace_user" privilege gives you the "right" to use a tick probe, but not the "access" to that probe at run time.  You need the "dtrace_kernel" privilege too for the run time "access" to the tick probe.&lt;br /&gt;&lt;br /&gt;I've a feeling that this is extremely poorly designed and documented by Sun.  It is not documented in the standard DTrace manuals - I know I looked very closely at them on how to make the tick probe work.  So it must be a common problem that people run into time and time again on production systems when trying to work with the least necessary privileges.  Yet my Google search only found one explicit reference to this, on a forum posting by someone with the same problem.  All the other hits on "dtrace tick probe privilege" were standard descriptions of using probes in DTrace.  Nothing from Sun themselves on their own web sites.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7302956329008495023-1020296655325931462?l=databaseperformance.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://databaseperformance.blogspot.com/feeds/1020296655325931462/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7302956329008495023&amp;postID=1020296655325931462' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/1020296655325931462'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/1020296655325931462'/><link rel='alternate' type='text/html' href='http://databaseperformance.blogspot.com/2010/03/dtrace-great-if-you-can-get-it-to-work.html' title='DTrace - Great if you can get it to work at all'/><author><name>John Brady</name><uri>http://www.blogger.com/profile/08719865814813032071</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='25' height='32' src='http://2.bp.blogspot.com/_NEp2JRhM8nE/SqURShDkFoI/AAAAAAAAAAM/5edJ6ALbvKU/S220/JB1.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7302956329008495023.post-8744698508050088727</id><published>2010-03-10T19:17:00.002Z</published><updated>2010-03-10T19:28:16.659Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='install'/><category scheme='http://www.blogger.com/atom/ns#' term='10g'/><category scheme='http://www.blogger.com/atom/ns#' term='opensolaris'/><category scheme='http://www.blogger.com/atom/ns#' term='virtualbox'/><category scheme='http://www.blogger.com/atom/ns#' term='solaris'/><category scheme='http://www.blogger.com/atom/ns#' term='oracle'/><title type='text'>OpenSolaris is not Solaris</title><content type='html'>I've been playing with an installation of &lt;a href="http://www.opensolaris.org"&gt;OpenSolaris&lt;/a&gt; on my Intel x86 PC within a VirtualBox virtual machine. (VirtualBox is good, but I'll say more about it separately).  The good news is that I managed to install OpenSolaris (2009.06) easily enough and get logged into it for normal use.  The bad news is that OpenSolaris is not Solaris, which introduces a whole bunch of incompatability issues.  And given that I have done this in order to install and test Oracle, it creates a whole series of problems that would not occur if it was really Solaris.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;What is OpenSolaris?&lt;/span&gt;&lt;br /&gt;The simplistic answer is that OpenSolaris is an open sourced subset of Solaris i.e. the subset that Sun owned the rights to and could open source.  But that is not the real truth.  The real truth is that OpenSolaris is the open sourced next generation version of Solaris i.e. Solaris 11.  And indeed the version number of OpenSolaris is 11 (or 5.11 technically speaking).  And Sun have changed a lot of administration side things between Solaris 10 and 11.  And they are still changing things, as OpenSolaris is a work in progress by Sun.  A lot can change even between releases of OpenSolaris.  So remember that OpenSolaris is not Solaris.  There is no mention yet of any level of compatability between Solaris 10 and OpenSolaris.&lt;br /&gt;&lt;br /&gt;Clearly OpenSolaris shares the same kernel core as Solaris 10.  But outside of that, in terms of the various applications that can be installed, things change.  A lot.  Solaris has always had its own "packages" installed via the "pkgadd" command.  However it lacked a GUI (at least I was not aware of any before now), and did not handle dependencies between packages (A uses things from B, so B should be installed first).  You also had to obtain the package distribution file yourself somehow - download or CD media - before it could be installed.&lt;br /&gt;&lt;br /&gt;The major Linux distributions have all addressed these issues, so that installing most application software is just a case of firing up the GUI utility, clicking on the application name and then the "Install" button.  OpenSolaris has now tried to do the same thing, via its Image Packaging System (IPS), which seems to share the same concepts as Synaptic in Ubuntu.  There are repositories of packages, and the packages include lists of dependencies on other packages.  Now you can install a package via a few simple clicks, and the package distribution is downloaded from the repository.&lt;br /&gt;&lt;br /&gt;Nice.  Except ... That there is only one repository, being Sun's own repository of its packages included as part of Solaris.  So there is no third party software to choose from.  And Sun continues to name all its packages with a very short name and a prefix of "SUNW".  Would you know that "SUNWarc" were the "Lint Libraries"?  Or that the "Solaris Bundled tools" were in package "SUNWsprot", but that "SUNWbtool" was in fact the "CCS tools bundled with SunOS".  Not clear or straightforward at least.&lt;br /&gt;&lt;br /&gt;Which leads to the second problem.  Even if you know that there is a package containing a program or utility you need, finding out what the name of the package is can be quite hard work.  There is not a lot of useful information on OpenSolaris yet that you can find easily via Google.  And Sun has provided very little documentation.&lt;br /&gt;&lt;br /&gt;Okay, I'll just use the existing Solaris 10 packages?  No!  OpenSolaris is not Solaris, and even though you could try to install a Solaris 10 package, and it might work, it is not supported and the results are not guaranteed.  This is because OpenSolaris is really Solaris 11, and things can change in the kernel and interface libraries.  So you should only use OpenSolaris packages, and not Solaris 10 ones.&lt;br /&gt;&lt;br /&gt;Are there any other repositories? Of any shape or form?  Yes.  There is one at Blastwave.  Can I use its packages?  Yes, but they are very old.  Clearly someone did port various Open Source to earlier releases of OpenSolaris, and these have been gathered together at Blastwave.  But when I went looking for some things, I could only find 3 year old versions from 2007.  In the end I downloaded the source code to the current version of the utility I needed and compiled it all myself.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Oracle 10gR2 Issues&lt;/span&gt;&lt;br /&gt;First there is the issue of required packages, and I've already discussed the new IPS on OpenSolaris.  For Oracle 10gR2, as per the Oracle 10gR2 on Solaris installation notes, you will need to install:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;    SUNWarc SUNWbtool SUNWhea SUNWlibm SUNWlibms SUNWsprot SUNWtoo&lt;/li&gt;&lt;/ul&gt;but not:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;    SUNWi1of SUNWi1cs SUNWi15cs SUNWxwfnt&lt;/li&gt;&lt;/ul&gt;This is because these are closed source font libraries, and Sun cannot open source them or include them in OpenSolaris under its license terms.  But these will not cause an issue, as other fonts will be automatically used instead.&lt;br /&gt;&lt;br /&gt;You also need to install SUNWmfrun, which is the Motif Runtime libraries.  Yes, the Oracle 10gR2 graphical utilities use Motif behind the scenes.  And Motif is no longer installed as standard on OpenSolaris.&lt;br /&gt;&lt;br /&gt;Then can we install Oracle 10gR2?  No, because the Oracle installer sees the operating system version as 11 (or rather 5.11 as given by "uname -r") and stops because it is not certified against it.  Which is not surprising, as you can only certify against officially released operating systems, and OpenSolaris is still a work in progress.  But we will assume that OpenSolaris (11) is fully compatible with Solaris 10, and use the "-IgnoreSysPrereqs" to make Oracle ignore this mismatch in the operating system version.&lt;br /&gt;&lt;br /&gt;Then the installer will fail with missing Java libraries e.g.&lt;br /&gt;&lt;pre class="source-code"&gt;&lt;code&gt;Exception java.lang.UnsatisfiedLinkError:&lt;br /&gt;/tmp/OraInstall&amp;lt;...&amp;gt;/jre/1.4.2/lib/i386/motif21/libmawt.so: &lt;br /&gt;ld.so.1: java: fatal: libdps.so.5: open failed: No such file or directory occurred..&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;This is because the Oracle installer is using Java 1.4.2 which is shipped as part of the Oracle 10gR2 distribution itself, but OpenSolaris does not have the right bits for backwards compatability with this, hence the missing library message.  Again we can use an installer command line option to tell it to use the current Java Runtime Environment on OpenSolaris (1.6), which will not be missing any bits it needs:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;./runInstaller -IgnoreSysPrereqs -jreLoc /usr/java/jre&lt;/li&gt;&lt;/ul&gt;And voila, an Oracle installer window eventually appears and we can proceed with the installation.&lt;br /&gt;&lt;br /&gt;Well, again no.  There were various errors about certain files not existing.  It seems that the installer cannot create symbolic links between files.  Using an Oracle installation on another system I was able to work out what these files should be linked to, and manually did this at the end of the installation. &lt;br /&gt;&lt;br /&gt;And finally, Yes, I was able to start Oracle and create a database.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7302956329008495023-8744698508050088727?l=databaseperformance.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://databaseperformance.blogspot.com/feeds/8744698508050088727/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7302956329008495023&amp;postID=8744698508050088727' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/8744698508050088727'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/8744698508050088727'/><link rel='alternate' type='text/html' href='http://databaseperformance.blogspot.com/2010/03/opensolaris-is-not-solaris.html' title='OpenSolaris is not Solaris'/><author><name>John Brady</name><uri>http://www.blogger.com/profile/08719865814813032071</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='25' height='32' src='http://2.bp.blogspot.com/_NEp2JRhM8nE/SqURShDkFoI/AAAAAAAAAAM/5edJ6ALbvKU/S220/JB1.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7302956329008495023.post-3392724552168808462</id><published>2010-03-02T19:31:00.000Z</published><updated>2010-03-02T19:34:22.992Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='queuing theory'/><title type='text'>Which Disk Is Faster?</title><content type='html'>Recently on a Solaris system I got the following disk statistics from “sar –d” (non-busy disks have been removed for clarity):&lt;br /&gt;&lt;pre class="source-code"&gt;&lt;code&gt;device  %busy   avque   r+w/s  blks/s  avwait  avserv&lt;br /&gt;sd3        82     0.9     134     879     0.0     6.6&lt;br /&gt;sd4        90     0.9     133     879     0.0     6.9&lt;br /&gt;sd5        73    17.1     777   12366     0.0    22.0&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;You can draw conflicting conclusions from this data:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;On the one hand disk “sd5” seems to be performing slower than disk “sd4”, at 22.0 milliseconds per disk I/O request from Solaris versus only 6.9 milliseconds for “sd4”.  &lt;/li&gt;&lt;li&gt;But on the other hand “sd5” is clearly doing more work than “sd4” – 777 I/Os per second versus 133 I/Os per seconds (6 times more), and 12,366 blocks per second versus 879 (14 times more).  Does this make “sd5” actually faster than “sd4” overall?&lt;/li&gt;&lt;/ul&gt;So which conclusion is right?  Which of the two disks is actually faster than the other for the individual disk I/Os themselves, ignoring any time spent queuing before being executed?  Clearly “sd5” will have longer queues than “sd4” because it is processing a far greater number of I/Os per second.  This is confirmed by the average queue length value reported by “sar” – only 0.9 for “sd4” (less than 1 I/O at a time), while it is 17.1 for “sd5”.&lt;br /&gt;&lt;br /&gt;I actually think that “sd5” is faster, given how many more disk I/Os it is doing per second, and the size of its average queue length.  How can I prove this one way or the other?  Well our old friend “Queuing Theory” can help, with its set of formulae describing how such things work.&lt;br /&gt;&lt;br /&gt;A key point to realise is that modern disks have internal queues, and will accept more than one request from the operating system at a time.  From the operating system’s perspective it can send a new I/O request to a disk before all the previous ones have finished.  From the disk’s perspective it has an internal queue in front of the real disk, and the real disk can still only do one I/O at a time.  We can see that this is the case in Solaris because the average queue length is 17.1 for “sd5”.  Also the average wait time is 0.0 for all disks, because there is no waiting or queuing within Solaris, which is what this measures.  Solaris was always able to immediately issue a new I/O request to the disk, and never exceeded any limit on concurrent requests to the disks.&lt;br /&gt;&lt;br /&gt;So although “sd5” looks slow at 22.0 milliseconds service time, this is the full service time measured by Solaris, which includes any queuing time within the disk device itself.  And with 17.1 concurrent requests on average, this could be quite a large queue, meaning that the 22.0 milliseconds reported by Solaris could include a significant amount of time waiting within the disk before the I/O was actually performed and the data returned.&lt;br /&gt;&lt;br /&gt;Queuing Theory can help us “look inside the disk device” and see how big its queue is on average, and what the “real service time” of an I/O is within the disk when it performs it.&lt;br /&gt;&lt;br /&gt;What do we know about the disks behaviour?&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Average completed requests per second are 777 for sd5 and 133 for sd4.&lt;/li&gt;&lt;li&gt;External service times are 22.0 ms for sd5 and 6.9 ms for sd4&lt;/li&gt;&lt;li&gt;Average requests in the disk device are 17.1 for sd5, and 0.9 for sd4&lt;/li&gt;&lt;/ul&gt;Even from this we should be able to see that 22.0 ms does not make sense for individual disk access times on sd5, because it managed to do 777 of them per second.  Assuming that the disk was 100% busy during a one second interval, if it did 777 I/Os then each must have taken less than 1 / 777 = 0.001287 = 1.287 milliseconds.  Which further confirms that the 22.0 ms reported by Solaris is mainly queuing time within the disk device itself.&lt;br /&gt;&lt;br /&gt;We would like to know the actual service time within the disk, separate from the queue time within the disk.  We can use a formula from Queuing Theory for this:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;S = R / (1 + N)&lt;/li&gt;&lt;/ul&gt;For this we need to know the response time from outside the disk i.e. from Solaris, and the number of overlapping concurrent requests on average (queue length) submitted to the disk.  We have both of these from sar:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;sd5: S = 22.0 / (1 + 17.1) = 22.0 / 18.1 = 1.215 milliseconds&lt;/li&gt;&lt;li&gt;sd4: S = 6.9 / (1 + 0.9) = 6.9 / 1.9 = 3.632 milliseconds&lt;/li&gt;&lt;/ul&gt;There we have it – disk sd5 has a far lower true service time than that of sd4.  sd5 is actually almost 3 times faster than sd4 at performing each individual disk access!  It is just the large queue of outstanding requests that causes the total disk access time as measured from Solaris to be so high at 22.0.  We can now see that on average each disk I/O to sd5 spends (22.0 – 1.215) or 20.785 milliseconds waiting within the internal disk device queue before it is then executed, which then takes only 1.215 milliseconds.&lt;br /&gt;&lt;br /&gt;In terms of the utilisation of each disk, the Queuing Theory formula is U = X * S, so:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;sd5: U = 777 * 0.001215 = 0.944 = 94.4%&lt;/li&gt;&lt;li&gt;sd4: U = 133 * 0.003632 = 0.483 = 48.3%&lt;/li&gt;&lt;/ul&gt;This indicates that disk “sd5” is operating at a high utilisation level, and any increases in utilisation will lead to exponential increases in response time (service time as measured by Solaris).  Disk “sd4” however is at just less than 50% utilisation, which correlates with the average queue length being just under 1 (0.9).&lt;br /&gt;&lt;br /&gt;In this scenario I would suggest trying to move some of the I/O workload off “sd5” and onto some other disks somehow.  Any reduction in the workload on “sd5” would dramatically reduce the number of concurrent requests (average queue length) and so dramatically reduce the service time as measured by Solaris.  In other words, “sd5” is a hot and busy disk.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7302956329008495023-3392724552168808462?l=databaseperformance.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://databaseperformance.blogspot.com/feeds/3392724552168808462/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7302956329008495023&amp;postID=3392724552168808462' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/3392724552168808462'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/3392724552168808462'/><link rel='alternate' type='text/html' href='http://databaseperformance.blogspot.com/2010/03/which-disk-is-faster.html' title='Which Disk Is Faster?'/><author><name>John Brady</name><uri>http://www.blogger.com/profile/08719865814813032071</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='25' height='32' src='http://2.bp.blogspot.com/_NEp2JRhM8nE/SqURShDkFoI/AAAAAAAAAAM/5edJ6ALbvKU/S220/JB1.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7302956329008495023.post-2335718752143652903</id><published>2009-12-16T10:51:00.002Z</published><updated>2009-12-16T10:57:35.905Z</updated><title type='text'>Correlated Delete &amp; Update</title><content type='html'>Sometimes you want to delete or update rows in one table based on the existence of matching rows in another table.  Examples include building up a list of records to be deleted in another table, or new data values being first loaded into a staging table before updating the main data set.  I call this a Correlated Delete or Update, because the rows affected in the main table must correlate with matching rows in the other table.  There are different ways we can write such correlated actions in Oracle.  Unfortunately one way results in a very poor execution plan and incredibly bad performance – several hours estimated in one example – while the other way might take only a second or two.  So it can be very useful to know which is which, and why.&lt;br /&gt;&lt;br /&gt;This scenario is easiest seen with a Delete i.e. Delete from tableA where matching rows exist in tableB.  We might naturally think of this as being a kind of join between tableA and tableB, assuming tableB has a foreign key to the primary key of tableA.  It turns out that Sybase has implemented an extension to its DELETE and UPDATE statements that lets us use join syntax to specify this kind of correlated action, with an additional ‘from’ clause.  In Sybase our delete would be:&lt;br /&gt;&lt;pre class="source-code"&gt;&lt;code&gt;delete tableA from tableA, tableB where tableA.pkey = tableB.fkey&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;Unfortunately this is an extension to the ANSI SQL syntax, and Oracle does not have an equivalent syntax.  So in Oracle we can only refer to one table in the main table, and need to use sub-queries to refer to the other tables.  One way I came across the other day to do this is:&lt;br /&gt;&lt;pre class="source-code"&gt;&lt;code&gt;delete from tableA&lt;br /&gt;where exists (select 1 from tableB where tableA.pkey = tableB.fkey)&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;On the face of it this is correct - we only delete rows in tableA that have a match in tableB.  Unfortunately it suffers from terrible performance.  In the case I came across I saw that Oracle would take 3 hours to scan tableA (table names changed from their original ones):&lt;br /&gt;&lt;pre class="source-code"&gt;&lt;code&gt;-------------------------------------------------------------------------------------------&lt;br /&gt;| Id  | Operation                     | Name      | Rows  | Bytes | Cost (%CPU)| Time     |&lt;br /&gt;-------------------------------------------------------------------------------------------&lt;br /&gt;|   0 | DELETE STATEMENT              |           |     1 |    33 |  1075K  (2)| 03:35:07 |&lt;br /&gt;|   1 |  DELETE                       | A         |       |       |            |          |&lt;br /&gt;|*  2 |   FILTER                      |           |       |       |            |          |&lt;br /&gt;|   3 |    TABLE ACCESS FULL          | A         |   320M|     9G|  1072K  (2)| 03:34:30 |&lt;br /&gt;|*  4 |    TABLE ACCESS BY INDEX ROWID| B         |     1 |    82 |     0   (0)| 00:00:01 |&lt;br /&gt;|*  5 |     INDEX RANGE SCAN          | X1_B      |     1 |       |     0   (0)| 00:00:01 |&lt;br /&gt;-------------------------------------------------------------------------------------------&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;This is because the sub-query is correlated – it refers to the outer table (tableA) and so must be executed for each row of tableA from the outer, main query.  This results in a full table scan of tableA, and then a join to tableB for the correlated sub-query.  If tableA is large and tableB is small with a list of rows to delete, then the performance is very bad indeed.&lt;br /&gt;&lt;br /&gt;The solution is to rewrite the sub-query so that it is not correlated, and we can do that using IN i.e. where a row in tableA has a matching row IN tableB.  The delete then becomes:&lt;br /&gt;&lt;pre class="source-code"&gt;&lt;code&gt;delete from tableA where (tableA.pkey) IN (select tableB.fkey from tableB)&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;The meaning of this is exactly the same as the other delete, but now the sub-query is not correlated.  Oracle can now choose an execution plan that scans tableB to produce a set of rows, and join to tableA using an index.  This executes much faster, as you would expect. &lt;br /&gt;&lt;pre class="source-code"&gt;&lt;code&gt;&lt;br /&gt;------------------------------------------------------------------------------------------&lt;br /&gt;| Id  | Operation                     | Name     | Rows  | Bytes | Cost (%CPU)| Time     |&lt;br /&gt;------------------------------------------------------------------------------------------&lt;br /&gt;|   0 | DELETE STATEMENT              |          |     1 |   113 |     6  (17)| 00:00:01 |&lt;br /&gt;|   1 |  DELETE                       | A        |       |       |            |          |&lt;br /&gt;|*  2 |   FILTER                      |          |       |       |            |          |&lt;br /&gt;|   3 |    NESTED LOOPS               |          |     1 |   113 |     6  (17)| 00:00:01 |&lt;br /&gt;|   4 |     SORT UNIQUE               |          |     1 |    80 |     2   (0)| 00:00:01 |&lt;br /&gt;|*  5 |      TABLE ACCESS FULL        | B        |     1 |    80 |     2   (0)| 00:00:01 |&lt;br /&gt;|*  6 |     INDEX RANGE SCAN          | X1_A     |     1 |    33 |     3   (0)| 00:00:01 |&lt;br /&gt;------------------------------------------------------------------------------------------&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;As you can see the estimated cost has reduced from over 1 million to just 6, and the estimated elapsed time from 3 and a half hours to just one second!  A major improvement.  Obviously this improvement is down to the fact that tableB is at least 1,000 times smaller than tableA.&lt;br /&gt;&lt;br /&gt;Either way, I believe that using IN is a better way of phrasing the sub-query, because it is clearer to Oracle which way round the relationship is (A depends on B, not vice versa), and gives the optimizer more flexibility of how to execute the SQL statement.  If tableB were big enough with respect to tableA, then there is no reason why the optimizer could not go back to the original execution plan – scanning A and joining to B.&lt;br /&gt;&lt;br /&gt;Where it really makes a difference is where you have other conditions on tableB within the sub-query – not just the foreign key join to tableA.  When using EXISTS, Oracle will ignore extra indexes on tableB and treat the correlated sub-query in the same way – check tableB for each row in tableA.  When using IN, Oracle can take advantage of extra indexes on tableB if they help with other conditions on columns of tableB in its sub-query.  Thus it can use an index for the initial access to data in tableB, and then use the primary key index into tableA.  This results in efficient execution plans for the DELETE.&lt;br /&gt;&lt;br /&gt;The problem also occurs with Updates, but is compounded by needing to refer to the same table twice within the update statement.  Again, the Sybase syntax is much simpler and more straightforward:&lt;br /&gt;&lt;pre class="source-code"&gt;&lt;code&gt;&lt;br /&gt;update tableA&lt;br /&gt;set tableA.column1 = tableB.column1, tableA.column2 = tableB.column2&lt;br /&gt;from tableA, tableB&lt;br /&gt;where tableA.pkey = tableB.fkey&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;In Oracle we need to use a sub-query for tableB in the WHERE clause in the same way as for the Delete statement, and we also need a sub-query for tableB in the SET clause so that it is updated to the values in the matching row in tableB.  This is important – without both sub-queries we would either update the wrong rows (WHERE clause problem) or update them to the wrong values (SET clause problem).  The Oracle equivalent, using the same rewrite as before is:&lt;br /&gt;&lt;pre class="source-code"&gt;&lt;code&gt;&lt;br /&gt;update tableA&lt;br /&gt;set (column1, column2) = (select tableB.column1, tableB.column2&lt;br /&gt;from tableB where tableA.pkey = tableB.fkey)&lt;br /&gt;where pkey in (select fkey from tableB)&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;As already mentioned this particular update will perform well when executed.  Typically Oracle will scan tableB if it is much smaller for the sub-query, then join to tableA on its key columns assuming there is an index on them, and then join back to tableB to get the values to update the columns to.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7302956329008495023-2335718752143652903?l=databaseperformance.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://databaseperformance.blogspot.com/feeds/2335718752143652903/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7302956329008495023&amp;postID=2335718752143652903' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/2335718752143652903'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/2335718752143652903'/><link rel='alternate' type='text/html' href='http://databaseperformance.blogspot.com/2009/12/correlated-delete-update.html' title='Correlated Delete &amp; Update'/><author><name>John Brady</name><uri>http://www.blogger.com/profile/08719865814813032071</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='25' height='32' src='http://2.bp.blogspot.com/_NEp2JRhM8nE/SqURShDkFoI/AAAAAAAAAAM/5edJ6ALbvKU/S220/JB1.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7302956329008495023.post-5744278898102818419</id><published>2009-10-19T19:55:00.002+01:00</published><updated>2009-10-19T20:11:47.542+01:00</updated><title type='text'>Solving Database Design in Development</title><content type='html'>Previously I have looked at the challenges involved in being able to support a changing database design in an application development environment, which were:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Support change at any time &lt;/li&gt;&lt;li&gt;Ensure database designed and modelled properly &lt;/li&gt;&lt;li&gt;Record each individual change, to enable metamorphosis of older databases &lt;/li&gt;&lt;li&gt;Allow multiple versions or branches of the Database Design to exist, and be changed individually or collectively &lt;/li&gt;&lt;/ol&gt;Although I started off on this problem of database structure change from the context of the need for it when using Agile development methods, the actual problem is not unique or specific to Agile. It is a generic problem to the development of any application that involves the use of a relational database to store its data in.&lt;br /&gt;&lt;br /&gt;What I now want to do is list the key features needed in a solution to this problem which, if all assembled together,  would deliver a viable solution to this problem.&lt;br /&gt;&lt;ul&gt;&lt;li&gt;With Application Source Code we only care about the final state it ends up in.  But with a Database Design we care about each individual change, and what gets changed. We need to record each of these Databse Changes individually, so that they can be applied to other instances of that database, as well as recording the final state of the database design itself in the Database Model. &lt;/li&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;Changes are physically made to a Database Instance by executing appropriate SQL statements.  This is the only way to make a change to an existing database instance. &lt;/li&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;Each "Database Change" should be an atomic change to a database - it either completes successfully or not at all.  Changes cannot be partially implemented on a database instance.  &lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;The Database Changes must be executed in the correct sequence on each database instance, for repeatability and due to any dependencies between the changes. &lt;/li&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;Changes to the Database Design should be formally Requested, and these requests recorded somewhere to provide an audit trail - Who, When, What Changed, Where in the Database. &lt;/li&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;Changes must be reviewed and approved by the Data Modeller, to ensure a good, scalable design.  The Data Modeller is responsible for updating the Database Model - the record of the Database Design - and producing the SQL statement that implements that change.  Some of this could be done automatically using appropriate tools, or manually.  All that matters is that it does happen. &lt;/li&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;The Database Designer or Data Modelling role now becomes a part time role during the whole lifecycle of the application's development, instead of being a full time role only in the first phase of its development.  It is now likely that all development cycles will involve the Database Designer, who becomes a central and critical member of the development team.  The role itself, however, could be fulfilled by someone who has other roles too within the development cycle, or shared amongst a number of people, because it is not a full time role.  This could be someone who is also a developer or a database administrator.  &lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;Each Database Change is given its own unique Identifier, which must be globally unique over all the other Database Changes.  This identifies the particular Database Change, regardless of which Database Instances it is applied to, or which versions of the Application it appears in. &lt;/li&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;Changes to a particular Database Instance must be recorded within the Database Instance itself.  This is the only way to avoid applying the same changes twice to a Database Instance.  This implies some form of Change History table within each Database Instance. &lt;/li&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;This Change History table needs to record the Change Identifer for those changes applied, and the date and time when it was done.  Other data could also be recorded, but these two are the minimum required. &lt;/li&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;The set of changes should be stored in a single file, often termed the Change Log.  Each entry in this will include the Change Identifier and the SQL statement that implements that change.  All necessary changes can be easily located and sequenced this way. &lt;/li&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;This Change Log needs to be well structured, so that a program can read and decode the Changes in it and execute the SQL statements for changes not yet applied to a given Database Instance.  This is probably best done using XML.  &lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;There must be separate instances of the Database Model / Database Design and the Change Log file in each version / branch of the Application source code.  This allows changes to be made independently to each branch of the Application. &lt;/li&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;In reality SQL statements are often specific to one database product, such as Oracle, Sybase, SQL*Server or MySQL.  The Change Log will need to record the separate SQL statements for each supported database product for each Database Change.  Thus each Change will have an Identifier, and a SQL statement per database product, clearly labelled with the database product name. &lt;/li&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;The whole solution is brought together in a program that is run during an Application Upgrade.  It opens the Change Log XML file and reads in each Change.  If that Change has not yet been applied to this particular Database Instance, then the corresponding SQL statement is executed and a record inserted into the Change History metadata table. &lt;/li&gt;&lt;/ul&gt;This is my interpretation of how to solve the problem of a changing database design during application development.  But it turns out of course that others have been through exactly the same process before me and come up with exactly the same answers.  And in fact, once I had thoroughly understood the nature of the problem, I could immediately see that these other people too had understood it and had arrived at pretty much the same conclusions I would.  So although I did work through the problem from first principles myself, simply because that is the best way for me to fully understand a topic and appreciate whether any "solution" is right or not, I have no doubt borrowed from some of the material I have read in the way that I have described this solution here.&lt;br /&gt;&lt;br /&gt;I still find some of the descriptions I found on the Web around Database Design, Changes and Agile Development unclear around this particular problem of implementing changes to a database design during application development.  Quite a lot discuss the need to change the database design, and how to model different types of changes (refactoring is often mentioned).  But almost none discuss how such changes get implemented in real deployed databases, and how you maintain a database over the long term with a series of changes to the application that uses it.  None of them covered enough of it at once to leave me feeling that I had a real solution I could go out and apply.  Hence my need to work through the problem from first principles for myself, to get the full picture as it were.  &lt;br /&gt;&lt;br /&gt;Again, one article that did help to clarify the nature of the problem a lot for me was &lt;a href="http://www.hexsw.com/Products/Components/DataConstructor/RethinkingAgilityInDatabases.aspx?FileName=01.EvolutionAndDataStructures.BlogItem&amp;amp;SinglePage=true"&gt;Rethinking Agility in Databases: Evolution&lt;/a&gt; from Hexagon Software.  This brought home the message that Databases must be treated differently to Applications in how you change them.  &lt;br /&gt;&lt;br /&gt;And the Change Log file of SQL statements was explicitly mentioned by Peter Schuh and Pramod Sadalage, who have put this into practice themselves on large projects.  See &lt;a href="http://www.agilealliance.org/system/article/file/924/file.pdf"&gt;Agility and the Database&lt;/a&gt; by Peter Schuh for instance.&lt;br /&gt;&lt;br /&gt;So although I have not invented anything new here that has not already been described by other people elsewhere, in one form or another, I have tried to bring together all of the necessary ingredients for a solution in a coherent manner.  As I said, this was my way of working through the database design problem to arrive at a solution that I fully understood and felt able to go out and apply in the real world.&lt;br /&gt;&lt;br /&gt;I may revisit this again and try and outline a working solution to this i.e. what would need to be delivered to implement the kind of solution I have described here.  Anybody reading this and want to know more?&lt;br /&gt;&lt;br /&gt;&lt;div id="ui-datepicker-div" style="display: none;"&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7302956329008495023-5744278898102818419?l=databaseperformance.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://databaseperformance.blogspot.com/feeds/5744278898102818419/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7302956329008495023&amp;postID=5744278898102818419' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/5744278898102818419'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/5744278898102818419'/><link rel='alternate' type='text/html' href='http://databaseperformance.blogspot.com/2009/10/solving-database-design-in-development.html' title='Solving Database Design in Development'/><author><name>John Brady</name><uri>http://www.blogger.com/profile/08719865814813032071</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='25' height='32' src='http://2.bp.blogspot.com/_NEp2JRhM8nE/SqURShDkFoI/AAAAAAAAAAM/5edJ6ALbvKU/S220/JB1.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7302956329008495023.post-6258893050845376883</id><published>2009-10-05T11:45:00.002+01:00</published><updated>2009-10-05T12:00:49.720+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='agile database design'/><title type='text'>The Challenge of Agile Database Design</title><content type='html'>Previously I have said that for what I call Enterprise Applications &lt;a href="http://databaseperformance.blogspot.com/2009/08/data-modelling-why-it-is-important.html"&gt;data modelling is important&lt;/a&gt; because the data itself has meaning and value outside of the application that manipulates it, and that &lt;a href="http://databaseperformance.blogspot.com/2009/09/data-modelling-scalability.html"&gt;scalability requires a good database design&lt;/a&gt; because you cannot just add scalability on afterwards.  The Data Model or Database Design defines the structure of the database and the relationships between the data sets, and is part of the foundation on which the application is built.  And a good database design is essential to achieve a scalable application.  Which leads to the challenge - How do I go about designing a database in an Agile development project, when not all the requirements are known initially?  What techniques should I be using?&lt;br /&gt;&lt;br /&gt;Having read as much as I can find on this topic, I think I have a better understanding of the nature of this challenge.  And this is what I want to explain here - the nature of the challenge of Agile Database Design.&lt;br /&gt;&lt;br /&gt;First, we should restate the problem in a more positive way.  Rather than "when not all the requirements are known", we can say "How should I be designing an Agile database that will change in the future".  Change happens all the time, in one form or another, and is inevitable.  We need to embrace it, assume that our database design will change over time, and find ways to support this changing database.  This need for change over time is not unique to Agile development, and is in fact really a universal problem for applications and databases.&lt;br /&gt;&lt;br /&gt;Second, we need to accept that we cannot skip the design stage for the database in any way.  We must design the database properly - at least those parts of the database that we need to design now - and produce a correct model.  As I argued before, a good and correct database design is essential to a well performing application, and to the integrity of the data itself. &lt;br /&gt;&lt;br /&gt;The outcome of the database design is documentation on the structure of the database, often termed the Data Model.  There are many tools that can be used to help you design your database and record the details of the model.  Any such tool chosen should enable and support small and frequent changes to the data model, as this is a major requirement of Agile development.  But you could also use tools as simple as a spreadsheet and a set of diagrams.&lt;br /&gt;&lt;br /&gt;Third, a Database Design is not the same as Application Source Code.  There are similarities, but they are actually different beasts.  Both act as Blueprints for a thing that can be built - an Instance of that Design.  And both can be changed over time as needed.  The difference is that when the Source Code to an Application changes, we rebuild the Application completely, typically compiling all source code files.  We have produced a new instance of the Application Program, as a next generation instance of it. This is an example of "Evolution": the Blueprint changes, and a brand new instance is created using it.  Existing instances are not modified, but instead "replaced" by the newly created instance.&lt;br /&gt;&lt;br /&gt;Databases are the opposite.  Changes must be applied "in place" directly to each Database Instance (a real database on a computer system), to modify it into the latest database design.  Such an in place changing in the structure of a thing is termed "Metamorphosis", and is quite different from "Evolution".&lt;br /&gt;&lt;br /&gt;While I have appreciated for some time that there is a difference in type between an Application's Source Code and a Database's Design, I read about the explicit nature and form of this difference (Evolution versus Metamorphosis) in an article on &lt;a href="http://www.hexsw.com/Products/Components/DataConstructor/RethinkingAgilityInDatabases.aspx?FileName=01.EvolutionAndDataStructures.BlogItem&amp;amp;SinglePage=true"&gt;Rethinking Agility in Databases: Evolution&lt;/a&gt; from Hexagon Software.  All credit for this distinction between them and the terminology goes to them.&lt;br /&gt;&lt;br /&gt;Application Source Code editing and maintenance methods will not work for a Database Design.  A "replace and rebuild" methodology cannot be used for databases, which need a change to the design to be applied "in place" instead to each instance of that database.&lt;br /&gt;&lt;br /&gt;Fourth, there may be multiple separate versions or branches of the Database Design to be maintained, as a result of the existence of separate branches of the Application Source Code.  It is common for Application Source Code to be branched when major releases are done, typically producing a new branch for support of that release, and a new branch for the next major release.  It is possible that the Database Design may need to change in different ways in different branches of the Application.  Likewise, the same change may need to be made to the Database Design in different branches - correcting a bug for instance.  We need a way to record each version of the Database Design separately to support this.&lt;br /&gt;&lt;br /&gt;These then form the Challenges of delivering a changing Database Design:&lt;br /&gt;&lt;br /&gt;1. Supporting change at any time to the Database Design&lt;br /&gt;&lt;br /&gt;2. Ensuring that the Database is Designed and Modelled properly&lt;br /&gt;&lt;br /&gt;3. Recording each individual change to the Database Design, so that older instances of that Database Design can metamorphose by having these changes applied to them&lt;br /&gt;&lt;br /&gt;4. Allowing multiple versions or branches of the Database Design to exist, and to be changed individually or collectively&lt;br /&gt;&lt;br /&gt;These challenges are not specific to Agile Development, and apply to any large enough application software development.  Addressing these challenges will provide a solution that could be used in any database oriented application development, whether using Agile development methods or not.&lt;br /&gt;&lt;br /&gt;In the next post I hope to start describing the outlines of what you would need in order to achieve what I call "Agile Database Design" that addresses these challenges.  And then subsequently how to meet this in a minimal way.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7302956329008495023-6258893050845376883?l=databaseperformance.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://databaseperformance.blogspot.com/feeds/6258893050845376883/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7302956329008495023&amp;postID=6258893050845376883' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/6258893050845376883'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/6258893050845376883'/><link rel='alternate' type='text/html' href='http://databaseperformance.blogspot.com/2009/10/challenge-of-agile-database-design.html' title='The Challenge of Agile Database Design'/><author><name>John Brady</name><uri>http://www.blogger.com/profile/08719865814813032071</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='25' height='32' src='http://2.bp.blogspot.com/_NEp2JRhM8nE/SqURShDkFoI/AAAAAAAAAAM/5edJ6ALbvKU/S220/JB1.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7302956329008495023.post-6249442513008122207</id><published>2009-09-21T16:34:00.001+01:00</published><updated>2009-09-21T16:39:12.607+01:00</updated><title type='text'>Data Modelling &amp; Scalability</title><content type='html'>For what I call Enterprise Class Applications, scalability is an important factor.  Scalability can occur in multiple directions - increases in user count, transaction volume, database size.  These can occur individually or in combination together. The application must perform well when any of these scales in size, and performance must not degrade severely.  For this to be true, scalability must be designed in to the application from the beginning.  You cannot simply add scalability onto an application later on, much like you cannot just add security on.  There is no magical sticking plaster that you can apply to an existing application and suddenly have it scale wonderfully.  If there was, then everyone would be doing it.&lt;br /&gt;&lt;br /&gt;No, scalability must be designed into the application and the way it works.  This leads to the corollary that only good designs scale well, and bad ones don't.  Skipping the database design phase in any application development can only result in a poor design, with associated poor scalability.  At some point as the workload increases on the application, performance will hit the dreaded knee and throughput will level off and response times will increase.&lt;br /&gt;&lt;br /&gt;Poor performance results from things like lack of primary and foreign keys, general purpose tables with a type field indicating which data fields are relevant, use of key fields that only have a few values ('Y' or 'N' values are typical of this), outer joins between tables where the child records may not exist.  Any of these can result in weak SQL when used in a query, and poor execution as a result.&lt;br /&gt;&lt;br /&gt;So if the scalability of your application is important to you, and it may not be important for everyone, then make sure you design your database properly rather than just implementing the first set of tables that comes into someones mind. And this design requires capturing information about all of the entities that need to be stored, and modelling their relationships both at a high, conceptual level and then at a detailed low, physical level.  &lt;br /&gt;&lt;br /&gt;To ensure all entities and their relationships are captured, clearly requires knowledge about all of the entities within the application.  But in an iterative development, as typified by Agile, not all of the entities are known at the beginning when the first version of the database design is needed.  So how does Agile deal with this?  It is the one thing lacking from the documentation I have seen on Agile so far - how do you produce a good, scalable database design when it all takes place in an iterative development environment and when the full requirements and data entities are not yet known?  &lt;br /&gt;&lt;br /&gt;I've heard of Agile projects delivering poor database designs from other database architects and administrators, and know of one specific one at the moment where no database skilled person has been involved at all.&lt;br /&gt;&lt;br /&gt;For me this is a key sticking point with what I have seen of Agile Development, especially the aspect of delaying decisions for as long as possible.  There seems to be this perception that the database design "can be finished off later on", after more of the application has been developed.  But when the database design becomes an issue and needs to be changed, it is too late, as the time and effort involved in changing and rewriting the application to reflect these database changes is too great.&lt;br /&gt;&lt;br /&gt;Given that a scalable application requires a good database design to store its data in, why is it that database design seems missing from most descriptions of Agile Development?  It may get mentioned every once in a while, in general lists of development tasks, but I have seen very little on the specifics of what I would term "Agile Database Design".&lt;br /&gt;&lt;br /&gt;My concern is that by ignoring database design within Agile Development, it will most of the time result in a poor database design, and so poor application scalability.&lt;br /&gt;&lt;br /&gt;Either way it seems to me that to ensure you do end up with a good, scalable database design, you need a Data Modeler within the development team, responsible for the database design.  All database changes should be routed through the Data Modeler, rather than a developer making them directly themselves.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7302956329008495023-6249442513008122207?l=databaseperformance.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://databaseperformance.blogspot.com/feeds/6249442513008122207/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7302956329008495023&amp;postID=6249442513008122207' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/6249442513008122207'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/6249442513008122207'/><link rel='alternate' type='text/html' href='http://databaseperformance.blogspot.com/2009/09/data-modelling-scalability.html' title='Data Modelling &amp; Scalability'/><author><name>John Brady</name><uri>http://www.blogger.com/profile/08719865814813032071</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='25' height='32' src='http://2.bp.blogspot.com/_NEp2JRhM8nE/SqURShDkFoI/AAAAAAAAAAM/5edJ6ALbvKU/S220/JB1.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7302956329008495023.post-6279167986844778935</id><published>2009-09-15T20:55:00.004+01:00</published><updated>2009-09-15T21:08:52.543+01:00</updated><title type='text'>Reported Execution Plan Anomalies</title><content type='html'>In doing some tests of the effects of different indexes on execution plans, I came across a small anomaly with the way the execution plan was being reported.  I was using the Autotrace feature of SQL*Plus to report the execution plan after a SQL query was executed.  I noticed that the costs reported for each step were not completely true, although the overall execution total cost was correct.&lt;br /&gt;&lt;br /&gt;The SQL query I was playing with was:&lt;br /&gt;&lt;pre class="source-code"&gt;&lt;code&gt;&lt;br /&gt;select count (*) from (&lt;br /&gt; select t1.i1, t1.i2, t1.c1, t2.i1, t2.i2, t2.c1&lt;br /&gt; from t1 join t2 on t1.i3 = t2.i1&lt;br /&gt; where t1.i2 = 111&lt;br /&gt;   and t2.i4 = 11&lt;br /&gt;)&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;With indexes on the I1 columns in both tables, and the I3 column of table T1 (foreign key). Oracle will therefore scan T2 and join to T1 using the index on I3, as seen in this reported plan:&lt;br /&gt;&lt;pre class="source-code"&gt;&lt;code&gt;&lt;br /&gt;--------------------------------------------------------------------------------------&lt;br /&gt;| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |&lt;br /&gt;--------------------------------------------------------------------------------------&lt;br /&gt;|   0 | SELECT STATEMENT             |       |     1 |    18 |  1236   (1)| 00:00:15 |&lt;br /&gt;|   1 |  SORT AGGREGATE              |       |     1 |    18 |            |          |&lt;br /&gt;|*  2 |   TABLE ACCESS BY INDEX ROWID| T1    |     1 |     9 |   102   (0)| 00:00:02 |&lt;br /&gt;|   3 |    NESTED LOOPS              |       |    10 |   180 |  1236   (1)| 00:00:15 |&lt;br /&gt;|*  4 |     TABLE ACCESS FULL        | T2    |    20 |   180 |   195   (1)| 00:00:03 |&lt;br /&gt;|*  5 |     INDEX RANGE SCAN         | I2_T1 |   100 |       |     2   (0)| 00:00:01 |&lt;br /&gt;--------------------------------------------------------------------------------------&lt;br /&gt;&lt;br /&gt;Predicate Information (identified by operation id):&lt;br /&gt;---------------------------------------------------&lt;br /&gt;&lt;br /&gt;  2 - filter("T1"."I2"=111)&lt;br /&gt;  4 - filter("T2"."I4"=11)&lt;br /&gt;  5 - access("T1"."I3"="T2"."I1")&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;As can be seen, the access to the data row of table T1 is performed after the join to T2 is complete i.e. outside and after the NESTED LOOP join has completed. However, the reported cost of 1236 for the Nested Loop is incorrect. It should be 236, based on the other reported values for cost and cardinality (row count).  The cost of a Nested Loop join is the cost of the inner, first table access, plus the cost of the outer, second table access multiplied by the number of rows retrieved from the inner, first table (cardinality):&lt;br /&gt;&lt;br /&gt;195 + (20 * 2) + CPU = 195 + 40 + 1 = 236.&lt;br /&gt;&lt;br /&gt;Likewise the row count of the T1 table access is incorrectly shown as 1, when it should be 10, the same as the Nested Loop count.  The cost of accessing data rows in T1 that match a single value in the I2_T1 index is about 100.  (This can be verified by looking at the values reported in the 10053 trace file for Clustering Factor and Filter Factor, or deriving these manually).  The cost for accessing all necessary data rows in T1 as a result of the Nested Loop is the single access cost times the cardinality of the number of times it is performed i.e. 100 * 10 = 1000.&lt;br /&gt;&lt;br /&gt;Thus the total cost for the two table join query is 236 (NL) + 1000 (T1 data) = 1236. And indeed the total cost of the query is correctly reported as this.  It is just the 2 inner steps of NESTED LOOPS and TABLE ACCESS BY INDEX ROWID that have their costs and row count incorrectly reported, causing this anomaly.&lt;br /&gt;&lt;br /&gt;I guess this anomaly is caused by the execution plan moving the table access to outside the nested loop, rather than inside where it most often occurs.  If the table access was instead inside the nested loop, between steps 4 and 5 as the parent of 5, then the reported costs would be correct.  Most probably the Optimizer arrived at the original plan, then modified the execution steps when it realised it could move the table access outside of the nested loop.  Unfortunately, it did not adjust the cost values for this, and so they are mis-reported.&lt;br /&gt;&lt;br /&gt;&lt;div id="ui-datepicker-div" style="display: none;"&gt;&lt;/div&gt;&lt;div id="ui-datepicker-div" style="display: none;"&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7302956329008495023-6279167986844778935?l=databaseperformance.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://databaseperformance.blogspot.com/feeds/6279167986844778935/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7302956329008495023&amp;postID=6279167986844778935' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/6279167986844778935'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/6279167986844778935'/><link rel='alternate' type='text/html' href='http://databaseperformance.blogspot.com/2009/09/reported-execution-plan-anomalies.html' title='Reported Execution Plan Anomalies'/><author><name>John Brady</name><uri>http://www.blogger.com/profile/08719865814813032071</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='25' height='32' src='http://2.bp.blogspot.com/_NEp2JRhM8nE/SqURShDkFoI/AAAAAAAAAAM/5edJ6ALbvKU/S220/JB1.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7302956329008495023.post-3211837740501861781</id><published>2009-09-07T11:43:00.003+01:00</published><updated>2009-09-07T12:10:34.534+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='index'/><title type='text'>Index Selectivity Analysis</title><content type='html'>Why does Oracle sometimes &lt;span style="font-weight: bold;"&gt;not&lt;/span&gt; use an index when executing a SQL query?  Even when the index includes most of the columns from a table explicitly referenced in the query?  The obvious answer is that using the index does not produce a lower cost than other access methods, such as a full table scan.  The Optimizer has calculated that the number of disk I/Os needed to use the index (depth of index plus matching leaf blocks) and then get the data blocks (number of matching rows) would be greater than the number of disk I/Os needed to do a full table scan with multi-block reads.&lt;br /&gt;&lt;br /&gt;In a case like this, such an index will almost never be used by Oracle even though the index includes columns directly referenced in the WHERE clause of a query. This is  because the number of estimated disk I/Os to use it will always be greater than the number of disk I/Os to do a full table scan.  There might be exceptions in cases where the data is very skewed and you have a histogram on the columns in the index, and you are selecting a value with a low occurrence count.  But generally such an index is unlikely to be used by any queries you run.&lt;br /&gt;&lt;br /&gt;Which raises the question, how can we find out which indexes in our database are selective enough to be used by a query before we actually run those queries? Or conversely, which indexes are not selective enough given the data in the database and will never be used?  It is a bit too late to find out when we are running a query and Oracle is doing a full table scan and taking a long time to do it.  Of course we could do an EXPLAIN PLAN of the query beforehand, but execution plans can change over time dependent on the volumes of data in a table, and the distribution of data values.&lt;br /&gt;&lt;br /&gt;We would like to have a query that we can run on a complete database, and which will tell us which indexes are not selective given the actual data in the tables.  This would flag weak indexes to us, regardless of which queries might be running i.e. indexes that are unlikely to be chosen by the Optimizer even if all columns in them are referenced in the query.&lt;br /&gt;&lt;br /&gt;How would such a query work?  Well, we know how the Optimizer costs the different access methods to data in a table (see previous posts on &lt;a href="http://databaseperformance.blogspot.com/2009/05/oracle-optimizer-plan-costing-full.html"&gt;Full Table Scans&lt;/a&gt; and &lt;a href="http://databaseperformance.blogspot.com/2009/05/optimizer-costing-of-index-scans.html"&gt;Index Scans&lt;/a&gt;).  We can calculate each of these for a table and an index, and see if the table scan cost is lower than the index access cost.  In fact, using a bit of maths we can manipulate the formula for these two costs together in a way that results in a more direct comparison.  If the maths bit is too much for you then just skip ahead to the final formula followed by the SQL to run it.&lt;br /&gt;&lt;br /&gt;The costs of each type of access are:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;FTS Cost = (Blocks * mreadtim) / (MBRC * sreadtim)&lt;/li&gt;&lt;li&gt;Index Cost = Levels + (Leaf Blocks + Clustering Factor) * Filter Factor&lt;/li&gt;&lt;/ul&gt;And an index will be ignored when:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Index Cost &gt; Full Table Scan Cost&lt;/li&gt;&lt;/ul&gt;Assuming that the table is large, then the number of blocks in the index will be far greater than the depth or number of levels in the index tree, so we can ignore this. This gives the relationship between the two that an index will be ignored when (approximately):&lt;br /&gt;&lt;ul&gt;&lt;li&gt;(Leaf Blocks + Clustering Factor) * Filter Factor &gt; (Blocks * mreadtim) / (MBRC * sreadtim)&lt;/li&gt;&lt;/ul&gt;Dividing both sides gives:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Filter Factor &gt; (Blocks * mreadtim) / (MBRC * sreadtim * (Leaf Blocks + Clustering Factor))&lt;/li&gt;&lt;/ul&gt;Now when there is no histogram on the columns or no data skew then the Filter Factor will be the inverse of the Number of Distinct Values or Keys i.e.&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Filter Factor = 1 / Distinct Keys&lt;/li&gt;&lt;/ul&gt;Inverting both sides, and the relationship too gives us&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Distinct Keys &lt; (MBRC * sreadtim * (Leaf Blocks + Clustering Factor)) / (Blocks * mreadtim)&lt;/li&gt;&lt;/ul&gt;All of these values are stored in the Data Dictionary of the database in one place or another.  So we can write a single SQL query that can calculate both sides of the comparison directly from the database itself.  Note that this assumes that the statistics on your tables and indexes are up to date, and reflect the true data distributions in the tables.&lt;br /&gt;&lt;br /&gt;To make some things easier we can group together values that are table and index independent, finally giving us that for the index to be ignored the following must be true:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Distinct Keys &lt; (MBRC * sreadtim / mreadtim) * (Leaf Blocks + Clustering Factor) / Blocks&lt;/li&gt;&lt;/ul&gt;We can calculate the expression "(MBRC * sreadtim / mreadtim)" once, and then use this value with the other values on each index on each table in a database.  We can then list all the indexes that fail the test i.e. indexes that will be ignored because their cost of access is greater than a full table scan.&lt;br /&gt;&lt;br /&gt;For a system with no System Statistics gathered, which is actually the majority of cases, then the following calculations are used for the disk read times:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;sreadtim = ioseektim + (db_block_size / iotfrspeed)&lt;/li&gt;&lt;li&gt;mreadtim = ioseektim + (db_file_multiblock_read_count * db_block_size / iotfrspeed)&lt;/li&gt;&lt;/ul&gt;By default ioseektim will be 10 milliseconds and iotfrspeed will be 4 KB / millisecond.&lt;br /&gt;&lt;br /&gt;This means that the only non-table dependent values in the fully expanded formula are for db_block_size and db_file_multiblock_read_count.  These can only be obtained directly from the Oracle database if the user has SELECT permission on the dynamic performance views (V$PARAMETER specifically).  And generally this is not true for normal users.  To work around this, the SQL script below simply prompts the user to enter these two values.&lt;br /&gt;&lt;br /&gt;The SQL script then calculates the derived values, and then uses these on all indexes on all tables to list those that fail the test i.e. number of distinct values in the index is too low.&lt;br /&gt;&lt;br /&gt;As mentioned earlier, we can only ignore the depth of the index when the table is large enough, so the SQL includes an additional constraint on the number of blocks in a table, which is hardcoded at 100 blocks.  You could easily change this to another value if desired.&lt;br /&gt;&lt;br /&gt;This is a SQL*Plus script and must be run via SQL*Plus, as it uses SQL*Plus features such as substitution variables.  And remember that your statistics must be up to date on each table and index in the database.&lt;br /&gt;&lt;pre class="code"&gt;&lt;br /&gt;--&lt;br /&gt;-- Check which indexes on all tables are better or worse than a full table scan&lt;br /&gt;-- When the number of distinct values in a column is low, a full table scan&lt;br /&gt;-- is more efficient (less I/Os) than using an index.&lt;br /&gt;--&lt;br /&gt;set define on&lt;br /&gt;set verify off&lt;br /&gt;set heading off&lt;br /&gt;--&lt;br /&gt;prompt&lt;br /&gt;prompt Report to show which indexes are selective enough and which are not.&lt;br /&gt;prompt Non-selective indexes will be ignored and full table scans preferred.&lt;br /&gt;prompt&lt;br /&gt;&lt;br /&gt;define small_table_threshold = 100&lt;br /&gt;&lt;br /&gt;accept block_size prompt 'Enter database block size in bytes (e.g. 8192)    : '&lt;br /&gt;accept mbrc       prompt 'Enter db_file_multiblock_read_count value (e.g. 8): '&lt;br /&gt;&lt;br /&gt;prompt Using following database wide configuration settings:&lt;br /&gt;prompt :           Database Block size is &amp;amp;&amp;amp;block_size (bytes)&lt;br /&gt;prompt : db_file_multiblock_read_count is &amp;amp;&amp;amp;mbrc&lt;br /&gt;prompt :   Threshold for a small table is &amp;amp;&amp;amp;small_table_threshold (blocks)&lt;br /&gt;prompt :   (Smaller tables than this are ignored in this report)&lt;br /&gt;prompt : [If these are incorrect, edit this script and re-run]&lt;br /&gt;&lt;br /&gt;column sreadtim       new_value sreadtim&lt;br /&gt;column mreadtim       new_value mreadtim&lt;br /&gt;column effective_mbrc new_value effective_mbrc&lt;br /&gt;&lt;br /&gt;set termout off&lt;br /&gt;-- Hide from screen internal calculations of derived values&lt;br /&gt;select 10 + (&amp;amp;&amp;amp;block_size / 4096) sreadtim             from dual ;&lt;br /&gt;select 10 + (&amp;amp;&amp;amp;mbrc * &amp;amp;&amp;amp;block_size / 4096) mreadtim    from dual ;&lt;br /&gt;select &amp;amp;&amp;amp;mbrc * &amp;amp;&amp;amp;sreadtim / &amp;amp;&amp;amp;mreadtim effective_mbrc from dual ;&lt;br /&gt;set termout on&lt;br /&gt;&lt;br /&gt;prompt&lt;br /&gt;prompt Assuming that system wide statistics have these derived values:&lt;br /&gt;prompt : Single Block Read Time used is &amp;amp;&amp;amp;sreadtim (milliseconds)&lt;br /&gt;prompt :  Multi Block Read Time used is &amp;amp;&amp;amp;mreadtim (milliseconds)&lt;br /&gt;&lt;br /&gt;set heading on&lt;br /&gt;column table_name    heading 'Table'                     format a20&lt;br /&gt;column index_name    heading 'Index Name'                format a20&lt;br /&gt;column distinct_keys heading 'Number of|Distinct|Values' format 999,999&lt;br /&gt;column min_ndv       heading 'Min|NDV'                   format 999,999&lt;br /&gt;column selectivity   heading 'Selectivity'&lt;br /&gt;column sel           heading 'Sel?'&lt;br /&gt;column blocks        heading 'Blocks'                    format 999,999&lt;br /&gt;column mb            heading 'Table|MB'                  format 999,999&lt;br /&gt;&lt;br /&gt;select table_name, index_name, distinct_keys, min_ndv,&lt;br /&gt;      case&lt;br /&gt;        when (distinct_keys &lt; min_ndv)&lt;br /&gt;        then 'N'&lt;br /&gt;        else 'Y'&lt;br /&gt;      end sel,&lt;br /&gt;      (blocks * &amp;amp;&amp;amp;block_size / (1024 * 1024)) mb&lt;br /&gt; from ( select t.table_name, t.blocks, t.num_rows num_trows,&lt;br /&gt;                i.index_name, i.num_rows num_irows, i.distinct_keys,&lt;br /&gt;               &amp;amp;&amp;amp;effective_mbrc *&lt;br /&gt;((i.leaf_blocks + i.clustering_factor) / t.blocks) min_ndv&lt;br /&gt;          from user_ind_statistics i, user_tables t&lt;br /&gt;         where t.table_name = i.table_name&lt;br /&gt;           and i.num_rows != 0&lt;br /&gt;      )&lt;br /&gt;where distinct_keys &lt; min_ndv&lt;br /&gt;   and blocks &gt;= &amp;amp;&amp;amp;small_table_threshold&lt;br /&gt;order by table_name, index_name ;&lt;br /&gt;&lt;br /&gt;prompt Selectivity (S) indicates whether index will be chosen or not in a query&lt;br /&gt;prompt given the values of the database settings you have entered.&lt;br /&gt;prompt&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Running this should list indexes that will NOT generally be used by the Optimizer for any query you might run that references that table.  There might be exceptional cases where the Optimizer might use such an index e.g. an Index Fast Full Scan, but generally such indexes would not be used as expected for direct access to the matching rows.&lt;br /&gt;&lt;br /&gt;As ever, remember that higher values for db_file_multiblock_read_count lower the cost for a full table scan, and so lower the threshold at which a full table scan becomes cheaper than an index access.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7302956329008495023-3211837740501861781?l=databaseperformance.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://databaseperformance.blogspot.com/feeds/3211837740501861781/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7302956329008495023&amp;postID=3211837740501861781' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/3211837740501861781'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/3211837740501861781'/><link rel='alternate' type='text/html' href='http://databaseperformance.blogspot.com/2009/09/index-selectivity-analysis.html' title='Index Selectivity Analysis'/><author><name>John Brady</name><uri>http://www.blogger.com/profile/08719865814813032071</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='25' height='32' src='http://2.bp.blogspot.com/_NEp2JRhM8nE/SqURShDkFoI/AAAAAAAAAAM/5edJ6ALbvKU/S220/JB1.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7302956329008495023.post-3901575963778143064</id><published>2009-08-31T20:18:00.002+01:00</published><updated>2009-08-31T20:26:55.604+01:00</updated><title type='text'>Hash Join Costing</title><content type='html'>As already discussed in previous posts Oracle has different join methods available to it, and will choose the one with the lowest cost for any specific join.  So far I have described how the &lt;a href="http://databaseperformance.blogspot.com/2009/06/nested-loop-join-costing.html"&gt;Nested Loop&lt;/a&gt; and &lt;a href="http://databaseperformance.blogspot.com/2009/07/sort-merge-join-costing.html"&gt;Sort Merge&lt;/a&gt; Join methods work.  Now I will describe the Hash Join method.&lt;br /&gt;&lt;br /&gt;It shares some similarities with the Sort Merge Join method - both data sets are accessed separately, and then joined together - but is obviously different in how it does the join.  The cost formula for this is:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;[Outer Access Cost] + [Inner Access Cost] + [Hash Join Cost]&lt;/li&gt;&lt;/ul&gt;As for the Sort Merge method, this uses an area of memory and so its cost is affected by whether the controlling data set is smaller than the available memory or not.  If it is smaller, then the Hash Join will take place in memory, and the Hash Join Cost component is negligible being only a CPU based cost.  If the data set is larger, then the join takes place in stages, with intermediate results being written to disk.  Hence the Hash Join Cost component will increase significantly due to the disk I/Os involved in writing out the overflow data, and reading it back in again later on.&lt;br /&gt;&lt;br /&gt;The memory available for the Hash Join is the same as for the Sort Merge join, being derived from the pga_aggregate_target initialization parameter.&lt;br /&gt;&lt;br /&gt;The main difference between the Sort Merge and Hash Join methods is that the Sort Merge wants to have &lt;span style="font-weight: bold;"&gt;both&lt;/span&gt; data sets in memory before joining them together, whereas the Hash Join method only wants one of the data sets in memory before starting the join.  This makes a significant difference to the size of the data sets that can be processed by the Hash Join before it overflows to disk, compared to the Sort Merge join method.&lt;br /&gt;&lt;br /&gt;Also the Sort Merge Join wants the data sets sorted before they can be merged, which has an associated cost even if it is only CPU because the data can be sorted in memory.  The Hash Join however does not need the data sets pre-sorted in any way, and so avoids this extra cost.  This difference ensures that the Hash Join is almost always cheaper than a Sort Merge Join, except under exceptional circumstances.&lt;br /&gt;&lt;br /&gt;The Hash Join is particularily efficient compared to the Sort Merge Join where a small data set is joined to a much larger data set.  A Sort Merge Join requires that both data sets be sorted, which needs memory for each data set. Whereas a Hash Join only requires memory proportional to the smaller first data set and is unaffected by the size of the second data set.  This means that a Hash Join can avoid needing to overflow to disk when one of the data sets is much larger, compared to a Sort Merge Join which would need to overflow to disk to sort the larger data set.&lt;br /&gt;&lt;br /&gt;The Hash Join works by reading all of the rows in one of the data sets - termed the Outer data set as before - and puts all of them into a table like structure in memory, assuming that sufficient memory is available.  The data rows are placed in the memory table according to the values in the columns used for the join.  These values are used in a hash function - hence the name of the join method - that returns a value within the range of the size of the table in memory.  The data row is then placed in this location in the table - subject to some caveats.&lt;br /&gt;&lt;br /&gt;Then the second, Inner data set is read, the same hash function applied to the same join columns in this data set, and the corresponding location in the memory table read.  Now either there is a matching row in this location or there is not.  If there is a matching row, then the join takes place and a joined row is produced.  If there is no matching row then the inner data row is simply discarded.&lt;br /&gt;&lt;br /&gt;Thus the Hash Join method can be relatively fast, with little overhead apart from the hash function itself.&lt;br /&gt;&lt;br /&gt;The only caveat, as mentioned, is that it is possible for more than one outer data row to hash to the same location in the memory table.  When this happens the data row is basically stored in the next available higher location.  When the inner data row is hashed, Oracle compares the next locations in the memory table too, while they have the same join values.  This handles the situation of duplicate hash values from different data rows, with minimal overhead.&lt;br /&gt;&lt;br /&gt;Clearly the size of the hash table in memory is determined by the number of rows in the first or outer data set.  The Optimizer is able to swap the data sets around if one is larger than the other, so that it processes the smaller one first.  If this produces a significantly lower cost, then this swapped hash join will be used.&lt;br /&gt;&lt;br /&gt;When the size of the first data set is larger than the available memory, Oracle proceeds along the following lines.  Note that I have simplified the description, and omitted some of the details, but the gist of it is correct.  When hashing the first, outer data set, rows with hash values outside the size of the memory table are written out to disk.  But those that do hash within the in memory range are put into the memory table.  Then the second, outer data set is processed, and hashed in the same way.  Those that hash into an in memory range are processed as normal - join or no join - and those that don't are written out to disk.&lt;br /&gt;&lt;br /&gt;The range of hash values covered by the in memory table is adjusted, to cover the second range of values, and the two pending data sets on disk joined.  The saved first data set is read back in, and either hashed into memory or written out to another temporary area on disk.  Then the second data set is read in, hashed and either joined or not joined if it falls within the in memory range, or written out to another temporary area.  This is repeated until the remaining data from the first, outer data set fits in memory, and the second, inner data set processed the one final time.&lt;br /&gt;&lt;br /&gt;Note that Oracle has certain optimizations in the implementation of this hash mechanism to both discard rows from the second data set that will not join as early as possible, and to write the temporary pending data sets into pre-hashed sub-sets.  These optimizations minimize the total number of disk I/Os that need to be done and minimize multiple passes on the same data rows.  Thus the actual Hash Join mechanism differs in detail from my simplified summary of it.&lt;br /&gt;&lt;br /&gt;As ever, the costs can be verified by looking at the trace file output from the 10053 event.  As described, when the outer data set is small enough to fit in memory the cost of the Hash Join stepitself is very small, being only CPU operations.  The majority of the total Hash Join cost is simply the sum of the access cost to the two data sets being joined.&lt;br /&gt;&lt;ul&gt;&lt;li&gt;[Outer Access Cost] + [Inner Access Cost] + [Hash Join Cost]&lt;/li&gt;&lt;/ul&gt;Where the data sets are much larger than the available memory, then the Hash Join cost step itself increases significantly.  Unfortunately I have not yet been able to determine a single rule that can be applied to determine this cost.  Clearly it is dependent on the size of the outer data set, where the overflow data that will not fit in memory is written out to disk, and on the size of the inner data set that is also processed and written out to disk.  I have not been able to determine yet how the split between these two affects the reported cost of the Hash Join itself (a lack of time on my part unfortunately).  Rather than delay a posting on this I thought I would post what I have now, and later post anything else I discover about the Hash Join.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7302956329008495023-3901575963778143064?l=databaseperformance.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://databaseperformance.blogspot.com/feeds/3901575963778143064/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7302956329008495023&amp;postID=3901575963778143064' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/3901575963778143064'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/3901575963778143064'/><link rel='alternate' type='text/html' href='http://databaseperformance.blogspot.com/2009/08/hash-join-costing.html' title='Hash Join Costing'/><author><name>John Brady</name><uri>http://www.blogger.com/profile/08719865814813032071</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='25' height='32' src='http://2.bp.blogspot.com/_NEp2JRhM8nE/SqURShDkFoI/AAAAAAAAAAM/5edJ6ALbvKU/S220/JB1.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7302956329008495023.post-4057505772210491108</id><published>2009-08-04T20:24:00.002+01:00</published><updated>2009-08-04T20:28:55.961+01:00</updated><title type='text'>Data Modelling - Why it is important</title><content type='html'>A recent post by Robyn &lt;a href="http://adhdocddba.blogspot.com/2009/05/on-importance-of-good-data-model.html"&gt;on the importance of a good data model&lt;/a&gt;, especially in a development environment using Agile practices made me sit up and think "Yes, I agree with you".  So here are some thoughts of my own on why a Data Model is so important on large data processing applications that store their data in a database.  If your application is neither large nor stores its data in a database, then you can probably ignore these thoughts.&lt;br /&gt;&lt;br /&gt;First, let me state that I think the most important thing for the class of Application that I am discussing here is the Data.  You could go so far as to say "Its &lt;span style="font-weight: bold;"&gt;ALL&lt;/span&gt; about the Data", and not the processing done by the Application. &lt;br /&gt;&lt;br /&gt;Applications involving financial or medical data, for instance.  I am sure that Customers of these classes of Applications care more about the integrity and correctness of their Data than they do about the features offered by the Application.  Their Data must never be corrupted, or deleted, or lost through broken links to other data, or incorrect (miscalculated financial amounts or dates).  It is the Data that is of value, not the processing of the Data.  In fact the Data may live beyond the current application, and be migrated over to the next one and continue to live in one form or another.&lt;br /&gt;&lt;br /&gt;Of course Data on its own is no use to anyone.  Data almost always exists with an Application that manipulates it in one way or another.  So the two can never be separated, or considered as completely independent problems.  But it is Data first, Application second.&lt;br /&gt;&lt;br /&gt;Given the importance then of the Data, we need a Data Model that defines the structure of this Data being manipulated by the Application.  To me such a Data Model involves both a Design and full Documentation of that Design.  And by "full Documentation" I simply mean "adequate" to capture all the necessary information about the Data and the Design, so that someone else coming along later could understand it.  What use is a Data Model that no one can understand?&lt;br /&gt;&lt;br /&gt;In my view the Data and the "Data Model" of it should stand in their own right. This is because the data and its meaning should be constant and consistent, regardless of which particular application is manipulating it. Remember - Data first, Application second.&lt;br /&gt;&lt;br /&gt;By a "Data Model" I mean a high level Logical or Conceptual Model, and the main data entities and the relationships between them. This should always be the same model for a given set of data, regardless of which particular application is manipulating the Data itself.&lt;br /&gt;&lt;br /&gt;Of course the details of the implementation of this Conceptual Model can differ. And all specific applications will require unique elements within the physical database - extra staging tables to hold intermediate results, flags to indicate the type of data or entry, status codes, etc. But these should be in addition to the core data elements identified in the Conceptual Model.&lt;br /&gt;&lt;br /&gt;In other words, although the Physical Database will have lots of extra data columns and tables, you should still be able to map large parts of it back to the Entities in the Conceptual Model very easily.&lt;br /&gt;&lt;br /&gt;If the "Data Model" is somehow unique and tied in to an Application and its implementation, and cannot be separated from it, then this is admitting that it does not capture the true "meaning" of the data, and that such a database could not be used by another application, or be queried directly.&lt;br /&gt;&lt;br /&gt;The Data Model is the Foundation of an Application, just as we have a real Foundation of a House.  Imagine what it would be like to build a house room by room, with only minimal foundations for each room as we build it?  First a one room house with a roof.  Then later you extend it for another room - extra foundations dug, change one existing wall, new walls built, roof extended.  Then 2 more rooms - again dig foundations, build walls, extend the roof over them.  Then you go up a storey by building some rooms on top - rip off the complete roof, build on top of the existing walls to increase their height, put in a new ceiling/floor between the two floors, and a new ceiling and a new roof.  Maybe you go into the roof, instead of putting on a new floor.  The roof will still be modified, and extended outward for windows to be put in.  A floor will have to be laid, and stairs put in.  Any water plumbing or electricity ring mains will have to be extended too on each new development.  Is this an efficient way to build a house?&lt;br /&gt;&lt;br /&gt;No.  The sensible way is to dig all the foundations at once, and then build only what you needed or could afford - maybe only the 2 room model initially - on top of this.  When you came to extend later it would be much easier - no digging of foundations, or realigning walls.  Allowances would have been made for this in advance.&lt;br /&gt;&lt;br /&gt;What I have seen in reality is that newer development methods (Object Orientation) and tools (Java) let developers more easily write code that manipulates data "objects" internally, and then somehow magically persist that data to a database. So the developers focus on application functionality, assuming that the "data persistence" problem will be taken care of somehow. And when it does not happen "transparently" and there are "issues", it then becomes a database administrator area problem and not a development area problem.&lt;br /&gt;&lt;br /&gt;The developers argue that they are taking a "model led design", but unfortunately it is an "object" led design, and not a "data" led design. From my experience, the object to relational mapping is non-trivial, and many object led designs are imperfect - there are major revisions between version 1.0 and 2.0. So what happens to the database structure between version 1.0 and version 2.0? And more importantly, what happens to all of the customer's data in that database?  You cannot just throw the data away and start with fresh empty tables again.&lt;br /&gt;&lt;br /&gt;Also, in many Object Models there can be a lot of repetition between different objects. Customers, People, and Organisations for instance will all have names and addresses. If the Data Model were designed first this commonality would be recognised, and a single set of tables designed to store this data. Whether one shared table or separate identical tables does not matter - it is that the design is the same for all of them. But with separate development teams designing their own objects as they go, you run the risk of having separate Address objects for each piece of functionality developed, all differing in one way or another.&lt;br /&gt;&lt;br /&gt;Data and Objects are not the same thing. They may have a lot in common, but they are not the same thing. And while developers continue to propagate this myth, we will continue to have poor data models, and poor databases, and poor performance as a result. If you believe that the data is important, then you should adopt a data led design approach.  I do not see this as necessarily conflicting with other development methodologies such as Agile.  As long as a data led design is incorporated into it.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7302956329008495023-4057505772210491108?l=databaseperformance.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://databaseperformance.blogspot.com/feeds/4057505772210491108/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7302956329008495023&amp;postID=4057505772210491108' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/4057505772210491108'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/4057505772210491108'/><link rel='alternate' type='text/html' href='http://databaseperformance.blogspot.com/2009/08/data-modelling-why-it-is-important.html' title='Data Modelling - Why it is important'/><author><name>John Brady</name><uri>http://www.blogger.com/profile/08719865814813032071</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='25' height='32' src='http://2.bp.blogspot.com/_NEp2JRhM8nE/SqURShDkFoI/AAAAAAAAAAM/5edJ6ALbvKU/S220/JB1.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7302956329008495023.post-8408365424218295953</id><published>2009-07-20T21:15:00.002+01:00</published><updated>2009-07-20T21:31:53.798+01:00</updated><title type='text'>Sort Merge Join Costing</title><content type='html'>For a Sort-Merge Join, each data set (Outer and Inner) is read separately and sorted by the joining columns.  The resultant sorted data sets are then joined together.  The cost formula for this is:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;[Outer Access Cost] + [Sorting Outer Cost] + [Inner Access Cost] + [Inner Sort Cost]&lt;/li&gt;&lt;/ul&gt;We know how to calculate the two data Access Costs from earlier posts for different tables and the indexes available on them.  How does the Optimizer cost the two Sorts?&lt;br /&gt;&lt;br /&gt;The two key points are - how big is the data set to be sorted, and how much memory is available for use as a sort area? &lt;br /&gt;&lt;br /&gt;If both of the data sets fit into the available memory, then they can be sorted directly into memory as they are read from their sources (previous steps in the execution plan).  This avoids additional disk I/O and only has a CPU cost for the Sorts.  The Optimizer calculates an approximate cost for this CPU component according to its own internal formula.  Although this CPU cost is included by the Optimizer in its costs, for our purposes we can ignore it.  This is because it is generally so small compared to the I/O costs involved in getting the data from disk in the first place, that it is negligible and has no impact on the final choice of which execution plan to use.&lt;br /&gt;&lt;br /&gt;If you are using Oracle 10g and its new initialization parameters, with "&lt;span style="font-style: italic;"&gt;workarea_size_policy&lt;/span&gt;" set to &lt;span style="font-style: italic;"&gt;auto&lt;/span&gt;, then the amount of memory used by Optimizer as the query sort area will be a percentage of the memory specified for "&lt;span style="font-style: italic;"&gt;pga_aggregate_target&lt;/span&gt;".  (If these are not set correctly, then it will use the value of the older "&lt;span style="font-style: italic;"&gt;sort_area_size parameter&lt;/span&gt;" - but this is now deprecated and should not be used.)&lt;br /&gt;&lt;br /&gt;The percentage it will use is not documented anywhere, but there are several references to 5% on the web.  However, in the testing that I did I saw nearer 20% being used.  This could be due to the fact that nothing else was happening on this system, so all of the PGA memory was free and available.  Or that I had a relatively small PGA aggregate target of 32 MB, and there is a minimum allocation size from within the PGA memory area.&lt;br /&gt;&lt;br /&gt;Either way, if you are trying to estimate the cost yourself, it is probably best to use 5% of the PGA memory as your value.&lt;br /&gt;&lt;br /&gt;To calculate the size of the data being sorted you need to know the following:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;The columns being retrieved from each table.  Oracle only processes those data columns it needs, and not the whole data row.  The columns used will include those needed for the query output itself, plus those needed for filter conditions and joins to other data sets.&lt;/li&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;The size of each of these columns, added together for each data set to give a data total.&lt;/li&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;An additional 6 bytes per data row, as a unique identifier for it, internal to the processing by the Optimizer.  This is present in all data sets, regardless of the join method.&lt;/li&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;An additional 12 bytes per data row for sort processing, on 32 bit systems.  This would be doubled to 24 bytes on 64 bit systems, as they are memory address pointers. [Jonathan Lewis, Cost Based Oracle Fundamentals].&lt;/li&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;The number of rows or cardinality of each data set.&lt;/li&gt;&lt;/ul&gt;Adding together the row element sizes and multiplying by the cardinality of the data set, gives the total space needed for each data set.  If either of these exceeds the available memory, then the Optimizer assumes that the Sort will be done in chunks, or sets, each written out to disk, and then each sorted sub-set merged together into the final sorted result set. &lt;br /&gt;&lt;br /&gt;During this Merge phase, Oracle needs to read in blocks of sorted data from &lt;span style="font-weight: bold;"&gt;all&lt;/span&gt; of the sorted sets into memory at the same time, in order to properly merge them together.  For efficiency Oracle uses multi-block reads on the sorted sets, because they are already sorted sequentially on disk and are being used in that order.  Multi-block reads reduces the total number of read disk I/Os needed to read in all the sorted sets.  Thus the number of sorted sub-sets that can be merged simultaneously is limited by the memory available, as mentioned earlier, and the size of the multi-block reads.  The Optimizer terms this the "Sort width" in some of its trace output, but it means the number of sort sets that can be merged together at once.&lt;br /&gt;&lt;br /&gt;If there is enough memory after the Sort phase to Merge all of the sorted sets together at once, then they are indeed merged this way, and this is termed a "single pass merge" - one processing of the sorted sets to merge them.&lt;br /&gt;&lt;br /&gt;If there is not enough memory to Merge all sorted sets at once, then Oracle will merge as many as it can, and produce a single, larger sorted set of data rows from these.  It will then repeat this merging across the next group of sorted sets, and so on to iterate over all of the sets from the initial sort.  This will result in a smaller number of larger sorted sets, which hopefully can now be merged together.  If not, the merging of sets is repeated until the number of sets is reduced to that which can be merged together.  This is termed a "multi-pass merge".&lt;br /&gt;&lt;br /&gt;Assuming that the data to be sorted is larger than the memory available, then the cost of the initial sort phase into sorted sets is the cost of writing out the data blocks using single block writes, plus the CPU cost to do the sort.  Again, as before, the CPU cost is trivial compared to the disk I/O cost, but both are calculated by the Optimizer.  The cost of the Merge phase is again the cost to read in the data from disk, plus the CPU cost of the merge. &lt;br /&gt;&lt;br /&gt;When reading the data back in to merge, it seems that Oracle wants to use multi-block reads as described, but may not be able to achieve that for all the blocks for some reason or another.  The Optimizer seems to base its calculations on achieving multi-block reads for two thirds of the sorted blocks, but not for the other third i.e. single block I/O used for this.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Sort-Merge Example&lt;/h3&gt;&lt;br /&gt;The calculations here produce results that agree with the costs given in a 10053 event trace file.  In fact, it was a number of such trace files for different data sets that was used to derive the formula presented.&lt;br /&gt;&lt;br /&gt;The Outer data set has 7 bytes of data, plus the 6 overhead is 13 bytes of data per row, with cardinality of 500,000 rows.  The cost to produce this has been calculated by the Optimizer as being 245.84.&lt;br /&gt;&lt;br /&gt;The Inner data set has 3 bytes of data, plus 6 overhead gives 9 bytes of data per row, with cardinality of 100,000 rows.  The Optimizer has calculated the cost to produce this as 270.83.&lt;br /&gt;&lt;br /&gt;Adding on the 12 bytes per row used by the sort operation, gives 25 bytes per Outer data row and 21 bytes per Inner data row.  The Outer data volume is therefore 500,000 * 25 = 12,500,00 bytes = 11.92 MB (dividing by 1024 twice).  The maximum memory available for the sort is 6.4 MB in this case, so the data will be sorted to disk and then merged back together.  The sorts can be done using 2 sort sets, so only a single pass merge is required.&lt;br /&gt;&lt;br /&gt;It seems that when the Outer data set is sorted in multiple sets, so is the Inner data set too, even though it could fit in memory in this case.&lt;br /&gt;&lt;br /&gt;To calculate the I/Os involved, we must calculate the number of blocks required to store each of these data sets of 25 and 21 bytes per data row each.  With an 8 KB block size, the calculations of the number of blocks are:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Outer Rows per block = floor (8192 / 25) = 327&lt;/li&gt;&lt;li&gt;Outer blocks = ceiling (500,000 / 327) = 1530&lt;/li&gt;&lt;li&gt;Inner Rows per block = floor (8192 / 21) = 390&lt;/li&gt;&lt;li&gt;Inner blocks = ceiling (100,000 / 390) = 257&lt;/li&gt;&lt;/ul&gt;To each of these another block is added, presumably some kind of master control block, giving 1531 and 258 blocks per data set.&lt;br /&gt;&lt;br /&gt;The sort phase cost is that of writing out the blocks using single block I/O i.e. 1531 and 258.  The merge phase cost is two thirds multi-block I/O and one third single block I/O.  Remember that multi-block I/O is adjusted by Oracle by the ratio of single block read to multi-block read and the value used for Multi-Block Read Count (MBRC, typically being the db_file_multiblock_read_count initialization parameter), as seen previously in the formula for a Full Table Scan.  Using typical values of these of 8 for MBRC and 12 for SREADTIM and 26 for MREADTIM, gives an adjusted MBRC value of 3.69.&lt;br /&gt;&lt;br /&gt;However, Oracle does not use the value of "&lt;span style="font-style: italic;"&gt;db_file_multiblock_read_count&lt;/span&gt;" for sorts, and instead uses an undocumented parameter of "&lt;span style="font-style: italic;"&gt;_smm_auto_min_io_size&lt;/span&gt;", which defaults to 56 KB.  This is 7 blocks in size in this case.&lt;br /&gt;&lt;ul&gt;&lt;li&gt;1530 / 3 = 510&lt;/li&gt;&lt;li&gt;510 * 2 = 1020 (multi-block I/O)&lt;/li&gt;&lt;li&gt;Multi-Block Cost is (1020 * 26) / (7 * 12) = 26520 / 84 = 315.714 i.e. 316 I/O cost units&lt;/li&gt;&lt;/ul&gt;The Merge phase cost is the sum of both types of I/O - 510 + 316 = 826.  The Sort phase cost is 1020.  And the total Sort/Merge I/O cost is 1530 + 826 = 2356.&lt;br /&gt;&lt;br /&gt;Likewise for the Inner data set the Sort phase cost is 258.  The Merge cost is derived from:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;258 / 3 = 86&lt;/li&gt;&lt;li&gt;86 * 2 = 172&lt;/li&gt;&lt;li&gt;Multi-Block Cost is (172 * 26) / (7 * 12) = 4472 / 84 = 53.23 i.e. 54 I/O cost units&lt;/li&gt;&lt;li&gt;Total Merge cost is 86 + 54 = 140.&lt;/li&gt;&lt;li&gt;Total Sort / Merge cost is 258 + 140 = 398.&lt;/li&gt;&lt;/ul&gt;These results agree very closely with those reported in the 10053 event trace file for this query.  This reports the Merge cost separately as "IO Cost / pass", and the "Total IO sort cost" which is actually for the complete Sort / Merge.  The Sort cost itself must be deduced by subtracting these two values from each other.&lt;br /&gt;&lt;br /&gt;For the Outer data set the 10053 trace file reported 2349 as the total Sort/Merge cost, and 400 for the Inner data set.  These differences can be explained as follows:&lt;br /&gt;&lt;br /&gt;The Optimizer used a calculated cardinality of 497,392.56 for the Outer data set, instead of 500,000.  This resulted in a lower sort block count of 1523 instead of 1530.&lt;br /&gt;&lt;br /&gt;After dividing the block count by three into thirds, it seems to add on two more blocks, presumably to account for an uneven division.  Thus 1523 / 3 gives 508 blocks rounded up, but the Optimizer uses 510.  Likewise instead of 86 for the third of 258, it uses 88.&lt;br /&gt;&lt;br /&gt;This gives the modified results of:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Outer Merge I/O Cost = 510 + 316 = 826&lt;/li&gt;&lt;li&gt;Outer Sort / Merge I/O Cost = 1523 + 826 = 2349 - as reported in the 10053 file&lt;/li&gt;&lt;li&gt;Inner Merge I/O Cost = 88 + 55 = 143 - 142 is reported in the 10053 file&lt;/li&gt;&lt;li&gt;Inner Sort / Merge I/O Cost = 258 + 143 = 401 - 400 is reported in the 10053 file&lt;/li&gt;&lt;/ul&gt;Using the values reported in the 10053 file, the total cost for the Sort Merge Join is:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;[Outer Access Cost] + [Sorting Outer Cost] + [Inner Access Cost] + [Inner Sort Cost]&lt;/li&gt;&lt;li&gt;245.84 + 2349 + 270.83 + 400 = 3265.67&lt;/li&gt;&lt;/ul&gt;The reported value in the 10053 trace file is 3331.02, of which 3254.00 is the total I/O cost.  The missing 65.35 (3331.02 - 3265.67) is presumably the CPU cost of the two sorts and their merges and the production of the Outer and Inner data sets.  As predicted this is much lower than the I/O cost of the sorts and merges.&lt;br /&gt;&lt;br /&gt;The heuristic presented of the Merge reading two thirds of the sorted blocks using multi-block reads, and single blocks reads for the remaining third, was arrived at by varying the values of different inputs that the Optimizer would use, and observing the impact on the costs reported.  Whenever the value of &lt;span style="font-style: italic;"&gt;_smm_auto_min_io_size&lt;/span&gt; was adjusted there was a corresponding change in the reported value of the Merge cost (IO Cost / pass).  From this the splitting into thirds was deduced, and also that the Sort cost was simply the number of blocks to be sorted due to using single block I/O.  Once hypothesized, all subsequent tests agreed with this formula, and none disagreed.  Increasing the number of rows to be sorted, or the width of the rows, or the PGA Aggregate Target did change the number of sort sets produced, but the formula presented here still held and the Sort and Merge I/O costs reported agreed with those calculated.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7302956329008495023-8408365424218295953?l=databaseperformance.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://databaseperformance.blogspot.com/feeds/8408365424218295953/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7302956329008495023&amp;postID=8408365424218295953' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/8408365424218295953'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/8408365424218295953'/><link rel='alternate' type='text/html' href='http://databaseperformance.blogspot.com/2009/07/sort-merge-join-costing.html' title='Sort Merge Join Costing'/><author><name>John Brady</name><uri>http://www.blogger.com/profile/08719865814813032071</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='25' height='32' src='http://2.bp.blogspot.com/_NEp2JRhM8nE/SqURShDkFoI/AAAAAAAAAAM/5edJ6ALbvKU/S220/JB1.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7302956329008495023.post-3208919751998616579</id><published>2009-06-22T14:20:00.002+01:00</published><updated>2009-06-22T14:41:50.079+01:00</updated><title type='text'>Nested Loop Join Costing</title><content type='html'>As described previously, the optimizer has 3 main join methods available when joining 2 data sets together:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Nested Loop Join&lt;/li&gt;&lt;li&gt;Sort Merge Join&lt;/li&gt;&lt;li&gt;Hash Join&lt;/li&gt;&lt;/ul&gt;Each method has a different associated formula for its cost of execution.  Here I will look at the Nested Loop Join, which is the most straightforward in some respects.  Some of this information, such as the costing formula, are given in the Oracle Performance Tuning Guide Manual, in the chapter on the Query Optimizer. &lt;br /&gt;&lt;br /&gt;A Nested Loop Join reads data from one data set - the Outer data set - and for each data row retrieves corresponding rows from the other data set - the Inner data set.  Clearly the Inner data set is accessed separately for each data row in the Outer data set.  And the number of rows in a data set is termed its Cardinality. Hence the cost formula for a Nested Loop Join is as follows, where "Outer" and "Inner" refer to their respective data sets, and square brackets used to enclose each individual element:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;[Cost of Outer Access] + ([Cardinality of Outer] * [Cost of Inner Access])&lt;/li&gt;&lt;/ul&gt;The Outer and Inner costs will have been determined earlier by the Optimizer, either as a single table access cost or as the cost of a previous join between data sets.  Thus these component costs do not have to be recalculated by the Optimizer.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Example&lt;/h3&gt;&lt;br /&gt;Consider the following query using a table with 100,000 rows in it over 1252 * 8 KB blocks:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;select it1.i1&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;  from insert_test_1 it1, insert_test_1 it2&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt; where it1.i1 = it2.i4 &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;   and it1.i3 = 99     &lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I am using the same table twice in the query, with an alias for each occurrence of it.&lt;br /&gt;&lt;br /&gt;There are indexes on each column i.e. on i1, i2 (not used here), i3 and i4.  The i1 index is unique, but the others are not.&lt;br /&gt;&lt;br /&gt;Remember that the Optimizer starts with the data set with the lowest cardinality (smallest number of data rows).  In this case this is "it1", because the "i3" column has 100 rows for each possible value.  This gives a cardinality of 100 for "it1" as the Outer data set.  The lowest cost access to this is using the index on this, which has a cost of 102 (costed using index range scan described in an earlier post).&lt;br /&gt;&lt;br /&gt;To join to "it2" in this case, the index on "i4" can be used.  In this case no other data columns are needed from the "it2" table, and thus the cost is really that of reading the index leaf block with that particular value.  For this index all entries for one value fit within one leaf block (leaf blocks per key value is 1), so the access cost is just 1 to read that particular leaf block.&lt;br /&gt;&lt;br /&gt;The Nested Loop Cost is therefore:-&lt;br /&gt;&lt;ul&gt;&lt;li&gt;102 + (100 * 1) = 102 + 100 = 202&lt;/li&gt;&lt;/ul&gt;This was verified by examining a 10053 event trace file when this SQL was executed.  The costs reported differ only after the decimal point, due to a combination of rounding of values within Oracle and additional minor costs such as CPU cost (in 10g).&lt;br /&gt;&lt;br /&gt;The 10053 trace file also showed that the Optimizer also costed other variations of the Nested Loop Join.  This agrees with the statement that the Optimizer costs all possible access and join plans to determine the cheapest one.&lt;br /&gt;&lt;br /&gt;Also considered are a full table scan of it2, at a cost of 243.1.  The Nested Loop cost of this should be:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;102.1 + (100.2 * 243.1) = 102.1 + 24358.2 = 244460.3&lt;/li&gt;&lt;/ul&gt;The 10053 trace file actually reported a cost of 24580.26 total.  Again this would be different due to rounding of results and additional CPU costs.  Nevertheless, it is very close to our calculated value for the I/O cost only.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7302956329008495023-3208919751998616579?l=databaseperformance.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://databaseperformance.blogspot.com/feeds/3208919751998616579/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7302956329008495023&amp;postID=3208919751998616579' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/3208919751998616579'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/3208919751998616579'/><link rel='alternate' type='text/html' href='http://databaseperformance.blogspot.com/2009/06/nested-loop-join-costing.html' title='Nested Loop Join Costing'/><author><name>John Brady</name><uri>http://www.blogger.com/profile/08719865814813032071</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='25' height='32' src='http://2.bp.blogspot.com/_NEp2JRhM8nE/SqURShDkFoI/AAAAAAAAAAM/5edJ6ALbvKU/S220/JB1.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7302956329008495023.post-2252602221435161191</id><published>2009-06-15T21:05:00.001+01:00</published><updated>2009-06-15T21:11:28.009+01:00</updated><title type='text'>Optimizer Costing of Joins</title><content type='html'>Previously I have described how the Oracle Cost Based Optimizer costs all possible execution plans for a SQL statement to arrive at the one with the lowest cost (cheapest), and how it costs access to single tables using access methods of Full Table Scan and Index Range Scans.  There are a few other potential access methods to a single table, but these are the primary ones that occur most if not all of the time.  I will now describe how the Optimizer goes about costing joins between tables.&lt;br /&gt;&lt;br /&gt;Although we often say that a query involves processing data from tables, it is important to realise that internally the query processes a series of data sets.  Logically each step in the execution plan produces a result set of data rows, that is in turn consumed by its parent step within the execution plan.  So although we might talk about joining "tables" together, the Optimizer is actually always joining "data sets" together.  Some of these data sets do come from tables, while others may come from other steps within a more complex execution plan e.g. from a join of other data sets.&lt;br /&gt;&lt;br /&gt;The Optimizer normally only joins two data sets together at a time.  There are some exceptions, such as using Bit Map Indexes on Star schemas in Data Warehouses.  But for most transaction processing systems, the joins will be between two data sets at a time.  Joining more than two tables together simply involves the Optimizer first joining two of the tables together to produce a resultant data set, and then joining this to the third table, and so on to join together all the tables in the query.&lt;br /&gt;&lt;br /&gt;First, how does the Optimizer determine the lowest cost join execution plan?  In simple terms it costs each and every possible join method between the pair of data sets being joined, and chooses the cheapest one from these.  That is: it considers different join methods for joining Data Set 1 to Data Set 2, and also considers the join the other way around - Data Set 2 to Data Set 1.&lt;br /&gt;&lt;br /&gt;When there are more than two tables to join together, the Optimizer simply iterates over all the possible join combinations between them and the different join methods for each.  It picks one data set as the first one, another as the second, costs the joins between them to get a cheapest, then costs the joins of this to the third data set. Then the next iteration does the first data set to the third, and in turn to the second.  Then it repeats for all the other combinations - second to first to third, second to third to first, and so on.  Thus all possible join paths between the data sets are costed, and the cheapest one chosen.&lt;br /&gt;&lt;br /&gt;Second, what join methods are available?  There are 3 main join methods:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Nested Loop Join&lt;/li&gt;&lt;li&gt;Sort Merge Join&lt;/li&gt;&lt;li&gt;Hash Join&lt;/li&gt;&lt;/ul&gt;Each method has a different associated formula for its cost of execution.  Generally this is the cost of accessing each data set, already calculated previously by the Optimizer, and the cost of the join itself.&lt;br /&gt;&lt;br /&gt;Note that when I say "cost of accessing each data set" for a join between two data sets, the actual access method can be different for each join method, and so have a different cost.  It is not required that all three join methods use the same access method to get to the data sets they need to join.  The Optimizer will cost the join methods with the appropriate data access and choose the cheapest one.&lt;br /&gt;&lt;br /&gt;Thus for 2 tables and 3 join methods the Optimizer must cost 6 separate possible join combinations. For 3 tables this produces 6 different join combinations between them, each costed for the 3 join methods, for a total of 18 different possible join costs.  For 4 tables it becomes 72 different joins to cost, and so on.&lt;br /&gt;&lt;br /&gt;Fully costing each join combination properly would be very time consuming, so the Optimizer has a number of improvements to reduce this.  For the first join combination it starts with the data set with the lowest cardinality - smallest estimated number of rows in it.  In turn it joins this to the data set with the next lowest cardinality, and so on.  This is because this is likely to produce a relatively low cost plan due to it processing less data rows initially.  This is not guaranteed to be the lowest cost, but will generally be relatively low.  This first join combination is then remembered as the currently lowest join cost.&lt;br /&gt;&lt;br /&gt;When costing subsequent join combinations, if the partial cost so far of a join plan is greater than the currently lowest join cost, then it is abandoned (pruned). This makes sense, because costs can only increase for an execution plan as other costs are accumulated within it.  Furthermore, sometimes the cost for just accessing the first data set may be greater than the current lowest join cost, and such a join can be pruned without any further consideration or costing of different join methods.  Remember that single table access paths are costed first by the Optimizer for each table referenced in the query.  So it already knows what these costs are, with no further calculations required.&lt;br /&gt;&lt;br /&gt;Thus the Optimizer is able to cost each join combination and each join method between the data sets, to arrive at the cheapest join cost.  The internal optimizations attempt to minimise the total effort involved in costing each of these, by picking a relatively low cost join combination as the first one to be costed, and by immediately pruning out any alternative join plan as soon as its partial cost exceeds the current lowest cost join.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7302956329008495023-2252602221435161191?l=databaseperformance.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://databaseperformance.blogspot.com/feeds/2252602221435161191/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7302956329008495023&amp;postID=2252602221435161191' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/2252602221435161191'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/2252602221435161191'/><link rel='alternate' type='text/html' href='http://databaseperformance.blogspot.com/2009/06/optimizer-costing-of-joins.html' title='Optimizer Costing of Joins'/><author><name>John Brady</name><uri>http://www.blogger.com/profile/08719865814813032071</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='25' height='32' src='http://2.bp.blogspot.com/_NEp2JRhM8nE/SqURShDkFoI/AAAAAAAAAAM/5edJ6ALbvKU/S220/JB1.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7302956329008495023.post-9089285813673724909</id><published>2009-05-27T19:32:00.002+01:00</published><updated>2009-05-27T19:47:04.261+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='optimizer'/><category scheme='http://www.blogger.com/atom/ns#' term='SQL cost'/><category scheme='http://www.blogger.com/atom/ns#' term='oracle'/><title type='text'>Optimizer Costing of Index Scans</title><content type='html'>Previously I gave a brief overview of the fact that the optimizer costs each potential execution plan for a SQL query and then chooses the one with the lowest cost, and described how it costs a Full Table Scan.  Now I will carry on and look at the cost calculations for using an Index to access data in a table.  Again, most of this is based on the information in &lt;a href="http://www.centrexcc.com/A%20Look%20under%20the%20Hood%20of%20CBO%20-%20the%2010053%20Event.pdf"&gt;A Look Under The Hood Of CBO: The 10053 Event&lt;/a&gt; by Wolfgang Breitling, and various tests I have done to confirm the details of this.  Now we will look at the calculation of costs for table access using an Index.&lt;br /&gt;&lt;br /&gt;Note that I am concentrating here on the I/O cost as that is the largest component of the cost of an execution plan.  However, there is also a CPU cost, which is not shown here.  Typically this is far less than the I/O cost, and is so in the cases discussed so far.  It is possible for some execution plans that the CPU cost is significant, and this will increase the total cost of such an execution plan.&lt;br /&gt;&lt;br /&gt;Just as Oracle can scan all blocks in a table, so it can scan all leaf blocks in an index.  This scan can be performed in two different ways potentially - as a scan using single block reads, or a scan using multi-block reads as for a table scan.  The former is termed an Index Full Scan, while the latter is termed an Index Fast Full Scan.  The cost calculations of each are based on the number of Leaf Blocks in that particular index.&lt;br /&gt;&lt;br /&gt;For a Full Scan the cost formula seems to be:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;#Levels + #Leaf Blocks&lt;/li&gt;&lt;/ul&gt;For a Fast Full Scan the cost formula uses MBRC as for a Full Table Scan:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;#Levels + (#Leaf Blocks * MREADTIM / (MBRC * SREADTIM))&lt;/li&gt;&lt;/ul&gt;Testing with different queries and different indexes shows this to be the case. &lt;br /&gt;&lt;br /&gt;Note that I have seen Oracle report both access methods as an "INDEX FAST FULL SCAN" operation in an EXPLAIN PLAN (DBMS_STATS.DISPLAY_CURSOR).  I wonder whether this is a bug within the labelling of operations within the Plan Table used?&lt;br /&gt;&lt;br /&gt;Normally however indexes are used to implement a predicate condition, and restrict the number of rows being returned.  To do this the index is traversed from the root block down through the levels of the index to the leaf blocks that contain the matching values.  Not all leaf blocks will be visited - only those containing a matching entry for the predicate condition.  The Oracle Optimizer terms this restriction on the number of matching entries the Filter Factor.  It is the proportion of rows in the whole table that will match the predicate condition i.e. a value always less than 1. Thus the cost of using only the index itself is:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;#Levels + (#Leaf Blocks * Filter Factor)&lt;/li&gt;&lt;/ul&gt;In many situations some data needed for the query is not present in the index, and so the data row needs to be read from the table itself.  Thus the cost increases by the cost of reading these data rows, which is essentially the number of rows matching the predicate condition:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;#Rows * Filter Factor&lt;/li&gt;&lt;/ul&gt;There are however several refinements which need to be made to these formulae to be reflect how the Optimizer actually costs the access paths.&lt;br /&gt;&lt;br /&gt;First, it is possible that matching data rows are stored in the same data block.  In such cases the data block is only read once, even though multiple data rows are used from it.  Such clustering of similar values will reduce the number of data blocks that need to be read i.e. the read of one data block from disk can satisfy multiple entries from the same leaf block.  Oracle calculates this clustering as a statistic for each index, and uses this to report the number of data blocks that would need to be read – the Clustering Factor. This is visible in the CLUSTERING_FACTOR column in the USER_INDEXES and USER_IND_STATISTICS data dictionary views.&lt;br /&gt;&lt;br /&gt;For many indexes it will be the case that the Clustering Factor is the same value as the number of Rows in the table – no co-location of data values for that particular column.  But for some indexes, the Clustering Factor will be much lower, such as an index on the date the data was loaded, or an incrementing sequential account number.  The Clustering Factor Statistic factors in the benefits of the reduced disk I/O from this co-location of Index and Data entries, to better estimate the real disk I/O that would actually take place.&lt;br /&gt;&lt;br /&gt;The final full index cost formula is thus:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;#Levels + (#Leaf Blocks * Filter Factor) + (Clustering Factor * Filter Factor)&lt;/li&gt;&lt;/ul&gt;Second the Filter Factor is calculated in different ways for different types of predicate conditions within a query.  For now I will restrict myself to equality conditions (=), as they are the most common.  Please read Wolfgang's Under The Hood paper for information on the Filter Factor for other conditions.&lt;br /&gt;&lt;br /&gt;For a column with no Histograms (this is important), the Filter Factor for an equality condition will be the density of data values within that column - a measure of the proportion of data rows that have the same value for that column.  This is true whether the condition uses a literal value or a bind variable.  In turn the density value is simply the inverse of the number of distinct values in that column i.e. 1 / NDV.  This count is available in the DISTINCT_KEYS column of USER_IND_STATISTICS and USER_INDEXES.&lt;br /&gt;&lt;br /&gt;The Filter Factor can also be used to calculate the number of matching rows produced by the predicate condition i.e. the Cardinality of the result set.  This is used by the Optimizer when multiple conditions are combined, either on the same table or joined tables.&lt;br /&gt;&lt;br /&gt;Filter Factors on multiple conditions in a query are combined together according to the standard rules for probabilities:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;c1 AND c2    ==&gt;    FF1 * FF2&lt;/li&gt;&lt;li&gt;c1 OR c2    ==&gt;    FF1 + FF2 – (FF1 * FF2) i.e. exclude duplicate matches&lt;/li&gt;&lt;/ul&gt;If the column has a Histogram on it, the Density value will be calculated from the Density values of each bucket within the Histogram that fall within the query range.  This might be close to 1 / NDV for the whole table, or it might be different.  That is one way in which Histograms can have such a big impact on query execution plans.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Examples&lt;/h3&gt;&lt;br /&gt;Using the index cost formula of:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;#Levels + (#Leaf Blocks * Filter Factor) + (Clustering Factor * Filter Factor)&lt;/li&gt;&lt;/ul&gt;For a query with a condition on column C3, which has an index on it with a Density of 0.001002, 250 Leaf Blocks, 1 Level and 100,000 rows and the same value for Clustering Factor the cost should be:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;1 + 250 * 0.001002 + 100000 * 0.001002 = 1 + 0.2505 + 100.2 = 101.4505&lt;/li&gt;&lt;/ul&gt;The reported I/O cost by Oracle was 102, which agrees very closely with our calculated value.  The difference is probably due to rounding errors.  The expected cardinality is 100,000 * 0.001002 = 100.2.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7302956329008495023-9089285813673724909?l=databaseperformance.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://databaseperformance.blogspot.com/feeds/9089285813673724909/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7302956329008495023&amp;postID=9089285813673724909' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/9089285813673724909'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/9089285813673724909'/><link rel='alternate' type='text/html' href='http://databaseperformance.blogspot.com/2009/05/optimizer-costing-of-index-scans.html' title='Optimizer Costing of Index Scans'/><author><name>John Brady</name><uri>http://www.blogger.com/profile/08719865814813032071</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='25' height='32' src='http://2.bp.blogspot.com/_NEp2JRhM8nE/SqURShDkFoI/AAAAAAAAAAM/5edJ6ALbvKU/S220/JB1.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7302956329008495023.post-5852434857483268376</id><published>2009-05-08T21:38:00.002+01:00</published><updated>2009-05-08T21:58:16.886+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='performance'/><category scheme='http://www.blogger.com/atom/ns#' term='optimizer'/><category scheme='http://www.blogger.com/atom/ns#' term='SQL cost'/><category scheme='http://www.blogger.com/atom/ns#' term='oracle'/><title type='text'>Oracle Optimizer Plan Costing - Full Table Scans</title><content type='html'>Hopefully this is the start of a series on how the Oracle Cost Based Optimizer determines the cost of a particular execution plan for a SQL statement.  Knowing how it arrives at the cost of each possible access path to a data set, we can better understand why a particular access path was chosen, and why an alternative access path had a higher cost.  This information was kick started by &lt;a href="http://www.centrexcc.com/A%20Look%20under%20the%20Hood%20of%20CBO%20-%20the%2010053%20Event.pdf"&gt;A Look Under The Hood Of CBO: The 10053 Event&lt;/a&gt; by Wolfgang Breitling, and also information in &lt;a href="http://www.amazon.co.uk/Troubleshooting-Performance-Hardback-Christian-Antognini/dp/1590599179/ref=sr_1_1?ie=UTF8&amp;amp;s=books&amp;amp;qid=1241815264&amp;amp;sr=8-1"&gt;Troubleshooting Oracle Performance&lt;/a&gt; by Christian Antognini, and refined by various tests I did.The Under the Hood paper is very good, and has a good description of how the Optimizer goes about costing different access paths to data sets.  Unfortunately it seems to be based on Oracle 9i, and things have changed slightly in Oracle 10g.  The principles are the same, but the trace information output by the optimizer has changed slightly.&lt;br /&gt;&lt;br /&gt;For this post I will focus on how the optimizer works generally, and specifically how a Full Table Scan is costed.  These will not be full, exhaustive explanations of the inner workings of the Oracle Cost Based Optimiser, but short and to the point descriptions of the key costings.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Plan Costing&lt;/h3&gt;&lt;br /&gt;The Optimizer itself costs all execution plans in terms of the number of disk I/Os involved, because these are generally the largest component of the execution plan cost.  Other costs such as CPU costs are also included in the total plan cost, after being converted to an I/O count via a conversion value. When the lowest cost plan has been produced, this can be easily converted to an elapsed time value by simply multiplying the cost by an average I/O time value. &lt;br /&gt;&lt;br /&gt;When producing an execution plan for a SQL query (queries are the most common and also the most complicated things that the Optimizer has to deal with), the Optimizer simply tries to produce all possible execution plans and cost each of them.  From these it chooses the plan with the lowest cost.  Clearly this could be quite an exhaustive way of doing things.  So what the Optimizer actually does is to fully cost one execution plan, and then only partially cost the subsequent execution plans. &lt;br /&gt;&lt;br /&gt;Each subsequent plan is costed as each step of the plan is determined by the Optimizer.  If the partial cost of such a plan so far is greater than the current best plan, then it is abandoned.  This early pruning avoids the optimizer wasting too much time on plans that are too costly.  Likewise if a new plan is produced with a total cost less than the current best plan, then the new plan becomes the current best plan.&lt;br /&gt;&lt;br /&gt;After considering all possible execution plans the optimizer will have the lowest cost execution plan for the SQL query.&lt;br /&gt;&lt;br /&gt;The execution plans themselves are formed in an iterative fashion.  First, for each table referenced in the SQL query the optimizer costs each potential access path to the data in that table.  Indexes are considered if they can be used directly to implement a predicate - a filter condition within the SQL query restricting the rows returned.  Then the optimizer considers each possible join combination between the tables in the SQL query, using each possible join method - Nested Loop, Sort Merge or Hash Join.  These joins use some of the costs for accessing each of the tables determined eariler, combined according to how that join method works.  Thus each possible join path (A to B to C, A to C to B, B to A to C, etc.) is costed in turn using each of the 3 possible join methods, to arrive at the lowest cost execution plan for the SQL query.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Full Table Scan&lt;/h3&gt;&lt;br /&gt;This is one access method to data in a table that is always available to the optimizer.  Logically the cost is the number of disk I/Os required to read every data block in the table.  The number of data blocks in a table is obtained from the BLOCKS column in USER_TABLES.  It is as accurate as the last time you gathered statistics on that table.  Oracle 10g has an automatic job that runs every night by default to update the statistics on tables that have changed significantly, to minimise the drift between actual table sizes and the statistics stored about them in the database.&lt;br /&gt;&lt;br /&gt;An enhancement Oracle has is the ability to issue multi-block reads that read more than one block from disk in a single read request.  This reduces the total number of disk I/Os needed.  Thus the number of disk I/Os needed is reduced by the multi-block read count (MBRC), according to the formula:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-family: courier new;"&gt;Blocks / MBRC&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;The Optimizer wants all costs to be in units of equal disk I/Os i.e. in units of single block reads.  However, the time for a multi-block read is likely to be different – generally slightly longer given that more data is transferred.  To convert a multi-block read cost into units of a single block read, the multi-block read cost is divided by the time for a multi-block read and multiplied by the time for a single block read.  This adjustment means that the “cost” is now in units of cost of single block reads, as used by other access paths such as indexes.&lt;br /&gt;&lt;br /&gt;Therefore the cost of a Full Table Scan is actually:&lt;br /&gt;&lt;div style="text-align: left;"&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-family: courier new;"&gt;(Blocks * MREADTIM) / (MBRC * SREADTIM)&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;br /&gt;MREADTIM and SREADTIM, as well as MBRC, are part of a set of statistics Oracle has about the relative performance of the computer system it is running on.  These are termed the System Statistics, and are visible in sys.aux_stats$ or via the dbms_stats.get_system_stats procedure.  Out of the box following installation, Oracle only has a minimal set of default values for some of these statistics - IOSEEKTIM, IOTFRSPEED and CPUSPEEDNW.  All the other system statistics have no value.  They can be set in a number of ways - the intended way being to get Oracle to measure these values from the system itself while it is executing some representative workload (these are termed Workload Statistics) - but they can be set manually if desired, or left unset.&lt;br /&gt;&lt;br /&gt;Oracle uses those System Statistics that it has values for.  For those with no value Oracle uses various substitution formulae to calculate values to use instead.&lt;br /&gt;&lt;br /&gt;If there is a value for MBRC in the System Statistics then the Optimizer uses that value.  Otherwise, it uses the value of the db_file_multiblock_read_count (DBFMBRC) initialization parameter.&lt;br /&gt;&lt;br /&gt;If MREADTIM and SREADTIM do not have any values in the System Statistics, then the following substitutions are made:&lt;br /&gt;&lt;div style="text-align: left;"&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-family: courier new;"&gt;SREADTIM = IOSEEKTIM + (DB_BLOCK_SIZE / IOTFRSPEED)&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family: courier new;"&gt;MREADTIM = IOSEEKTIM + (MBRC * DB_BLOCK_SIZE / IOTFRSPEED)&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;br /&gt;This is probably the general case on most systems - no Workload System Statistics are present (MBRC, SREADTIM, MREADTIM and others) - and only default values for IOSEEKTIM (10 milliseconds) and IOTFRSPEED (4 KB / millisecond).  For an 8 KB block size and DBFMBRC of 8 this gives values for SREADTIM of 12 ms and for MREADTIM of 26 ms, and the ratio of MREADTIM to SREADTIM of 2.167.&lt;br /&gt;&lt;br /&gt;A simple test shows this to be the case - a SELECT on a table with no indexes.  The table I used had 1252 blocks of 8 KB each, and DBFMBRC was set to 8.  No System Statistics had been collected, so only the defaults exist for IOSEEKTIM and IOTFRSPEED.  Using the substituted formulae for MREADTIM and SREADTIM the calculated cost should be:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-family: courier new;"&gt;(1252 * 26) / (8 * 12) = 32552 / 96 = 339.08&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;The reported cost by Oracle was 341, which is very close.  The difference is due to various factors, such as rounding within the calculation, the optimizer probably costing an extra read for the table header (+1), and the optimizer adding in a CPU cost for processing the data.&lt;br /&gt;&lt;br /&gt;Using these formulae and the values of the System Statistics, we can calculate for ourselves the expected cost of a Full Table Scan of any table.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7302956329008495023-5852434857483268376?l=databaseperformance.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://databaseperformance.blogspot.com/feeds/5852434857483268376/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7302956329008495023&amp;postID=5852434857483268376' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/5852434857483268376'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/5852434857483268376'/><link rel='alternate' type='text/html' href='http://databaseperformance.blogspot.com/2009/05/oracle-optimizer-plan-costing-full.html' title='Oracle Optimizer Plan Costing - Full Table Scans'/><author><name>John Brady</name><uri>http://www.blogger.com/profile/08719865814813032071</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='25' height='32' src='http://2.bp.blogspot.com/_NEp2JRhM8nE/SqURShDkFoI/AAAAAAAAAAM/5edJ6ALbvKU/S220/JB1.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7302956329008495023.post-6168814798206945216</id><published>2009-03-08T17:35:00.002Z</published><updated>2009-03-08T17:41:56.927Z</updated><title type='text'>System Activity Data - What is the System up to?</title><content type='html'>I'm always amazed at how many people do not know about the SAR utility on UNIX and Linux systems.  It is a great little utility for collecting a lot of data about what is going on on a system.  No, it is not the best possible tool for all such information, but it comes as standard on most UNIX systems and will give you most of what you want for very little effort on your part.&lt;br /&gt;&lt;br /&gt;The problem we often deal with is that when running an application and taking application level measurements we are not sure what is happening lower down on the system, under the covers as it were.  When the application performance is slow it would be nice to know if the computer system itself was very heavily loaded too, or whether the cause of the slowness lies somewhere else.  Such relatively simple data about activity and resource usage on the system can help us establish what class of problem we have - a saturated resource in the system itself, or a badly behaving application.&lt;br /&gt;&lt;br /&gt;To know what the computer system is up to we need to know about all of the resources in it - CPU, Memory, and Disks.  And we need several different measurements on each resource - number of requests, elapsed times, and queue lengths.  We also need an accurate timestamp with each of these measurements, to know when it occurred, and the period of time it covered.&lt;br /&gt;&lt;br /&gt;All of this can be relatively easily achieved with SAR - the System Activity Reporter.  It will collect and record a wide range of measurements on system resources, and the time when they occurred. &lt;br /&gt;&lt;br /&gt;Behind the scenes SAR uses SADC - the System Activity Data Collector.  And in fact it is SADC we are more concerned with than SAR.  SAR simply extracts and formats the data you want from an SADC data set.  We can run SADC to collect all the data it can about the system, at the frequency we want it collected (measurement period) and for the duration we want (how long until).  SADC has a very simple set of options to it:&lt;blockquote&gt;&lt;/blockquote&gt;&lt;div style="text-align: left;"&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-family: courier new;"&gt;/usr/lib/sa/sadc t n file &lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;blockquote&gt;&lt;/blockquote&gt;where&lt;br /&gt;&lt;ul&gt;&lt;li&gt;t is the time duration between measurements i.e. the frequency&lt;/li&gt;&lt;li&gt;n is the number of measurements to make in total&lt;/li&gt;&lt;li&gt;file is the name of a file to write these measurements to&lt;/li&gt;&lt;/ul&gt;To not have SADC tie up a terminal window, it is often run in the background with the '&amp;amp;' specifier at the end of the command line - control then returns to the terminal window.  This can also be used in command files and scripts.&lt;br /&gt;&lt;br /&gt;The file produced by SACD contains binary data, not text data, and needs SAR to read it and format it as readable text. &lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-family: courier new;"&gt;sar [-data_option] -f file&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;If no data option is specified it defaults to CPU (-u).  The common data options are:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;-u CPU&lt;/li&gt;&lt;li&gt;-d Disk I/O&lt;/li&gt;&lt;li&gt;-r Physical Memory&lt;/li&gt;&lt;li&gt;-p Virtual Memory Paging&lt;/li&gt;&lt;/ul&gt;Remember that the SADC file contains all the data collected, so you can analyse any sub-set of this data any time you want to, at any later point in time.  That is the benefit of collecting all of the possible measurements together at the same time and then saving them to a file.&lt;br /&gt;&lt;br /&gt;You can obviously use SADC to collect data whenever you have performance problems.  But it is far better to run SADC all the time, so that you already have the data to analyse after someone reports a problem.  One way to do this is to put an entry into a "crontab" file for use by the "cron" utility, which runs jobs at regular intervals.  Putting an entry similar to the following into "crontab" for the system administrator for midnight would run SADC for 24 hours collecting measurements every minute.  If this is too frequent just modify the time and number values by the same ratio, so that the two multiplied together give 86400 seconds (24 hours).&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-family: courier new;"&gt;/usr/lib/sa/sadc 60 1440 /usr/adm/sa/`uname -n`_`date +%y%m%d`.sad &amp;amp;&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;This will create a file named after the system and the current date in a central directory.  Make sure there is enough space in that location, or change it to somewhere else.  You will need to do housekeeping once in a while to purge and delete old data sets.  This could also be done via another "cron" to delete files older than one or two months.&lt;br /&gt;&lt;br /&gt;I have found SADC and SAR invaluable in a number of circumstances where I have been able to show that it was the underlying hardware that had performance issues, and not the application itself.  And I have used it in the opposite manner too, to show that a poorly performing application was not suffering from running out of system resources or their slow performance.  On most UNIX systems the impact of running SADC is minimal, as the UNIX kernel is already collecting all of these measurements anyway.  The overhead is simply of SADC reading them from memory and writing them to disk at the frequency requested.  And generally this is a very low and negligible overhead, compared to that of the application software itself.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7302956329008495023-6168814798206945216?l=databaseperformance.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://databaseperformance.blogspot.com/feeds/6168814798206945216/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7302956329008495023&amp;postID=6168814798206945216' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/6168814798206945216'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/6168814798206945216'/><link rel='alternate' type='text/html' href='http://databaseperformance.blogspot.com/2009/03/system-activity-data-what-is-system-up.html' title='System Activity Data - What is the System up to?'/><author><name>John Brady</name><uri>http://www.blogger.com/profile/08719865814813032071</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='25' height='32' src='http://2.bp.blogspot.com/_NEp2JRhM8nE/SqURShDkFoI/AAAAAAAAAAM/5edJ6ALbvKU/S220/JB1.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7302956329008495023.post-2240758389861846084</id><published>2009-02-12T20:29:00.002Z</published><updated>2009-02-12T20:40:24.005Z</updated><title type='text'>Java Data Types and Oracle (2)</title><content type='html'>As I &lt;a href="http://databaseperformance.blogspot.com/2009/01/java-data-types-and-oracle-index-usage.html"&gt;said before&lt;/a&gt;, when using a Java application against an Oracle database, the&lt;span style="font-style: italic;"&gt; java.util.Date &lt;/span&gt;data type gets mapped to a&lt;span style="font-style: italic;"&gt; java.sql.Timestamp&lt;/span&gt; within the layers above JDBC (this application uses an abstraction layer on top of JDBC), and in turn this comes through to Oracle as being of type &lt;span style="font-style: italic;"&gt;TIMESTAMP&lt;/span&gt; when used against a bind variable in a SQL statement.  When used within a WHERE clause, rather than truncate the &lt;span style="font-style: italic;"&gt;TIMESTAMP&lt;/span&gt; to a &lt;span style="font-style: italic;"&gt;DATE&lt;/span&gt; by losing the fractions of a second part, Oracle does the opposite and converts each &lt;span style="font-style: italic;"&gt;DATE&lt;/span&gt; value read from the database to a &lt;span style="font-style: italic;"&gt;TIMESTAMP&lt;/span&gt; value by adding on a zero sub-second part.  The result of this is that any index on such a &lt;span style="font-style: italic;"&gt;DATE&lt;/span&gt; column is not used by the optimizer in the execution plan for the query, because of the data type conversion.&lt;br /&gt;&lt;br /&gt;Oracle themselves acknowledge that this is an issue in their &lt;a href="http://www.oracle.com/technology/tech/java/sqlj_jdbc/htdocs/jdbc_faq.html#08_01"&gt;JDBC FAQ&lt;/a&gt; and state that in the Oracle 11g JDBC  drivers the mapping has reverted back to &lt;span style="font-style: italic;"&gt;java.sql.Timestamp&lt;/span&gt; and Oracle's &lt;span style="font-style: italic;"&gt;DATE&lt;/span&gt; being equivalent rather than different.  In the meantime, those of us in the real world have to deal with real applications using the Oracle 10g drivers which map&lt;span style="font-style: italic;"&gt; java.sql.Timestamp&lt;/span&gt; to Oracle's &lt;span style="font-style: italic;"&gt;TIMESTAMP&lt;/span&gt; data type instead.&lt;br /&gt;&lt;br /&gt;As I described before, we came up with a short term fix of creating a different index on the table being queried, with the &lt;span style="font-style: italic;"&gt;DATE&lt;/span&gt; column as the last column in the index.  This did have a major performance benefit (15 hours of the last run down to 2.5 hours with the new index), but was still not the optimal solution.  Oracle was still doing an &lt;span style="font-style: italic;"&gt;INDEX SKIP SCAN&lt;/span&gt; rather than an &lt;span style="font-style: italic;"&gt;INDEX UNIQUE SCAN&lt;/span&gt;, and so was reading many more index entries than it needed to i.e. more Buffer Gets than it needed to.&lt;br /&gt;&lt;br /&gt;I put together a Java program using JDBC to test the effects of using different Java data types (Classes really) and different JDBC settings.  The simplest fix is to use the "V8Compatible" property as mentioned in the Oracle FAQ linked to earlier.  You create the Property, set the "V8Compatible" value to "true" and then make your normal JDBC connection to Oracle (I've removed all error handling code in the following):&lt;br /&gt;&lt;blockquote&gt;// Create and set V8Compatible property&lt;br /&gt;   Properties oraProperties = new Properties () ;                  // NEW&lt;br /&gt;   oraProperties.setProperty("oracle.jdbc.V8Compatible", "true") ; // NEW&lt;br /&gt;&lt;br /&gt;// Standard JDBC connection code&lt;br /&gt;   OracleDataSource ods = new OracleDataSource();&lt;br /&gt;   ods.setURL(connect_string);&lt;br /&gt;   ods.setUser(username);&lt;br /&gt;   ods.setPassword(password);&lt;br /&gt;// Set properties of the connection&lt;br /&gt;   ods.setConnectionProperties (connectionProperties) ;            // NEW&lt;br /&gt; &lt;br /&gt;   Connection dbcon = ods.getConnection () ;&lt;br /&gt;&lt;/blockquote&gt;With no other changes than the 3 NEW lines indicated, the application's&lt;span style="font-style: italic;"&gt; java.util.Date&lt;/span&gt; value now came through as an Oracle &lt;span style="font-style: italic;"&gt;DATE&lt;/span&gt; after going through the intermediate mapping layers, and the original index was now correctly used in a &lt;span style="font-style: italic;"&gt;UNIQUE SCAN&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;This is clearly the easiest and most direct solution, and does what we want - all Java Date / Time values now come through mapped as Oracle &lt;span style="font-style: italic;"&gt;DATE&lt;/span&gt; values, rather than &lt;span style="font-style: italic;"&gt;TIMESTAMP&lt;/span&gt; values.  Although the "V8Compatible" property is deprecated in Oracle 11g, this will have no effect as the behaviour of the Oracle 11g JDBC drivers has changed to what we want, and it can ignore the unneeded property setting.&lt;br /&gt;&lt;br /&gt;The other possible solution, which is being investigated by those who know more about the Java abstraction layer API being used by the application than I do, is to get this abstraction layer to wrap all &lt;span style="font-style: italic;"&gt;java.util.Date&lt;/span&gt; objects within&lt;span style="font-style: italic;"&gt; oracle.sql.DATE&lt;/span&gt; objects.  Thus the JDBC connection is only ever given &lt;span style="font-style: italic;"&gt;oracle.sql.DATE&lt;/span&gt; values, which are direct equivalents of &lt;span style="font-style: italic;"&gt;DATE&lt;/span&gt; values in the database.  So again, any index would be used directly, and no data type conversions would take place within the Oracle database server.&lt;br /&gt;&lt;br /&gt;However, the details of this look like they would still involve mapping a &lt;span style="font-style: italic;"&gt;java.util.Date&lt;/span&gt; object to a &lt;span style="font-style: italic;"&gt;java.sql.Timestamp&lt;/span&gt; first in the abstraction layer, and then wrapping this within an &lt;span style="font-style: italic;"&gt;oracle.sql.DATE&lt;/span&gt; object.  This is because &lt;span style="font-style: italic;"&gt;oracle.sql.DATE&lt;/span&gt; has a constructor that takes a &lt;span style="font-style: italic;"&gt;java.sql.Timestamp&lt;/span&gt; argument, allowing easy conversion.  But it does not have a constructor that takes a &lt;span style="font-style: italic;"&gt;java.util.Date&lt;/span&gt; argument, or any superclass of it.  &lt;span style="font-style: italic;"&gt;java.sql.Timestamp&lt;/span&gt; is a subclass of&lt;span style="font-style: italic;"&gt; java.util.Date&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;So for now we have one guaranteed long term fix with the "V8Compatible" property, and another possible fix by changing the mapping within the abstraction layer.  So now it is up to the people that own the Java application code to deliver their preferred change, and then we can remove the extra index that we created as a short term fix.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7302956329008495023-2240758389861846084?l=databaseperformance.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://databaseperformance.blogspot.com/feeds/2240758389861846084/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7302956329008495023&amp;postID=2240758389861846084' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/2240758389861846084'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/2240758389861846084'/><link rel='alternate' type='text/html' href='http://databaseperformance.blogspot.com/2009/02/java-data-types-and-oracle-2.html' title='Java Data Types and Oracle (2)'/><author><name>John Brady</name><uri>http://www.blogger.com/profile/08719865814813032071</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='25' height='32' src='http://2.bp.blogspot.com/_NEp2JRhM8nE/SqURShDkFoI/AAAAAAAAAAM/5edJ6ALbvKU/S220/JB1.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7302956329008495023.post-8885563513806567296</id><published>2009-01-29T20:41:00.003Z</published><updated>2009-01-29T20:52:05.994Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='performance'/><category scheme='http://www.blogger.com/atom/ns#' term='java'/><category scheme='http://www.blogger.com/atom/ns#' term='index'/><category scheme='http://www.blogger.com/atom/ns#' term='date'/><title type='text'>Java Data Types and Oracle Index Usage</title><content type='html'>I'm not a great fan of Java.  I'm not saying I hate it or anything, just that I've never exactly fallen in love with it.  While it offers some potential advantages as an Object Oriented Language over others, it also suffers from a lot of complexity in trying to get it to do anything useful that you want it to do.  One way or another I keep running into issues with Java, of one form or another.&lt;br /&gt;&lt;br /&gt;I've been working with a client who has a Java based application, which through various layers of Classes and Objects ends up issuing SQL statements against Oracle 10g through JDBC.  Up to now the client has been happy with the application, in spite of some of the hoops they have had to jump through to get it to work.  Finding data abstraction layers that work together and can be used in production systems has been an issue for them - Open Source legal liabilities etc.  But one way or another they have managed to assemble a set of Java based components that they have integrated together into their application.&lt;br /&gt;&lt;br /&gt;They have been live with a new module for 3 months now, and reported that elapsed times on the module were gradually getting longer and longer.  It is a batch job that takes transactions posted into the system today, and updates various counts and totals in other data sets.  Elapsed times had gone from an initial 2 hours to 10 hours only 2 months later, which was far too long.&lt;br /&gt;&lt;br /&gt;One of the first weird things was that the client could not believe that their application ran so slow, because they had done lots of testing before going live with it.  And their testing had used similar data volumes to their production environment.  More on that later.&lt;br /&gt;&lt;br /&gt;Looking at some AWR reports I was quickly able to establish that the elapsed time was down to one UPDATE statement.  The client knew this too, and could not tell why the UPDATE was so slow.  It was using bind variables throughout, with a very simple WHERE clause.  There were no sub-queries or joins, it was using the main primary index on the table, and was not doing any disk I/O (physical reads).&lt;br /&gt;&lt;br /&gt;When they next ran that module of the application, I monitored V$SQL until I saw that UPDATE, and then did a DBMS_XPLAN.DISPLAY_CURSOR on it using its SQL_ID and CHILD_NUMBER.  The execution plan I got did indeed saying it was using an index, but a SKIP SCAN on it.  This means that the index is being used inefficiently, and not as efficiently as it could be.  Something was stopping Oracle using all of the index - it has 9 columns in it.&lt;br /&gt;&lt;br /&gt;Further examination showed that the WHERE clause in the UPDATE did specify all columns in the unique index it was using.  So it should have been doing a UNIQUE SCAN.  Checking the execution plan again, I spotted that there was a reference to one column within a call to an INTERNAL_FUNCTION.  This got me thinking.  The column in question was a DATE column, and was the second column in the 9 column unique index.&lt;br /&gt;&lt;br /&gt;Ah-ha.  Something is wrong with the DATE value provided by the application at run time, and Oracle cannot use it directly to compare to the DATE values in the database.  Instead Oracle is converting the value in each record in the database to something else, and then comparing it to the value in the query.  With the net effect that it is not using all of the index.  Thus, instead of doing a UNIQUE SCAN to find one single record, it was instead finding all records with the same value for the first column, and then comparing each of these to the converted DATE value.  And in some cases one value of the first column could have many hundreds or thousands of matching data rows.&lt;br /&gt;&lt;br /&gt;Why was the query gradually getting slower each day and week?  Because the DATE column referred to the date of the business transaction itself, and sometimes they had to process late transactions from several days ago - hence the UPDATE of an existing record rather than an INSERT.  And over time, more and more records were being added for different DATE values.  So the UPDATEs were having to retrieve and check more records each week that went by.&lt;br /&gt;&lt;br /&gt;What about the testing done prior to production?  They had assumed that there would be very few late, back dated transactions, and had only tested new transactions for the past day.  Thus all the processing involved INSERTs only into the table.  Any late transactions in the test data were negligible in volume.  However, in production up to 50% of the transactions were late.  And the UPDATE was taking 100 times longer to execute than the INSERT.&lt;br /&gt;&lt;br /&gt;What caused this?  Digging further I established that Oracle saw the value it was getting was of type TIMESTAMP, which has sub-second values in it.  And rather than lose precision by truncating its value, it was instead converting the DATE value in each data row to a TIMESTAMP value before comparing.&lt;br /&gt;&lt;br /&gt;And some further digging shows that this is a known issue with Oracle and Java applications.  See their own FAQ at:&lt;br /&gt;&lt;a href="http://www.oracle.com/technology/tech/java/sqlj_jdbc/htdocs/jdbc_faq.html#08_01" target="_blank"&gt;http://www.oracle.com/technology/tech/java/sqlj_jdbc/htdocs/jdbc_faq.html#08_01 &lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Most of the Java Date types (Timestamp or Util.Date) are considered by Oracle to be Timestamps, and the JDBC layer passes such values through as Timestamp.  Hence the conversion by Oracle, and the lack of use of the index.  What is clear from reading the documentation is that Oracle got this mapping wrong.  Instead of mapping the Oracle DATE to only the Java sql.Date type, they should also have allowed a mapping from Java Timestamp to Oracle DATE.  Which is why they have changed the behaviour in the Oracle 11.1 JDBC driver.  With this correct mapping, no conversion is done, the Java supplied query value is used directly, and the index is used properly for a UNIQUE SCAN.&lt;br /&gt;&lt;br /&gt;A solution to this is tricky, as the customer is on Oracle 10g, and not planning on moving to 11g at the moment.  Why?  Well if it was simple there would already be a solution published by Oracle and people would be using it.  And there isn't - see the FAQ referenced before.  The database is not new, and the Java module is something new added on to an existing application suite.  So switching all columns in the database from DATE to TIMESTAMP is not an option.  Using sql.Date in the Java application is not an option, because it has a zero value for its time part i.e. no time part.  So something else needs to be done for the proper solution.&lt;br /&gt;&lt;br /&gt;One option is to change the SQL being used in the UPDATE and in the WHERE clause instead of:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;date_column = :bind_variable&lt;/li&gt;&lt;/ul&gt;do&lt;br /&gt;&lt;ul&gt;&lt;li&gt;date_column = cast (:bind_variable as date)&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;This way the value will always be a date, and the UPDATE will use the full index.  However, the customer is worried about other such queries that may be lurking within their application code.  They want a generic solution that just works for all DATE columns.  Which means a change to the Java data type mapping somewhere, somehow.&lt;br /&gt;&lt;br /&gt;While the customer works out how to implement their own change to their Java code (at which abstraction layer of all of the layers involved in mapping an application Object down to something that calls JDBC do they make a change?) they needed a quick fix to reduce the elapsed time of this job.  The solution?  Create an index with the same columns, but with the DATE column last in it.  Yes, it takes time to create the index, and some disk space is used up.  But the UPDATE now runs a lot faster, because it uses more of the index structure with the other 8 columns in it, and ends up with far fewer data rows to check against the DATE value.&lt;br /&gt;&lt;br /&gt;The net result?  The 10 hour job has come down to 2.5 hours, putting the customer back inside their 8 hour window.  And it will probably remain at similar levels for at least the next month or so.  Giving the customer time to work out how to fix their Java data type mappings.&lt;br /&gt;&lt;br /&gt;As I said at the beginning, using Java often can cause all kinds of other problems elsewhere.  Even when you are using it as the Java documentation tells you to.  And isn't it amazing how the best tests in the world always seem to miss something that turns out to be more important and relevant than the other tests they concentrated on?  It seems that the Testing Team never actually took the time and effort to establish exactly what mix of business transactions would be processed by the application.  They just made some assumptions about what it would be.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7302956329008495023-8885563513806567296?l=databaseperformance.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://databaseperformance.blogspot.com/feeds/8885563513806567296/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7302956329008495023&amp;postID=8885563513806567296' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/8885563513806567296'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/8885563513806567296'/><link rel='alternate' type='text/html' href='http://databaseperformance.blogspot.com/2009/01/java-data-types-and-oracle-index-usage.html' title='Java Data Types and Oracle Index Usage'/><author><name>John Brady</name><uri>http://www.blogger.com/profile/08719865814813032071</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='25' height='32' src='http://2.bp.blogspot.com/_NEp2JRhM8nE/SqURShDkFoI/AAAAAAAAAAM/5edJ6ALbvKU/S220/JB1.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7302956329008495023.post-3939004807343377186</id><published>2009-01-26T19:33:00.002Z</published><updated>2009-01-26T19:37:48.604Z</updated><title type='text'>Queuing Theory &amp; Resource Utilisation</title><content type='html'>While Queuing Theory can be quite academic and mathematical at times, it does include a number of core rules or laws about how systems and their components behave under higher levels of utilization and the effect on the length of queues of pending requests.&lt;br /&gt;&lt;br /&gt;One of the conclusions from this is that you cannot have high utilization of any resource without some level of queuing occurring. It may not seem obvious, but it is true. Essentially the queue of pending requests or jobs is needed to keep the resource busy most of the time. If the queue was ever empty then the utilization of the resource would be very low. High utilization is only achieved by having a queue of waiting requests, so that there is always a next thing to do when the current thing completes.&lt;br /&gt;&lt;br /&gt;You could theoretically also achieve 100% CPU utilization by only having a few very active processes and no queues. So on a 4 CPU system you could achieve 100% CPU utilization with only 4 processes, each always busy and executing instructions. In this scenario there is no queuing for the CPUs, as there are only 4 processes, and the efficiency of the system is very good.&lt;br /&gt;&lt;br /&gt;However, such a scenario is very, very unlikely. The processes could never block for anything and would always be executing CPU instructions. Which means that they would not be doing any disk I/Os, or communicating with each other by some kind of messages, or using locks or latches on shared structures. Which is the complete opposite of the Oracle Database Architecture.&lt;br /&gt;&lt;br /&gt;An Oracle database server will have many processes running on it - one shadow server process per connected session (unless you are using the Shared Server configuration) - and they will use shared memory, and locks and latches to control data consistency, and perform lots of disk I/Os.&lt;br /&gt;&lt;br /&gt;So an Oracle database server does conform to the general model in Queuing Theory of having lots of separate clients (the shadow servers) making requests on the resources in the system. And as a result, it does conform to the golden rule of high resource utilisation equals queues of pending requests.&lt;br /&gt;&lt;br /&gt;As a result, 100% CPU utilization is very bad, and is symptomatic of very large queues of waiting processes. Queuing Theory also shows that above 50% utilization of a resource, there is always a request in the queue more often than not. Note that this is ‘on average’ - sometimes the queue can be empty and sometimes it can have several requests in it - but on average the number of waiting requests will be more than zero.&lt;br /&gt;&lt;br /&gt;A general rule of thumb is to get worried at 80% utilization, as the number of concurrent requests will average something around four, and rises exponentially above this.  An explanation of some of this and a nice graph of queue length versus utilization is available in this Microsoft article on&lt;a href="http://technet.microsoft.com/en-gb/library/cc181450.aspx"&gt; Modeling Principles for Sizing&lt;/a&gt;. I know it is a Microsoft article, but it does summarize the effects of queuing well, and has the nice graph in it.&lt;br /&gt;&lt;a href="http://technet.microsoft.com/en-gb/library/cc181450.aspx" target="_blank"&gt;&lt;/a&gt;&lt;br /&gt;These queues can apply to all aspects of a computer system - CPU, Disk, Network and Memory. To drive any one of these at 80% utilisation or above means that you have queues of pending requests, which are needed to keep the utilisation that high.&lt;br /&gt;&lt;br /&gt;The net effect of such high utilisation is an increase in response time. The total elapsed time for a given request is now the wait time in the queue plus the service time of the request itself. When the queue is 4 long on average, then the response time is actually 5 times the service time e.g. you might spend 40 ms waiting to perform a disk I/O of 10 ms itself. So high utilisation and high queues has a direct effect on the elapsed time of individual transactions.&lt;br /&gt;&lt;br /&gt;The formula for the Total Number of jobs in the system (N) in proportion to Utilisation (U) is:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;N = U / (1 - U)&lt;/li&gt;&lt;/ul&gt;Note that N is not the same as Q - the number of requests waiting in the queue.  N is both the number in the Queue plus any requests currently being serviced - the total within the system.&lt;br /&gt;&lt;br /&gt;And the formula for Response Time is:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;R = Service Time / (1 - U).&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;So higher utilisation directly leads to longer queues, and larger response times.&lt;br /&gt;&lt;br /&gt;Whether this is acceptable to your situation depends whether your goal is total throughput irrespective of transaction response time, or your goal is individual transaction response time. In other words, is it an online system, or more of a batch system processing relatively large units of work.&lt;br /&gt;&lt;br /&gt;While Queuing Theory can seem theoretical at times, to me it reinforces the message that when you scale up the load on any system there will always be a bottleneck. And that bottleneck will reach high utilisation as it nears its capacity, and large queues will form in front of it. Identifying where the queues are in a system, what the bottleneck is, and what can be done to fix it - reduce service time or increase capacity - are key to performance tuning. And an appreciation of Queuing Theory has helped me get a deeper understanding of this area.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7302956329008495023-3939004807343377186?l=databaseperformance.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://databaseperformance.blogspot.com/feeds/3939004807343377186/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7302956329008495023&amp;postID=3939004807343377186' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/3939004807343377186'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/3939004807343377186'/><link rel='alternate' type='text/html' href='http://databaseperformance.blogspot.com/2009/01/queuing-theory-resource-utilisation.html' title='Queuing Theory &amp; Resource Utilisation'/><author><name>John Brady</name><uri>http://www.blogger.com/profile/08719865814813032071</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='25' height='32' src='http://2.bp.blogspot.com/_NEp2JRhM8nE/SqURShDkFoI/AAAAAAAAAAM/5edJ6ALbvKU/S220/JB1.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7302956329008495023.post-4352484398845609898</id><published>2008-10-21T15:26:00.003+01:00</published><updated>2008-10-21T16:16:35.518+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='statistics'/><category scheme='http://www.blogger.com/atom/ns#' term='performance'/><category scheme='http://www.blogger.com/atom/ns#' term='oracle'/><title type='text'>Staging Tables and Statistics in Oracle</title><content type='html'>As we should all know, Oracle 10g only has a cost based optimiser for executing SQL statements, and all of its decisions are based on the statistics it has about the tables in the database.  Generally this works well, and there is also a default job that runs each day to keep these statistics up to date.  Where there can be a problem though is with tables that are used to stage intermediate results during a series of processing steps.&lt;br /&gt;&lt;br /&gt;Such tables often start off empty, are loaded with data produced by the first steps in processing, and then read and processed by the subsequent steps.  Eventually the data will be deleted from the staging table as its processing is finished.  Because the statistics on these tables indicate that they are empty or near empty, Oracle will simply scan them rather than use any index.  When the data set being processed is small, this mistake can go unnoticed, with little effect on the overall performance and elapsed time.  However, as the data set being processed gets larger and larger so the effects of the wrong execution plan become larger too, and the elapsed time grows and grows.&lt;br /&gt;&lt;br /&gt;The solution of course is to update the statistics on such a staging table after the data has been loaded into it, so that Oracle can make the right decisions.  There are a number of ways to do this, which I'll get onto, and you can also find out elsewhere on the web easily enough.&lt;br /&gt;&lt;br /&gt;The other thing to be aware of in this situation is temporary tables - global temporary tables as Oracle calls them - where the data in the temporary table is session private and is deleted when the session disconnects.  These temporary tables are also often used to stage data that is being processing in a series of steps, and have certain potential advantages - data is private to each session, and the data is automatically deleted when the session disconnects.&lt;br /&gt;&lt;br /&gt;However, there are issues with collecting statistics on such temporary tables, which you must be aware of.  The net result is that it is basically impossible to &lt;span style="font-weight: bold;"&gt;collect&lt;/span&gt; statistics on such temporary tables.  You have to use another mechanism to set the statistics on a temporary table.&lt;br /&gt;&lt;br /&gt;For normal data tables you can use the GATHER_TABLE_STATS procedure in the DBMS_STATS package to gather up to date statistics on the table after the data has been loaded into it.  By leaving other options to their default values, Oracle should also update the statistics on the indexes too.&lt;br /&gt;&lt;br /&gt;If you want to avoid the elapsed time associated with the full GATHER_TABLE_STATS you can instead run DELETE_TABLE_STATS.  In such situtations when the Oracle optimiser has no statistics on a table it samples the table itself to make some estimates for those statistics.  This is termed dynamic sampling.  In principle this ensures that the optimiser has some statistics on the table that reflect its size and data distribution.  Again this will not work as intended for a temporary table.&lt;br /&gt;&lt;br /&gt;And finally you can use SET_TABLE_STATS to explicitly set statistics such as the row count of the table.  This avoids the gathering of statistics associated with either full statistics or dynamic sampling, and works for temporary tables too. &lt;br /&gt;&lt;br /&gt;Although this may not seem earth shattering, having &lt;span style="font-weight: bold;"&gt;correct and valid&lt;/span&gt; statistics is critical to the way Oracle 10g works.  And poor performance and wrong behaviour from Oracle when executing queries can often be due to poor or incorrect statistics.  Often developers can forget about this, assuming that the overnight job will keep the statistics up to date - a database administrator's responsibility.  But these temporary staging tables are something that change so frequently that they can never have up to date or correct statistics on them, no matter what the database administrators do.  Which in turn means that they must be dealt with explicitly within the application code, to somehow set the statistics to suitable values using one of the mechanisms just described.  Only then will Oracle access the data in these tables in an optimal way.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7302956329008495023-4352484398845609898?l=databaseperformance.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://databaseperformance.blogspot.com/feeds/4352484398845609898/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7302956329008495023&amp;postID=4352484398845609898' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/4352484398845609898'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/4352484398845609898'/><link rel='alternate' type='text/html' href='http://databaseperformance.blogspot.com/2008/10/staging-tables-and-statistics-in-oracle.html' title='Staging Tables and Statistics in Oracle'/><author><name>John Brady</name><uri>http://www.blogger.com/profile/08719865814813032071</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='25' height='32' src='http://2.bp.blogspot.com/_NEp2JRhM8nE/SqURShDkFoI/AAAAAAAAAAM/5edJ6ALbvKU/S220/JB1.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7302956329008495023.post-8849627113902403804</id><published>2008-09-22T15:19:00.004+01:00</published><updated>2008-09-22T15:47:41.399+01:00</updated><title type='text'>Contention and a bottleneck - or not?</title><content type='html'>I recently had the following situation: during a heavy data load by multiple, concurrent jobs the customer said that they saw contention, and provided me with a STATSPACK report from this period.  Initially I was confused:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;On the one hand, the top wait event was a very high value for a latch wait on the cache buffers chain.&lt;/li&gt;&lt;li&gt;On the other hand, the CPU utilisation during this whole period was only around 10%.&lt;/li&gt;&lt;/ul&gt;Normally I would say that this system had capacity to spare - 90% CPU unused - but the wait figures indicated that something was wrong somewhere.  Digging further into STATSPACK I verified that the top SQL statement was the INSERT into the one table, and that the number of waits correlated with the number of sleeps on the cache buffer chain latch.&lt;br /&gt;&lt;br /&gt;Now this told me that these were actual sleeps, where the session failed to get the latch when it tries in a busy spin loop.  And these sleeps are relatively expensive and time consuming.  Hence the high value for the wait time experienced.&lt;br /&gt;&lt;br /&gt;Digging further I saw that there were buffer busy waits too, and managed to establish that these were all on the same index on the table.  Now this table has six indexes - i1 to i6 - and the buffer busy waits were all on index i6.  And this is on a column called last_update_time.  So clearly this was a sequential, monotonically increasing series of values, and would all end up in the final leaf block of the index.  So we had a classic case of records with the same value, or very close values, being inserted, and causing contention on the index block where those same values would be stored.  And the lock on the index block during the block split and contents copying, would be the cause of the sleeps on the cache buffer chain latch for that block.&lt;br /&gt;&lt;br /&gt;I also did some checking on the index structure itself, by analysing the index and then looking at the statistics.  This showed that the percentage of space actually used in each block averaged just over 50%.  So Oracle was actually doing 50:50 index block splits when the last index leaf block became full, rather than 90:10.  This is probably due to the update times not being perfectly in sequence from the different data loading processes.  But this also meant that this new block was already 50% full and so would be ready to split even sooner than if it were only 10% full.&lt;br /&gt;&lt;br /&gt;The solution?  Well, there were a couple of possibilities.&lt;br /&gt;&lt;br /&gt;One was to use a REVERSE key index.  However, these are only useful for equality tests and not ranges.  And date columns tend to be used for range queries (from and to).&lt;br /&gt;&lt;br /&gt;Another option would be to partition the index by another column, which would have different values in each INSERT from each of the loading processes.  This basically creates a set of separate smaller indexes, and so increases the number of last leaf blocks across these sub-indexes, and so reduces contention.  However, partitioning is an extra cost option, and needs extra maintenance over the number of partitions and their configuration.&lt;br /&gt;&lt;br /&gt;The recommended option is to modify the index and add the extra column after the last_update_time, mentioned just now in partitioning.  This would mean the index was still useful for equality and range conditions, and would spread out the same last_update_time value between multiple index leaf blocks having different values for the second column - in this case essentially an account identifier.  Now with this extra layer of data values in the index it will distribute the new values over more index leaf blocks, and so reduce contention and improve throughput times.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7302956329008495023-8849627113902403804?l=databaseperformance.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://databaseperformance.blogspot.com/feeds/8849627113902403804/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7302956329008495023&amp;postID=8849627113902403804' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/8849627113902403804'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/8849627113902403804'/><link rel='alternate' type='text/html' href='http://databaseperformance.blogspot.com/2008/09/contention-and-bottleneck-or-not.html' title='Contention and a bottleneck - or not?'/><author><name>John Brady</name><uri>http://www.blogger.com/profile/08719865814813032071</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='25' height='32' src='http://2.bp.blogspot.com/_NEp2JRhM8nE/SqURShDkFoI/AAAAAAAAAAM/5edJ6ALbvKU/S220/JB1.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7302956329008495023.post-4872507539794203638</id><published>2008-03-04T20:40:00.002Z</published><updated>2008-03-04T20:47:09.094Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='performance'/><category scheme='http://www.blogger.com/atom/ns#' term='holistic'/><category scheme='http://www.blogger.com/atom/ns#' term='measure'/><title type='text'>Gathering Performance Statistics - What and When</title><content type='html'>In  performance  analysis  and  tuning  "Measurement  is  Everything".   Or  to  put  it  another  way  -  "What  you  do  not  measure  you  cannot  control".   So  we  want  to  measure  our  system  and  what  is  happening  on  it.   Doing  this  all  the  time  provides  a  baseline  to  which  we  can  compare  should  performance  suddenly  change.   Which  leaves  the  questions  of:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;What  should  we  be  measuring?&lt;/li&gt;&lt;li&gt;How  often  should  we  be  measuring  it?&lt;/li&gt;&lt;/ul&gt;In  principle  the  more  we  measure  the  better,  as  we  have  a  finer  grained  level  of  data  to  analyse,  and  nothing  is  lost.   We  can  always  summarise  this  low  level  data  in  various  ways  during  the  initial  analysis  steps.&lt;br /&gt;&lt;br /&gt;However,  collecting  too  much  data  too  often  can  itself  end  up  being  a  significant  workload  on  the  system.   "Nothing  comes  for  free".   So  we  may  need  to  restrict  how  much  data  we  collect,  and  how  often  we  collect  it,  to  minimise  the  impact  on  the  system.   It  would  also  be  good  if  within  the  data  we  collected  we  were  able  to  identify  the  workload  of  the  measurement  collector  itself. &lt;br /&gt;&lt;br /&gt;In  terms  of  frequency  I  believe  that  no  more  than  every  minute  is  generally  needed.   This  provides  an  adequate  level  of  detail  for  profiling  and  establishing  a  baseline,  and  provides  60  data  points  per  hour.   More  frequent  measurements  provide  more  data  points,  but  not  necessarily  anything  more  useful  for  analysis  purposes.   And  the  measurement  collector  itself  may  become  a  significant  workload  on  the  system.   Less  frequent  measurements  quickly  reduce  you  to  20  or  less  data  points  per  hour,  which  I  believe  is  too  few.&lt;br /&gt;&lt;br /&gt;Of  course  there  are  some  caveats  and  assumptions  to  this  'per  minute'  rule  of  thumb  I  use:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;The  workloads  on  the  system  have  a  lifespan  of  significantly  longer  than  one  minute.   Thus  we  will  have  multiple  data  points  covering  the  lifespan  of  each  workload  on  the  system.   If  the  workloads  are  shorter  than  a  minute  to  arrive  and  complete,  then  you  should  investigate  a  higher  frequency  of  measurement,  subject  to  the  load  of  collecting  the  measurements  themselves.&lt;/li&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;Collecting  the  measurements  is  relatively  quick  and  a  much  lighter  load  on  the  system  than  anything  else  i.e.  negligible.   The  collection  should  be  a  read  only  activity  and  then  saving  the  measurements  rather  than  a  complicated  set  of  processing  to  arrive  at  the  measurements.   If  this  is  not  true,  and  the  collection  involves  significant  system  resources,  then  it  should  be  done  at  a  lower  frequency.   Oracle  STATSPACK  is  an  example  of  this,  while  Oracle's  AWR  is  a  much  lighter  weight  alternative  in  Oracle  10g  onwards.&lt;/li&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;The  volume  of  data  is  not  too  great,  and  the  measurements  change  in  value  between  samples.   If  these  are  not  true  then  the  frequency  should  be  reduced  and  measurements  collected  less  often.   Or  break  the  measurements  down  into  different  sets  -  some  collected  more  frequently  than  others.&lt;/li&gt;&lt;/ul&gt;In  terms  of  what  data  to  collect,  I  believe  you  should  collect  measurements  from  all  levels  of  the  stack.   A  computer  system  is  not  just  one  thing  (the  application),  but  a  stack  of  things  all  layered  one  on  top  of  the  other.   The  main  layers  of  the  stack  include:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Hardware  -  Processor  (CPU),  Memory,  Disk,  Network&lt;/li&gt;&lt;li&gt;Operating  System  -  Abstracts  hardware  to  standard  services  and  interfaces&lt;/li&gt;&lt;li&gt;Database  software  -  Implements  persistant  transaction  oriented  data  store&lt;/li&gt;&lt;li&gt;Middleware  -  Provides  various  facilities  such  as  application  servers,  containers,  connection  pools,  etc&lt;/li&gt;&lt;li&gt;Application  -  Contains  the  business  logic&lt;/li&gt;&lt;li&gt;User  Interface  -  Renders  graphical  user  interface  and  interacts  with  application.   May  be  separate  or  integrated  with  Application  itself.&lt;/li&gt;&lt;/ul&gt;By  measuring  performance  at  all  levels  of  the  stack  you  gain  a  number  of  benefits:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Measuring  performance  at  the  Application  /  User  level  gives  you  a  meaningful  and  true  business  measure  of  performance  e.g.  orders  processed&lt;/li&gt;&lt;li&gt;Measuring  performance  at  other  levels  of  the  stack  lets  you  see  if  any  individual  component  or  resource  is  overloaded&lt;/li&gt;&lt;li&gt;Measuring  performance  at  all  levels  lets  you  correlate  changes  in  activity  at  one  level  with  changes  at  another  level.   This  correlation  helps  you  identify  cause  and  effect  from  one  level  to  another.   Though  you  may  need  to  investigate  further  to  prove  that  you  have  a  true  cause  and  effect  link.&lt;/li&gt;&lt;/ul&gt;Then  at  each  layer  of  the  stack,  you  should  collect  as  comprehensive  a  set  of  performance  measurements  as  you  can,  subject  to  the  earlier  caveats  about  not  overloading  the  system.   This  is  because  of  the  adage  that  if  you  do  not  collect  it,  you  cannot  analyse  it.&lt;br /&gt;&lt;br /&gt;One  of  the  worst  situations  is  to  have  a  performance  problem  in  the  future,  and  find  that  a  key  measurement  of  performance  data  that  would  indicate  the  nature  of  the  problem  is  not  being  collected.   And  although  you  might  be  able  to  add  this  extra  measurement  in  and  collect  it  from  now  on,  you  do  not  have  it  in  your  history  to  compare  to  and  establish  how  much  it  has  changed,  if  at  all.&lt;br /&gt;&lt;br /&gt;With  all  this  data  being  collected,  it  will  soon  amass  to  a  large  volume  on  disk,  and  needs  to  be  managed.   The  basic  principles  are:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Keep  all  collected  measurements  for  a  fixed  period,  to  allow  post  analysis  of  reported  problems&lt;/li&gt;&lt;li&gt;Beyond  this  period  you  can  combine  summarising  the  data  in  various  ways  to  reduce  its  volume,  and   deleting  it&lt;/li&gt;&lt;li&gt;Freeze  and  keep  a  period  as  a  baseline  for  reference  purposes,  and  comparison  to  any  abnormal  behaviour&lt;/li&gt;&lt;li&gt;Multiple  baselines  can  be  established  and  kept  for  different  workload  profiles&lt;/li&gt;&lt;/ul&gt;In  summary:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Collect  as  much  as  you  can,  at  a  reasonable  frequency.&lt;/li&gt;&lt;li&gt;Breadth  (many  separate  measurements)  is  more  important  than  depth  (frequency  of  collection)&lt;/li&gt;&lt;li&gt;Collect  at  all  levels  of  the  stack  to  allow  a  holistic  analysis  and  identify  overloaded  resources&lt;/li&gt;&lt;li&gt;Manage  the  historical  measurements,  retaining  them  for  a  period  of  time&lt;/li&gt;&lt;li&gt;Representative  periods  can  be  frozen  and  kept  forever,  and  others  deleted  on  a  rolling  basis&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7302956329008495023-4872507539794203638?l=databaseperformance.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://databaseperformance.blogspot.com/feeds/4872507539794203638/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7302956329008495023&amp;postID=4872507539794203638' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/4872507539794203638'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/4872507539794203638'/><link rel='alternate' type='text/html' href='http://databaseperformance.blogspot.com/2008/03/gathering-performance-statistics-what.html' title='Gathering Performance Statistics - What and When'/><author><name>John Brady</name><uri>http://www.blogger.com/profile/08719865814813032071</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='25' height='32' src='http://2.bp.blogspot.com/_NEp2JRhM8nE/SqURShDkFoI/AAAAAAAAAAM/5edJ6ALbvKU/S220/JB1.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7302956329008495023.post-5014250565937969929</id><published>2008-02-18T19:44:00.002Z</published><updated>2008-02-18T19:50:12.441Z</updated><title type='text'>Good SQL goes hand in hand with Good Database Design</title><content type='html'>Or to put it another way - if you find poor SQL in an application, it might be as much to do with poor database and application design than plain, old poor programming.&lt;br /&gt;&lt;br /&gt;Ever since I had to use COBOL briefly many years ago, I've learnt the lesson that often a good design can make the programming easier.  By carefully designing and defining the DATA sections at the top of your COBOL program, you could make the later manipulation of the data itself that much easier. &lt;br /&gt;&lt;br /&gt;Time and again I have seen the results of both practices - good design leading to a good, efficient implementation; and bad or weak design leading to a poor implementation.  Not only can a good design help achieve a good implementation, but a poor design will often hamper the programmer and reduce the options available to them, restricting what choices they have.  And so the result of a poor design is a poor application with poor SQL in it.&lt;br /&gt;&lt;br /&gt;The SQL to access data goes hand in hand with the design of the database containing that data.  You cannot separate the application code from the database design.  Put the data in the right place, and the SQL becomes simple and straightforward.  Put the data in the wrong place, and the SQL becomes more complicated, joining tables together and adding extra constraints and conditions, and possibly needing to use more horrible things such as outer joins and unions.  I'm not saying that these are always unavoidable, but I am saying that the core design of a database for an application should not need them.&lt;br /&gt;&lt;br /&gt;This is why good database design is important, and how it can impact application performance as much as the application code itself. Time spent doing a good database design is repaid by a good, well performing application.  A skimped, quick database design often results in a longer development cycle, probably needing changes to the database design to correct things missed out earlier, and extra effort by the developers to write the SQL that gets them the data they need when they need it.&lt;br /&gt;&lt;br /&gt;I'm saying all this because I have seen systems with poor SQL that was causing poor performance, and it turned out to be due to the poor database design, and not the programmer's ability to write good SQL.  The SQL was in fact as simple and direct as it could be, but it had to go to multiple tables and join them together and apply different restrictions to each table, to get the data it needed.&lt;br /&gt;&lt;br /&gt;In fact, some of the worst impact SQL I have seen on a system was incredibly simple, and was written to continually count the rows in a table to see if any new records had arrived.  Of course it was not that simple - only certain rows in a certain state were to be counted, and these also depended on having matching data in another table, and so on.  The net result was a SELECT that queried 5 separate tables to determine if any data at all was in a 'ready' state.  And because it was a count, it actually needed to touch a lot of data or index records.&lt;br /&gt;&lt;br /&gt;Although the joins were efficient (using indexes), the constraints were very few, so that Oracle was retrieving most of the rows that matched in most of the tables, before rejecting them and counting what was left.  Such a query cannot avoid a lot of row accesses, even if they are logical and not physical I/Os due to being in the buffer cache.  It still takes time to access these rows in the tables and count them all up. &lt;br /&gt;&lt;br /&gt;What made this case worse, was that for various reasons the query also did a self join.  New entries could be sequenced, and so to count a record as being 'ready' it also had to have no predecessors in its sequence which were also 'ready'.  This doubled the amount of work being done by the query, in terms of data records accessed.&lt;br /&gt;&lt;br /&gt;As you can imagine, this made it a slow query.  And this was being run multiple times a second to see if any new work had arrived.  So it became the largest resource consumer in Oracle from the application, even though it was not doing any real work - just counting how many jobs were ready to be processed.&lt;br /&gt;&lt;br /&gt;The solution?  A redesign.  We tried everything we could about using indexes and caching data, and pushing joins down into sub-queries to steer the optimiser one way or another.  But still the query involved a lot of buffer gets, which took time, and it was being run very frequently.  Which only left a redesign as a viable solution.  By moving around where data was stored, and in what form, we were able to eliminate many of the joins, and drastically reduce the number of buffer gets per execution.&lt;br /&gt;&lt;br /&gt;And the biggest win was to turn the design completely on its head.  Instead of counting rows in a table, we had a counter column in another record that was updated every time a new record was inserted.  Although this meant changing the application in more than one place (adding an UPDATE of the counter when a new record was inserted and later when processed), the original query was now reduced to a single record fetch.  The remaining joins were now only joining to a single row, and not to every row in the original table, and the self join disappeared too.&lt;br /&gt;&lt;br /&gt;The result was vastly improved performance - of both the query and the overall system - and in fact it reduced the total resource consumption on the system, so that CPU utilisation when down while throughput went up.&lt;br /&gt;&lt;br /&gt;And all because of a poor initial database design, restricting a programmer to writing a poor piece of SQL, that performed inefficiently and used too many resources.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7302956329008495023-5014250565937969929?l=databaseperformance.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://databaseperformance.blogspot.com/feeds/5014250565937969929/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7302956329008495023&amp;postID=5014250565937969929' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/5014250565937969929'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/5014250565937969929'/><link rel='alternate' type='text/html' href='http://databaseperformance.blogspot.com/2008/02/good-sql-goes-hand-in-hand-with-good.html' title='Good SQL goes hand in hand with Good Database Design'/><author><name>John Brady</name><uri>http://www.blogger.com/profile/08719865814813032071</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='25' height='32' src='http://2.bp.blogspot.com/_NEp2JRhM8nE/SqURShDkFoI/AAAAAAAAAAM/5edJ6ALbvKU/S220/JB1.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7302956329008495023.post-724188674567922889</id><published>2008-01-15T20:41:00.000Z</published><updated>2008-01-15T20:45:18.778Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='performance'/><category scheme='http://www.blogger.com/atom/ns#' term='contention'/><category scheme='http://www.blogger.com/atom/ns#' term='bottleneck'/><title type='text'>Limits, Contention and Capacity</title><content type='html'>One of the key concepts in performance tuning and analysis is that of the "bottleneck".  At low levels of throughput everything looks okay - as you increase the incoming requests so the output of completed work increases at the same rate.  But as the throughput of the system rises you will eventually reach a point where the output levels off and no longer increases inline with input increases.  At this point some part of the system is saturated and has become a bottleneck - you cannot squeeze any more through it, even though the rest of the system has capacity to spare.&lt;br /&gt;&lt;br /&gt;Tuning a system is often about identifying this bottleneck, and establishing that it is indeed the bottleneck rather than just a slow component in the system.  Fixing the bottleneck involves somehow increasing its capacity.  Bottlenecks are normally considered in terms of the hardware in the computer system, but they can also be caused by the software being run.&lt;br /&gt;&lt;br /&gt;In my view there are only 2 fundamental ways to fix any bottleneck:&lt;br /&gt;1. Increase the capacity of the resource itself i.e. buy more or faster hardware, or change the software algorithm.&lt;br /&gt;2. Reduce the usage of the resource by the application, freeing up capacity.&lt;br /&gt;&lt;br /&gt;Which action you take depends on the nature of the bottleneck.  If there is no contention and it is just a slow resource, then the first option is the most viable one.  The second option is best when there is contention for the resource and a backlog queue of outstanding requests is forming in front of the resource for access to it.  Reducing the use of the resource will reduce the size of the queue, which in turn reduces the time any request has to wait to gain access, and so has a direct impact on performance and throughput.  But, as stated, this only works when the resource is overloaded and there is contention for it, which you have to identify.&lt;br /&gt;&lt;br /&gt;Equally, the first option of replacing the resource with a faster one can work just as well for a contended resource that is the bottleneck.  Being able to process each request quicker will reduce the contention and reduce or eliminate any queue of waiting requests, increasing performance and throughput.  This is why many people often take the simple, brute force approach of upgrading hardware to make a system perform faster.  However, it does not guarantee to remove the contention or the queuing, just to reduce it.  There is the possibility that processing requests faster just leads to even more requests being submitted at a higher rate, and in turn back to having a queue of outstanding requests.  Which is why the application behaviour needs to be considered.&lt;br /&gt;&lt;br /&gt;The second way to fix a bottleneck can actually be approached in two different ways:&lt;br /&gt;1. Optimise and tune the application's configuration from the outside.&lt;br /&gt;2. Change the application itself from the inside.&lt;br /&gt;&lt;br /&gt;The first approach typically involves changing configuration settings, such as increasing the number of records to read at a time to reduce the number of data reads, or creating an extra index on a table.  The second approach typically involves rewriting part of the application to use more efficient algorithms that use less resources, such as reading a set of data into an array in memory once, and doing in memory lookups for validation instead of reads from disk.&lt;br /&gt;&lt;br /&gt;What you do is dependent on the nature of the bottlenecked resource, the application software, and how much money and time you have available.&lt;br /&gt;&lt;br /&gt;Another key point to be aware of, is that fixing one resource simply moves the bottleneck to another one.  The bottleneck never actually goes away, it just moves somewhere else within the system.  Presuming you are able to increase the effective capacity of the resource that is initially the bottleneck then throughput will increase, which is good.  But eventually you will reach a point where the carrying capacity of that resource is now greater than another resource in the system.  And now you have to repeat the analysis of the system again to identify the new bottleneck.  And again work out the best way to either increase its capacity or reduce the number of times it is used.&lt;br /&gt;&lt;br /&gt;Which is why performance tuning is so often an iterative exercise.  You fix the first problem, but the bottleneck just moves somewhere else, and you repeat the analysis again, and again.&lt;br /&gt;&lt;br /&gt;A valid alternative is to calculate in advance how much of each resource is available in the system, and how much can be used if performance targets are to be met, such as a given number of transactions per minute.  This is more of a capacity planning exercise, but is directly useful in such performance tuning situations with poor performance.  We can run the application under a relatively light workload and measure the actual amount of each resource used.  These measurements can then be scaled up to the target workload and compared to the calculated resource requirements from before.  This will relatively easily identify which resources will be saturated and overloaded, and by how much.  This is a much easier approach than the repeated iterative analysis and idenfication of bottlenecks, and also establishes how much of a gap there is between desired performance and actual achieved performance.  If the gap is known to be too wide, then simple tuning may not be the answer, and may be a wasted effort.  Something more drastic and radical may be required.  But you can only know this by doing the capacity planning in advance of deploying the application, and then comparing the two results.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7302956329008495023-724188674567922889?l=databaseperformance.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://databaseperformance.blogspot.com/feeds/724188674567922889/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7302956329008495023&amp;postID=724188674567922889' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/724188674567922889'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/724188674567922889'/><link rel='alternate' type='text/html' href='http://databaseperformance.blogspot.com/2008/01/limits-contention-and-capacity.html' title='Limits, Contention and Capacity'/><author><name>John Brady</name><uri>http://www.blogger.com/profile/08719865814813032071</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='25' height='32' src='http://2.bp.blogspot.com/_NEp2JRhM8nE/SqURShDkFoI/AAAAAAAAAAM/5edJ6ALbvKU/S220/JB1.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7302956329008495023.post-8548288537831343752</id><published>2007-12-07T20:12:00.000Z</published><updated>2007-12-07T20:23:22.557Z</updated><title type='text'>Oracle SQL Optimization and Variables</title><content type='html'>I was going to get around to writing something up about the Oracle Optimizer in the future, and how it can sometimes choose non-optimal plans for what it thinks are perfectly logical reasons.  But suddenly today I find that the &lt;a href="http://optimizermagic.blogspot.com/2007/12/why-are-there-more-cursors-in-11g-for.html"&gt;optimizer development group inside Oracle have done their own post&lt;/a&gt; on this.  So I thought I would point you at it as an excellent description of the issues I was going to discuss, and so that you can see that in 11g Oracle have finally got around to addressing a problem caused by previous enhancements to the optimizer.&lt;br /&gt;&lt;br /&gt;What I'd add is that I see the nature of the problem of these non-optimal SQL execution plans as being one of the consequences of previous attempts by Oracle to improve performance for the majority of SQL queries.  Although most queries have indeed benefited from better and faster execution plans, some have not and have actually got worse between versions of Oracle.&lt;br /&gt;&lt;br /&gt;Originally there was no Cost Based Optimizer (CBO) and only the Rule Based Optimizer, which was unaware of even things like table sizes.  So Oracle added in the CBO and it used some simple data on table sizes and indexes on columns to make better execution decisions.  Over time Oracle added more and more features to the CBO to improve the performance of most SQL statements most of the time, and added extra sets of statistics to make more accurate predictions about query execution behaviour.  Unfortunately, while the vast majority of queries have indeed benefited from these changes to the way the CBO works, there have always been examples of queries that have suffered and actually got worse.  The example given in the post by Oracle is just one.  I know of many others.&lt;br /&gt;&lt;br /&gt;The problem is really due to two issues, now solved by 11g, hopefully.  The shared execution plan in the library cache, means that the same execution will always be used for the same SQL statement since it was first parsed ('&lt;span style="font-style: italic;"&gt;same' &lt;/span&gt;meaning identical text of the SQL statement).  Initially this was okay.  But then Oracle introduced bind variable peeking, to try and optimize the query even further.  By looking at the actual value supplied for a bind variable in a query, and using statistics on the value distribution of the queried column, Oracle could better optimize that particular query.  But now all subsequent executions of the same query, regardless of the value used in the bind variable, would be forced to use the same, original execution plan.  Which may not be optimal for those subsequent query executions.&lt;br /&gt;&lt;br /&gt;There is in fact another, more manual, solution to this problem.  Do not use bind variables in queries you know are sensitive to the query values used, and for which you want distinct execution plans.  By putting the query value itself directly into the SQL statement, Oracle will treat each of them as distinct SQL statements and parse and optimize them individually, each with their own entry and associated execution plan in the library cache.  Unless, of course, you have enabled forced cursor sharing, where Oracle dynamically rewrites SQL statements to remove hard coded values and replace them with references to bind variables.  Which would simply map them all to the same, single SQL statement and execution plan in the library cache.  Something to be aware of.&lt;br /&gt;&lt;br /&gt;I've recommended this solution a few times to customers with data sets that have only a few values and very skewed distributions, and queries that do not strongly constrain on other columns.  The example in Oracle's post is one such case - no other constraints other than on department, which is highly skewed.  These cases are very few and far between, but often they do occur in critical pieces of code e.g. pick the next request to process that is in a '&lt;span style="font-style: italic;"&gt;ready to run&lt;/span&gt;' state.  The longer such a query takes, the more time that is wasted before the application gets on with processing the request itself.  And this can exasperate things when picking the request ends up taking longer than processing the request itself.&lt;br /&gt;&lt;br /&gt;It is nice to see that Oracle have got around to addressing this problem of queries that are sensitive to the constraining values used in the query.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7302956329008495023-8548288537831343752?l=databaseperformance.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://databaseperformance.blogspot.com/feeds/8548288537831343752/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7302956329008495023&amp;postID=8548288537831343752' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/8548288537831343752'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/8548288537831343752'/><link rel='alternate' type='text/html' href='http://databaseperformance.blogspot.com/2007/12/oracle-sql-optimization-and-variables.html' title='Oracle SQL Optimization and Variables'/><author><name>John Brady</name><uri>http://www.blogger.com/profile/08719865814813032071</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='25' height='32' src='http://2.bp.blogspot.com/_NEp2JRhM8nE/SqURShDkFoI/AAAAAAAAAAM/5edJ6ALbvKU/S220/JB1.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7302956329008495023.post-5507567176252298763</id><published>2007-12-06T21:10:00.000Z</published><updated>2007-12-06T21:26:14.071Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='performance'/><category scheme='http://www.blogger.com/atom/ns#' term='introduction'/><title type='text'>Some Basics</title><content type='html'>This blog will be about anything to do with the performance of computer systems, especially when running relational database products.  In fact the majority of it will be about the Oracle database on multi-processor UNIX systems (such as Solaris and Linux), and how to measure and monitor the performance of such systems and how to tune them.  There is a lot of information out there already on these topics, and I hope to add to it with a pragmatic approach suggesting real things that you can do about the performance of your systems and the applications running on them.&lt;br /&gt;&lt;br /&gt;Before I get going and dive into deeply technical things I thought I would cover a few basic points.  Don't worry, I don't want to repeat a lot of stuff that is already out there in numerous publications.  I'll just cover some of the things that I feel are most important.  And I do want to get straight what I mean by performance, and how I go about any performance analysis exercise.&lt;br /&gt;&lt;br /&gt;Nowadays we use the term "performance" as if it were a real physical thing - a resource that we can have a quantity of.  We talk of "&lt;span style="font-style: italic;"&gt;enough performance&lt;/span&gt;" and "&lt;span style="font-style: italic;"&gt;needing more performance&lt;/span&gt;".  And in the same way as for physical things, we need to be able to measure the performance of a system.  As the saying goes - you cannot control what you do not measure - which in turn means that if you can't measure it you can't improve it.&lt;br /&gt;&lt;br /&gt;Adequate performance really means that you are able to achieve your goals.  And poor performance means that your goals are not being met.  So all performance related work starts with goals.  Without goals you cannot truly say whether performance is good or bad, or how close you are to achieving what is needed. &lt;br /&gt;&lt;br /&gt;Performance analysis needs to be done in an objective fashion, with well defined goals, which can be measured.  No goals, means no measurements, means no ability to determine where you are.  Everything becomes subjective - "&lt;span style="font-style: italic;"&gt;too slow&lt;/span&gt;" - and impossible to pin down.  Often relative terms are used without a context - "&lt;span style="font-style: italic;"&gt;slower&lt;/span&gt;".  You have to define up front what "&lt;span style="font-style: italic;"&gt;fast enough&lt;/span&gt;" means.  And these goals are best defined in terms of the business requirements the system was deployed to satisfy.  The higher the level you can define the goals at, the closer the result is to something that makes sense to the business and directly benefits it.  A low level performance measure is often not particularily relevant to an overall business process.&lt;br /&gt;&lt;br /&gt;Performance analysis and tuning needs to be done top-down, not bottom-up.  Top down tuning focuses on those activities that have the greatest impact on the performance goal i.e. the business goal of the system.  Bottom-up tuning often focuses on low-level metrics and tries to identify those that are out of bounds and fix them.  Often many "rules of thumb" are used.  However, there is no guarantee that this tuning effort will have any impact on the business goal at all.  In my view all performance tuning actions should be linked back to the expected impact on the business goals of the system.&lt;br /&gt;&lt;br /&gt;When analysing a system there are two ways you can look at the system's performance and the resource usage on it:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;From a capacity perspective of how much of each resource is being used, and identify saturated resources&lt;/li&gt;&lt;li&gt;From a usage perspective of identifying which processes and tasks are consuming the resources&lt;/li&gt;&lt;/ul&gt;Ultimately both provide similar information on the system's resources that are being used.  But I feel it is much more useful to know &lt;span style="font-weight: bold;"&gt;what&lt;/span&gt; is using a resource than to just know that the resource is being used a lot as measured by its utilization.  By identifying what is the highest resource consumer you can work backwards to the part of the application responsible, and in turn identify what it is doing.  This will lead to an understanding at a higher level of what the application is doing, and why, as opposed to simply investigating low level resource utilization measurements.&lt;br /&gt;&lt;br /&gt;Performance itself can only really be measured in 3 ways:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Latency - the elapsed time to perform one unit of work.&lt;/li&gt;&lt;li&gt;Throughput - the amount of work done in a period of time.&lt;/li&gt;&lt;li&gt;Utilization - the amount of a fixed capacity resource that was used.&lt;/li&gt;&lt;/ul&gt;All performance measurements will be one of these types.  They are measuring different characteristics of the system:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Latency measures the quality of the service - fast or slow?&lt;/li&gt;&lt;li&gt;Throughput measures the quantity of the service - how much?&lt;/li&gt;&lt;li&gt;Utilization measures the efficiency of the system when delivering the service - what did it cost?  how much did we use?&lt;/li&gt;&lt;/ul&gt;We have had many rules of thumb about tuning Oracle systems for many years - everyone knows how to measure the "&lt;span style="font-style: italic;"&gt;buffer cache hit ratio&lt;/span&gt;" - but they do not solve all performance problems.  And as time goes on, they are less and less likely to be the cause of any real performance problem, as they will have been configured adequately from the beginning due to recognised best practice.  This is why a top-down structured approach delivers far more benefit in the long term, than a quick and dirty low level analysis with a few rules of thumb applied. &lt;br /&gt;&lt;br /&gt;And luckily Oracle has made significant improvements to the measurements it takes within itself between versions, which help with this top down analysis and identifying the biggest resource consumers.  It too has moved from a bottom up approach of reporting low level utilisation figures, to a top down approach showing the relative proportion of resources consumed by which SQL statements and even today being able to estimate the net impact on performance of certain changes to the database.&lt;br /&gt;&lt;br /&gt;I'll get onto specific tuning issues for both Oracle, Solaris and computers in general in future entries.&lt;br /&gt;&lt;br /&gt;And what about me?  I've been using Oracle since about 1988 (version 5), but really got into it with version 6 and every version since then.  Since starting in application development (C on UNIX) many years ago, I then moved into technical support and more lower level details of how systems actually work and achieve what they do behind the scenes.  And this led onto benchmarking and performance tuning and problem analysis of Oracle based systems, which is what I have continued to work on since about 1990, one way or another.  I keep trying to expand and move into new fields and technologies, but Oracle and performance issues never seem to go away, and I keep coming back to them time and time again.  I have worked for a number of computer hardware manufacturers (including Sequent and Sun) and in a variety of benchmarking, consultancy and delivery roles, often a mix of architect / design work and hands on implementation.  This has given me exposure to some of the largest Oracle based systems in the world in their time, and to the latest versions of Oracle and tuning the system to squeeze out the last drop of performance.  No matter how good the technology gets, people still keep getting performance problems.  And it looks like they won't go away in the future too, which will hopefully keep me busy.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7302956329008495023-5507567176252298763?l=databaseperformance.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://databaseperformance.blogspot.com/feeds/5507567176252298763/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7302956329008495023&amp;postID=5507567176252298763' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/5507567176252298763'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/5507567176252298763'/><link rel='alternate' type='text/html' href='http://databaseperformance.blogspot.com/2007/12/some-basics.html' title='Some Basics'/><author><name>John Brady</name><uri>http://www.blogger.com/profile/08719865814813032071</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='25' height='32' src='http://2.bp.blogspot.com/_NEp2JRhM8nE/SqURShDkFoI/AAAAAAAAAAM/5edJ6ALbvKU/S220/JB1.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7302956329008495023.post-1371226521499370154</id><published>2007-09-26T20:38:00.000+01:00</published><updated>2007-09-26T20:40:28.006+01:00</updated><title type='text'>New Blog</title><content type='html'>This is a new blog for posts about relational database performance on UNIX multi-processor systems - mainly Oracle, but possibly others too.  I've done a lot with Oracle on large systems over the past 15 years (mainly Sequent and Sun servers), and will post various things I continue to learn as a database developer.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7302956329008495023-1371226521499370154?l=databaseperformance.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://databaseperformance.blogspot.com/feeds/1371226521499370154/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=7302956329008495023&amp;postID=1371226521499370154' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/1371226521499370154'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7302956329008495023/posts/default/1371226521499370154'/><link rel='alternate' type='text/html' href='http://databaseperformance.blogspot.com/2007/09/new-blog.html' title='New Blog'/><author><name>John Brady</name><uri>http://www.blogger.com/profile/08719865814813032071</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='25' height='32' src='http://2.bp.blogspot.com/_NEp2JRhM8nE/SqURShDkFoI/AAAAAAAAAAM/5edJ6ALbvKU/S220/JB1.jpg'/></author><thr:total>0</thr:total></entry></feed>
