Tuesday 28 June 2011

Histograms are not what I assumed

One of the themes I'll keep coming back to is how things are often not the way you assumed they were, and how there are always opportunities to learn something new about Oracle. I am quite familiar with histograms in Oracle from reading the manuals and white papers, but have always left it to Oracle to decide which columns needed histograms. This is mainly because the databases I have worked on have not so far been large or complicated enough to need further analysis. But knowing the day would come when I would have to do something myself, I thought I would review what I knew, and in doing so learnt a few new things - or rather realised that some of my assumptions had been incorrect.

Rather than re-read the manual I decided to go to a better source of knowledge rather than just raw information - Jonathan Lewis's Cost Based Oracle Fundamentals. I already knew that Oracle had two types of histogram - Frequency and Height Balanced. I knew that a Frequency Histogram stores a row count per individual data value, while a Height Balanced one stores a row count per range of values (strictly, the row count is the same for each range, and it is the size of the data range that varies). And I knew that the Oracle Optimizer uses this extra row count data in the histogram to calculate better estimates of the number of rows matching the constraints in a query. In reading the chapter on Histograms, working through its examples and doing my own tests, I realised that some of my assumptions about how Oracle uses these histograms had been wrong.
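As a quick reference point, you can see which type of histogram (if any) Oracle has created on a column by querying the data dictionary. This is just a generic sketch - the table name is invented for illustration:

    -- What kind of histogram (if any) exists, and the column's Density
    SELECT column_name, num_distinct, density, num_buckets, histogram
      FROM user_tab_col_statistics
     WHERE table_name = 'ORDERS';   -- hypothetical table name

The HISTOGRAM column reports NONE, FREQUENCY or HEIGHT BALANCED as appropriate.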

Assumption 1 - Oracle uses all the entries in a histogram to calculate its row count estimate

I knew this to be true for a Frequency Histogram, and had assumed something similar for a Height Balanced one. But I was wrong. Although a Height Balanced Histogram may have up to 254 buckets, Oracle only uses the buckets that share the same endpoint value to identify "popular values" - those that occur more frequently than the number of rows covered by a single bucket, and so span more than one bucket. For all other values it ignores the histogram completely and uses the column level Density statistic, just as if there were no histogram.

In other words there is a threshold - the number of rows covered by a single bucket. For data values that occur more often than this, and which are therefore recorded as the endpoint of two or more buckets in the histogram, the Optimizer calculates its estimate from the number of buckets spanned and the number of rows per bucket. For all other values Oracle assumes a uniform distribution and uses the column level Density statistic. This Density is not simply one over the Number of Distinct Values (NDV); it is calculated differently, to remove the effect of the rows for the popular values, i.e. the Density is lowered to the average frequency of the "unpopular" values.
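To make that concrete, here is a rough sketch (the table and column names are invented) of how you can spot the popular values in a Height Balanced Histogram, together with the arithmetic the Optimizer then applies:

    -- A value is "popular" if it is the endpoint of more than one bucket,
    -- i.e. its ENDPOINT_NUMBER jumps by more than 1 from the previous row
    SELECT endpoint_value,
           endpoint_number
             - LAG (endpoint_number, 1, 0)
                 OVER (ORDER BY endpoint_number) AS buckets_spanned
      FROM user_tab_histograms
     WHERE table_name  = 'ORDERS'       -- hypothetical table
       AND column_name = 'ORDER_DATE'   -- hypothetical column
     ORDER BY endpoint_number;

    -- Popular value:   estimated rows = num_rows * (buckets_spanned / total buckets)
    -- Unpopular value: estimated rows = num_rows * density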

Assumption 2 - A Frequency Histogram has an entry for all data values occurring in the table

This is only true if either the table is relatively small, or you force Oracle to read all of the data rows when gathering statistics. If you leave Oracle to use its default sampling method on a large table, then some values may not be sampled at all, and they will be missing from the Frequency Histogram produced.
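If you want to guarantee a complete Frequency Histogram, a hedged sketch of forcing a full read when gathering statistics (the object names are invented):

    BEGIN
      DBMS_STATS.GATHER_TABLE_STATS (
        ownname          => USER,
        tabname          => 'ORDERS',        -- hypothetical table
        estimate_percent => 100,             -- read every row, no sampling
        method_opt       => 'FOR COLUMNS ORDER_DATE SIZE 254');
    END;
    /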

What Oracle does is calculate and store a Density value for the column that is half the frequency of the least frequently occurring value in the histogram. A value used in a query constraint that does not appear in the Frequency Histogram is therefore assumed to occur at half the row count of the value with the smallest row count in the histogram. So again, the Optimizer may end up not using a histogram that exists, and instead use the column's Density statistic when costing a particular query.
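A tiny worked example of that rule, using invented numbers: if the least frequent value recorded in the Frequency Histogram occurs 20 times, then a query constraint on a value that is missing from the histogram is estimated to return 20 * 0.5 = 10 rows, via the stored Density.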

None of this is new, and I was able to double check it via various other sources. It was just another thing to add to the list of assumptions I've made in the past that turn out not to be true. Even with histograms in place there is a reasonable chance that the Optimizer will not actually be using the histogram itself, but instead the Density statistic of the column. It also means that changing the value of the Density for a column can affect queries, even when there is a histogram on that column.

Thursday 16 June 2011

Hinting - Lessons Learnt

Following on from my previous post about some SQL that needed to be tuned, I thought I'd summarise some additional important lessons I've learnt about using hints. I already knew quite a lot about hints in Oracle from reading different sources, and knew to avoid them as much as possible. In spite of this I have still learnt a few new things about using hints in practice. I know that others have said most of this before, but it bears repeating because it really does affect whether your hints work or not.
  1. Name all your query blocks using the qb_name hint. This clarifies things both in terms of how Oracle itself reports back your execution plan, and in terms of how you specify your hints. (All of these points are pulled together in the sketch after this list.)
  2. Qualify all object references with query block names e.g. t1@mainq. Again, this is how Oracle reports object references, and it is precise in terms of where the hint should be applied.
  3. Check that your hints are being used with the 'outline' option to dbms_xplan.display_cursor. If the hints listed are the same as the ones you used, then all is well. If not, then it is likely that some of your hints are being ignored.
  4. Test the hints individually. This follows on from the previous point about proving the hint is recognised and honoured by Oracle. It is possible that a similar hint is being automatically produced by the Optimizer as a result of another hint you are using. In my case it looks like a USE_NL hint was being ignored, but a Nested Loop was produced anyway because of an INDEX hint.
  5. Include all other relevant hints, such as LEADING and UNNEST. Previously I would have assumed that these would be produced by the Optimizer automatically, but Jonathan Lewis includes them in his hint sets so they must be relevant.
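Putting those points together, here is a minimal sketch of what the hinted SQL and the verification step might look like. The table aliases, query block name and join order are all invented for illustration, not taken from the real problem SQL:

    SELECT /*+ QB_NAME(mainq)
               LEADING(o@mainq c@mainq)
               USE_NL(c@mainq)
               INDEX(c@mainq) */
           o.order_id, c.customer_name
      FROM orders o
      JOIN customers c ON c.customer_id = o.customer_id
     WHERE o.order_date = DATE '2011-06-01';

    -- Then, immediately after running it, check which hints made it
    -- into the plan's outline
    SELECT *
      FROM TABLE (DBMS_XPLAN.DISPLAY_CURSOR (NULL, NULL, 'TYPICAL +OUTLINE'));

If the outline section shows a different set of hints to the ones you wrote, some of yours are probably being ignored.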

Thursday 9 June 2011

Customers who won't wait to be helped

[I'm back. I haven't posted for a long while due to a combination of being busy and somewhat losing focus on the blogging side, but I hope to post more frequently in the future. I was going to start with a post on histograms in Oracle but ...]

I work for a software house whose heavyweight application runs on top of Oracle. Earlier this week I was dealing with a customer who had poor performance on just one SQL query after having updated their statistics. I spent a day analysing the SQL and the data - what was it trying to do and why - and another day trying different solutions to make it use a different execution plan and run faster for them. I then sent the results to the support people dealing with this customer so that the solution could be implemented. Imagine my surprise, then, when I saw today that Jonathan Lewis had published a blog post about the same SQL statement, referencing a post on one of the Oracle Technical Forums.

I'm surprised that a customer would post SQL from a third party application on the web like that, and that they would somehow expect a better answer to come back than from the software vendor themselves. I was even more surprised because they were asking for a set of hints to change the execution plan, and I had come up with exactly such a set of hints just two days earlier. Okay, maybe this had not been forwarded to the customer yet, because our Support department was doing further testing on it first, so it is highly likely that the customer had not seen my solution. But I'm still amazed that a customer would be willing to put into production a recommendation they had got off a web forum, and on the basis that they could not work out a better solution themselves - the poster did not know how to write hints properly.

In terms of the problem itself, the cause of the poor performance is actually the extreme skew within the data in this table. On average most of the date values occur thousands of times, but the particular value used in the query only occurs about ten times in the table. Hence another execution plan is faster for this particular value.
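The shape of that skew is easy to see with a simple aggregate - the object names here are invented stand-ins for the real ones:

    -- Row count per date value, most frequent first
    SELECT order_date, COUNT(*) AS row_count
      FROM orders
     GROUP BY order_date
     ORDER BY row_count DESC;

Most dates come back with counts in the thousands; the date used in the problem query comes back with about ten.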

Which brings me back to histograms, which I was planning a completely different post on anyway. There is a histogram on the constrained column, but it is a Height Balanced one, because there are over 254 distinct values, and so there are no row count statistics for individual values other than the most popular ones. The value used in the query is unpopular, and the average number of rows per value across all the unpopular values is in the tens of thousands. Hence the choice of an execution plan using a HASH JOIN, as that scales better to high row counts than a NESTED LOOP join does.

Maybe Oracle should introduce a new type of histogram? One that records both the most popular values and the least popular values, and then stores a single average (the Density) for all the values in between. That would have helped here. It certainly seems that Oracle does not handle heavily skewed data sets well, though of course you should never expect a "one size fits all" solution to be perfect for every possible scenario. What the Optimizer does is try to produce an execution plan that is the best for most of the possible input values. Which is what it did here, based on the average number of rows per date value available via the Density statistic.

Another viable solution is to make the Optimizer believe that the date constraint will match far fewer rows, so that it chooses the NESTED LOOP join method itself. This is the approach put forward by Wolfgang Breitling in his paper Tuning by Cardinality Feedback, in which he suggests changing the Density statistic stored for the column. And indeed, reducing the Density value by a factor of 100 has exactly this effect. The upside of this approach is that it avoids the need to change the SQL or use hints, which do not always work as intended.
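A hedged sketch of that approach - the object names and the new Density value are invented, and on a real system you would save the original statistics first (for example with DBMS_STATS.EXPORT_TABLE_STATS) so you can restore them:

    BEGIN
      DBMS_STATS.SET_COLUMN_STATS (
        ownname => USER,
        tabname => 'ORDERS',        -- hypothetical table
        colname => 'ORDER_DATE',    -- hypothetical column
        density => 0.00001);        -- invented value, 1/100th of the original
    END;
    /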