I was going to get around to writing something up about the Oracle Optimizer in the future, and how it can sometimes choose non-optimal plans for what it thinks are perfectly logical reasons. But suddenly today I find that the optimizer development group inside Oracle have done their own post on this. So I thought I would point you at it as an excellent description of the issues I was going to discuss, and so that you can see that in 11g Oracle have finally got around to addressing a problem caused by previous enhancements to the optimizer.
What I'd add is that I see these non-optimal SQL execution plans as one of the consequences of Oracle's previous attempts to improve performance for the majority of SQL queries. Although most queries have indeed benefited from better and faster execution plans, some have not, and have actually got worse between versions of Oracle.
Originally there was no Cost Based Optimizer (CBO), only the Rule Based Optimizer, which was unaware of even basic things like table sizes. So Oracle added the CBO, which used some simple data on table sizes and indexes on columns to make better execution decisions. Over time Oracle added more and more features to the CBO to improve the performance of most SQL statements most of the time, and added extra sets of statistics to make more accurate predictions about query execution behaviour. Unfortunately, while the vast majority of queries have indeed benefited from these changes to the way the CBO works, there have always been examples of queries that have suffered and actually got worse. The example given in the post by Oracle is just one; I know of many others.
The problem is really due to two issues, now hopefully solved by 11g. The shared execution plan in the library cache means that the same execution plan will always be used for the same SQL statement once it has first been parsed ('same' meaning the identical text of the SQL statement). Initially this was okay. But then Oracle introduced bind variable peeking, to try and optimize the query even further. By looking at the actual value supplied for a bind variable in a query, and using statistics on the value distribution of the queried column, Oracle could better optimize that particular execution of the query. But all subsequent executions of the same query, regardless of the value used in the bind variable, would then be forced to use the same, original execution plan, which may not be optimal for them.
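To make that concrete, here is a minimal SQL*Plus sketch. The ORDERS table and its values are hypothetical, mine rather than from Oracle's post; assume STATUS is indexed and highly skewed:

    -- Very few 'OPEN' rows, millions of 'CLOSED' ones, and an index on STATUS.
    VARIABLE status VARCHAR2(10)

    EXEC :status := 'OPEN'
    SELECT order_id, order_date FROM orders WHERE status = :status;
    -- First parse: Oracle peeks at 'OPEN', a rare value, and quite reasonably
    -- chooses an index range scan. That plan is now cached in the library cache.

    EXEC :status := 'CLOSED'
    SELECT order_id, order_date FROM orders WHERE status = :status;
    -- Identical SQL text, so the cached plan is reused: an index range scan
    -- over millions of rows, where a full table scan would be far cheaper.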
There is in fact another, more manual, solution to this problem: do not use bind variables in queries you know are sensitive to the query values used, and for which you want distinct execution plans. By putting the query value itself directly into the SQL statement, Oracle will treat each variation as a distinct SQL statement and parse and optimize it individually, each with its own entry and associated execution plan in the library cache. Unless, of course, you have enabled forced cursor sharing, where Oracle dynamically rewrites SQL statements to remove hard coded values and replace them with references to bind variables. That would simply map them all back to the same, single SQL statement and execution plan in the library cache. Something to be aware of.
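Continuing the same hypothetical ORDERS sketch, with literals each statement text is different, so each gets its own cursor and its own plan:

    SELECT order_id, order_date FROM orders WHERE status = 'OPEN';
    -- parsed and optimized on its own: can use the index range scan
    SELECT order_id, order_date FROM orders WHERE status = 'CLOSED';
    -- a separate statement in the library cache: can use a full table scan

    -- Beware: with CURSOR_SHARING = FORCE both are rewritten to something like
    --   SELECT order_id, order_date FROM orders WHERE status = :"SYS_B_0"
    -- and you are back to one shared statement and one shared plan.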
I've recommended this solution a few times to customers with data sets that have only a few values and very skewed distributions, and queries that do not strongly constrain on other columns. The example in Oracle's post is one such case - no constraint other than on department, which is highly skewed. These cases are few and far between, but they often occur in critical pieces of code, e.g. picking the next request to process that is in a 'ready to run' state. The longer such a query takes, the more time is wasted before the application gets on with processing the request itself. And this can exacerbate things when picking the request ends up taking longer than processing the request itself.
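A hypothetical sketch of that kind of request-picking query, with the rare value deliberately hard coded so it keeps its own well suited plan:

    -- STATUS is extremely skewed: a handful of 'READY' rows among millions of
    -- 'COMPLETE' ones. The literal keeps this statement, and its index-driven
    -- plan, separate from any query against the common values.
    SELECT request_id
    FROM   requests
    WHERE  status = 'READY'
    AND    ROWNUM = 1;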
It is nice to see that Oracle have got around to addressing this problem of queries that are sensitive to the constraining values used in the query.
Friday, 7 December 2007
Thursday, 6 December 2007
Some Basics
This blog will be about anything to do with the performance of computer systems, especially when running relational database products. In fact the majority of it will be about the Oracle database on multi-processor UNIX systems (such as Solaris and Linux), and how to measure and monitor the performance of such systems and how to tune them. There is a lot of information out there already on these topics, and I hope to add to it with a pragmatic approach suggesting real things that you can do about the performance of your systems and the applications running on them.
Before I get going and dive into deeply technical things I thought I would cover a few basic points. Don't worry, I don't want to repeat a lot of stuff that is already out there in numerous publications. I'll just cover some of the things that I feel are most important. And I do want to get straight what I mean by performance, and how I go about any performance analysis exercise.
Nowadays we use the term "performance" as if it were a real physical thing - a resource that we can have a quantity of. We talk of "enough performance" and "needing more performance". And in the same way as for physical things, we need to be able to measure the performance of a system. As the saying goes - you cannot control what you do not measure - which in turn means that if you can't measure it you can't improve it.
Adequate performance really means that you are able to achieve your goals. And poor performance means that your goals are not being met. So all performance related work starts with goals. Without goals you cannot truly say whether performance is good or bad, or how close you are to achieving what is needed.
Performance analysis needs to be done in an objective fashion, with well defined goals, which can be measured. No goals means no measurements, which means no ability to determine where you are. Everything becomes subjective - "too slow" - and impossible to pin down. Often relative terms are used without a context - "slower". You have to define up front what "fast enough" means. And these goals are best defined in terms of the business requirements the system was deployed to satisfy. The higher the level you can define the goals at, the closer the result is to something that makes sense to the business and directly benefits it. A low level performance measure is often not particularly relevant to an overall business process.
Performance analysis and tuning needs to be done top-down, not bottom-up. Top down tuning focuses on those activities that have the greatest impact on the performance goal i.e. the business goal of the system. Bottom-up tuning often focuses on low-level metrics and tries to identify those that are out of bounds and fix them. Often many "rules of thumb" are used. However, there is no guarantee that this tuning effort will have any impact on the business goal at all. In my view all performance tuning actions should be linked back to the expected impact on the business goals of the system.
When analysing a system there are two ways you can look at the system's performance and the resource usage on it:
- From a capacity perspective: how much of each resource is being used, identifying saturated resources
- From a usage perspective: which processes and tasks are consuming the resources
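To make the two perspectives concrete in Oracle terms, a sketch assuming access to the V$SYSMETRIC and V$SESSTAT views (the exact metric and statistic names can vary between versions):

    -- Capacity perspective: how much of the resource is in use overall?
    SELECT metric_name, value
    FROM   v$sysmetric
    WHERE  metric_name = 'Host CPU Utilization (%)';

    -- Usage perspective: which sessions are consuming that resource?
    SELECT st.sid, st.value AS cpu_used
    FROM   v$sesstat st, v$statname sn
    WHERE  sn.statistic# = st.statistic#
    AND    sn.name = 'CPU used by this session'
    ORDER  BY st.value DESC;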
Performance itself can only really be measured in 3 ways:
- Latency - the elapsed time to perform one unit of work.
- Throughput - the amount of work done in a period of time.
- Utilization - the amount of a fixed capacity resource that was used.
- Latency measures the quality of the service - fast or slow?
- Throughput measures the quantity of the service - how much?
- Utilization measures the efficiency of the system when delivering the service - what did it cost? how much did we use?
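A simple made-up example of how the three relate. Suppose a system completes 100 transactions in a 10 second interval, each taking 0.5 seconds as seen by the user, and consuming 40 CPU seconds in total on an 8 CPU machine:

    Latency     = 0.5 seconds per transaction
    Throughput  = 100 transactions / 10 seconds = 10 transactions per second
    Utilization = 40 CPU seconds used / (8 CPUs x 10 seconds) = 50%

Each number answers a different question, which is why you need goals for all three rather than just one.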
And luckily Oracle has made significant improvements between versions to the measurements it takes of itself, which help with this top down analysis and with identifying the biggest resource consumers. It too has moved from a bottom up approach of reporting low level utilisation figures to a top down approach showing the relative proportion of resources consumed by each SQL statement, and today it can even estimate the net impact on performance of certain changes to the database.
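As a quick sketch of that top down view, assuming a recent version where V$SQL has a SQL_ID column:

    -- The ten statements that have consumed the most elapsed time so far.
    SELECT *
    FROM  (SELECT sql_id, executions, elapsed_time, cpu_time
           FROM   v$sql
           ORDER  BY elapsed_time DESC)
    WHERE  rownum <= 10;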
I'll get onto specific tuning issues for Oracle, Solaris and computers in general in future entries.
And what about me? I've been using Oracle since about 1988 (version 5), but really got into it with version 6 and every version since then. Having started in application development (C on UNIX) many years ago, I then moved into technical support and the lower level details of how systems actually work and achieve what they do behind the scenes. That led on to benchmarking, performance tuning and problem analysis of Oracle based systems, which is what I have continued to work on since about 1990, one way or another. I keep trying to expand and move into new fields and technologies, but Oracle and performance issues never seem to go away, and I keep coming back to them time and time again. I have worked for a number of computer hardware manufacturers (including Sequent and Sun) in a variety of benchmarking, consultancy and delivery roles, often a mix of architect / design work and hands on implementation. This has given me exposure to some of the largest Oracle based systems in the world in their time, and to the latest versions of Oracle, tuning the system to squeeze out the last drop of performance. No matter how good the technology gets, people still keep having performance problems. And it looks like they won't go away in the future either, which will hopefully keep me busy.