Tuesday, 24 November 2015

Use Bind Variables in Application SQL for Oracle

If you don't already know, then you should almost always be using bind variables in all SQL statements used in any applications you write that run against Oracle. Bind variables are place holders within your SQL statements that get replaced by real data values at run time when the SQL statement is executed i.e. at run time the real data value is "bound" to the corresponding place holder variable in the SQL statement. So rather than using unique SQL statements with explicit data values embedded in them:
select col1, col2, col3 from mytable where col4 = 'A Value' ;
select col1, col2, col3 from mytable where col4 = 'Another Value' ;
select col1, col2, col3 from mytable where col4 = 'Yet Another Value' ;
use one common SQL statement for all executions instead, binding a different value for each execution:
select col1, col2, col3 from mytable where col4 = :b1 ;
Note that there are a couple of circumstances to be aware of where it might make sense to not use bind variables e.g. data warehouses running ad-hoc queries with histograms present on many skewed data columns. But for the vast majority of circumstances on all OLTP type Oracle databases the applications should be using bind variables.

The most commonly mentioned benefit of using bind variables is quicker SQL statement execution because an existing execution plan in memory is simply reused. There is also an associated reduction in CPU load by avoiding the need to fully parse and validate every SQL statement when executed. However, there are many other benefits that come from using bind variables, which makes the case for using them even stronger. Even if your system does not have high CPU utilisation and performance is adequate today, it is still worth exploring switching to use bind variables for the other benefits you gain.

Here are some of the other benefits, in no particular order.

Library Cache

The memory needed for your library cache where SQL statements and their execution plans are stored in the SGA is significantly reduced. Reusing an existing SQL execution plan for another execution of the same SQL statement but with different run time data values for the bind variables means that fewer SQL statements and execution plans in total need to be stored in the library cache. The main benefit is from the storage for the SQL statement itself which is now stored once only in the library cache and reused, rather than once per execution with different embedded run time data values.

Latch Contention

Latch contention internal to Oracle is reduced. Parsing a SQL statement involves actions such as allocating storage slots in the library cache for the SQL statement and the execution plan, and reading parts of the data dictionary cache while checking table and column names, etc. All of this involves latches - internal locks on Oracle data structures to stop them being used or changed by another session. If you don't need to parse and validate a SQL statement from scratch then the number of latches that get locked during the execution of each SQL statement is significantly reduced.

Better SQL Statement Execution Statistics

When a SQL statement is executed Oracle measures and maintains a number of statistics on the execution of that SQL statement, such as total number of executions, CPU time used, disk reads, etc.

When not using bind variables each SQL statement is unique in the library cache. Eventually when another unique SQL statement is executed it will force one of the other SQL statements to be aged out of the library cache to release its slot. When this happens all of the execution statistics for that SQL statement get destroyed and lost too.

When using bind variables then any existing entry for that SQL statement in the library cache is reused, and the execution statistics for this particular execution are added on to those already accumulated so far. Given that such shared SQL execution plans will stay in the library cache longer, then the execution statistics that Oracle keeps will be accumulated with each execution. And in turn this means that these execution statistics better reflect the true execution profile of all SQL statements being executed against that database.

More Useful AWR Reports

In turn these SQL statement execution statistics are captured and stored in the Automatic Workload Repository (AWR) when the regular snapshots are taken (every hour by default). When using bind variables you get more complete and accurate statistics on total SQL statement execution captured for the reasons given previously. When not using bind variables the churn of SQL statements in the library cache mean that very few are still present in the library cache when the regular AWR snapshot is taken, and their execution statistics are missing from the AWR.

In the SQL Statistics section of an AWR report the top SQL statements are displayed by their cumulative resource usage such as number of executions, CPU usage, disk reads, etc. The idea is that it is trying to help you identify the SQL statements that are having the most impact on the performance of your system. This means that when not using bind variables you are probably missing most of the SQL statement executions from this part of the AWR report because individually their resource usage is very low.

The AWR report helps you spot such an anomaly by telling you the total SQL execution count covered by the statistics being reported at the start of each table of resource usage by SQL statement:
Captured SQL account for ##.#% of Total [Statistic Name]
If the percentage is too low then it implies that there are a lot of SQL statements being executed that are not being captured and reported on by AWR. Even if it is over 50% it still means that up to 50% of the SQL workload on your system is not being captured and reported on by AWR. And you don't know whether you can ignore this or not because you don't know how many SQL statements are involved, how often they are executed, or how much resource they use while executing.

Not using bind variables is an obvious candidate for causing this "missing" set of SQL statements, but there are other possibilities too. So investigate further either way if this percentage for captured SQL is not over 80%. What is causing so much executed SQL to not be captured by AWR?

What about third party applications?

What if you don't have control over the application, and have no ability to change any of the SQL in it? Well you can still benefit from the use of bind variables by forcing Oracle to rewrite all SQL statements dynamically to use bind variables instead. This feature was introduced in Oracle 10g precisely to help with this kind of problem.

There is an initialization parameter "CURSOR_SHARING" which defaults to a value of "EXACT" i.e. an existing cursor in the library cache is only shared and reused if the text of the new SQL statement exactly matches that of the SQL statement that created that cursor originally. If bind variables are not being used, then this means that each SQL statement with an embedded data value will be different and will have a new cursor allocated for it in the library cache.

You can set "CURSOR_SHARING" to "FORCE" which causes Oracle to dynamically rewrite each SQL statement being executed and replace embedded data values by bind variables i.e. Oracle ends up executing the equivalent SQL statement with bind variables. This results in greater reuse of execution plans within the library cache and all of the benefits of using bind variables outlined so far.

Note that there was also a value of "SIMILAR" you could set it to but this was deprecated in, so you shouldn't use this setting any more.

Monday, 9 November 2015

Data Driven or Application Driven Development?

I'm the kind of person who believes that Data is really important and needs to be treated and dealt with properly. Of course we also need Applications that use that Data - one without the other is no use at all. But a battle I often have with clients now is that Data needs to be modelled and designed at the same stage as the Application is. Ignore the data model and database design and you will likely end up with a mess that will only come back to haunt you later on. (More on this in other blog posts to come).

I've just read a reasonably interesting blog post from Pythian on being Data Driven where this distinction is mentioned explicitly. At one point one of the panel mentions how some people continue to ignore the importance of Data and its design (my emphasis within this quote):
I recently I talked with someone who felt extremely behind the curve and they only recently did things like looking at data design as integral to the design phase of their system development. They had always, of course, reviewed design of the applications before implementing them, but data was always an afterthought. Applications create data, so what?

Now that they’re trying to get better at data, they had to actually explicitly make sure you got a data person in there designing the data, looking at the impact of what this application’s doing with the landscape or the information architecture they have and reviewing that before going off and actually writing a code.

That was new for them.
This is the kind of thing I have been on about for years - the Data is as important as the Application and needs to be designed properly up front. In fact, you can argue that the Data is more important in the long run. Why? Because Applications come and go but Data lives forever. Think about it - you often replace one Application suite with another, but you will almost always migrate over your existing Data i.e. you might throw away the Application software but you keep the Data itself.

And yet I continue to see software development organisations adopt Object Oriented and Agile Development methodologies wholesale and treat the Data as an "afterthought". Why, when the Data is at least as important as the Application?