Tuesday, 15 January 2008

Limits, Contention and Capacity

One of the key concepts in performance tuning and analysis is the "bottleneck". At low levels of throughput everything looks okay. As you increase the number of incoming requests, the output of completed work increases at the same rate. But as the throughput of the system rises you will eventually reach a point where the output levels off and no longer increases in line with the input. At this point some part of the system is saturated and has become a bottleneck: you cannot squeeze any more through it, even though the rest of the system has capacity to spare.
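
To make this concrete, here is a minimal Python sketch of the effect under simplifying assumptions: every request passes once through each of three hypothetical resources (`cpu`, `disk`, `network`), and each resource has a fixed capacity in requests per second. Completed throughput follows the offered load until the smallest capacity is reached, and at that point the other resources are still only partly busy.

```python
# Hypothetical per-resource capacities, in requests/second.
CAPACITIES = {"cpu": 400.0, "disk": 150.0, "network": 600.0}


def completed_throughput(offered_rate: float, capacities: dict) -> float:
    """Output tracks input until the lowest-capacity resource saturates."""
    return min(offered_rate, min(capacities.values()))


def utilisations(offered_rate: float, capacities: dict) -> dict:
    """Fraction of each resource's capacity in use at the completed rate."""
    done = completed_throughput(offered_rate, capacities)
    return {name: done / cap for name, cap in capacities.items()}


if __name__ == "__main__":
    for offered in (50, 100, 150, 200, 300):
        done = completed_throughput(offered, CAPACITIES)
        busy = utilisations(offered, CAPACITIES)
        print(f"offered {offered:>3}/s -> completed {done:>5.0f}/s, "
              + ", ".join(f"{n} {u:.0%}" for n, u in busy.items()))
```

With these made-up numbers the output stops rising at 150 requests/second, when `disk` reaches 100% utilisation while `cpu` and `network` sit at 37.5% and 25%.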

Tuning a system is often about identifying this bottleneck, and establishing that it is indeed the bottleneck rather than just a slow component in the system. Fixing the bottleneck involves somehow increasing its capacity. Bottlenecks are normally considered in terms of the hardware in the computer system, but they can also be caused by the software being run.

In my view there are only two fundamental ways to fix any bottleneck:
1. Increase the capacity of the resource itself, i.e. buy more or faster hardware, or change the software algorithm.
2. Reduce the usage of the resource by the application, freeing up capacity.

Which action you take depends on the nature of the bottleneck. If there is no contention and it is just a slow resource, then the first option is the most viable one. The second option is best when there is contention for the resource and a queue of outstanding requests is forming in front of it. Reducing the use of the resource reduces the size of the queue, which in turn reduces the time any request has to wait to gain access, and so has a direct impact on performance and throughput. But, as stated, this only works when the resource is overloaded and there is contention for it, which you first have to identify.
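
One way to see why reducing usage pays off so strongly is a standard queueing model. The sketch below assumes the resource behaves like an M/M/1 queue (random arrivals, random service times, a single server), which real disks and CPUs only approximate; the 10 ms service time is a hypothetical figure. In that model the mean response time is the service time divided by (1 - utilisation), so waiting time grows sharply as utilisation approaches 100%.

```python
def mm1_response_time(arrival_rate: float, service_time: float) -> float:
    """Mean time in system (queueing + service) for an M/M/1 queue.

    Assumes Poisson arrivals, exponentially distributed service times
    and a single server; real resources only approximate this.
    """
    utilisation = arrival_rate * service_time
    if utilisation >= 1.0:
        raise ValueError("utilisation >= 100%: the queue grows without bound")
    return service_time / (1.0 - utilisation)


def mm1_wait_time(arrival_rate: float, service_time: float) -> float:
    """Mean time spent waiting in the queue before service starts."""
    return mm1_response_time(arrival_rate, service_time) - service_time


if __name__ == "__main__":
    service = 0.010  # hypothetical 10 ms per disk read
    for reads_per_sec in (90, 75, 60, 30):
        w = mm1_wait_time(reads_per_sec, service)
        print(f"{reads_per_sec:>3} reads/s, utilisation {reads_per_sec * service:.0%}: "
              f"average wait {w * 1000:.1f} ms")
```

Under these assumptions, cutting the reads from 90 to 60 per second (one third fewer) cuts the average wait from about 90 ms to about 15 ms, which is why reducing usage of a contended resource can matter far more than the raw reduction suggests.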

Equally, the first option of replacing the resource with a faster one can work just as well for a contended resource that is the bottleneck. Processing each request more quickly reduces the contention and reduces or eliminates any queue of waiting requests, increasing performance and throughput. This is why many people take the simple, brute-force approach of upgrading hardware to make a system perform faster. However, an upgrade is not guaranteed to remove the contention or the queuing, only to reduce it. Processing requests faster may simply lead to even more requests being submitted at a higher rate, and in turn back to a queue of outstanding requests. This is why the application's behaviour needs to be considered too.
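
The same hypothetical M/M/1 model can illustrate both halves of this argument. The sketch assumes a hardware upgrade that halves the service time, and then a scenario in which the submitted load doubles to take advantage of it. Response times improve overall, but once utilisation climbs back to 90% the time spent queueing is again nine times the service time.

```python
def mm1_response_time(arrival_rate: float, service_time: float) -> float:
    """Mean response time of an M/M/1 queue (single server, random arrivals)."""
    utilisation = arrival_rate * service_time
    if utilisation >= 1.0:
        raise ValueError("saturated: the queue grows without bound")
    return service_time / (1.0 - utilisation)


SLOW, FAST = 0.010, 0.005  # hypothetical service times: the upgrade halves it

before = mm1_response_time(90, SLOW)    # utilisation 90%
after = mm1_response_time(90, FAST)     # same load on faster hardware: 45%
rebound = mm1_response_time(180, FAST)  # load doubles to use it: back to 90%

if __name__ == "__main__":
    for label, r, s in (("before", before, SLOW), ("after", after, FAST),
                        ("rebound", rebound, FAST)):
        print(f"{label:8} response {r * 1000:6.2f} ms, "
              f"queueing {(r - s) / s:4.1f}x the service time")
```

The model only says that queueing depends on utilisation, not on raw speed; whether load actually rises after an upgrade depends on how the application and its users behave.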

The second way to fix a bottleneck can actually be approached in two different ways:
1. Optimise and tune the application's configuration from the outside.
2. Change the application itself from the inside.

The first approach typically involves changing configuration settings, such as increasing the number of records read at a time to reduce the number of data reads, or creating an extra index on a table. The second approach typically involves rewriting part of the application to use more efficient algorithms that use fewer resources, such as reading a set of data into an array in memory once, and doing in-memory lookups for validation instead of reads from disk.
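
Both examples can be sketched by counting read calls. In the Python below, `reads_needed` shows the configuration change (a larger fetch size means fewer reads for the same records), and `CountingTable` is a hypothetical stand-in for a table on disk, used to compare validating each record with its own lookup against loading the reference data once and checking in memory. Treating the bulk load as a single read is a simplification.

```python
import math


def reads_needed(record_count: int, fetch_size: int) -> int:
    """Read calls needed to scan record_count records, fetch_size per call."""
    return math.ceil(record_count / fetch_size)


class CountingTable:
    """Hypothetical stand-in for a table on disk that counts read calls."""

    def __init__(self, keys):
        self._keys = set(keys)
        self.reads = 0

    def lookup(self, key) -> bool:
        self.reads += 1  # one disk read per lookup
        return key in self._keys

    def load_all(self) -> set:
        self.reads += 1  # treated as one bulk read, for simplicity
        return set(self._keys)


def validate_by_lookup(table: CountingTable, codes) -> list:
    """Original style: one read per record being validated."""
    return [table.lookup(c) for c in codes]


def validate_in_memory(table: CountingTable, codes) -> list:
    """Rewritten style: read the reference data once, then check in memory."""
    valid = table.load_all()
    return [c in valid for c in codes]


if __name__ == "__main__":
    print("fetch size 1:", reads_needed(10_000, 1), "reads")
    print("fetch size 100:", reads_needed(10_000, 100), "reads")

    codes = ["A", "B", "X"] * 1000
    t1, t2 = CountingTable({"A", "B", "C"}), CountingTable({"A", "B", "C"})
    assert validate_by_lookup(t1, codes) == validate_in_memory(t2, codes)
    print("lookup per record:", t1.reads, "reads; in memory:", t2.reads, "read")
```

The results are identical; only the number of times the disk resource is used changes, which is exactly the "reduce the usage" route to freeing capacity.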

What you do is dependent on the nature of the bottlenecked resource, the application software, and how much money and time you have available.

Another key point to be aware of is that fixing one resource simply moves the bottleneck to another one. The bottleneck never actually goes away; it just moves somewhere else within the system. Presuming you are able to increase the effective capacity of the resource that is initially the bottleneck, throughput will increase, which is good. But eventually you will reach a point where the carrying capacity of that resource is greater than that of another resource in the system. Now you have to repeat the analysis of the system to identify the new bottleneck, and again work out the best way to either increase its capacity or reduce the number of times it is used.

This is why performance tuning is so often an iterative exercise. You fix the first problem, the bottleneck moves somewhere else, and you repeat the analysis again and again.
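
The moving bottleneck and the resulting iteration can be sketched as a loop. This Python example uses hypothetical capacities in requests/second and assumes each round of tuning doubles the effective capacity of whichever resource is currently the bottleneck, whether by faster hardware or by using it less per request. It stops once every resource can carry the target rate.

```python
def bottleneck(capacities: dict) -> str:
    """Name of the resource with the lowest capacity (requests/second)."""
    return min(capacities, key=capacities.get)


def tune_until(capacities: dict, target: float, factor: float = 2.0):
    """Repeatedly relieve the current bottleneck until all resources reach target.

    Each pass multiplies the bottleneck's effective capacity by `factor`,
    standing in for either faster hardware or fewer uses per request.
    Returns the sequence of bottlenecks fixed and the final capacities.
    """
    if factor <= 1.0:
        raise ValueError("factor must be > 1 or the loop never finishes")
    if any(cap <= 0 for cap in capacities.values()):
        raise ValueError("capacities must be positive")
    caps = dict(capacities)
    fixed = []
    while True:
        name = bottleneck(caps)
        if caps[name] >= target:
            return fixed, caps
        fixed.append(name)
        caps[name] *= factor


if __name__ == "__main__":
    start = {"cpu": 400.0, "disk": 150.0, "network": 250.0}  # hypothetical
    order, final = tune_until(start, target=500.0)
    print("bottlenecks fixed, in order:", order)
    print("final capacities:", final)
```

With these numbers the bottleneck moves from `disk` to `network`, back to `disk`, and then to `cpu` before the target is met: four rounds of analysis for one performance goal.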

A valid alternative is to calculate in advance how much of each resource is available in the system, and how much can be used if performance targets are to be met, such as a given number of transactions per minute. This is more of a capacity planning exercise, but it is directly useful when tuning a system that performs poorly. We can run the application under a relatively light workload and measure the actual amount of each resource used. These measurements can then be scaled up to the target workload and compared with the resource limits calculated earlier. This readily identifies which resources will be saturated and overloaded, and by how much. It is a much easier approach than the repeated, iterative analysis and identification of bottlenecks, and it also establishes how big the gap is between the desired performance and the performance actually achieved. If the gap is known to be too wide, then simple tuning may not be the answer, and may be wasted effort. Something more drastic and radical may be required. But you can only know this by doing the capacity planning in advance of deploying the application, and then comparing the two results.
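
The scaling step can be sketched with the Utilisation Law (U = X * D): each resource's service demand D is its measured utilisation divided by the measured throughput, and multiplying D by the target throughput gives the projected utilisation. The Python below assumes demand per transaction stays constant as load rises (caching or batching can break this), and the measurements, target and 70% utilisation ceiling are all hypothetical.

```python
import math
from dataclasses import dataclass

CEILING = 0.7  # hypothetical maximum acceptable utilisation per resource


@dataclass
class Projection:
    resource: str
    demand: float          # busy seconds per transaction (service demand)
    projected_util: float  # expected utilisation at the target rate
    max_tps: float         # highest rate that stays within the ceiling


def project(measured_util: dict, measured_tps: float, target_tps: float,
            ceiling: float = CEILING) -> list:
    """Scale light-load utilisation measurements up to a target rate.

    Uses the Utilisation Law (U = X * D): the service demand D of each
    resource is its measured utilisation divided by the measured
    throughput. This assumes per-transaction demand stays constant as
    load rises, which caching or batching effects may break.
    """
    results = []
    for name, util in measured_util.items():
        demand = util / measured_tps
        max_tps = math.inf if demand == 0 else ceiling / demand
        results.append(Projection(name, demand, demand * target_tps, max_tps))
    return results


if __name__ == "__main__":
    # Hypothetical utilisations measured while running 10 transactions/second.
    measured = {"cpu": 0.05, "disk": 0.12, "network": 0.02}
    plan = project(measured, measured_tps=10, target_tps=100)
    for p in plan:
        status = "ok" if p.projected_util <= CEILING else "OVERLOADED"
        print(f"{p.resource:8} {p.projected_util:6.0%} at target, "
              f"limit {p.max_tps:6.1f} tps  {status}")
    limit = min(plan, key=lambda p: p.max_tps)
    print(f"system limit about {limit.max_tps:.0f} tps, set by {limit.resource}")
```

In this made-up case the disk would need 120% utilisation at 100 transactions/second and can only sustain about 58 within the ceiling, so disk demand per transaction would have to fall by over 40% to meet the target. A gap that size is exactly the kind of early warning that simple tuning may not be enough.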