Monday 26 January 2009

Queuing Theory & Resource Utilisation

While Queuing Theory can be quite academic and mathematical at times, it does include a number of core rules or laws about how systems and their components behave at higher levels of utilisation, and about the effect this has on the length of the queues of pending requests.

One of the conclusions from this is that you cannot have high utilisation of any resource without some level of queuing occurring. It may not seem obvious, but it is true. Essentially the queue of pending requests or jobs is needed to keep the resource busy most of the time. If the queue were usually empty then the utilisation of the resource would be low. High utilisation is only achieved by having a queue of waiting requests, so that there is always a next thing to do when the current thing completes.
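
A toy simulation makes this concrete. The sketch below is illustrative only - it assumes a single server with random (exponential) arrivals and service times, the same M/M/1-style model behind the formulas later in this post - and measures utilisation and the average number of jobs in the system together:

    import random

    def simulate(arrival_rate, service_rate, n_jobs=200_000, seed=42):
        """Single-server FIFO queue: returns (utilisation, average jobs in system)."""
        rng = random.Random(seed)
        arrival = 0.0          # clock of the latest arrival
        server_free_at = 0.0   # when the server finishes its current job
        busy_time = 0.0
        time_in_system = 0.0
        for _ in range(n_jobs):
            arrival += rng.expovariate(arrival_rate)    # next arrival time
            service = rng.expovariate(service_rate)     # this job's service time
            start = max(arrival, server_free_at)        # wait if the server is busy
            server_free_at = start + service
            busy_time += service
            time_in_system += server_free_at - arrival  # wait + service
        utilisation = busy_time / server_free_at
        # Little's Law: average jobs in system = arrival rate * average time in system
        avg_in_system = arrival_rate * (time_in_system / n_jobs)
        return utilisation, avg_in_system

    for load in (0.5, 0.8, 0.9):   # offered load as a fraction of capacity
        u, n = simulate(arrival_rate=load, service_rate=1.0)
        print(f"load {load:.0%}: utilisation {u:.0%}, average jobs in system {n:.1f}")

Pushing the offered load towards 100% drives the measured utilisation up, but only because an ever longer queue of waiting jobs builds up behind the server.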

You could theoretically also achieve 100% CPU utilisation by having only a few very active processes and no queues. So on a 4 CPU system you could achieve 100% CPU utilisation with only 4 processes, each always busy and executing instructions. In this scenario there is no queuing for the CPUs, as there are only 4 processes, and the efficiency of the system is very good.

However, such a scenario is very, very unlikely. The processes could never block for anything and would always be executing CPU instructions, which means that they would not be doing any disk I/Os, communicating with each other by any kind of messages, or using locks or latches on shared structures. That is the complete opposite of the Oracle database architecture.

An Oracle database server will have many processes running on it - one shadow server process per connected session (unless you are using the Shared Server configuration) - and they will use shared memory, take locks and latches to control data consistency, and perform lots of disk I/Os.

So an Oracle database server does conform to the general model in Queuing Theory of having lots of separate clients (the shadow servers) making requests on the resources in the system. And as a result it is subject to the golden rule that high resource utilisation means queues of pending requests.

As a result, 100% CPU utilisation is very bad, and is symptomatic of very large queues of waiting processes. Queuing Theory also shows that above 50% utilisation of a resource there is, more often than not, at least one request waiting in the queue. Note that this is 'on average' - sometimes the queue can be empty and sometimes it can have several requests in it - but on average the number of waiting requests will be more than zero.

A general rule of thumb is to get worried at 80% utilisation, as the number of concurrent requests in the system will average around four, and rises very steeply above this, heading towards infinity as utilisation approaches 100%. An explanation of some of this and a nice graph of queue length versus utilisation is available in this Microsoft article on Modeling Principles for Sizing. I know it is a Microsoft article, but it does summarise the effects of queuing well, and has the nice graph in it.
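
You can see that rule of thumb directly by plugging a few utilisation figures into the N = U / (1 - U) relationship given below; a quick sketch:

    # Average number of requests in the system, N = U / (1 - U),
    # at a few utilisation levels - note how steeply it climbs past 80%.
    for u in (0.50, 0.80, 0.90, 0.95, 0.99):
        n = u / (1.0 - u)
        print(f"utilisation {u:.0%}: {n:5.1f} requests in the system on average")

That gives 1.0 at 50% utilisation, 4.0 at 80%, 9.0 at 90% and 99.0 at 99% - the same steep curve as the graph in that article.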

These queues can apply to all aspects of a computer system - CPU, Disk, Network and Memory. To drive any one of these at 80% utilisation or above means that you have queues of pending requests, which are needed to keep the utilisation that high.

The net effect of such high utilisation is an increase in response time. The total elapsed time for a given request is now the wait time in the queue plus the service time of the request itself. When there are 4 requests in the system on average, the response time is actually 5 times the service time, e.g. you might spend 40 ms waiting to perform a disk I/O that itself takes only 10 ms. So high utilisation and long queues have a direct effect on the elapsed time of individual transactions.

The formula for the total number of jobs in the system (N) as a function of utilisation (U) is:
  • N = U / (1 - U)
Note that N is not the same as Q, the number of requests waiting in the queue. N is the number in the queue plus any requests currently being serviced - the total within the system.

And the formula for Response Time is:
  • R = Service Time / (1 - U)

So higher utilisation directly leads to longer queues and longer response times.
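
To tie the two formulas back to the disk example above, here is a minimal sketch using the same 10 ms service time and 80% utilisation:

    def jobs_in_system(utilisation):
        """N = U / (1 - U): average requests in the system (queued plus in service)."""
        return utilisation / (1.0 - utilisation)

    def response_time(service_time, utilisation):
        """R = Service Time / (1 - U): average elapsed time (wait plus service)."""
        return service_time / (1.0 - utilisation)

    service_ms = 10.0   # disk I/O service time from the example above
    u = 0.80            # disk utilisation
    r = response_time(service_ms, u)
    print(f"N = {jobs_in_system(u):.1f} requests in the system")
    print(f"R = {r:.0f} ms in total ({r - service_ms:.0f} ms waiting + {service_ms:.0f} ms service)")

That reproduces the earlier figures: 4 requests in the system, and a 50 ms response time made up of 40 ms waiting plus the 10 ms disk I/O itself.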

Whether this is acceptable in your situation depends on whether your goal is total throughput irrespective of transaction response time, or individual transaction response time. In other words, is it an online system, or more of a batch system processing relatively large units of work?

While Queuing Theory can seem theoretical at times, to me it reinforces the message that when you scale up the load on any system there will always be a bottleneck. That bottleneck will reach high utilisation as it nears its capacity, and large queues will form in front of it. Identifying where the queues are in a system, what the bottleneck is, and what can be done to fix it - reduce the service time or increase the capacity - is key to performance tuning. And an appreciation of Queuing Theory has helped me get a deeper understanding of this area.
