Monday, 22 September 2008

Contention and a bottleneck - or not?

I recently had the following situation: during a heavy data load by multiple, concurrent jobs the customer said that they saw contention, and provided me with a STATSPACK report from this period. Initially I was confused:
  • On the one hand, the top wait event was a very high value for a latch wait on the cache buffers chain.
  • On the other hand, the CPU utilisation during this whole period was only around 10%.
Normally I would say that this system had capacity to spare - 90% CPU unused - but the wait figures indicated that something was wrong somewhere. Digging further into STATSPACK I verified that the top SQL statement was the INSERT into the one table, and that the number of waits correlated with the number of sleeps on the cache buffer chain latch.

Now this told me that these were actual sleeps, where the session failed to get the latch when it tries in a busy spin loop. And these sleeps are relatively expensive and time consuming. Hence the high value for the wait time experienced.

Digging further I saw that there were buffer busy waits too, and managed to establish that these were all on the same index on the table. Now this table has six indexes - i1 to i6 - and the buffer busy waits were all on index i6. And this is on a column called last_update_time. So clearly this was a sequential, monotonically increasing series of values, and would all end up in the final leaf block of the index. So we had a classic case of records with the same value, or very close values, being inserted, and causing contention on the index block where those same values would be stored. And the lock on the index block during the block split and contents copying, would be the cause of the sleeps on the cache buffer chain latch for that block.

I also did some checking on the index structure itself, by analysing the index and then looking at the statistics. This showed that the percentage of space actually used in each block averaged just over 50%. So Oracle was actually doing 50:50 index block splits when the last index leaf block became full, rather than 90:10. This is probably due to the update times not being perfectly in sequence from the different data loading processes. But this also meant that this new block was already 50% full and so would be ready to split even sooner than if it were only 10% full.

The solution? Well, there were a couple of possibilities.

One was to use a REVERSE key index. However, these are only useful for equality tests and not ranges. And date columns tend to be used for range queries (from and to).

Another option would be to partition the index by another column, which would have different values in each INSERT from each of the loading processes. This basically creates a set of separate smaller indexes, and so increases the number of last leaf blocks across these sub-indexes, and so reduces contention. However, partitioning is an extra cost option, and needs extra maintenance over the number of partitions and their configuration.

The recommended option is to modify the index and add the extra column after the last_update_time, mentioned just now in partitioning. This would mean the index was still useful for equality and range conditions, and would spread out the same last_update_time value between multiple index leaf blocks having different values for the second column - in this case essentially an account identifier. Now with this extra layer of data values in the index it will distribute the new values over more index leaf blocks, and so reduce contention and improve throughput times.