Monday, 18 February 2008

Good SQL goes hand in hand with Good Database Design

Or to put it another way - if you find poor SQL in an application, it might have as much to do with poor database and application design as with plain, old-fashioned poor programming.

Ever since I had to use COBOL briefly many years ago, I've learnt the lesson that a good design can often make the programming easier. By carefully designing and defining the DATA DIVISION at the top of your COBOL program, you could make the later manipulation of the data itself that much easier.

Time and again I have seen the results of both practices - good design leading to a good, efficient implementation; and bad or weak design leading to a poor one. Not only can a good design help achieve a good implementation, but a poor design will often hamper the programmer and restrict the options available to them. And so the result of a poor design is a poor application with poor SQL in it.

The SQL to access data goes hand in hand with the design of the database containing that data. You cannot separate the application code from the database design. Put the data in the right place, and the SQL becomes simple and straightforward. Put the data in the wrong place, and the SQL becomes more complicated, joining tables together and adding extra constraints and conditions, and possibly needing more horrible things such as outer joins and unions. I'm not saying that these can always be avoided, but I am saying that the core design of a database for an application should not need them.
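As a purely hypothetical illustration (the table and column names here are invented, not taken from the system I describe later), consider fetching a customer's current account balance. With the balance held where the application needs it, the query is trivial:

    SELECT balance
      FROM accounts
     WHERE account_id = :acct_id;

Hold the same information in the wrong place, and the equivalent query has to reconstruct it from other tables, with an outer join to cope with accounts that have no transactions yet:

    SELECT a.account_id,
           NVL (SUM (t.amount), 0) AS balance
      FROM accounts a
      LEFT OUTER JOIN transactions t
        ON t.account_id = a.account_id
       AND t.status     = 'POSTED'
     WHERE a.account_id = :acct_id
     GROUP BY a.account_id;

Both return the same answer, but the second does far more work and gives the optimiser far more ways to go wrong.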

This is why good database design is important, and how it can impact application performance as much as the application code itself. Time spent doing a good database design is repaid by a good, well-performing application. A skimped, rushed database design often results in a longer development cycle, probably needing changes to the database design to correct things missed earlier, and extra effort by the developers to write the SQL that gets them the data they need when they need it.

I'm saying all this because I have seen systems where poor SQL was causing poor performance, and it turned out to be due to poor database design, not any lack of ability on the programmer's part to write good SQL. The SQL was in fact as simple and direct as it could be, but it had to visit multiple tables, join them together and apply different restrictions to each one, to get the data it needed.

In fact, some of the worst-impact SQL I have seen on a system was incredibly simple, and was written to continually count the rows in a table to see if any new records had arrived. Of course it was not quite that simple - only certain rows in a certain state were to be counted, and these also depended on having matching data in another table, and so on. The net result was a SELECT that queried five separate tables to determine whether any data at all was in a 'ready' state. And because it was a count, it actually needed to touch a lot of data or index records.
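I no longer have the original statement, but in shape it was something like the following sketch - the table and column names are invented purely for illustration:

    SELECT COUNT (*)
      FROM jobs       j
      JOIN job_steps  s ON s.job_id   = j.job_id
      JOIN batches    b ON b.batch_id = j.batch_id
      JOIN queues     q ON q.queue_id = b.queue_id
      JOIN job_types  t ON t.type_id  = j.type_id
     WHERE j.status  = 'READY'
       AND s.status  = 'COMPLETE'
       AND q.enabled = 'Y';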

Although the joins were efficient (using indexes), the filter conditions were not very selective, so Oracle was retrieving most of the rows from most of the tables, only to reject the majority of them and count what was left. Such a query cannot avoid a lot of row accesses, even if they are logical rather than physical I/Os because the blocks are already in the buffer cache. It still takes time to visit all of those rows in the tables and count them up.

What made this case worse was that, for various reasons, the query also did a self join. New entries could be sequenced, and so to count a record as being 'ready' it also had to have no predecessors in its sequence which were also 'ready'. This doubled the amount of work being done by the query, in terms of data records accessed.
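Again as a sketch only, simplified to show just the self join and using the same invented names, the sequencing check was along these lines:

    SELECT COUNT (*)
      FROM jobs j
     WHERE j.status = 'READY'
       AND NOT EXISTS (
             SELECT 1
               FROM jobs p
              WHERE p.sequence_id = j.sequence_id    -- same sequence
                AND p.sequence_no < j.sequence_no    -- an earlier entry
                AND p.status      = 'READY');        -- that is also ready

Every candidate row forces a second probe back into the same table, which is where the doubling of work comes from.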

As you can imagine, this made it a slow query. And this was being run multiple times a second to see if any new work had arrived. So it became the largest resource consumer in Oracle from the application, even though it was not doing any real work - just counting how many jobs were ready to be processed.
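This is also the kind of statement that shows up immediately if you look at the shared SQL area. On a reasonably recent Oracle version you can rank statements by buffer gets with something along these lines (v$sql and its columns are real; the ranking query itself is just a convenience sketch):

    SELECT *
      FROM (SELECT sql_id, executions, buffer_gets, sql_text
              FROM v$sql
             ORDER BY buffer_gets DESC)
     WHERE ROWNUM <= 10;

A cheap-looking query executed very frequently can easily come out at the top of a list like this.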

The solution? A redesign. We tried everything we could with indexes, caching data, and pushing joins down into sub-queries (see the sketch below) to steer the optimiser one way or another. But the query still involved a lot of buffer gets, which took time, and it was still being run very frequently. Which left a redesign as the only viable solution. By moving around where data was stored, and in what form, we were able to eliminate many of the joins and drastically reduce the number of buffer gets per execution.
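By "pushing joins down into sub-queries" I mean rewrites of this kind (still with the invented names from the earlier sketch), where a table that is only there to filter the result is moved into an EXISTS test in the hope of a cheaper plan:

    SELECT COUNT (*)
      FROM jobs j
     WHERE j.status = 'READY'
       AND EXISTS (SELECT 1
                     FROM job_steps s
                    WHERE s.job_id = j.job_id
                      AND s.status = 'COMPLETE');

It sometimes helped a little, but it could not change the fundamental amount of data the query had to visit - which is why the real win came from the redesign itself.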

And the biggest win was to turn the design completely on its head. Instead of counting rows in a table, we had a counter column on another record that was updated every time a new record was inserted. Although this meant changing the application in more than one place (adding an UPDATE of the counter when a new record was inserted, and another when it was later processed), the original query was now reduced to a single record fetch. The remaining joins were now only joining to a single row, rather than to every row in the original table, and the self join disappeared too.
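A minimal sketch of that kind of change, assuming a hypothetical one-row-per-queue summary table (the names are again invented):

    -- When a new job is inserted, in the same transaction:
    UPDATE queue_counters
       SET ready_count = ready_count + 1
     WHERE queue_id = :queue_id;

    -- When a job is picked up and processed:
    UPDATE queue_counters
       SET ready_count = ready_count - 1
     WHERE queue_id = :queue_id;

    -- The polling query becomes a single-row fetch:
    SELECT ready_count
      FROM queue_counters
     WHERE queue_id = :queue_id;

The obvious trade-off is that a hot summary row like this can itself become a point of contention if many sessions update it at once, so it suits a moderate insert rate far better than a very high one.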

The result was vastly improved performance - of both the query and the overall system - and in fact it reduced the total resource consumption on the system, so that CPU utilisation went down while throughput went up.

And all because of a poor initial database design, which restricted the programmer to writing a poor piece of SQL that performed inefficiently and used too many resources.