Monday, 19 October 2009

Solving Database Design in Development

Previously I have looked at the challenges involved in being able to support a changing database design in an application development environment, which were:
  1. Support change at any time
  2. Ensure database designed and modelled properly
  3. Record each individual change, to enable metamorphosis of older databases
  4. Allow multiple versions or branches of the Database Design to exist, and be changed individually or collectively
Although I started off on this problem of database structure change from the context of the need for it when using Agile development methods, the actual problem is not unique or specific to Agile. It is a generic problem to the development of any application that involves the use of a relational database to store its data in.

What I now want to do is list the key features needed in a solution to this problem which, if all assembled together, would deliver a viable solution to this problem.
  • With Application Source Code we only care about the final state it ends up in. But with a Database Design we care about each individual change, and what gets changed. We need to record each of these Databse Changes individually, so that they can be applied to other instances of that database, as well as recording the final state of the database design itself in the Database Model.
  • Changes are physically made to a Database Instance by executing appropriate SQL statements. This is the only way to make a change to an existing database instance.
  • Each "Database Change" should be an atomic change to a database - it either completes successfully or not at all. Changes cannot be partially implemented on a database instance.
  • The Database Changes must be executed in the correct sequence on each database instance, for repeatability and due to any dependencies between the changes.
  • Changes to the Database Design should be formally Requested, and these requests recorded somewhere to provide an audit trail - Who, When, What Changed, Where in the Database.
  • Changes must be reviewed and approved by the Data Modeller, to ensure a good, scalable design. The Data Modeller is responsible for updating the Database Model - the record of the Database Design - and producing the SQL statement that implements that change. Some of this could be done automatically using appropriate tools, or manually. All that matters is that it does happen.
  • The Database Designer or Data Modelling role now becomes a part time role during the whole lifecycle of the application's development, instead of being a full time role only in the first phase of its development. It is now likely that all development cycles will involve the Database Designer, who becomes a central and critical member of the development team. The role itself, however, could be fulfilled by someone who has other roles too within the development cycle, or shared amongst a number of people, because it is not a full time role. This could be someone who is also a developer or a database administrator.
  • Each Database Change is given its own unique Identifier, which must be globally unique over all the other Database Changes. This identifies the particular Database Change, regardless of which Database Instances it is applied to, or which versions of the Application it appears in.
  • Changes to a particular Database Instance must be recorded within the Database Instance itself. This is the only way to avoid applying the same changes twice to a Database Instance. This implies some form of Change History table within each Database Instance.
  • This Change History table needs to record the Change Identifer for those changes applied, and the date and time when it was done. Other data could also be recorded, but these two are the minimum required.
  • The set of changes should be stored in a single file, often termed the Change Log. Each entry in this will include the Change Identifier and the SQL statement that implements that change. All necessary changes can be easily located and sequenced this way.
  • This Change Log needs to be well structured, so that a program can read and decode the Changes in it and execute the SQL statements for changes not yet applied to a given Database Instance. This is probably best done using XML.
  • There must be separate instances of the Database Model / Database Design and the Change Log file in each version / branch of the Application source code. This allows changes to be made independently to each branch of the Application.
  • In reality SQL statements are often specific to one database product, such as Oracle, Sybase, SQL*Server or MySQL. The Change Log will need to record the separate SQL statements for each supported database product for each Database Change. Thus each Change will have an Identifier, and a SQL statement per database product, clearly labelled with the database product name.
  • The whole solution is brought together in a program that is run during an Application Upgrade. It opens the Change Log XML file and reads in each Change. If that Change has not yet been applied to this particular Database Instance, then the corresponding SQL statement is executed and a record inserted into the Change History metadata table.
This is my interpretation of how to solve the problem of a changing database design during application development. But it turns out of course that others have been through exactly the same process before me and come up with exactly the same answers. And in fact, once I had thoroughly understood the nature of the problem, I could immediately see that these other people too had understood it and had arrived at pretty much the same conclusions I would. So although I did work through the problem from first principles myself, simply because that is the best way for me to fully understand a topic and appreciate whether any "solution" is right or not, I have no doubt borrowed from some of the material I have read in the way that I have described this solution here.

I still find some of the descriptions I found on the Web around Database Design, Changes and Agile Development unclear around this particular problem of implementing changes to a database design during application development. Quite a lot discuss the need to change the database design, and how to model different types of changes (refactoring is often mentioned). But almost none discuss how such changes get implemented in real deployed databases, and how you maintain a database over the long term with a series of changes to the application that uses it. None of them covered enough of it at once to leave me feeling that I had a real solution I could go out and apply. Hence my need to work through the problem from first principles for myself, to get the full picture as it were.

Again, one article that did help to clarify the nature of the problem a lot for me was Rethinking Agility in Databases: Evolution from Hexagon Software. This brought home the message that Databases must be treated differently to Applications in how you change them.

And the Change Log file of SQL statements was explicitly mentioned by Peter Schuh and Pramod Sadalage, who have put this into practice themselves on large projects. See Agility and the Database by Peter Schuh for instance.

So although I have not invented anything new here that has not already been described by other people elsewhere, in one form or another, I have tried to bring together all of the necessary ingredients for a solution in a coherent manner. As I said, this was my way of working through the database design problem to arrive at a solution that I fully understood and felt able to go out and apply in the real world.

I may revisit this again and try and outline a working solution to this i.e. what would need to be delivered to implement the kind of solution I have described here. Anybody reading this and want to know more?

No comments: