Monday, 5 October 2009

The Challenge of Agile Database Design

Previously I have said that, for what I call Enterprise Applications, data modelling is important because the data itself has meaning and value outside of the application that manipulates it, and that scalability requires a good database design because you cannot just add scalability on afterwards. The Data Model or Database Design defines the structure of the database and the relationships between the data sets, and is part of the foundation on which the application is built. A good database design is therefore essential to achieving a scalable application. This leads to the challenge: how do I go about designing a database in an Agile development project, when not all the requirements are known initially? What techniques should I be using?

Having read as much as I can find on this topic, I think I have a better understanding of the nature of this challenge. That is what I want to explain here: the nature of the challenge of Agile Database Design.

First, we should restate the problem in a more positive way. Rather than "when not all the requirements are known", we can ask "How should I be designing an Agile database that will change in the future?" Change happens all the time, in one form or another, and is inevitable. We need to embrace it, assume that our database design will change over time, and find ways to support this changing database. This need for change over time is not unique to Agile development; it is in fact a universal problem for applications and databases.

Second, we need to accept that we cannot skip the design stage for the database in any way. We must design the database properly, at least those parts of it that we need to design now, and produce a correct model. As I argued before, a good and correct database design is essential to a well-performing application and to the integrity of the data itself.

The outcome of the database design is documentation of the structure of the database, often termed the Data Model. There are many tools that can help you design your database and record the details of the model. Whichever tool you choose should support small and frequent changes to the data model, as this is a major requirement of Agile development. But you could also use tools as simple as a spreadsheet and a set of diagrams.

Third, a Database Design is not the same as Application Source Code. There are similarities, but they are actually different beasts. Both act as Blueprints for a thing that can be built: an Instance of that Design. And both can be changed over time as needed. The difference is that when the Source Code of an Application changes, we rebuild the Application completely, typically compiling all the source code files. This produces a new, next-generation instance of the Application Program. This is an example of "Evolution": the Blueprint changes, and a brand new instance is created from it. Existing instances are not modified, but are instead "replaced" by the newly created instance.

Databases are the opposite. Changes must be applied "in place" directly to each Database Instance (a real database on a computer system), to modify it into the latest database design. Such an in-place change to the structure of a thing is termed "Metamorphosis", and is quite different from "Evolution".

While I have appreciated for some time that there is a difference in type between an Application's Source Code and a Database's Design, I read about the explicit nature and form of this difference (Evolution versus Metamorphosis) in an article on Rethinking Agility in Databases: Evolution from Hexagon Software. All credit for this distinction and its terminology goes to them.

Application Source Code editing and maintenance methods will not work for a Database Design. A "replace and rebuild" methodology cannot be used for databases, which instead need each change to the design applied "in place" to every instance of that database.
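To make the Evolution versus Metamorphosis distinction concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `customer` table, its columns and the function names are hypothetical illustrations, not part of any particular methodology. It contrasts applying a design change in place with the "replace and rebuild" approach that works for source code: only the in-place change keeps the data already held in the instance.

```python
import sqlite3


def create_v1(conn):
    # Version 1 of a hypothetical design: a simple customer table.
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")


def metamorphose_to_v2(conn):
    # Metamorphosis: change the existing instance in place.
    # Existing rows are kept and simply gain the new column (NULL by default).
    conn.execute("ALTER TABLE customer ADD COLUMN email TEXT")


def rebuild_as_v2(conn):
    # "Replace and rebuild", as for source code: discard the old
    # structure and create a fresh one from the new design.
    conn.execute("DROP TABLE customer")
    conn.execute(
        "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL, email TEXT)"
    )


def demo(change):
    # Build a version 1 instance holding some data, apply the change,
    # and return what is left in the customer table.
    conn = sqlite3.connect(":memory:")
    create_v1(conn)
    conn.executemany("INSERT INTO customer (name) VALUES (?)", [("Alice",), ("Bob",)])
    change(conn)
    rows = conn.execute("SELECT id, name, email FROM customer ORDER BY id").fetchall()
    conn.close()
    return rows


print(demo(metamorphose_to_v2))  # [(1, 'Alice', None), (2, 'Bob', None)]
print(demo(rebuild_as_v2))       # []
```

Both versions end up with the same structure; the difference is that the rebuilt instance has lost everything it held, which is why database changes have to be expressed as in-place alterations.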

Fourth, there may be multiple separate versions or branches of the Database Design to be maintained, as a result of the existence of separate branches of the Application Source Code. It is common for Application Source Code to be branched when major releases are done, typically producing a new branch for support of that release, and a new branch for the next major release. The Database Design may need to change in different ways in different branches of the Application. Likewise, the same change may need to be made to the Database Design in different branches, to correct a bug for instance. We need a way to record each version of the Database Design separately to support this.
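One way to record the Database Design per branch, sketched here under the assumption that every change is given a unique (hypothetical) identifier rather than just a position in a single sequence, is to let each branch list the changes that make up its version of the design. A bug fix made in both branches then carries the same identifier, so a database instance that already has it is not given it twice. The sketch also assumes that changes shared between branches can be applied in either order, which will not hold for every pair of changes.

```python
# Hypothetical change identifiers: each branch lists, in order, the changes
# that make up its version of the Database Design. The bug fix
# "004-fix-order-total" was made in both branches.
BRANCHES = {
    "release-1.0-support": [
        "001-create-customer",
        "002-add-order",
        "004-fix-order-total",
    ],
    "release-2.0": [
        "001-create-customer",
        "002-add-order",
        "003-add-email",
        "004-fix-order-total",
    ],
}


def pending_changes(branch, applied):
    """Return the changes in `branch` not yet applied to a database instance,
    in the order the branch defines them. `applied` is the set of change
    identifiers already recorded in that instance."""
    return [change for change in BRANCHES[branch] if change not in applied]


# An instance built from the 1.0 support branch, including the bug fix,
# being moved onto the 2.0 design: only the email change is still needed.
support_instance = {"001-create-customer", "002-add-order", "004-fix-order-total"}
print(pending_changes("release-2.0", support_instance))  # ['003-add-email']
```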

These then form the Challenges of delivering a changing Database Design:

1. Supporting change at any time to the Database Design

2. Ensuring that the Database is Designed and Modelled properly

3. Recording each individual change to the Database Design, so that older instances of that Database Design can metamorphose by having these changes applied to them

4. Allowing multiple versions or branches of the Database Design to exist, and to be changed individually or collectively
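The third challenge is what makes metamorphosis repeatable. As a minimal sketch, again using Python's sqlite3 module, each design change can be recorded once as a numbered SQL statement, and each database instance can keep a hypothetical `schema_version` table saying how far it has got. An older instance is then brought up to date by applying, in order, only the changes it has not yet had. The change list, table names and `upgrade` function are illustrative assumptions, not the API of any particular tool.

```python
import sqlite3

# Hypothetical change log: each design change is recorded once, in order,
# as (version number, SQL statement).
CHANGES = [
    (1, "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"),
    (2, "ALTER TABLE customer ADD COLUMN email TEXT"),
    (3, "CREATE INDEX customer_name_idx ON customer (name)"),
]


def current_version(conn):
    """Return the latest change applied to this instance (0 if none)."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)")
    (version,) = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    return version or 0


def upgrade(conn, target=None):
    """Apply, in order, each recorded change newer than the instance's version,
    stopping at `target` if one is given. Returns the resulting version."""
    version = current_version(conn)
    for number, statement in CHANGES:
        if number <= version:
            continue  # this instance already has the change
        if target is not None and number > target:
            break  # stop at the requested design version
        with conn:  # commit once the change and its version record are executed
            conn.execute(statement)
            conn.execute("INSERT INTO schema_version (version) VALUES (?)", (number,))
        version = number
    return version


# An "old" instance created at version 1 and already holding data...
conn = sqlite3.connect(":memory:")
upgrade(conn, target=1)
conn.execute("INSERT INTO customer (name) VALUES ('Alice')")
# ...later metamorphoses to the latest design, keeping its data.
print(upgrade(conn))  # 3
print(conn.execute("SELECT id, name, email FROM customer").fetchall())  # [(1, 'Alice', None)]
```

Keeping the version record inside each instance means any copy of the database can be upgraded from wherever it happens to be, without knowing in advance which release it came from.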

These challenges are not specific to Agile Development, and apply to any sufficiently large application development. Addressing them will provide a solution that could be used in any database-oriented application development, whether or not it uses Agile methods.

In the next post I hope to start describing the outline of what you would need in order to achieve what I call "Agile Database Design" that addresses these challenges, and then, subsequently, how to meet these challenges in a minimal way.


Anonymous said...

In our company we actually allow the developers themselves to change the schema as they go. A log of all changes is applied to a schema once in a while, after the DBA checks it, but this allows (experienced) developers to mold the structure as they go.

Roenbaeck said...

There is now both a technique, Anchor Modeling, and a tool, the Anchor Modeler, that support agile database modeling.


It is all open source and free to use!