Sunday 21 July 2013

Mercurial for tracking script change history

On the one hand source code control can be a nightmare involving complicated tools and formal processes, but on the other hand the ability to track the history of changes to a file can be really useful. With complicated SQL queries and stored procedures, seeing what changed between each major version helps when debugging a change in behaviour or a drop in application performance. What is needed is something that is straightforward to use (not overcomplicated), does not get in the way of what you really want to do, lets you easily see what changed and when, and provides the right amount of functionality (nothing essential missing). If it fails on any of these then it is probably not worth using.

I stumbled across Mercurial about 4 years ago when I needed something for a project I was working on. Over time I have found it does almost everything I need, and I've not looked back since. I use it for pretty much anything - most obviously Oracle SQL scripts, database build scripts and stored procedures, but also other programming languages (I like Python, which I've mentioned before). In fact it will work on any kind of text file.

Why Mercurial? What are the main things I like about it?
  • Command line interface and various GUI options
    • Command line access makes direct use easy if you prefer that
    • GUI options include a web browser interface, TortoiseHg GUI front end, and integration with other tools e.g. Eclipse
  • Truly tracks changes to files
    • It records what actually changed between your saves (commits), not just a copy of each new file
    • This makes more advanced things almost trivial e.g. taking a change and applying it as a patch to another version
  • Changes saved as "Change Sets" - groups of changes to files are saved together at the same time
    • This is logically what you want - all your changes saved together as one set
  • Changes saved when you say - as frequently or infrequently as you want
  • You always have a full copy of the source code tree locally
    • None of this stuff about needing to check out in advance files you will need to change
  • Detects which files have changed or been added
    • Makes it really easy when saving changes - helps you avoid "forgetting" an important file
    • Also makes it easy to see if anyone else has changed any other files
  • Its repository (history of changes) sits alongside your source code in a single .hg directory at the top level
    • Its metadata is kept separate from your source code files
      • There are no extra metadata files or sub-directories mixed in with your source code files
  • Many advanced features too
    • Branching within repositories - different sets of changes from a common point in history
      • Useful for maintaining support versions of released software, and for bug fixes
    • Linked repositories - parent / child relationship between repositories
      • Another way of doing branches and separating released versions from in-development ones
      • Change Sets made to one repository can be pushed across to another repository e.g. bug fixes
    • Fully distributed - multiple copies of a repository can exist and be reconciled with each other
      • Developers can work on their own copy of the source code, then push changes back into group level repositories
    • Flexibility over repository hierarchies
      • Can be flat (master and one per developer) or very hierarchical (master, testing, project, sub-project, developer)
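
To make the "change set" idea above concrete, here is a toy model in Python. It is only an illustration under simple assumptions, not how Mercurial stores its history internally: each snapshot of a project is assumed to be a dict mapping file name to a list of lines, and for every file that was added (A), removed (R) or modified (M) it records just the difference, using the standard difflib module. Replaying those differences against the old snapshot rebuilds the new one. The function names change_set and apply_change_set are hypothetical.

```python
import difflib

def change_set(old, new):
    """Toy change set: compare two snapshots (dict of file name -> lines)
    and keep only the differences, tagged A (added), R (removed) or M (modified).
    An illustration of the idea, NOT Mercurial's internal storage format."""
    changes = {}
    for name in sorted(old.keys() | new.keys()):
        if name not in old:
            changes[name] = ("A", list(difflib.ndiff([], new[name])))
        elif name not in new:
            changes[name] = ("R", list(difflib.ndiff(old[name], [])))
        elif old[name] != new[name]:
            changes[name] = ("M", list(difflib.ndiff(old[name], new[name])))
    return changes

def apply_change_set(old, changes):
    """Replay a toy change set against the old snapshot to rebuild the new one."""
    new = dict(old)
    for name, (status, delta) in changes.items():
        if status == "R":
            del new[name]
        else:
            # difflib.restore(delta, 2) extracts the "after" side of an ndiff delta
            new[name] = list(difflib.restore(delta, 2))
    return new

# Two made-up versions of a small script directory
v1 = {"report.sql": ["SELECT id\n", "FROM orders;\n"]}
v2 = {"report.sql": ["SELECT id, total\n", "FROM orders;\n"],
      "index.sql": ["CREATE INDEX orders_ix ON orders (id);\n"]}

cs = change_set(v1, v2)
for name, (status, _) in cs.items():
    print(status, name)

# Show the modified file as a familiar unified diff
print("".join(difflib.unified_diff(v1["report.sql"], v2["report.sql"],
                                   "a/report.sql", "b/report.sql")))
```

Both files changed in one step are grouped into a single change set, which is the behaviour described in the list above: one logical save covering every file you touched.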
I'm probably making it sound more complicated than it is. I use it all the time in a very straightforward way because it is so easy, and it lets me see my change history whenever I need to. Here is what I do on any discrete project I work on.
  • If there is any existing source code for the project get a copy of it locally
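  • Initialise the Mercurial repository ("hg init") and save the baseline, adding all the existing files at the same time
    • hg commit -A -m 'Baseline' (the -A option adds the new, untracked files before committing)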
  • Edit any files I need to edit, test, and repeat until I'm satisfied
  • Save my changes locally in Mercurial (using "hg add" first for any new files I created)
    • hg commit -m 'Description of what I did'
  • If changes need to be propagated to another source code control tool, work out what files changed
    • TortoiseHg gives you a straightforward view of what files were changed in each commit
    • Or use the command line to see what changed in the last commit "hg log -v -l 1"
    • Or any previous commit (revision number) "hg log -v -r REV#"
  • If using another source code control tool, synchronise with it when needed
    • Update my local copy of files from the other source code control tool with their latest version
    • Mercurial will detect what files have changed, if any
    • Save the changes in Mercurial with an appropriate message, using -A so that any files added or deleted by others are picked up too
      • hg commit -A -m 'Synced with master for latest changes by others'
  • If the intended changes are tentative or need extensive testing, I first take a full copy of the source code tree (Mercurial repository)
    • hg clone existing-repo repo-copy-name
    • This keeps my new changes separate from the original ones, and avoids issues over needing to back out rejected changes later on
    • After testing, if the changes are accepted I push them back ("hg push") to the parent repository it was cloned from
      • Otherwise I just delete the copy of the source code and abandon those changes
    • All changes made since the repository was cloned are pushed back to the parent
      • So the number of local saves I did during editing and testing does not matter - all my changes are propagated back
  • Having multiple clones lets me work on different projects at the same time, without mixing up their changes in the same source code tree
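
For the "work out what files changed" step in the list above, "hg log -v" lists the files in each commit. Another option is "hg status --change REV", which prints one line per file changed in that revision, prefixed by a status letter (M modified, A added, R removed). The sketch below wraps that command from Python so the file list can be fed into whatever other tool needs it. The function names are hypothetical, files_changed_in assumes the hg executable is on the PATH, and the usage example only runs the parser on a made-up sample so it works without a repository.

```python
import subprocess
from collections import defaultdict

# Status letters printed by "hg status"
STATUS_NAMES = {"M": "modified", "A": "added", "R": "removed",
                "!": "missing", "?": "not tracked", "I": "ignored", "C": "clean"}

def parse_hg_status(text):
    """Group lines of 'hg status' output ("X path") by their status letter."""
    files = defaultdict(list)
    for line in text.splitlines():
        if len(line) > 2 and line[1] == " ":
            files[line[0]].append(line[2:])
    return dict(files)

def files_changed_in(rev, repo="."):
    """Run 'hg status --change REV' in REPO and return the parsed result.
    Assumes Mercurial is installed; raises CalledProcessError if hg fails."""
    result = subprocess.run(["hg", "status", "--change", str(rev)],
                            cwd=repo, capture_output=True, text=True, check=True)
    return parse_hg_status(result.stdout)

# Made-up sample of what "hg status --change REV" might print
sample = "M sql/report.sql\nA sql/new_index.sql\nR sql/old_view.sql\n"
changes = parse_hg_status(sample)
for code, names in changes.items():
    print(STATUS_NAMES[code] + ":", ", ".join(names))
```

Against a real repository the call would be something like files_changed_in(42), where 42 stands in for whichever revision number you are interested in.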
As I said before, this lets me easily keep a local repository tracking all changes made to source code files, so I can see who changed what and when. It also enables me to keep multiple copies of a master source code tree and test out different changes independently of each other, and only push back changes when thoroughly tested and accepted by everyone else.

Your mileage will vary, but I find Mercurial easy enough to use, and it provides the right level of tracking of source code changes that I want.

Saturday 6 July 2013

My Top Technical Tools

I find that during my technical work on Oracle and Performance Tuning there are a few key pieces of software that I keep coming back to again and again. They are not directly related to databases, and are really more like technical tools that help me do my work better. Each does a unique thing, and does it so well that they have become standard tools I use all of the time one way or another.

So my top technical utilities that I could not do without, in no particular order, are:
  • TiddlyWiki - a great personal Wiki for keeping together all the bits of useful information I come across
  • Mercurial - source code control, great for tracking the history of all my SQL scripts
  • VirtualBox - virtualised environments for testing multiple Linux and Oracle versions
  • Trello - an online list tool, great for to-do lists and following mini projects
I'm not trying to say that each of these is the best possible tool for its job. I'm saying that they are so easy and straightforward to use that I use them again and again. And their feature set is great in each case - just what I want, no more, no less. Each of these tools has advantages that make it more useful than the other options out there, or than just not using anything at all.

TiddlyWiki is a self-contained Wiki in a single HTML file, with a built-in editing capability. So you can carry around your own Wiki on a memory stick as a single HTML file, and keep adding to it any time you want to. As ever it uses a bit of meta-syntax to embed formatting and hyperlinks. But that is it - no complicated setup or configuration, or other dependencies. It's portable too - I use it in Firefox, but it works on Windows, Linux or anything else.

Mercurial tracks changes to text files i.e. source code versioning. It is great for tracking changes to all of my numerous SQL scripts over time. Again, there is no complicated setup, though you do need to install it first (either build it from source or use an installable package if one is available for your operating system). It keeps a record of what you changed in each file whenever you do a commit (an update to the history in its repository). It is dead easy to see the history of changes to any given file (often difficult with other tools) - specifically what changed and when. I've used it for all kinds of "source code files" on projects and been able to debug bad changes by working out when certain changes were made and by whom. You might never need to go "back in time" to see old versions of scripts, but when you do need to it is invaluable. And all of this costs only a little extra disk space to hold the change history.
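
As a small illustration of seeing "what changed and when, and by whom" for one file, "hg log FILE" can be given a --template so that each revision comes out as one easily parsed line. The sketch below assumes the standard template keywords {rev}, {date|isodate}, {author} and {desc|firstline}. The "|" field separator and the function names are my own choices, file_history assumes hg is on the PATH, and the usage example parses a made-up sample rather than real output.

```python
import subprocess

# One line per revision; "|" is an arbitrary separator chosen for easy splitting
TEMPLATE = "{rev}|{date|isodate}|{author}|{desc|firstline}\n"

def parse_history(text):
    """Turn templated 'hg log' output into a list of dicts, newest first."""
    entries = []
    for line in text.splitlines():
        # maxsplit=3 lets the commit summary itself contain "|"
        rev, date, author, summary = line.split("|", 3)
        entries.append({"rev": int(rev), "date": date,
                        "author": author, "summary": summary})
    return entries

def file_history(path, repo="."):
    """Run 'hg log --template ... PATH' in REPO (assumes hg is installed)."""
    result = subprocess.run(["hg", "log", "--template", TEMPLATE, path],
                            cwd=repo, capture_output=True, text=True, check=True)
    return parse_history(result.stdout)

# Made-up sample in the shape the template above would produce
sample = ("12|2013-07-20 16:05 +0100|John <john@example.com>|Tune report query\n"
          "7|2013-06-02 09:30 +0100|Ann <ann@example.com>|Add orders index | part 1\n")
history = parse_history(sample)
for entry in history:
    print(entry["rev"], entry["date"], entry["author"], entry["summary"])
```

Once you have spotted the suspect revision, "hg diff -c REV" on the same file shows exactly what that commit changed.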

I'm sure many of you know about VirtualBox for creating virtual machines. I can easily create a new virtual machine, install whichever variant of Linux I want, and do any testing I want - all in isolation from other things. Really useful for creating sandpit test environments, and throwing together specific combinations of software without needing a dedicated system.

Trello is my most recent discovery, from about 2 years ago. I've always wanted a "good enough" to-do list tool, but nothing too complicated or over the top - not full-blown project management. I want to add new items easily, add new lists, move things between lists, order them as I want within a list, and mix lists and items together, nesting them in each other. Trello lets me do this. Although it is web based and requires registration, it has a great GUI with drag and drop of items - within lists and between lists. Again, they have made it really easy to use. Click for a new item, type it in, and save. Editing is just a case of clicking twice (once to open, then again to edit). Trello effectively has 4 levels of hierarchy, so you can work out which combination of levels works best for you. There are boards (separate sets of lists), lists within a board, items within a list, and items can also have a checklist within them, which is useful. So you either have one board with multiple lists and items, or multiple boards (one per project?) with lists of sub-projects and task level items. Or mix it another way. I like the checklists because they only appear when you open the item (keeping the main display less cluttered), and they work well for task lists as you tick items off as complete.