Start Working with a Version Control System

Every so often I get asked about what to consider when introducing Continuous Integration (CI) to an organization. Interestingly though, most of the details discussed are about working with a version control system (VCS) and not CI itself. That is understandable because the VCS is the “gateway” for all developers. So here are my recommendations.

Use of Branches

It is important to distinguish between the goal (Continuous Integration) and the means (trunk-based development). Yes, it is possible to implement a system that facilitates frequent integration of code from various branches. On the other hand it is a considerably more complex approach than to simply work off trunk. So in most cases I would argue that simpler is better.

In any case I recommend to also look at using branches and can recommend this video on YouTube as a starting point. Whatever path you choose, it will always improve your understanding of the subject and you do not have to take my word for it.

Number of Commits

Most people that do not use a VCS will typically work through the day and create a file copy (snapshot-like) of their project in the evening just before they leave for the day. So it is a natural conclusion to transfer this approach like-for-like to the VCS. In practical terms this would mean to perform a single commit every day just before you go home. And the commit message would be similar to “Work for <DATE>” or “WIP”.

But instead of doing so, developers should commit as often as possible. In my experience 5 to 15 times for a full day of development work is a good rule-of-thumb. There will be exceptions, of course. But whenever you are far enough outside this ballpark-figure, you should analyze why that is.

Time to Commit

Instead of looking at time intervals, people should commit whenever the code has reached a stable state. Or in other words: It does not make sense to have people commit every 30 to 45 minutes. They should rather do this after e.g. having fixed a small bug (e.g. correction of a threshold). But for changes that require more than roughly 60 minutes of work, things need to be broken down. This will be looked at in detail in the next bullet point.

Especially when starting with a VCS, people will quite often miss to commit when they have completed a somewhat discrete piece of work. That is normal and happens to everybody. Even today, with more than ten years of experience on the subject, I still sometimes miss the point. Adding the step of committing a set of changes to your work routine, is something that really takes time. It is a bit like re-ordering your morning routine in the bathroom. Most people do things in the exact same order every day. Changing something there is just as difficult as performing a commit “automatically”.

What to do when you realize your miss, depends on the circumstances. If this is your personal pet project, you may just virtually slap yourself on the head and continue or do the infamous “WIP” commit. But if this a critical project for you organization and you collaborate with others, you need to undo the last couple of changes until you are back where you should have performed the commit in the first place. Yes, this is cumbersome and feels like a waste of time, especially if you are working under time pressure, i.e. always.

But there is no alternative and anyone who says differently (typically project managers without a solid background in software development) is just completely wrong. Because you need to be able to understand exactly who performed what change to the code base and when. But with messy commits this will not work in practice. Or to rephrase in management speak: It is much more time-consuming and error-prone to go through untidy changes every single time you try find something in the VCS, than to spend the effort only once and correct things.

Split Up Larger Work Items

In many cases the effort to implement a new feature or fix a really nasty bug will exceed let’s say 60 minutes. In those cases the developer should have a rough a plan how the overall work be structured. For a new feature this could mean something like:

Add test-cases that pass for the current implementation
Re-factor in preparation without changing behavior
Add test-cases for new feature
Implement first half of new feature but ensure that it cannot be executed yet (think feature-toggle here)
Finish new feature and enable execution

Working Code

The example above for how to structure the implementation of something larger has a critical aspect to it. Which is that at every point in time the code in the VCS must be in a consistent and operational (=deployable) state. If things look different (i.e. some parts are not working every now and then) in your development environment, as opposed to the VCS, that is ok. Although it has proven to make life easier when both the VCS and your environment do not stray too far apart from each other.

What I discovered for myself is that the approach has a really nice by-product: cleaner and more stable code. In hindsight I cannot say when this materialized for me. So there is a small chance that from a clean code perspective things got worse before they got better. But my gut feeling tells me that this was not the case. Because an always-working code also means a better structured code, which is by definition more stable due to reduced complexity (relative to a messy codebase).

Fix Immediately

This has been written about many times and I merely mention it for completeness here. Whenever a change breaks the code, and thus causes automated tests to fail, the highest priority is to get things back into a working state. No exceptions ever!

When NOT to Commit

A VCS is not a backup system for your code but a VCS. This also means that you should not simply commit at the end of the day before you go home, unless your code happens to be in a working state. Otherwise, if you feel the need or are obliged to do so, have a backup location and/or script that handles this. But please do not clutter the VCS with backups.

At least in the early days of CI (the early 2000s) it was a somewhat common phenomenon at the beginning of projects that at the end of the day people checked in whatever they had done so far and went home. In many cases this broke the code and tests failed on the CI server. Until the next morning it was not possible for others to work effectively because you cannot reasonably integrate further changes with an already broken codebase. That is bad enough if people are located in one timezone. But think about the effect it has on an organization that works with a follow-the-sun approach.

Commit Messages

The reason for commit messages, in addition to the technical details that the VCS records anyway, is to describe the intent of the change. It does not make sense to list technical details, because those can always be retrieved with much more precision from the VCS log. But why you performed the sum of those changes is usually hard to extract from the technical delta. So think about how you would describe the change in a way that allows you to understand things when you look at them in six months.

In Closing

These are just a few point I learned over the years and have been able to validate with various projects. They are practical and provide, in my view, a good balance between the ideal world and the reality you find in many larger organizations. Please let know if you agree or (more importantly!) disagree.

Christoph Jahn