Tag Archives: Subversion

Starting with Continuous Integration

In this post I look at things to consider when an organization wants to introduce Continuous Integration (CI). As in so many other situations the non-technical challenges are more difficult to solve than some nitty-gritty details.

Start Small Right Now

If ever there was a place for the proverb “the better is the enemy of the good” it is here. Waiting days, weeks, or months because you have not sorted out all details is the worst you can do. Instead you should start immediately by just installing a CI server (Jenkins is the de facto standard) and set up a simple job that does nothing but check out the source code from the VCS and compile it.

More advanced stuff like test automation, setting up delivery pipelines, integration with binary repositories like Artifactory or Nexus is not needed in the beginning.

Agile Automatically?

Most development teams that have not used CI so far are probably operating in a more or less non-agile fashion. That is fine and can stay as it is! Because while CI is virtually a prerequisite for agile development, that does absolutely not mean that teams following a waterfall model will not benefit considerably from CI.

So establishing CI can but does not have to be the first step of moving towards agile development. In fact I would argue that introducing CI is a large-enough step for an existing development organization. Only when this has been “digested”, you should think about moving towards agile. Otherwise too many things would be changed in parallel, similar to combining a new release of your own software with an upgrade of the underlying platform, e.g. the database server.

Frequency of Builds

This is the only part where I strongly recommend that you start at full throttle. What I mean by that is that you resist the temptation to run your builds only once a day or even less frequently. Ideally, every commit into the VCS triggers a build via a post-commit hook (here is more information for Git and Subversion). But polling the VCS every e.g. 10 minutes is a good-enough approximation in most cases. And it is also a little bit easier to set up when you just start on the whole topic.

Why am I so adamant on this particular point? I think that almost-instant feedback is at the very core of CI and the only way to deliver it is by running the build. All the points below change the amount of details that are provided or reduce the risk of introducing bugs into the code. But this hugely powerful feeling you get after your first commit triggers a build, is the important aspect for successful adoption in my view.

Test Automation

Start with “compilation works” as the lowest common denominator. When you want to start adding the use of “proper” test frameworks, feel free to do so. But is nothing you need on day one.

When you are ready to do more, you need to focus on those parts of your code that are most relevant for the business. Resist the temptation of striving for large test coverage of your code for the sake of it (having a KPI on this is a really bad idea). Otherwise people will start writing test for trivial helper functions, testing which on their own is of low relevance.

Instead take the critical parts of the business logic and develop a way to test them end-to-end (if possible without the GUI yet). With this approach you will implicitly cover all the lower-level stuff underneath automatically. Unless you have someone on your team with practical experience on integration testing frameworks (e.g. Citrus), I would not start with a full-blown approach but rather develop a few custom scripts.

The point in time when to start with more advanced topics, especially automated performance tests, depends on your individual situation and I will not make recommendations about it here. But what you should do as soon as possible, is read up on the subject and get an understanding about the different types of test and what they are good for. You do not need to implement everything now, but this will allow you to make informed judgements about the path you choose.

In Closing

You should now have an idea how to get started with CI quickly and in a way that delivers positive results pretty much from day one. Gaining traction in the organization should be your first priority in the beginning. There is a widespread misconception that things like CI, while theoretically the right to do, slow developers down. Nothing could be further from the truth. But unless you fight this impression fiercely, sooner or later management will ask for by-passing that “nice new thing” and get code out of the code faster using the old way.

Related posts:

Start Working with a Version Control System

Every so often I get asked about what to consider when introducing Continuous Integration (CI) to an organization. Interestingly though, most of the details discussed are about working with a version control system (VCS) and not CI itself. That is understandable because the VCS is the “gateway” for all developers. So here are my recommendations.

Use of Branches

It is important to distinguish between the goal (Continuous Integration) and the means (trunk-based development). Yes, it is possible to implement a system that facilitates frequent integration of code from various branches. On the other hand it is a considerably more complex approach than to simply work off trunk. So in most cases I would argue that simpler is better.

In any case I recommend to also look at using branches and can recommend this video on YouTube as a starting point. Whatever path you choose, it will always improve your understanding of the subject and you do not have to take my word for it.

Number of Commits

Most people that do not use a VCS will typically work through the day and create a file copy (snapshot-like) of their project in the evening just before they leave for the day. So it is a natural conclusion to transfer this approach like-for-like to the VCS. In practical terms this would mean to perform a single commit every day just before you go home. And the commit message would be similar to “Work for <DATE>” or “WIP”.

But instead of doing so, developers should commit as often as possible. In my experience 5 to 15 times for a full day of development work is a good rule-of-thumb. There will be exceptions, of course. But whenever you are far enough outside this ballpark-figure, you should analyze why that is.

Time to Commit

Instead of looking at time intervals, people should commit whenever the code has reached a stable state. Or in other words: It does not make sense to have people commit every 30 to 45 minutes. They should rather do this after e.g. having fixed a small bug (e.g. correction of a threshold). But for changes that require more than roughly 60 minutes of work, things need to be broken down. This will be looked at in detail in the next bullet point.

Especially when starting with a VCS, people will quite often miss to commit when they have completed a somewhat discrete piece of work. That is normal and happens to everybody. Even today, with more than ten years of experience on the subject, I still sometimes miss the point. Adding the step of committing a set of changes to your work routine, is something that really takes time. It is a bit like re-ordering your morning routine in the bathroom. Most people do things in the exact same order every day. Changing something there is just as difficult as performing a commit “automatically”.

What to do when you realize your miss, depends on the circumstances. If this is your personal pet project, you may just virtually slap yourself on the head and continue or do the infamous “WIP” commit. But if this a critical project for you organization and you collaborate with others, you need to undo the last couple of changes until you are back where you should have performed the commit in the first place. Yes, this is cumbersome and feels like a waste of time, especially if you are working under time pressure, i.e. always.

But there is no alternative and anyone who says differently (typically project managers without a solid background in software development) is just completely wrong. Because you need to be able to understand exactly who performed what change to the code base and when. But with messy commits this will not work in practice. Or to rephrase in management speak: It is much more time-consuming and error-prone to go through untidy changes every single time you try find something in the VCS, than to spend the effort only once and correct things. 

Split Up Larger Work Items

In many cases the effort to implement a new feature or fix a really nasty bug will exceed let’s say 60 minutes. In those cases the developer should have a rough a plan how the overall work be structured. For a new feature this could mean something like:

  1. Add test-cases that pass for the current implementation
  2. Re-factor in preparation without changing behavior
  3. Add test-cases for new feature
  4. Implement first half of new feature but ensure that it cannot be executed yet (think feature-toggle here)
  5. Finish new feature and enable execution

Working Code

The example above for how to structure the implementation of something larger has a critical aspect to it. Which is that at every point in time the code in the VCS must be in a consistent and operational (=deployable) state. If things look different (i.e. some parts are not working every now and then) in your development environment, as opposed to the VCS, that is ok. Although it has proven to make life easier when both the VCS and your environment do not stray too far apart from each other.

What I discovered for myself is that the approach has a really nice by-product: cleaner and more stable code. In hindsight I cannot say when this materialized for me. So there is a small chance that from a clean code perspective things got worse before they got better. But my gut feeling tells me that this was not the case. Because an always-working code also means a better structured code, which is by definition more stable due to reduced complexity (relative to a messy codebase).

Fix Immediately

This has been written about many times and I merely mention it for completeness here. Whenever a change breaks the code, and thus causes automated tests to fail, the highest priority is to get things back into a working state. No exceptions ever!

When NOT to Commit

A VCS is not a backup system for your code but a VCS. This also means that you should not simply commit at the end of the day before you go home, unless your code happens to be in a working state. Otherwise, if you feel the need or are obliged to do so, have a backup location and/or script that handles this. But please do not clutter the VCS with backups.

At least in the early days of CI (the early 2000s) it was a somewhat common phenomenon at the beginning of projects that at the end of the day people checked in whatever they had done so far and went home. In many cases this broke the code and tests failed on the CI server. Until the next morning it was not possible for others to work effectively because you cannot reasonably integrate further changes with an already broken codebase. That is bad enough if people are located in one timezone. But think about the effect it has on an organization that works with a follow-the-sun approach.

Commit Messages

The reason for commit messages, in addition to the technical details that the VCS records anyway, is to describe the intent of the change. It does not make sense to list technical details, because those can always be retrieved with much more precision from the VCS log. But why you performed the sum of those changes is usually hard to extract from the technical delta. So think about how you would describe the change in a way that allows you to understand things when you look at them in six months.

In Closing

These are just a few point I learned over the years and have been able to validate with various projects. They are practical and provide, in my view, a good balance between the ideal world and the reality you find in many larger organizations. Please let know if you agree or (more importantly!) disagree.

Related posts:

Structuring a VCS Repository

My main programming hobby project will soon celebrate its tenth birthday, so I thought a few notes on how I structure my VCS repository might be of interest. The VCS I have been using since the beginning is Subversion. (When I started, Git had already been released, but was really not that popular yet.) So while some details of this article will be specific to Subversion, the general concepts should be applicable elsewhere as well.

When it comes to the structure of a repository, Subversion does not impose anything from a technical point of view. All it sees is a kind-of file system, with all the pros and cons that come with the simplicity of this approach. It makes it easy for people to start using it, which is really good. But it also does not offer help for more advanced use-cases, so that people need to find a way how to map certain requirements onto that file system concept.

As a result, a convention has emerged and been there for many years now. It says that at the top-level of the project there should be only the following folders:

  • trunk: Home of the latest version (sometimes called the HEAD revision)
  • branches: Development sidelines where work happens in isolation from trunk
  • tags: Snapshots that give meaningful names to a certain revision

You will find plenty of additional information on the subject when searching the Internet. I can also recommend the book “Pragmatic Version Control: Using Subversion“, although it seems to be out of print now.

With these general points out of the way, let me start with how I work on my project. There are only a few core rules and despite their simplicity I can handle all situations.

  • The most important aspect for the structure of my SVN repository is that all active development on the coming version happens at trunk. See this article for all the important details, why you really want to follow that approach in almost all cases.
  • Once a new version is about to be released, I need a place where bug-fixes can be developed. So I create a release branch (e.g. /branches/releases/v1.3) with major and minor version number but not the patch version (I use semantic versioning). From this release branch I then cut the release (v1.3.0 in this case) by pointing the release job of my CI server to the release branch.
  • Once the release is done, I create a tag that also includes the patch version. In this example the tag will be from /branches/releases/v1.3 to /tags/releases/v1.3.0 .
  • Now I return to working on the next release (v1.4) by switching back to trunk.
  • Bug fixing happens primarily on trunk with fixes being back-ported to released versions. There are cases when this is not practical, of course. The two main reasons are that significant structural changes were done on trunk (you do refactor, don’t you?) or another change has implicitly removed the bug there already. But that is the exception.
  • When a bug-fix is needed on a released version, I temporarily switch my working copy to the release branch and do the respective work there. Unless the bug is critical I do not release a new version immediately after that, though. So this may repeat a few times, before the maintenance release.
  • The maintenance release is then cut, again, from /branches/releases/v1.3 . And after that a new tag is created to /tags/releases/v1.3.1 .

Those rules have proven to be working perfectly and I hope they will continue to do so for the next ten years. I have been quite lucky in that, although for me this is still a hobby project, the result is used by many global companies and organizations in a business-critical context. There are at least 11.000 installations in production that I am aware about, so I cannot be casual about reliability of the delivery process.

 

Tooling for Agile and Traditional Development Methodologies

A hot topic of the last few years has been the debate as to whether traditional (aka waterfall-like) methodologies or agile ones (XP, SCRUM, etc.) deliver better results. Much of the discussion that I am aware of has focused on things like

  • Which approach fits the organization?
  • How strategic or tactical (both terms usually go undefined) is the project and how does this affect the suitability of one approach over the other?
  • What legal and compliance requirements must be taken into account?
  • How large and distributed is the development team?

This is all very important stuff and thinking about it is vital. Interestingly, though, what has largely been ignored, at least in the articles I have come across, is the tooling aspect. A methodology without proper tool support has relatively little practical value. Well, of course the tools exist. But can they effectively be used in the project? In my experience this is mostly not the case, when we speak about the “usual suspects” for requirements and test management. The reason for that is simply money. It comes in many incarnations:

  • Few organizations have enterprise licenses for the respective tools and normally no budget is available for buying extra licenses for the project. The reason for the latter is either that this part of the budget was rejected, or that it was forgotten altogether.
  • Even if people are willing to invest for the project, here comes the purchasing process, which in itself can be quite prohibitive.
  • If there are licenses, most of these comprehensive tools have a steep learning curve (no blame meant, this is a complicated subject).
  • No project manager, unless career-wise suicidal, is willing to have his budget pay for people getting to know this software.
  • Even if there was budget (in terms of cash-flow), it takes time and often more than one project to obtain proficiency with the tools.

Let’s be clear, this is not product or methodology bashing. It is simply my personal, 100% subjective experience from many projects.

Now let’s compare this with the situation for Version Control Systems (VCS). Here the situation looks quite different. Products like Subversion (SVN) are well-established and widely used. Their value is not questioned and every non-trivial project uses them. Why are things so different here and since when? (The second part of the question is very important.) VCSes have been around for many years (RCS, CVS and many commercial ones) but none of them really gained the acceptance that SVN has today. I cannot present a scientific study here but my gut feeling is that the following points were crucial for this:

  • Freely available
  • Very simple to use, compared to other VCS. This causes issues for more advanced use-cases, especially merging, but allows for a fast start. And this is certainly better than avoiding a VCS in the first place.
  • Good tool suppport (e.g. TortoiseSVN for Windows)

Many people started using SVN under the covers for the aforementioned reasons and from there it gradually made its way into the official corporate arena. It is now widely accepted as the standard. A similar pattern can be observed for unit-testing (as opposed to full-blown integrating and user acceptance testing):  Many people use JUnit or something comparable with huge success. Or look at Continuous Integration with Hudson. Cruise Control was around quite a bit longer but its configuration was perceived to be cumbersome. And on top of its ease-of-use Hudson added something else: extensibility via plug-ins. The Hudson guys accepted upfront that people would want to do more than what the core product could deliver.

All these tools were designed bottom-up coming from people who knew exactly what they needed. And by “sheer coincidence” much of this stuff is what’s needed for an agile approach. My hypothesis is that more and more of these tools (narrow scope, free, extensible) will be coming and moving up the value chain. A good example is the Framework for Integrated Test that addresses user acceptance tests. As this happens and integration of the various tools at different levels progresses, the different methodologies will also converge.

USVN with CentOS 5

If you are looking for a Subversion web interface, chances are you come across USVN (User-friendly SVN). I first used it in August 2009 during a complex proof-of-concept (PoC). The current version at the time was 0.7.2 and it was of great help. Nevertheless there were a few things missing, esp. LDAP support. So I was really happy to recently learn that the project is being continued (it is an end-of-studies project) and in fact one of the first new features is support for LDAP.

One of the challenges I came across during the installation was the systems check that reported “Subversion has not been detected”. This simply means that the Subversion client binary (svn) was not found on the search path (PATH). The reason for this in my case was the fact that I had done a custom installation of Subversion and not relied on the one that comes with CentOS. For details on this please check [cref 879 this post] where I also present a way to custom-define environment variables for the Apache web server. Here is the respective snippet with the search path added (my changes are in bold)start() {
echo -n $"Starting $prog: "
check13 || exit 1
LANG=$HTTPD_LANG LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/CollabNet_Subversion/lib PATH=$PATH:/opt/CollabNet_Subversion/bin daemon --pidfile=${pidfile} $httpd $OPTIONS
RETVAL=$?
echo
[ $RETVAL = 0 ] && touch ${lockfile}
return $RETVAL
}
With this amendment the system check passed just fine. It should be noted, however, that at least for v1.0.1 this check is not complete. E.g. it misses on PHP support for the database. So you most likely also want to install php-pdo and php-mysql:yum install php-pdo php-mysql SQLite did not work at a first try whereas MySQL did, so I went for the latter.

Use CollabNet Subversion with Regular Apache

CollabNet are providing up-to-date binary packages of Subversion for many platforms. In my case this is CentOS 5, which by itself only has a rather dated version of Subversion. So I downloaded and installed the client, server and extras packages from CollabNet. The server package comes with a bundled Apache and a pretty nice installation script. However, I wanted to use my regular Apache for hosting the Subversion repositories, which means that I had to include the Apache modules from the CollabNet installation. So here are the respective lines from /etc/httpd/conf/httpd.confLoadModule dav_svn_module /opt/CollabNet_Subversion/modules/mod_dav_svn.so
LoadModule authz_svn_module /opt/CollabNet_Subversion/modules/mod_authz_svn.so
Those modules require access to additional libraries from /opt/CollabNet_Subversion/lib, so Apache needs to be told to include this directory into the search path (LD_LIBRARY_PATH). The bold part in the below snippet from /etc/init.d/httpd shows what needs to be added:start() {
echo -n $"Starting $prog: "
check13 || exit 1
LANG=$HTTPD_LANG LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/CollabNet_Subversion/lib daemon --pidfile=${pidfile} $httpd $OPTIONS
RETVAL=$?
echo
[ $RETVAL = 0 ] && touch ${lockfile}
return $RETVAL
}
Simply sourcing in LD_LIBRARY_PATH does not work, because the daemon function calls a separate Bash instance. The only way to feed environment variables into Apache, was by prepending them as shown above. This is also the approach to take for extending the PATH variable (which I needed to do for including /opt/CollabNet_Subversion/bin).

Why Subversion’s “svn:externals” is bad

Subversion provides a property (svn:externals) to include references to other projects into a given location within your source code tree. This is pretty much the same as a symbolic link (symlink) in Unix/Linux. But while the usage of symlinks is good practice to de-couple things in the file system, it is just the other way around for svn:externals, at least in my opinion.

Interestingly enough, there is a number of sources that recommend its usage. I disagree here and strongly discourage people from making use of it for a number of reasons:

  1. It creates a lock-in into Subversion, because many other Version Control Systems (VCS) do not have a comparable feature. And even if they had, an automated migration will most likely be cumbersome, to say the least. One would have to find and extract all svn:externals properties, build a dependency tree (hopefully without circular dependencies) and and process things in the appropriate order. This is far from trivial!
  2. On a conceptual level svn:externals is not about version control but dependency management. This is even more important an argument than the lock-in effect.
    • Dependency management and version control are two entirely different things, which should not be mixed. Having a somewhat implicit mechanism to define the dependencies will make it easier for people to not have a clear understanding about this separation.
    • Dependencies are hidden and only show up during VCS operations. To find out a project’s dependencies, in theory one could dig through the repository with a special browser but this is not feasible for a large enough project.
    • Different dependencies can occur at different stages of an artifact’s life-cycle: compilation, unit testing, run-time etc. There is no way to reflect this requirement.
    • Other dependency management systems (e.g. Maven or Ivy for Ant) offer way more functionality and can be extended for additional requirements. Those customisations would have to go into hook scripts for Subversion (which, on top of things, would probably be OS-specific).
  3. Quite often the dependencies will be about artifacts that are not source code at all (usually third-party libraries). You may not want to have compiled artifacts in your VCS.
  4. Also, the dependency could be about source code that is maintained by an external organisation. If they are not using Subversion you could not link directly there but would have to set up a mirror internally. (Admittedly, you may want to do that anyway.)

I had used svn:externals when I started out with Subversion and have gone through quite some headaches since then because of that. Practically, most of them were around the lock-in effect. Nevertheless, I still think the conceptual argument is more important in the long run.