Tag Archives: DevOps

How to Implement Test-Driven Development

Test-Driven Development (TDD) is something I have long had difficulties with. Not because I consider it a bad concept, but found it very difficult to start doing. In hindsight it appears that the advice given in the respective books and online articles was not suitable. So here is the approach that finally worked for me.

It boils down to deviating from the pure doctrine. Instead of writing a test before starting on a new piece of code, I start with the actual code right away. Yes, that violates the core principle, although only for a while. But I have found that in most cases my understanding of the problem is still somewhat vague when I start working on it. So for my brain it is better if I do not have to split its capacity between solving the actual problem and thinking about how to devise a proper test and what all that means for the structure of the future code.

Once the initial version of the working code is there and manually validated, I do add the test. From then on I am in a position to refactor the code without the risk of breaking something. And of course this refactoring is needed because the first version of any code is never really good. While you could write “better” initial code, this would require spending more time upfront than you otherwise need for refactoring later. And it also ignores the fact that you only really understand the problem, when you have finished implementing the solution.

What I later realized was that my approach also helped me to write more testable code. But instead of consciously having to work on it, this sneaked in as a by-product of my modified way of doing TDD. For me this is a more natural way of learning and the results are typically better than following some formal approach.

Start Working with a Version Control System

Every so often I get asked about what to consider when introducing Continuous Integration (CI) to an organization. Interestingly though, most of the details discussed are about working with a version control system (VCS) and not CI itself. That is understandable because the VCS is the “gateway” for all developers. So here are my recommendations.

Use of Branches

It is important to distinguish between the goal (Continuous Integration) and the means (trunk-based development). Yes, it is possible to implement a system that facilitates frequent integration of code from various branches. On the other hand it is a considerably more complex approach than to simply work off trunk. So in most cases I would argue that simpler is better.

In any case I recommend to also look at using branches and can recommend this video on YouTube as a starting point. Whatever path you choose, it will always improve your understanding of the subject and you do not have to take my word for it.

Number of Commits

Most people that do not use a VCS will typically work through the day and create a file copy (snapshot-like) of their project in the evening just before they leave for the day. So it is a natural conclusion to transfer this approach like-for-like to the VCS. In practical terms this would mean to perform a single commit every day just before you go home. And the commit message would be similar to “Work for <DATE>” or “WIP”.

But instead of doing so, developers should commit as often as possible. In my experience 5 to 15 times for a full day of development work is a good rule-of-thumb. There will be exceptions, of course. But whenever you are far enough outside this ballpark-figure, you should analyze why that is.

Time to Commit

Instead of looking at time intervals, people should commit whenever the code has reached a stable state. Or in other words: It does not make sense to have people commit every 30 to 45 minutes. They should rather do this after e.g. having fixed a small bug (e.g. correction of a threshold). But for changes that require more than roughly 60 minutes of work, things need to be broken down. This will be looked at in detail in the next bullet point.

Especially when starting with a VCS, people will quite often miss to commit when they have completed a somewhat discrete piece of work. That is normal and happens to everybody. Even today, with more than ten years of experience on the subject, I still sometimes miss the point. Adding the step of committing a set of changes to your work routine, is something that really takes time. It is a bit like re-ordering your morning routine in the bathroom. Most people do things in the exact same order every day. Changing something there is just as difficult as performing a commit “automatically”.

What to do when you realize your miss, depends on the circumstances. If this is your personal pet project, you may just virtually slap yourself on the head and continue or do the infamous “WIP” commit. But if this a critical project for you organization and you collaborate with others, you need to undo the last couple of changes until you are back where you should have performed the commit in the first place. Yes, this is cumbersome and feels like a waste of time, especially if you are working under time pressure, i.e. always.

But there is no alternative and anyone who says differently (typically project managers without a solid background in software development) is just completely wrong. Because you need to be able to understand exactly who performed what change to the code base and when. But with messy commits this will not work in practice. Or to rephrase in management speak: It is much more time-consuming and error-prone to go through untidy changes every single time you try find something in the VCS, than to spend the effort only once and correct things. 

Split Up Larger Work Items

In many cases the effort to implement a new feature or fix a really nasty bug will exceed let’s say 60 minutes. In those cases the developer should have a rough a plan how the overall work be structured. For a new feature this could mean something like:

  1. Add test-cases that pass for the current implementation
  2. Re-factor in preparation without changing behavior
  3. Add test-cases for new feature
  4. Implement first half of new feature but ensure that it cannot be executed yet (think feature-toggle here)
  5. Finish new feature and enable execution

Working Code

The example above for how to structure the implementation of something larger has a critical aspect to it. Which is that at every point in time the code in the VCS must be in a consistent and operational (=deployable) state. If things look different (i.e. some parts are not working every now and then) in your development environment, as opposed to the VCS, that is ok. Although it has proven to make life easier when both the VCS and your environment do not stray too far apart from each other.

What I discovered for myself is that the approach has a really nice by-product: cleaner and more stable code. In hindsight I cannot say when this materialized for me. So there is a small chance that from a clean code perspective things got worse before they got better. But my gut feeling tells me that this was not the case. Because an always-working code also means a better structured code, which is by definition more stable due to reduced complexity (relative to a messy codebase).

Fix Immediately

This has been written about many times and I merely mention it for completeness here. Whenever a change breaks the code, and thus causes automated tests to fail, the highest priority is to get things back into a working state. No exceptions ever!

When NOT to Commit

A VCS is not a backup system for your code but a VCS. This also means that you should not simply commit at the end of the day before you go home, unless your code happens to be in a working state. Otherwise, if you feel the need or are obliged to do so, have a backup location and/or script that handles this. But please do not clutter the VCS with backups.

At least in the early days of CI (the early 2000s) it was a somewhat common phenomenon at the beginning of projects that at the end of the day people checked in whatever they had done so far and went home. In many cases this broke the code and tests failed on the CI server. Until the next morning it was not possible for others to work effectively because you cannot reasonably integrate further changes with an already broken codebase. That is bad enough if people are located in one timezone. But think about the effect it has on an organization that works with a follow-the-sun approach.

Commit Messages

The reason for commit messages, in addition to the technical details that the VCS records anyway, is to describe the intent of the change. It does not make sense to list technical details, because those can always be retrieved with much more precision from the VCS log. But why you performed the sum of those changes is usually hard to extract from the technical delta. So think about how you would describe the change in a way that allows you to understand things when you look at them in six months.

In Closing

These are just a few point I learned over the years and have been able to validate with various projects. They are practical and provide, in my view, a good balance between the ideal world and the reality you find in many larger organizations. Please let know if you agree or (more importantly!) disagree.

Related posts:

Configuration Management – Part 9: The Audit Trail

Keeping track of  changes is a critical functionality in every configuration management system because there are legal requirements like  SOX (Sarbanes-Oxley Act) that require it. It can be accomplished in several ways. Basically you can either use an existing tool like a VCS (version control system) or have something custom-built.

When possible, I tend to prefer a VCS because it is (hopefully) already part of your process and governance approach. A typical workflow is that the underlying assets (i.e. configuration files) will be changed and then the VCS client be used to commit the change. The commit message allows to record the intent here, which is the critical information.

But there are cases when you need to be able to track things outside the VCS. In all cases I have seen so far the reason was that some information should not be maintained within the VCS for security or operational reasons. While organizations are often relaxed about data like host names in non-PROD environments, this changes abruptly when PROD comes into play. While I always think “security by obscurity” when I have that discussion, it is also a fight not worth having.

The other reason is operational procedures. The operations team often has a well-established approach that maintains configuration files for many applications in a unified way. The latter typically involves a dedicated location on network storage where configuration data sit. Ideally, there should also be a generic mechanism to track changes here. A dedicated VCS is of course a good option, but operations staff without a development background often (rightly) shy away from that route.

So it comes down to what the configuration management system itself offers. What I have implemented in WxConfig is a system where every operation that changes configuration data results in an audit event that gets persisted to disk. It includes metadata (e.g. what user initiated the change from which IP address), the actual change (e.g. file save from UI or change of value via API), and the old and new version of the affected configuration file.

The downside compared to a well-chosen commit message for VCS is that the system cannot record the intent. But on the other hand no change is lost, because no manual activity is needed. In practice this far outweighs the missing intent, at least for me. Also it has proven to be helpful during development when I had accidentally removed data. It was far easier to restore the latter from an audit record compared to looking them up in their original source.

All audit data get persisted to files and the metadata is recorded as XML. That allows automated processing, if required by e.g. a GRC system (Governance, Risk Management, and Compliance) or legal frameworks like the aforementioned Sarbanes-Oxley Act.

Configuration Management – Part 7: The Time

An often overlooked aspect of configuration management is time. There are many scenarios where the correct value is dependent on time (or date for that matter). Typical examples are organizational structures (think “re-org”) or anything that is somehow related to legislation (every change in law or regulation becomes active at a certain time).

There are several approaches to deal with this situation:

  • Deployment: This is the simplest way and its ease comes with a number of drawbacks. What you basically do is make sure that a deployment happens exactly at the time when the changes should take effect. Obviously this collides with a typical CD (Continuous Deployment) setup. And while it is certainly possible to work around this, the remaining limitations usually make it not worthwhile. As to the latter, what if you need to process something with historical data? A good example is tax calculation, when you need to re-process an invoice that was issued before the new VAT rate got active. Next, how do you perform testing? Again, nothing impossible but  a case of having to find ways around it. And so on…
  • Feature toggle: Here the configuration management solution has some kind of scheduler to ensure that at the right point in time the cut-over to the new values is being made. This is conceptually quite similar to the deployment approach, but has the advantage of not colliding with CD. But all the other limitations mentioned above still apply. WxConfig supports this approach.
  • Configuration dimension: The key difference here is that time is explicitly modeled into the configuration itself. That means every query not only has the normal input, usually a key or XPath expression, but also a point in time (if left empty just take the current time as default). This gives full flexibility and eliminates the aforementioned limitations. Unfortunately, however, very few people take this into consideration when they start their project. Adding it later is always possible, but of course it comes with an over-proportional effort compared to doing it right from the start. WxConfig will add support for this soon (interestingly no one asked for it so far).

That was just a quick overview, but I hope it provided some insights. Comments are very welcome (as always).

 

webMethods Integration Server: Continuous Deployment

For more than nine years I have been working on a package for webMethods Integration Server. With the experience gained there, I want to discuss a number of aspects about Continuous Deployment.

Versioning

I recommend the use of semantic versioning, which at its core is about the following (for a lot of additional details, just follow the link):

  • The version number consists of three parts: Major, minor, and patch (example v1.4.2).
  • An increase in the major version indicates a non-backwards-compatible change.
  • An increase in the minor version indicates a backwards-compatible change.
  • A increase in the patch version indicates bug fixes only, no functional changes at all.

It is a well-known approach and makes it very easy for everyone to derive the relevant aspects from just looking at the version number. If the release in question contains bug fixes for something you have in use, it is probably a good idea to have a close look and check if a bug relevant to you was fixed. If it is a minor update and thus contains improvements while being backwards compatible, you may want to start thinking about a good time to make the switch. And if it is major update that (potentially) breaks things, a deeper look is needed.

Each Integration Server package has two attributes for holding information about its “version”. One is indeed the version itself and the other is the build. The latter is by default populated with in auto-increase number, which I find not very helpful. Yes, it gives me a unique identifier, but one that does not hold any context. What I put into this field instead is a combination of a date-time-stamp and the change set identifier from the Version Control System (VCS). This allows me to see at a glance when this package was built and what it contains.

Build

Conceptually the work on a single package, as opposed to a set of multiple ones that comprise one application, is a bit different in that you simply deal with only one artifact that gets released. In many projects I have seen, people take a slightly different approach and see the entirety of the project as their to-be-released artifact. This approach is supported by how the related tools (Asset Build Environment and Deployer) work: You simply throw in the source code for several packages, create an archive with metadata (esp. dependencies), and deploy it. Of course you could do this on a per-package basis. But it is easier to have just one big project for all of them.

Like almost always in live, nothing comes for free. What it practically means is that for every change in only one of the potentially many packages of the application, all of them need a re-deployment. Suppose you have an update in a maintenance module that is somewhat unrelated to normal daily operation. If you deploy everything in one big archive, this will effectively cause an outage for your application. So you just introduced a massive hindrance for Continuous Deployment. Of course this can be mitigated with blue-green deployments and you are well advised to have that in place anyway. But in reality few customers are there. What I recommend instead is an approach where you “cut” your packages in such a way that they each of them performs a clearly defined job. And then you have discrete CI job for each of them, of course with the dependencies taken into account.

Artifact Storage

Once your build has been created, it must reside somewhere. In a plain webMethods environment this is normally the file system, where the build was performed by Asset Build Environment (ABE). From there it would be picked up by Deployer and moved to the defined target environment(s). While this has the advantage of being a quite simple setup, it also has the downside that you loose the history of your builds. What you should do instead, is follow the same approach that has been hugely successful for Maven: use a binary repository like Artifactory, Nexus, or one of the others (a good comparison can be found here). I create a ZIP archive of the ABE result and have Jenkins upload it to Artifactory using the respective plugin.

To have the full history and at the same time a fixed download location, I perform this upload twice. The first contains a date-time-stamp and the change set identifier from the Version Control System (VCS), exactly like for the package’s build information. This is used for audit purposes only and gives me the full history of everything that has ever been built. But it is never used for actually performing the deployment. For that purpose I upload the ZIP archive a second time, but in this case without any changing parts in the URL. So it effectively makes it behave a bit like a permalink and I have a nice source for download. And since the packages themselves also contain the date-time stamp and change set identifier, I still know where they came from.

Deployment

Depending on your overall IT landscape there are two possible approaches for handling deployments. The recommended way is to use a general-purpose configuration management tool like Chef, Puppet, Ansible, Salt, etc. This should then also be the master of your webMethods deployments. Just point your script to your “permalink” in the binary reposiory and take it from there. I use Chef and its remote file mechanism. The latter nicely detects if the archive has changed on Artifactory and only then executes the download and deployment.

You can also develop your own scripts to do the download etc., and it may appear to be easier at a first glance. But there is a reason that configuration management tools like Chef et al. have had such success over the last couple years, compared to home-grown scripts. In my opinion it simply does not make sense to spend the time to re-invent the wheel here. So you should invest some time (there are many good videos on YouTube about this topic) and figure out which system is best for your needs. And if you still think that you will be faster with your own script, chances are that you overlooked some requirements. The usual suspects are logging, error handling, security, user management, documentation, etc.

With either approach this makes deployment a completely local operation and that has a number of benefits. In particular you can easily perform any preparatory work like e.g. adjusting content of files, create needed directories, etc.

Summary

All in all, this approach has worked extremely well for me. While it was first developed for an “isolated” utility package, it has proven to be even more useful for entire applications, comprised of multiple packages; in other words, it scales well.

Another big advantages is separation of concerns. It is always clear which activity is done by what component. The CI server performs the checkout from VCS and orchestrates the build and upload to the binary repository. The binary repository holds the deployable artifact and also maintains an audit trail of everything that has ever been built. The general-purpose configuration management tool orchestrates the download from the binary repository and the actual deployment.

With this split of the overall process into discrete steps, it is easier to extend and especially to debug. You can “inject” additional logic (think user exits) and especially implement things like blue-green deployments for a zero-downtime architecture. The latter will require some upfront thinking about shared state, but this is a conceptual problem and not specific to Integration Server.

One more word about scalability. If you have a big-enough farm of Integration Servers running (and some customers have hundreds of them), the local execution of deployments also is much faster than doing it from a central place.

I hope you find this information useful and would love to get your thoughts on it.

Configuration Management – Part 6: The Secrets

Every non-trivial application needs to deal with configuration data that require special protection. In most cases they are password or something similar. Putting those items into configuration files in clear is a pretty bad idea. Especially so, because these configuration files are almost always stored in a VCS (version control system), that many people have access to.

Other systems replace clear-text passwords found in files automatically with an encrypted version. (You may have seen cryptic values starting with something like {AES} in the past.) But apart from the conceptual issue that parts of the files are changed outside the developer’s control, this also not exactly an easy thing to implement. How do you tell the system, which values to encrypt? What about those time periods that passwords exist in clear text on disk, especially on production systems?

My approach was to leverage the built-in password manager facility of webMethods Integration Server instead. This is an encrypted data store that can be secured on multiple levels, up to HSMs (hardware security module). You can look at it as an associative array (in Java usually referred to as map) where a handle is used to retrieve the actual secret value. With a special syntax (e.g. secretValue=[[encrypted:handleToSecretValue]]) you declare the encrypted value. Once you have done that, this “pointer”will of course return no value, because you still need to actually define it in the password manager. This can be done via web UI, a service, or by importing a flat file. The flat file import, by the way, works really well with general purpose configuration management systems like Chef, Puppet, Ansible etc.

A nice side-effect of storing the actual value outside the regular configuration file is that within your configuration files you do not need to bother with the various environments (add that aspect to the complexity when looking at in-file encryption from the second paragraph). Because the part that is environment-specific is the actual value; the handle can, and in fact should, be the same across all environments. And since you define the specific value directly within in each system, you are already done.

Configuration Management – Part 5: The External World

A standard requirement in configuration management is using values that already exist somewhere else. The most obvious places are environment variables (defined by the operating system) and Java system properties. Other are existing (!) databases, e.g. from an ERP system, or files.

The reason for referencing values directly from their original sources instead of duplicating them in a copy-paste fashion is the Don’t-Repeat-Yourself (DRY) principle. Although the latter is typically discussed in the context of code, it applies to configuration at least as well. We all know those “great” applications, which require manual updates of the same value in different places. And if you miss only one, all hell breaks loose.

For re-use of values within a file, the standard approach is variable interpolation. A well-known syntax for that comes from the build tool Apache Ant, where ${variableName} can be placed anywhere into a value assignment. Apache Commons Configuration, and therefore also WxConfig, support this syntax. In WxConfig you can even reference values from other files that belong to the same Integration Server package.

But what to do for other sources? Well, for the aforementioned environment variables and Java system properties, the respective interpolators from Apache Commons Configuration can also be used in WxConfig. But the latter also defines several interpolators of its own.

  • Cross package: Assuming you have an Integration Server package that holds some general values (often referred to as “global”), those can be referenced. (There are multiple ways to deal with truly global values and the specifics really determine which one is used best.)
  • Current date/time: Gets the current date and/or time, which is admittedly a bit of an edge-case. One could argue that this is not really a configuration value, and that is correct. However, there are scenarios when the ability to quite easily have such a value comes in very handy. Think about files that get created during processing of data. Instead of manually concatenating the path, the base filename, the date/time stamp and the file suffix in the code, you could just have something like this in your configuration file: workerFile=${tmpDir}/appFile_${date:yyyyMMdd-HHmmssSSS}.dat Doesn’t that make things a lot cleaner to read?
  • File content: Similar to date/time this was added primarily for convenience and clarity reasons. The typical use-case are templates, where a file (possibly maintained by another application) is just read directly into a configuration value.
  • Code invocation: When all else fails, you can have an arbitrary piece of logic be executed and the result be placed into the configuration value.

It is worth noting that all interpolators resolve at invocation. So you always get current results. In the case of the file content interpolator, this has the downside of file I/O; but if that really becomes a bottleneck, you are still not worse off than when having the read somewhere else in the code. And the file system caching is also still there …

With this set of options, pretty much all requirements for a redundancy-free configuration management can be met.