For the fans of Uncle Bob (aka Robert C. Martin) here is another interesting video.
A while ago Chef Software announced that they would move all source code to the Apache 2.0 license (see announcement for details), which is something I welcome. Less welcome to many was the announcement that they would also stop “free binary distributions”. In the past you could freely download and use the core parts of their offering, if that was sufficient for your needs. What upset many people was that the heads-up period for this change was rather short and many questions were left unanswered. It also did not help that their web site naturally still held many references to the old model, so people were confused.
In the meantime it seems that Chef has loosened their position on binary distributions a bit. There are now a number of binaries that are available under the Apache 2.0 license and they can be found here. This means that you can use Chef freely, if you are willing to compromise on some features. Thanks a lot for this!
This post will describe what I did to set up a fresh Chef environment with only freely available parts. You need just two things to get started with Chef: the server and the administration & development kit. The latter goes by the name of ChefDK and can be installed on all machines on which development and administration work happens. It comes with various command line tools that allow you to perform the tasks needed.
Interestingly, you will find almost no references to ChefDK on the official web pages. Instead its successor “Chef Workstation” is positioned as the tool to use. There is only one slight problem here: The latest free version is pretty old (v0.4.2) and did not work for me or for various other people. That was when I decided to download the latest free version of ChefDK and give it a try. It worked immediately and since I have not needed any of the additional features that come with Chef Workstation, I never looked back.
No GUI is part of those free components. Of course Chef offers such a GUI (web-based), which is named Chef Management Console. It is basically a wrapper over the server’s REST API. Unfortunately the Management Console is “free” only up to 25 nodes. For that reason, but also because its functionality is somewhat limited compared to the command line tools, I decided not to cover it here.
Please check the licenses yourself when you follow the instructions below. It is solely your own responsibility to ensure compliance.
Below you will find a description of what I did to get things up and running. If you have a different environment (e.g. use Ubuntu instead of CentOS) you will need to check the details for your needs. But overall the approach should stay the same.
The environment I will use looks like this:
- Chef server: Linux VM with CentOS 7 64 bit (minimal selection of programs)
- Chef client 1: Linux VM like for Chef server
- Development and administration: Windows 10 Pro 64bit (v1909)
I am not sure yet whether I will expand this in the future. If you are interested, please drop a comment below.
Please check that your system meets the prerequisites for running Chef server.
The download is a bit tricky, since we don’t want to end up with something that falls under a commercial license. As of this writing (April 2020) the following component binaries are the latest that come under an Apache 2.0 license. I verified the latter by clicking on “License Information” underneath each of the binaries that I plan to use.
- Chef Infra Server: v12.19.31 (go here to check for changes)
- Chef DK: 3.13.1 (go here to check for changes)
Chef offers various download methods. Typically I would recommend using the package manager of your Linux distribution, but this will likely cause issues from a license perspective sooner or later.
Server Installation and Initial Setup
So what we will do instead is perform a manual download by executing the following steps (they are a sub-set of the official steps and all I needed to do on my system):
- All steps below assume that you are logged in as root on your designated Chef server. If you use sudo, please adjust accordingly.
- Ensure required programs are installed
yum install -y curl wget
- Open ports 80 and 443 in the firewall
firewall-cmd --permanent --zone public --add-service http && firewall-cmd --permanent --zone public --add-service https && firewall-cmd --reload
- Disable SELinux
- Download install script from Chef (more information here)
curl -L https://omnitruck.chef.io/install.sh > chef-install.sh
- Make install script executable
chmod 755 chef-install.sh
- Download and install Chef server binary package: The RPM will end up somewhere in /tmp and be installed automatically for you. This will take a while (the download size is around 243 MB), depending on your Internet connection’s bandwidth.
./chef-install.sh -P chef-server -v "12.19.31"
- Perform initial setup and start all necessary components, this will take quite a while
- Create admin user
chef-server-ctl user-create USERNAME FIRSTNAME LASTNAME EMAIL 'PASSWORD' --filename USERNAME.pem
- Create organization
chef-server-ctl org-create ORG_SHORT_NAME 'Org Full Name' --association-user USERNAME --filename ORG_SHORT_NAME-validator.pem
- Copy both certificates (USERNAME.pem and ORG_SHORT_NAME-validator.pem) to your Windows machine. I use FileZilla (installers without bloatware can be found here) for such cases.
ChefDK Installation and Initial Setup
What I describe below is a condensed version of what worked for me. More details can be found on the official web pages.
- I use $HOME in the context below to refer to the user’s home directory on the Windows machine. You must manually translate it to the correct value (e.g. C:\Users\chris in my case).
- Download the latest free version of ChefDK for Windows 10 from here and install it
- Check success of installation by running the following command from a command prompt:
- Create directory and base version of configuration file for connectivity by running knife configure (it may look like it hangs, just give it some time)
- Add your server’s certificate (self-signed!) to the list of trusted certificates with knife ssl fetch
- Verify that things work by executing knife environment list, it should return _default as the only existing environment
- The generated configuration file was named $HOME/.chef/credentials in my case and I decided to rename it to config.rb (which is the new name in the official documentation) and also update the contents:
- Remove the line with [default] at the beginning which seemed to cause issues
- Add knife[:editor] = '"C:\Program Files\Notepad++\notepad++.exe" -nosession -multiInst' as the Windows equivalent of setting the EDITOR environment variable on Linux.
- Remove the line with
We will create a very simple project here
- Go into the directory where you want all your Chef development work to reside (I use $HOME/src; the comment regarding the use of $HOME from above still applies) and open a command prompt
- Create a new Chef repo (where all development files live)
chef generate repo chef-repo (chef-repo is the name, you can of course change that)
- You will see that a new directory ($HOME/src/chef-repo) has been created with a number of files in it. Among them is ./cookbooks/example, which we will upload as a first test. Cookbooks are where instructions are stored in Chef.
- To be able to upload it, the cookbook path must be configured, so you need to add the following line to $HOME/.chef/config.rb:
- You can now upload the cookbook via knife cookbook upload example
In order to have the cookbook executed you must now add it to the run-list (they take the cooking theme seriously at Chef) of the machines where you want it to run. But first you must bootstrap such a machine for Chef.
- The bootstrap happens with the following command, executed on your Windows machine (I recommend checking all possible options via knife bootstrap --help):
knife bootstrap MACHINE_FQDN --node-name MACHINE_NAME_IN_CHEF --ssh-user root --ssh-password ROOT_PASSWORD
- You can now add the recipe to the client’s run-list for execution:
knife node run_list add MACHINE_NAME_IN_CHEF example
and should get a message similar to
- You can now check the execution by logging into your client and execute
root. It will also be executed about every 30 minutes or so. But checking the result directly is always a good idea after you changed something.
Congratulations, you can now maintain your machines in a fully automated fashion!
Another video that I found interesting
Very interesting talk that puts emphasis not only on technical but also organizational and social aspects.
In this post I look at things to consider when an organization wants to introduce Continuous Integration (CI). As in so many other situations the non-technical challenges are more difficult to solve than some nitty-gritty details.
Start Small Right Now
If ever there was a place for the proverb “the better is the enemy of the good” it is here. Waiting days, weeks, or months because you have not sorted out all details is the worst thing you can do. Instead you should start immediately by just installing a CI server (Jenkins is the de facto standard) and setting up a simple job that does nothing but check out the source code from the VCS and compile it.
Most development teams that have not used CI so far are probably operating in a more or less non-agile fashion. That is fine and can stay as it is! Because while CI is virtually a prerequisite for agile development, that absolutely does not mean that teams following a waterfall model will not benefit considerably from CI.
So establishing CI can but does not have to be the first step of moving towards agile development. In fact I would argue that introducing CI is a large-enough step for an existing development organization. Only when this has been “digested” should you think about moving towards agile. Otherwise too many things would be changed in parallel, similar to combining a new release of your own software with an upgrade of the underlying platform, e.g. the database server.
Frequency of Builds
This is the only part where I strongly recommend that you start at full throttle. What I mean by that is that you resist the temptation to run your builds only once a day or even less frequently. Ideally, every commit into the VCS triggers a build via a post-commit hook (here is more information for Git and Subversion). But polling the VCS every 10 minutes or so is a good-enough approximation in most cases. And it is also a little bit easier to set up when you are just starting out on the whole topic.
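CI servers typically offer polling as a built-in option, so you would normally not write this yourself. Purely to illustrate the mechanics, here is a minimal sketch of a polling trigger in shell. It assumes a Git repository; the state file location, the repository URL, and the build command are hypothetical placeholders.

```shell
#!/bin/sh
# Sketch of a polling trigger: build only when the remote HEAD has moved.
# STATE_FILE, the repository URL and the build command are hypothetical.

STATE_FILE="${STATE_FILE:-/tmp/ci-last-built-rev}"

# Returns 0 (true) when a current revision is known and differs from the last built one.
needs_build() {
    last_built="$1"
    current="$2"
    [ -n "$current" ] && [ "$last_built" != "$current" ]
}

# Asks the remote repository for the commit id that HEAD points to.
remote_head() {
    git ls-remote "$1" HEAD | cut -f1
}

# One polling cycle, meant to be scheduled (e.g. via cron) every 10 minutes or so.
poll_once() {
    repo_url="$1"
    build_cmd="$2"
    last_built=""
    [ -f "$STATE_FILE" ] && last_built=$(cat "$STATE_FILE")
    current=$(remote_head "$repo_url")
    if needs_build "$last_built" "$current"; then
        # Remember the revision only after the build command succeeded
        sh -c "$build_cmd" && printf '%s\n' "$current" > "$STATE_FILE"
    fi
}

# Example call (hypothetical URL and build command):
# poll_once "https://git.example.com/project.git" "make all"
```

Because the revision is only recorded after a successful build, a failed build is retried on the next cycle. Whether that is what you want depends on your setup.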
Why am I so adamant on this particular point? I think that almost-instant feedback is at the very core of CI and the only way to deliver it is by running the build. All the points below change the amount of details that are provided or reduce the risk of introducing bugs into the code. But this hugely powerful feeling you get after your first commit triggers a build is the important aspect for successful adoption in my view.
Start with “compilation works” as the lowest common denominator. When you want to start adding the use of “proper” test frameworks, feel free to do so. But it is nothing you need on day one.
When you are ready to do more, you need to focus on those parts of your code that are most relevant for the business. Resist the temptation of striving for large test coverage of your code for the sake of it (having a KPI on this is a really bad idea). Otherwise people will start writing tests for trivial helper functions, and testing those on their own is of low relevance.
Instead take the critical parts of the business logic and develop a way to test them end-to-end (if possible without the GUI yet). With this approach you will implicitly cover all the lower-level stuff underneath automatically. Unless you have someone on your team with practical experience on integration testing frameworks (e.g. Citrus), I would not start with a full-blown approach but rather develop a few custom scripts.
The point in time when to start with more advanced topics, especially automated performance tests, depends on your individual situation and I will not make recommendations about it here. But what you should do as soon as possible, is read up on the subject and get an understanding about the different types of test and what they are good for. You do not need to implement everything now, but this will allow you to make informed judgements about the path you choose.
You should now have an idea of how to get started with CI quickly and in a way that delivers positive results pretty much from day one. Gaining traction in the organization should be your first priority in the beginning. There is a widespread misconception that things like CI, while theoretically the right thing to do, slow developers down. Nothing could be further from the truth. But unless you fight this impression fiercely, sooner or later management will ask for bypassing that “nice new thing” and getting code out of the door faster using the old way.
Test-Driven Development (TDD) is something I have long had difficulties with. Not because I consider it a bad concept, but because I found it very difficult to start doing. In hindsight it appears that the advice given in the respective books and online articles was not suitable for me. So here is the approach that finally worked for me.
It boils down to deviating from the pure doctrine. Instead of writing a test before starting on a new piece of code, I start with the actual code right away. Yes, that violates the core principle, although only for a while. But I have found that in most cases my understanding of the problem is still somewhat vague when I start working on it. So for my brain it is better if I do not have to split its capacity between solving the actual problem and thinking about how to devise a proper test and what all that means for the structure of the future code.
Once the initial version of the working code is there and manually validated, I do add the test. From then on I am in a position to refactor the code without the risk of breaking something. And of course this refactoring is needed because the first version of any code is never really good. While you could write “better” initial code, this would require spending more time upfront than you otherwise need for refactoring later. And it also ignores the fact that you only really understand the problem, when you have finished implementing the solution.
What I later realized was that my approach also helped me to write more testable code. But instead of consciously having to work on it, this sneaked in as a by-product of my modified way of doing TDD. For me this is a more natural way of learning and the results are typically better than following some formal approach.
Every so often I get asked about what to consider when introducing Continuous Integration (CI) to an organization. Interestingly though, most of the details discussed are about working with a version control system (VCS) and not CI itself. That is understandable because the VCS is the “gateway” for all developers. So here are my recommendations.
Use of Branches
It is important to distinguish between the goal (Continuous Integration) and the means (trunk-based development). Yes, it is possible to implement a system that facilitates frequent integration of code from various branches. On the other hand it is a considerably more complex approach than to simply work off trunk. So in most cases I would argue that simpler is better.
In any case I recommend also looking at the use of branches, and this video on YouTube is a good starting point. Whatever path you choose, it will always improve your understanding of the subject and you do not have to take my word for it.
Number of Commits
Most people who do not use a VCS will typically work through the day and create a file copy (snapshot-like) of their project in the evening just before they leave for the day. So it is a natural conclusion to transfer this approach like-for-like to the VCS. In practical terms this would mean performing a single commit every day just before you go home. And the commit message would be similar to “Work for <DATE>” or “WIP”.
But instead of doing so, developers should commit as often as possible. In my experience 5 to 15 times for a full day of development work is a good rule-of-thumb. There will be exceptions, of course. But whenever you are far enough outside this ballpark-figure, you should analyze why that is.
Time to Commit
Instead of looking at time intervals, people should commit whenever the code has reached a stable state. Or in other words: It does not make sense to have people commit every 30 to 45 minutes. They should rather do this after having completed a discrete piece of work, e.g. having fixed a small bug (such as the correction of a threshold). But for changes that require more than roughly 60 minutes of work, things need to be broken down. This is looked at in detail in the section “Split Up Larger Work Items” below.
Especially when starting with a VCS, people will quite often forget to commit when they have completed a somewhat discrete piece of work. That is normal and happens to everybody. Even today, with more than ten years of experience on the subject, I still sometimes miss the point. Adding the step of committing a set of changes to your work routine is something that really takes time. It is a bit like re-ordering your morning routine in the bathroom. Most people do things in the exact same order every day. Changing something there is just as difficult as performing a commit “automatically”.
What to do when you realize your miss depends on the circumstances. If this is your personal pet project, you may just virtually slap yourself on the head and continue or do the infamous “WIP” commit. But if this is a critical project for your organization and you collaborate with others, you need to undo the last couple of changes until you are back where you should have performed the commit in the first place. Yes, this is cumbersome and feels like a waste of time, especially if you are working under time pressure, i.e. always.
But there is no alternative and anyone who says differently (typically project managers without a solid background in software development) is just completely wrong. Because you need to be able to understand exactly who performed what change to the code base and when. But with messy commits this will not work in practice. Or to rephrase in management speak: It is much more time-consuming and error-prone to go through untidy changes every single time you try to find something in the VCS, than to spend the effort only once and correct things.
Split Up Larger Work Items
In many cases the effort to implement a new feature or fix a really nasty bug will exceed, let’s say, 60 minutes. In those cases the developer should have a rough plan for how the overall work will be structured. For a new feature this could mean something like:
- Add test-cases that pass for the current implementation
- Re-factor in preparation without changing behavior
- Add test-cases for new feature
- Implement first half of new feature but ensure that it cannot be executed yet (think feature-toggle here)
- Finish new feature and enable execution
The example above for how to structure the implementation of something larger has a critical aspect to it. Which is that at every point in time the code in the VCS must be in a consistent and operational (=deployable) state. If things look different (i.e. some parts are not working every now and then) in your development environment, as opposed to the VCS, that is ok. Although it has proven to make life easier when both the VCS and your environment do not stray too far apart from each other.
What I discovered for myself is that the approach has a really nice by-product: cleaner and more stable code. In hindsight I cannot say when this materialized for me. So there is a small chance that from a clean code perspective things got worse before they got better. But my gut feeling tells me that this was not the case. Because an always-working code also means a better structured code, which is by definition more stable due to reduced complexity (relative to a messy codebase).
This has been written about many times and I merely mention it for completeness here. Whenever a change breaks the code, and thus causes automated tests to fail, the highest priority is to get things back into a working state. No exceptions ever!
When NOT to Commit
A VCS is not a backup system for your code but a VCS. This also means that you should not simply commit at the end of the day before you go home, unless your code happens to be in a working state. Otherwise, if you feel the need or are obliged to do so, have a backup location and/or script that handles this. But please do not clutter the VCS with backups.
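Such a backup script does not need to be fancy. The following is a minimal sketch that archives a project directory into a separate backup location, so that half-finished work is safe without ever touching the VCS. The function name and the directory layout are my own assumptions for illustration.

```shell
#!/bin/sh
# Sketch: archive a working directory into a backup location outside the VCS.
# backup_worktree and both directory arguments are hypothetical names.

# backup_worktree SRC_DIR BACKUP_DIR
# Creates a time-stamped tar.gz of SRC_DIR and prints the archive path.
backup_worktree() {
    src_dir="$1"
    backup_dir="$2"
    stamp=$(date +%Y%m%d-%H%M%S)
    archive="$backup_dir/$(basename "$src_dir")-$stamp.tar.gz"
    mkdir -p "$backup_dir" || return 1
    # -C keeps the paths inside the archive relative to the parent directory
    tar -czf "$archive" -C "$(dirname "$src_dir")" "$(basename "$src_dir")" || return 1
    printf '%s\n' "$archive"
}

# Example call (hypothetical paths):
# backup_worktree "$HOME/src/chef-repo" "$HOME/backups"
```

The archive includes uncommitted and even broken work, which is exactly the point: the backup location takes what the VCS should not.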
At least in the early days of CI (the early 2000s) it was a somewhat common phenomenon at the beginning of projects that at the end of the day people checked in whatever they had done so far and went home. In many cases this broke the code and tests failed on the CI server. Until the next morning it was not possible for others to work effectively because you cannot reasonably integrate further changes with an already broken codebase. That is bad enough if people are located in one timezone. But think about the effect it has on an organization that works with a follow-the-sun approach.
The reason for commit messages, in addition to the technical details that the VCS records anyway, is to describe the intent of the change. It does not make sense to list technical details, because those can always be retrieved with much more precision from the VCS log. But why you performed the sum of those changes is usually hard to extract from the technical delta. So think about how you would describe the change in a way that allows you to understand things when you look at them in six months.
These are just a few points I learned over the years and have been able to validate with various projects. They are practical and provide, in my view, a good balance between the ideal world and the reality you find in many larger organizations. Please let me know if you agree or (more importantly!) disagree.
Keeping track of changes is a critical functionality in every configuration management system because there are legal requirements like SOX (Sarbanes-Oxley Act) that require it. It can be accomplished in several ways. Basically you can either use an existing tool like a VCS (version control system) or have something custom-built.
When possible, I tend to prefer a VCS because it is (hopefully) already part of your process and governance approach. A typical workflow is that the underlying assets (i.e. configuration files) are changed and then the VCS client is used to commit the change. The commit message allows you to record the intent here, which is the critical information.
But there are cases when you need to be able to track things outside the VCS. In all cases I have seen so far the reason was that some information should not be maintained within the VCS for security or operational reasons. While organizations are often relaxed about data like host names in non-PROD environments, this changes abruptly when PROD comes into play. While I always think “security by obscurity” when I have that discussion, it is also a fight not worth having.
The other reason is operational procedures. The operations team often has a well-established approach that maintains configuration files for many applications in a unified way. The latter typically involves a dedicated location on network storage where configuration data sit. Ideally, there should also be a generic mechanism to track changes here. A dedicated VCS is of course a good option, but operations staff without a development background often (rightly) shy away from that route.
So it comes down to what the configuration management system itself offers. What I have implemented in WxConfig is a system where every operation that changes configuration data results in an audit event that gets persisted to disk. It includes metadata (e.g. what user initiated the change from which IP address), the actual change (e.g. file save from UI or change of value via API), and the old and new version of the affected configuration file.
The downside compared to a well-chosen commit message for VCS is that the system cannot record the intent. But on the other hand no change is lost, because no manual activity is needed. In practice this far outweighs the missing intent, at least for me. Also it has proven to be helpful during development when I had accidentally removed data. It was far easier to restore the latter from an audit record compared to looking them up in their original source.
All audit data get persisted to files and the metadata is recorded as XML. That allows automated processing, if required by e.g. a GRC system (Governance, Risk Management, and Compliance) or legal frameworks like the aforementioned Sarbanes-Oxley Act.
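To make the idea more tangible, here is a small sketch of such an audit record in shell. This is not how WxConfig is implemented and not its file format. The directory layout, the XML element names, and the function names are all assumptions made up for this illustration.

```shell
#!/bin/sh
# Sketch of an audit record: old and new file version plus XML metadata.
# Layout and element names are hypothetical, not WxConfig's actual format.

# Escapes the characters that are special in XML text content.
xml_escape() {
    printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# record_audit_event AUDIT_DIR USER IP ACTION OLD_FILE NEW_FILE
# Persists one event into its own directory and prints that directory.
record_audit_event() {
    audit_dir="$1"; user="$2"; ip="$3"; action="$4"; old_file="$5"; new_file="$6"
    stamp=$(date -u +%Y%m%dT%H%M%SZ)
    # The process id is a crude way to reduce name clashes within one second
    event_dir="$audit_dir/$stamp-$$"
    mkdir -p "$event_dir" || return 1
    cp "$old_file" "$event_dir/old" || return 1
    cp "$new_file" "$event_dir/new" || return 1
    cat > "$event_dir/metadata.xml" <<EOF
<auditEvent>
  <timestamp>$stamp</timestamp>
  <user>$(xml_escape "$user")</user>
  <sourceAddress>$(xml_escape "$ip")</sourceAddress>
  <action>$(xml_escape "$action")</action>
</auditEvent>
EOF
    printf '%s\n' "$event_dir"
}
```

Keeping both file versions next to the metadata is what makes a restore trivial: you copy the old version back, without having to reconstruct it from a delta.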
How do I maintain my configuration data? It is one thing to have them stored somewhere and to be able to maintain stuff if you are a developer or a technical person in general. In that case you will be fine with a plain text editor, sometimes even something like vi (I am an Emacs guy 😉 ). But what if you want business users to be able to do this by themselves? In fact, quite often these folks will also tell you that they do not want to be dependent on you for the job.
When trying to identify the requirements for a maintenance tool suitable for business users, I came up with the following points (and in that order of priority):
- Familiarity: People do not want to learn an entirely new tool for just maintaining a few configuration values. So exposing them to the complete functionality of a configuration management system will not work. And even if the latter has something like role-based configuration or user profiles, it will typically just hide those parts that “normal” users are not supposed to see. But it will still, tacitly, require a lot of understanding about the overall system.
- Safety: In many cases business people will tell you that while they want to do changes by themselves, they still expect you (or the system you develop) to check what they entered. In a number of cases the argument went so far that people, from my perspective, were trying to avoid responsibility for their actions altogether. Whether this was just an attempt to play the blame-game or rather an expression of uncertainty, I cannot say. But for sure you need to add some automated checks.
- Auditing: In many cases you must be able to maintain records of changes for legal reasons (e.g. Sarbanes–Oxley Act). But even if that is not the case, you as the technically responsible person absolutely want to have records of what configuration was in the system when.
- Extensibility: Especially today, when most of us try to work in an agile manner, this should not need mentioning. But unfortunately the reality I mostly witness is quite different. Anyway, what we as developers need to do here, is scrutinize what the business folks have brought up as their requirement. It is not their job to tell us about it, but the other way around.
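As an illustration of the automated checks mentioned under “Safety”, here is a minimal sketch that validates a single key=value entry before it is accepted. The key vat_rate and its rules are hypothetical examples; real rules have to come out of the discussion with the business users.

```shell
#!/bin/sh
# Sketch: validate one "key=value" entry before accepting it.
# The key vat_rate and its limits are hypothetical examples.

# validate_line "key=value"
# Prints a message and returns 1 when a rule is violated, returns 0 otherwise.
validate_line() {
    line="$1"
    case "$line" in
        *=*) ;;
        *) echo "missing '=' in: $line"; return 1 ;;
    esac
    key=${line%%=*}
    value=${line#*=}
    case "$key" in
        vat_rate)
            # Reject empty values and anything containing a non-digit
            case "$value" in
                ''|*[!0-9]*) echo "vat_rate must be a whole number: $value"; return 1 ;;
            esac
            [ "$value" -le 100 ] || { echo "vat_rate must not exceed 100: $value"; return 1; }
            ;;
    esac
    return 0
}
```

Keys without a rule are accepted as they are in this sketch. Whether unknown keys should rather be rejected is a design decision that depends on how much freedom the users are supposed to have.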
So what is the solution? I see basically two options here: custom-developed web UI and Excel. The Excel approach may surprise you, but it really has a number of benefits, also from a technical perspective. Remember that it is file-based, so you can simply leverage your normal VCS for auditing. How the business users “upload” the updates, may vary. But you can start with something simple like a shell script/batch file that acts as a VCS wrapper, and many people will already be happy. More sophisticated stuff like a web upload and built-in verification is nicer of course.
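A simple VCS wrapper of the kind mentioned above could look roughly like the following sketch. It assumes Git as the VCS and an already existing working copy; the function names and the commit message format are my own invention. Pushing to a central repository is left out on purpose.

```shell
#!/bin/sh
# Sketch of a VCS wrapper for business users: copy the file into a working
# copy and commit it together with the stated reason. Assumes Git; all
# names and the message format are hypothetical.

# Builds the commit message from the user's reason, so the intent is recorded.
build_commit_message() {
    file_name="$1"
    reason="$2"
    printf 'Update %s: %s\n' "$file_name" "$reason"
}

# publish_config SOURCE_FILE WORKING_COPY REASON
publish_config() {
    source_file="$1"
    working_copy="$2"
    reason="$3"
    if [ -z "$reason" ]; then
        echo "Please state why the file was changed." >&2
        return 1
    fi
    file_name=$(basename "$source_file")
    cp "$source_file" "$working_copy/$file_name" || return 1
    (
        cd "$working_copy" || exit 1
        git add "$file_name" || exit 1
        git commit -q -m "$(build_commit_message "$file_name" "$reason")"
    )
}

# Example call (hypothetical paths):
# publish_config "$HOME/Documents/rates.xlsx" "$HOME/config-repo" "new VAT rate"
```

Forcing the user to give a reason is the important part here, because that is the one piece of information the VCS cannot derive by itself.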
A web UI has the benefit that you can provide feedback on rule violations more easily, compared to the Excel approach. But what is often overlooked is the requirement for mass-updates. It is a big pain to click through 50 records whereas in Excel that would simply be a copy-paste operation. And the latter is less susceptible to errors, by the way.
Sometimes a Business Rules Management System (BRMS) and the web UI it usually comes with can be an alternative. But if you are not already using one on the project, the overhead of bringing in an entire additional system will typically outweigh the benefits. If you still want to explore that route, pay particular attention to how the web UI changes can be synced back into the artifacts in the VCS that the developers work with.
An often overlooked aspect of configuration management is time. There are many scenarios where the correct value is dependent on time (or date for that matter). Typical examples are organizational structures (think “re-org”) or anything that is somehow related to legislation (every change in law or regulation becomes active at a certain time).
There are several approaches to deal with this situation:
- Deployment: This is the simplest way and its ease comes with a number of drawbacks. What you basically do is make sure that a deployment happens exactly at the time when the changes should take effect. Obviously this collides with a typical CD (Continuous Deployment) setup. And while it is certainly possible to work around this, the remaining limitations usually make it not worthwhile. As to the latter, what if you need to process something with historical data? A good example is tax calculation, when you need to re-process an invoice that was issued before the new VAT rate became active. Next, how do you perform testing? Again, nothing impossible but a case of having to find ways around it. And so on…
- Feature toggle: Here the configuration management solution has some kind of scheduler to ensure that at the right point in time the cut-over to the new values is being made. This is conceptually quite similar to the deployment approach, but has the advantage of not colliding with CD. But all the other limitations mentioned above still apply. WxConfig supports this approach.
- Configuration dimension: The key difference here is that time is explicitly modeled into the configuration itself. That means every query not only has the normal input, usually a key or XPath expression, but also a point in time (if left empty just take the current time as default). This gives full flexibility and eliminates the aforementioned limitations. Unfortunately, however, very few people take this into consideration when they start their project. Adding it later is always possible, but of course it comes with an over-proportional effort compared to doing it right from the start. WxConfig will add support for this soon (interestingly no one asked for it so far).
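To show what “time as a configuration dimension” means in practice, here is a minimal sketch of a lookup that takes a key and an optional date. It is not related to how WxConfig works. The flat-file format (key|valid_from|value with ISO dates) and the sample VAT values are assumptions chosen only for illustration.

```shell
#!/bin/sh
# Sketch: configuration lookup with time as an explicit dimension.
# Hypothetical file format, one entry per line: key|valid_from|value
#   vat_rate|2007-01-01|19
#   vat_rate|2020-07-01|16

# lookup_value CONFIG_FILE KEY [DATE]
# Prints the value that is valid for KEY on DATE (default: today).
# Returns non-zero when no entry is valid at that date.
lookup_value() {
    config_file="$1"
    key="$2"
    at="${3:-$(date +%Y-%m-%d)}"
    # ISO dates (YYYY-MM-DD) sort correctly as plain strings; appending ""
    # forces awk to compare them as strings
    awk -F'|' -v key="$key" -v at="$at" '
        BEGIN { best_from = "" }
        $1 == key && ($2 "") <= (at "") && ($2 "") >= (best_from "") {
            best_from = $2; best = $3; found = 1
        }
        END { if (found) print best; else exit 1 }
    ' "$config_file"
}
```

With the two sample lines from the comment, a lookup of vat_rate for 2020-06-30 prints 19, while the same lookup for 2020-07-01 prints 16. Re-processing an old invoice then simply means passing the invoice date instead of relying on the current time.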
That was just a quick overview, but I hope it provided some insights. Comments are very welcome (as always).