
DevOps and Ownership

“You build it, you run it” has been my mantra for many years now. A number of times management approached me and asked who should operate the things I had built, because, allegedly, my time was too precious for such a mundane task as operations.

This is to all managers: Operations is neither mundane nor something for junior staff. It is in fact exactly the opposite. Operations is what keeps the organization alive. Operations is where the best people should be, because here the rubber (the developed software) hits the road. Operations is your last line of defense when (not if) something goes catastrophically wrong. Operations is a key influencing factor on your organization’s ROI. Operations determines your ability to be agile on the market. Operations is key for customer satisfaction. I could go on and on, but you probably got my point long ago.

Of course there are some aspects to operations that, when things are done the wrong way, are repetitive and far from challenging. But that should mostly be behind us. Yes, in the 1960s we had people who did nothing but enter data. And until not too long ago a lot of operations was just ticking off check boxes on a to-do list. But with things like infrastructure as code (see my recent post on starting with Chef Infra Server), this should really be a thing of the past. What you need today are people who take pride in running a lean, highly automated, highly resilient IT organization.

And that is where it should be clear to everybody that DevOps is much more about organization, knowledge, and collaboration beyond traditional “borders” than about technology.

By the way: My response to management about who should run my stuff has always been “me”, because the applications were built to be as maintenance-free as possible. Only the occasional support ticket had to be answered, and with proper logging/auditing that does not take a lot of time. Fixing the occasional bug was not a big deal either, thanks to Clean Code and test automation.

This allowed me to support 6 business-critical applications as a “side-project”, i.e. no time was officially allocated. Comparable applications operated by other departments had at least three people full-time for support only.

Getting Started with Chef Infra Server

A while ago Chef Software announced that they would move all source code to the Apache 2.0 license (see announcement for details), which is something I welcome. Less welcome to many was the fact that they also announced they would stop “free binary distributions”. In the past you could freely download and use the core parts of their offering, if that was sufficient for your needs. What upset many people was that the heads-up period for this change was rather short and many questions were left unanswered. It also did not help that their web site naturally still held many references to the old model, so people were confused.

In the meantime it seems that Chef has loosened their position on binary distributions a bit. There are now a number of binaries available under the Apache 2.0 license, and they can be found here. This means that you can use Chef freely, if you are willing to compromise on some features. Thanks a lot for this!

This post will describe what I did to set up a fresh Chef environment with only freely available parts. You need just two things to get started with Chef: the server and the administration & development kit. The latter goes by the name of ChefDK and can be installed on all machines on which development and administration work happens. It comes with various command line tools that allow you to perform the tasks needed.

Interestingly, you will find almost no references to ChefDK on the official web pages. Instead its successor “Chef Workstation” is positioned as the tool to use. There is only one slight problem here: The latest free version is pretty old (v0.4.2) and did not work for me or for various other people. That was when I decided to download the latest free version of ChefDK and give it a try. It worked immediately, and since I have not needed any of the additional features that come with Chef Workstation, I never looked back.

No GUI is part of those free components. Of course Chef offer such a GUI (web-based) which is named Chef Management Console. It is basically a wrapper over the server’s REST API. Unfortunately the Management Console is “free” only up to 25 nodes. For that reason, but also because its functionality is somewhat limited compared to the command line tools, I decided to not cover it here.

Please check the licenses yourself when you follow the instructions below. It is solely your own responsibility to ensure compliance.

Below you will find a description of what I did to get things up and running. If you have a different environment (e.g. use Ubuntu instead of CentOS) you will need to check the details for your needs. But overall the approach should stay the same.

Environment

The environment I will use looks like this:

  • Chef server: Linux VM with CentOS 7 64 bit (minimal selection of programs)
  • Chef client 1: Linux VM like for Chef server
  • Development and administration: Windows 10 Pro 64bit (v1909)

I am not sure yet whether I will expand this in the future. If you are interested, please drop a comment below.

Please check that your system meets the prerequisites for running Chef server.

Component Versions

The download is a bit tricky, since we don’t want to end up with something that falls under a commercial license. As of this writing (April 2020) the following component binaries are the latest that come under an Apache 2.0 license. I verified the latter by clicking on “License Information” underneath each of the binaries that I plan to use.

  • Chef Infra Server: v12.19.31 (go here to check for changes)
  • Chef DK: 3.13.1 (go here to check for changes)

Chef offer various download methods. Typically I would recommend using the package manager of your Linux distribution, but this will likely cause issues from a license perspective sooner or later.

Server Installation and Initial Setup

So what we will do instead is perform a manual download by executing the following steps (they are a sub-set of the official steps and all I needed to do on my system):

  • All steps below assume that you are logged in as root on your designated Chef server. If you use sudo, please adjust accordingly.
  • Ensure required programs are installed
    yum install -y curl wget
  • Open ports 80 and 443 in the firewall
    firewall-cmd --permanent --zone public --add-service http && firewall-cmd --permanent --zone public --add-service https && firewall-cmd --reload
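  • Switch SELinux to permissive mode (this only lasts until the next reboot; to make it permanent, set SELINUX=permissive in /etc/selinux/config)
    setenforce Permissive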
  • Download install script from Chef (more information here)
    curl -L https://omnitruck.chef.io/install.sh > chef-install.sh
  • Make install script executable
    chmod 755 chef-install.sh
  • Download and install Chef server binary package: The RPM will end up somewhere in /tmp and be installed automatically for you. This will take a while (the download size is around 243 MB), depending on your Internet connection’s bandwidth.
    ./chef-install.sh -P chef-server -v "12.19.31"
  • Perform initial setup and start all necessary components (this will take quite a while)
    chef-server-ctl reconfigure
  • Create admin user
    chef-server-ctl user-create USERNAME FIRSTNAME LASTNAME EMAIL 'PASSWORD' --filename USERNAME.pem
  • Create organization
    chef-server-ctl org-create ORG_SHORT_NAME 'Org Full Name' --association-user USERNAME --filename ORG_SHORT_NAME-validator.pem
  • Copy both key files (USERNAME.pem and ORG_SHORT_NAME-validator.pem) to your Windows machine. I use FileZilla (installers without bloatware can be found here) for such cases.

ChefDK Installation and Initial Setup

What I describe below is a condensed version of what worked for me. More details can be found on the official web pages.

  • I use $HOME in the context below to refer to the user’s home directory on the Windows machine. You must manually translate it to the correct value (e.g. C:\Users\chris in my case).
  • Download the latest free version of ChefDK for Windows 10 from here and install it
  • Check success of installation by running the following command from a command prompt:
    chef -v
  • Create directory and base version of configuration file for connectivity by running the following command (it may look like it hangs, just give it some time)
    knife configure
  • Copy USERNAME.pem and ORG_SHORT_NAME-validator.pem to $HOME/.chef
  • Add your server’s certificate (self-signed!) to the list of trusted certificates with
    knife ssl fetch
  • Verify that things work by executing knife environment list. It should return _default as the only existing environment.
  • The generated configuration file was named $HOME/.chef/credentials in my case and I decided to rename it config.rb (which is the new name in the official documentation) and also update the contents:
    • Remove the line with [default] at the beginning which seemed to cause issues
    • Add knife[:editor] = '"C:\Program Files\Notepad++\notepad++.exe" -nosession -multiInst' as the Windows equivalent of setting the EDITOR environment variable on Linux.

Initial Project

We will create a very simple project here.

  • Go into the directory where you want all your Chef development work to reside (I use $HOME/src; the comment regarding the use of $HOME from above still applies) and open a command prompt
  • Create a new Chef repo (where all development files live); chef-repo is the name, which you can of course change
    chef generate repo chef-repo
  • You will see that a new directory ($HOME/src/chef-repo) has been created with a number of files in it. Among them is ./cookbooks/example, which we will upload as a first test. Cookbooks are where instructions are stored in Chef.
  • To be able to upload it, the cookbook path must be configured, so you need to add to $HOME/.chef/config.rb the following line:
           cookbook_path   ["$HOME/src/chef-repo/cookbooks"]
    (example:  cookbook_path ["c:/Users/chris/src/chef-repo/cookbooks"])
  • You can now upload the cookbook via knife cookbook upload example
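
Putting the settings mentioned so far together, the resulting $HOME/.chef/config.rb might look roughly like the sketch below. This is an assumption based on the classic knife configuration settings (node_name, client_key, chef_server_url), not a copy of my actual file. All upper-case values and the home variable are placeholders that you need to replace.

```ruby
# $HOME/.chef/config.rb (sketch; upper-case values are placeholders)

# Stand-in for $HOME, which must be written out on Windows.
home = 'C:/Users/chris'

# User created with "chef-server-ctl user-create" and the matching key file.
node_name       'USERNAME'
client_key      "#{home}/.chef/USERNAME.pem"

# Assumed URL layout: server name followed by the organization's short name.
chef_server_url 'https://CHEF_SERVER_FQDN/organizations/ORG_SHORT_NAME'

# Where knife looks for cookbooks when uploading.
cookbook_path   ["#{home}/src/chef-repo/cookbooks"]

# Windows equivalent of the EDITOR environment variable.
knife[:editor] = '"C:\Program Files\Notepad++\notepad++.exe" -nosession -multiInst'
```

If knife reports errors after such a change, running knife environment list again is a quick way to check whether the connection settings are still valid.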

Client Setup

In order to have the cookbook executed you must now add its recipe to the run-list of the machine on which you want it to run (they take the cooking theme seriously at Chef). But first you must bootstrap this machine for Chef.

  • The bootstrap happens with the following command (I recommend checking all possible options via knife bootstrap --help), executed on your Windows machine:
    knife bootstrap MACHINE_FQDN --node-name MACHINE_NAME_IN_CHEF --ssh-user root --ssh-password ROOT_PASSWORD
  • You can now add the recipe to the client’s run-list for execution:
       knife node run_list add MACHINE_NAME_IN_CHEF example
    and should get a message similar to
      MACHINE_NAME_IN_CHEF :
        run_list:
          recipe[example]
  • You can now check the execution by logging into your client and executing chef-client as root. When chef-client runs as a service, it is also executed about every 30 minutes by default. But checking the result directly is always a good idea after you have changed something.
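
To give an idea of what such instructions look like, here is a minimal recipe sketch. It is a hypothetical example and not the content that chef generate repo puts into the example cookbook; the file path and its content are made up for illustration. It uses Chef's file resource to make sure a small text file exists on the node.

```ruby
# cookbooks/example/recipes/default.rb (hypothetical content)

# Ensure that a small marker file exists with the given content and permissions.
# chef-client only changes the file if it differs from this description.
file '/tmp/chef-hello.txt' do
  content "Managed by Chef\n"
  owner 'root'
  group 'root'
  mode '0644'
end
```

After changing a recipe you would upload the cookbook again with knife cookbook upload example and then run chef-client on the node to see the effect.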

Congratulations, you can now maintain your machines in a fully automated fashion!

Performance and Thread Count

I have seen many mails and forum posts that more or less imply that increasing the number of threads working on a task will speed things up overall. This is a dangerous path to follow in many cases, because every additional thread also adds overhead.

In some cases there is a range where the increase of parallelism indeed improves the situation. But that is only because of very particular performance characteristics. Basically what needs to fit is the ratio of in-system execution time vs. wait time for other systems. And the critical “side” condition for the other systems is that there is no resource congestion/conflict there. If you decide to add more load (by increasing the number of threads on your side) to an external database that is already maxed out, how do you expect your code to run faster?
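
To make the ratio argument a bit more tangible, here is a deliberately simplified model in Ruby. It is only a sketch under stated assumptions: the function name, the parameters, and all numbers are made up for illustration, and the overhead of the threads themselves is ignored completely, so real results will be worse than what the model predicts.

```ruby
# Simplified, hypothetical throughput model (tasks per second).
# Assumption: every task needs cpu_ms of local CPU time and wait_ms of
# waiting for an external system. Thread overhead is ignored on purpose.
def estimated_throughput(threads:, cpu_ms:, wait_ms:, cores: 1, backend_slots: Float::INFINITY)
  # If nothing is saturated, each thread finishes one task every
  # (cpu_ms + wait_ms) milliseconds.
  unconstrained = threads * 1000.0 / (cpu_ms + wait_ms)
  # The local CPU cannot execute more than `cores` tasks at the same time.
  cpu_limit = cores * 1000.0 / cpu_ms
  # The external system serves at most `backend_slots` requests in parallel.
  backend_limit = backend_slots * 1000.0 / wait_ms
  [unconstrained, cpu_limit, backend_limit].min
end

# Example: 10 ms of own work, 90 ms of waiting, external system with 4 slots.
[1, 4, 8, 16].each do |n|
  rate = estimated_throughput(threads: n, cpu_ms: 10, wait_ms: 90, backend_slots: 4)
  puts format('%2d threads -> %.1f tasks/s', n, rate)
end
```

In this model the throughput grows while the external system still has free slots and then stays flat, no matter how many threads are added. With wait_ms set to 0 (purely CPU-bound work on one core) a single thread already reaches the maximum.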

Lastly, there is neither a one-size-fits-all number of threads to recommend, nor can one always say that adding threads will make things better at all. The opposite can happen quite easily and it always needs proper testing with full load. Putting the latter onto the system is harder than most people think, unfortunately. But the benefit will very often be a profoundly improved understanding about the application and especially its TRUE bottlenecks. Parallelism will almost always yield results very different from a single-threaded execution (e.g. when run from your IDE).

Let me close by saying that in a surprisingly high number of scenarios, and again there are a number of pre-conditions associated, a single-threaded execution will give the highest performance (which we have not defined as a term at all here, by the way). This is because the overhead mentioned at the beginning kicks in. If you manage to have your code run such that you reduce or even eliminate cache misses on the CPU, you can see improvements by a factor of 10,000+ (and that is really not percent but a factor). But in general the effort to achieve that outweighs the usefulness, unless we talk about things like high-frequency trading of course.

webMethods Integration Server: How to Structure Packages

For the webMethods Integration Server (IS), packages are “containers” that bundle together various assets and get deployed as sort-of atomic units; this makes them more or less comparable to web apps in a servlet container. So if you are given a certain task to solve, you need to figure out a clever way to organize the various parts of the overall implementation. Unless we talk about something trivial and short-lived (ever seen?), this is more complicated than it might appear at first glance.

I did my first project on Integration Server 4.6 in early 2002 and can now openly confess that the result, while working fine from a business perspective, was not something I would still be proud of today. To a small extent, the hardware can be blamed. I was working with a notebook that had 256 MB of RAM, a single-core 800 MHz mobile Pentium III, a 20 GB hard disk with a whopping 4200 RPM, and a 14 inch display with a 1024×768 resolution.

You may now ask how this can affect code quality. Most important in my case was performance. If pressing the save button literally sends the machine into a 2+ minute period of frantic disk activity, you think more than twice before doing this. (Eventually, I got a memory upgrade to 512 MB.) And this is only the tip of the iceberg. In summary, refactoring is very hard if your concentration is constantly drawn away, because you have to wait for your machine to be ready again. Luckily we are past those issues today.

For the sake of this article let’s work with the following scenario, which is close to something I worked on about two years ago:

The business requirement is to automatically perform modifications to certain sales opportunities in the CRM system and report the result to the account manager.

Of course you can put everything in one package. There will be no need to think about dependency management and much less work for setting up the DEV environment, the CI server, deployment scripts (Chef in my case), etc. And in fact this is how I started. Pretty soon I moved some really generic utility services like data type conversion or date calculation into a separate utilities package. But other than that things stayed quite monolithic.

It didn’t last long, though. When I was half-way through with the initial implementation, the next, more or less separate business scenario that involved the CRM system showed up. Obviously, there was no point in having connection details, services, and other configurations related to the latter multiple times. So I moved all CRM-related stuff to yet another package.

To be clear: I was not taken by surprise when this happened. Instead it was a deliberate decision to be very careful with upfront framework building. This has been one of my personal rules for a while and it has the great advantage of being efficient as well as effective. The efficiency comes from the fact that no time is spent on things that later turn out not to be needed a second time. More important, however, is the effectiveness. It means that the outcome is useful. If you ever came across an API that is cumbersome to use, you know what I’m talking about.

Coming to an end, here is the abstract description of how I structure my IS packages. There are certainly other ways, but this works really well for me and not once in many years has it caused issues or created that gut feeling which tells you that something is not right, although you cannot say what it is.

  • Business logic: Here is where specifics of the business side are handled. What are the criteria to select an opportunity for processing? What changes shall be applied? Technical details of the CRM system are not in scope here.
  • System details: How to perform a query against the CRM system using the criteria coming from the business logic level above? What are the connection details for a given environment (DEV, TEST, PROD)? Not in scope is e.g. the implementation of auditing.
  • Common topics: How to send notifications (e.g. where is your mail server)? Auditing, dealing with data structures, date manipulation, logging, etc.

In terms of naming conventions I suggest the following:

<COMPANY_INITIALS>_<TYPE>_<DESCRIPTION>

Here are a few examples for the famous Acme Corp.

  • AC_APP_CrmOppCheck: Application that performs checks on sales opportunities
  • AC_CMN_Audit: Auditing
  • AC_CMN_Util: Utilities
  • AC_SYS_CRM: Connectivity and services for CRM system
  • AC_SYS_ERP: Connectivity and services for ERP system
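
If you want to check such names automatically, e.g. on a CI server, a small helper can do that. The following Ruby sketch is hypothetical: the helper name and the regular expression are my own illustration, and the type codes (APP, SYS, CMN) and their mapping to the three levels described above are inferred from the examples rather than taken from an official list.

```ruby
# Hypothetical helper; the type codes are inferred from the examples above.
PACKAGE_TYPES = {
  'APP' => 'business logic (application)',
  'SYS' => 'system details',
  'CMN' => 'common topics'
}.freeze

# <COMPANY_INITIALS>_<TYPE>_<DESCRIPTION>
PACKAGE_NAME = /\A(?<company>[A-Z]+)_(?<type>[A-Z]+)_(?<description>[A-Za-z0-9]+)\z/

# Returns the parts of a package name, or nil if the name does not
# follow the convention or uses an unknown type code.
def parse_package_name(name)
  match = PACKAGE_NAME.match(name)
  return nil unless match && PACKAGE_TYPES.key?(match[:type])

  { company: match[:company], type: match[:type], description: match[:description] }
end

%w[AC_APP_CrmOppCheck AC_SYS_CRM AC_Utils].each do |name|
  parsed = parse_package_name(name)
  puts parsed ? "#{name}: #{PACKAGE_TYPES[parsed[:type]]}" : "#{name}: does not follow the convention"
end
```

The same parsed type could later be used to check that dependencies only point from applications to system and common packages, but that rule is an assumption on my side and not part of the convention as described here.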

If you have questions, please do not hesitate to ask in the comments.