
Chef Infra Server Moving to Cloud

In a blog post about the new v14 of Chef Infra Server, it was announced that existing functionality will from now on be deprecated in favor of the cloud version. It will be interesting to see how this works out. Personally, I have never been a fan of forcing customers off an existing product. It is a dangerous move that bears the risk of customers switching vendors entirely. Especially so, if it comes with a major architectural shift like the one from on-premise to cloud.

I have been a happy user of Chef Server for about five years now, although only for a very small number of machines (single digits). The decision for Chef was made at a time when Ansible was still in its early stages. But with this latest development I will need to move away from Chef. It is a pity, because I really like the tool and have built various custom extensions for it.

Unique IDs in Programming

Most people have probably come across what is usually called a UUID (universally unique identifier) while using software. UUIDs are typically a cryptic combination of alphanumeric characters that makes no sense to the human brain. But why are they such a critical part of most computer programs?

Their purpose is pretty obvious: be able to identify a set of data (money transfer, customer, product, order, etc.) on a low technical level. The human brain, for most scenarios, does not need such an artificial construct but works nicely with the underlying “real” data. We identify a customer by looking at first name and surname. And if we have multiple customers “Mike Smith”, we add the date of birth. If that is still not enough, then there is the current address. And so on.

For the purpose of this discussion, a customer’s UUID is not to be confused with the customer number but exists in addition to it. This may seem like overhead, but think about what happens when an organization buys a competitor. With a bit of “luck” there will be overlap between the customer numbers. Without a UUID already in place, all sorts of ugly workarounds need to be implemented under great time pressure in order to merge the customer lists. If that happens, there is considerable risk of something going wrong, resulting in the loss of customers.

It would of course be possible to replicate the human brain’s approach of looking at data in their individual context. But that would make things unnecessarily complex, and it would require a different approach for each type of data. So we help ourselves with a technical ID that is guaranteed to be unique. Generating such an ID is surprisingly complex, once you realize what the algorithm needs to accomplish (a minimal code sketch follows the list):

  • Be fast: There are many scenarios where you need to create tens of thousands of UUIDs per second (e.g. high-frequency trading, payments processing, telco billing). But “randomness” usually requires the use of cryptographic functions, which are notoriously expensive operations. In recent years this has become less of a concern, though, since many CPUs now offer dedicated support for them.
  • Be unique across all computers that are involved with the application: While it is probably rarely a problem if two identical IDs are issued for two completely disparate organizations (ignoring scenarios like EDI), there are many cases where it is still highly relevant. Most critical applications run on more than one computer for high-availability and load-balancing purposes. So obviously there must never be a case where IDs clash. Also, it would likely cause problems if the same ID existed not only on the production system but also on a development or test system.
  • Be relatively short: Many UUIDs are between 30 and 40 characters long, which is really not long, given that it is guaranteed there will never be a clash.
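
As a concrete starting point, here is a minimal Java sketch using the standard library’s version 4 (random) UUID; whether its speed suffices for your load is something to benchmark, not assume:

    import java.util.UUID;

    public class UuidExample {
        public static void main(String[] args) {
            // Version 4 UUID: 122 random bits from a cryptographically
            // strong source, unique without any coordination between machines
            UUID id = UUID.randomUUID();

            // Canonical textual form is 36 characters, e.g.
            // 123e4567-e89b-42d3-a456-426614174000
            System.out.println(id);
        }
    }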

Let’s now look into the use of UUIDs. Apart from pretty obvious things like the aforementioned customer, they are used in very many systems for internal purposes. A good example are relational database management systems, where each record (aka row) has its own ID. The same is true for messaging systems (think JMS or MQTT).

The two core use-cases I see for those internal IDs are fault diagnosis and linking data. In today’s world most systems are highly distributed, even without the use of a micro-services architecture (which increases the level of distribution by orders of magnitude). To track a business transaction across multiple systems, you need to be able to identify these sub-transactions and the means for this are UUIDs. Ideally you have an operations console that automatically connects things between systems. In reality, though, there is often a lot of manual work to be done.
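
To make the tracking concrete, here is a hedged Java sketch (the key name and log setup are illustrative, not from any specific product) that attaches a UUID as a correlation ID to the logging context via SLF4J’s MDC, so that every log line of a transaction can be matched across systems:

    import java.util.UUID;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
    import org.slf4j.MDC;

    public class CorrelationIdExample {
        private static final Logger LOG =
                LoggerFactory.getLogger(CorrelationIdExample.class);

        public static void main(String[] args) {
            // One UUID per business transaction; pass it to downstream
            // systems (e.g. as a message header) so their logs carry it too
            String correlationId = UUID.randomUUID().toString();
            MDC.put("correlationId", correlationId);
            try {
                LOG.info("Processing payment"); // log pattern can emit %X{correlationId}
            } finally {
                MDC.remove("correlationId");
            }
        }
    }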

Another example of linking data together is master data management (MDM). Many organizations have done something in that area and most have failed. The core reason in my view is the approach. It is a business problem that is very closely linked with many technical challenges. And most organizations are bad at dealing with such a combination. There are more aspects, but I will cover those in a separate article.

Back to UUIDs. It might be tempting to leverage internal IDs (e.g. from a database system) for your application. But be warned, this is a very dangerous road. Those IDs are guaranteed to be unique only in the context for which they are created, but not outside of it. Even more critical is using just a part of such an ID, because the rest seems to be a fixed value. I have seen a business-critical end-user application where part of the database’s row ID (Oracle Database v7) was used. Later the database was migrated to a higher version (Oracle Database v8) where the row ID format had been changed. So the sub-string of the row ID was suddenly not unique anymore. The end-user application did not expect duplicates and crashed immediately after starting.

While we are on the subject of databases: there are people who like to use sequences as UUIDs. Sequences are numbers that the database auto-increments, and they seem a convenient and efficient way to obtain a unique ID. But there are various problems with that approach. Firstly, the ID is only unique within a single database instance. This typically creates all sorts of problems when testing the code, and also when moving it to production. Secondly, this kind of feature, while available in many database systems, is a proprietary extension of SQL. So you create unnecessary problems for yourself when using different systems. Many organizations have standardized on one database system for production use. Having to use the same one for DEV, CI, SIT, UAT, etc. may make things more difficult than necessary. More importantly, though, it increases the vendor lock-in with all the associated issues.

Let me finish with timestamps. They are the original sin of UUIDs. Really. People like them because they are human-readable, allow easy sorting of transactions into the order of processing, and just seem to be THE obvious way to go. But they are not unique! If your development machine is slow enough relative to the transaction’s processing time, you may indeed not see issues. But that is only because at least one millisecond (you don’t use a resolution of seconds, do you?) goes by between transactions. A production machine, however, will likely be much faster. And what if multiple machines are working in parallel?
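
If you want to see the problem for yourself, this small Java sketch counts how often two consecutive millisecond timestamps collide on a given machine; on modern hardware the number is substantial:

    public class TimestampCollision {
        public static void main(String[] args) {
            long previous = System.currentTimeMillis();
            int collisions = 0;
            for (int i = 0; i < 10_000; i++) {
                long now = System.currentTimeMillis();
                if (now == previous) {
                    collisions++; // two timestamp-based "IDs" would clash here
                }
                previous = now;
            }
            System.out.println("Collisions in 10,000 draws: " + collisions);
        }
    }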

In one case I have seen there was considerable data loss, because someone had been clever enough to use a timestamp with a resolution of only seconds as the filename for writing PDFs into a directory. From there an archiving solution then picked them up for storage to fulfill a legal requirement. This guy’s notebook had been slow enough (it was in the early 2000s) that all files had been several seconds “apart”. But the production machine was a beefy server and it took several weeks until someone realized what had happened. Tens of thousands of documents were lost forever.

I hope this quick overview provided some value to you and will help you in the next discussion on why you really need a proper UUID.

Configuration Management – Part 9: The Audit Trail

Keeping track of changes is a critical functionality in every configuration management system, because there are legal requirements like SOX (the Sarbanes-Oxley Act) that demand it. This can be accomplished in several ways. Basically, you can either use an existing tool like a VCS (version control system) or have something custom-built.

Where possible, I tend to prefer a VCS, because it is (hopefully) already part of your process and governance approach. A typical workflow is that the underlying assets (i.e. configuration files) are changed and the VCS client is then used to commit the change. The commit message allows you to record the intent, which is the critical information.

But there are cases when you need to be able to track things outside the VCS. In all cases I have seen so far, the reason was that some information should not be maintained within the VCS for security or operational reasons. While organizations are often relaxed about data like host names in non-PROD environments, this changes abruptly when PROD comes into play. And while I always think “security by obscurity” when I have that discussion, it is also a fight not worth having.

The other reason is operational procedures. The operations team often has a well-established approach that maintains configuration files for many applications in a unified way. The latter typically involves a dedicated location on network storage where configuration data sit. Ideally, there should also be a generic mechanism to track changes here. A dedicated VCS is of course a good option, but operations staff without a development background often (rightly) shy away from that route.

So it comes down to what the configuration management system itself offers. What I have implemented in WxConfig is a system where every operation that changes configuration data results in an audit event that gets persisted to disk. It includes metadata (e.g. what user initiated the change from which IP address), the actual change (e.g. file save from UI or change of value via API), and the old and new version of the affected configuration file.

The downside compared to a well-chosen commit message in a VCS is that the system cannot record the intent. But on the other hand, no change is ever lost, because no manual activity is needed. In practice this far outweighs the missing intent, at least for me. It has also proven helpful during development when I had accidentally removed data: it was far easier to restore them from an audit record than to look them up in their original source.

All audit data get persisted to files and the metadata is recorded as XML. That allows automated processing, if required by e.g. a GRC system (Governance, Risk Management, and Compliance) or legal frameworks like the aforementioned Sarbanes-Oxley Act.
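
To make this more tangible, here is a purely illustrative sketch of what such a persisted audit record could look like; the element names are invented for this example and are not WxConfig’s actual format:

    <auditEvent timestamp="2019-03-12T14:07:33Z">
      <user>jdoe</user>
      <sourceIp>10.1.2.3</sourceIp>
      <operation>valueChangedViaApi</operation>
      <file>packages/MyApp/config/app.properties</file>
      <oldContent>retries=3</oldContent>
      <newContent>retries=5</newContent>
    </auditEvent>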

Configuration Management – Part 8: The Maintenance

How do I maintain my configuration data? It is one thing to have them stored somewhere and to be able to maintain them if you are a developer or a technical person in general. In that case you will be fine with a plain text editor, sometimes even something like vi (I am an Emacs guy 😉 ). But what if you want business users to be able to do this by themselves? In fact, quite often these folks will tell you that they do not want to be dependent on you for the job.

When trying to identify the requirements for a maintenance tool suitable for business users, I came up with the following points (in that order of priority):

  • Familiarity: People do not want to learn an entirely new tool just for maintaining a few configuration values. So exposing them to the complete functionality of a configuration management system will not work. And even if the latter has something like role-based configuration or user profiles, it will typically just hide those parts that “normal” users are not supposed to see. But it will still, tacitly, require a lot of understanding of the overall system.
  • Safety: In many cases business people will tell you that while they want to make changes by themselves, they still expect you (or the system you develop) to check what they entered. In a number of cases the argument went so far that people, from my perspective, were trying to avoid responsibility for their actions altogether. Whether this was just an attempt to play the blame game or rather an expression of uncertainty, I cannot say. But you definitely need to add some automated checks.
  • Auditing: In many cases you must maintain records of changes for legal reasons (e.g. the Sarbanes-Oxley Act). But even if that is not the case, you as the technically responsible person absolutely want records of what configuration was in the system at what point in time.
  • Extensibility: Especially today, when most of us try to work in an agile manner, this should not need mentioning. But unfortunately the reality I mostly witness is quite different. Anyway, what we as developers need to do here is scrutinize what the business folks have brought up as their requirements. It is not their job to bring this aspect up, but ours.

So what is the solution? I see basically two options here: a custom-developed web UI, or Excel. The Excel approach may surprise you, but it really has a number of benefits, also from a technical perspective. Remember that it is file-based, so you can simply leverage your normal VCS for auditing. How the business users “upload” their updates may vary. But you can start with something simple like a shell script or batch file that acts as a VCS wrapper, and many people will already be happy. More sophisticated options like a web upload with built-in verification are nicer, of course.

A web UI has the benefit that you can provide feedback on rule violations more easily than with the Excel approach. But what is often overlooked is the requirement for mass updates. It is a big pain to click through 50 records, whereas in Excel that would simply be a copy-paste operation. And the latter is less susceptible to errors, by the way.

Sometimes a Business Rules Management System (BRMS) and the web UI it usually comes with can be an alternative. But if you are not already using one on the project, the overhead of bringing in an entire additional system will typically outweigh the benefits. If you still want to explore that route, pay particular attention to how the web UI changes can be synced back into the artifacts in the VCS that the developers work with.

Configuration Management – Part 7: The Time

An often overlooked aspect of configuration management is time. There are many scenarios where the correct value is dependent on time (or date for that matter). Typical examples are organizational structures (think “re-org”) or anything that is somehow related to legislation (every change in law or regulation becomes active at a certain time).

There are several approaches to deal with this situation:

  • Deployment: This is the simplest way, and its ease comes with a number of drawbacks. What you basically do is make sure that a deployment happens exactly at the time when the changes should take effect. Obviously this collides with a typical CD (Continuous Deployment) setup. And while it is certainly possible to work around this, the remaining limitations usually make it not worthwhile. As to the latter: what if you need to process something with historical data? A good example is tax calculation, when you need to re-process an invoice that was issued before the new VAT rate became active. Next, how do you perform testing? Again, nothing impossible, but a case of having to find ways around it. And so on…
  • Feature toggle: Here the configuration management solution has some kind of scheduler to ensure that at the right point in time the cut-over to the new values is being made. This is conceptually quite similar to the deployment approach, but has the advantage of not colliding with CD. But all the other limitations mentioned above still apply. WxConfig supports this approach.
  • Configuration dimension: The key difference here is that time is explicitly modeled into the configuration itself. That means every query not only has the normal input, usually a key or XPath expression, but also a point in time (if left empty, the current time is taken as the default). This gives full flexibility and eliminates the aforementioned limitations (see the sketch after this list). Unfortunately, however, very few people take this into consideration when they start their project. Adding it later is always possible, but it comes with an over-proportional effort compared to doing it right from the start. WxConfig will add support for this soon (interestingly, no one has asked for it so far).
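
For illustration, here is a minimal, hypothetical Java sketch of such a time-aware lookup (not WxConfig’s actual implementation): each key maps to a timeline of values, sorted by the instant from which each value is valid.

    import java.time.Instant;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.NavigableMap;
    import java.util.TreeMap;

    public class TemporalConfig {
        // key -> (validFrom -> value), the inner map sorted by time
        private final Map<String, NavigableMap<Instant, String>> data = new HashMap<>();

        public void put(String key, Instant validFrom, String value) {
            data.computeIfAbsent(key, k -> new TreeMap<>()).put(validFrom, value);
        }

        // Returns the value that was active at the given instant, or null
        public String get(String key, Instant at) {
            NavigableMap<Instant, String> timeline = data.get(key);
            if (timeline == null) return null;
            Map.Entry<Instant, String> entry = timeline.floorEntry(at);
            return entry == null ? null : entry.getValue();
        }

        // Convenience: an omitted point in time defaults to "now"
        public String get(String key) {
            return get(key, Instant.now());
        }
    }

A call like get("vat.rate", invoiceDate) then returns the rate that was active when the invoice was issued, which directly addresses the re-processing example above.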

That was just a quick overview, but I hope it provided some insights. Comments are very welcome (as always).


Configuration Management – Part 6: The Secrets

Every non-trivial application needs to deal with configuration data that require special protection. In most cases they are passwords or something similar. Putting those items into configuration files in clear text is a pretty bad idea. Especially so, because these configuration files are almost always stored in a VCS (version control system) that many people have access to.

Some systems automatically replace clear-text passwords found in files with an encrypted version. (You may have seen cryptic values starting with something like {AES} in the past.) But apart from the conceptual issue that parts of the files are changed outside the developer’s control, this is also not exactly an easy thing to implement. How do you tell the system which values to encrypt? And what about the time periods during which passwords exist in clear text on disk, especially on production systems?

My approach was to leverage the built-in password manager facility of webMethods Integration Server instead. This is an encrypted data store that can be secured on multiple levels, up to HSMs (hardware security modules). You can look at it as an associative array (in Java usually referred to as a map) where a handle is used to retrieve the actual secret value. With a special syntax (e.g. secretValue=[[encrypted:handleToSecretValue]]) you declare the encrypted value. Once you have done that, this “pointer” will of course return no value, because you still need to actually define it in the password manager. This can be done via a web UI, a service, or by importing a flat file. The flat file import, by the way, works really well with general-purpose configuration management systems like Chef, Puppet, Ansible, etc.

A nice side-effect of storing the actual value outside the regular configuration file is that within your configuration files you do not need to bother with the various environments (add that aspect to the complexity of the in-file encryption approach from the second paragraph). The part that is environment-specific is the actual value; the handle can, and in fact should, be the same across all environments. And since you define the specific value directly within each system, you are already done.
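
As a small illustration (the handle and key names are invented for this example), the same configuration line can be shipped unchanged to every environment:

    # identical in DEV, TEST, and PROD
    dbPassword=[[encrypted:handleToDbPassword]]

Only the secret that is stored behind handleToDbPassword in each system’s password manager differs.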

Configuration Management – Part 5: The External World

A standard requirement in configuration management is using values that already exist somewhere else. The most obvious places are environment variables (defined by the operating system) and Java system properties. Others are existing (!) databases, e.g. from an ERP system, or files.

The reason for referencing values directly from their original sources instead of duplicating them in a copy-paste fashion is the Don’t-Repeat-Yourself (DRY) principle. Although the latter is typically discussed in the context of code, it applies to configuration at least as well. We all know those “great” applications, which require manual updates of the same value in different places. And if you miss only one, all hell breaks loose.

For re-use of values within a file, the standard approach is variable interpolation. A well-known syntax for this comes from the build tool Apache Ant, where ${variableName} can be placed anywhere in a value assignment. Apache Commons Configuration, and therefore also WxConfig, supports this syntax. In WxConfig you can even reference values from other files that belong to the same Integration Server package.
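
As a minimal sketch with Apache Commons Configuration 2 (the file name and keys are made up for this example), interpolation looks like this:

    import java.io.File;
    import org.apache.commons.configuration2.Configuration;
    import org.apache.commons.configuration2.builder.fluent.Configurations;

    public class InterpolationExample {
        public static void main(String[] args) throws Exception {
            // app.properties contains:
            //   baseDir=/opt/app
            //   dataDir=${baseDir}/data
            //   javaHome=${sys:java.home}   (Java system property)
            //   path=${env:PATH}            (environment variable)
            Configuration config =
                    new Configurations().properties(new File("app.properties"));

            System.out.println(config.getString("dataDir")); // /opt/app/data
            System.out.println(config.getString("javaHome"));
        }
    }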

But what to do for other sources? Well, for the aforementioned environment variables and Java system properties, the respective interpolators from Apache Commons Configuration can also be used in WxConfig (as shown in the sketch above). But WxConfig also defines several interpolators of its own:

  • Cross package: Assuming you have an Integration Server package that holds some general values (often referred to as “global”), those can be referenced. (There are multiple ways to deal with truly global values and the specifics really determine which one is used best.)
  • Current date/time: Gets the current date and/or time, which is admittedly a bit of an edge case. One could argue that this is not really a configuration value, and that is correct. However, there are scenarios where the ability to easily obtain such a value comes in very handy. Think about files that get created while processing data. Instead of manually concatenating the path, the base filename, the date/time stamp, and the file suffix in the code, you could just have something like workerFile=${tmpDir}/appFile_${date:yyyyMMdd-HHmmssSSS}.dat in your configuration file. Doesn’t that make things a lot cleaner to read? (A sketch of how such an interpolator can be built follows this list.)
  • File content: Similar to date/time this was added primarily for convenience and clarity reasons. The typical use-case are templates, where a file (possibly maintained by another application) is just read directly into a configuration value.
  • Code invocation: When all else fails, you can have an arbitrary piece of logic be executed and the result be placed into the configuration value.
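
WxConfig’s interpolators are its own, but to show the underlying mechanism, here is a hedged sketch of how a ${date:pattern} lookup could be implemented and registered with Apache Commons Configuration 2 (the registration API is real; the file and key names are invented for this example):

    import java.time.LocalDateTime;
    import java.time.format.DateTimeFormatter;
    import org.apache.commons.configuration2.PropertiesConfiguration;
    import org.apache.commons.configuration2.builder.fluent.Configurations;
    import org.apache.commons.configuration2.interpol.Lookup;

    public class DateLookupExample {
        public static void main(String[] args) throws Exception {
            PropertiesConfiguration config =
                    new Configurations().properties("app.properties");

            // Resolve ${date:<pattern>} to the current date/time, formatted
            // with the given pattern. Interpolation happens on every access,
            // so the value is always current.
            Lookup dateLookup = pattern ->
                    LocalDateTime.now().format(DateTimeFormatter.ofPattern(pattern));
            config.getInterpolator().registerLookup("date", dateLookup);

            // app.properties could contain:
            //   workerFile=${tmpDir}/appFile_${date:yyyyMMdd-HHmmssSSS}.dat
            System.out.println(config.getString("workerFile"));
        }
    }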

It is worth noting that all interpolators resolve at invocation time, so you always get current results. In the case of the file content interpolator, this has the downside of file I/O; but if that really becomes a bottleneck, you are still no worse off than if the read happened somewhere else in the code. And the file system cache also still helps …

With this set of options, pretty much all requirements for a redundancy-free configuration management can be met.