
Unique IDs in Programming

Most people have probably come across what is usually called a UUID (universally unique ID) while using software. UUIDs are typically a cryptic combination of alphanumeric characters and do not make any sense to the human brain. But why are they such a critical aspect of most computer programs?

Their purpose is pretty obvious: be able to identify a set of data (money transfer, customer, product, order, etc.) on a low technical level. The human brain, for most scenarios, does not need such an artificial construct but works nicely with the underlying “real” data. We identify a customer by looking at first name and surname. And if we have multiple customers “Mike Smith”, we add the date of birth. If that is still not enough, then there is the current address. And so on.

For the purpose of this discussion a customer’s UUID is not to be confused with the customer number but exists in addition to it. This may seem like overhead, but think about what happens when an organization buys a competitor. With a bit of “luck” there will be overlap between the customer numbers. Without a UUID already in place, all sorts of ugly workarounds need to be implemented under great time pressure in order to merge the customer lists. If that happens, there is considerable risk of something going wrong, resulting in the loss of customers.
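A minimal sketch of the merger scenario in Python. The customer lists, names and numbers are made up for illustration, and the helper with_uuids only simulates that each company had already assigned a UUID to every record when it was created.

```python
import uuid

# Hypothetical customer lists of two merging companies, keyed by customer number.
company_a = {1001: "Mike Smith", 1002: "Jane Doe"}
company_b = {1001: "Ann Lee", 1003: "Bob Ray"}

# The customer numbers overlap, so a naive merge silently drops a customer.
clashing_numbers = set(company_a) & set(company_b)
naive_merge = {**company_a, **company_b}


def with_uuids(customers):
    """Simulate records that already carry a UUID; the customer number is just data."""
    return {uuid.uuid4(): (number, name) for number, name in customers.items()}


# Keyed by UUID, both lists can simply be combined.
merged = {**with_uuids(company_a), **with_uuids(company_b)}

print(sorted(clashing_numbers))        # [1001]
print(len(naive_merge), len(merged))   # 3 4
```

The naive merge keeps only three of the four customers, because the second entry for number 1001 overwrites the first. With UUIDs as keys all four survive, and the customer numbers can be reassigned later without time pressure.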

It would of course be possible to replicate the human brain’s approach of looking at data in their individual context. But that would make things unnecessarily complex, plus require a different approach for each type of data. So we help ourselves with a technical ID that is guaranteed to be unique. Generating such an ID is surprisingly complex, once you realize what the algorithm needs to accomplish:

  • Be fast: There are many scenarios where you need to create tens of thousands of UUIDs per second (e.g. high-frequency trading, payments processing, telco billing, etc.). But “randomness” usually requires the use of cryptographic functions, which are notoriously expensive operations. In recent years this has become less of a concern, though, since many CPUs now offer dedicated support for this.
  • Be unique across all computers that are involved with the application: While it is probably rarely a problem if two identical IDs are issued for two completely disparate organizations (ignoring scenarios like EDI), there are many cases where it is still highly relevant. Most critical applications run on more than one computer for high-availability and load-balancing purposes. So obviously there must never be a case where IDs clash. Also, it would likely cause problems if the same ID existed not only on the production system but also on a development or test system.
  • Be relatively short: Many UUIDs are between 30 and 40 characters long, which is really not long, given that a clash is ruled out for all practical purposes.
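The requirements above can be tried out with Python's standard uuid module. This is only a sketch, assuming random (version 4) UUIDs are acceptable for the application; the batch size of 100,000 is an arbitrary choice.

```python
import uuid

# Version 4 UUIDs are built from random bits supplied by the operating system.
ids = [uuid.uuid4() for _ in range(100_000)]

# The canonical text form has 32 hex digits and 4 hyphens.
text = str(ids[0])
print(len(text))                   # 36

# No two IDs in the batch are identical.
print(len(set(ids)) == len(ids))   # True
```

Uniqueness here rests on probability, not on coordination between machines, which is why the same call can be used on every server without a central ID service.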

Let’s now look into the use of UUIDs. Apart from pretty obvious things like the aforementioned customer etc., they are used in very many systems for internal purposes. Good examples are relational database management systems, where each record (aka row) has its own ID. The same is true for messaging systems (think JMS or MQTT).

The two core use-cases I see for those internal IDs are fault diagnosis and linking data. In today’s world most systems are highly distributed, even without the use of a micro-services architecture (which increases the level of distribution by orders of magnitude). To track a business transaction across multiple systems, you need to be able to identify these sub-transactions and the means for this are UUIDs. Ideally you have an operations console that automatically connects things between systems. In reality, though, there is often a lot of manual work to be done.
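A minimal sketch of how such tracking can work. The system names, the in-memory log and the helpers record and trace are hypothetical and stand in for real log files and an operations console.

```python
import uuid

log = []  # stands in for the log files of several systems


def record(system, transaction_id, message):
    """Each system writes the transaction's UUID next to its own message."""
    log.append({"system": system, "transaction_id": transaction_id, "message": message})


def trace(transaction_id):
    """Collect everything that happened to one business transaction."""
    return [
        f'{entry["system"]}: {entry["message"]}'
        for entry in log
        if entry["transaction_id"] == transaction_id
    ]


order_a = str(uuid.uuid4())
order_b = str(uuid.uuid4())
record("web-shop", order_a, "order received")
record("web-shop", order_b, "order received")
record("billing", order_a, "invoice created")
record("warehouse", order_a, "parcel shipped")

print(trace(order_a))
# ['web-shop: order received', 'billing: invoice created', 'warehouse: parcel shipped']
```

The point is that every system only has to pass the ID along unchanged; no system needs to know how the others identify their data.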

Another example of linking data together is master data management (MDM). Many organizations have done something in that area and most have failed. The core reason in my view is the approach. It is a business problem that is very closely linked with many technical challenges. And most organizations are bad at dealing with such a combination. There are more aspects, but I will cover those in a separate article.

Back to UUIDs. It might be tempting to leverage internal IDs (e.g. from a database system) for your application. But be warned, this is a very dangerous road. Those IDs are guaranteed to be unique only in the context for which they are created, but not outside. Even more critical is using just a part of the IDs, because the rest seems to be a fixed value. I have seen a business-critical end-user application where part of the database’s row ID (Oracle Database v7) was used. Later the database was migrated to a higher version (Oracle Database v8) where the algorithm for generating row IDs had been changed. So the sub-string of the row ID was suddenly not unique anymore. The end-user application did not expect duplicates and crashed immediately after starting.

While on the subject of databases, there are people who like to use sequences as UUIDs. Sequences are numbers that the database auto-increments, and they seem a convenient and efficient way to obtain a unique ID. But there are various problems with that approach. Firstly, the ID is only unique within a single database instance. This typically creates all sorts of problems for testing the code, and also when moving it to production. Secondly, this kind of feature, while available in many database systems, differs in syntax and behavior from vendor to vendor. So you create unnecessary problems for yourself when you want to use different systems. Many organizations have standardized on one database system for production use. Having to use this also for DEV, CI, SIT, UAT, etc. may make things more difficult than necessary. More importantly, though, it increases the vendor lock-in with all the associated issues.
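The first problem can be reproduced in a few lines. This is a sketch that uses SQLite's AUTOINCREMENT column as a stand-in for a database sequence (an assumption made for convenience; other database systems use different syntax), with two in-memory databases playing the roles of production and test system.

```python
import sqlite3


def new_database():
    """Create an independent database with an auto-incremented ID column."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)")
    return db


def add_customer(db, name):
    """Insert a customer and return the ID the database generated."""
    cursor = db.execute("INSERT INTO customer (name) VALUES (?)", (name,))
    return cursor.lastrowid


production = new_database()
test_system = new_database()

prod_id = add_customer(production, "Mike Smith")
test_id = add_customer(test_system, "Jane Doe")

# Two different customers on two systems end up with the same "unique" ID.
print(prod_id, test_id)   # 1 1
```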

Let me finish with timestamps. They are the original sin of UUIDs. Really. People like them because they are human-readable, allow easy sorting of transactions into the order of processing, and just seem to be THE obvious way to go. But they are not unique! If your development machine is slow enough, relative to the transaction’s processing time, you may indeed not have issues. But that is only because at least one millisecond (you don’t use a resolution of seconds, do you?) goes by between transactions. A production machine, however, will likely be much faster. And what if multiple machines are working in parallel?
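How quickly millisecond timestamps collide is easy to check. The function timestamp_id below is a hypothetical ID scheme written for this sketch, and the loop count of 10,000 is arbitrary.

```python
import time


def timestamp_id():
    """Hypothetical ID scheme: the current time in whole milliseconds."""
    return int(time.time() * 1000)


# Generate IDs back to back, as a fast production machine would.
ids = [timestamp_id() for _ in range(10_000)]
duplicates = len(ids) - len(set(ids))

print(duplicates > 0)   # True: many IDs fall into the same millisecond
```

On any reasonably modern machine the loop finishes within a few milliseconds, so almost all of the generated IDs are duplicates.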

In one case I have seen there was considerable data loss, because someone had been clever enough to use a timestamp with a resolution of only seconds as the filename for writing PDFs into a directory. From there an archiving solution then picked them up for storage to fulfill a legal requirement. This guy’s notebook had been slow enough (it was in the early 2000s) that all files had been several seconds “apart”. But the production machine was a beefy server and it took several weeks until someone realized what had happened. Tens of thousands of documents were lost forever.
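The failure mode can be sketched as follows. The timestamp value, the directory names and the helper write_documents are made up for illustration and are not taken from the system described above.

```python
import tempfile
import uuid
from pathlib import Path


def write_documents(directory, names):
    """Write one file per name and return how many files exist afterwards."""
    for number, name in enumerate(names):
        (directory / f"{name}.pdf").write_text(f"document {number}")
    return len(list(directory.glob("*.pdf")))


with tempfile.TemporaryDirectory() as tmp:
    by_time = Path(tmp) / "by_time"
    by_uuid = Path(tmp) / "by_uuid"
    by_time.mkdir()
    by_uuid.mkdir()

    # Five documents produced within the same second (fixed, made-up timestamp).
    same_second = ["20040315-101500"] * 5
    kept_by_time = write_documents(by_time, same_second)
    kept_by_uuid = write_documents(by_uuid, [uuid.uuid4() for _ in range(5)])

print(kept_by_time, kept_by_uuid)   # 1 5
```

With the timestamp as file name each new document silently overwrites the previous one, so only the last of the five survives. No error is raised, which is why such a problem can go unnoticed for weeks.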

I hope this quick overview provided some value to you and will help you in the next discussion on why you really need a proper UUID.

My 2020 Setup for LaTeX

Here is a short write-up of my current LaTeX setup. Since I sometimes need to process documents on Linux systems (usually in a CI/CD context) the natural choice for me these days is TeX Live on Windows.

My preferred editor is probably less common, especially on Windows: Emacs. I have been using it for more than 20 years and with the right add-ons (AUCTeX and RefTeX) it is still the best LaTeX editor for me. Would I recommend it to someone today who does not already know how to use Emacs? Probably not, given the learning curve. But in the late 1990s there was no real alternative on Linux. And LaTeX on Linux it had to be, for creating high-quality graphics with Xfig and replacing text in the EPS files with full-blown LaTeX code for amazing formulas etc.

But let’s go back to the present time. Here is what I did:

  • Download Windows installer for TeX Live
  • Start installer with administrator rights (right-click) and accept all default settings, then wait a really long time (more than three hours on an old Lenovo Thinkpad T520)
  • Install Emacs. I still have EmacsW32 lying around (you need to fix some security settings), but it is no longer available for download. If you look for an alternative, perhaps you find something here.
  • Install Sumatra PDF. The critical feature for me is that it does not hold a write-lock on the file. So when the output PDF is updated in the background by latexmk, it does not cause any problems. I did the installation as administrator and changed the location to C:\Program Files\SumatraPDF because I personally prefer it that way.

That’s all. Enjoy writing 🙂

Getting Started with Chef Infra Server

A while ago Chef Software announced that they would move all source code to the Apache 2.0 license (see announcement for details), which is something I welcome. Not so much welcomed by many was the fact that they also announced that they would stop “free binary distributions”. In the past you could freely download and use the core parts of their offering, if that was sufficient for your needs. What upset many people was that the heads-up period for this change was rather short and many questions were left open. It also did not help that naturally their web site held many references to the old model, so people were confused.

In the meantime it seems that Chef has loosened their position on binary distributions a bit. There are now a number of binaries that are available under the Apache 2.0 license and they can be found here. This means that you can use Chef freely, if you are willing to compromise on some features. Thanks a lot for this!

This post will describe what I did to set up a fresh Chef environment with only freely available parts. You need just two things to get started with Chef: the server and the administration & development kit. The latter goes by the name of ChefDK and can be installed on all machines on which development and administration work happens. It comes with various command line tools that allow you to perform the tasks needed.

Interestingly, you will find almost no references to ChefDK on the official web pages. Instead its successor “Chef Workstation” is positioned as the tool to use. There is only one slight problem here: The latest free version is pretty old (v0.4.2) and did not work for me or for various other people. That was when I decided to download the latest free version of ChefDK and give it a try. It worked immediately and since I had not needed any of the additional features that come with Chef Workstation, I never looked back.

No GUI is part of those free components. Of course Chef offer such a GUI (web-based) which is named Chef Management Console. It is basically a wrapper over the server’s REST API. Unfortunately the Management Console is “free” only up to 25 nodes. For that reason, but also because its functionality is somewhat limited compared to the command line tools, I decided to not cover it here.

Please check the licenses yourself when you follow the instructions below. It is solely your own responsibility to ensure compliance.

Below you will find a description of what I did to get things up and running. If you have a different environment (e.g. use Ubuntu instead of CentOS) you will need to check the details for your needs. But overall the approach should stay the same.

Environment

The environment I will use looks like this

  • Chef server: Linux VM with CentOS 7 64 bit (minimal selection of programs)
  • Chef client 1: Linux VM like for Chef server
  • Development and administration: Windows 10 Pro 64bit (v1909)

I am not sure yet whether I will expand this in the future. If you are interested, please drop a comment below.

Please check that your system meets the prerequisites for running Chef server.

Component Versions

The download is a bit tricky, since we don’t want to end up with something that falls under a commercial license. As of this writing (April 2020) the following component binaries are the latest that come under an Apache 2.0 license. I verified the latter by clicking on “License Information” underneath each of the binaries that I plan to use.

  • Chef Infra Server: v12.19.31 (go here to check for changes)
  • Chef DK: 3.13.1 (go here to check for changes)

Chef offers various download methods. Typically I would recommend using the package manager of your Linux distribution, but this will likely cause issues from a license perspective sooner or later.

Server Installation and Initial Setup

So what we will do instead is perform a manual download by executing the following steps (they are a sub-set of the official steps and all I needed to do on my system):

  • All steps below assume that you are logged in as root on your designated Chef server. If you use sudo, please adjust accordingly.
  • Ensure required programs are installed
    yum install -y curl wget
  • Open ports 80 and 443 in the firewall
    firewall-cmd --permanent --zone public --add-service http && firewall-cmd --permanent --zone public --add-service https && firewall-cmd --reload
  • Switch SELinux to permissive mode (this setting does not survive a reboot)
    setenforce Permissive
  • Download install script from Chef (more information here)
    curl -L https://omnitruck.chef.io/install.sh > chef-install.sh
  • Make install script executable
    chmod 755 chef-install.sh
  • Download and install Chef server binary package: The RPM will end up somewhere in /tmp and be installed automatically for you. This will take a while (the download size is around 243 MB), depending on your Internet connection’s bandwidth.
    ./chef-install.sh -P chef-server -v "12.19.31"
  • Perform initial setup and start all necessary components, this will take quite a while
    chef-server-ctl reconfigure
  • Create admin user
    chef-server-ctl user-create USERNAME FIRSTNAME LASTNAME EMAIL 'PASSWORD' --filename USERNAME.pem
  • Create organization
    chef-server-ctl org-create ORG_SHORT_NAME 'Org Full Name' --association-user USERNAME --filename ORG_SHORT_NAME-validator.pem
  • Copy both key files (USERNAME.pem and ORG_SHORT_NAME-validator.pem) to your Windows machine. I use FileZilla (installers without bloatware can be found here) for such cases.

ChefDK Installation and Initial Setup

What I describe below is a condensed version of what worked for me. More details can be found on the official web pages.

  • I use $HOME in the context below to refer to the user’s home directory on the Windows machine. You must manually translate it to the correct value (e.g. C:\Users\chris in my case).
  • Download the latest free version of ChefDK for Windows 10 from here and install it
  • Check success of installation by running the following command from a command prompt:
    chef -v
  • Create directory and base version of configuration file for connectivity by running the following command (it may look like it hangs, just give it some time):
    knife configure
  • Copy USERNAME.pem and ORG_SHORT_NAME-validator.pem to $HOME/.chef
  • Add your server’s certificate (self-signed!) to the list of trusted certificates with
    knife ssl fetch
  • Verify that things work by executing knife environment list, it should return _default as the only existing environment
  • The generated configuration file was named $HOME/.chef/credentials in my case and I decided to rename it config.rb (which is the new name in the official documentation) and also update the contents:
    • Remove the line with [default] at the beginning which seemed to cause issues
    • Add knife[:editor] = '"C:\Program Files\Notepad++\notepad++.exe" -nosession -multiInst' as the Windows equivalent of setting the EDITOR environment variable on Linux.

Initial Project

We will create a very simple project here

  • Go into the directory where you want all your Chef development work to reside (I use $HOME/src; the comment regarding the use of $HOME from above still applies) and open a command prompt
  • Create a new Chef repo (where all development files live); chef-repo is the name, which you can of course change:
    chef generate repo chef-repo
  • You will see that a new directory ($HOME/src/chef-repo) has been created with a number of files in it. Among them is ./cookbooks/example, which we will upload as a first test. Cookbooks are where instructions are stored in Chef.
  • To be able to upload it, the cookbook path must be configured, so you need to add to $HOME/.chef/config.rb the following line:
           cookbook_path   ["$HOME/src/chef-repo/cookbooks"]
    (example:  cookbook_path ["c:/Users/chris/src/chef-repo/cookbooks"])
  • You can now upload the cookbook via knife cookbook upload example

Client Setup

In order to have the cookbook executed you must now add its recipe to the run-list (they take the cooking theme seriously at Chef) of the machine on which you want it to run. But first you must bootstrap this machine for Chef.

  • The bootstrap happens with the following command, executed on your Windows machine (I recommend checking all possible options via knife bootstrap --help):
    knife bootstrap MACHINE_FQDN --node-name MACHINE_NAME_IN_CHEF --ssh-user root --ssh-password ROOT_PASSWORD
  • You can now add the recipe to the client’s run-list for execution:
       knife node run_list add MACHINE_NAME_IN_CHEF example
    and should get a message similar to
      MACHINE_NAME_IN_CHEF :
        run_list:
          recipe[example]
  • You can now check the execution by logging into your client and executing chef-client as root. It will also be executed about every 30 minutes or so. But checking the result directly is always a good idea after you have changed something.

Congratulations, you can now maintain your machines in a fully automated fashion!