Update on LaTeX Setup

This is a quick follow-up to my recent post on the LaTeX setup for 2020. I wanted to let you know that I have recently switched from Emacs to VS Code with the LaTeX Workshop extension as my primary LaTeX editor. I truly cannot remember what made me look into this direction, but I am happy that I did.

The main reason for switching was that file management is so much easier with VS Code. My current project has a number of files spread over many sub-directories, and the way LaTeX Workshop handles things makes me much more productive. I somewhat miss AUCTeX, but overall I will certainly not go back.

Giving Space to New Team Members

What is management about? According to my favorite podcast, Manager Tools, as a manager you need to achieve results and retention. That sounds pretty obvious, but it is terribly difficult to implement, especially if you want to balance the two. One aspect is how to integrate new members into the team. If you have not read my post on what actually makes a team, please go here first.

When you join a team you are the “freshman”, at least in terms of team dynamics. Not so long ago I switched teams myself and, after quite a few years, found myself in that role again. I had gone through this process quite a few times, either within an organization or combined with a complete change of employer. What was different this time was that from day one I was officially in charge of two knowledge areas (architecture and DevOps) in a global function.

This was an interesting experience, given the combination of being team freshman and subject matter expert at the same time. So I had to balance what I consider appropriate behavior for someone new to a team, with demonstrating thought leadership in my areas of expertise. I knew most folks from previous interaction and regarded them very highly for what they had delivered in the past. And seeing how they treated those team members that I did not know yet, I was quickly convinced that those were top performers, too.

On the receiving end, I was given a very friendly welcome, and that included the same amount of teasing everybody else received and gave. It was clearly a warm welcome for me and I truly appreciate(d) it. My colleagues also gave me the distinct feeling that my input was seen as valuable to the team as a whole. Or in other words: They gave me space to define and fill my role, which is much more than the official job description.

The tone for this was set by management and, as with so many other aspects, followed by the others. This is classic leadership by example. Of course, if you have a jerk on the team it will probably not help very much. But it is absolutely the manager who sets the tone. A few years ago I experienced first-hand how a new boss killed, in just a fortnight, a weekly team call that had run successfully for years.

An additional aspect for people who have just started their career is letting them establish themselves in the organization. This applies especially to engineers, who tend to be more reserved and have a somewhat introverted personality, which from a scientific perspective is different from being reserved (for more details, I have linked a video in this post). These people need to be given opportunities where they can demonstrate their capabilities in a “safe environment” and then be recognized for it. They will flourish in such a setup and typically deliver much more than you expect.

The worst thing you can do with such folks is to shout them down in meetings or conference calls. Very quickly they will go silent, suffer quietly, and start looking for somewhere else to go. This is one of the reasons why leading a group of software developers is very different from leading e.g. marketing folks. But that is for a different post.

Unique IDs in Programming

Most people have probably come across what is usually called a UUID (universally unique ID) while using software. UUIDs are typically a cryptic combination of alphanumeric characters and do not make any sense to the human brain. But why are they such a critical aspect of most computer programs?

Their purpose is pretty obvious: be able to identify a set of data (money transfer, customer, product, order, etc.) on a low technical level. The human brain, for most scenarios, does not need such an artificial construct but works nicely with the underlying “real” data. We identify a customer by looking at first name and surname. And if we have multiple customers “Mike Smith”, we add the date of birth. If that is still not enough, then there is the current address. And so on.

For the purpose of this discussion a customer’s UUID is not to be confused with the customer number but exists in addition to it. This may seem like overhead, but think about what happens when an organization buys a competitor. With a bit of “luck” there will be overlap between the customer numbers. Without a UUID already in place, all sorts of ugly workarounds need to be implemented under great time pressure in order to merge the customer lists. If that happens, there is considerable risk of something going wrong, resulting in the loss of customers.
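The merger scenario can be sketched in a few lines of Python. This is only an illustration: the `Customer` class, the names, and the customer numbers are made up, and a random (version 4) UUID from the standard library stands in for whatever technical ID an organization would really use.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Customer:
    customer_number: int  # business-facing number, unique only within one company
    name: str
    # Technical ID, assigned once when the record is created (assumption: UUID v4).
    uid: str = field(default_factory=lambda: str(uuid.uuid4()))


# Hypothetical customer lists of the two companies; number 1001 overlaps.
company_a = [Customer(1001, "Mike Smith"), Customer(1002, "Jane Doe")]
company_b = [Customer(1001, "Ann Lee"), Customer(1003, "Bob Ray")]

# Keyed by customer number, one of the two customers 1001 is silently overwritten.
by_number = {c.customer_number: c for c in company_a + company_b}

# Keyed by UUID, all four customers survive the merge.
by_uid = {c.uid: c for c in company_a + company_b}

print(len(by_number), len(by_uid))
```

With the UUID as the key, the overlapping customer numbers can be cleaned up later and without time pressure, because no record was lost during the merge itself.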

It would of course be possible to replicate the human brain’s approach of looking at data in their individual context. But that would make things unnecessarily complex, plus require a different approach for each type of data. So we help ourselves with a technical ID that is guaranteed to be unique. Generating such an ID is surprisingly complex, once you realize what the algorithm needs to accomplish:

  • Be fast: There are many scenarios where you need to create tens of thousands of UUIDs per second (e.g. high-frequency trading, payments processing, telco billing, etc.). But “randomness” usually requires the use of cryptographic functions, which are notoriously expensive operations. In recent years this has become less of a concern, though, since many CPUs now offer dedicated support here.
  • Be unique across all computers that are involved with the application: While it is probably rarely a problem if two identical IDs are issued for two completely disparate organizations (ignoring scenarios like EDI), there are many cases where it is still highly relevant. Most critical applications run on more than one computer for high-availability and load-balancing purposes. So obviously there must never be a case where IDs clash. Also, it would likely cause problems if the same ID existed not only on the production system but also on a development or test system.
  • Be relatively short: Many UUIDs are between 30 and 40 characters long, which is really not long, given that it is guaranteed there will never be a clash.
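To make the list above a bit more tangible, here is a minimal Python sketch using the `uuid` module from the standard library. It assumes that random (version 4) UUIDs are acceptable for the use case. Strictly speaking, their uniqueness rests on an overwhelmingly small collision probability rather than on a hard guarantee. The variable names are mine.

```python
import uuid

# Generate a random (version 4) UUID; 122 of its 128 bits are random.
new_id = uuid.uuid4()
text = str(new_id)

# The canonical text form is 32 hex digits plus 4 hyphens, i.e. 36 characters.
print(text, len(text))

# Generate a batch and check that no value repeats.
batch = [str(uuid.uuid4()) for _ in range(10_000)]
print(len(set(batch)) == len(batch))
```

No coordination between machines is needed for this variant, which is why it also works when several servers, or production and test systems, create IDs independently.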

Let’s now look into the use of UUIDs. Apart from pretty obvious things like the aforementioned customer etc., they are used in very many systems for internal purposes. A good example is relational database management systems, where each record (aka row) has its own ID. The same is true for messaging systems (think JMS or MQTT).

The two core use-cases I see for those internal IDs are fault diagnosis and linking data. In today’s world most systems are highly distributed, even without the use of a micro-services architecture (which increases the level of distribution by orders of magnitude). To track a business transaction across multiple systems, you need to be able to identify these sub-transactions and the means for this are UUIDs. Ideally you have an operations console that automatically connects things between systems. In reality, though, there is often a lot of manual work to be done.
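As a rough sketch of the tracking idea: every sub-transaction is tagged with the same ID, and a diagnosis tool later filters on it. Everything here is hypothetical, including the system names and the in-memory `log` list that stands in for the separate log stores of real systems.

```python
import uuid

log = []  # stand-in for the log stores of several systems


def record(system, correlation_id, message):
    """Append one log entry tagged with the business transaction's ID."""
    log.append({"system": system, "correlation_id": correlation_id, "message": message})


def process_order(order_name):
    """Run one business transaction through three hypothetical systems."""
    correlation_id = str(uuid.uuid4())
    record("webshop", correlation_id, f"received {order_name}")
    record("billing", correlation_id, f"invoiced {order_name}")
    record("shipping", correlation_id, f"dispatched {order_name}")
    return correlation_id


first = process_order("order A")
second = process_order("order B")

# Reconstruct the path of the first transaction across all systems.
trace = [entry["system"] for entry in log if entry["correlation_id"] == first]
print(trace)
```

In a real landscape the ID would travel in message headers or request metadata, and the filtering would be done by the operations console mentioned above.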

Another example of linking data together is master data management (MDM). Many organizations have done something in that area and most have failed. The core reason in my view is the approach. It is a business problem that is very closely linked with many technical challenges. And most organizations are bad at dealing with such a combination. There are more aspects, but I will cover those in a separate article.

Back to UUIDs. It might be tempting to leverage internal IDs (e.g. from a database system) for your application. But be warned, this is a very dangerous road. Those IDs are guaranteed to be unique only in the context for which they are created, but not outside of it. Even more critical is using just a part of such an ID because the rest seems to be a fixed value. I have seen a business-critical end-user application where part of the database’s row ID (Oracle Database v7) was used. Later the database was migrated to a higher version (Oracle Database v8) where the ID algorithm had been changed. So the sub-string of the row ID was suddenly not unique anymore. The end-user application did not expect duplicates and crashed immediately after starting.

While on the subject of databases, there are people who like to use sequences as UUIDs. Sequences are numbers that the database auto-increments, and they seem a convenient and efficient way to obtain a unique ID. But there are various problems with that approach. Firstly, the ID is only unique within a single database instance. This typically creates all sorts of problems for testing the code, and also when moving it to production. Secondly, this kind of feature, while available in many database systems, is a proprietary extension of SQL. So you create unnecessary problems for yourself when it comes to using different systems. Many organizations have standardized on one database system for production use. Having to use it for DEV, CI, SIT, UAT, etc. as well may make things more difficult than necessary. More importantly, though, it increases the vendor lock-in with all the associated issues.

Let me finish with timestamps. They are the original sin of UUIDs. Really. People like them because they are human-readable, allow easy sorting of transactions into the order of processing, and just seem to be THE obvious way to go. But they are not unique! If your development machine is slow enough, relative to the transaction’s processing time, you may indeed not have issues. But that is only because at least one millisecond (you don’t use a resolution of seconds, do you?) goes by between transactions. A production machine, however, will likely be much faster. And what if multiple machines are working in parallel?

In one case I have seen there was considerable data loss, because someone had been clever enough to use a timestamp with a resolution of only seconds as the filename for writing PDFs into a directory. From there an archiving solution then picked them up for storage to fulfill a legal requirement. This guy’s notebook had been slow enough (it was in the early 2000s) that all files had been several seconds “apart”. But the production machine was a beefy server and it took several weeks until someone realized what had happened. Tens of thousands of documents were lost forever.
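The mechanism behind that data loss is easy to reproduce. The following Python sketch is not the original code, which I do not have. The date, the file name pattern, and the 200 ms spacing are assumptions chosen for illustration.

```python
import uuid
from datetime import datetime, timedelta

# Hypothetical creation times of three PDFs on a fast server: 0.2 s apart.
start = datetime(2003, 5, 12, 14, 30, 0)
created = [start + timedelta(milliseconds=200 * i) for i in range(3)]

# File names built from a timestamp with a resolution of seconds.
timestamp_names = [t.strftime("%Y%m%d-%H%M%S") + ".pdf" for t in created]

# File names built from a random UUID instead.
uuid_names = [str(uuid.uuid4()) + ".pdf" for _ in created]

# All three timestamp names are identical, so each file would overwrite
# the previous one. The UUID names are all different.
print(len(set(timestamp_names)), len(set(uuid_names)))
```

On a slow notebook the three timestamps would fall into different seconds and the problem would stay invisible, which is exactly what made it so hard to catch.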

I hope this quick overview provided some value to you and will help you in the next discussion on why you really need a proper UUID.

DevOps and Ownership

“You build it, you run it” has been my mantra for many years now. A number of times I was approached by management and asked who should be operating stuff that I had built. Because, allegedly, my time was too precious for such a mundane task as operations.

This is to all managers: Operations is neither mundane nor something for junior staff. It is in fact exactly the opposite. Operations is what keeps the organization alive. Operations is where the best people should be, because here the rubber (the developed software) hits the road. Operations is your last line of defense, when (not if) something goes catastrophically wrong. Operations is a key influencing factor on your organization’s ROI. Operations determines your ability to be agile on the market. Operations is key for customer satisfaction. I could go on and on, but you have likely gotten my point long ago.

Of course there are some aspects to operations that, when things are done the wrong way, are repetitive and far from challenging. But that should mostly be behind us. Yes, in the 1960s we had people who did nothing but enter data. And until not too long ago a lot of operations was just ticking off check boxes on a to-do list. But with things like infrastructure as code (see my recent post on starting with Chef Infra Server), this should really be something from the past. What you need today are people who take pride in running a lean, highly automated, highly resilient IT organization.

And that is where it should be clear to everybody that DevOps is much more about organization, knowledge, and collaboration beyond traditional “borders” than about technology.

By the way: My response to management about who should run my stuff has always been “me”. Because the applications were built to be as maintenance-free as possible. Only the occasional support ticket had to be answered, and with proper logging/auditing that does not take a lot of time. And fixing the occasional bug was not a big deal either, thanks to Clean Code and test automation.

This allowed me to support 6 business-critical applications as a “side-project”, i.e. no time was officially allocated. Comparable applications operated by other departments had at least three people full-time for support only.

My 2020 Setup for LaTeX

Here is a short write-up of my current LaTeX setup. Since I sometimes need to process documents on Linux systems (usually in a CI/CD context), I want a distribution that is available on both platforms. So the natural choice for me these days is TeX Live on Windows.

My preferred editor is probably less common, especially on Windows: Emacs. I have been using it for more than 20 years and with the right add-ons (AUCTeX and RefTeX) it is still the best LaTeX editor for me. Would I recommend it to someone today who does not already know how to use Emacs? Probably not, given the learning curve. But in the late 1990s there was no real alternative on Linux. And LaTeX on Linux it had to be, for creating high-quality graphics with Xfig and replacing text in the EPS files with full-blown LaTeX code for amazing formulas etc.

But let’s go back to the present time. Here is what I did:

  • Download Windows installer for TeX Live
  • Start installer with administrator rights (right-click) and accept all default settings, then wait a really long time (more than three hours on an old Lenovo Thinkpad T520)
  • Install Emacs. I still have EmacsW32 lying around (you need to fix some security settings), but it is no longer available for download. If you are looking for an alternative, perhaps you will find something here.
  • Install Sumatra PDF. The critical feature for me is that it does not hold a write-lock on the file. So when the output PDF is updated in the background by latexmk, it does not cause any problems. I did the installation as administrator and changed the location to C:\Program Files\SumatraPDF because I personally prefer it that way.

That’s all. Enjoy writing 🙂