Tag Archives: Architecture

Re-Writing Software from Scratch

It is not uncommon that an existing piece of non-trivial software gets re-written from scratch. Personally, I find this preference for a greenfield approach interesting, because it is not something I share. In fact, I believe that it is a fundamentally flawed approach in many  cases, so why are people still doing this today?

Probably because they just like it more and also it is perceived, at least when things get started, as more glorious. But when you dig deeper the picture changes dramatically.  Disclaimer: Of course there are situations when re-starting is the only option. But it typically involves aspects like required special hardware not being available any more.

When I started writing this post with making some notes, it all started with technical arguments. They are of course relevant, but the business side is much more critical. Just re-phrase the initial statement about re-writing to something like

Instead of gradually improving the existing software and learn along the way, we spend an enormous amount of money on something that delivers no additional value compared to what we have now. For this period, which we currently estimate to be 2 years (it is very likely to be longer), apart from very minor additions, the business will not get anything new, even if the market requires it; so long-term we risk the existence of the organization. And yes, we may actually loose some key members of staff along the way. Also, it is not certain that the new software ever works as expected. But should it really do, the market has changed so much, that the software is not very useful for doing business anyway and we can start all over again.

Is there anyone who still thinks that re-writing is generally a good idea?

Let us now change perspective and look at it from a software vendor’s point of view. Because the scenario above was written with an in-house application in mind. What is different, when we look at a company that develops and sells enterprise software? For this text the main properties of enterprise software are that it is used in large organizations to support critical business processes. How keen will customers be to bet their own future on something new, i.e. not tested? But even if they waited long enough for things to stabilize, there would be the migration effort. And if that effort comes towards them anyway, they may just as well look at the market for alternatives. So you would actively encourage your customer base to turn to the competition. Brilliant idea, right?

What becomes clear looking at things like that, is what the core value proposition of enterprise software is: investment protection. That is why we will, for the foreseeable future, continue to have mainframes with decades-old software running on them. Yes, these machines are expensive. But the alternatives are more expensive and in addition pose enormous risk.

In the context of commercial software vendors one argument for a re-write is that of additional revenue. It is often seen as easier to get a given amount of money for a new product than an improved version of something already out there. But that is the one-off view. What you typically want as a software vendor is a happy customer that pays maintenance every year and, whenever they need something new, first turns to you rather than the competition for options. Also, such a happy customer is by far the best marketing you can get. It may not look as sexy as getting new customers all the time, but it certainly drives the financial bottom line.

Switching over to the technical side, there are a few arguments that are typically made in favor of a restart. My favorite is the better architecture of the new solution, which will be the basis for future flexibility, growth, etc. I believe that most people are sincere here and think they can come up with something better. But the truth is that unless someone has done something similar successfully in the past, there is a big chance that the effort is hugely underestimated. Yes, technology has advanced and we have better languages and frameworks. But on the other hand the requirements have also grown dramatically. Think about high availability, scalability, performance and all the others. Even more important, though, is the business side. With something brand new people will have greatly increased expectations. So giving them something like-for-like will probably not be seen as success.

The not-invented-here syndrome is also relevant in this context and particularly with more junior teams. I have seen a case when an established framework used in more than 9,000 business-critical (i.e. direct impact on revenue) installations was dismissed in favor of something the team wanted to develop themselves. And I can tell you that the latter was a really bad implementation. Whether it was a misguided sense of pride or a fundamental lack of knowledge I cannot say. But while certainly being the most extreme incarnation I have seen so far, it was certainly not the only one.

So far my thoughts on the subject. Please let me know what you think about this topic.

Revisiting Software Architecture

Quite recently I heard a statement similar to

“The application works, so there is no need to consider changing the architecture.”

I was a bit surprised and must admit that in this situation had no proper response for someone who obviously had a view so different from everything I believe in. But when you think about it, there is obviously a number of reasons why this statement was a bit premature. Let’s have a look at this in more detail.

There are several assumptions and implicit connotations, which in our case did not hold true. The very first is that the application actually works, and at the time that was not entirely clear. We had just gone through a rather bumpy go-live and there had not yet been a single work item processed by the system from start to finish, let alone all the edge cases covered. (We had done a sanity test with a limited set of data, but that had been executed by folks long on the project and not real end users.) So with all the issues that had surfaced during the project, nobody really knew how well the application would work in the real world.

The second assumption is that the chosen architecture is a good fit for the requirements. From a communication theory point of view this actually means “a good fit for what I understand the requirements to be”. So you could turn the statement in question around and say “You have not learned anything new about the requirements since you started the implementation?”. Because that is what it really means: I never look back and challenge my own thoughts or decisions. Rather dumb, isn’t it?

Interestingly, the statement was made in the context of a discussion about additional requirements. So there is a new situation and of course I should re-evaluate my options. It might indeed be tempting to just continue “the old way” until you really hit a wall. But if that happens you have consciously increased sunk costs. And even if you can “avoid the wall”, there is still a chance that a fresh look at things could have fostered a better result. So apart from the saved effort (and that is only the analysis, not a code change yet) you can only loose.

The next reason are difficulties with the original approach and of that there had been plenty in our case. Of course people are happy that things finally sort-of work. But the more difficulties there have been along the way, the bigger the risk that the current implementation is either fragile or still has some hidden issues.

And last but not least there are new tools that have become available in the meantime. Whether they have an architectural impact obviously depends on the specific circumstances. And it is a fine line, because there is always temptation to go for the new, cool thing. But does it provide enough added value to accept the risks that come with such a switch? Moving from a relational database to one that is graph-based, is one example that lends itself quite well to this discussion. When your use-case is about “objects” and their relationships with one another (social networks are the standard example here), the change away from a relational database is probably a serious option. If you deal with financial transactions, things look a bit different.

So in a nutshell here are the situations when you should explicitly re-evaluate your application’s architecture:

  • Improved understanding of the original requirements (e.g. after the first release has gone live)
  • New requirements
  • Difficulties faced with the initial approach
  • New alternatives available

So even if you are not such a big fan of re-factoring in the context of architecture, I could hopefully show you some reasons why it is usually the way to go.

Choosing a Technology

I recently started a new hobby project (it is still in stealth mode, so no details yet) and went through the exercise to really carefully think about what technology to use for it. On a very high level the requirements are fairly standard: Web UI, persistence layer, API focus, cross-platform, cloud-ready, continuous delivery, test automation, logging, user and role management, and all the other things.

Initially I was wondering about the programming language, but quickly settled for Java. I have reasonable experience with other languages, but Java is definitely where most of my knowledge lies these days. So much for the easy part, because the next question proved to be “slightly” more difficult to answer.

Looking at my requirements it was obvious that developing everything from the ground up would be nonsense. The world does not need yet another persistence framework and I would not see any tangible result for years to come, thus loosing interest too soon. So I started looking around and first went to Spring. There is a plethora of tutorials out there and they show impressive results really quickly. Java EE was not really on my screen then, probably because I still hear some former colleagues complain about J2EE 1.4 in the back of my mind. More importantly, though, my concern was more with agility (Spring) over standards (JEE). My perception with too many Java standards is that they never outgrow infancy, simply because they lack adoption in the real world. On the other hand Spring was created to solve real-world problems in the first place.

But then, when answering a colleague’s question about something totally different, I made the following statement:

I tend to avoid convenience layers, unless I am 100% certain that they can cope with all future requirements.

All to often I have seen that some first quick results were paid for later, when the framework proved not to be flexible enough (I call this the 4GL trap). So this cautioned myself and I more or less went back to the drawing board: What are the driving questions for technology selection?

  • Requirements: At the beginning of any non-trivial software project the requirements are never understood in detail. So unless your project falls into a specific category, for which there is proven standard set of technology, you must keep your options open.
  • Future proof: This is a bit like crystal ball gazing, but you can limit the risks. The chances are bigger that a tier-3 Apache project dies than an established (!) Java standard to disappear. And of course this means that any somewhat new and fancy piece must undergo extreme scrutiny before selecting it; and you better have a migration strategy, just in case.
  • Body of knowledge: Sooner or later you will need help, because the documentation (you had checked what is available, right?) does not cover it. Having a wealth of information available, typically by means of your favorite search engine, will make all the difference. Of course proper commercial support from a vendor is also critical for non-hobby projects.
  • Environment: Related to the last aspect is how the “landscape” surrounding your project looks like. This entails technology but even more importantly the organization which has evolved around that technology. The synergies from staying with what is established will often outweigh the benefits that something new might have when looked at in isolation.

On a strategic level these are the critical questions in my opinion. Yes, there are quite a few others, but they are more concerned with specifics.

The Non-SOA View on SOA: Part 1

There is a bunch of folks out there that don’t like SOA (Service-Oriented Architecture) for various reasons. So I try to look at things without all the buzz and distill out a few aspects that in my view provide value. The goal is to provide an alternative perspective that is hopefully hype-free.

I want to split this in two parts: First (in this post) comes the conceptual side of things, which looks at what SOA in my view is about at its core. Second are the more practical benefits we get from the way SOA is approached in real life. This will be a separate post.


As a “preface” let me point out that I look at things mostly from a technical perspective. So everything along the lines of “SOA is about business and not technology” gets pretty much ignored. There are two reasons for that: Firstly I am a technical person and more interested in the technical aspects. Secondly, the business argument is probably the one most resented by skeptics. So let’s get started …

The core thing about SOA is that all functionality is made available as a service (surprise, surprise). This is really trivial and hardly worth mentioning. However, it has far-reaching consequences once you dig a bit deeper. And it’s those secondary aspects that provide the advancements.

  • The right level of slicing up things: Of course there were many other approaches and technologies before (e.g. OO and  CORBA). However, none of those has kept its promise. Admittedly it remains to be seen to what extent SOA can fulfill expectations. On the other hand, those expectations are still in flux as we all continue to improve our understanding. So there is a chance that expectations and reality will actually meet some time (where?). Also, the criticism I am aware of is not about this aspect. In fact it seems pretty much undisputed, at least I have never heard anything from customers or prospects.
  • The service is the application: A direct consequence of the previous point is that the service gets elevated to the place that was formerly held by an entire application; at least from a user’s perspective. Whether the implementation reflects this or not is irrelevant to the consumer.
    For new development it is usually desirable that the internals match the services exposed to the outside. For “legacy” stuff that just gets a service interface, the wrapper logic takes this place. In either case, however, the exposed logic is much smaller than an entire application.
  • State management: There has been a lot of talk about loose coupling. This principle can be applied at many levels, transport protocol and data format being the obvious ones. A slightly more subtle place is the handling of state, which pretty much depends on the aforementioned transport protocol and data format.
    The interface, at least of the initial operation, must be exposed in a way that it can just be called.  In other words it is kind of stateless. Of course from that point on everything could be state-full. There is no difference between SOA and traditional applications here. In reality, however, people pay much more attention to keeping things stateless when working in a SOA context.
  • Life cycle management: This one is strongly related to the point “the service is the application”. The notion of a life cycle has formerly mostly been on the level of entire applications. The obvious disadvantage is that this huge scope almost demands big bang-approaches for new versions. With the effort and costs associated for bringing a new version into production, there are only very few such releases and much stuff gets crammed into it. The risks increase, business agility is close to zero and nobody is really happy. With a service as the “unit to be delivered”, the scope is drastically reduced and so are things like risk, complexity etc. Thus I think the concept will be brought to much more practical relevance than before.
  • Standards: Although a real SOA can be built using an entirely proprietary technology, in reality a set of standards has established itself. Yes, there are probably to many WS-* things around that only work together in particular versions. So I would not recommend to jump onto too many things. But with the basic things you are certainly on the right track.

There is probably a whole lot more to say, but let’s keep it here for now. I am looking forward to any comments on the subject!

Michael T. Nygard: Release It! Design and Deploy Production-Ready Software

I bought this book about a year ago and -shame on me- only just read it. It’s really great for everyone that is interested in designing and developing robust software. So in that sense a must-read for all of us.

The book is organized in four general sections: Stability, capacity, general design issues and operations. For all of them a number of typical scenarios are described and general approaches discussed. The author seems to have a pretty strong Java and web application background (at least those are the areas of most of his examples), but the patterns and solutions are great for all systems, languages and use-cases.

So overall what we have here is a book that is fun to read and at the same time offers great insight into large-scale software systems. In my view everyone who works in this field can benefit from this book.

The author is also blogging on Amazon.com and seems to cover quite a few interesting topics.

Lifecycle Management with SOA and BPM: 1+1> 2

For some time the topics of SOA Governance and BPM have been looked at as if they were two relatively unrelated things. And this perception is correct in the sense that you don’t have to have them together. However, more and more people realize what huge additional benefits are in for them if they combine the two things. In many cases the idea is that you need some logic to govern the actual work (design, development, testing etc.) for a process that has been modeled in a nice fancy tool.

But you can also do it the other way around: Think about what you would get if you could govern your whole IT lifecycle management from one tool. The idea goes like this: You store all relevant information about “objects” that are relevant for your organization in a central repository, and the different attributes that describes those objects (aka assets) are completely freely configurable. You probably need to attach additional information to them, like existing documentation etc. So in a way you can think of these information in the repository as a way to store the knowledge about all relevant aspects of the organization and then leverage this knowledge.

Now based on that groundwork, whenever a request for a new a feature in the IT landscape comes in, you can have it go through a “workflow”. The first steps would probably be about an approval chain. So people from various functions (e.g. product management, operations, security, marketing etc.) would need to either approve or reject this. How the final outcome is determined can be a bit tricky (and is much more a political topic than a technical one).

Then come steps like gathering requirements, signing them off, doing the development etc. You probably also want to integrate this whole thing with your development chain (automated testing, continuous integration etc.). At any given point in time you know where your development stands in terms of the project plan.

So if you step back a bit and look at what you get, we are not talking about development tools any more. Instead this is true, real-time end-to-end visibility. There are clear responsibilities for assigning tasks (a human being has to decide on something) and you no longer need to fear emails that are lost in the inbox of someone’s email program. Instead you get a view into the currently open tasks, their due dates etc. Other advantages existing but for now those are the critical ones. The reason for this is that these functionalities allow you to have automatically generated documentation that satisfies your compliance requirements. In most organizations these things eat up enormous amounts of resources and affect processes that should deliver value to the organization.

Let’s leave it here for now. I am quite interested in your comments on this, so please let me know.

David A. Fisher “An Emergent Perspective on Interoperation in Systems of Systems”

When David A. Fisher wrote this paper in 2006, the hype around Web Services and SOA had just begun. The point that struck me most when reading the executive summary, was that Fisher does not limit his thoughts to technical systems. Instead he accepts the fact that the involved people are also an important part of the equation. This aspect seems to be ignored by most authors and is -in my view- a major reason why many theories seem to be so far from reality.

The paper is more than 60 pages long, so nothing for a quick lunch break reading. However, I recommend reading at least the executive summary. It made me curious enough to schedule special time for the rest of the document.

Is a SAN Really the Silver Bullet for Your Performance Requirements?

All too often I have heard things like “You need fast storage? Use SAN”. While this is certainly correct, most people have a relatively vague understanding about SANs in general and what they offer performance-wise compared to local disks in particular. I will not go into the details of things like the various protocols etc. Rather I want to highlight where this requirement for high performance comes from and why SANs are in many cases a good approach to it.

I am looking at all this from the point of view of commercial applications only. What they are doing -in a nutshell- is retrieving, processing and sending data. By the way, if you never touched upon the semantics of terms like data, information, knowledge and how they are related to one another, I suggest to do so. It is an interesting topic and knowing the differences can be helpful in many discussions.

But back to things …. For most commercial applications the overall performance is determined not so much by CPU power but the throughput with which information can be received or sent (or read and written from/to disk when speaking more of persistance). In technical terms this is commonly referred to as I/O throughput. And that is, by the way, a major reason why mainframes will most likely stay around for quite a few more years. They have been optimized for I/O performance for literally decades. This includes hardware (special I/O subsystems that free the CPUs from processing I/O requests), the operating system and also the applications (because people have been aware of this much more than in the open systems’ world).

But I did not want to focus on mainframes today. Rather this post is about a rather common misconception when it comes to I/O or more specifically storage performance. Many people, whenever they hear “We need high performance storage” instantly think about putting the respective data onto a SAN. While this may be the right choice in many cases, the reasoning is not so often correct. For many folks SAN is just a synonym for “some centralized ultra-fast storage pool”. It is in my view important to understand how a SAN is related to other storage devices.

There is no reason that local hard disks are “by definition” slower than their counterpart in the SAN. At the end of the day the main difference between a SAN and local hard disks is that the connection from the computer to the disks is different. For SAN this is either fiber channel (FC) or iSCSI (basically SCSI over IP). For local disks it can be SCSI (serial or good old parallel), Serial ATA or whatever. What determines the speed is the number and individual performance of the physical disks that make up a {en:RAID}. The more disks you have and the faster they each are, the faster your logical drive will be. This is generally indepedent from whether they are connected to a local RAID controller or “simply” sit in the SAN somewhere in your data center.

So as soon as you compare the SAN to a local RAID system (and I am not talking about the RAID-capabilities many motherboards offer these days but dedicated RAID controllers) the perceived performance gap goes pretty much away. You may even have an advantage with local disks because you have direct control over how many drives your file system spreads while with a SAN this may be a bit more difficult to achieve.

I will leave it here for now. Hopefully this somewhat demystifies things for you.

Asynchronous Communication

In the context of SOA (Service-Oriented Architecture) there has been a revival of the asynchronous communication pattern. And that is really what it is: a pattern. First and foremost we are not talking about a specific product, protocol or API but simply a way how to design systems and applications. After this has (hopefully) been clarified between us, let’s look at some of the in my opinion important aspects. I start off with a (certainly non-academic) definition:

Asynchronous communication simply means that when making a call into an “external system”, my program/component/service etc. does not expect an immediate response with the actual result (this would be synchronous communication). I am only interested in getting a positive acknowledgement that my request has been received (but not processed) correctly. I don’t care about any further logic that is going to be executed in some other program. Instead things continue on my side and at some other place the result of my request may be received and processed further. (There are quite a few use-cases when I don’t expect a result at all.)

At a first glance this may seem way more complicated than simply making a call, wait for the result and then continue. And to a degree this perception is correct, although I must say that in my view people exaggerate greatly here. In all the cases I have seen so far the really bad problems were not caused by the pattern itself. Rather it was a cumbersome integration on the tool-level. If you have to bother with e.g. complicated mappings before you can send data over to your message broker from the “regular” code, this will certainly inhibit the use of the pattern (and rightly so). But you should blame your tool set and not the pattern in this case!

What is indeed a bit challenging when you first start using this pattern, is that it requires a fundamental shift of your mindset for designing your system or application. What you need is a loose coupling of your components. Technically speaking this means that there must be another component that is ready to receive a response matching your original request and then continues working on it.

(If you followed the discussion around SOA lately, you may have come across that term “loose coupling” more than once already. So you can think of asynchronous communication as a means for reaching this design goal. Bear in mind though, that we only look at the communication layer here! Loose coupling should also be concerned about the semantics, so in technical terms the data model needs to support this as well. However, I wanted to discuss asynchronous communication here, so let’s leave it with loose coupling for now.)

With one component sending out a request and another one handling the response, we have a solid foundation for a highly distributed system . Yes, you can also design a distributed system with synchronous communication but this is probably more difficult. What comes to mind when talking about distributed systems is scalability. More specifically, we are talking about horizontal scalability (scale out), which is spreading the load over more machines instead of putting more resources into a single machine. Having many relatively small systems work on things typically has two main advantages compared to going for bigger machines: Firstly the approach scales further and secondly it is cheaper because you can use standard hardware.

Another big plus of asynchronous communication is robustness, provided you are using a message broker that offers guaranteed delivery. Once your message broker has confirmed the receipt of the request you can be sure that it will be delivered to the receiver. (You can use the publish-subscribe pattern to allow the sender not having to know about receivers. More on that in a separate post later.) And it is certainly easier to only have to make the message broker highly available than to do the same for each and every piece of code. So by having the message broker as a central piece of infrastructure you can reach a high level of high availability relatively easily.

That’s it for now, there will be more posts on related topics.