I quickly wanted to kick around an idea about how to make distributed development within a project easier. A project, for the purpose of this writing, is not a typical open source project but something more along the lines of a commercial environment: you will usually have 10-50 people working on a solution with access to common infrastructure (corporate network, central Version Control System, etc.).
So far we basically tell people that they can solve this coordination problem either on an organizational or on a technical level. Organizational more or less means phone or email, while technical means pessimistic locking in the Version Control System (VCS). Both have a number of disadvantages and will increasingly be challenged.
I will start by looking at the VCS-based approach. This can be seen as doing the coordination ex post: upon check-in people realize that someone else made conflicting changes or that objects are locked in the VCS. The result is unnecessary additional work using diff/merge functionality. So we can happily start a diff/merge exercise and try to make our work fit into the other changes that were done. The common view is that this is perfectly acceptable. What is often overlooked, though, is that many such statements are or were made in the context of open source projects, where there is no chance to coordinate upfront. So I look at diff/merge as a last resort and not as something that should occur on a regular basis just because two people accidentally worked on the same thing. It is needed for things like porting changes between branches; apart from such scenarios it should in general be avoided.
With the pessimistic locking approach we are (ab)using a VCS on a conceptual level to coordinate work. And technically not all VCSes support pessimistic locking; this becomes increasingly relevant when we speak about distributed VCSes like Git or Mercurial. Another issue with pessimistic locking is that it usually does not prevent you from checking out and editing, only from checking your changes back in. So if you forget to acquire a lock, you may spend time on some work only to find out later that you are basically screwed, because someone else also worked on it and did not forget to acquire the lock.
So when you look at all these points, the question arises why the VCS locking approach is still favored by most folks over upfront coordination. The latter would be done by letting everyone know that I will now start working on something. My educated guess is that the most important factor for preferring the VCS lock is that it is integrated into the toolchain and hence automated. Also, I don't need to think about who needs the information to populate my email's To: field, nor do I have to switch to another program.
So what if we could combine the ex ante aspect of informing people with the integrated and publish-subscribe nature of the VCS lock? You actually often find this in server-based development environments, where people do not work against a local workspace but a central development instance of the system. These environments usually offer a command to lock certain objects on the server; the lock request is not expressed towards the VCS but towards the "live" system, and I cannot perform any change without having acquired such a lock. I have worked with such a system for many years and while there are certain drawbacks to its shared nature, in most circumstances and from a productivity point of view it is just great.
So we need to find a way to incorporate this “live locking” into a setup with many disparate development environments. In terms of implementation we could probably leverage the Eclipse Communication Framework (ECF) and e.g. an XMPP-based IM server. The workflow would roughly look like this:
- After installation the user configures a “Coordination Server” (which would be an XMPP server) and his/her account there
- For each Eclipse project there is a “chat room” or something similar that basically plays the role of a topic (in JMS terms). Whenever someone opens a project of that name, the respective Eclipse instance will be added to that chat room.
- There is a “lock/unlock” entry in the context menu of all objects (e.g. classes). Whenever someone clicks on one of those, a respective message is sent to the chat room and picked up by all subscribers.
- The open question for me is how to persist that message in a stateful manner and all the associated questions around conflict resolution etc. In general I would favour a mostly manual approach here, because it would make the design/implementation a whole lot easier.
- These machine-generated messages adhere to some naming convention so that they can be processed and filtered easily; all other messages go through untouched and can be used for human-to-human communication. A minimal sketch of what such a convention could look like follows below.
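To make the naming-convention point a bit more concrete, here is a minimal sketch of what such a machine-generated message could look like. Everything in it (the LOCK/UNLOCK prefix, the field separator, the field order) is an assumption for illustration, not a finished protocol:

```java
import java.time.Instant;
import java.time.format.DateTimeParseException;

// Hypothetical format for machine-generated lock notifications:
//   LOCK|<project>|<object>|<user>|<timestamp>
// Anything that does not parse as such a message is treated as ordinary chat.
public final class LockMessage {

    public enum Type { LOCK, UNLOCK }

    private final Type type;
    private final String project;
    private final String object;   // e.g. a fully qualified class name
    private final String user;
    private final Instant timestamp;

    public LockMessage(Type type, String project, String object, String user, Instant timestamp) {
        this.type = type;
        this.project = project;
        this.object = object;
        this.user = user;
        this.timestamp = timestamp;
    }

    /** Serializes the message for sending to the project's chat room. */
    public String serialize() {
        return type + "|" + project + "|" + object + "|" + user + "|" + timestamp;
    }

    /** Returns the parsed message, or null if the line is human-to-human chat. */
    public static LockMessage parse(String raw) {
        String[] parts = raw.split("\\|");
        if (parts.length != 5) {
            return null;
        }
        try {
            return new LockMessage(Type.valueOf(parts[0]), parts[1], parts[2],
                    parts[3], Instant.parse(parts[4]));
        } catch (IllegalArgumentException | DateTimeParseException e) {
            return null; // not a machine-generated message
        }
    }
}
```

A subscribing Eclipse instance would run every incoming chat line through LockMessage.parse() and, if the result is non-null, mark the corresponding object as locked (or unlocked) in its UI; everything else would simply be displayed as chat.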
These are my initial thoughts and I look forward to your comments, so please feel free to share them.
Recently, a few colleagues and I had a very interesting discussion about what should go into a Version Control System (VCS) and what should not. In particular we were arguing as to whether things like documents or project plans should go in. Here are a few things that I came up with in that context.
I guess the usage of VCS (and other repositories) somehow comes down to a few general desires (aka use-cases):
- Single source of truth
- History/time machine
- Automation of builds etc.
In today’s world with its many different repositories you can either go for a mix (best-of-breed) or the lowest common denominator which is usually the VCS. So what’s stopping people from doing it properly (=best of breed)?
- Lack of conceptual understanding:
- Most people involved in these kinds of discussions usually come from a (Java) development background, so there is a “natural” tendency to think VCS. What this leaves out is that other repositories, which are often DB-based, offer additional capabilities. In particular, all sorts of cross-checks and other constraints are enforced. Also, given their underlying architecture, they are usually easier to integrate with in terms of process-driven approaches.
- Non-technical folks are mostly used to doing versioning-by-filename and require education to see the need for more.
- Lack of repository integration: Interdependent artefacts spread over multiple repositories require interaction, esp. synchronisation. Until some kind of standard emerges, it is a tedious task to do custom development for these kinds of interfaces. Interestingly, this goes back to my post about ALM needing middleware.
- Different repositories come with clients that work in fundamentally different ways, both in terms of UI and underlying workflow (the latter is less obvious but far-reaching in its consequences). Trying to understand all this is really hard. BTW: this already starts with different VCSes! As an example, just compare SVN, TFS and Git (complexity increasing in that order, too) and have “fun”.
- Lack of process: Multiple repositories that need to interact also mean that there is, at least implicitly, a process behind all this. Admittedly, there is also a process behind a VCS-only approach, but it is less obvious and its evolution is often ad hoc in nature. With multiple repositories a more coordinated approach to process development is required, also because this often means crossing organisational boundaries.
Overall, this means that there is considerable work to be done in this area. I will continue to post my ideas here and look forward to your comments!
The idea to write this post was triggered when I read an article called “Choosing Agile-true application lifecycle management (ALM)”, and in particular by the statement that many ALM tools come as a package that covers multiple processes in the lifecycle of an application. Although strictly speaking this is not a “we-cover-everything” approach, it still strongly reminds me of the approach that ERP software initially took. Its promise, put simply, was that an entire organization with all its activities (to avoid the term process here) could be represented without the need to develop custom software. This was a huge step forward and some companies made, and still make, a lot of money with it.
Of course, the reality is a bit more complex and so organizations that embrace ERP software have to choose between two options: either change the software to fit the organization, or change the organization to fit the software. This is not meant to be ERP bashing, it simply underlines the point that the one-size-fits-all approach has limits. And a direct consequence of this is that although the implementation effort can be reduced considerably, there is still a lot of work to be done. This is usually referred to as customizing. The more effort goes into customizing, the more the ERP software turns into something individual. So the distinction between COTS (commercial off-the-shelf) software, i.e. the ERP, and something developed individually gets blurred. This can reduce the advantages of ERP, and especially the cost advantage, to an extent.
And another aspect is crucial here, too. An ERP system, pretty much by definition, is a commodity in the sense that the activity it supports is nothing that gives the organization a competitive advantage. These days some of the key factors influencing the latter are time-to-market and, related to that, agility and flexibility. ERP systems usually have multiple, tightly integrated components and a complex data model to support all this. So every change needs careful analysis so that it doesn’t break something else. No agility, no flexibility, no short time-to-market. And in addition, all organizations I have come across so far need things that their ERP does not provide. So there is always a strong requirement to integrate the ERP world with the rest, be it other systems (incl. mainframe) or trading partners. Middleware vendors have addressed this need for many years.
And now I am finally coming back to my initial point. In my view ALM tools usually cover one or several aspects of the whole thing, but never everything. And even if they did, nobody these days starts on a green field. So also here we need to embrace reality and accept that something like ALM middleware is required.
A hot topic of the last few years has been the debate as to whether traditional (aka waterfall-like) methodologies or agile ones (XP, SCRUM, etc.) deliver better results. Much of the discussion that I am aware of has focused on things like
- Which approach fits the organization?
- How strategic or tactical (both terms usually go undefined) is the project and how does this affect the suitability of one approach over the other?
- What legal and compliance requirements must be taken into account?
- How large and distributed is the development team?
This is all very important stuff and thinking about it is vital. Interestingly, though, what has largely been ignored, at least in the articles I have come across, is the tooling aspect. A methodology without proper tool support has relatively little practical value. Well, of course the tools exist. But can they effectively be used in the project? In my experience this is mostly not the case when we speak about the “usual suspects” for requirements and test management. The reason for that is simply money. It comes in many incarnations:
- Few organizations have enterprise licenses for the respective tools and normally no budget is available for buying extra licenses for the project. The reason for the latter is either that this part of the budget was rejected, or that it was forgotten altogether.
- Even if people are willing to invest for the project, here comes the purchasing process, which in itself can be quite prohibitive.
- If there are licenses, most of these comprehensive tools have a steep learning curve (no blame meant, this is a complicated subject).
- No project manager, unless career-wise suicidal, is willing to have his budget pay for people getting to know this software.
- Even if there were budget (in terms of cash flow), it takes time and often more than one project to gain proficiency with the tools.
Let’s be clear, this is not product or methodology bashing. It is simply my personal, 100% subjective experience from many projects.
Now let’s compare this with the situation for Version Control Systems (VCS). Here the situation looks quite different. Products like Subversion (SVN) are well-established and widely used. Their value is not questioned and every non-trivial project uses them. Why are things so different here and since when? (The second part of the question is very important.) VCSes have been around for many years (RCS, CVS and many commercial ones) but none of them really gained the acceptance that SVN has today. I cannot present a scientific study here but my gut feeling is that the following points were crucial for this:
- Freely available
- Very simple to use compared to other VCSes. This causes issues for more advanced use-cases, especially merging, but allows for a fast start. And that is certainly better than avoiding a VCS in the first place.
- Good tool support (e.g. TortoiseSVN for Windows)
Many people started using SVN under the covers for the aforementioned reasons, and from there it gradually made its way into the official corporate arena. It is now widely accepted as the standard. A similar pattern can be observed for unit testing (as opposed to full-blown integration and user acceptance testing): many people use JUnit or something comparable with huge success. Or look at Continuous Integration with Hudson. CruiseControl had been around quite a bit longer, but its configuration was perceived to be cumbersome. And on top of its ease of use, Hudson added something else: extensibility via plug-ins. The Hudson guys accepted upfront that people would want to do more than what the core product could deliver.
All these tools were designed bottom-up coming from people who knew exactly what they needed. And by “sheer coincidence” much of this stuff is what’s needed for an agile approach. My hypothesis is that more and more of these tools (narrow scope, free, extensible) will be coming and moving up the value chain. A good example is the Framework for Integrated Test that addresses user acceptance tests. As this happens and integration of the various tools at different levels progresses, the different methodologies will also converge.
There is a bunch of folks out there that don’t like SOA (Service-Oriented Architecture) for various reasons. So I try to look at things without all the buzz and distill out a few aspects that in my view provide value. The goal is to provide an alternative perspective that is hopefully hype-free.
I want to split this in two parts: First (in this post) comes the conceptual side of things, which looks at what SOA in my view is about at its core. Second are the more practical benefits we get from the way SOA is approached in real life. This will be a separate post.
As a “preface” let me point out that I look at things mostly from a technical perspective. So everything along the lines of “SOA is about business and not technology” gets pretty much ignored. There are two reasons for that: Firstly I am a technical person and more interested in the technical aspects. Secondly, the business argument is probably the one most resented by skeptics. So let’s get started …
The core thing about SOA is that all functionality is made available as a service (surprise, surprise). This is really trivial and hardly worth mentioning. However, it has far-reaching consequences once you dig a bit deeper. And it’s those secondary aspects that provide the advancements.
- The right level of slicing up things: Of course there were many other approaches and technologies before (e.g. OO and CORBA). However, none of those has kept its promise. Admittedly, it remains to be seen to what extent SOA can fulfill expectations. On the other hand, those expectations are still in flux as we all continue to improve our understanding, so there is a chance that expectations and reality will actually meet some time (where?). Also, the criticism I am aware of is not about this aspect; in fact it seems pretty much undisputed, at least I have never heard anything to the contrary from customers or prospects.
- The service is the application: A direct consequence of the previous point is that the service gets elevated to the place that was formerly held by an entire application; at least from a user’s perspective. Whether the implementation reflects this or not is irrelevant to the consumer.
For new development it is usually desirable that the internals match the services exposed to the outside. For “legacy” stuff that just gets a service interface, the wrapper logic takes this place. In either case, however, the exposed logic is much smaller than an entire application.
- State management: There has been a lot of talk about loose coupling. This principle can be applied at many levels, transport protocol and data format being the obvious ones. A slightly more subtle place is the handling of state, which pretty much depends on the aforementioned transport protocol and data format.
The interface, at least of the initial operation, must be exposed in a way that it can just be called; in other words it is kind of stateless. Of course, from that point on everything could be stateful, and there is no difference between SOA and traditional applications here. In reality, however, people pay much more attention to keeping things stateless when working in a SOA context. A small sketch of what such a stateless entry point could look like follows after this list.
- Life cycle management: This one is strongly related to the point “the service is the application”. The notion of a life cycle has formerly mostly existed on the level of entire applications. The obvious disadvantage is that this huge scope almost demands big-bang approaches for new versions. With the effort and costs associated with bringing a new version into production, there are only very few such releases and much stuff gets crammed into each of them. The risks increase, business agility is close to zero and nobody is really happy. With a service as the “unit to be delivered”, the scope is drastically reduced and so are things like risk, complexity etc. Thus I think the concept will gain much more practical relevance than before.
- Standards: Although a real SOA can be built using entirely proprietary technology, in reality a set of standards has established itself. Yes, there are probably too many WS-* specifications around that only work together in particular versions, so I would not recommend jumping onto too many of them. But with the basic ones you are certainly on the right track.
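To make the statelessness point from above a bit more tangible, here is a small, purely hypothetical sketch of a service whose entry operation carries all the context it needs in the request; the names and fields are made up for illustration and not taken from any real system:

```java
// Hypothetical order service; all names and fields are illustrative only.
public interface OrderService {

    // Stateless entry point: the request carries the complete context,
    // so a consumer can "just call it" without any prior interaction.
    OrderConfirmation placeOrder(OrderRequest request);

    // Follow-up operations refer back via an identifier returned by the
    // first call rather than via server-side session state.
    OrderStatus getStatus(String orderId);
}

record OrderRequest(String customerId, String productId, int quantity) {}
record OrderConfirmation(String orderId, boolean accepted) {}
record OrderStatus(String orderId, String state) {}
```

Whether the follow-up operations then become conversational or stay stateless is a design choice; the important part is that the entry point itself does not depend on any hidden state.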
There is probably a whole lot more to say, but let’s keep it here for now. I am looking forward to any comments on the subject!
One response I got for my post on Lifecycle Management was the following:
Are you saying that one idea is that an organisation can run its whole development lifecycle by modelling it inside of a BPM tool, using it like a workflow manager?
Yes, exactly. Many people are currently thinking about how they can streamline the approval chains they have around their software development processes. The challenge here is in many cases that the tools (project management, spreadsheets, …) are disconnected from the layer where the actual work happens. And we all know what happens in such a case sooner rather than later…
So the idea is to connect the various layers that have a role here. The top level of such a process is usually about the “stage” of a software asset (e.g. new application, enhancement, bug fix etc.). What you can then do is manage the transition from one stage to another (e.g. from development to testing) by means of a BPM system; a toy sketch of such transitions follows below. This is the first level of visibility.
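As an illustration of what managing such stage transitions could boil down to technically, here is an assumed set of stages with allowed transitions; a real BPM system would of course model this graphically and attach approvals, notifications etc. to each transition:

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical lifecycle stages of a software asset and the transitions a
// governing process would allow; the stages and rules are illustrative only.
public enum Stage {
    DEVELOPMENT, TESTING, PRODUCTION, RETIRED;

    private static final Map<Stage, Set<Stage>> ALLOWED = Map.of(
            DEVELOPMENT, EnumSet.of(TESTING),
            TESTING,     EnumSet.of(DEVELOPMENT, PRODUCTION), // back to development on failed tests
            PRODUCTION,  EnumSet.of(RETIRED),
            RETIRED,     EnumSet.noneOf(Stage.class));

    /** True if the governing process permits moving from this stage to the target stage. */
    public boolean canTransitionTo(Stage target) {
        return ALLOWED.get(this).contains(target);
    }
}
```

The BPM process would sit on top of such rules and trigger the actual work (deployments, notifications, sign-offs) whenever a permitted transition is requested.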
To take things a step further would then mean to also connect the stuff that gets you a deeper insight. Those would typically be the technical systems like issue tracking (these have a broader scope than just bug tracking), build systems, results from automated testing etc.
So at the end it all comes back to the idea of end-to-end visibility. And here, once more, the advantage of a universal BPM system (as opposed to something that comes together with a specific application like CRM or ERP) kicks in: You can connect pretty much everything quickly and then model the process you want.
And for management the whole thing comes with reporting and business activity monitoring (BAM) out of the box. So you not only know what is going on at an individual level but also get the complete picture in real time whenever you want. I would think that this is what many managers are longing for.
For some time the topics of SOA Governance and BPM have been looked at as if they were two relatively unrelated things. And this perception is correct in the sense that you don’t have to have them together. However, more and more people realize what huge additional benefits are in it for them if they combine the two. In many cases the idea is that you need some logic to govern the actual work (design, development, testing etc.) for a process that has been modeled in a nice fancy tool.
But you can also do it the other way around: Think about what you would get if you could govern your whole IT lifecycle management from one tool. The idea goes like this: You store all relevant information about the “objects” that matter to your organization in a central repository, and the attributes that describe those objects (aka assets) are completely freely configurable. You probably need to attach additional information to them, like existing documentation etc. So in a way you can think of the information in the repository as a way to store the knowledge about all relevant aspects of the organization and then leverage that knowledge.
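A crude sketch of what “freely configurable attributes” on such repository objects could boil down to; the attribute handling and the way documentation is attached are made up for illustration:

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical repository asset: the attribute set is not fixed by a schema
// but can be extended freely per organization (illustrative only).
public class Asset {
    private final String id;
    private final Map<String, String> attributes = new HashMap<>();
    private final List<URI> attachedDocuments = new ArrayList<>();

    public Asset(String id) {
        this.id = id;
    }

    public String getId() {
        return id;
    }

    public void setAttribute(String name, String value) {
        attributes.put(name, value);
    }

    public String getAttribute(String name) {
        return attributes.get(name);
    }

    public void attachDocument(URI location) {
        attachedDocuments.add(location);
    }
}
```

A governing workflow would then read and update these attributes (owner, approval status, current stage etc.) as the asset moves through the process described below.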
Now, based on that groundwork, whenever a request for a new feature in the IT landscape comes in, you can have it go through a “workflow”. The first steps would probably form an approval chain: people from various functions (e.g. product management, operations, security, marketing etc.) would need to either approve or reject the request. How the final outcome is determined can be a bit tricky (and is much more a political topic than a technical one).
Then come steps like gathering requirements, signing them off, doing the development etc. You probably also want to integrate this whole thing with your development chain (automated testing, continuous integration etc.). At any given point in time you know where your development stands in terms of the project plan.
So if you step back a bit and look at what you get, we are not talking about development tools any more. Instead this is true, real-time, end-to-end visibility. There are clear responsibilities for assigning tasks (a human being has to decide on something) and you no longer need to fear emails that get lost in someone’s inbox. Instead you get a view of the currently open tasks, their due dates etc. Other advantages exist, but for now those are the critical ones, because these capabilities allow you to automatically generate documentation that satisfies your compliance requirements. In most organizations these things eat up enormous amounts of resources and affect processes that should deliver value to the organization.
Let’s leave it here for now. I am quite interested in your comments on this, so please let me know.