Technical debt, and the Death of Design: Part 2
This is the second article that I wrote for the Scrum Alliance. It was originally published on July 24 and can be found here. The article presented below is the original unedited version with all my grammar and spelling mistakes!
My last article dealt with Technical Debt and Design Death. This article is going to expand upon those ideas. I’d like to discuss what happens once you already have legacy code. What are your options for dealing with it, and where exactly is the point of no return?
This is a far-from-academic discussion. Any reasonably large bank or insurance company will have a multitude of legacy systems, many of them “Green Screens” [i.e., dumb terminals connected to an IBM mainframe running some COBOL database application]. There are many hundreds of banks in the US alone. Even comparatively new companies (less than 20 years old) can have legacy systems, although they often use the term core rather than legacy.
Anyone working on these legacy or core applications will recognize the characteristics identified by Ken, which I outlined in Part 1. To recap, these are:
- The code is considered part of a core or legacy system.
- There is either no testing, or minimal testing surrounding the code.
- Knowledge of the core/legacy system is highly compartmentalized, and the system may be supported by only one or two people in the company.
- The legacy system is not in a known state. By that I mean it can be difficult (if not impossible) to determine the state of the system at any given point in time. Installing the system and recovering after a failure are often considered to be some form of black art.
I would add one additional characteristic and that is:
“legacy system (n): A computer system or application program which continues to be used because of the cost of replacing or redesigning it and often despite its poor competitiveness and compatibility with modern equivalents. The implication is that the system is large, monolithic and difficult to modify.
If legacy software only runs on antiquated hardware the cost of maintaining this may eventually outweigh the cost of replacing both the software and hardware unless some form of emulation or backward compatibility allows the software to run on new hardware.” – The Free On-line Dictionary of Computing
Entropy is a term borrowed from thermodynamics, but when applied to software it can be considered a measure of disorder. Entropy in software comes from changes to the codebase, such as bug fixes, updates to existing functionality and the addition of new functionality. Over a period of time these small changes add up to a system that is difficult to change, overly connected to external systems and with no clear delineation of functionality.
Competition drives down the value of existing software. To remain competitive, new functionality must be constantly added just to maintain the value of existing software. At this point I was going to outline a fictional scenario to demonstrate how this happens, but I came across an article describing an actual situation. This article demonstrates the relentless nature of competition far better than I ever could.
In order to remain relevant, software [or the service provided by that software] needs to continually increase in value. This implies that in order to remain relevant software needs to be continually changed (i.e., updated). But the very act of change increases the entropy of the system, thereby increasing the cost of further change.
In the last six months alone I’ve talked to at least three different companies who are planning on rewriting their systems from the ground up. These are three very different businesses with different markets and business models. When asked why they were doing this, the common response was that the cost of change was too high. [Sidebar: Interestingly, two of the three also mentioned that the current EJB framework is too heavyweight and they are looking for a lighter-weight framework. They're both evaluating EJB 3 in addition to open source frameworks such as a combination of Struts with Hibernate.] Automated tests (unit tests, acceptance tests, FIT/FitNesse tests, etc.) help decrease the cost of change and help the system reach a known state. Without an adequate automated testing framework, the task of changing the legacy system becomes increasingly expensive.
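One common first step toward a known state is a characterization test: record what the code does today, then fail loudly if that behaviour ever changes. The sketch below is only an illustration. `legacy_premium` is a hypothetical stand-in for an untested legacy routine, and the recorded values are simply what that stand-in returns, not figures from any real system.

```python
def legacy_premium(age, smoker):
    """Hypothetical stand-in for an untested legacy routine."""
    premium = 100
    if age > 40:
        premium += (age - 40) * 5
    if smoker:
        premium = premium * 2
    return premium


# Behaviour observed by running the current code once. It is recorded as-is,
# whether or not anyone still remembers why the system behaves this way.
RECORDED_BEHAVIOUR = {
    (30, False): 100,
    (50, False): 150,
    (50, True): 300,
}


def check_recorded_behaviour():
    """Fail if the routine no longer matches the recorded behaviour."""
    for (age, smoker), expected in RECORDED_BEHAVIOUR.items():
        actual = legacy_premium(age, smoker)
        assert actual == expected, (
            f"legacy_premium({age}, {smoker}) returned {actual}, "
            f"but {expected} was recorded"
        )


if __name__ == "__main__":
    check_recorded_behaviour()
    print("legacy behaviour unchanged")
```

The test says nothing about whether the recorded behaviour is correct. It only pins the behaviour down so that later changes can be made with some confidence.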
Let us consider a fictitious company that has failed to add value to their product. We’ll call the company NFI. The following graph shows the gradual decline in revenue. These numbers are fictional.
Choices for legacy systems
The management at NFI knows that they need to add functionality to their system in order to remain competitive. They have three options:
- Add the functionality to the core system. This would be prohibitively expensive, as the ongoing addition of new functionality would exponentially raise the cost of the software. This is often not a practical [cost-effective] solution.
- Introduce a temporary solution that would allow NFI to use the existing legacy system in addition to the new functionality. This would not address the underlying problem, but may give the company more time. One way in which this is commonly done is by building a web service layer on top of the existing legacy API. New functionality is then constructed alongside the existing system, and the two are integrated at the web services level. This leaves an open question: what happens when new data needs to be added to the legacy data model?
- Reconstruct the existing functionality using new platforms and technology. This solution addresses the underlying problem. It is, however, more expensive than option 2 [but less expensive than option 1].
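The second option can be sketched as a thin service layer that is the only code aware of both systems. Everything in this sketch is hypothetical: `LegacyPolicySystem`, its fixed-width record format, `LoyaltyDiscounts` and `PolicyService` are invented for illustration, and the HTTP or SOAP transport that a real web service layer would need is left out.

```python
from dataclasses import dataclass


class LegacyPolicySystem:
    """Hypothetical stand-in for the existing legacy API."""

    def txn_inq(self, policy_no: str) -> str:
        # Fixed-width record: 8-character policy number followed by a
        # 10-character premium in cents (an assumed format).
        records = {"P0000001": "P0000001" + "0000125000"}
        return records[policy_no]


class LoyaltyDiscounts:
    """Hypothetical new functionality built alongside the legacy system."""

    def discount_percent(self, policy_no: str) -> int:
        return 10 if policy_no == "P0000001" else 0


@dataclass
class PolicyQuote:
    policy_no: str
    premium_cents: int
    discounted_cents: int


class PolicyService:
    """Service layer where the old and new systems are integrated."""

    def __init__(self, legacy: LegacyPolicySystem, discounts: LoyaltyDiscounts):
        self._legacy = legacy
        self._discounts = discounts

    def quote(self, policy_no: str) -> PolicyQuote:
        # Translate the legacy record into a plain value.
        record = self._legacy.txn_inq(policy_no)
        premium = int(record[8:18])
        # Apply the new functionality without touching the legacy code.
        percent = self._discounts.discount_percent(policy_no)
        discounted = premium * (100 - percent) // 100
        return PolicyQuote(policy_no, premium, discounted)


if __name__ == "__main__":
    service = PolicyService(LegacyPolicySystem(), LoyaltyDiscounts())
    print(service.quote("P0000001"))
```

Note that the discount data lives entirely in the new component. The sketch sidesteps the legacy data model rather than extending it, which is why this option buys time without addressing the underlying problem.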
How does NFI decide which option is most suitable in their particular situation?
Knowing when it’s too late
For the sake of argument, let us assume that NFI have decided to rewrite existing functionality. We can draw a graph of Revenue (declining) and Functionality (increasing) over time (below).
I’ve used Story Points as a measure of functionality, but any unit will do provided it is used consistently. I’ve also made the assumption that any rewritten system is easier to maintain and has a lower cost of change. I feel that these are reasonable assumptions if some Extreme Programming (XP) practices [such as Continuous Integration (CI), Test-Driven Development (TDD), Refactoring, etc.] are used.
From the graph above you can see that the rewritten functionality will be completed in mid-2004. This is a viable option for NFI. If the company has time, then this is one approach that should be investigated. But what happens when rewriting even the most basic functionality of the legacy system is projected to take longer than the company will have positive revenue?
It’s pretty clear that a company in this situation has some difficult decisions ahead. There may be some temporary solution that would allow NFI to use the existing system while building a new product; NFI may decide to borrow money to fund the rewrite; or NFI may want to consider returning any remaining value to their shareholders. Whatever the final decision, by constructing some simple graphs it’s possible to present management with a number of different options from which to choose a course of action.
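The comparison behind those graphs can also be done as simple arithmetic. The sketch below assumes straight-line trends, which is a simplification of my own, and every number in it is made up in the same spirit as the fictional NFI figures. The function names are hypothetical.

```python
import math


def quarters_until_revenue_gone(current_revenue, decline_per_quarter):
    """Quarters until revenue reaches zero, assuming a straight-line decline."""
    if decline_per_quarter <= 0:
        return math.inf  # revenue is not declining
    return math.ceil(current_revenue / decline_per_quarter)


def quarters_to_rewrite(backlog_points, velocity_per_quarter):
    """Quarters needed to rebuild the backlog at a constant velocity."""
    if velocity_per_quarter <= 0:
        raise ValueError("velocity must be positive")
    return math.ceil(backlog_points / velocity_per_quarter)


def rewrite_is_viable(current_revenue, decline_per_quarter,
                      backlog_points, velocity_per_quarter):
    """True if the rewrite is projected to finish while revenue is positive."""
    runway = quarters_until_revenue_gone(current_revenue, decline_per_quarter)
    rewrite = quarters_to_rewrite(backlog_points, velocity_per_quarter)
    return rewrite < runway


if __name__ == "__main__":
    # Fictional figures: 10M revenue falling by 1.25M per quarter,
    # and a 600 story point backlog to rebuild.
    print(rewrite_is_viable(10_000_000, 1_250_000, 600, 100))  # 6 vs 8 quarters
    print(rewrite_is_viable(10_000_000, 1_250_000, 600, 60))   # 10 vs 8 quarters
```

With these made-up inputs the first scenario finishes with two quarters to spare, while the second runs out of revenue before the rewrite is done. Real projections would need measured velocity and a more realistic revenue curve.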
This is a good article about the decline of Parametric Technologies (PTC), in part due to some of the issues that are discussed here.
The Big Ball of Mud pattern.
“Working Effectively with Legacy Code”, Michael Feathers