How much do embedded systems development errors actually cost?
Ever since I started my career in the semiconductor industry, the tacit knowledge that errors found later in a project cost exponentially more to fix has been a staple of every marketing person's slide-set. The aim, it seemed, was to instil fear into all potential customers that, if they were to opt for the "cheaper" (read "inferior") competitor product, then there would be costs to pay later on during the product's development. You can sort of imagine it as a pirate treasure map with a "Here be dragons!" scrawled in blood in the top right corner, just where X marks the spot.
Such graphs stem from books such as "The Mythical Man-Month" by Frederick Brooks who, back in 1975, published his observations as the manager of the development of the OS/360 operating system at IBM. The central theme was that "adding manpower to a late software project makes it later", which is also known as Brooks' law. However, it was actually Barry Boehm's book "Software Engineering Economics" that looked in detail at the economics behind software development and the costs associated with both writing code and fixing errors.
So, although such graphs of doom can seem like marketing fluff in the eyes of the professional development engineer, they are actually grounded in real research on complex projects. The books by Brooks and Boehm are large tomes (over 1000 pages between them) requiring a certain commitment to consume. However, there are other sources of information that are easier and faster to read. Here we will look at two that focus on differing ends of the same problem:
- Error cost escalation through the project life cycle - NASA Johnson Space Center
- Cost of delay - Eric Graves
Error Cost Escalation Through the Project Life Cycle
This paper from NASA looks at several real projects developing a hardware/software system, the type of project that accurately reflects embedded systems development. Three different methods were used to post-analyse the cost of fixing errors depending on when they were discovered in the project's life cycle. The results in many of the graphs are normalised so that fixing a problem during the project's requirements phase costs 1 unit of effort, with the effort at later stages expressed as a multiple of that baseline. The projects drew upon spacecraft, aircraft and satellite development respectively as follows:
- Bottom-up cost - The finished project was analysed to determine how much each phase of the project had cost, from requirements through to operation. This was then used to calculate the cost factor associated with an error correction at the later stages. The results show a linear relationship between the increase in cost factor and the project phase, rather than the more exponential explosion seen in Boehm's analysis of pure software projects.
- Total costs breakdown - In this study, a twenty year aircraft development project was analysed to see what errors had occurred, where they were in the development process and the cost to fix them. This study focused solely on hardware errors. The results more closely reflected the exponential cost growth as shown by Boehm et al.
- Top-down hypothetical project - Using cost estimations for the development of a communications satellite, this final method used a MATLAB model to determine the costs to fix various errors found at different points in the project life cycle. Setting aside the enormous cost of fixing errors in operation (satellites are not typically repaired or serviced in space), the cost growth was again shown to be exponential with project phase. In addition, the MATLAB model showed how dramatically costs rise for errors that impact surrounding subsystems.
| Project Phase | Method 1 | Method 2 | Method 3 |
| --- | --- | --- | --- |
| Design | 8x | 3–4x | 4x |
| Build | 16x | 13–16x | 7x |
| Test | 21x | 61–78x | 28x |
| Operations | 29x | 157–186x | 1615x |
Reproduction of Table 12 - Comparison of Method 1, 2 and 3 cost factor results
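The multipliers in the table translate directly into a back-of-the-envelope calculation. The sketch below uses the Method 1 column from Table 12; the error counts are invented purely for illustration, not taken from the NASA report:

```python
# Relative cost of fixing errors, by the phase in which they are found.
# Multipliers are the Method 1 column of Table 12, normalised so that a
# fix during the requirements phase costs 1 unit of effort.
COST_FACTOR = {
    "requirements": 1,
    "design": 8,
    "build": 16,
    "test": 21,
    "operations": 29,
}

def total_fix_cost(errors_found: dict) -> int:
    """Sum of (errors found in a phase) x (that phase's cost factor),
    in units of 'cost to fix one error during requirements'."""
    return sum(COST_FACTOR[phase] * count for phase, count in errors_found.items())

# Ten hypothetical errors: catching just two of them in design
# instead of test noticeably reduces the total relative fix cost.
late = total_fix_cost({"design": 2, "build": 2, "test": 6})    # 174
early = total_fix_cost({"design": 4, "build": 2, "test": 4})   # 148
print(late, early)
```

The absolute numbers are meaningless; the point is how quickly the total grows when error discovery shifts toward the later phases.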
Thus it would seem that all those seemingly knocked-together graphs of costs exploding are grounded in sound analysis or scientific models based upon real projects.
Cost of Delay (COD)
With all the focus on the economics of fixing problems during development, the reason for starting the project in the first place can often be forgotten: namely, it made economic sense. In order to commit the time, resources and financing for a new development, product marketing is required to provide an economic forecast for the return on investment (ROI) that can be expected. This typically factors in items such as:
- Available market size for the solution
- Market conditions in that market space
- Competition analysis
- Expected sales
- Price, profit margin and break-even point
- Upper and lower limits for the above that factor in various scenarios
To summarise, a manager or management team commits resources based upon a forecast that shows a better-than-reasonable return on investment with well-thought-through potential risks and the mitigation thereof.
This forecast and the associated analysis will make assumptions based upon several factors, such as:
- When the market window opens
- How quickly production can ramp to meet customer demand for the product
- How much product can be expected to be sold during the lifetime of the product
- When the market window closes again
Graves states that any delay in bringing the final project to the market can have a significant impact on this business case, as well as the resulting ROI, due to the following factors:
- The market window will most likely still open at the time specified: since many competitors are also trying to hit the same market window, the demand has been determined to exist and will be fulfilled by someone.
- If customers are not won and supplied during the identified ramp-up phase, they will probably turn to alternative solutions from competitors that fit within their project's time frame.
- The market window will, most likely, still close at the time specified. This is because technology advances will make the current product obsolete.
The result of this is that the "time to sell" becomes shorter, due to late market entry. In addition, and perhaps more worryingly, the estimated peak sales will no longer be attained, since the customers they were attributed to have turned to other suppliers for the duration of those projects where the solution would have been useful. The result is that the total sales that made up the ROI of marketing's forecast, the area under the curve, become successively smaller the later the product arrives.
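Graves' "area under the curve" argument can be made concrete with a toy model. Everything below, the window length, ramp duration and peak sales rate, is an invented assumption for illustration, not data from Graves; only the shape of the result matters:

```python
# Toy cost-of-delay model (all figures invented for illustration).
# Sales ramp linearly to a peak rate, then hold until the market window
# closes. A late entry shortens the selling time, but the window still
# closes at the same date, so lifetime sales shrink.

def lifetime_sales(window_open: float, window_close: float,
                   entry: float, ramp: float, peak_rate: float) -> float:
    """Units sold between market entry and window close.

    Sales ramp linearly from zero to peak_rate over `ramp` months
    after entry, then hold at peak_rate until the window closes.
    """
    start = max(entry, window_open)
    selling_time = window_close - start
    if selling_time <= 0:
        return 0.0  # entered after the window closed
    if selling_time < ramp:
        # Window closes before the ramp completes: area under the ramp only.
        return peak_rate * selling_time ** 2 / (2 * ramp)
    # Triangle under the ramp plus the plateau until the window closes.
    return peak_rate * ramp / 2 + peak_rate * (selling_time - ramp)

# A 24-month market window, a 6-month production ramp and peak sales of
# 100 units/month. The window closes at month 24 regardless of when the
# product actually ships.
on_time = lifetime_sales(0, 24, entry=0, ramp=6, peak_rate=100)
late = lifetime_sales(0, 24, entry=3, ramp=6, peak_rate=100)
print(on_time, late)  # 2100.0 1800.0 -- three months late loses ~14% of lifetime sales
```

Note that a three-month slip in a 24-month window costs more than three months' worth of peak sales would suggest, because the lost months come straight off the plateau while the ramp still has to be paid for in full.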
Although the oft-cited "cost explosion" may look like another recycled marketing slide, the reality is that the later an error is found in a development project, the more costly it becomes to resolve. The NASA report also showed how errors that impact other subsystems can have a significantly larger cost impact than those confined to the subsystem where they occurred. In addition, it isn't just the R&D team that bears the burden of the costs. The COD analysis described by Graves highlights how a company's entire financial performance can be impacted by a delayed market entry as customers look elsewhere for alternative, even inferior, solutions so as not to delay their own products.
The integration of a testing strategy into the development of embedded systems can be seen to make sense, regardless of whether you consider the cost of fixing errors with respect to the project timeline, or with respect to the potential impact on sales of the finished product. With that in mind, we wish you Happy BugHunting!