New edition Software Metrics

Software metrics are measures that make the quantity, complexity and quality of software measurable, from requirements to architecture and code to test results. They are used for project management, effort estimation and quality assurance. Without them, a project moves like a car without a speedometer, fuel gauge and navigation system.

Key Takeaways

Despite their clear benefits, software metrics are still used too rarely in practice, even though simple measures such as error trends or test case numbers can significantly improve project management.
Individual metrics only provide reliable statements when they are put in relation to each other: Ten test cases for 500,000 lines of code immediately show that the ratio is not right.
Software metrics can be applied not only to code, but also to requirements documents, architectural designs and test results, which most projects completely ignore.
If you combine historical defect data with estimated test case numbers, you can predict defect quantities for later project phases with astonishing accuracy, as a project example with plus/minus ten percent deviation shows.
You don’t need complex formulas to get started with metrics: a simple list of business case, complexity level and expected number of test cases is enough to produce a solid initial planning value.

Why software metrics are often neglected

Software metrics are hardly used in many projects, even though software development is an engineering discipline. There are two reasons for this: collecting metrics means work, and metrics sometimes bring unpleasant facts to light.

Manfred Baumgartner has observed this pattern over many years. Even simple metrics that would be easy to obtain are often not put to sensible use. Compared to other production processes and engineering disciplines, surprisingly little is measured in software development.

Part of the problem lies in the industry’s self-image. Software is seen by many as something artistic that defies engineering. It is precisely this self-image that leads to quantities, complexity and quality remaining unmeasured.

A project without metrics is a car without a speedometer

Metrics are the sensors of a project. Without them, you are driving blind. A car without a speedometer, fuel gauge and navigation system only gives you a vague feeling for speed, but no reliable information about your location.

The analogy carries over directly. How much budget do I have left, how much time, how far can I get with it? These questions are just as relevant in the course of a project as the fuel gauge is in your car. Nevertheless, many projects run as if they did not have a single one of these sensors.

40 km/h is fast in town and slow on the highway. A number only takes on its meaning through context. That’s exactly why it’s not enough to collect a metric. You need to know what it stands for.

Three dimensions: Quantity, complexity, quality

Metrics can be organized along three dimensions: Quantity, Complexity and Quality. This triad gives structure to an otherwise confusing abundance of key figures.

Quantity metrics are the simplest. You count lines of code, you count test cases, you count artifacts. That alone is a useful statement, for example if you want to estimate the necessary testing effort.

Complexity is more difficult to grasp, but it determines the effort required. How complex a piece of software is drives the testing and retesting effort more than sheer quantity does. Quality is the third dimension, and the term itself has no generally accepted definition, which makes measurement challenging.

A number alone says nothing, the reference makes it useful

Quantity metrics only become meaningful when they are set in relation to something else. Ten test cases for 500,000 lines of code immediately show that something does not fit together. The individual number is worthless; the relationship is the finding.

There is no universal rule of thumb. No one can say that you need exactly 150 test cases for 100,000 lines of code. Every piece of software is created differently. Different programming languages generate different amounts of code, and teams that rely heavily on libraries write little code of their own yet still need test cases that cover all of the code in use.

This has a practical consequence for organizations. Metrics work best as a framework or guideline that is adjusted iteratively. Values can hardly be transferred from one application to the next, but they can be adapted over time within a company.

Not only code can be measured

Far more than just the code is measurable. Requirements, design, architecture, code and testing each form their own measurement objects with their own metrics.

Measuring requirements has become more difficult. Where there used to be requirement and functional specifications, agile projects today often have only stories. It is difficult to derive a quantity or complexity from pure text. However, where requirements are formally described or model-driven, metrics are possible.

Code metrics are what most people think of first. In the past, GOTO statements were counted; today it is class calls and lines of code. There are also test metrics with a focus on test deliverables and test results.

The design can be checked surprisingly well. You can count the conditional and hedging phrases in a design document. If every second sentence contains a “maybe”, “it could” or “under certain circumstances”, implementation will be difficult, and testing even more so. Developers often have fewer problems with this because vague formulations give them leeway.
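Counting vague wording in a design document can be automated in a few lines. The sketch below is one possible approach, not an established metric: the phrase list contains only the three examples from the text, the sentence splitting is deliberately naive, and the function name is hypothetical.

```python
import re

# Assumption: a starter list taken from the examples in the text.
# Extend it with the hedging phrases typical of your own documents.
VAGUE_PHRASES = ("maybe", "it could", "under certain circumstances")


def vague_sentence_ratio(text: str) -> float:
    """Return the share of sentences that contain at least one vague phrase."""
    # Naive split: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if not sentences:
        return 0.0
    vague = sum(
        1 for s in sentences if any(p in s.lower() for p in VAGUE_PHRASES)
    )
    return vague / len(sentences)


sample = (
    "The system stores the order. Maybe the user gets a mail. "
    "It could retry under certain circumstances. The log is written."
)
print(vague_sentence_ratio(sample))  # 0.5
```

A ratio of 0.5 corresponds to the "every second sentence" situation described above. As with any metric, the value is more useful as a trend across document versions than as an absolute number.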

What you use a metric for decides everything

The question of purpose comes before every metric. What do you want to do with it? Only the answer to this question turns a number into a useful tool.

Design metrics can be used to derive effort, such as how much development and testing effort a design is expected to require. This is part of the planning of projects or sub-projects. Other metrics are used for quality assurance of the test object itself, i.e. to check whether standards and criteria have been met.

When migrating software, it is helpful to know the volume and internal quality of the system being handed over in advance. Whether a system has been developed cleanly and straightforwardly or is a mess makes a big difference for refactoring. Function point measurement provides a standardized measure of the functional scope that can be calculated from the existing code.

The trend counts more than the individual measured value

For legacy projects, the trend direction is often more valuable than the absolute value. A static analysis of grown code quickly produces staggeringly high figures. The more honest and useful guideline is then: it must not get worse.

This approach takes the pressure off. You don’t have to rewrite all the legacy code. It is sufficient to keep the value at its current level while new code is added and existing code is reworked.
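The "it must not get worse" rule can be checked mechanically, for example in a build pipeline. The following is a minimal sketch, assuming a metric where lower is better (such as the number of static analysis findings). The function name, the tolerance parameter and the sample values are assumptions for illustration.

```python
def has_regressed(baseline: float, current: float, tolerance: float = 0.0) -> bool:
    """Return True if a 'lower is better' metric is worse than the baseline."""
    return current > baseline + tolerance


# Hypothetical values: findings in the legacy code at the last release
# compared with the current build.
baseline_findings = 1200
current_findings = 1250

if has_regressed(baseline_findings, current_findings):
    print("Metric got worse: review the new findings before merging.")
else:
    print("Metric held or improved.")
```

The baseline is simply the last accepted value. Whenever the metric improves, the baseline can be lowered to the new value so that the improvement is locked in.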

Metrics also help with architectural decisions. If several systems in different programming languages do the same thing, complexity and quality metrics show which system is suitable for further development and which is better to phase out.

What is measured in practice and what is missing

From a testing perspective, error and error trend evaluation is the classic method. It is widespread, but there is often room for improvement when it comes to interpretation.

In agile projects, the data situation is blurred. Not every bug ends up explicitly in a bug management tool, and some get lost in a backlog that cannot be analyzed. Some companies continue to document consistently, others lose track.

Measurement does take place at the technical level. Metrics from the CI/CD process and code coverage at unit testing level are available. The analysis of what these figures say about actual quality and scope often remains too thin.

Productivity measurement also often fails due to a lack of data. If the business department is heavily involved in testing but its hours are not booked to test activities, productivity cannot be calculated reliably. Burn-down and burn-up charts are drawn up, but they get lost in the day-to-day rush without anyone deriving any actions from them.

A simple metric that has worked in real projects

A metric doesn’t have to be complex to be effective. A pragmatic approach from a real project got by with three columns: Business Case, Complexity, Number of Test Cases.

The estimation logic was simple. Low complexity meant two to three test cases, medium about ten, high about twenty, uncertain cases potentially hundreds. This resulted in an initial effort estimate: number of test cases times days per test case.
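The estimation logic fits into a small lookup table. The sketch below is an illustration under stated assumptions: the value for low complexity uses the upper end of "two to three", the business cases and the figure of 0.5 days per test case are invented, and uncertain cases are deliberately not modeled because the text gives no usable number for them.

```python
# Test cases per business case, by complexity level. The values follow the
# rough figures in the text; "low" uses the upper end of "two to three".
TEST_CASES_BY_COMPLEXITY = {"low": 3, "medium": 10, "high": 20}


def estimate_test_cases(business_cases):
    """Sum the expected test cases for (name, complexity) pairs."""
    return sum(TEST_CASES_BY_COMPLEXITY[level] for _, level in business_cases)


def estimate_effort_days(business_cases, days_per_test_case):
    """Effort = number of test cases times days per test case."""
    return estimate_test_cases(business_cases) * days_per_test_case


# Hypothetical business cases, for illustration only.
cases = [
    ("Create order", "low"),
    ("Cancel order", "medium"),
    ("Monthly billing", "high"),
]
print(estimate_test_cases(cases))        # 33
print(estimate_effort_days(cases, 0.5))  # 16.5
```

An unknown complexity level raises a `KeyError`, which forces uncertain cases to be analyzed and classified before they enter the estimate.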

The decisive factor was the adjustment after each sprint. Planned test cases were compared with those actually designed and the estimated values were adjusted for each complexity level. The plan was defined based on numbers and sharpened based on content.
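The adjustment after each sprint can be sketched as a recalibration of the per-level estimates. The text does not say how the values were adjusted, so the code below assumes the simplest rule: replace the estimate for a complexity level with the average number of test cases actually designed for that level. The function name and data are hypothetical.

```python
def recalibrate(estimates, actuals):
    """Return updated test case estimates per complexity level.

    estimates: dict mapping level -> planned test cases per business case
    actuals:   dict mapping level -> list of test case counts actually
               designed for the business cases finished in the sprint
    """
    updated = dict(estimates)
    for level, counts in actuals.items():
        if counts:  # keep the old estimate if the sprint produced no data
            updated[level] = sum(counts) / len(counts)
    return updated


planned = {"low": 3, "medium": 10, "high": 20}
sprint_actuals = {"medium": [12, 14], "high": []}
print(recalibrate(planned, sprint_actuals))
# {'low': 3, 'medium': 13.0, 'high': 20}
```

Other rules are possible, for example a weighted average of old estimate and new data so that a single unusual sprint does not swing the plan too far.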

The result was amazingly accurate. The approach was supplemented with an expected number of errors per test case, derived from historical data. At the end of the project, the actual number of errors was around ten percent lower than the estimate made months earlier. In line with the law of large numbers, the actual values converged on the planned values.
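The defect forecast follows the same pattern as the effort estimate: planned test cases multiplied by a historical rate. The sketch below uses invented numbers (400 test cases, 0.25 errors per test case, 90 errors found) chosen only to reproduce a deviation of ten percent. The function names are hypothetical.

```python
def expected_defects(planned_test_cases: int, defects_per_test_case: float) -> float:
    """Forecast defects from planned test cases and a historical rate."""
    return planned_test_cases * defects_per_test_case


def deviation_percent(actual: float, expected: float) -> float:
    """Deviation of the actual value from the forecast, in percent."""
    if expected == 0:
        raise ValueError("expected must not be zero")
    return (actual - expected) * 100 / expected


# Illustrative numbers only, not taken from the project in the text.
forecast = expected_defects(400, 0.25)
print(forecast)                         # 100.0
print(deviation_percent(90, forecast))  # -10.0
```

Tracking the deviation sprint by sprint, rather than only at the end, is what turns the forecast into an early warning.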

You don’t have to hope that the next sprints will be significantly better. You can have hope, but it is usually unjustified.
Manfred Baumgartner

This projection into the future is the real value. If you see early on that the errors are statistically distributed instead of lumped together at the beginning, you can take action: bring in additional test resources or warn project management in good time.

How to work with a metrics compendium

You don’t read a compendium from cover to cover, but selectively. The introductory chapters on awareness of the three dimensions of quantity, complexity and quality are worth reading in one go. From there, the index helps you to access the information you need.

For a dimension such as design, there are several approaches to measurement, for example according to Tom Gilb or Card and Glass. There is no single measurement that is uniformly defined by a standardization institute. This diversity is an invitation to choose the approach that suits your own environment.

The first step is always the same. Get an overview, then look specifically for what you can use in the project. None of these metrics comes ready-made from a tool. You have to put in some work yourself.