Measuring quality

Software metrics are targeted measurements that give decisions in software development a basis in data. Meaningful metrics fit the context, show trends rather than absolute numbers, and are accessible to the person who needs them. Quality, productivity and performance are closely linked, so a trend in one often points to a change in the others early on.

Key Takeaways

If you collect metrics but don’t use them to make decisions, you can save yourself the work: Data without consequence is wasted capacity.
Software quality and development speed are directly related: Poorer code quality extends lead times because changes become more costly.
Lead times are not fixed numbers but random variables: They can be represented as a histogram and used for short-term forecasts with Monte Carlo simulations.
Team comparisons via productivity metrics create internal competition that weakens teams instead of strengthening them, because different contexts do not allow fair comparisons.
Metrics can be thought of in concentric circles: some are only useful for the individual, others for the team, others for management, depending on who knows which context.

Metrics are not an end in themselves, but a decision-making aid

Metrics are intended to help you make faster and better decisions in your day-to-day business. That is their actual purpose, not to fill a dashboard. In many companies, discussions about procedures and priorities are primarily based on opinions. With the increasing complexity of software development, this is no longer enough.

Maik Wojcieszak sums it up with a sentence from W. Edwards Deming: “If you don’t have data, you’re just another person with an opinion.” This is exactly where metrics come in. They turn an assertion into a verifiable statement.

“If you have no data, you are just another person with an opinion.” (Maik Wojcieszak, citing W. Edwards Deming)

The crux of the matter is not the availability of data. Today, tools collect and display all kinds of figures. What remains weak is their use. Data ends up in dashboards that nobody uses to make concrete decisions.

Why overloaded dashboards make you stupid

Too much information is just as harmful as too little. There are two ways to keep a person from acting: you give them too little information, or you overwhelm them with too much. Dashboards that keep growing and getting more colorful fall into the second category.

The task is to filter out exactly the information that is relevant to the person’s work. Everything else is noise. A flood of information overwhelms instead of clarifying.

A comparison from everyday life shows what a good metric should look like. A pilot who had to read instrument data from an Excel spreadsheet could not get at the information quickly enough. The speedometer in a car, on the other hand, is something drivers no longer consciously notice, yet they use it constantly without thinking about it. That’s how intuitive metrics should be in everyday work.

There is no universal standard set

Which metrics are useful depends on the specific case, not on a universal list. A small project needs fewer metrics than a large one. A single team needs fewer than an organization with many teams. A metric that is useful for one use case may be worthless for another.

Maik gives a clear example of this. The DORA metrics from the book “Accelerate” by Nicole Forsgren, Jez Humble and Gene Kim are popular in the DevOps environment. However, if you don’t practice DevOps at all, for example when developing mobile applications, you won’t gain anything from measuring deployment frequency. The metric simply does not fit the context.

This leads to a simple rule: first understand the use case, then select the appropriate metrics. Not the other way around.

Measure quality: Reliability, recovery, speed of change

Software quality consists of many aspects, several of which can be measured directly. The reliability of software can be measured using error rates. The mean time to recovery after a failure is also informative, especially when tracked as a trend.
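As a rough illustration, both figures can be computed from data most teams already have. The following is a minimal sketch, not a reference implementation: the function names `failure_rate` and `mean_time_to_recovery` and the incident log are hypothetical, and it assumes each incident is recorded as a pair of timestamps (detected, resolved).

```python
from datetime import datetime, timedelta


def failure_rate(failed: int, total: int) -> float:
    """Share of requests (or deployments) that failed; 0.0 if nothing ran."""
    return failed / total if total else 0.0


def mean_time_to_recovery(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Average time between detection and resolution of an incident."""
    if not incidents:
        return timedelta(0)
    total = sum((end - start for start, end in incidents), timedelta(0))
    return total / len(incidents)


# Hypothetical incident log: (detected, resolved)
incidents = [
    (datetime(2024, 5, 1, 9, 0), datetime(2024, 5, 1, 9, 30)),    # 30 minutes
    (datetime(2024, 5, 8, 14, 0), datetime(2024, 5, 8, 15, 30)),  # 90 minutes
]

print(failure_rate(3, 200))              # 0.015
print(mean_time_to_recovery(incidents))  # 1:00:00
```

A single value from either function says little. Computed per week or per release and plotted over time, the two numbers show whether reliability is moving in the right direction.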

There is one aspect that many teams underestimate: the time it takes to make a change to the software. Most of what is considered internal quality, i.e. clean code and clean architecture, is aimed precisely at being able to make changes more quickly.

Ideally, measurement starts with the requirements. Requirements have an enormous influence on how many rounds a team has to go through before a specification becomes what the customer really needs.

Quality data belongs in the IDE, not in reporting

Quality metrics only show their value when developers see them while they are writing the code. Static code analysis and linter results should appear immediately in the IDE and lead to an immediate change in behavior.

The reason is simple. If you don’t build in poor quality in the first place, you save yourself the expensive repair later. If you build it in, it gradually gets worse until you end up with a situation that costs a lot of money and a lot of time.

The reality is often different. Tools deliver mountains of defects that nobody looks at because it is too much and there is not enough time. This is the opposite of intuitive use. The lever: find points where things could obviously be better and start the conversation from there with concrete data. In this way, metrics become a means of communication that also provides managers with a reliable basis for their decisions.

Performance via trends instead of individual values

Performance metrics show their benefits when viewed over time from build to build. Classic performance tests can run directly in the pipeline. If you plot the results over several builds, you can recognize trends and react immediately if performance deteriorates. The actual effect of an optimization can also be checked in this way.

Trends are often more meaningful than absolute figures. A single figure can jump in individual cases and not represent a real trend at all. The direction of the arrow, i.e. better or worse, provides information that is relevant for action.
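One way to get the direction of the arrow without being fooled by a single outlier is to compare medians over a few builds. The sketch below is only one possible approach: the function `trend`, the window of five builds and the 5 % tolerance are assumptions chosen for illustration, and it assumes that higher values are worse (for example, response time in milliseconds).

```python
from statistics import median


def trend(values: list[float], window: int = 5, tolerance: float = 0.05) -> str:
    """Compare the median of the latest `window` builds with the median of the
    `window` builds before them. Returns 'worse', 'better' or 'stable'.
    Assumes higher values are worse (e.g. response time in ms)."""
    if len(values) < 2 * window:
        return "stable"  # not enough history to call a trend
    previous = median(values[-2 * window:-window])
    recent = median(values[-window:])
    if previous == 0:
        return "stable"  # relative change is undefined
    change = (recent - previous) / previous
    if change > tolerance:
        return "worse"
    if change < -tolerance:
        return "better"
    return "stable"


# Hypothetical p95 response times in ms, one value per build.
# The spike of 310 ms is a one-off and does not dominate the median.
builds = [120, 118, 122, 119, 310, 121, 131, 135, 138, 140]
print(trend(builds))  # worse
```

The median is used instead of the mean so that one noisy build does not flip the result. The steady climb in the last five builds is what the function reports.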

Why team comparisons are misleading

Absolute metrics for comparing teams are rarely useful and difficult to collect cleanly. If you measure productivity by how long it takes a team to complete a certain number of requirements, for example, a lower number says nothing about speed or quality. The context may be completely different.

Such comparisons create internal competition, which is damaging. Maik compares it to crabs in a pot. A single crab in an open pot will crawl out. With two crabs in the pot, one keeps pulling the other back down. A lot of movement, no progress.

Lead time is a random variable, not a deadline

Lead time is easy to measure: from the start of a requirement to its completion. However, those who use it must avoid two mistakes.

Firstly, software development must not be viewed as a linear system. Anyone who does is wrong from the start.

Secondly, the completion date of a requirement cannot be predicted in advance, only determined in retrospect. Lead time therefore behaves like a random variable. A random variable is described not by a single number but by a distribution, which you can plot as a histogram.

As soon as you do this, several things become visible. A skewed distribution arises because some tasks take an unexpectedly long time. This makes long-term predictions difficult or even impossible. For short-term forecasts, on the other hand, the metric is excellent, for example in conjunction with a Monte Carlo simulation.

Important: micromanagement is useless. Nobody has to log their time by the minute. The lead time provides the necessary information without tipping over into minutiae.

Three things a metric project needs

If you want to approach metrics consciously, you should first clarify the purpose: What goal are you pursuing with it? Only then come the three building blocks.

Building block	What it’s about
Finding the right metrics	Selecting the right metrics for the use case and providing them correctly
Learning to evaluate them	Understanding correlations, for example that poor quality increases lead time
Applying them	Metrics must actually influence decisions, otherwise the work is in vain

Performance and quality are closely related. If the quality deteriorates, the lead time increases. Conclusions can therefore be drawn from one metric to the other. However, performance depends on many other factors, which is why you have to look at several metrics together in order to recognize the connections.
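Whether two of your own metrics actually move together can be checked with a correlation coefficient. The following sketch uses invented numbers, and the pairing of open linter findings with median lead time per sprint is an assumption made for illustration. A high coefficient shows that the two series move together, not that one causes the other.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)


# Hypothetical values per sprint
open_findings = [12, 15, 22, 30, 41, 55]                # open linter findings
median_lead_time = [3.0, 3.5, 4.0, 5.5, 6.5, 9.0]       # days

r = pearson(open_findings, median_lead_time)
print(f"correlation: {r:.2f}")
```

A value close to 1 would support the claim that falling quality goes hand in hand with longer lead times in this data set. Because lead time depends on many other factors, a result like this is a starting point for a conversation, not a proof.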

The most common mistake is in the application. If you collect metrics but don’t let them influence your decisions, you can save yourself the trouble. With a clear goal, selecting the right metrics becomes much easier.

Start with yourself

The best place to start with metrics is your own work, not the big team program. You don’t have to set up a project for the whole organization to benefit. As a developer, tester or in whatever role, you can use and automate metrics for yourself first.

Maik describes this as concentric circles with the person in the center. Some metrics are just for you. You can use them to see where you stand, and nobody else needs to see them. Then there are team metrics that help the team assess where it stands, without management needing to know them or their context. On this solid basis, further metrics can be derived that management uses for its decisions.

Metrics are good at showing one thing above all else: where you currently stand. In the middle of the field, or at the upper or lower end. Knowing where you stand is motivation enough to improve. Not in competition with others, but for your own work.