Court expert

A court-certified software expert examines whether software is state of the art and what quality defects are present. State of the art is not a technical term, but a legal one: a solution must not only work, but also be proven, targeted and suitable for the specific area of application.

Key Takeaways

If the defect density of software exceeds two bugs per 1,000 lines of code, it is considered immature and not yet suitable for production use.
“State of the art” is not a technical term, but a legal one: a technology must not only be new and widespread, but must also be tried and tested in precisely the area of application in question.
Writing more unit tests to increase low code coverage often makes the situation worse because it cements existing misbehavior that the code already relies on.
Agility increases speed and productivity, but according to Sebastian Dietrich’s observations, it does not lead to higher software quality than earlier, more elaborate development approaches.
Disputes over software defects rarely end up in court because public proceedings expose clients to their customers, so both parties prefer out-of-court settlements.

What a forensic software expert examines

A court-certified software expert assesses whether software corresponds to the state of the art and what quality it actually has. The full title is “generally sworn and court-certified expert”. Sworn means that the expert has taken an oath once in court and does not have to be sworn in again for each proceeding. Certified means that the expert has passed an examination before a judge and two expert examiners in the relevant specialist field.

The idea behind this is simple. If two parties in a civil case both insist that they are in the right and the judge is not technically familiar with the matter, the judge proposes an expert from a list. If both sides agree on this person, they accept the expert’s opinion. The expert provides clarity where the court itself cannot judge.

Why software disputes rarely end up in court

Most software disputes are settled out of court, not in court. The reason is discretion. A company that is unhappy with purchased software does not want its customers to know about the problem. Court proceedings are public, so the parties prefer a quiet settlement.

One exception is state-affiliated companies. Here, the Court of Auditors could critically scrutinize an out-of-court settlement and suspect improper cash flows. These companies therefore deliberately seek a settlement in court, even though the actual dispute has often long since been resolved. In civil proceedings, the judge first asks whether an out-of-court settlement would be possible. The parties then answer no, yet the settlement documents have already been prepared and only need to be signed.

State of the art is a legal term, not a technical one

State of the art does not mean the latest and coolest, but the tried and tested. This is the most important correction for every developer who reflexively reaches for the latest technology. The term comes from law and appears in many statutes. If software does not correspond to the state of the art, this has consequences for warranty and compensation.

Four criteria make up the state of the art. A technology must be up-to-date, it must be progressive, it must be tried and tested in the respective field of application, and it must solve the task effectively and efficiently. New and celebrated at conferences is not enough. What counts is that it has proven itself in actual use.

Many JavaScript web technologies are an example. They are often not suitable for software that is intended to run for ten or twenty years because they change too quickly and do not offer reliable long-term support. A new version is released and six months later the old version is no longer maintained. In a short-lived web context, this is tolerable. Over a long service life, it eventually even blocks security-critical updates.

State of the art is not a technical term at all, but a legal one. It is not enough that I can solve the problem. I have to solve it efficiently and effectively, and that means I have to think about the next few years.
Sebastian Dietrich

How defect density can be used as a maturity criterion

The defect density indicates how many errors are still undiscovered in a piece of software, measured per 1,000 lines of code. If the defect density is above two defects per 1,000 lines of code, the software is considered immature and not suitable for production use. Where human lives are at stake, the threshold drops to 0.5 defects per 1,000 lines of code.

The number of errors originally contained can be roughly estimated using formulas. It depends on the size of the software, its complexity and also on the automated tests. Ongoing automated tests find errors in advance that never end up in a bug tracker, but are fixed directly.

From the estimated total number, the expert subtracts what testers, trial operation or live operation have already found. What remains is the number of undiscovered bugs. In practice, these figures are sometimes surprisingly high, even for safety-critical software, for example between 5,000 and 12,000 bugs in a large system.
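The arithmetic described above can be sketched roughly as follows. This is a minimal illustration, not the expert's actual method: the estimated total is taken as a given input because the text does not name the estimation formulas, and the class name, field names and the example figures are assumptions chosen for illustration. Only the two thresholds (2 and 0.5 defects per 1,000 lines) come from the text.

```python
from dataclasses import dataclass

# Thresholds quoted in the text, in defects per 1,000 lines of code.
PRODUCTION_THRESHOLD = 2.0
SAFETY_CRITICAL_THRESHOLD = 0.5


@dataclass(frozen=True)
class DefectEstimate:
    """Hypothetical container for the inputs of a defect density estimate."""
    lines_of_code: int
    estimated_total_defects: int  # assumed to come from an external estimation formula
    defects_already_found: int    # found by testers, trial operation or live operation

    def undiscovered_defects(self) -> int:
        # Estimated total minus what has already been found, never below zero.
        return max(self.estimated_total_defects - self.defects_already_found, 0)

    def residual_density(self) -> float:
        # Undiscovered defects per 1,000 lines of code.
        if self.lines_of_code <= 0:
            raise ValueError("lines_of_code must be positive")
        return self.undiscovered_defects() / (self.lines_of_code / 1000)

    def is_mature(self, safety_critical: bool = False) -> bool:
        # Software counts as immature when the density is above the threshold.
        threshold = SAFETY_CRITICAL_THRESHOLD if safety_critical else PRODUCTION_THRESHOLD
        return self.residual_density() <= threshold


# Illustrative figures only: a 2-million-line system with an estimated 20,000 defects.
early = DefectEstimate(2_000_000, 20_000, 12_000)
later = DefectEstimate(2_000_000, 20_000, 17_000)
print(early.residual_density(), early.is_mature())
print(later.residual_density(), later.is_mature(), later.is_mature(safety_critical=True))
```

The same system can therefore pass the general threshold while still failing the stricter one that applies where human lives are at stake.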

Tools provide the overview, not the verdict

When it comes to software with millions of lines of code, nobody checks every line by hand; experts use tools to get an overview. The best-known tool is SonarQube. Its advantage is that it works across programming languages and makes size, complexity and technical quality visible.

However, SonarQube remains low-level. It shows code smells, but not architecture or design errors. A code stench goes further than a smell: it is highly likely to lead to a bug. A typical example in Java is using a single instead of a double equals sign in an if condition, which assigns a value rather than comparing it. Such code runs, but is almost certainly not written that way deliberately.

Be careful when correcting blindly. Fixing smells across the board can create new bugs because parts of the system may have relied on the previous misbehavior. And no tool can assess whether a framework is suitable for supporting software for the next ten to fifteen years. This requires a look at the history of the framework, the number of committers and the actual support. This is a question for people, not for a tool.

Functional and technical quality are two different things

Software quality is divided into a functional and a technical dimension, which are tested separately. Functional quality asks whether the software does what was required: whether all stories have been implemented, whether the requirements and functional specifications have been fulfilled. This is the field of testing.

This is where looking back helps. There are many test types, all of which are considered state of the art, and none of them is sufficient on its own. The decisive question is how many errors the respective test type has found. If a test type finds nothing, this does not mean that the software is error-free. Then it is the turn of another test type.

Functional quality also includes non-functional requirements, and these are often vaguely formulated. Usability is rarely specified in concrete terms, while performance is often reduced to the statement that the application must be performant. What performant means remains open. Two seconds response time is a benchmark, but an annual financial statement can take longer to calculate without being non-performant. The sober tester question here is often simply: Have you ever tested usability or performance?

Technical quality revolves around the state of the art and has hundreds of criteria. These range from naming and coding conventions to code smells, dependency cycles, dead code analysis, duplicate code and the question of whether the defined architecture has been adhered to at all. Functional quality is whether the car is beautiful inside and out and gets you from A to B safely. Technical quality means opening the hood and checking whether a metric screw or a wood screw was used.

More unit testing is never the right recommendation

If the test coverage is too low, the recommendation is never to simply write more unit tests. This makes things worse rather than better because it cements a bad state. Code coverage says nothing about the intelligence of a test anyway.

The better way is to start with the change process. For every change to the code, a test is written beforehand that checks exactly this change, and it is an intelligent test. This increases quality where the system changes instead of blindly producing coverage.
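A small sketch of what such a change-driven test might look like. The function `apply_discount` and the requested change (discounts are capped at 100 percent) are hypothetical and invented purely for illustration. The point is that the test states the intended behavior of the change instead of pinning whatever the code happens to return today.

```python
def apply_discount(price: float, percent: float) -> float:
    """Hypothetical function; the requested change caps discounts at 100 percent."""
    percent = min(percent, 100.0)  # the change that the first test below was written for
    return price * (1 - percent / 100)


def test_discount_is_capped_at_100_percent():
    # Written before the change: a discount above 100 percent must not
    # produce a negative price.
    assert apply_discount(50.0, 120.0) == 0.0


def test_regular_discount_is_unchanged():
    # Guards the existing, intended behavior that the change must not break.
    assert apply_discount(200.0, 25.0) == 150.0


if __name__ == "__main__":
    test_discount_is_capped_at_100_percent()
    test_regular_discount_is_unchanged()
    print("change-driven tests passed")
```

Without the capping line, the first test fails, which is exactly what a test written beforehand should do.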

This logic underpins the entire procedure via so-called quality gates. The principle is always the same: it must not get worse, it must gradually get better. If the coverage is 13 percent, although 70 percent was promised, it must increase with every release. The rules must apply to new or changed code, and metrics such as coverage or duplicate code must never get worse from one release to the next.
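The "never worse, gradually better" rule can be sketched as a simple comparison between two releases. This is an illustrative assumption about how such a gate could be expressed, not SonarQube's configuration or API. The names `ReleaseMetrics` and `quality_gate_violations` and the violation codes are invented for this sketch, while the 13 and 70 percent figures come from the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ReleaseMetrics:
    """Hypothetical snapshot of two metrics for one release."""
    coverage_percent: float     # higher is better
    duplication_percent: float  # lower is better


def quality_gate_violations(
    previous: ReleaseMetrics,
    current: ReleaseMetrics,
    target_coverage_percent: float = 70.0,  # the promised value from the text
) -> List[str]:
    """Return the rules broken by the current release; an empty list means the gate passes."""
    violations = []
    if current.coverage_percent < previous.coverage_percent:
        violations.append("coverage_dropped")
    elif (current.coverage_percent < target_coverage_percent
          and current.coverage_percent == previous.coverage_percent):
        # Below the promised target, coverage has to rise with every release.
        violations.append("coverage_stagnant_below_target")
    if current.duplication_percent > previous.duplication_percent:
        violations.append("duplication_increased")
    return violations


baseline = ReleaseMetrics(coverage_percent=13.0, duplication_percent=8.0)
print(quality_gate_violations(baseline, ReleaseMetrics(15.0, 8.0)))
print(quality_gate_violations(baseline, ReleaseMetrics(12.0, 9.5)))
```

The gate compares each release only with its predecessor, so a team starting at 13 percent is not failed for missing 70 percent on day one, only for failing to move toward it.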

From Mount Everest to the Zugspitze

An initial quality finding is usually a huge pile of problems, and the right reaction is not shock, but the first step. The image: You are standing in front of Mount Everest. You climb a mountain by walking and taking care not to fall. Take individual measures, keep going, don’t fall back.

The goal doesn’t have to be the summit. At some point, the team stands on the Zugspitze and realizes that the quality is now sufficient, even if the system is still sluggish. If you negotiate a conditional acceptance, you get time to do just that: the software can be used, defined points must be rectified within a year, and a large part of the payment only follows afterwards.

If you are serious about reducing technical debt and continuing to develop in parallel, you should expect it to take three to four years before the software reaches the level you would have liked for acceptance. This is exactly where SonarQube helps again, because it can map the quality gate logic of “better than before”. The switch that activates all rules at once and produces a report that tears everything apart is the wrong way to go.

Agility has increased the speed, not the quality

Agile development has increased speed and productivity, but not quality. This is the uncomfortable observation from many audits across industries. In the past, an enormous amount of time was spent on tons of paper, resulting in the same functional and technical quality that is now achieved with less effort.

The mindset has become more agile, often more than just on paper. People think in a more agile way and no longer stubbornly fill out test reports, but approach things more intelligently. The output itself has become cheaper as a result, but not better.

This also applies in highly regulated fields such as the aviation industry or the healthcare sector. More documentation is required there, many things are less agile and everything is more expensive. This makes little difference to the result. These sectors work no magic either.

Take a look at what’s left of your software years later

The most honest quality test comes years after the go-live. Five, six, seven years after project completion, go to where your software is being used and see what happens to it. Only then will you know whether a cool project has delivered real value.

Two outcomes are possible. Either the software is still running and delivering value, in which case it was a good project. Or it has long since been thrown away because nobody was able to work with it. User satisfaction alone is no proof of business value. The value is only measured when the system is in live operation.