Soccer analysis meets software testing

Applied to regression testing, process mining uses production data to show which software workflows are actually run and how often. From this, specific test cases can be derived instead of relying on assumptions. If you know the most frequent runs, you can justify test coverage with data and focus regression testing on what matters.

Key Takeaways

Process mining makes visible which workflows are actually run in production and thus provides a data-based foundation for selecting regression test cases, replacing gut feeling.
A single workflow can have over 600 different runs, but the top 10 already cover around 80 percent of all production runs.
Workflows that occur frequently in production but are not covered at all in test operations reveal specific risk gaps that remain invisible without production data analysis.
Manual evaluation of production data loses its value as soon as the person carrying it out leaves the project. Only automation ensures the sustainability of the approach.

Test coverage starts with the question of the denominator

Anyone who claims that test coverage is too low must first clarify what this statement refers to. Test coverage is essentially a percentage calculation: numerator divided by denominator. Without a defined denominator, any judgment about “too little” or “enough” remains an opinion.

This is precisely where Sven Braxein’s approach comes in. In a project to replace long-lived legacy software with a new system, the team was accused of having too little coverage in regression testing. His first counter-question was: What exactly is the denominator you are measuring against?

This question is more than rhetorical. It forces a team to name the denominator against which they are testing. Only when it is clear what should be fully covered can it be quantified how much of it has actually been tested.

What process mining makes visible

Process mining combines classic process analysis with data mining and shows how processes really work in production. It shows not the idealized target diagram but what actually happens.

Every click, every action in a digital system leaves a trace. A process mining tool collects these traces and correlates them. The result: a view of what actually happened in the system instead of an assumption about what should happen.
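To make “correlating traces” concrete, here is a minimal Python sketch. It assumes the traces are available as an event log of (case ID, timestamp, activity) tuples; that layout and the sample activities are hypothetical, not taken from the project, and real process mining tools do considerably more. Grouping events by case and ordering them by time yields one run per case; counting identical runs shows what actually happens and how often.

```python
from collections import Counter, defaultdict

# Hypothetical event log: (case_id, timestamp, activity).
# In practice these traces would come from the system's database or logs.
EVENTS = [
    ("c1", 1, "create contract"),
    ("c1", 2, "blacklist check"),
    ("c1", 3, "activate contract"),
    ("c2", 1, "create contract"),
    ("c2", 2, "blacklist check"),
    ("c2", 3, "reject contract"),
    ("c3", 2, "blacklist check"),   # events may arrive out of order
    ("c3", 1, "create contract"),
    ("c3", 3, "activate contract"),
]


def runs_per_case(events):
    """Correlate events by case ID and order each case's steps by timestamp."""
    by_case = defaultdict(list)
    for case_id, timestamp, activity in events:
        by_case[case_id].append((timestamp, activity))
    return {
        case_id: tuple(activity for _, activity in sorted(steps))
        for case_id, steps in by_case.items()
    }


def count_variants(events):
    """Count how often each distinct run (sequence of activities) occurs."""
    return Counter(runs_per_case(events).values())


if __name__ == "__main__":
    for run, count in count_variants(EVENTS).most_common():
        print(count, " -> ".join(run))
```

Representing a run as a tuple makes it hashable, so identical runs from different cases collapse into a single counter entry.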

The technology has been around for just over ten years. Athanasios Kallinikidis got to know it during an internship and later introduced it in his company for the new software. The prerequisite was that he knew the system’s data model well enough to link the right data points.

Soccer analysis as a model for test prioritization

The trigger for the approach came from amateur soccer. Athanasios coaches a men’s team in the local league and records their games with a camera on a tall tripod. Software derives statistics from the footage: heat maps, running routes, where each shot came from.

The guiding principle behind it: more knowledge, less opinion. If a player claims to have run a lot, the analysis shows the actual running performance. The coach derives what needs to be done in training on Tuesday from the actual game on Sunday.

This logic can be transferred to software development. If a district league team can learn from its match data for training, a large IT project should be able to learn from its production data for testing. You analyze what is actually happening and align your test activity accordingly.

We look at what happens in reality, then derive what we do in testing and thus arrive at a more stable production.
Athanasios Kallinikidis

Why workflows are a better denominator than business processes

In this project, it was not the business processes but the workflows that proved to be a suitable denominator. Business processes do run in production, but they are difficult to pin down and are often given too little weight in testing.

At its core, the replaced software is a large workflow machine, comparable to status workflows in a ticket system. The project has around 160 different workflows: creating a customer, creating, changing, terminating or prematurely replacing a contract. Each of these 160 workflows is modeled and can be run through in different ways.

This defines the denominator: 160 workflows. The numerator results from the workflows that are actually run through in the test. An opinion about coverage becomes a countable quantity.
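A minimal Python sketch of this calculation, assuming workflows can be identified by simple IDs. The IDs and the figure of 120 tested workflows are hypothetical; the project’s actual numbers are not given. Intersecting with the denominator ensures that test runs of workflows outside the defined set cannot inflate the result.

```python
def workflow_coverage(all_workflows, tested_workflows):
    """Share of defined workflows that were run at least once in the test."""
    denominator = set(all_workflows)  # e.g. the 160 modeled workflows
    if not denominator:
        raise ValueError("the denominator must not be empty")
    # Only workflows that belong to the denominator count toward the numerator.
    numerator = denominator & set(tested_workflows)
    return len(numerator) / len(denominator)


if __name__ == "__main__":
    # Hypothetical workflow IDs; the project's real identifiers are not known.
    all_workflows = [f"wf{i:03d}" for i in range(1, 161)]  # 160 workflows
    tested = [f"wf{i:03d}" for i in range(1, 121)]         # assume 120 ran in the test
    print(f"Workflow coverage: {workflow_coverage(all_workflows, tested):.0%}")
```

With these assumed inputs the script prints a coverage of 75%.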

The top 10 runs beat complete coverage

Risk-based testing means covering the few important runs instead of all possible ones. Process mining shows which runs occur how often and makes this prioritization justifiable.

An example from the project illustrates the relationship. The most complex workflow has over 600 different runs. The most frequent run alone covers around 20 percent. With the top 10 runs, you reach a total of around 80 percent of all cases.

This leads to a clear decision: If you have test cases for the top 10 runs, you can do without the remaining 590-plus. Not out of convenience, but because the data shows that this is where most real cases occur. And you can prove why you chose exactly these ten.
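The arithmetic behind the top-10 decision can be sketched in a few lines of Python. The run frequencies below are made up so that the most frequent run covers 20 percent and the top 10 cover 80 percent; they only mimic the shape of the project’s figures and are not its data. The second function answers the reverse question: how many of the most frequent runs are needed to reach a target share.

```python
from collections import Counter


def top_k_share(run_counts, k):
    """Fraction of all production cases covered by the k most frequent runs."""
    total = sum(run_counts.values())
    if total == 0:
        return 0.0
    top = sum(count for _, count in Counter(run_counts).most_common(k))
    return top / total


def runs_needed(run_counts, target_share):
    """Smallest number of most frequent runs whose combined share reaches target_share."""
    total = sum(run_counts.values())
    if total == 0:
        return 0
    covered = 0
    for n, (_, count) in enumerate(Counter(run_counts).most_common(), start=1):
        covered += count
        if covered / total >= target_share:
            return n
    return len(run_counts)


if __name__ == "__main__":
    # Made-up frequencies: one dominant run, a short head of frequent runs
    # and a long tail of rare ones (1,000 cases in total).
    head = [200, 150, 100, 80, 70, 60, 50, 40, 30, 20]
    counts = {f"run{i}": c for i, c in enumerate(head, start=1)}
    counts.update({f"rare{i}": 2 for i in range(100)})
    print(f"Top 1 run:   {top_k_share(counts, 1):.0%}")
    print(f"Top 10 runs: {top_k_share(counts, 10):.0%}")
    print("Runs needed for 80%:", runs_needed(counts, 0.8))
```

Sorting by frequency and accumulating shares is all the prioritization needs; the long tail of rare runs contributes little to the total.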

Production data reveals blind spots in the test

Comparing production and testing reveals workflows that no one had on their radar. Some of them are harmless; others are real gaps.

In the project, none of the three most frequent workflows in production were on the expected list. The most common one is a blacklist check via an interface that decides whether a customer receives a contract. It is triggered more often than any other workflow in production, yet it is precisely this workflow that regularly causes problems in testing because the interface often does not work.

The comparison provides three types of findings:

Green: A workflow runs frequently in production and is adequately covered in testing. For very simple workflows with only two runs, two good test cases are sufficient, not three thousand.
Yellow: A frequent production workflow is hardly represented in the test. It is worth taking a closer look to see whether this is correct.
Red: A workflow is one of the top 10 in production, but is not triggered once in the test. This cannot be correct and must be investigated.
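The traffic-light comparison can be sketched in Python under explicit assumptions. Per-workflow run counts from production and test are assumed to be available as dictionaries, and the rule separating yellow from green (test share below a quarter of the production share) is a hypothetical threshold, not the project’s rule. It also simplifies the green case: it compares shares only and ignores how many distinct runs a workflow has.

```python
from collections import Counter


def classify_workflows(prod_counts, test_counts, top_n=10, yellow_ratio=0.25):
    """Rate the top_n production workflows as green, yellow or red.

    red:    in the production top_n but never triggered in the test
    yellow: triggered in the test, but its share of test runs is below
            yellow_ratio times its share of production runs (hypothetical rule)
    green:  everything else
    """
    prod_total = sum(prod_counts.values())
    test_total = sum(test_counts.values())
    if prod_total == 0:
        return {}
    findings = {}
    for workflow, prod_count in Counter(prod_counts).most_common(top_n):
        test_count = test_counts.get(workflow, 0)
        if test_count == 0:
            findings[workflow] = "red"
            continue
        prod_share = prod_count / prod_total
        test_share = test_count / test_total  # test_total > 0 here
        findings[workflow] = "yellow" if test_share < yellow_ratio * prod_share else "green"
    return findings


if __name__ == "__main__":
    # Made-up counts for illustration only.
    prod = {"blacklist check": 5000, "create customer": 3000,
            "change contract": 1500, "terminate contract": 500}
    test = {"create customer": 400, "change contract": 10, "terminate contract": 90}
    for workflow, color in classify_workflows(prod, test, top_n=3).items():
        print(f"{color:6} {workflow}")
```

With these numbers, the blacklist check comes out red, creating a customer green and changing a contract yellow.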

If the finding is red, you send someone to investigate that specific workflow. Either the workflow in production is superfluous and nobody needs it anymore. Or it has to run and is missing from the test, which is a serious problem.

Without automation, the benefit is lost again

The greatest leverage of the approach lies not in the one-off analysis, but in the repetition. Anyone who collects and aggregates the data only manually depends on one person continuing to consider the task important enough.

In the project, the analysis is currently carried out manually and regularly. This works as long as someone is running it. If this person leaves the project, the evaluation will probably also end. The manual effort is the real weak point.

Long-term benefits will only arise when the collection, aggregation and correlation of data is automated. Then a continuous improvement process takes effect: the system regularly flags anomalies, and you only follow up where a red finding appears.
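What an automated check could look like is sketched below in Python. It assumes that production and test data are exported regularly as CSV files with a `workflow` column, one row per workflow instance; that format is an assumption, and scheduling the script (for example via cron or a CI pipeline) is left out. The script reports only red findings, so nobody has to read through the full comparison each time.

```python
import csv
import io
from collections import Counter


def count_workflows(csv_file):
    """Count workflow instances in a CSV export with a 'workflow' column (assumed format)."""
    return Counter(row["workflow"] for row in csv.DictReader(csv_file))


def red_findings(prod_file, test_file, top_n=10):
    """Return the production top_n workflows that never ran in the test."""
    prod = count_workflows(prod_file)
    test = count_workflows(test_file)
    # A Counter returns 0 for missing keys, so untested workflows show up as 0.
    return [workflow for workflow, _ in prod.most_common(top_n) if test[workflow] == 0]


if __name__ == "__main__":
    # In-memory stand-ins for the regular exports; a scheduled job would open real files.
    prod_export = io.StringIO(
        "case_id,workflow\n1,blacklist check\n2,blacklist check\n3,create customer\n"
    )
    test_export = io.StringIO("case_id,workflow\n7,create customer\n")
    for workflow in red_findings(prod_export, test_export):
        print("RED:", workflow)
```

Because the functions accept any file-like object, the same code works with in-memory data for a quick check and with opened export files in the scheduled run.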

Effort and yield must remain in balance

More data does not automatically mean more value. This approach is also subject to diminishing marginal utility, and the project deliberately drew sensible boundaries.

Only the central system, to which the other systems are connected, is analyzed. Including all surrounding systems would be possible and would make measurement more complete. The open question is how much effort that would take and how much additional information it would yield.

The existing layout strikes a good balance between effort and yield. The aim is not to achieve maximum completeness, but to extract as much usable information as possible with as little manual effort as possible.
