Multidimensional risk-based testing

Risk-based testing means prioritizing test cases by their risk value so that the most important tests are executed first when time is short. A multidimensional approach evaluates five levels separately: business logic, test history, release scope, tester assignment and code changes. The individual values for each level are combined into an aggregated risk score.

Key Takeaways

Risk-based testing is practiced intuitively in many projects, but without structured risk metrics, the decision on which test cases to include in the release remains a pure gut feeling.
A level model with five perspectives (business logic, test history, release scope, tester assignment and code changes) provides an aggregated risk score per test case instead of a single rough high-medium-low value.
Fibonacci numbers as a rating scale ensure that a single high risk value dominates the overall score and is not artificially offset by low values at other levels.
Test case complexity, measured by preconditions and number of steps in relation to other test cases in the project, flows directly into the probability of occurrence of a defect.
A risk cut-off determines before the test run which test cases are considered at all, so that under time pressure the cases dropped at the end are demonstrably the ones with the lowest risk.

What multidimensional risk-based testing means

Multidimensional risk-based testing evaluates the risk of a test case from several perspectives simultaneously instead of reducing it to a single value. The classic classification into high, medium and low remains one-dimensional. It says that a risk is high, but not from which direction it comes.

Richard Hönig at MGM has developed a level model for risk analysis that resolves this oversimplification. The basic idea is that a complex world is difficult to map on a single scale. If you only assess risks globally, you lose the information about why a test case is critical.

Instead, the model looks at each test case through several separate lenses and only brings the results together at the end. What does the business logic say? What does the test history say? What is important for the current release? An aggregated risk score per test case is created from these individual assessments.

Why pure gut feeling decisions are the real problem

Risk-based testing happens in many projects without being called that. Teams intuitively decide which test cases are relevant for a release and which are not. They usually make this decision based on gut feeling, without metrics and without a scale.

This is precisely where the weakness lies. The gut feeling of experienced stakeholders is valuable, but it is neither traceable nor reproducible. As soon as the project grows or people change, there is no common basis for prioritization.

The solution is not to replace the gut feeling, but to underpin it with data. At the beginning of a project, when hardly any data is available, the experience of developers, architects and testers remains the best source. With each test run, data is added that makes this assessment more precise.

The five levels of risk analysis

Richard’s model works with five perspectives that evaluate each test case separately. Each level provides its own value, which is later incorporated into the overall score.

Level	What it evaluates
Business logic	Probability of occurrence and potential impact of a defect, including test case complexity
Test history	Failure rate of past executions, resulting defect tickets and their priority
Release scope	Link to requirement tickets that play a role in the current release
Tester assignment	Know-how and availability of the person executing the test case
Code changes	Which code areas a test case is linked to and what has changed there

The business logic level builds on what risk-based testing has always used: probability of occurrence times impact. What is new is that test case complexity enters the probability of occurrence as a measurable factor.

Tester assignment is a level that many models overlook. A tester who works on several projects at the same time or has little knowledge of the subject area increases the risk. This has nothing to do with the test case itself, but it does influence the result.

The code changes level closes a typical gap. A tester reads a requirement and clearly understands what needs to be tested. Nevertheless, defects occur in places that nobody had on their radar: a refactoring in the background, an updated library. Such changes affect the risk without being visible in the requirement.
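The idea of linking test cases to code areas can be sketched as a simple set overlap. This is only an illustration: the function name and the example paths are hypothetical, and the article does not say how MGM's tool derives a risk value from the overlap, so that step is left out.

```python
def changed_areas_for_case(linked_areas: set[str], changed_areas: set[str]) -> set[str]:
    """Return the code areas linked to a test case that were modified in this release."""
    return linked_areas & changed_areas


# Hypothetical example data: areas a test case is linked to, and areas changed in the release.
linked = {"billing/tariff.py", "billing/invoice.py"}
changed = {"billing/invoice.py", "vendor/pdf_lib"}

# The requirement never mentioned invoice.py, but the overlap makes the change visible.
print(sorted(changed_areas_for_case(linked, changed)))
```

An empty result means that no linked area was touched, which in this sketch is the only case where the code changes level would contribute nothing.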

How the complexity of a test case matters

The complexity of a test case increases the probability of a defect because more steps mean more places where something can go wrong. The model looks at how extensive the preconditions are and how many test steps a test case contains.

The decisive factor is how a test case ranks relative to the other test cases in the project. A test case with 60 steps is not complex in absolute terms, but it is complex compared with the small test cases next to it. If all test cases are that large, the value loses its significance.

This relative evaluation prevents a single large test case from automatically dominating everything. The business case that runs end to end, touches on a thousand requirements and has often been unstable does not fall through the cracks simply because it is large. It is put into perspective.
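One way to picture such a relative measure is a rank within the project. The sketch below is an assumption, not the model's actual formula: it treats the size of a test case as preconditions plus steps and reports the share of test cases in the project that are strictly smaller.

```python
from bisect import bisect_left


def case_size(preconditions: int, steps: int) -> int:
    """Assumed size measure: number of preconditions plus number of test steps."""
    return preconditions + steps


def relative_complexity(size: int, all_sizes: list[int]) -> float:
    """Share of test cases in the project that are strictly smaller (0.0 to 1.0)."""
    if not all_sizes:
        raise ValueError("all_sizes must not be empty")
    ordered = sorted(all_sizes)
    # bisect_left counts how many entries are strictly smaller than `size`.
    return bisect_left(ordered, size) / len(ordered)


# A 60-step case stands out among small test cases ...
print(relative_complexity(60, [5, 8, 6, 60, 7]))
# ... but not in a project where every test case is that large.
print(relative_complexity(60, [60, 60, 60]))
```

The second call returns 0.0, which mirrors the point above: when every test case is equally large, the relative value no longer discriminates.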

Why the Fibonacci series provides the right scale

The model rates risks with Fibonacci numbers because the sequence grows roughly exponentially, so high risks have a disproportionately greater impact on the overall score. The rough three-way division into low, medium and high offered too little nuance and too little flexibility.

The bands follow the number of digits. The single-digit Fibonacci numbers 1, 2, 3, 5 and 8 rate low risks. Two-digit values (13 to 89) stand for medium risk, three-digit values (144 to 987) for high risk.

The effect shows up during aggregation. If a test case has a three-digit value at one level and a 5 or 8 at the other levels, the low values do not simply cancel out the high one. The mean remains high because the exponential spread carries the high risk through.
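The digit rule can be written down in a few lines. The following is a minimal sketch; the function names are illustrative, and capping the scale at three digits is an assumption, since the article does not say what happens beyond 987.

```python
def fibonacci_scale(limit: int = 999) -> list[int]:
    """Fibonacci rating values 1, 2, 3, 5, 8, ... up to `limit` (no duplicate 1)."""
    scale, a, b = [], 1, 2
    while a <= limit:
        scale.append(a)
        a, b = b, a + b
    return scale


def risk_band(value: int) -> str:
    """Map a rating to a band by its number of digits."""
    if value < 1:
        raise ValueError("rating must be a positive integer")
    digits = len(str(value))
    if digits > 3:
        # Assumption: ratings above three digits are outside the scale.
        raise ValueError("rating exceeds the three-digit scale")
    return {1: "low", 2: "medium", 3: "high"}[digits]


for rating in fibonacci_scale():
    print(rating, risk_band(rating))
```

Under this cap, each band contains exactly five values, so the scale offers fifteen gradations instead of three.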

If I have a value of 610, for example, it naturally compensates for the low risk of 5 or 8 at other levels, but you still end up with a very high number.
Richard Hönig
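The 610 example can be checked with a short calculation. The sketch assumes that the aggregation is a plain arithmetic mean over the five levels; the text speaks of a mean value but does not specify the exact formula. The three-point comparison (low = 1, medium = 2, high = 3) is added here purely for contrast.

```python
from statistics import mean


def aggregate(level_scores: list[float]) -> float:
    """Assumed aggregation: arithmetic mean of the per-level ratings."""
    return mean(level_scores)


# One level rated 610, the other four rated 5 or 8.
fibonacci_ratings = [610, 5, 8, 5, 8]
print(aggregate(fibonacci_ratings))   # 636 / 5 = 127.2, still a three-digit value

# The same situation on a linear 1-2-3 scale: one "high", four "low".
linear_ratings = [3, 1, 1, 1, 1]
print(aggregate(linear_ratings))      # 7 / 5 = 1.4, closer to "low" than to "medium"
```

On the linear scale the single high rating almost disappears in the average, while on the Fibonacci scale the aggregated value stays in the three-digit range.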

Risk analysis must not generate additional work

A risk analysis is only useful if it draws on data that is already available. As soon as it requires additional documentation, the added value turns into an additional burden.

The necessary information comes together in MGM’s test management tool: linked requirements, linked defect tickets, previous test runs. For projects that keep their documentation reasonably under control, linking is standard practice and does not cost testers any extra time. The backend automatically calculates the risk values from this data.

No system gets every assessment right. This is why the automatic calculation can be overridden for each test case. If you see the risk differently from the algorithm, you can set the value or individual parameters yourself.
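A manual override of this kind can be modeled as an optional value that takes precedence over the calculated one. The class and field names below are hypothetical and do not reflect the data model of MGM's tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LevelRating:
    """Rating for one level of one test case (hypothetical structure)."""
    calculated: int                 # value produced by the automatic calculation
    override: Optional[int] = None  # value set manually by a tester, if any

    @property
    def effective(self) -> int:
        """The manual value wins whenever one has been set."""
        return self.calculated if self.override is None else self.override


automatic = LevelRating(calculated=8)
corrected = LevelRating(calculated=8, override=144)
print(automatic.effective, corrected.effective)
```

Keeping the calculated value next to the override means the automatic assessment stays visible and can be restored by clearing the override.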

How the risk score controls test prioritization

The aggregated risk score becomes a control tool for the test run, not just a pretty overview. Managers can see the risk distribution in the project at a glance. For testers, the practical benefits are greater.

A cut-off can be set when compiling a test run: Only test cases above a certain risk value are included, the rest are initially left out. Within the run, the test cases are sorted by risk so that the most critical are at the top.

This pays off when time is short. If you test from top to bottom, you always cover the highest risks first. If something has to be dropped at the end, it is the test cases with the lowest risk: annoying, but manageable.
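Cut-off and ordering can be sketched as a filter followed by a descending sort. The names and example scores are hypothetical, and treating the cut-off as inclusive is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredTestCase:
    name: str
    risk_score: float  # aggregated risk score of the test case


def build_test_run(cases: list[ScoredTestCase], cut_off: float) -> list[ScoredTestCase]:
    """Keep cases at or above the cut-off (assumed inclusive), highest risk first."""
    selected = [case for case in cases if case.risk_score >= cut_off]
    return sorted(selected, key=lambda case: case.risk_score, reverse=True)


cases = [
    ScoredTestCase("tariff_calculation", 127.2),
    ScoredTestCase("newsletter_opt_in", 8),
    ScoredTestCase("address_change", 34),
    ScoredTestCase("footer_links", 5),
    ScoredTestCase("checkout_end_to_end", 233),
]

for case in build_test_run(cases, cut_off=13):
    print(case.name, case.risk_score)
```

Working through the returned list from the top means that whatever remains untested when time runs out sits at the bottom of the list, just above the cut-off.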

Industry context and weighting as the next step

Different projects and industries weight the risk levels differently, and a rigid model does not do justice to this. One team considers complexity to be the decisive factor, another the business logic. A freely adjustable weighting of the levels is still in the backlog at MGM.
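Since the adjustable weighting is still in the backlog, the following is purely a thought experiment: one conceivable form is a weighted mean in which every level defaults to a weight of 1. The function and the level keys are hypothetical.

```python
def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean of per-level ratings; levels without a weight count as 1.0."""
    total_weight = sum(weights.get(level, 1.0) for level in scores)
    if total_weight <= 0:
        raise ValueError("the weights must sum to a positive value")
    weighted_sum = sum(score * weights.get(level, 1.0) for level, score in scores.items())
    return weighted_sum / total_weight


scores = {"business_logic": 144, "test_history": 8}

print(weighted_score(scores, {}))                     # equal weights: (144 + 8) / 2 = 76.0
print(weighted_score(scores, {"business_logic": 3}))  # (3 * 144 + 8) / 4 = 110.0
```

With equal weights the function reduces to the plain mean, so a project that sets no weights would see no change in its scores.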

The temporal context can also shift the risk. In motor insurance, new contracts have to be concluded at the end of the year and the run on forms begins. An error in the same area weighs more heavily in January than in the fall. This kind of seasonality is included in the model.

Further levels such as security are foreseeable. But the real maturity step goes further: providing test managers with the framework they need to define their own levels. They know best where their risks lie, and a tool provider cannot anticipate this for every project.