Mutation Testing

Mutation testing is over 50 years old, but hardly anyone uses it. How mutants help to find test gaps and write better code.

Mutation testing is a method for checking the quality of a test suite: a tool automatically creates so-called mutants by making small changes to the production code, for example replacing a greater-than with a greater-than-or-equal-to, and then checks whether at least one test fails. If no test fails, the mutant survives, which indicates a gap in the tests.

Key Takeaways

  • Mutation testing does not check the production code, but the quality of the test suite: A deliberately introduced error in the code is considered killed as soon as at least one test fails.
  • The Java framework PIT optimizes runtimes through incremental analysis: It compares hash codes of code and tests and only re-executes mutations for places that have actually changed.
  • Mutation testing should not be applied to the entire code base, but rather to the core business logic, because mutating UI or database access code adds a lot of runtime for little benefit.
  • If you use mutation testing regularly, you will write better tests and better code over time, because the knowledge about typical mutants is already incorporated when writing the code.

What is mutation testing?

Mutation testing is a process you use to test your tests. Instead of just checking whether the production code works, the method checks whether your test suite is able to find errors at all.

The principle is over 50 years old. Richard Lipton described it in a paper in 1971. The basic idea has hardly changed since then.

The process is simple. You have a test suite that is green and you assume that everything is in order. Now you deliberately introduce an error into your code and see if the test suite finds it. If it finds it, the suite is at least adequate at this spot. If it doesn’t find it, you have a gap.

These deliberately introduced changes are not called bugs, but mutants. Hence the name mutation testing. A mutant is a targeted change to the production code.

Which mutants are used?

The type of mutant depends in part on the programming language. In the Java world, several categories can be distinguished, some of which function independently of the language.

Conditional boundaries are a common category. A ‘greater than’ becomes a ‘greater than or equal to’, a ‘less than’ becomes a ‘less than or equal to’. The boundaries of a condition shift. Other mutators negate entire conditions, for example by changing ‘equal to’ to ‘not equal to’.
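A conditional-boundary mutant can be sketched as follows. The class ShippingPolicy and the threshold of 50 are hypothetical and only serve as an illustration; the mutant is written out by hand here, whereas a tool would generate it automatically.

```java
// Hypothetical example: names and the threshold of 50 are illustrative only.
final class ShippingPolicy {
    // Original: free shipping for orders strictly above 50.
    static boolean freeShipping(int orderTotal) {
        return orderTotal > 50;
    }

    // Conditional-boundary mutant, written out by hand: '>' became '>='.
    static boolean freeShippingMutant(int orderTotal) {
        return orderTotal >= 50;
    }
}

final class ShippingPolicyDemo {
    public static void main(String[] args) {
        // A value far from the boundary cannot tell original and mutant apart.
        System.out.println(ShippingPolicy.freeShipping(100)
                == ShippingPolicy.freeShippingMutant(100)); // true

        // The boundary value itself exposes the mutant.
        System.out.println(ShippingPolicy.freeShipping(50)
                == ShippingPolicy.freeShippingMutant(50));  // false
    }
}
```

A test suite that only checks values such as 100 or 10 would let this mutant survive; a test for exactly 50 kills it.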

Increment mutators swap ‘++’ for ‘--’. Arithmetic mutators replace addition with subtraction or multiplication with division.

The behavior of entire methods can also be changed. For a void method, which does something but returns nothing, the mutator simply removes the call. For methods with a return value, the mutant returns null instead of an object, zero for primitive numeric types, or an empty string for strings.
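The following sketch shows where such method-level mutators would strike. The class AuditedCounter and its methods are hypothetical; the comments mark what a mutator would change, and the demo shows which kind of assertion is needed to notice each change.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example: class and method names are illustrative only.
final class AuditedCounter {
    private final List<String> auditLog = new ArrayList<>();
    private int value;

    int increment() {
        value++;
        record("increment"); // a void-call mutator would drop this call
        return value;        // a return-value mutator would return 0 here
    }

    private void record(String event) {
        auditLog.add(event);
    }

    List<String> auditLog() {
        return auditLog;
    }
}

final class AuditedCounterDemo {
    public static void main(String[] args) {
        AuditedCounter counter = new AuditedCounter();
        int result = counter.increment();

        // Checking the return value kills the return-value mutant ...
        System.out.println(result == 1);
        // ... but only a check on the side effect kills the removed void call.
        System.out.println(counter.auditLog().equals(List.of("increment")));
    }
}
```

A test that asserts only on the return value would stay green if the call to record were removed, so that mutant would survive.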

There are hardly any limits to the imagination. If you wanted to apply all conceivable mutators to an entire code base by hand, you would be busy for a long time. This is precisely why frameworks take over this work.

How does a mutation testing tool work?

A mutation testing tool creates a mutant, runs all tests and checks whether at least one test fails. If a test fails, this is a good sign: The test suite has killed the mutant.

The language around mutation testing is martial. A mutant survives or is killed. Killing the mutant is the desired result in this context.
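The loop a tool runs can be modelled in a few lines. This is a toy model under stated assumptions, not how PIT is implemented: the unit under test is a plain addition, the mutants are written by hand, and a "test" is just a predicate that returns true when it passes. All names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntBinaryOperator;
import java.util.function.Predicate;

// Toy model of a mutation run; not how PIT is implemented.
final class MutationRunSketch {

    // A "test" returns true when it passes for the given implementation.
    static final List<Predicate<IntBinaryOperator>> TESTS = List.of(
        add -> add.applyAsInt(2, 2) == 4,
        add -> add.applyAsInt(0, 0) == 0
    );

    // Hand-written mutants of the original (a, b) -> a + b.
    static Map<String, IntBinaryOperator> mutants() {
        Map<String, IntBinaryOperator> mutants = new LinkedHashMap<>();
        mutants.put("+ replaced by -", (a, b) -> a - b);
        mutants.put("+ replaced by *", (a, b) -> a * b);
        return mutants;
    }

    // A mutant is killed as soon as at least one test fails against it.
    static boolean isKilled(IntBinaryOperator mutant) {
        return TESTS.stream().anyMatch(test -> !test.test(mutant));
    }

    public static void main(String[] args) {
        mutants().forEach((name, mutant) ->
            System.out.println(name + ": " + (isKilled(mutant) ? "KILLED" : "SURVIVED")));
    }
}
```

The subtraction mutant is killed, while the multiplication mutant survives because 2 * 2 and 0 * 0 happen to give the same results as the addition. The surviving mutant points at weak test data, not at a bug in the production code.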

In the Java world, PIT is a widely used framework. It can be integrated into the build process and applies the mutators to the compiled code (PIT mutates bytecode rather than source files). It then records which mutants were killed and which survived, and you evaluate this report.

PIT comes with predefined groups of mutators, from a basic default set to a group that applies all the mutators supplied. You can selectively switch individual mutators on and off, because not every mutator is useful for your code and more mutators mean longer runtimes.

Mutation tests take a long time if you don’t limit them

Mutation tests can run for a very long time. For each mutant created, the entire relevant test suite is executed. If you apply this to the entire code base without any restrictions, you will block your build process.

PIT is highly optimized at this point. It writes a history with hash codes of code and tests. When re-executed, the tool checks which tests are affected by a change and only executes these. If nothing has changed, the result does not change either.
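The history idea can be sketched roughly like this. It is not PIT's actual implementation or history file format: the class IncrementalHistorySketch, the in-memory map and the use of SHA-256 over plain strings are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

// Simplified model of an incremental history; not PIT's real data format.
final class IncrementalHistorySketch {
    // Unit name -> fingerprint of its code plus its tests from the previous run.
    private final Map<String, String> history = new HashMap<>();

    // Returns true when the unit must be re-analysed, and remembers the new fingerprint.
    boolean needsAnalysis(String unit, String code, String tests) {
        String fingerprint = fingerprint(code + "\n" + tests);
        String previous = history.put(unit, fingerprint);
        return !fingerprint.equals(previous);
    }

    static String fingerprint(String content) {
        try {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            byte[] digest = sha256.digest(content.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

On the first call for a unit, needsAnalysis returns true. A second call with identical code and tests returns false, so the mutants for that unit can be skipped. As soon as either the code or the tests change, the fingerprint differs and the unit is analysed again.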

This incremental approach makes the difference for integration. If each run took hours, mutation testing could hardly be integrated into a daily build. If it only runs incrementally, you have more opportunities to integrate it into ongoing builds.

In the build process, mutation testing always comes after compilation and the normal test run. The mutation analysis only follows when all tests are green.

Start small, not with the entire code base

If you introduce mutation testing, you should not immediately unleash it on the entire source code. Start with the core business logic.

Checking UI code or database access via mutation testing is overkill at first. Frameworks such as PIT allow you to limit the analysis to specific packages or individual classes. This allows you to decide where mutation testing actually delivers value and avoids excessively long runtimes.

It is also worth starting cautiously with the number of mutators. Only switch on a few at first and work your way forward while analyzing the results in detail and adapting tests or code.

What the results tell you

Mutation testing uncovers three types of weaknesses. It shows whether your test data is good and whether it covers the boundary values. It shows what you haven’t tested yet. And it brings logic problems to light.

If a mutation in the code does not cause a test to fail, you check both sides: the tests and the code. Sometimes it turns out that the tests are green, but they check the wrong logic or miss the point.

This is where the added value comes in. You not only improve your tests, but also your code.

Equivalent mutants and their pitfalls

Equivalent mutants are a well-known problem with mutation testing. These are mutants that do not really change the logic of the code. The code behaves exactly the same after the mutation as before, and accordingly no test fails.
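A classic case is a maximum function. In this hand-written sketch (the class MaxSketch is hypothetical), the conditional-boundary mutant differs from the original only when both arguments are equal, and in that case both branches return the same value.

```java
// Hypothetical example of an equivalent mutant.
final class MaxSketch {
    // Original implementation.
    static int max(int a, int b) {
        return a > b ? a : b;
    }

    // Conditional-boundary mutant: '>' became '>='.
    // For a == b it returns a instead of b, which is the same value.
    static int maxMutant(int a, int b) {
        return a >= b ? a : b;
    }
}

final class MaxSketchDemo {
    public static void main(String[] args) {
        boolean differenceFound = false;
        for (int a = -50; a <= 50; a++) {
            for (int b = -50; b <= 50; b++) {
                if (MaxSketch.max(a, b) != MaxSketch.maxMutant(a, b)) {
                    differenceFound = true;
                }
            }
        }
        // No input distinguishes the two versions, so no test can kill the mutant.
        System.out.println("difference found: " + differenceFound);
    }
}
```

No matter how good the tests are, this mutant stays alive, because there is no observable difference in behavior to assert on.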

PIT attempts to recognize and avoid equivalent mutants in advance. They cannot be completely prevented. You have to filter out these places yourself.

They show up in the same way as any other surviving mutant: a mutant has been created, but all tests remain green. So whenever a mutant survives, there is always the possibility that it is an equivalent mutant.

However, depending on the code, equivalent mutants are often rare. Sometimes the code can be rewritten so that the problem disappears. Often they are also an indication that something is wrong with the code itself: If another construct produces the same result, something may have gone wrong with the implementation.

The tool becomes a trainer for better code

The effect of mutation testing shifts with increasing use. Initially, it mainly provides new test ideas. The more often you use it, the stronger the learning effect.

After a while, you know the mutants that are used and react to them when writing the tests. Birgit Kratz describes how difficult it was for her to consciously create sample code for a presentation in which a mutant survives. If you know the tool well, you almost automatically write tests and code that hardly lets anything slip through.

This learning effect can be significantly greater in a team. In Birgit’s opinion, a joint session in which the findings of the mutation tests are reviewed is very useful.

“It’s not only good for better testing, but also for better code.” (Birgit Kratz)

In practice, however, this team approach meets with resistance. Many people react to the idea of testing their own tests by asking where it should stop. As a result, it is often only used by individual developers.

The two prerequisites for mutation testing

Before you can use mutation testing, you need two things: tests and green tests.

That sounds obvious, but it’s not. It is precisely these two factors that often fail in practice. Either there are too few tests in the first place, or the existing tests are not green.

Only when both conditions are met can mutation testing be used sensibly, either locally or as a stage in the build pipeline after the normal test run.

Frequently Asked Questions

For effective mutation testing, you should first regularly update and expand your test scenarios to ensure comprehensive test coverage. Use mutants with different operators to cover relevant error types. Automate mutation testing in the CI/CD process to get immediate feedback. Analyze the results thoroughly to identify gaps in the tests. Finally, it is important to prioritize the mutants to use testing resources efficiently and focus on critical parts of the application.

The best way to interpret the results of mutation testing is to look at the mutation score (the share of mutants that were killed) and at the mutants that survived. Identify which tests effectively detect mutations and which do not. Also consider the type of mutations and their impact on the code. Clear documentation of test results helps to identify patterns and improve the quality of test cases. Also consider the context of the project to prioritize improvement opportunities.

Common mistakes in mutation testing are ignoring untested mutations and using inadequate test cases. These often lead to potential vulnerabilities remaining undetected. To avoid this, you should perform a full mutation analysis and ensure that test cases are diverse to cover different scenarios. It is also helpful to carry out regular reviews and adjustments to test strategies. A continuous improvement process significantly increases the effectiveness of mutation testing.

Mutation coverage is a measure in the context of mutation testing that indicates how many of the mutants generated are recognized by existing tests. Mutants are slightly modified versions of the code that simulate errors. To achieve high mutation coverage, tests must successfully identify or kill the mutants. High coverage indicates that the tests are robust and can detect potential errors in the code. Mutation testing therefore helps to improve the quality of test cases and increase the reliability of the software.
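As a small worked example, mutation coverage can be computed as the share of generated mutants that were killed. The class and method names below are hypothetical, and tools may differ in how they treat special cases such as mutants that time out.

```java
// Hypothetical helper; real tools may count special cases differently.
final class MutationCoverageSketch {
    // Percentage of generated mutants that at least one test killed.
    static double coveragePercent(int killed, int generated) {
        if (generated <= 0) {
            throw new IllegalArgumentException("no mutants generated");
        }
        return 100.0 * killed / generated;
    }

    public static void main(String[] args) {
        // 45 of 60 mutants killed.
        System.out.println(coveragePercent(45, 60)); // 75.0
    }
}
```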

Various tools are available for mutation testing, depending on the programming language. For Java, Pitest (PIT) is widely used; Jester is an older alternative. In Python, MutPy is an option, while Stryker is a good choice for JavaScript and TypeScript. For C and C++, Mull can be used. Ruby developers can use Mutant. These tools help to assess a test suite by introducing faults into the code and checking whether the tests detect them.

Mutation testing improves test coverage by showing whether existing tests are effective by deliberately introducing errors into the code. The advantages are higher test quality and the identification of weak tests. Disadvantages are the high effort and the extended test time, as many tests have to be performed for different mutations. In addition, it can be difficult to generate meaningful mutations that simulate real errors. Overall, mutation testing offers valuable insights, but may require additional resources.

Mutation testing is a test method that improves the quality of software tests. It involves making small changes (mutations) to the code in order to check whether existing tests recognize these errors. There are two main steps: mutation generation, in which different types of mutations (e.g. changing operators) are generated, and mutation evaluation, which checks whether the tests have detected the mutations. The aim is to uncover weaknesses in the tests and close the gaps.

Mutation testing and fuzz testing differ fundamentally in their approach. Mutation testing intentionally creates faulty versions (mutants) of the code to check whether existing tests detect these errors. The focus is on the quality of the test suite. Fuzz testing, on the other hand, sends random or malformed input to a program to identify unexpected behavior or crashes. While mutation testing evaluates how well the tests detect errors, fuzz testing aims to uncover security gaps and vulnerabilities triggered by unexpected inputs.

Mutation testing and regression testing have different goals. Mutation testing evaluates the quality of tests by deliberately introducing small changes in the code to check whether the existing tests recognize these errors. Regression testing, on the other hand, checks whether new changes in the code affect existing functions. While mutation testing focuses on test coverage, regression testing focuses on ensuring the stability of the software product. Both methods are important, but serve different purposes in the software testing process.

Mutation testing and unit testing are two different approaches to improving software quality. While unit testing checks individual components of the code by ensuring that they work as expected, mutation testing tests the robustness of these tests. This involves introducing small changes to the code to check whether existing tests recognize the errors. The goal of mutation testing is to evaluate the effectiveness of the unit tests and ensure that they not only work, but also detect bugs.

Mutation testing is a test method that is used to evaluate the quality of software tests. Small changes, so-called mutations, are made to the code in order to check whether the existing tests recognize these changes. If a test fails after a mutation, this shows that the test suite is effective at that spot; if some mutations go undetected, the tests have weaknesses. This method helps to improve the reliability of the test suite and ensure that faulty parts of the program are detected.
