Trends in software testing are currently developing along two parallel tracks: AI as a tool for testers and the testing of AI-based systems. Survey data shows that testing already ranks second only to programming in the use of AI support. Agile process models, especially Scrum, now dominate across all industries, including safety-critical areas such as aviation and medical technology.
Key Takeaways
- AI-supported test generation today functions exclusively as assistance: the output must be checked and revised by experts, and blind reliance on generated tests is not justifiable.
- Among the testing practitioners surveyed, programming and testing already use AI the most, while requirements engineering receives significantly less AI support despite obvious use cases such as duplicate detection.
- Scrum clearly dominates as a process model according to the survey data, and even in safety-critical industries such as aviation and medical technology, regulators now permit agile approaches.
- Scenario-based testing for autonomous systems fails in practice less because of the basic idea than because of two specific hurdles: the selection of representative scenarios and the modeling effort for each individual test case.
AI in software testing has two sides that need to be kept apart
The topic of AI in testing involves two different tasks that are developing in parallel and at different speeds. On the one hand, there is the testing of AI-based systems. On the other hand, there is testing with AI, i.e. AI as a tool for the person doing the testing.
Anyone who lumps the two together is confusing maturity levels. One field is still in preparation, the other has already arrived in practice.
Tilo Linz assesses the status of the two fields from the perspective he has gained from customers, at conferences and through the Trends in Testing event series that imbus has been organizing for over ten years.
Where does the testing of AI-based systems stand?
Almost every serious software development company is currently examining which functions in its own product can be improved or enriched by AI. This applies not only to pure software products, but also to hardware-based systems that contain software: machines that are to become more intelligent through AI.
These companies are currently creating prototypes and experiments to find out where AI can actually make their own products better.
For software testing, this means that the task of testing such systems is being prepared for, but not yet carried out in practice. Testers are getting up to speed, attending conferences and reading up on the subject. However, the systems are not yet ready for testing in production, and testing itself has not yet gone live.
Tilo Linz expects this moment to come quickly and with force in the next few years. Testers will then face the question of whether they have really covered everything the system is supposed to do and found all the bugs that are in it. His assessment is clear: there will be more bugs in such systems than in the software testers are handed today.
As a tool, AI is used most in programming, with testing following close behind
A survey by Trends in Testing shows a clear order in which AI tools are already being used in software development. Programming is in the lead, testing follows in second place and requirements engineering in third. Project management is far behind.
It was to be expected that programming would lead the way. Common IDEs today have built-in AI assistants that can be used to modify or query code directly.
The third place for requirements engineering is surprising. Requirements are natural-language documents, i.e. obvious material for analysis using language models. One explanation: the respondents come from a software testing background and may have less insight into what their colleagues in requirements engineering actually do.
The low value for project management is also puzzling. Whether a project is agile or hybrid, project management is a frequent source of problems when project control does not run smoothly. Tilo Linz still sees untapped potential here.
AI in testing today means assistance, not autopilot
In testing, AI is used to generate test data and test cases. A test case is more than just a data set: it also includes the procedure for carrying out the test and the expected result. Both kinds of generation are being tried out, with varying degrees of maturity depending on the company.
One concrete example is the AI-based generation of security tests based on the OWASP criteria catalog. You describe what the application does, supply the current criteria catalog and have test procedures generated for a specific criterion.
The result is not a finished test. It must be revised by the responsible security testing specialist before it can actually run.
This is precisely the crucial point: these systems work as assistants. You must not rely on the AI along the lines of “if it’s green, the test has run”. The AI helps you reach a conclusion more quickly.
“An interesting dialog develops, and you have to pick out the things that help you and discard the things that strike you as strange.” (Tilo Linz)
Where is test automation with AI heading?
The next logical step combines AI-supported generation of test procedures with keyword-based testing. If you already have a building block library in which individual keywords are reliably automated, you can have these building blocks rearranged into procedures again and again by an AI tool.
If this arrangement is combined with the generated test data, the fully automated generation of test procedures is within reach. This is the obvious continuation of today’s building blocks, not their replacement.
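The combination described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a real tool: `FakeApp`, the three keywords and the two procedures are all hypothetical, and the step lists merely stand in for what a generator might propose. The point it shows is that a generated procedure may only use keywords that already exist in the reviewed library, so anything else is rejected before execution.

```python
from typing import Callable, Dict, List, Tuple


class FakeApp:
    """Hypothetical system under test: a tiny in-memory login service."""

    def __init__(self) -> None:
        self.users = {"alice": "secret"}
        self.current_user = None
        self.is_open = False


# Keyword library: each keyword is a small building block automated once.
def open_app(app: FakeApp) -> None:
    app.is_open = True


def login(app: FakeApp, user: str, password: str) -> None:
    assert app.is_open, "app must be opened before login"
    if app.users.get(user) == password:
        app.current_user = user


def expect_user(app: FakeApp, expected: str) -> None:
    # "none" stands for "nobody is logged in".
    actual = app.current_user or "none"
    assert actual == expected, f"expected {expected}, got {actual}"


KEYWORDS: Dict[str, Callable[..., None]] = {
    "open_app": open_app,
    "login": login,
    "expect_user": expect_user,
}

# A step is a keyword name plus its arguments (generated test data).
Step = Tuple[str, List[str]]


def unknown_keywords(steps: List[Step]) -> List[str]:
    """Return keyword names that are not part of the reviewed library."""
    return [name for name, _ in steps if name not in KEYWORDS]


def run_procedure(steps: List[Step]) -> FakeApp:
    """Execute a procedure step by step against a fresh FakeApp."""
    unknown = unknown_keywords(steps)
    if unknown:
        raise ValueError(f"procedure uses unknown keywords: {unknown}")
    app = FakeApp()
    for name, args in steps:
        KEYWORDS[name](app, *args)
    return app


# Procedures as a generator might propose them; still subject to review.
valid_login: List[Step] = [
    ("open_app", []),
    ("login", ["alice", "secret"]),
    ("expect_user", ["alice"]),
]
wrong_password: List[Step] = [
    ("open_app", []),
    ("login", ["alice", "wrong"]),
    ("expect_user", ["none"]),
]
```

Calling `run_procedure(valid_login)` executes the three building blocks in order. The check in `unknown_keywords` is deliberately strict: the generator may only rearrange existing building blocks, which keeps the expert review focused on the order of the steps and the test data.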
Agility is not dead; it has become routine
The claim that agility is dead cannot be confirmed in practice. Surveys by Trends in Testing from 2023 and 2024 show the opposite: agile approaches dominate, with Scrum clearly in the lead. The survey by the German Testing Board shows the same trend.
Kanban follows Scrum. V-model projects still exist, but these are mostly old projects that are no longer worth converting. If something new is started today, it is usually agile.
Agile working has also arrived in critical industries. In aviation and medical technology, the relevant authorities now officially permit an agile approach, provided the necessary safety nets are put in place. The V-model is no longer mandatory, as it used to be.
Anyone who concludes that failed agile projects mean the end of agility is overlooking two things. Firstly, it is fair to ask whether more projects fail today than used to fail under the waterfall model, or significantly fewer. If there are fewer, this speaks in favor of the agile approach. Secondly, projects often fail because agility is only introduced superficially.
Those who implement Scrum superficially and do not really use the necessary techniques and practices muddle through just as they used to in the phase-oriented approach. If the project then fails, it is not due to the model. Blaming it on the phase model or the agile approach won’t get you anywhere.
Testing in agile projects still needs the old test levels
The test levels known from the V-model remain relevant in agile work. Unit testing, integration testing and system testing are still necessary as abstraction levels, just in a different context and with a different weighting.
Tilo Linz pursues this perspective in his book “Testing in agile projects”. It looks at agility primarily from the perspective of the person testing, organized according to these test levels, and explains the associated techniques at each level.
The first edition was published around ten years ago. The current edition has had to incorporate some new material, including DevOps and the tool-driven developments since then. The aim is not to throw overboard what was good in the past, but to reorganize it.
Requirements engineering has the greatest untapped AI potential in the short term
If the low usage in requirements engineering is not a misperception on the part of the testers surveyed, then this is where the greatest potential lies in the short term. Requirements can be classified, evaluated and compared according to many dimensions as soon as they have been recorded.
A simple, effective use case is the recognition of duplicates. Two people formulate the same requirement slightly differently: a known source of error in development.
The tool should not decide for itself which duplicate remains. It works like a code analysis tool: a warning that two similar requirements exist, with a request to check again or merge. AI as a checker and assistant, not as a decision-maker.
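A minimal sketch of such a checker, assuming requirements are available as plain text with IDs. It uses character-level similarity from Python's standard `difflib` module as a simple stand-in for a language model; the requirement texts, the IDs and the threshold of 0.8 are invented for illustration. The function only reports candidate pairs and never deletes or merges anything.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import Dict, List, Tuple


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting does not matter."""
    return " ".join(text.lower().split())


def similarity(a: str, b: str) -> float:
    """Character-level similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def duplicate_warnings(
    requirements: Dict[str, str], threshold: float = 0.8
) -> List[Tuple[str, str, float]]:
    """Return (id_a, id_b, score) for every pair at or above the threshold.

    The result is a list of warnings for a human to review; the function
    does not decide which requirement of a pair should remain.
    """
    warnings = []
    for (id_a, text_a), (id_b, text_b) in combinations(requirements.items(), 2):
        score = similarity(text_a, text_b)
        if score >= threshold:
            warnings.append((id_a, id_b, round(score, 2)))
    return warnings


# Hypothetical requirements, invented for this example.
requirements = {
    "REQ-1": "The system shall lock the account after three failed login attempts.",
    "REQ-2": "The system shall lock the account after 3 failed login attempts.",
    "REQ-3": "Reports can be exported as PDF.",
}

for id_a, id_b, score in duplicate_warnings(requirements):
    print(f"Warning: {id_a} and {id_b} look similar ({score}); check or merge.")
```

Character-level matching only catches near-identical wording. Two requirements that express the same intent in different words would need a semantic comparison, which is where a language model could take over the `similarity` function while the warning workflow stays the same.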
What is scenario-based testing, and why are there two variants?
Scenario-based testing exists in two forms, which should not be confused. The first is the generic term from an ISTQB and GTB perspective: you write down application scenarios like a user story, map process variants A, B, C and check how the test object reacts to them. This is a generalization of the known test procedures and classifications.
The second variant is the special form used for testing autonomous vehicles. Here, the scenario is a traffic scenario, such as a modeled intersection with pedestrians, traffic lights and cars in different directions. The scenario is formally described and loaded into a simulator, and the vehicle’s control system has to negotiate the intersection without crashing.
This second form is what is genuinely new about scenario-based testing. It is much more difficult to handle because many autonomous agents move independently of each other.
The two challenges of scenario-based testing of autonomous vehicles
The first challenge is selection. Which scenario is worth modeling? If a vehicle manages one intersection, this does not mean that it will manage all intersections. The question of whether a scenario is relevant and representative corresponds to an abstract form of equivalence class analysis. Various formalisms and standardization initiatives are working on this.
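The analogy to equivalence class analysis can be made concrete with a small sketch. Everything in it is an assumption made for illustration: the scenario attributes, the class boundaries ("none", "few", "many") and the four example intersections are invented, and real scenario formalisms are far richer. The sketch groups concrete scenarios by an abstract key and keeps one representative per class.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class IntersectionScenario:
    """Hypothetical, heavily simplified description of one intersection."""

    name: str
    has_traffic_light: bool
    pedestrians: int
    crossing_vehicles: int


def density_class(count: int) -> str:
    """Map a concrete count to an abstract class (boundaries are assumed)."""
    if count == 0:
        return "none"
    return "few" if count <= 3 else "many"


def equivalence_key(s: IntersectionScenario) -> Tuple[bool, str, str]:
    """Scenarios with the same key are treated as equivalent for testing."""
    return (
        s.has_traffic_light,
        density_class(s.pedestrians),
        density_class(s.crossing_vehicles),
    )


def pick_representatives(
    scenarios: List[IntersectionScenario],
) -> List[IntersectionScenario]:
    """Keep the first scenario seen for each equivalence class."""
    representatives: Dict[Tuple[bool, str, str], IntersectionScenario] = {}
    for s in scenarios:
        representatives.setdefault(equivalence_key(s), s)
    return list(representatives.values())


scenarios = [
    IntersectionScenario("A", True, 2, 1),
    IntersectionScenario("B", True, 3, 2),   # same class as A
    IntersectionScenario("C", False, 0, 5),
    IntersectionScenario("D", True, 8, 1),
]

for s in pick_representatives(scenarios):
    print(s.name, equivalence_key(s))
```

The hard part remains the choice of the key: whether "few pedestrians with a traffic light" really behaves as one class for the vehicle under test is exactly the relevance question raised above, and the code cannot answer it.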
The second challenge is the modeling effort. A scenario must be detailed enough to fulfill the purpose of the test and expose the vehicle to exactly the aspects that are to be tested. At the same time, the effort must stay within bounds so that modeling, review and the creation of new versions remain feasible.
Both of these factors make scenario-based testing challenging. This is why it is not yet used as much as originally hoped. Work is being done on recipes, languages and simulators that make the process possible with less effort.
The approach will be relevant beyond cars. It affects every autonomous agent that moves independently, from small robots to delivery drones.


