Generative AI in software testing refers to the targeted use of language models to check test cases, uncover edge cases, generate boilerplate code and review code in conversation. It delivers the greatest benefit when it is given precise requirements. Blindly generated unit tests with high code coverage can hide errors in the logic instead of revealing them.
Key Takeaways
- AI-generated unit tests based on existing code do not uncover bugs; they cement them, because the model treats the faulty code as the truth.
- Generative AI for test automation works best as an entry-level tool: it lowers the barrier for testers without deep coding experience by taking over boilerplate and glue code.
- Code completion tools such as Codium know the repository context and suggest code based on the project’s own page objects and helper classes rather than on generic code from the web.
- Documentation that is generated only to be queried by an AI and never read by a human produces a growing layer of unchecked text with no added value.
- The greatest hope for GenAI in software development lies not in speed gains but in the sustainable reduction of technical debt through better code decisions.
Generative AI has arrived in testing and is here to stay
Generative AI has established itself in software testing and will shape the coming years. Companies that do without it will be slower than the competition. Matthias Zax, test automation engineer at Raiffeisenbank International and engineering coach for several teams, shares this view.
The spectrum ranges from test case design and the generation of BDD-style feature files to help with the actual code. There are hardly any fixed boundaries on where the technology can be applied. It does, however, have limitations that you need to be aware of before you use it productively.
It is important to distinguish between sensible use and blind trust. A language model provides output, but validation remains with humans. “When I ask about risks, risks come up. But that doesn’t mean they are risks,” is how Matthias sums up this point.
How AI helps traditional testers get started with automation
AI offers the biggest leverage where testers lack coding skills. Many come from manual or exploratory testing and are suddenly expected to automate. Because test automation is software development, this leap is hard for them.
The technical landscape does not make it any easier to get started. CI/CD pipelines, version control with Git, collaboration via pull requests: the overhead is high before a single test case is even run.
This is where AI comes in. It helps with glue code and boilerplate code, i.e. the scaffolding that turns a script into something you can actually run locally. Matthias describes the model as an always-available sparring partner: you don’t have to call anyone, and you still get direct feedback.
Check test cases before you automate them
A well-specified test case can be checked by a language model before it is automated. The key questions: Can this test case be automated at all? Is test data missing? Are test environments missing?
AI is also suitable as a consistency check for the test data itself. In practice, often only the happy path is described, the variant in which the sun is shining and the birds are singing. Edge cases and negative tests are frequently missing.
The benefit is real, but only if you ask specific questions. Simply copying in a user story and asking for feedback yields generic answers. Instead, ask precisely: Are all edge cases covered? Can the test data be improved? Which acceptance criteria are still open? This precision pays off across an entire feature with several acceptance criteria.
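A sketch of what asking specifically can turn up, assuming a hypothetical acceptance criterion (“a transfer amount must be between 0.01 and 10,000.00, with at most two decimal places”) and a hypothetical validator `is_valid_amount`. The case table is the part a model can help draft; deciding which expected values are correct stays with the tester.

```python
from decimal import Decimal, InvalidOperation

# Hypothetical acceptance criterion: 0.01 <= amount <= 10000.00,
# with at most two decimal places.
MIN_AMOUNT = Decimal("0.01")
MAX_AMOUNT = Decimal("10000.00")


def is_valid_amount(raw: str) -> bool:
    """Return True if the raw input satisfies the hypothetical criterion."""
    try:
        amount = Decimal(raw)
    except InvalidOperation:
        return False  # not a number at all, e.g. "abc" or ""
    if not amount.is_finite():
        return False  # rejects "NaN" and "Infinity" before any comparison
    if not (MIN_AMOUNT <= amount <= MAX_AMOUNT):
        return False
    # Rounding to cents must not change the value.
    return amount == amount.quantize(Decimal("0.01"))


# Happy path plus the boundary and negative cases that are often missing.
CASES = [
    ("250.00", True),     # happy path
    ("0.01", True),       # lower boundary
    ("10000.00", True),   # upper boundary
    ("0.00", False),      # just below the lower boundary
    ("10000.01", False),  # just above the upper boundary
    ("-5.00", False),     # negative amount
    ("1.005", False),     # too many decimal places
    ("abc", False),       # not a number
    ("", False),          # empty input
]

failures = [(raw, want) for raw, want in CASES if is_valid_amount(raw) != want]
print("failures:", failures)
```

Only the first row describes the sunny-day variant; every other row is a boundary or a negative case of the kind a targeted question should surface.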
Confidential data does not belong in a public language model
Copying user data into a public AI is not an option in the financial sector. If you work in a regulated environment, you need a solution where the prompts do not leave the company.
At Raiffeisenbank International, the language model is used internally: it is bought in from a vendor, but all prompts remain within the company. This means that real data can be used, at least to some extent, without leaking to the outside world.
For coding work, there are code completion tools that can be hosted internally. Confidentiality then stops being an issue and becomes purely a cost factor. Matthias expects that investment to pay off.
Generated code that compiles is not yet correct
Compilable code is not proof of working code. It can run and still not do what you intended. Those who are not confident programmers easily overlook this gap and consider a successful compilation to be a finished result.
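A small illustration of the gap, using a hypothetical helper that picks the newest release from a list of version strings. Python is not compiled ahead of time, so the analogue of “it compiles” is “it runs without an error”; the naive version does both and still returns the wrong answer. The sketch assumes purely numeric dotted versions such as `1.10`, with no suffixes like `-rc1`.

```python
def latest_version_naive(versions: list[str]) -> str:
    # Runs without error, but compares strings character by character,
    # so "1.9" sorts after "1.10".
    return max(versions)


def latest_version(versions: list[str]) -> str:
    # Compares numeric components, so (1, 10) is greater than (1, 9).
    return max(versions, key=lambda v: tuple(int(part) for part in v.split(".")))


releases = ["1.2", "1.9", "1.10"]
print(latest_version_naive(releases))  # 1.9
print(latest_version(releases))        # 1.10
```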
The quality of the models has improved significantly. The early copilots were poor. Matthias even suspects that he was slower with the old versions than without them, because he constantly had to remove or refactor incorrect code suggestions.
Today, the tools also suggest solutions that even experienced developers didn’t know about. This makes AI a learning tool in its own right, because it can explain the code it suggests. A new library, an unfamiliar approach: you read the explanation and take something away with you.
In practice, refactoring is usually necessary. The generated code is a starting point, not an end product.
Why generating unit tests automatically is the wrong way to go
Pasting functions into a language model and having it generate tests until coverage reaches 100 percent is one of the worst possible applications. The model achieves the coverage. But if there is a bug in the software, the generated test enshrines exactly that bug instead of finding it.
The reason lies in the direction. A test derived from the existing code does not validate whether the code is correct. It only cements what is already there, faulty or not.
Generating tests from the user story is slightly better, but again, an error in the story goes straight into the tests. Matthias prefers the opposite approach: writing the tests himself and having the business code generated.
This results in a vision of the future that takes the principle of test-driven development a step further:
My idea would be that I write my tests in Playwright and the application builds itself behind the scenes. I can only change my application via the tests.
This would take testing to a new level, because there would be significantly more testing than today and the application would largely be generated. Look and feel, responsiveness and the concrete behavior of an application make this hard to imagine, but not impossible.
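The principle behind this vision can be sketched in a few lines: the human-written tests are the only specification, and any implementation, handwritten or generated, is accepted only if it passes them. In the vision described above those tests would be Playwright end-to-end tests; plain Python assertions stand in for them here, and the IBAN-formatting feature, `spec_tests`, `accept` and the two candidates are all hypothetical. The sketch shows only the acceptance gate, not how an application would be generated.

```python
from typing import Callable


# Hypothetical feature, specified only by these tests:
# format an IBAN in groups of four characters.
def spec_tests(format_iban: Callable[[str], str]) -> None:
    assert format_iban("AT611904300234573201") == "AT61 1904 3002 3457 3201"
    assert format_iban("at61 1904 3002 3457 3201") == "AT61 1904 3002 3457 3201"
    assert format_iban("") == ""


def accept(candidate: Callable[[str], str]) -> bool:
    """Accept an implementation only if it passes every human-written test."""
    try:
        spec_tests(candidate)
    except AssertionError:
        return False
    return True


# Two candidate implementations, e.g. produced by a code generator.
def candidate_a(raw: str) -> str:
    compact = raw.replace(" ", "").upper()
    return " ".join(compact[i:i + 4] for i in range(0, len(compact), 4))


def candidate_b(raw: str) -> str:
    # Forgets to strip existing spaces and normalize case.
    return " ".join(raw[i:i + 4] for i in range(0, len(raw), 4))


print(accept(candidate_a), accept(candidate_b))  # True False
```

Changing the behavior then means changing `spec_tests`; the implementations are interchangeable as long as they pass.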
Code completion knows your context today
Modern code completion tools suggest code based on your own repository, not on external data from the web. The AI knows your page objects, your keywords, your helper classes.
This noticeably changes how you work. The suggestions come from your own project, so you don’t end up rewriting functions that already exist. Matthias uses chat interfaces less and completion tools directly in the IDE more, for example Codium for open source work. He describes this as the next level of the code completion familiar from IDEs such as IntelliJ.
As a result, one anti-pattern is disappearing from repositories: excessive commenting. With the older copilots, you had to write long comments to get the right code to follow. The good coding practices of the past still apply: write code so that it needs few comments, and comment only the parts that really need it. Otherwise the code becomes unreadable.
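To make “your page objects” concrete: a page object bundles the selectors and actions for one screen, so tests call `login(...)` instead of repeating selectors. A repository-aware completion tool that has seen a class like the hypothetical `LoginPage` below is more likely to suggest reusing it than to write the selectors out again. `FakePage` is a stand-in that only records actions; its `fill` and `click` methods mirror the names of Playwright’s page methods, but nothing here drives a real browser.

```python
from dataclasses import dataclass, field


@dataclass
class FakePage:
    """Stand-in for a browser page: records actions instead of performing them."""
    actions: list[str] = field(default_factory=list)

    def fill(self, selector: str, value: str) -> None:
        self.actions.append(f"fill {selector}={value}")

    def click(self, selector: str) -> None:
        self.actions.append(f"click {selector}")


class LoginPage:
    """Page object: the single place that knows the login form's selectors."""

    USERNAME = "#username"
    PASSWORD = "#password"
    SUBMIT = "button[type=submit]"

    def __init__(self, page: FakePage) -> None:
        self.page = page

    def login(self, user: str, password: str) -> None:
        self.page.fill(self.USERNAME, user)
        self.page.fill(self.PASSWORD, password)
        self.page.click(self.SUBMIT)


# A new test reuses the page object instead of repeating selectors.
page = FakePage()
LoginPage(page).login("alice", "secret")
print(page.actions)
```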
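A hypothetical contrast between the two styles: the first function relies on a prompt-like comment to say what it does, the second lets names carry the meaning and keeps one comment for the decision that is not obvious from the code. The function names and the “summary view” are illustrative only.

```python
# Prompt-style comment, written to steer an older copilot:
# take a list of transactions, keep only the negative ones,
# add up their amounts and return the total as a positive number.
def calc(t):
    return -sum(x for x in t if x < 0)


def total_outgoing(amounts: list[float]) -> float:
    outgoing = [amount for amount in amounts if amount < 0]
    # Outgoing money is reported as a positive figure in the (hypothetical) summary view.
    return -sum(outgoing)


print(total_outgoing([100.0, -20.0, -5.5]))  # 25.5
```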
Generated documentation that nobody reads is an anti-pattern
Generating documentation that nobody reads does not solve a problem; it creates a new one. The typical pattern: someone copies ten user stories into a model and has it write a fifteen-page test strategy, hallucinations included.
The result is a soup of generated text that nobody reads but that is now searchable. When companies then feed their wikis and repositories into a language model’s context in order to search them, the AI ends up searching its own unread output: a closed loop without substance.
There are useful variants. README files are easy to generate, because the AI can derive what belongs in them from the Git repository; you then revise the structure and layout yourself. Matthias uses this approach himself.
Even in regulated industries, this does not justify a 200-page test manual. The regulator does not prescribe 200 pages; it requires you to document how you test. Clear templates with the most important information are sufficient: the architecture as a diagram, with clear markings of what is tested, what is not and what is external. What nobody reads helps nobody.
Get a better grip on technical debt
For Matthias, the greatest promise of AI lies in dealing with technical debt. Applications are supposed to be long-lasting, but often can no longer be developed further because nobody knows how they were built and upgrades become impossible.
Code often becomes legacy at a specific moment: as soon as the developers who programmed it leave the company. After that, nobody can work on it anymore.
In recent years, static code analysis has helped make problems visible. The usual response, setting up dedicated refactoring sprints, is better than nothing but not a good state of affairs. And the technical debt keeps growing regardless.
The hope is that AI will make engineering work better instead of just faster. The wrong direction would be to downsize teams because they are faster with AI tools. The productive direction: less duplicated code, no 2,000-line classes, documented decisions where they count, scrutinized architecture and a modern structure for modern applications, whether microservices or a clean modular monolith.
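What “less duplicated code” looks like in the small, using hypothetical payment functions: the same validation rule copied into two entry points, and the same behavior with the rule extracted into one helper. A refactoring like this is the kind of change an AI assistant can propose and a reviewer can check quickly; whether it is the right abstraction remains a human decision.

```python
SUPPORTED_CURRENCIES = {"EUR", "USD"}


# Before: the validation rule is copied into every entry point,
# so a rule change must be found and repeated in each clone.
def create_transfer_cloned(amount: float, currency: str) -> dict:
    if amount <= 0 or currency not in SUPPORTED_CURRENCIES:
        raise ValueError(f"invalid payment: {amount} {currency}")
    return {"kind": "transfer", "amount": amount, "currency": currency}


def create_standing_order_cloned(amount: float, currency: str) -> dict:
    if amount <= 0 or currency not in SUPPORTED_CURRENCIES:
        raise ValueError(f"invalid payment: {amount} {currency}")
    return {"kind": "standing_order", "amount": amount, "currency": currency}


# After: the rule lives in one helper; each entry point states only what differs.
def _payment(kind: str, amount: float, currency: str) -> dict:
    if amount <= 0 or currency not in SUPPORTED_CURRENCIES:
        raise ValueError(f"invalid payment: {amount} {currency}")
    return {"kind": kind, "amount": amount, "currency": currency}


def create_transfer(amount: float, currency: str) -> dict:
    return _payment("transfer", amount, currency)


def create_standing_order(amount: float, currency: str) -> dict:
    return _payment("standing_order", amount, currency)
```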
Understanding regular expressions without memorizing them
One specific everyday benefit deserves its own mention: regular expressions. You enter a regex and ask what it means, or you have it generated from a description.
It’s these little things that reduce daily friction. You no longer have to wrestle with syntax you rarely need and never quite remember. Translating a pattern into plain language, and generating one from plain language, is what makes everyday work noticeably easier.
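A sketch of both directions with Python’s `re` module, assuming a made-up target: dates in `YYYY-MM-DD` form. The compact pattern is what you might paste in and ask about; the `re.VERBOSE` version writes the explanation into the pattern itself, since that flag ignores whitespace and treats `#` as the start of a comment. Whichever direction the AI helped with, a handful of sample strings is a quick way to check that the pattern does what the explanation claims.

```python
import re

# Compact form, as you might paste it into a chat and ask what it means.
ISO_DATE = re.compile(r"^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$")

# The same pattern with its plain-language explanation inline.
ISO_DATE_EXPLAINED = re.compile(
    r"""
    ^\d{4}                    # four-digit year
    -(0[1-9]|1[0-2])          # hyphen, then month 01 to 12
    -(0[1-9]|[12]\d|3[01])    # hyphen, then day 01 to 31
    $                         # end of input (month lengths are NOT checked)
    """,
    re.VERBOSE,
)

# Check the explanation against sample inputs instead of trusting it.
for sample in ["2024-02-29", "2024-13-01", "2024-02-30", "24-01-01"]:
    compact = bool(ISO_DATE.match(sample))
    explained = bool(ISO_DATE_EXPLAINED.match(sample))
    print(sample, compact, explained)
```

The `2024-02-30` sample matching is exactly the kind of gap a quick check surfaces: the pattern validates the format, not the calendar.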


