GenAI in test automation
AI-generated test code that compiles but tests the wrong thing: why writing unit tests yourself and having the business code generated is often the smarter approach.

Generative AI in software testing refers to the targeted use of language models to check test cases, uncover edge cases, generate boilerplate code and perform code reviews in dialog. It brings the greatest benefits when requirements are fed in precisely. Blindly generated unit tests with high code coverage can hide errors in the logic instead of revealing them.
Key Takeaways
- AI-generated unit tests based on existing code do not uncover bugs; they cement them, because the model treats the faulty code as truth.
- Generative AI for test automation works best as an entry aid: it lowers the barrier for testers without deep coding experience by taking over boilerplate and glue code.
- Code completion tools such as Codium know the repository context and suggest code based on your own page objects and helper classes, not on external training data.
- Documentation that is only generated for an AI query and is not read by a human produces a growing layer of unchecked data garbage with no added value.
- The greatest hoped-for contribution of GenAI to software development lies not in speed gains, but in the sustainable reduction of technical debt through better code decisions.
Generative AI has arrived in testing and is here to stay
Generative AI has established itself in software testing and will shape the coming years. Companies that do without it will be slower than the competition. Matthias Zax, test automation engineer at Raiffeisenbank International and engineering coach for several teams, shares this view.
The spectrum ranges from test case design and the generation of BDD-style feature files to support with the actual code. The technology has hardly any hard boundaries, but it does have limitations that you need to be aware of before you use it productively.
It is important to distinguish between sensible use and blind trust. A language model provides output, but validation remains with humans. “When I ask about risks, risks come up. But that doesn’t mean they are risks,” is how Matthias sums up this point.
How AI helps traditional testers get started with automation
The biggest leverage lies where testers lack coding skills. Many come from manual or exploratory testing and are suddenly expected to automate. Since test automation is software development, this leap is difficult for them.
The technical landscape does not make it any easier to get started. CI/CD pipelines, version control with Git, collaboration via pull requests: the overhead is high before a single test case is even run.
This is where AI comes in to provide support. It helps with glue code and boilerplate code, i.e. the framework that brings a locally executable script to life. Matthias describes the model as a constantly available sparring partner that you don’t have to call and that still gives you direct feedback.
Check test cases before you automate them
A well-specified test case can be checked by a language model before automation. The key question: Can this test case be automated at all, is test data missing, are environments missing?
AI is also suitable as a consistency check for the test data itself. In practice, often only the good cases are described, the variant in which the sun is shining and the birds are singing. Edge cases and negative testing are often missing.
There is a real benefit here, provided you ask targeted questions. Simply pasting in a user story and asking for feedback yields only generic answers. Instead, ask specifically: Are all edge cases covered? Can the test data be improved? Which acceptance criteria are still open? This precision pays off most for an entire feature with several acceptance criteria.
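The targeted questions above can be put into a reusable prompt template. The following is a minimal sketch, not a prescribed format: the function name `build_review_prompt`, the wording of the prompt, and the sample story are assumptions for illustration. It only builds the text and does not call any model.

```python
# Review questions taken from the article; the list name is a hypothetical choice.
REVIEW_QUESTIONS = [
    "Are all edge cases covered?",
    "Can the test data be improved?",
    "Which acceptance criteria are still open?",
    "Can this test case be automated at all, or is test data or an environment missing?",
]


def build_review_prompt(user_story: str, acceptance_criteria: list[str]) -> str:
    """Assemble a targeted review prompt instead of a bare 'please give feedback'."""
    criteria = "\n".join(
        f"{number}. {text}" for number, text in enumerate(acceptance_criteria, start=1)
    )
    questions = "\n".join(f"- {question}" for question in REVIEW_QUESTIONS)
    return (
        "Review the following user story as a test analyst.\n\n"
        f"User story:\n{user_story.strip()}\n\n"
        f"Acceptance criteria:\n{criteria}\n\n"
        "Answer each question separately and name the criterion it refers to:\n"
        f"{questions}\n"
    )


if __name__ == "__main__":
    # Hypothetical sample input.
    print(
        build_review_prompt(
            "As a customer, I want to transfer money to another account.",
            ["The amount must be positive.", "The daily limit is enforced."],
        )
    )
```

Numbering the acceptance criteria makes it easier to check the model's answers against the story, which is where the human validation described earlier comes in.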
Confidential data does not belong in an open language model
Copying user data into a public AI is not an option in the financial sector. If you work in a regulated environment, you need a solution where the prompts do not leave the company.
At Raiffeisenbank International, the language model is hosted internally. The model is purchased, but all prompts remain within the company. This means that real data can also be used, at least to some extent, without it leaking to the outside world.
For code work, there are code completion tools that can be hosted internally. Confidentiality is then no longer the issue; what remains is a cost factor. Matthias expects to see a return on investment.
Generated code that compiles is not yet correct
Compilable code is not proof of working code. It can run and still not do what you intended. Those who are not confident programmers easily overlook this gap and consider a successful compilation to be a finished result.
The quality of the models has improved significantly. The early copilots were poor. Matthias even suspects that he was slower with the old versions than without them, because he constantly had to remove or refactor incorrect code suggestions.
Today, the tools also suggest solutions that even an experienced developer did not know about. This makes AI a learning tool in its own right, because it provides an explanation along with the code. A new library, an unknown approach: you read the explanation and take something away with you.
In practice, refactoring is usually necessary. The generated code is a starting point, not an end product.
Why generating unit tests automatically is the wrong way to go
Dumping functions into a language model to generate 100 percent code coverage is one of the worst uses of the technology. The model does achieve the coverage. But if there is a bug in the software, the generated test locks in exactly this bug instead of finding it.
The reason lies in the direction. A test derived from the existing code does not validate whether the code is correct. It only cements what is already there, faulty or not.
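A small sketch makes the direction problem concrete. The requirement, the function `shipping_fee`, and both checks are hypothetical examples, not taken from a real project: the first check mirrors what a test derived from the code would assert, the second is written from the requirement.

```python
def shipping_fee(order_total: float) -> float:
    """Hypothetical requirement: orders of 50.00 or more ship for free, otherwise 4.90.

    The implementation contains a boundary bug: it uses > instead of >=.
    """
    if order_total > 50.00:  # bug: the requirement says >=
        return 0.0
    return 4.90


def check_derived_from_code() -> bool:
    """Asserts whatever the code currently returns, including the buggy boundary."""
    return shipping_fee(50.00) == 4.90 and shipping_fee(50.01) == 0.0


def check_derived_from_requirement() -> bool:
    """Written from the requirement: exactly 50.00 must already ship for free."""
    return shipping_fee(50.00) == 0.0


if __name__ == "__main__":
    print("derived from code passes:       ", check_derived_from_code())         # True
    print("derived from requirement passes:", check_derived_from_requirement())  # False
```

Both checks execute the same lines, so coverage is identical. Only the check written from the requirement fails and thereby reveals the bug.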
Generating tests from the user story is slightly better, but again, an error in the story goes straight into the tests. Matthias prefers the opposite approach: writing the tests himself and having the business code generated.
This results in a vision of the future that takes the principle of test-driven development a step further:
“My idea would be that I write my tests in Playwright and the application builds itself behind the scenes. I can only change my application via the tests.”
This would take testing to a new level because there would be significantly more testing than today and the application would be generated to a large extent. Look and feel, responsiveness and the concrete behavior of an application make this difficult to imagine. But it is not impossible.
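The idea of tests as the only way to change the application can be sketched on a very small scale. This is an illustration under stated assumptions, not Matthias's setup: it uses plain Python instead of Playwright, and `SPEC`, `failures`, `generated_v1` and `generated_v2` are hypothetical names, with the two functions standing in for model output.

```python
from typing import Callable

Slugify = Callable[[str], str]

# Human-written specification as (input, expected output) pairs. It is the
# only source of truth; the implementation is treated as replaceable.
SPEC = [
    ("Hello World", "hello-world"),
    ("  Trim me  ", "trim-me"),
    ("Already-slugged", "already-slugged"),
    ("", ""),
]


def failures(candidate: Slugify) -> list[str]:
    """Run the human-written spec against a candidate and list what fails."""
    problems = []
    for raw, expected in SPEC:
        actual = candidate(raw)
        if actual != expected:
            problems.append(f"{raw!r}: expected {expected!r}, got {actual!r}")
    return problems


# Two stand-ins for generated code (hypothetical model outputs).
def generated_v1(text: str) -> str:
    return text.lower().replace(" ", "-")


def generated_v2(text: str) -> str:
    return "-".join(text.lower().split())


if __name__ == "__main__":
    for candidate in (generated_v1, generated_v2):
        problems = failures(candidate)
        verdict = "accepted" if not problems else f"rejected: {problems}"
        print(candidate.__name__, verdict)
```

A candidate is accepted only when the human-written spec passes. To change the behavior you change `SPEC` and regenerate, never the implementation directly.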
Code completion now knows your context
Modern code completion tools suggest code based on your own repository, not on external data from the web. The AI knows your page objects, your keywords, your helper classes.
This noticeably changes the way you work. The suggestions come from your project, so you don’t have to rewrite functions that already exist. Matthias relies less on a chat interface and more on completion tools directly in the IDE, such as Codium for open source work. He describes this as the next level of the code completion previously known from tools such as IntelliJ.
One anti-pattern is therefore disappearing from repositories: excessive commenting. With the older Copilots, you had to write long comments so that suitable code followed. The good coding practices of the past still apply: write code so that it needs few comments, and comment only the parts that really need it. Otherwise it becomes unreadable.
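The difference can be shown with a short, hypothetical example. Both functions behave identically; the names and the data shape are assumptions for illustration only.

```python
# Comment-driven style: every step is spelled out in prose so that a
# completion tool can guess the next line. The names carry no meaning.

# get the list of users
# loop over the users
# check if the user is active and at least 18
# add the user name to the result
def f(users):
    r = []
    for u in users:
        if u["active"] and u["age"] >= 18:
            r.append(u["name"])
    return r


# Self-explanatory style: names carry the meaning, and the single comment
# records a decision that the code alone cannot express.
ADULT_AGE = 18


def names_of_active_adults(users: list[dict]) -> list[str]:
    # The age check is inclusive: someone who turns 18 today counts as an adult.
    return [
        user["name"]
        for user in users
        if user["active"] and user["age"] >= ADULT_AGE
    ]
```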
Generated documentation that nobody reads is an anti-pattern
Generating documentation that nobody reads does not solve a problem, but creates a new one. The pattern: someone copies ten user stories into a model and has it write a fifteen-page test strategy, hallucinated gaps included.
The result is a soup of generated text that nobody reads, but which is now searchable. When companies then bring their wikis and repositories into the context of a language model in order to search through them, the AI ends up searching through its own unread output. A self-contained system without substance.
There are useful variants. Readme files are easy to generate because the AI can derive what belongs in them from the Git repository. You then only rework the structure and layout. Matthias uses this himself.
Even in regulated industries, nothing justifies a 200-page test manual. The regulator does not prescribe 200 pages, but requires that you document how you test. Clear templates with the most important information are sufficient: an architecture diagram that clearly marks what is tested, what is not, and what is external. What nobody reads helps nobody.
Get a better grip on technical debt
For Matthias, the greatest promise of AI lies in dealing with technical debt. Applications are supposed to be long-lasting, but often can no longer be developed further because nobody knows how they were built and upgrades become impossible.
Legacy often arises at a specific point. Something becomes legacy as soon as the developers who programmed it leave the company. After that, nobody can maintain it anymore.
Static code analysis has helped to make problems visible in recent years. The usual reaction, setting up dedicated refactoring sprints, is better than nothing, but not a good state of affairs. Even so, technical debt continues to grow.
The hope is that AI will make engineering work better instead of just speeding it up. The wrong direction would be to downsize teams because they are faster with AI tools. The productive direction: less duplicated code, no classes with 2,000 lines, documented decisions where they count, scrutinized architecture and a modern structure for modern applications, whether microservices or a clean modular monolith.
Understanding regular expressions without memorizing them
One specific everyday benefit deserves its own mention: regular expressions. You enter a regex and ask what it means, or you have it generated from a description.
It’s these little things that reduce daily friction. You no longer have to worry about syntax that you rarely need and never quite remember. Translating something into human language and generating it from human language is the part that makes everyday work noticeably easier.
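As an illustration of translating a regex into human language, Python's `re.VERBOSE` flag allows the explanation to live next to the pattern. The date pattern below is a hypothetical example chosen for this sketch, not one from the interview.

```python
import re

# Compact form, as it would typically appear in a code base.
ISO_DATE_COMPACT = re.compile(r"^(\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$")

# The same pattern with the explanation written as inline comments.
ISO_DATE_EXPLAINED = re.compile(
    r"""
    ^                        # start of the string
    (\d{4})                  # year: exactly four digits
    -                        # literal hyphen
    (0[1-9]|1[0-2])          # month: 01 to 12
    -                        # literal hyphen
    (0[1-9]|[12]\d|3[01])    # day: 01 to 31 (no per-month validation)
    $                        # end of the string
    """,
    re.VERBOSE,
)

if __name__ == "__main__":
    for sample in ["2026-06-02", "2026-13-01", "2026-02-31", "26-06-02"]:
        compact = bool(ISO_DATE_COMPACT.match(sample))
        explained = bool(ISO_DATE_EXPLAINED.match(sample))
        print(sample, compact, explained)
```

Note that the pattern accepts `2026-02-31`, because it checks only the format and not the calendar. An explanation, generated or written by hand, should surface exactly this kind of limit, and it still needs to be verified against real samples.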
Frequently Asked Questions
Generative AI test automation significantly improves the quality and efficiency of software tests. It enables the automatic creation of test cases based on the application code and thus increases coverage. Intelligent error analysis reduces the time required for debugging. The generation of test scenarios in real time supports agile development processes, making it possible to react more quickly to change requests. Overall, generative AI leads to faster test cycles and higher software quality.
Test automation tools that use generative AI technologies include Testim.io, mabl and Functionize. These platforms use AI to create, optimize and automatically adapt tests. Generative AI test automation improves efficiency by reducing repetitive tasks and suggesting intelligent test strategies. In addition, these platforms often offer error analysis and reporting functions that promote the quality and speed of software development.
Several specific challenges arise when implementing generative AI in test automation. Firstly, AI requires a comprehensive database in order to generate effective test scenarios. Secondly, the algorithms must be regularly adapted in order to adequately test new software versions. Thirdly, there is a risk that generated tests are inaccurate or irrelevant, which can lead to erroneous results. Finally, integration into existing test frameworks is often technically complex and can require additional resources.
Generative AI test automation can significantly optimize regression tests by automatically creating test cases based on changes in the code. It analyzes the application and identifies relevant functions that need to be tested. It can also generate test data and adapt existing tests to ensure better coverage. This saves development teams time and resources while increasing test quality. Companies can react more quickly to changes and improve the stability of their software.
Generative AI test automation improves the process by converting natural language commands into testable scripts. Natural Language Processing (NLP) allows testers to formulate requirements in clear language, which are then automatically translated into test scenarios. This reduces the effort required for manual test definitions and increases efficiency while promoting accessibility for less technical team members. This makes the entire testing process faster and more flexible.
Generative AI test automation significantly improves error analysis and reporting. It can automatically generate test data, create test scenarios and recognize patterns in errors. By analyzing test results, the AI identifies common problems and suggests solutions. It also creates structured reports that are easy to understand and provide important insights quickly. This process saves time and increases the efficiency of troubleshooting, which ultimately improves the quality of software products.
In generative AI test automation, models such as GPT (Generative Pre-trained Transformer) and BERT (Bidirectional Encoder Representations from Transformers) are the most common. They improve the test process by automatically generating test cases, analyzing error reports and optimizing regression tests. This increases efficiency, reduces manual effort and enables faster releases. They also help to expand test coverage and ensure the quality of the software.
Generative AI test automation can significantly improve manual tests by automatically creating and adapting test scenarios. It analyzes code changes and user behavior to generate targeted test cases. This increases test coverage and shortens test duration. It also allows developers to focus on more complex tasks while repetitive tests are automated. This technology enables errors to be identified more quickly and increases software quality.
Generative AI test automation is revolutionizing software development by automating the creation and maintenance of test cases. It speeds up the testing process, improves test coverage and reduces human error. Through intelligent analysis, it can automatically generate relevant tests based on the latest changes in the code. It also enables faster identification of problems and reduces the cost of testing. Overall, generative AI significantly increases the efficiency and quality of software development.
Generative AI plays a crucial role in test automation by automatically generating and adapting test scenarios. This significantly reduces the effort required to write test cases. It can also use machine learning to recognize patterns in software changes and make recommendations for tests, thereby improving test coverage. Generative AI test automation increases efficiency, speeds up the testing process and minimizes human error, resulting in faster and more reliable software deployments.