ChatGPT for testing

ChatGPT can be used as a productivity tool in software testing: It derives test cases from requirements or user manuals, generates test data, creates scripts in languages such as PowerShell or Gherkin and provides test ideas for exploratory testing. It is crucial to formulate prompts precisely and to evaluate the results expertly, as the model also produces erroneous output.

Key Takeaways

ChatGPT is suitable as a creativity booster for exploratory testing: It provides test ideas that you would not have thought of yourself, such as SEO optimization as a test topic for a website.
ChatGPT derives test cases, test topics and executable scripts in seconds from requirements, user manuals or simple descriptions, including predefined output formats and table structures.
Anyone using ChatGPT needs the expertise to recognize nonsense: the tool sometimes expands acronyms incorrectly and hallucinates content without flagging it as such.
Data privacy and governance are the decisive bottleneck for productive project use: in some companies, use is already prohibited because it is unclear who owns the data fed in and the results generated.
Prompting techniques such as putting the model in the role of an expert or pitting several perspectives against each other noticeably improve the results if the first attempt is not sufficient.

ChatGPT as a tool for testers: getting started with curiosity instead of theory

ChatGPT is best approached for testing tasks from the user side, not from the development of neural networks. If you want to understand the tool, you don’t need data science knowledge. What you need is the willingness to try it out and critically evaluate the results.

Klaudia Dussa-Zieger and Michael Heller approach the topic with three questions: what is roughly going on in the background, how does the model behave, and what can it be used for in concrete terms. ChatGPT is a language model. It generates the next piece of text from probabilities computed across many layers. Knowing these mechanics helps to put the tool in context, but is no substitute for experimentation.

A useful image to characterize it comes from a test by Bayerischer Rundfunk: ChatGPT passed the Bavarian Abitur with grades in the 3.5 to 4.0 range (on the German scale, where 1 is the best grade and 4 is still a pass). So it’s not a top student, but a useful one. This classification is important because it calibrates expectations.

Why ChatGPT is a language model for testers on multiple levels

ChatGPT not only speaks natural language, but also programming languages and formats. Understanding this opens up many more applications than writing poetry or travel plans.

The model generates Python as well as Cucumber code or Gherkin templates. It returns test cases in predefined table formats with pre- and postconditions, if the prompt so requires. You can control the output form very precisely, from the column structure to the level of detail.
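One way to make the output form explicit is to build the prompt from a fixed column list. The following is a minimal sketch under stated assumptions: the function name, the column names and the sample requirement are hypothetical and only illustrate how a table structure can be requested.

```python
def build_test_case_prompt(requirement: str, columns: list[str]) -> str:
    """Return a prompt that asks for test cases in a fixed table layout."""
    header = " | ".join(columns)
    return (
        "Derive test cases from the following requirement.\n"
        f"Requirement: {requirement}\n"
        "Return the result as a table with exactly these columns:\n"
        f"| {header} |\n"
        "Keep each step to one sentence."
    )


# Hypothetical column structure with pre- and postconditions.
COLUMNS = ["ID", "Precondition", "Steps", "Expected result", "Postcondition"]

prompt = build_test_case_prompt(
    "A user can reset the password via an e-mail link.", COLUMNS
)
print(prompt)
```

The printed text can be pasted into the chat as is. Changing the column list changes the requested table without touching the rest of the prompt.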

Even emojis are a language in this sense. In an exploratory test strategy, a smiley can add a rating layer that would otherwise have to be added by hand. The effort is only worthwhile in certain contexts, but the possibility shows how broadly the term language is to be understood here.

The practical benefit lies in the speed. ChatGPT writes down a series of test cases in one to three minutes that would take longer just to type.

What ChatGPT can be used for in everyday testing

ChatGPT covers a wide range of test tasks, from deriving test cases to preparing the test environment. The following applications have proven themselves in practice:

**Deriving test cases from requirements** An extensive collection of test cases and test ideas is quickly created from a requirements specification.
**Test cases without requirements** Test cases can also be derived from a user manual. This works surprisingly well.
**Test specifications to executable scripts** The chain extends from concept to specification to executable code.
**Test plans** A first basic framework for a test plan is quickly created.
**Test data** If you need a hundred personal data records with certain properties, such as place of residence and marital status, you get them in seconds.
**Auxiliary scripts for the environment** PowerShell, Docker scripts and similar small tools can be generated, even in languages that the tester himself does not speak.
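Generated test data should be checked before it is used. The following is a minimal sketch of such a check, assuming the records come back as CSV text. The column names `residence` and `marital_status`, the allowed values and the sample rows are assumptions made for illustration.

```python
import csv
import io

# Assumed set of valid values; adjust to the properties asked for in the prompt.
ALLOWED_MARITAL_STATUS = {"single", "married", "divorced", "widowed"}


def check_records(csv_text: str, expected_count: int) -> list[str]:
    """Return a list of problems found in generated person records."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    problems = []
    if len(rows) != expected_count:
        problems.append(f"expected {expected_count} rows, got {len(rows)}")
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        if not (row.get("residence") or "").strip():
            problems.append(f"line {line_no}: residence is empty")
        if (row.get("marital_status") or "") not in ALLOWED_MARITAL_STATUS:
            problems.append(f"line {line_no}: unknown marital status")
    return problems


# Made-up sample output with two deliberate defects.
sample = """name,residence,marital_status
Anna Example,Munich,married
Ben Sample,,single
Cara Test,Hamburg,engaged
"""

for problem in check_records(sample, expected_count=3):
    print(problem)
```

For the sample above, the check reports the empty residence on line 3 and the unknown marital status on line 4.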

Regular expressions are an example of the speed gain. If you are not familiar with them, you can have an existing expression explained or create a new one from a description. A recurring hurdle disappears.
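A generated expression is easiest to trust once it has been run against examples that must match and examples that must not. The following is a minimal sketch, assuming the description was "a date in the form YYYY-MM-DD". The pattern is a hypothetical answer, not actual model output.

```python
import re

# Hypothetical expression for "a date in the form YYYY-MM-DD".
# It checks the shape and the value ranges, not calendar validity
# (2024-02-30 would still be accepted).
DATE_PATTERN = re.compile(r"\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])")

should_match = ["2024-01-31", "1999-12-01"]
should_not_match = ["2024-13-01", "2024-1-5", "24-01-31", "2024-01-32"]


def check_pattern(pattern, positives, negatives):
    """Return the samples for which the pattern behaves unexpectedly."""
    wrong = [s for s in positives if not pattern.fullmatch(s)]
    wrong += [s for s in negatives if pattern.fullmatch(s)]
    return wrong


print(check_pattern(DATE_PATTERN, should_match, should_not_match))  # prints []
```

An empty list means the expression behaves as described for the chosen samples. The negative samples are where a tester's own expertise goes in.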

The entry threshold also shifts when learning foreign programming languages. If you have a good feel for what should work technically, you can leave the actual implementation in PowerShell or a container script to ChatGPT without having to learn the language beforehand.

Exploratory testing benefits the most

ChatGPT’s greatest strength in testing lies in exploratory testing, where it acts as a creativity booster. Exploratory testing is not about whether a single assumption is correct, but whether the search for critical points was creative enough.

ChatGPT expands your own thinking to include ideas that you would not have come up with on your own. An example: when testing topics for a website, the model suggests SEO optimization, an aspect that easily slips through the cracks in the classic division into functional and non-functional.

The model is also familiar with the literature on test tours. If you point it at a specific one, such as a FedEx tour for a gaming headset, it derives test ideas from this context. It doesn’t think, but it encourages creativity.

If you formulate the topic abstractly enough, you can also avoid data protection problems. Generating a general list of ideas for a problem is not critical as long as no project-specific content is included.

The weakness: ChatGPT also tells false stories convincingly

ChatGPT sometimes delivers nonsense in the same confident tone as correct answers. The key skill in using the tool is therefore to distinguish the right from the wrong.

One example is technical abbreviations. Asked about a technology stack, the model resolved LGTM as “Looks good to me”, a common expansion in code reviews but not what the abbreviation stood for in that stack. In the specific context, this was harmless, but it shows the pattern: the answer sounds plausible, but it is wrong.

This leads to a clear way of working. You don’t accept the results unchecked, but look at them with a critical eye. You think about how you ask, what you ask and how to classify the answer. The first draft is a starting point, not a final product.

“I think you can use it really well, but you have to be able to look at it again and then really sort it out.” (Klaudia Dussa-Zieger)

How to proceed with prompting if the first attempt doesn’t work

Sensible work with ChatGPT is time-limited and curiosity-driven. You try as long as it is faster or better than the manual work, and stop if it doesn’t get any better.

If the first prompt does not produce a suitable result, a few tried and tested techniques can help. You can put the model into an operating mode and give it a role. One variant is to have several expert roles compete against each other and approach the result from different directions.
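The two techniques can be sketched as plain prompt construction. The following is a minimal sketch: the role descriptions, function names and placeholder answers are hypothetical, and no model API is called. The texts would be pasted into the chat by hand.

```python
# Hypothetical expert roles; each intro puts the model into one role.
ROLES = {
    "security tester": "You are an experienced security tester.",
    "usability expert": "You are a usability expert.",
    "domain expert": "You are a domain expert for online shops.",
}


def role_prompts(task: str, roles: dict[str, str]) -> dict[str, str]:
    """Build one prompt per role for the same task."""
    return {name: f"{intro}\n{task}" for name, intro in roles.items()}


def debate_prompt(task: str, answers: dict[str, str]) -> str:
    """Build a follow-up prompt that lets the roles challenge each other."""
    parts = [f"Task: {task}", "Here are answers from different experts:"]
    for name, answer in answers.items():
        parts.append(f"[{name}] {answer}")
    parts.append(
        "Compare the answers, point out contradictions and gaps, "
        "and produce one consolidated list of test ideas."
    )
    return "\n".join(parts)


task = "List test ideas for the checkout page of an online shop."
prompts = role_prompts(task, ROLES)

# Placeholder answers; in practice these are the model's replies per role.
answers = {name: f"<answer to the {name} prompt>" for name in prompts}
print(debate_prompt(task, answers))
```

The first function covers the single-role variant. The second feeds the collected answers back so that the perspectives are weighed against each other.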

The behavior of the model is emergent. No one can predict what will come out of a particular formulation. This is precisely why experimentation is not a gimmick, but the way to build intuition: over time, you develop a sense of whether an attempt is worth making.

The measure of efficiency is simple. If the tool predominantly delivers faster or better results in the cases in which you use it, then it was worth using it. A single failure does not outweigh this.

Governance determines whether ChatGPT is successful in the project

The biggest obstacle to productive use is not the technology, but the governance. Where is the data stored, who owns the results, who is allowed to see them? As long as these questions remain unanswered, the use of real project data remains tricky.

Feeding in project-specific data in order to obtain customized results is therefore a sensitive step. The approach so far has been a generic one, from the general case to the specific website, without revealing internal content.

In practice, companies handle this very differently. Some ban ChatGPT completely. As soon as someone pastes code into the tool for troubleshooting, they are at the very least in a gray area, and the temptation is real, because the model speaks all programming languages well enough.

The first companies are beginning to set up their own environments in which ChatGPT can also be used with internal data. As soon as this is widespread, the next question arises: whether consistent prompting can be set up for larger, related tasks. Once governance has been resolved, this will be a real efficiency boost.

ChatGPT has opened up the market, but is not the whole of AI

The low barrier to entry explains ChatGPT’s success, but obscures the view of other AI solutions that have long made sense in testing. ChatGPT triggered the hype because it can be used without any technical hurdles.

For testers, there is AI beyond the chatbot. Object recognition for test automation is a separate process that must reliably recognize an object. Such solutions existed before ChatGPT and they made sense before that too.

One hope is therefore that the attention generated by ChatGPT will also spill over to these areas. The comparison of different models, such as ChatGPT versus Bard, is a first step away from the fixation on a single tool.

If you use the paid version, you get a particularly useful ability with code execution in chat. An instruction such as “create an MP3 file with two sine tones for a stereo headphone test” results directly in the finished file. This proximity of normal language to programming makes AI a leveler: the effort required for tasks that you do not master yourself is significantly reduced.
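To give an impression of the kind of code behind such an instruction, here is a minimal sketch using only the Python standard library. It is not what ChatGPT actually runs. It writes WAV rather than MP3, because MP3 encoding needs an external encoder, and the frequencies, duration and function name are assumptions.

```python
import io
import math
import struct
import wave


def write_stereo_tones(target, left_hz=440.0, right_hz=880.0,
                       seconds=2.0, rate=44100):
    """Write a 16-bit stereo WAV with one sine tone per channel."""
    n_frames = int(seconds * rate)
    amplitude = 0.5 * 32767  # half of full scale to avoid clipping
    frames = bytearray()
    for i in range(n_frames):
        t = i / rate
        left = int(amplitude * math.sin(2 * math.pi * left_hz * t))
        right = int(amplitude * math.sin(2 * math.pi * right_hz * t))
        frames += struct.pack("<hh", left, right)  # interleaved left, right
    with wave.open(target, "wb") as wav:
        wav.setnchannels(2)
        wav.setsampwidth(2)  # 2 bytes = 16 bit
        wav.setframerate(rate)
        wav.writeframes(bytes(frames))
    return n_frames


# Write one second into memory; pass a file name instead to write to disk.
buffer = io.BytesIO()
frames_written = write_stereo_tones(buffer, seconds=1.0)
print(frames_written, "frames written")
```

Calling `write_stereo_tones("headphone_test.wav")` would produce a file in which the left and right ear hear different pitches, which makes swapped channels easy to spot.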

How to get started: don’t be shy, and not necessarily with testing

The best way to get started is to simply get going and not be afraid to try things out. You don’t have to start with test tasks. Birthday poems or other harmless tasks are enough to get a feel for the tool.

What is important about this feeling is the tension between ease and quality of results. You experience how quickly an initial result is achieved and how much reworking is needed to refine it. Only then is it worth moving on to more serious tasks.

Specifically, you start with an account. A free OpenAI account avoids the additional prompting that other interfaces entail. Try out what you enjoy most, then read a short research paper on a few prompting techniques. If you keep at it, you won’t lose touch with the technology.