Software testing by and with AI 

 18 February 2021


Artificial intelligence (AI) and software testing are two important topics in today's software and system development. Applying each to the other holds the potential for enormous synergies.

Although artificial intelligence has been a research topic for decades, it has enjoyed a highly publicized triumph in recent years. All tasks seem solvable, all human intelligence dispensable, all possible consequences controllable. There are many convincing demonstrations. One example is AlphaGo, an AI player that beat Lee Sedol, one of the world's best Go players. Go, by the way, is considerably more complex than chess, so its gameplay is far harder to predict. Other AI applications can reliably recognize the contents of images. This opens up a wide variety of applications, from the early diagnosis of dangerous diseases to the surveillance of public spaces. But is it really that simple?


Of course, the results presented seem very convincing. But is the path the AI takes to reach a result always as sound as it appears? Research has shown that some images were classified as horse pictures not because of the horses actually depicted, but because of the patch of forest present in many horse photos. Others were classified by the signature of the photographer (who often takes pictures of horses). Thus some prematurely celebrated examples disenchanted the miracle of AI. They recalled the horse "Kluger Hans" ("Clever Hans"), which could only appear to count.


In addition, failures such as the accident involving an Uber vehicle were exploited by the media, so that autonomous vehicles were soon portrayed as a danger. It is easy to overlook the fact that in Brandenburg alone, an average of two to three people are killed in traffic accidents every week. Here, even a less-than-perfect AI could offer real advantages. But other questions lie behind this as well. This technology is thus sometimes exaggerated and sometimes condemned before the connections are clear.

For better or worse, I see an exaggeration here. Despite all the hype, AI has a lot of potential, including in safety-critical applications. The prerequisite, of course, is that this technology can be properly assured.

In this context, a number of questions arise. In the following, I will raise some of them on different subjects and thus offer an introduction. On the deeper questions, science has been busy for decades.

Evaluation of the AI

First, statistics plays a major role here and is used for internal evaluation of situations, images, etc. Using the confusion matrix, prediction and reality can be evaluated for binary classifiers: What is predicted correctly? Where and how does the AI err?

There are various metrics for evaluating these results, for example the harmonic mean of precision and sensitivity (recall), also called the F1 score. In any case, it is clear that the weight attached to each kind of error varies by domain. For example, a misdiagnosis asserting a tumour that is not actually present is unpleasant but survivable. The non-recognition of a tumour that does exist, however, is very much a threat to the patient's life expectancy.
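These quantities can be made concrete in a few lines. The following sketch computes the confusion-matrix counts and the F1 score for a binary classifier; the tumour-detector example data and the helper names are illustrative assumptions, not taken from the article.

```python
# Sketch: evaluating a binary classifier via its confusion matrix.
# Labels: 1 = positive (e.g. "tumour present"), 0 = negative.

def confusion_matrix(actual, predicted):
    """Count true/false positives/negatives for binary labels."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    return tp, tn, fp, fn

def f1_score(actual, predicted):
    """Harmonic mean of precision and recall."""
    tp, tn, fp, fn = confusion_matrix(actual, predicted)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical detector: one missed tumour (false negative),
# one false alarm (false positive).
actual    = [1, 1, 1, 0, 0, 0]
predicted = [1, 1, 0, 0, 0, 1]
print(f1_score(actual, predicted))  # precision = 2/3, recall = 2/3 -> F1 = 2/3
```

Note that the F1 score treats precision and recall symmetrically; in the tumour domain one would typically weight recall higher, e.g. with an F-beta score.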

Test know-how

Furthermore, a question naturally arises for the experienced tester: which of the quality-assurance tools he has been familiar with for many years are applicable here?

  • Are white-box testing procedures useful at all, or is this more akin to the still controversial attempts to determine human intelligence?
  • Does it make sense to divide the test into different test stages as we know it from the V-model? For complex systems that hide one or more AI-based algorithms inside, this makes perfect sense. Does it also make sense for machine learning with a lot of intermediate layers? This leads in the direction of implementation explainability.
  • What do we actually look for in the test? Is it just a matter of the algorithm producing better results than its predecessor, or do we subdivide more precisely, into functional and non-functional tests? What about IT security? Even minimal changes to the design of road signs can have an impact, for instance if an autonomous vehicle interprets the "30" on a km/h sign as "80" and drives through town with corresponding momentum. Equally disastrous can be the effects of inconsistent situations, such as a stop sign on the highway.
  • Furthermore, the question arises as to when the self-learning system is actually allowed to learn. Permanently, while in use? If so, a commuter's self-driving vehicle could very soon be trained on the peculiarities of the daily route; everything else would be "forgotten." Or should the AI rather only be allowed to learn during servicing or development? And what limitations does the application domain impose?

AI for the test

On the other hand, another thought naturally tempts us testers: to use the possibilities of artificial intelligence for software testing itself. There are interesting developments in this area as well. One application that stands out is performance testing. Depending on the input data, an AI can detect abnormalities in system behavior and system load. These observations could be used to steer the system ever closer to its load limit, or beyond.
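A minimal version of "detecting abnormalities in system behavior" needs no neural network at all: a simple statistical baseline already flags measurements that deviate strongly from recent history. The sketch below uses a rolling mean and a 3-sigma threshold as a stand-in for a learned model; the window size, threshold, and latency data are illustrative assumptions.

```python
# Sketch: flagging anomalous response times during a load test.
# A rolling mean/stddev with a 3-sigma rule stands in for a learned model.
import statistics

def anomalies(response_times_ms, window=20, threshold=3.0):
    """Return indices whose value deviates more than `threshold` standard
    deviations from the preceding `window` measurements."""
    flagged = []
    for i in range(window, len(response_times_ms)):
        history = response_times_ms[i - window:i]
        mean = statistics.fmean(history)
        stdev = statistics.stdev(history)
        if stdev and abs(response_times_ms[i] - mean) > threshold * stdev:
            flagged.append(i)
    return flagged

# Stable latencies around 100 ms, then a spike as the system nears its limit.
times = [100 + (i % 5) for i in range(30)] + [400]
print(anomalies(times))  # -> [30]
```

In a real load test, such flags could feed back into the load generator, increasing pressure in exactly the regions where the system starts to behave oddly.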

Finding similarities and commonalities can be applied in many other fields: to error messages, test specifications, log files of the test object, the generation of test data from data format descriptions, or test sequences based on code analysis. Another exciting topic is the use of an AI as a test oracle. Here a further question arises: can an AI that serves as a test oracle not also be the system under test at the same time? And can it even do better than the original? The question of limits also arises immediately: which decisions can and do we want to leave to an AI? Some will be reminded of the trolley problem, which is unsolvable even for humans, or at least usually hard to justify: if a fatal accident is unavoidable and one can still influence the outcome, who may live and who must die?
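The "finding similarities among error messages" idea above can be sketched even without machine learning: normalizing the volatile parts of a message (numbers, hex addresses, quoted strings) into placeholders already groups messages that stem from the same fault. The regex patterns and sample log lines below are illustrative assumptions, not from the article.

```python
# Sketch: grouping similar error messages by replacing volatile tokens
# (hex addresses, numbers, quoted strings) with placeholders.
import re
from collections import defaultdict

def template(message):
    """Reduce a message to its structural template."""
    msg = re.sub(r"0x[0-9a-fA-F]+", "<HEX>", message)  # hex first, before digits
    msg = re.sub(r"\d+", "<NUM>", msg)
    msg = re.sub(r"'[^']*'", "<STR>", msg)
    return msg

def cluster(messages):
    """Group messages that share the same template."""
    groups = defaultdict(list)
    for m in messages:
        groups[template(m)].append(m)
    return dict(groups)

logs = [
    "timeout after 30 s connecting to node 7",
    "timeout after 45 s connecting to node 2",
    "segfault at address 0xdeadbeef",
]
for tpl, members in cluster(logs).items():
    print(tpl, len(members))
```

An ML-based approach would generalize this beyond fixed patterns, for example by clustering message embeddings, but the goal is the same: fewer distinct failures for a human to triage.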

These and other thoughts are an introduction to this highly interesting topic. It is also economically very relevant, and many exciting years still lie ahead.

The article was published in the 01/2020 issue of German Testing Magazin.
