Natural intelligence in software testing refers to the human ability to find bugs through curiosity, intuition and contextual knowledge that no AI system would uncover without a specific hint. Exploratory testing, checking and a newly proposed third type called digging together form a complete test strategy. Human judgment remains indispensable for the professional evaluation of AI-generated test results.
Key Takeaways
- Exploratory testing finds bugs through human curiosity, intuition, and contextual knowledge, not through systematic test case generation alone.
- AI-generated unit tests often fall into the same equivalence partition and miss the truly critical input cases, making unchecked adoption a quality risk.
- Besides testing and checking, a third category is needed to describe what AI does: digging through trained data without any real understanding of the task.
- Junior testers or developers who never evaluate AI results themselves do not build up any expertise and permanently lose the ability to assess these results at all.
- The decisive question when using AI is not what is technically possible, but whether it makes sense to give an AI precisely this task.
A bug that no AI would have looked for
Some bugs only come to light through human curiosity. In his first project, Jonas Poller was busy exploring a technically complex piece of software. He changed several parameters and clicked back and forth between two states. At some point, a price increased by one cent, which should never have happened. The error was reproducible and turned out to be a major problem.
Christian Brandes later called this procedure the “gross-net flick-flick”: two toggles between which someone switches back and forth until something flips. You don’t get an idea like that from a script, but from the need to understand other people’s software.
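The account does not say what actually caused the extra cent. As a minimal sketch of one plausible mechanism, assuming a price that is rounded to cents on every gross/net conversion, the following Python snippet shows how a single toggle round trip can shift a value. The 19% rate, the rounding mode and all function names are assumptions made for illustration, not details from the project.

```python
from decimal import Decimal, ROUND_HALF_UP

# Assumed values, purely illustrative: 19% tax rate, prices rounded to cents.
VAT = Decimal("1.19")
CENT = Decimal("0.01")


def to_net(gross: Decimal) -> Decimal:
    """Convert a gross price to net and round to whole cents."""
    return (gross / VAT).quantize(CENT, rounding=ROUND_HALF_UP)


def to_gross(net: Decimal) -> Decimal:
    """Convert a net price to gross and round to whole cents."""
    return (net * VAT).quantize(CENT, rounding=ROUND_HALF_UP)


def toggle_round_trip(gross: Decimal) -> Decimal:
    """Switch gross -> net -> gross once, like flicking the toggle twice."""
    return to_gross(to_net(gross))


def drifting_prices(max_cents: int) -> list[Decimal]:
    """Return all gross prices up to max_cents that change after one round trip."""
    prices = (Decimal(c) * CENT for c in range(1, max_cents + 1))
    return [p for p in prices if toggle_round_trip(p) != p]


if __name__ == "__main__":
    for price in drifting_prices(100):
        print(price, "->", toggle_round_trip(price))
```

Under these assumptions, `drifting_prices(3)` flags 0.03, which comes back as 0.04 after one round trip. A check like this is easy to write, but only once someone has had the idea to flick the toggle in the first place.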
A second example shows the same mechanics. During a test of input validations, the software intercepted everything Jonas tried for half an hour. Only when the cursor was on the far left and blinking was it suddenly possible to paste using Ctrl+V. There happened to be a number with a decimal comma in the clipboard. The whole browser crashed when he pasted it. It was a chain of coincidences that no one would have formulated as a test case in advance.
Why exploratory testing remains human
Exploratory testing thrives on curiosity, intuition and contextual knowledge that no model can currently provide on its own. It is precisely this foundation that makes the difference between a planned test case and an observation that nobody asked for.
An example from years of exploratory testing training makes this tangible. A pre-school learning laptop served as the test object. In all the sessions, a single participant asked whether the device could be operated without a mouse because his child had the laptop on his lap in the car without a place to put the mouse. This one person with this one context found many errors.
An AI could certainly spit out many of these test cases as candidates. But you’d have to lead it there first. You would have to prompt it: think of other environments, other usage situations. The impetus doesn’t come automatically.
Is an AI creative or is it just simulating it?
According to Christian Brandes and Jonas Poller, current models are not creative, they just appear to be. The question of whether an AI can develop real creativity in test design quickly leads to two deeper questions: Is the AI really creative, or is it only simulating creativity? And what is creativity anyway? The latter is of the same caliber as the question of what intelligence is.
For test design, this philosophical question does not need to be settled conclusively. Even if a model delivers something that feels creative, there is no reason to dispense with human curiosity and intuition. Exploratory testing falls into the category of experience-based testing, and experience cannot simply be rationalized away.
One point remains regardless of the creativity issue: An AI cannot reliably say what is exactly right. In many cases, a human knows with certainty whether an implementation or an output is correct. With AI, it remains an estimate. That estimate can be very reliable, but in critical areas the question is whether you want to rely on it.
Testing, checking, digging: a third category
What an AI does in test design does not fit into either the testing or the checking category. This distinction is surprisingly little known in the testing community. When the audience was asked, barely more than five hands went up.
The two established terms can be clearly separated; the proposed third sits beside them:
| Term | Basis | Character |
|---|---|---|
| Testing | Intuition, curiosity, experience | Human, exploratory |
| Checking | Script | Mechanical, automatable, repeatable |
| Digging | Data, training, probabilities | Neither one nor the other |
Christian and Jonas suggest a third term for what AI does: Digging. As things stand, the model does not understand what it is doing. It has training data and tries to use probabilities to transfer learned information to a task in the hope that it will be a hit.
This blind rummaging through test ideas from other projects is the core idea behind the term. A colleague suggested “puzzling” as an alternative. The choice of word is not fixed. “Assisting” is explicitly rejected because it sounds too competent for something that doesn’t know what it’s doing.
Generated unit tests often only test the same equivalence partition
Tests generated by an AI often look clean and yet cover only a fraction of the relevant cases. In one example, the generated set included six unit tests that all hit the same equivalence partition. Five representatives of the same case, while the very cases that an experienced tester would have thrown in immediately were missing entirely.
This is exactly the trap for beginners. If you don’t know what an equivalence partition is, you can’t evaluate the output. The code looks good, the tests run green, and yet the coverage is weak. Evaluation requires technical and test methodology knowledge that cannot be borrowed from the AI.
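To make the trap concrete, here is a small Python sketch. The function `ticket_price`, its age bands and both input lists are hypothetical and not taken from the example discussed; they only illustrate how six tests can look thorough while exercising a single partition.

```python
def ticket_price(age: int) -> int:
    """Hypothetical function under test: ticket price in euros by age band."""
    if age < 0:
        raise ValueError("age must not be negative")
    if age < 6:
        return 0   # small children travel free
    if age < 18:
        return 5   # reduced fare
    return 10      # full fare


def partition_of(age: int) -> str:
    """Name the equivalence partition an input belongs to."""
    if age < 0:
        return "invalid"
    if age < 6:
        return "free"
    if age < 18:
        return "reduced"
    return "full"


def covered_partitions(inputs: list[int]) -> set[str]:
    """Return the set of partitions that a list of test inputs exercises."""
    return {partition_of(age) for age in inputs}


# Six inputs as a generated suite might pick them: all adults, one partition.
generated_inputs = [25, 30, 35, 40, 45, 50]

# Six inputs chosen with partitions and their boundaries in mind.
designed_inputs = [-1, 0, 5, 6, 17, 18]

if __name__ == "__main__":
    print("generated:", sorted(covered_partitions(generated_inputs)))
    print("designed: ", sorted(covered_partitions(designed_inputs)))
```

Both lists contain six inputs and both would produce green tests, but only the second touches the invalid range and the boundaries between fare bands. Seeing that difference requires knowing the partitions in the first place.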
This raises an uncomfortable point for training. If you hand everything over to the AI as a junior, you will never get to the point of building up enough knowledge to be able to assess results. There is no way around doing work without AI in order to learn.
“I wouldn’t have a problem with a junior developer saying: unit testing, annoying, give me a few ideas and then I’ll get on with it. The problem starts when someone says: I don’t care, do you have unit tests? Yes, there they are.”
Christian Brandes
The question of meaning comes before feasibility
Before you give an AI a task, it is worth asking whether it makes sense at all. Joseph Weizenbaum argued early on in “Computer Power and Human Reason” that the decisive factor is not what a computer could do, but whether it makes sense for it to take on a certain task. The same question is on the table today with AI.
Acceptance criteria are a concrete example. Many tools promise to generate perfect user stories including acceptance criteria. However, acceptance criteria should express what a product owner wants to see in order to be convinced. How should an AI answer what would convince a human? And if someone can’t think of anything to accept themselves, then the real problem lies elsewhere.
The market is moving in a different direction. Vibe coding and headlines about AI-generated code are fueling the idea that entire developer roles can be rationalized away. It’s possible that the industry will have to go through a vale of tears first: hitting the wall once with untested, reused code, only to take two steps back again.
Keep testing, keep the human in the loop
Two guiding principles summarize the attitude. First: Don’t throw away exploratory testing. The subconscious and intuition generate test ideas that no script or model can come up with on its own. Second: Keep people in the process.
A process in which a single idea is sketched out and several AI agents then pass the ball to each other, one coding, one testing, one checking, involves too much risk. A human brain should come into play somewhere in the process. Do you want an AI to quality assure the results of another AI?
In practice, this means a clear division of labor. The AI can take on busywork, such as generating mass test data or delivering an initial draft so that you don’t have to start from scratch. As soon as things become more complex, a human head has to be involved to evaluate the output. Brain work remains with humans.


