Critical Thinking in Software Testing
AI writes the code, AI writes the tests — but who checks if any of it is actually right? Critical thinking is the skill testers cannot afford to outsource.

Critical thinking in AI-assisted testing means actively questioning the outputs that AI tools produce rather than accepting them at face value. AI tools carry biases tied to their reference data, and testers must verify sources, check for gaps, and apply their own analytical judgment. The tester’s core role stays the same: uncover information that lets someone make an informed decision.
Key Takeaways
- AI is a tool for collaboration, not delegation: accepting its output without verifying the source data means the tester has not done the job.
- Testers who offload analytical work to AI risk losing the ability to judge whether an output is correct, and that loss is hard to recover once it compounds over years.
- AI tools carry biases that are not always visible, and without independent verification those biases can produce outputs that systematically disadvantage users outside a narrow demographic bracket.
- The core tester responsibility stays constant regardless of who or what wrote the code: determine whether the software delivers what was expected for the end user.
AI is a collaborator, not a command line
The shift to AI changes how testers work with tools, from instructing to collaborating. Older test automation meant sitting down and writing code in C# or with Selenium, telling the machine exactly what to do. AI tools now write that code for you and offer opinions on what you ask.
That change demands a mental shift. You no longer dictate every step. You ask for input, get a draft, and decide what to do with it. Steve Watson calls this the biggest shift in testing’s recent history, different in kind from the steady evolution of earlier tools.
The shift is exciting, but it does not come naturally to everyone. Adapting to a tool that talks back, that suggests and summarizes, asks more of testers than learning yet another framework.
Why blindly trusting AI output is a tester’s blind spot
AI output is only as good as the information it draws from, and you rarely see that source. The summaries and opinions these tools produce look polished and complete. The data behind them stays hidden.
That gap is where bias creeps in. Tools built on data skewed toward one group will quietly disadvantage everyone outside it. If a system reflects mostly middle-aged, Western users from a narrow socioeconomic bracket, people outside those brackets get worse outcomes, and no one notices unless someone digs.
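One way to do that digging is to break an outcome down by group instead of looking only at the overall average. The following is a minimal sketch, not a complete fairness analysis: the group labels, the sample records, and the `tolerance` value of 0.8 are all illustrative assumptions.

```python
from collections import defaultdict


def success_rate_by_group(records):
    """Return {group: share of records with a successful outcome}."""
    totals = defaultdict(int)
    successes = defaultdict(int)
    for group, succeeded in records:
        totals[group] += 1
        if succeeded:
            successes[group] += 1
    return {group: successes[group] / totals[group] for group in totals}


def groups_falling_behind(rates, tolerance=0.8):
    """List groups whose rate is below `tolerance` times the best group's rate."""
    best = max(rates.values())
    return sorted(g for g, rate in rates.items() if rate < tolerance * best)


# Hypothetical outcomes: (group label, did the user get a good result?)
records = [
    ("group_a", True), ("group_a", True), ("group_a", True), ("group_a", False),
    ("group_b", True), ("group_b", False), ("group_b", False), ("group_b", False),
]

rates = success_rate_by_group(records)
flagged = groups_falling_behind(rates)
print(rates)
print(flagged)
```

A flagged group is a prompt to investigate, not proof of bias. The overall success rate here is 50 percent, which hides the difference between the two groups entirely.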
A tester’s job has always been to uncover the information that lets someone make an informed decision. That work matters more now, not less. When the quality of AI output is questionable, you call it out, you ask where the source data came from, and you verify independently.
There is a clear line worth holding. If you hand AI your own text and ask it to restructure an email on a Friday afternoon, you supplied the reference data, and the risk is low. If you ask AI for an opinion on something where you provided no source, you have a duty to understand where its answer comes from.
“Why would you just take something it tells you as being truthful if you haven’t double-checked it yourself?” (Steve Watson)
Testers already have the skills, they just need repurposing
Critical thinking is not a new skill testers must learn from scratch; it is an existing one they need to point at AI. Testers have always asked what they expect to see, looked back over requirements, and questioned anything that does not make sense.
You do not have to be the end user to judge whether something looks right. Steve works for an airline, and his point is simple: you do not need to be a pilot to apply common sense to what a pilot would see on an iPad. Those questioning abilities transfer directly to AI output.
Picture handing an AI a requirements document and asking for a full test approach. If you take the result, nod, and move on, you have not done your job. The right move is to treat it as a starting point and ask what data it referenced, what it might have missed, what is not covered.
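The question of what is not covered can be partly mechanized. The sketch below assumes requirements carry IDs in a `REQ-<number>` format and that the AI-generated test approach is available as plain text; both are assumptions for illustration, as are the sample IDs and text.

```python
import re


def uncovered_requirements(requirement_ids, ai_test_approach):
    """Return requirement IDs that the AI-generated text never mentions."""
    mentioned = set(re.findall(r"REQ-\d+", ai_test_approach))
    return [req for req in requirement_ids if req not in mentioned]


# Hypothetical inputs: the IDs from your requirements document
# and the test approach the AI tool produced.
requirement_ids = ["REQ-1", "REQ-2", "REQ-3", "REQ-4"]
ai_test_approach = """
Login tests cover REQ-1 and REQ-2.
Performance tests cover REQ-4.
"""

missing = uncovered_requirements(requirement_ids, ai_test_approach)
print(missing)
```

A check like this only finds requirements that are never mentioned. A requirement that is mentioned may still be tested poorly, so that judgment stays with the tester.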
Adam Bacon, a colleague of Steve’s, frames it well: treat AI like a knowledgeable team member, the person you go to for advice. When a teammate tells you something useful, you take most of it and still do your own due diligence. AI deserves the same handling. It is a data source and a reference point, not a verdict.
The real risk is losing the skill through disuse
The danger is not that AI takes jobs; it is that testers stop exercising their judgment until they can no longer use it. Humans take shortcuts. Steve uses the image of a paved path with a worn track cut across the grass corner, the route people take because it is easier.
Right now the balance still favors human judgment, because testers have spent their careers using their brains rather than offloading to a machine. Fast forward five or ten years. If all anyone does is ask AI to do the work, and people no longer know what a correct outcome looks like, the balance tips.
Once those skills erode, getting them back is the hard problem. The cleaner answer is to not lose them in the first place. When AI hands you a polished draft, treat it as the bulk of the work done, then ask what is missing and what you still need to add.
Where AI saves real time, and where it does not
AI pays off most when you control the data set and the task is to find patterns inside it. Steve wanted to learn how people across his organization understood the terms testing and quality engineering. He collected answers from two user groups, then had AI find common themes and differences within each group and compare the two.
The result was three sets of analysis in a presentable format, produced in a couple of hours rather than the days he expected. Because he knew exactly which data the tool was working from, he did not worry about missing external sources. He only had to confirm nothing was overlooked within his own data.
The same approach does not always hold. Working through about 200 requirements to flag ambiguities, Steve handed the task to a tool with a few pointers when the day was getting late. The tool flagged things he was already comfortable with and missed things he was not.
That experience points to a judgment call. Retraining a tool to match what you already know how to spot can cost more than just doing it yourself. Part of the work now is learning what AI is suited for and what it is not, and that comes through trial and error.
| Task | AI fit | Why |
|---|---|---|
| Summarizing a known data set for themes and differences | Strong | You control the source, so missing external data is not a concern |
| Flagging ambiguities across many requirements | Mixed | It can miss what matters and surface what you already accept |
| Restructuring text you supplied yourself | Strong | The reference data is yours, so the risk is low |
| Forming an opinion from sources you did not provide | Weak without checks | You cannot verify the basis for the answer |
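For the requirements review described above, a small deterministic check that you control can serve as a baseline to compare against whatever an AI tool flags. This is a rough sketch, not the approach Steve used: the list of vague terms and the sample requirements are hypothetical, and a word list will never catch every ambiguity.

```python
import re

# Hypothetical list of vague terms; tune it to your own domain.
VAGUE_TERMS = ["fast", "user-friendly", "appropriate", "as needed", "etc", "should"]


def flag_ambiguous(requirements):
    """Map each requirement ID to the vague terms found in its text."""
    flagged = {}
    for req_id, text in requirements.items():
        hits = [
            term for term in VAGUE_TERMS
            if re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE)
        ]
        if hits:
            flagged[req_id] = hits
    return flagged


# Hypothetical requirements for illustration.
requirements = {
    "REQ-1": "The page shall load within 2 seconds on a 4G connection.",
    "REQ-2": "The search should be fast and user-friendly.",
    "REQ-3": "Errors are logged as needed.",
}

flagged = flag_ambiguous(requirements)
for req_id, terms in flagged.items():
    print(req_id, terms)
```

Comparing this baseline with the tool's output shows quickly whether the tool is surfacing anything you would not have found yourself.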
Code quality still comes down to a human decision
When AI writes both the code and the tests for it, the validation question stays human. The pattern mirrors what already happens: developers write code, testers write code to check it. What changes is who, or what, does the writing.
Whether code came from a person or a tool matters less than whether it delivers what was expected. There is always a user at the end of the chain. Focus on that goal, then ask how you find out if the output matches the intent.
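A small, contrived illustration of why that question matters: a test that mirrors the implementation can pass even when the code misses the intent. Everything in this sketch is hypothetical, including the `apply_discount` function and the requirement that 10 percent off 50 should be 45.

```python
def apply_discount(price, percent):
    """Hypothetical generated function: meant to take `percent` off `price`."""
    return price - percent  # bug: subtracts the number, not a percentage


def generated_test():
    # Hypothetical generated test. It passes, but only because
    # 10 percent of 100 happens to equal 10.
    return apply_discount(100, 10) == 90


def requirement_based_check():
    # Derived from the stated requirement, not from the code:
    # 10 percent off 50 should be 45.
    return apply_discount(50, 10) == 45


generated_result = generated_test()
requirement_result = requirement_based_check()
print("generated test passes:", generated_result)
print("requirement check passes:", requirement_result)
```

The generated test reports success while the requirement-based check fails. The second check exists only because someone started from what the user was promised rather than from what the code does.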
Tools will give you many answers about code quality. The open question is what those answers leave out. Code quality does not stop being an issue just because AI produced the code, so the tester’s focus holds steady even as the source shifts.
The next generation needs the skepticism, not just the tools
Testers who learned the craft before AI carry a responsibility to pass on the questioning instinct. People entering school and college now will treat AI as part of daily life and will pick up the tools faster than anyone who came before them.
That speed creates a gap. Newcomers will know how to use AI but will not have seen how the work was done without it, so they will not inherently carry the skepticism that earlier testers built up. The risk is someone arriving and assuming AI will do everything, with no instinct to question the result.
Experienced testers face a steeper relearning curve, because they have to do familiar work in unfamiliar ways. The next generation skips that, which is exactly why the questioning mindset has to be taught deliberately rather than left to develop on its own.
Invest the freed-up time in analytical skill, not more coding
If AI takes over some of the coding testers do today, that time should go into analytical ability. Steve points to a long-standing imbalance: testers spend most of their training, perhaps 80 or 90 percent, learning to write code. Writing code does not make you a better tester.
The role has never been only about writing tests, running them, and flagging bugs. It has always been broader, and it grows broader still when requirements themselves might be AI-produced. You question them the same way regardless of who or what wrote them.
The task stays the same; the method changes. Testers already use AI tools whether they acknowledge it or not, since the tools sit embedded in everyday software and phones. Staying relevant means showing you keep pace with that change while putting more time, effort, and training into the analytical skills that no tool replaces.