4 min read

Man vs. machine: Who judges more fairly?

Trust in decisions does not arise in a vacuum. A study on hiring decisions compared human and algorithmic recommendations, with a deliberately built-in gender bias as the manipulated variable. Participants adjusted their assessments over time, and did so more quickly and more strongly when the recommendation came from the algorithm. Trust followed the same patterns as trust between people: early signals shape expectations, later observations reinforce them. For development and testing, this means more than measuring accuracy. It takes ongoing bias monitoring, clean pilot phases, parallel comparisons and transparency about data sources, so that it becomes clear when recommendations are overturned and why. Ultimately, the question is how to design systems that correct errors rather than perpetuate them.

Podcast Episode: Man vs. machine: Who judges more fairly?

In this episode, I talk to Sam Goetjes about trust, bias and fairness in AI-supported decisions. The starting point is her master's thesis: a hiring scenario with human and algorithmic recommendations, plus a manipulated gender bias. The findings: participants adjusted their ratings over time, and did so even faster and more strongly when the recommendation came from the algorithm. Trust in the algorithm developed according to the same patterns as trust in people. Powerful, isn't it? For us in testing, this means not only checking models and performance, but also monitoring bias risks, setting up pilot phases properly, running parallel comparisons and disclosing data sources. We can be influenced. So can systems. How do we keep both in check? That's what this episode is about.

"And then I asked myself what happens if the AI has a bias, a very strong one, but I deal with the AI as I do with people and trust it" - Sam Goetjes

Sam Goetjes is a senior consultant for quality assurance and test management with a passion for making software not only error-free but also truly user-friendly. After studying psychology with a minor in computer science, she found her way into the tech world through quality assurance for digital health applications. Today, she advises companies at 29FORWARD GmbH on how to make testing more efficient and quality processes smarter. Her combination of technical know-how and psychological understanding helps her build bridges between development, QA, and users. Sam also develops her own apps, which keeps her close to the practical side of things.


Highlights of the episode

  • People adapt their ratings to recommendations, faster and more strongly with algorithms
  • Trust in algorithms follows similar patterns to trust in people
  • Manipulated gender bias measurably influences decisions
  • Test bias risks continuously, not just model performance
  • Disclose data sources and increase traceability

Prejudice and fairness: How AI influences our decisions

Man, machine and the power of prejudice

At German Testing Day 2025, a podcast episode focused on a topic that is often overlooked: Are we really as objective as we think we are when working with artificial intelligence (AI)? Richie, the host, met with Sam, an expert in psychology and AI, to shed light not only on the technical but also the human side of biases when dealing with algorithms.

How an AI experiment exposes our trust

Sam conducted a large-scale study focusing on the question: Do we trust algorithms as much as humans? And what happens when an AI shows bias?

The participants took on the role of fictitious employees and were asked to make recommendations on job applications. Beforehand, they were sometimes shown a recommendation, either from a human (such as an HR employee) or from an algorithm. The trick: the AI had a built-in bias; it consistently rated applicants with female first names worse than those with male first names, even when their qualifications were identical.

The surprise: almost nobody noticed this error. On the contrary, many people accepted the algorithm's biased assessment even more readily than that of a human. Sam found that we trust algorithms as if they were real colleagues. The more often the test subjects saw the recommendations, the more they followed the (deliberately skewed) suggestion - especially when it came from the AI.

Why we are so susceptible to AI bias

During the conversation, it became clear that people often rely on the "objective" reputation of AI. Many people think that algorithms make fewer mistakes than humans. But this assumption is dangerous: algorithms inherit the errors and biases of the data they were trained on. If discrimination is already present in the training data - against women in professional life, for example - the AI will reflect it too.

The difficulty is that it is often not transparent how an AI reaches its decisions. Users only see the input and output, which makes errors harder to detect. And since we tend to stick with what we are repeatedly shown, a vicious circle takes hold: human biases end up in the system, and the system in turn influences further human decisions. In the end, the problem keeps growing.

What does this mean for software testing?

Sam is clear about this: it's not enough to check whether an AI "works" technically or whether its training data was good. In software testing, teams also need to investigate: does the AI produce unfair or discriminatory results? And how quickly do people adapt to the AI's suggestions?
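Such a check can be automated alongside the usual test runs. Below is a minimal sketch in Python of a simple group-fairness check; the column names, the sample data and the threshold are assumptions made for illustration, not details from Sam's study.

```python
# Minimal sketch of a group-fairness check on recommendation outcomes.
# The DataFrame columns ("gender", "recommended") and the 0.2 threshold
# are illustrative assumptions, not details from the study.
import pandas as pd

def demographic_parity_gap(results: pd.DataFrame) -> float:
    """Difference in recommendation rates between gender groups."""
    rates = results.groupby("gender")["recommended"].mean()
    return float(rates.max() - rates.min())

# Toy results: the algorithm recommends men far more often than women.
results = pd.DataFrame({
    "gender":      ["f", "m", "f", "m", "f", "m"],
    "recommended": [0,   1,   0,   1,   1,   1],
})

gap = demographic_parity_gap(results)
print(f"Demographic parity gap: {gap:.2f}")

if gap > 0.2:  # the acceptable limit is a team decision, not a universal value
    print("Possible bias: recommendation rates differ noticeably between groups")
```

A check like this says nothing about the cause of a gap, but it makes unfair outcomes visible early and repeatably instead of relying on someone happening to notice.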

This means that test managers should take a multi-track approach. It is best to run the old and the new system side by side for a while to make differences visible, and to compare the AI's results regularly with those of humans. This costs energy and time, but it is important in order to avoid blind spots.
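What such a parallel comparison could look like, sketched with invented numbers; the field names and scores below are assumptions for illustration, not data from a real project.

```python
# Hedged sketch of a parallel run: the same applications are scored by the
# existing human process and by the new AI, and the deltas are compared
# per gender group. All identifiers and scores are made up for illustration.
import statistics

parallel_log = [
    # (applicant_id, gender, human_score, ai_score) on a 1-10 scale
    ("A1", "f", 7, 5),
    ("A2", "m", 7, 8),
    ("A3", "f", 8, 6),
    ("A4", "m", 6, 7),
]

def mean_delta(rows, gender):
    """Average AI-minus-human score for one gender group."""
    return statistics.mean(ai - human for _, g, human, ai in rows if g == gender)

for gender in ("f", "m"):
    delta = mean_delta(parallel_log, gender)
    print(f"{gender}: AI deviates from the human ratings by {delta:+.1f} on average")

# If the AI systematically rates one group lower than the human baseline,
# that is exactly the blind spot the parallel phase is meant to expose.
```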

Good monitoring is also necessary: How is the AI developing over time? Are the recommendations drifting in strange directions? If you don't pay attention to this, you run the risk of discrimination becoming firmly anchored in the system.
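A simple way to make such drift visible is to track a bias metric per release or time window and raise an alert when it exceeds a tolerance. The release labels, values and threshold in this sketch are invented for illustration.

```python
# Minimal sketch of bias monitoring over time: a parity gap (as in the
# fairness check above) is recorded per release and compared to a tolerance.
# Release labels, gap values and THRESHOLD are invented for illustration.

history = {
    "release-1.0": 0.04,
    "release-1.1": 0.07,
    "release-1.2": 0.15,  # drifting in a "strange direction"
}

THRESHOLD = 0.10  # team-defined tolerance, not a universal standard

for release, gap in history.items():
    status = "ALERT" if gap > THRESHOLD else "ok"
    print(f"{release}: parity gap {gap:.2f} [{status}]")
```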

Trust, control, responsibility

Many believe that AI is more neutral than humans. But Sam's study shows that the opposite can be true - precisely because we don't notice how we are being influenced. Anyone responsible for software quality must be aware of this: AI can make mistakes too, and it can even amplify the worst sides of us humans.

Companies should consider early on how they will regularly audit their systems and what data they use for development. After all, if teams rely solely on technical testing or dismiss the debate, the problem can quickly escalate - and suddenly a whole group of people is faced with closed doors simply because an algorithm says so.

Artificial intelligence challenges us to critically question not only technology, but also our own judgment. Testing software is about more than just program logic. It's about people, fairness and taking responsibility. Because prejudices are not a technical problem - they are a human one.
