Man vs. machine: Who judges more fairly?

AI bias refers to the systematic distortion in decisions made by artificial intelligence that arises when training data reflects social inequalities. People rarely recognize such biases because they trust AI recommendations in much the same way as human judgments. Particularly problematic: people adopt an existing bias faster and more strongly from algorithmic recommendations than from human suggestions.

Key Takeaways

Algorithmic recommendations pass on bias more effectively than human ones: study participants adopted a gender bias faster and more consistently when it came from an automated system.
Trust in algorithms follows the same psychological patterns as trust in humans, which means that AI misjudgments are accepted in the same way as those of an experienced colleague.
Most participants did not notice the gender bias in the recommendations across 16 consecutive application decisions, although female first names were consistently rated lower at the same competence level.
Without continuous monitoring against comparison data, a vicious circle forms: people confirm the biased AI judgments, the AI is retrained on these judgments and the bias is reinforced.

AI recommendations are not neutral just because they come from a machine

People follow the recommendations of an algorithm more readily than those of a human, even if both make the same mistake. An online study by Sam Goetjes demonstrates this effect: more than 330 employed participants were asked to evaluate job applications for a hiring decision.

The structure was simple: the participants rated each application with a hiring recommendation from 0 to 100 percent. They then saw a supposed expert recommendation, either from an HR employee or from an automated decision-making system. After that, they were allowed to adjust their own assessment. This was repeated 16 times in succession.
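How strongly a participant moves toward a recommendation in such a setup can be expressed as a single number. The following is a minimal sketch of a measure often called "weight of advice" in advice-taking research. Whether the study used this exact measure is not stated here, so the function name `adoption_shift` and the example values are assumptions for illustration only.

```python
from typing import Optional


def adoption_shift(initial: float, recommendation: float, adjusted: float) -> Optional[float]:
    """Fraction of the gap between own rating and recommendation that was closed.

    0.0 means the rating was left unchanged, 1.0 means the recommendation
    was adopted completely. Returns None if there was no gap to close.
    """
    gap = recommendation - initial
    if gap == 0:
        return None  # own rating already matched the recommendation
    return (adjusted - initial) / gap


# Hypothetical trial: own rating 70 %, recommendation 50 %, adjusted rating 55 %.
print(adoption_shift(70, 50, 55))  # 0.75
```

Averaging this value per condition (human vs. automated source) and per application position would make both the source effect and the time effect visible.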

The crux lies in that adjustment. If the biased recommendation came from an algorithm, the participants adopted the bias faster and more strongly than if it came from a human. The widespread assumption that people are initially skeptical of a machine did not stand up to the test.

Why people treat an algorithm like a colleague

People build trust with an algorithm based on the same factors as with a person. This is the first reliable result of the study and it is the basis for everything that follows.

Sam Goetjes tested whether a trust model originally developed for organizations and interpersonal relationships also applies to interaction with an algorithm. The model held. The result was not just barely significant; the model explained trust to a similar extent as it does with humans.

This observation is consistent with the so-called computer-as-social-actor hypothesis: people treat an algorithm or AI socially like a counterpart and build trust accordingly. In practice, this means that the strategies you use to trust a new colleague also apply to a decision-making system. With all the associated weaknesses.

The built-in bias is simply not noticeable in everyday life

Most people don’t notice when a system makes systematically biased judgments. In the study, first names read as female were consistently rated worse than those read as male at the same level of competence. The vast majority of participants did not notice this pattern across 16 applications.

The comparison was carefully controlled: eight female and eight male first names, matched in pairs at the same level of competence and validated in a preceding evaluation study. Annika had the same competencies as Thomas; only the gender of the name differed.
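Because the applications are matched in pairs, a bias shows up directly as the difference within each pair. The sketch below illustrates that idea. All scores are invented for illustration and are not data from the study, and `mean_paired_gap` is a hypothetical helper name.

```python
from statistics import mean

# Hypothetical recommendations (0-100) for matched pairs with equal competence.
# Each tuple is (score for the female name, score for the male name).
paired_scores = [(62, 70), (55, 61), (71, 78), (48, 57)]


def mean_paired_gap(pairs):
    """Average of (male score - female score) across matched pairs."""
    return mean(male - female for female, male in pairs)


def share_female_lower(pairs):
    """Share of pairs in which the female name received the lower score."""
    return sum(female < male for female, male in pairs) / len(pairs)


print(mean_paired_gap(paired_scores))     # 7.5
print(share_female_lower(paired_scores))  # 1.0
```

With equal competence in every pair, a mean gap clearly above zero cannot be explained by qualifications, which is what makes the paired design useful as a test.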

There was also a time effect. From the third or fourth application onwards, the participants moved closer and closer to the recommendation they were shown. Anyone who knows a task and slips into a routine checks less and is more likely to think: maybe the system is right. The more routine the task becomes, the more susceptible people are to influence.

The vicious circle of human and machine bias

The real risk arises when human and machine bias confirm each other. You bring your own prejudices to the table. If an automated system happens to rate a person the same way, you feel confirmed, even though the lower rating has nothing to do with competence.

Sam Goetjes describes this openly using her own example: she caught herself being biased when looking at the photo on an application, even though she had studied the subject. Knowing about bias does not automatically protect you from it.

This mutual confirmation becomes a closed circle. The biased assessment flows back, no corrected data enters the system and the AI does not improve because no one sees the error. And gender bias does not affect a small marginalized group, but half of the population.

"It starts with one group of people, perhaps with one characteristic. But when that increases, it’s not a marginalized group, it’s half the population that is disadvantaged." (Sam Goetjes)

How a biased application tool is created in the first place

An AI looks for correlations, not fairness. If an initial application assessment is outsourced to a system, it checks what distinguishes successful from less successful employees and derives predictions from this.

Some of these correlations seem plausible, such as experience in the field as an indication of later performance. Others are spurious correlations with no factual connection to the job. The example from the study: if 75 percent of a successful group of people happen to enjoy playing soccer, the system can link soccer with job performance.
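The soccer example can be made concrete with a few lines of arithmetic. The sketch below uses an invented staff sample in which 75 percent of the successful group play soccer; the counts for the less successful group are an additional assumption. It shows what a purely correlational view of the data would "learn".

```python
# Hypothetical staff records as (plays_soccer, successful). Invented for illustration.
records = (
    [(True, True)] * 6 + [(False, True)] * 2      # successful group: 75 % play soccer
    + [(True, False)] * 2 + [(False, False)] * 6  # less successful group: 25 % play soccer
)


def success_rate(rows, plays_soccer):
    """Share of successful employees among those with the given soccer flag."""
    outcomes = [successful for soccer, successful in rows if soccer == plays_soccer]
    return sum(outcomes) / len(outcomes)


rate_soccer = success_rate(records, True)
rate_other = success_rate(records, False)
print(rate_soccer, rate_other)  # 0.75 0.25
```

A system that only looks at these rates would score soccer players higher, although nothing in the data says that soccer has anything to do with performance.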

Such distortions depend heavily on the training data. The widespread expectation that an AI is per se more objective than a human is therefore not tenable. It is only as good as the data it has been trained on.

Piloting and monitoring beat blind trust

Anyone introducing an AI decision-making system needs comparison data instead of assuming that errors will be noticed anyway. As the study shows, noticing errors is precisely what people fail to do. The most effective countermeasure is a structure that keeps the system continuously testable.

A practical approach for test managers:

Parallel operation in the pilot phase: Run the old process, manual or otherwise, alongside the new AI system for a period of time and compare the results over time.
Evaluation via multiple sources: Do not rely on the training data alone, but cross-check with other tools to obtain independent evidence of quality.
Monitoring over time: Especially when a system continues to develop and becomes a black box, regularly check whether it is drifting in directions that nobody wants.
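The parallel-operation and monitoring steps above can be sketched as a simple check: take the score gap between two groups from the old process as a baseline and flag every later period in which the AI system exceeds it by more than a tolerance. All names (`group_gap`, `drift_alerts`), the group labels, the period labels, the scores and the tolerance are assumptions for illustration. The comparison is only meaningful if both groups are comparable in qualifications.

```python
from statistics import mean


def group_gap(decisions):
    """Mean score of group 'm' minus mean score of group 'f' for one period."""
    by_group = {"f": [], "m": []}
    for group, score in decisions:
        by_group[group].append(score)
    return mean(by_group["m"]) - mean(by_group["f"])


def drift_alerts(periods, baseline_gap, tolerance):
    """Labels of periods whose gap exceeds the baseline by more than the tolerance."""
    return [
        label
        for label, decisions in periods
        if group_gap(decisions) - baseline_gap > tolerance
    ]


# Hypothetical pilot data from the old (manual) process, run in parallel.
manual_pilot = [("f", 64), ("m", 66), ("f", 70), ("m", 70)]
baseline = group_gap(manual_pilot)  # 68 - 67 = 1

# Hypothetical AI recommendations for comparable applicants in later periods.
ai_periods = [
    ("2024-Q1", [("f", 63), ("m", 66), ("f", 69), ("m", 70)]),  # gap 2
    ("2024-Q2", [("f", 58), ("m", 67), ("f", 62), ("m", 71)]),  # gap 9
]

print(drift_alerts(ai_periods, baseline, tolerance=3))  # ['2024-Q2']
```

The check deliberately needs no insight into the model itself. It only compares outputs against a reference, which also works for a system that is a black box.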

The point here is that comparison data is more reliable than human reviewers, because people themselves can be influenced. A provider cannot understand in detail what happens inside the box anyway. But they can understand what data was used for training, and they can measure where the system is heading in the long term.

Awareness is the lever that quality assurance needs

The first step against biased AI decisions is not to consider them neutral. Software is quickly seen as an objective, factual tool. People lean back and expect that it will work. This relaxation is the problem.

This shifts the emphasis for testing and quality management. Data quality in training and the question of whether the AI works remain important, but they are already well understood. There is also a second level: how do people react to the recommendations, and do they adopt biases without realizing it?

Behind the economic interest, which is the top priority for companies, there are real people who are already being disadvantaged by decisions today. If you start early, invest some effort and consciously make the system observable, you can prevent bias from increasing unnoticed. The effort does not have to be great. Awareness is the beginning.
