Why AI cannot do cause-and-effect

Quality Function Deployment (QFD) is a matrix-based method that directly links customer benefits with software functionalities and test cases. It delivers what LLMs cannot: cause-and-effect analysis. Those who trace tests back to measurable customer benefits need fewer tests and make better prioritization decisions.

Key Takeaways

LLMs cannot perform cause-and-effect analysis because their neural network architecture structurally prevents exactly that, which makes hallucinations unavoidable.
Quality Function Deployment links customer benefits directly to specific functionalities and tests via a matrix so that irrelevant test cases can be identified and deleted.
Those who trace tests back to customer benefits need significantly fewer tests because priorities are clearly derived from the cause-and-effect relationship between function and customer requirements.
The main obstacle to QFD in software development is the matrix size: thousands of user stories set against thousands of test stories have only been computable in a meaningful way since the computational breakthroughs of 2014.

Why LLMs do not perform cause-and-effect analysis

Large language models recognize patterns, but they cannot explain why they arrive at a result. This weakness lies in the architecture, not in the details of a single implementation.

Neural networks have existed as a concept for around 80 years. They are strong at recognizing things and weak at logic and at explaining relationships. Humans work in a similar way: we say something from the gut and only afterwards have to work out why we said it.

This is precisely the problem with the term explainable AI. A system based on a neural network cannot easily reveal how it arrived at an answer. Hallucinations are therefore not an avoidable bug, but a consequence of the design.

Thomas Fehlmann classifies this as a fundamental limit, not as a question of how mature the next model is. If you need causal traceability, you have to add it from a separate source.

Quality Function Deployment calculates with matrices

Quality Function Deployment, or QFD for short, is a method that uses matrices to examine cross-connections between requirements and functions. The aim is to achieve the greatest benefit with the least effort.

Anyone familiar with AI will recognize the principle. In both cases, matrices are used to map relationships between many variables. QFD uses measurable contributions and asks: What are the factors that determine whether a functionality supports a customer benefit?

A simple example is the coffee machine. If you want a strong Italian coffee, you need the right setting option and the corresponding function in the machine. If the machine only delivers weak, watery coffee, the customer will be dissatisfied and will no longer buy from the manufacturer.
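The coffee machine example can be written down as a small QFD matrix. The sketch below is only an illustration: the benefits, functions, weights and cell values are invented, and the 0/1/3/9 scale is simply a common QFD convention, not something the article prescribes.

```python
# Hypothetical QFD matrix for the coffee machine example.
# Rows are customer benefits, columns are machine functions.
# Cell values use the common 0/1/3/9 scale (none / weak / medium / strong).

functions = ["strength setting", "brew pressure", "preheating", "removable drip tray"]

# Assumed importance of each benefit to the customer (higher = more important).
benefit_weights = {
    "strong Italian coffee": 5,
    "quick preparation": 3,
    "easy cleaning": 2,
}

# Assumed contribution of each function to each benefit.
matrix = {
    "strong Italian coffee": [9, 9, 1, 0],
    "quick preparation": [0, 1, 9, 0],
    "easy cleaning": [0, 0, 0, 9],
}


def function_priorities(benefit_weights, functions, matrix):
    """Return each function's share of the total weighted contribution."""
    scores = [0] * len(functions)
    for benefit, weight in benefit_weights.items():
        for column, strength in enumerate(matrix[benefit]):
            scores[column] += weight * strength
    total = sum(scores)
    return {name: score / total for name, score in zip(functions, scores)}


priorities = function_priorities(benefit_weights, functions, matrix)
for name, share in sorted(priorities.items(), key=lambda item: -item[1]):
    print(f"{name}: {share:.2f}")
```

Each priority can be traced back to the cells and weights that produced it, which is the kind of causal traceability the text asks for.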

QFD originated in Japan and came to Germany through individual practitioners. Volkswagen and Skoda have used the method extensively, for example to find out that a car needs a place for the driver’s handbag. BMW uses similar methods, but calls them something else. Much remains hidden under the heading of product secrecy.

Reconciling causality and LLM

The practical appeal lies in combining the probability logic of LLMs with the cause-and-effect logic of QFD. Theoretically, this is possible because both approaches work with the same mathematical procedures.

In this picture, data is not just numbers and transactions. It represents knowledge that is moved back and forth between objects or modules. These data flows can be used to introduce cause and effect in a measurable, traceable way.

A reasoning model provides information on the causal path to take in order to achieve a result. If causality checks were properly incorporated, hallucinations would be eliminated. Only then would it be possible to certify an AI for the automotive industry, for example, which is not possible with today’s systems.

The technical hurdle is real. Solving large, sparse matrices has only been practical since around 2014. It is precisely this computing capability that goes into LLMs today, not into QFD. It could be redirected.

Customer benefit as a benchmark saves testing

If you align tests with customer benefits, you need far fewer of them. The customer benefit runs through the elements of the software and provides a clear benchmark for which test cases are really relevant.

Tests are expensive, even with current AI support. At the same time, test density hardly pays off on the market. A well-tested car cannot be sold at a higher price than a poorly tested one. Theoretically, test density would be a distinguishing feature in competition, but in practice nobody uses it that way, often because of time pressure for the next release.

Prioritization based on customer benefit changes the focus of testing. If you repeatedly align test cases with the benefits, they become more focused over time. You separate good test cases from those that say little about the customer benefit.
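A minimal sketch of this kind of prioritization follows. It assumes that each function already carries a benefit share (for instance from a QFD matrix) and that each test lists the functions it exercises. All names and numbers are hypothetical.

```python
# Hypothetical benefit share per function, e.g. derived from a QFD matrix.
function_benefit = {
    "strength setting": 0.4,
    "brew pressure": 0.4,
    "preheating": 0.2,
    "debug logging": 0.0,  # assumed to carry no customer benefit
}

# Hypothetical mapping from test cases to the functions they exercise.
test_covers = {
    "test_strength_levels": ["strength setting"],
    "test_pressure_range": ["brew pressure"],
    "test_cold_start": ["preheating", "brew pressure"],
    "test_log_rotation": ["debug logging"],
}


def rank_tests(test_covers, function_benefit):
    """Score each test by the benefit of the functions it covers, highest first."""
    scored = {
        test: sum(function_benefit.get(function, 0.0) for function in covered)
        for test, covered in test_covers.items()
    }
    return sorted(scored.items(), key=lambda item: -item[1])


ranked = rank_tests(test_covers, function_benefit)
# Tests that trace back to no customer benefit are candidates for review.
removal_candidates = [test for test, score in ranked if score == 0.0]

for test, score in ranked:
    print(f"{test}: {score:.1f}")
print("review for removal:", removal_candidates)
```

A zero score is a prompt to review a test, not an automatic deletion; a test may still be required for reasons the matrix does not capture.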

Security is excluded from this. Security and privacy are taken for granted, without compromise. The customer never explicitly formulates this requirement, but is entitled to expect it. Anyone who gets into a car assumes that it will drive and brake reliably.

Personalized testing instead of mass testing

Tests can be tailored to individual users. Not every function of a software release is relevant for every driver; many are never used.

If the machine knows which functions you actually use, it can test precisely these. A new release could be run at home in the garage using a test series that only covers the functions that are important to you.
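As an illustration, a personal test series could be selected from usage data roughly as follows. The function names, usage counts, test names and the `min_uses` threshold are all assumptions made for this sketch.

```python
# Hypothetical usage counts recorded for one driver.
usage_counts = {
    "navigation": 120,
    "seat heating": 30,
    "parking assist": 0,
    "voice control": 0,
}

# Hypothetical tests in a release and the functions each one exercises.
release_tests = {
    "test_route_planning": {"navigation"},
    "test_seat_heating_levels": {"seat heating"},
    "test_parallel_parking": {"parking assist"},
    "test_voice_navigation": {"voice control", "navigation"},
}


def personal_test_series(usage_counts, release_tests, min_uses=1):
    """Select the tests that touch at least one function this user actually uses."""
    used = {name for name, count in usage_counts.items() if count >= min_uses}
    return sorted(test for test, covered in release_tests.items() if covered & used)


series = personal_test_series(usage_counts, release_tests)
print(series)
```

In this sketch a test is kept as soon as it touches one used function; a stricter rule could require that all covered functions are in use.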

This does not fit the classic idea of mass production, but it does fit an Industry 4.0 that produces individualized products. The scale shifts from general coverage to personal relevance.

Why QFD is rarely used despite customer focus

QFD is unpopular in the software environment, although customer benefits are promoted everywhere. There are two reasons for this.

The first is secondary, but effective. QFD requires functional models, and functional models are hated by developers. Time and again, managers try to derive a pay scale from function points, according to the motto: more functions, more money. This is not how software development works.

The second reason is the size of the matrices. Even a comparison of 20 user stories with 25 test stories pushes the method to its limits. The reality looks different: 1,000 user stories and 5,000 test stories no longer fit into such a matrix.
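To get a feel for the numbers, the sketch below stores such a matrix sparsely, keeping only the cells that are actually filled. It assumes that each test story relates to just three user stories; that figure, the random link strengths and the dictionary layout are invented for illustration.

```python
import random

N_STORIES, N_TESTS = 1_000, 5_000
random.seed(0)  # fixed seed so the illustration is repeatable

# Sparse storage: only non-empty cells, keyed by (user story, test story).
# Assumption: each test story touches exactly three user stories.
links = {}
for test in range(N_TESTS):
    for story in random.sample(range(N_STORIES), 3):
        links[(story, test)] = random.choice([1, 3, 9])

dense_cells = N_STORIES * N_TESTS
print(f"dense cells: {dense_cells:,}, stored cells: {len(links):,}")


def story_coverage(links, n_stories):
    """Row sums: how strongly each user story is covered by test stories."""
    totals = [0] * n_stories
    for (story, _test), strength in links.items():
        totals[story] += strength
    return totals


coverage = story_coverage(links, N_STORIES)
uncovered = [story for story, total in enumerate(coverage) if total == 0]
print("user stories without any test:", len(uncovered))
```

Under these assumptions only 15,000 of the 5,000,000 cells hold a value, which is why sparse techniques matter for matrices of this size.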

“If I take customer benefits as a reference for which tests are really relevant for the customer, I have significantly fewer tests that I need to do.”
Thomas Fehlmann

The computing capability for large matrices has been available since 2014. It is used today for LLMs, not for Quality Function Deployment. Fehlmann can only find academic research on QFD in Aachen and Stuttgart, for example in the construction of small electric cars. Funding remains an open problem there too.

Transfer functions: Measuring causes that cannot be seen

A transfer function shows how an effect arises from a cause. It is the common tool behind Six Sigma, behind software functions and behind learning systems.

The everyday example is music on a cell phone. A transfer function converts the digital information from an MP3 or MP4 file into sound. The same logic applies on a cosmic level: exoplanets cannot be measured directly, only their effect. If you want to detect them, you need to know the laws of gravity, i.e. the law behind the effect.

The same applies to software. You need to know the laws according to which functionality is created in the product. In Six Sigma projects, transfer functions are used to minimize variation in production.
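The idea can be sketched with a deliberately simple, linear transfer function: a matrix maps a cause vector to an effect vector, and when the law (the matrix) is known, the hidden cause can be recovered from the observed effect. The linearity, the 2x2 size and all numbers are assumptions for illustration; real transfer functions are rarely this tidy.

```python
def apply_transfer(matrix, cause):
    """Forward direction: compute the effect produced by a given cause."""
    return [sum(a * x for a, x in zip(row, cause)) for row in matrix]


def solve_2x2(matrix, effect):
    """Inverse direction: recover the cause from the effect (Cramer's rule)."""
    (a, b), (c, d) = matrix
    det = a * d - b * c
    if det == 0:
        raise ValueError("transfer matrix is singular; the cause cannot be recovered")
    e, f = effect
    return [(e * d - b * f) / det, (a * f - e * c) / det]


# Hypothetical law that links cause and effect.
transfer = [[2.0, 1.0], [1.0, 3.0]]

hidden_cause = [4.0, 1.0]  # not observable directly
observed_effect = apply_transfer(transfer, hidden_cause)
recovered_cause = solve_2x2(transfer, observed_effect)

print("observed effect:", observed_effect)
print("recovered cause:", recovered_cause)
```

Without the matrix, the observed effect alone says nothing about the cause, which mirrors the exoplanet example: the measurement only becomes useful once the law behind it is known.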

With learning systems, the situation is uncomfortable. Whether in training, in the original data set or in reinforcement learning: you don’t know exactly what is behind it. You have to find out the causes and then hopefully you can observe the effect.

Customer benefits can be extracted, not guessed

Customers say what excites them and what doesn’t, but the useful information is in a cause-and-effect analysis. Net Promoter Score provides the signal; the evaluation must clarify why the rating is what it is.
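The difference between signal and explanation can be shown with a small sketch. It uses the standard Net Promoter Score definition (share of ratings 9 to 10 minus share of ratings 0 to 6, in percentage points); the survey responses and the stated reasons are invented.

```python
from collections import Counter

# Hypothetical survey responses: (rating from 0 to 10, main reason given).
responses = [
    (10, "taste"), (9, "taste"), (9, "speed"), (8, "taste"), (7, "cleaning"),
    (6, "cleaning"), (3, "cleaning"), (10, "taste"), (5, "speed"), (9, "taste"),
]


def net_promoter_score(ratings):
    """NPS: percentage of promoters (9-10) minus percentage of detractors (0-6)."""
    if not ratings:
        raise ValueError("at least one rating is required")
    promoters = sum(1 for rating in ratings if rating >= 9)
    detractors = sum(1 for rating in ratings if rating <= 6)
    return 100 * (promoters - detractors) / len(ratings)


nps = net_promoter_score([rating for rating, _reason in responses])

# The score alone does not say why; tallying detractor reasons is a first step.
detractor_reasons = Counter(reason for rating, reason in responses if rating <= 6)

print("NPS:", nps)
print("detractor reasons:", detractor_reasons.most_common())
```

The tally is still only a starting point: it names a topic, while the cause-and-effect analysis has to link that topic to concrete functions.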

This analysis is the basis for the next feature list. Over the years, two small companies that started with a handful of people have grown into larger companies, one in the field of color quality and one in customer communication. In both cases, QFD was used to build the next most important thing.

The causal explanation is laborious, and it is easier to rave about customer benefits. But thinking it through forces you to see what matters, and sometimes your own mistakes. What seemed easy is often not.
