DEFOSPAM is an acronym for a structured review of requirements that bundles seven test dimensions: Definitions, Features, Outcomes, Scenarios, Predictions, Ambiguity and Missing elements. The aim is to find gaps, contradictions and ambiguities in requirements documents early. AI can serve as an idea generator, but cannot replace human review.
Key Takeaways
- DEFOSPAM stands for definitions, features, outcomes, scenarios, predictions, ambiguity and missing elements, and is used to scrutinize requirements systematically.
- Unclear definitions of terms such as “customer” or “order” lead developers and testers to understand the same document differently, so errors only become visible late in the project.
- Requirements act as an oracle: if you cannot derive from a requirement a prediction of what the system should do with a given input, you can neither test nor develop against it.
- AI is useful as an idea generator for scenario-result pairs that the team has overlooked, but it is not a reliable substitute for human review because it hallucinates and misinterprets questions.
- Using AI as a coach rather than a mentor means not relying on complete answers, but using its questions and suggestions as food for thought, which you then evaluate yourself.
Checking requirements means asking the right questions
Requirements can be systematically scrutinized if you don’t just read them, but hold them up against a fixed list of questions. This is precisely the idea behind DEFOSPAM, an acronym that Paul Gerrard developed over ten years ago and recorded in a slim book together with Susan Windsor.
The name came about by chance. Paul sketched out the method in an hour, looked at the initial letters and settled on a word that nobody forgets. The book never sold well, and hardly anyone used it. Nevertheless, the approach has a logic that is useful for every tester and every developer.
The underlying problem is a familiar one: requirements often read smoothly and seem complete, but they are not. Examples only ever describe software partially. You would need an infinite number of examples to specify a system fully and an infinite number of tests to evaluate it fully. Specification by example is therefore only a partial solution.
What do the letters in DEFOSPAM mean?
DEFOSPAM stands for a series of test questions that can be used to examine requirements from different angles. Each letter marks a different search direction.
- D for definitions: Clarify terms such as customer, product, order or error before going any further.
- F for features: What can the system do, and how can that be grouped into features?
- O for outcomes: What does the system actually do when you give it input?
- S for scenarios: Which situations lead to which outcomes?
- P for predictions: What behavior can be derived from the requirement?
- A for ambiguity: Where is the language or logic ambiguous?
- M for missing: What is missing in terms of scenarios, data or rules?
The letters S and O belong closely together. An outcome without an associated scenario hardly makes sense, and neither does a scenario without an outcome. What makes sense are scenario-result pairs of the form “if this, then that”.
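To make the pairing concrete, here is a minimal Python sketch of scenario-result pairs held as data. The `Pair` class, the `pair` and `describe` helpers and the order-processing conditions are all hypothetical, chosen only for illustration; DEFOSPAM itself does not prescribe any notation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    """One 'if this, then that' statement: conditions plus the expected result."""
    scenario: frozenset  # set of (condition, value) tuples
    result: str


def pair(result: str, **conditions: str) -> Pair:
    """Build a pair from keyword conditions, e.g. stock="available"."""
    return Pair(frozenset(conditions.items()), result)


def describe(p: Pair) -> str:
    """Render a pair as a readable if-then sentence (conditions sorted by name)."""
    conditions = " and ".join(f"{name} is {value}" for name, value in sorted(p.scenario))
    return f"If {conditions}, then {p.result}."


# Hypothetical pairs for an order-processing requirement.
pairs = [
    pair("the order is accepted", stock="available", payment="valid"),
    pair("the order is rejected", stock="unavailable", payment="valid"),
]

for p in pairs:
    print(describe(p))
```

Keeping scenario and result in one record means neither can exist without the other, which is why S and O are best read together.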
Why definitions are at the beginning
Definitions are the starting point because most misunderstandings are caused by unclear terms. People read a requirements document, pick out ideas and never question the words.
Over the years we have seen that the conversations which arise when you really get to the bottom of a definition are often the most valuable. What exactly is a customer? What is an order? What is an error? Testing itself suffers from the same fuzziness: everyone has a favorite definition, and the industry shares none of them.
Agreeing on a set of common definitions sounds boring and time-consuming. And it is. But once it is done, your message gets through and you understand the other person’s.
The requirement is your oracle
The prediction is the point at which a requirement fulfills its actual purpose: it serves as an oracle. If you enter A, B and C, the system should do D, E and F. If you can read this prediction from the document, the requirement works as a source of knowledge.
Both sides use the same oracle. The tester derives expected results from it; the developer derives what has to be built.
A perfect oracle does not exist. It would have to be a huge document formulated in mathematical logic. In practice, we therefore work with compromises and trade-offs.
This is exactly where the testing mindset becomes productive: you can imagine a reasonable result, but you cannot find it in the requirement. For example, a document describes order processing with all its rules but says nothing about invalid values. The requirement only knows “rejected”. Yet rejection is not the whole story: under some circumstances a value might be acceptable and under others not. If these rules are not in the document, the tester does not know what to test and the developer does not know what to build.
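A minimal sketch of the oracle idea, assuming the requirement can be reduced to a lookup table from scenario to predicted result. `REQUIREMENT`, `predict` and `check` are hypothetical names, and the order-processing rules are invented for illustration. Where the table is silent, `predict` returns `None`: there is no prediction, so there is nothing to test and nothing to build.

```python
from typing import Optional

# Hypothetical requirement for order processing, reduced to a lookup:
# (quantity, customer) -> predicted result.
REQUIREMENT = {
    ("valid", "known"): "accepted",
    ("invalid", "known"): "rejected",
    ("invalid", "unknown"): "rejected",
}


def predict(quantity: str, customer: str) -> Optional[str]:
    """Return the result the requirement predicts, or None if it says nothing."""
    return REQUIREMENT.get((quantity, customer))


def check(actual: str, quantity: str, customer: str) -> bool:
    """Tester's view: compare an observed result against the oracle."""
    expected = predict(quantity, customer)
    if expected is None:
        # No prediction: the requirement has a gap that must be raised, not guessed.
        raise LookupError(f"requirement is silent on {(quantity, customer)}")
    return actual == expected


print(predict("valid", "known"))    # accepted
print(predict("valid", "unknown"))  # None
```

The developer reads the same table to decide what to build; the `LookupError` marks the point at which both sides need to go back to the author of the requirement.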
“By acting like a tester and looking critically at the requirements, you help everyone: the developers, the users and yourself.” (Paul Gerrard)
Ambiguity hides between the pages
Ambiguity occurs on several levels, and the most dangerous kind does not sit in individual words. At the linguistic level, the problem is still easy to spot. The word “customer” can mean a new or a regular customer, a private or a corporate one, a high-value or a low-value one. Without qualification, the term remains vague.
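Linguistic ambiguity of this kind can be flagged mechanically, at least crudely. The sketch below is a hypothetical helper, not part of DEFOSPAM: it lists sentences in which “customer” appears without one of the qualifiers a team might have agreed on. The qualifier set and the sample text are assumptions; a real check would use the team’s own glossary.

```python
import re

# Hypothetical qualifiers the team has agreed to use with "customer".
QUALIFIERS = {"new", "regular", "private", "corporate"}


def unqualified_uses(text: str, term: str = "customer") -> list[str]:
    """Return sentences in which `term` (or its plural) is not preceded by an agreed qualifier."""
    hits = []
    # Naive sentence split on ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        words = re.findall(r"[a-z]+", sentence.lower())
        for i, word in enumerate(words):
            if word in (term, term + "s"):
                previous = words[i - 1] if i > 0 else ""
                if previous not in QUALIFIERS:
                    hits.append(sentence)
                    break  # one hit per sentence is enough
    return hits


sample = (
    "A new customer receives a welcome discount. "
    "The customer can cancel an order within 14 days. "
    "Corporate customers pay by invoice."
)
print(unqualified_uses(sample))
```

Matching only the preceding word is deliberately simple: it misses phrasings such as “customers who are new” and cannot tell whether an unqualified use is really ambiguous in context. It only produces candidates for the human discussion about definitions.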
Structural inconsistencies spread across the document are harder to find. Two very different scenarios lead to the same result, one on page 23, the other on page 38. Nobody notices the connection because the passages are far apart.
The reverse case is just as tricky: the same scenario produces different results depending on where you look in the document. On page 23 the system behaves one way, on page 43 another. Both cannot be correct. Finding such contradictions requires keeping the entire document in mind at once, which takes real effort, but the time spent can be worthwhile.
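If scenario-result pairs have been extracted from the document together with their page numbers, both structural checks come down to grouping. The sketch below assumes that extraction has already happened, by hand or with a tool; the `extracted` entries are invented examples that mirror the page numbers in the text, and scenarios are plain strings for brevity.

```python
from collections import defaultdict

# Hypothetical pairs extracted from a requirements document: (page, scenario, result).
extracted = [
    (23, "express order, item in stock", "ship same day"),
    (38, "standard order, voucher applied", "ship same day"),
    (43, "express order, item in stock", "ship next day"),
]


def conflicting_results(pairs):
    """Same scenario, different results: both cannot be correct."""
    results = defaultdict(set)
    pages = defaultdict(list)
    for page, scenario, result in pairs:
        results[scenario].add(result)
        pages[scenario].append(page)
    return {s: (sorted(r), pages[s]) for s, r in results.items() if len(r) > 1}


def shared_results(pairs):
    """Different scenarios, same result: not necessarily wrong, but worth a second look."""
    scenarios = defaultdict(set)
    for _, scenario, result in pairs:
        scenarios[result].add(scenario)
    return {r: sorted(s) for r, s in scenarios.items() if len(s) > 1}


print(conflicting_results(extracted))
print(shared_results(extracted))
```

The grouping does not care how far apart the passages are, which is exactly the part a human reader finds hard. Deciding whether a shared result is intended still needs a person.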
The missing part is the hardest to see
Missing elements are the most difficult category because it is hard to imagine what is not there. A certain combination of inputs does not appear anywhere. A data field is missing, such as the user’s age, which would actually matter in a particular situation.
Requirements can never cover every eventuality; otherwise the document would grow without limit. So the goal is not completeness but prioritization: the most important gaps are the ones to focus on.
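One way to surface missing scenarios is to enumerate every combination of input values and subtract the ones the requirement covers. The dimensions and covered scenarios below are hypothetical; in practice they would come from the definitions and scenario-result pairs gathered earlier.

```python
from itertools import product

# Hypothetical input dimensions for an order form.
dimensions = {
    "customer": ["new", "regular"],
    "payment": ["card", "invoice"],
    "stock": ["available", "unavailable"],
}

# Combinations the requirement actually describes (also hypothetical),
# with values in the same order as the dimensions above.
covered = {
    ("new", "card", "available"),
    ("new", "card", "unavailable"),
    ("regular", "card", "available"),
    ("regular", "invoice", "available"),
}


def missing_combinations(dimensions, covered):
    """Return every combination of input values that no scenario mentions."""
    return [combo for combo in product(*dimensions.values()) if combo not in covered]


for combo in missing_combinations(dimensions, covered):
    print(dict(zip(dimensions, combo)))
```

Three dimensions with two values each already give 8 combinations, and every further dimension multiplies that number. The output is therefore a list to prioritize, not a checklist to complete.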
This is where an external nudge is valuable. Experience tells you that certain situations need to be dealt with, but nobody’s experience is comprehensive, complete and omniscient. A wider range of examples covers more than the memory of a single team.
AI as a coach for better requirements
AI can be a useful assistant when working through this list of questions, but it does not replace the work. Paul is currently testing this in experiments and building a prototype to see what works.
The first observations are concrete. Given a text, the AI identifies named entities, i.e. nouns, actions and activities, quite reliably. It suggests useful definitions, perhaps not perfect, but good enough to get started. It generates scenarios and results and pairs them up.
Even a requirement that looks good provides surprisingly little material. It assumes a lot of implicit information that a human would fill in but the AI does not know. On request, a handful of pairs quickly becomes many more. In an e-commerce application, the model knows that there are stock limits, price calculations and discounts, and suggests matching scenarios. Five examples become eight more without much effort, and most of them make sense.
The system can also be queried directly: for linguistic ambiguities, for inconsistent scenario pairs, for missing situations or data. It provides a list of what it recognizes.
Trustworthy no, helpful yes
When checking requirements, AI is an imperfect assistant that should not be trusted blindly. Its ideas are not guaranteed to be sound. You need to review them, keep an eye on what the model is doing and use any good input.
Its strength lies in memory, not in thinking. The model remembers everything, calculates reliably and, thanks to its training material, knows more cases than any individual team member. It is not the best member of the team; it just has the best memory. It can hardly think, debate or genuinely question.
Its mistakes are similar to human ones. It exaggerates, hallucinates, misunderstands one question and answers another. This requires supervision.
The more productive role is that of coach, not mentor. A mentor should know everything about everything and pull you out of the mess. A coach asks good questions, makes suggestions and gets you to think more deeply. In this role, the tool achieves perhaps 80 percent of what a team would achieve on their own with more effort. This gets you further, but doesn’t solve the whole problem.
The same applies to generated code. Very few people trust it blindly, but it can do a large part of the work. If it follows your instructions and you understand what it does, you save time. You still need to test it, or at least review it.


