Exploratory Ensemble Testing

Exploratory ensemble testing is a method in which a group tests software together at a single computer, without predefined test cases but with clearly defined roles (driver, navigator, ensemble) that rotate at fixed intervals of eight to ten minutes. The starting point is a charter, a rough idea of the area to be tested. Each experiment informs the next.

Key Takeaways

Ensemble testing only works if everyone involved consistently fulfills their roles: The driver doesn’t make decisions, the navigator doesn’t touch the computer, and the timer forces the rotation every 8 to 10 minutes.
Late stakeholder feedback shortly before the release date was the concrete trigger for the introduction of exploratory ensemble testing, not methodological idealism.
The stated goal of a session is not to find bugs, but to learn something about the system using a charter; bugs are a by-product, not a benchmark.
Notes from the session are deliberately kept neutral and only evaluated one or two days later with product and project management in order to keep the “bug or no bug” discussion out of the test flow.
Developers implicitly build up test knowledge through repeated sessions, recognizable by the fact that they predict edge cases that they have already tried out themselves.

What is exploratory ensemble testing?

Exploratory ensemble testing means that a group of people sit together in front of a computer and test software in clearly distributed roles, without predefined test cases. Instead of ready-made scripts, there is a charter, a rough idea of what is to be tested. The next experiments emerge from the first.

The term ensemble testing replaces the older term mob testing. The change came when it became clear that “mob”, with its association with bullying, was not a good choice of words. The word ensemble also describes the matter more precisely: a group working together on a task.

The method combines two practices. The role model comes from collaborative work on code, the open approach from exploratory testing. The two intertwine because the group decides together what to try next.

How does the distribution of roles in the ensemble work?

There are three fixed roles in the ensemble: Driver, Navigator and the observing ensemble. This separation is at the heart of the method.

The driver sits at the computer and operates it, but makes no decisions. The navigator decides what to do, but does not touch the keyboard. If both remain silent, nothing happens. A test step occurs only when they talk to each other. This forced communication is intentional.

The rest of the group observes and contributes ideas, but the navigator alone decides which ones are carried out. Anyone who absolutely wants to see their own idea tried must wait until they are the navigator themselves.

The roles rotate in a fixed time pattern, in practice every eight to ten minutes. This means that everyone gets one turn in each position. A complete session usually lasts around one and a half hours, of which at least one hour is purely for testing, with the rest for intro and wrap-up.
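The rotation described above can be sketched as a small schedule generator. This is only an illustration: the names Slot and rotation_schedule are made up for this example, and the rotation order (the navigator becomes the next driver, the next observer becomes navigator) is an assumption, since the method as described here only requires that the roles rotate on a fixed timer.

```python
from collections import deque
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Slot:
    """One timer interval with its role assignment (hypothetical helper type)."""
    start_minute: int
    driver: str
    navigator: str
    observers: Tuple[str, ...]


def rotation_schedule(
    participants: Sequence[str],
    testing_minutes: int = 60,   # pure testing time from the text
    interval_minutes: int = 10,  # the text names eight to ten minutes
) -> List[Slot]:
    """Return one Slot per timer interval.

    Assumed rotation order: everyone moves up one seat per interval,
    so the navigator drives next and the first observer navigates.
    """
    if len(participants) < 2:
        raise ValueError("an ensemble needs at least a driver and a navigator")
    seats = deque(participants)
    slots = []
    for start in range(0, testing_minutes, interval_minutes):
        driver, navigator, *observers = seats
        slots.append(Slot(start, driver, navigator, tuple(observers)))
        seats.rotate(-1)  # shift everyone one seat to the left
    return slots


if __name__ == "__main__":
    for slot in rotation_schedule(["Ana", "Ben", "Cem", "Dee", "Eli"]):
        print(f"{slot.start_minute:>2} min: driver={slot.driver}, navigator={slot.navigator}")
```

With five participants and ten-minute intervals, one hour yields six slots, which is enough for each person to drive and to navigate at least once.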

Why it’s worth starting with a clearly defined problem

Ensemble testing needs a specific reason; otherwise it never gets started. The trigger for Tobias Geyer was a recurring pattern: problems surfaced only in late test phases, and feedback from internal stakeholders regularly arrived too late to be included in the current release.

The release cadence increased the pressure. With two to four releases a year, a fix cannot simply be shipped shortly afterwards. Feedback that misses one release waits months for the next. This is exactly what frustrated many in the team.

The method was not introduced as a mandatory process, but was proposed as an experiment. This framing helped. Instead of imposing a new requirement, the question was whether it would solve the problem. The feedback was positive and it has been running regularly ever since.

If you want to introduce ensemble testing, this approach will help: name a real pain point, call it an experiment, invite people to take part and let them decide for themselves whether it is worthwhile.

A four-week rhythm keeps the method alive

The frequency determines whether ensemble testing becomes a routine or a chore. A four-week rhythm, linked to the sprint, has proven its worth.

The interval has two effects. Firstly, at least one sprint has passed, so there is a new feature or at least a new perspective, such as a familiar feature seen through a different persona. Secondly, the session retains a sense of novelty because it does not take place all the time.

The hour and a half is also a conscious break from everyday business. This is perceived as a value, not a burden. A sign of this: canceled sessions result in discomfort, not relief.

A sensible starting point is around five people per ensemble. If there are ten or more people, the group is split up so that rotation and collaboration can work. These numbers are not a law, but an experience that you should adapt to your context.
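The sizing rule of thumb can be written down as a short helper. This is a sketch under stated assumptions: the function name split_into_ensembles is invented for illustration, and both the way people are dealt into groups and the rounding rule for the number of groups are choices made here, not part of the method. Only the target of about five and the split at ten or more come from the text.

```python
from typing import List, Sequence


def split_into_ensembles(
    people: Sequence[str],
    target_size: int = 5,       # "around five people" from the text
    split_threshold: int = 10,  # "ten or more" from the text
) -> List[List[str]]:
    """Split a group into ensembles of roughly target_size people.

    Below the threshold the group stays together. At or above it,
    the number of ensembles is the head count divided by the target
    size, rounded to the nearest whole number (an assumption).
    """
    if len(people) < split_threshold:
        return [list(people)]
    group_count = max(1, (len(people) + target_size // 2) // target_size)
    # Deal people out round-robin so group sizes differ by at most one.
    return [list(people[i::group_count]) for i in range(group_count)]


if __name__ == "__main__":
    team = [f"person-{n}" for n in range(1, 14)]
    for ensemble in split_into_ensembles(team):
        print(len(ensemble), ensemble)
```

Adjust target_size and split_threshold to your own context, in line with the advice that these numbers are experience rather than law.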

What kinds of errors does ensemble testing find?

The spectrum ranges from simple click problems to exceptions and usability weaknesses. In practice, usability was a surprisingly large proportion of the findings.

What becomes particularly visible is what small-scale, feature-driven development overlooks. When a charter maps a larger user workflow through the entire product and those involved actually launch the software instead of relying on unit tests, the breaking points between the features become apparent.

An example illustrates this. One team had been testing an input field for formatted text, including moving lines and deleting passages. Everything worked. Then a product manager acting as navigator suggested moving the text using drag-and-drop. Everyone thought that was impossible. It turned out to be possible, and it promptly threw an exception.

Such discoveries arise because people with different points of view sit together in the ensemble. What one person considers unthinkable, another simply tries out.

The real gain is the test knowledge in the development team

Ensemble testing not only finds errors, it also builds up testing expertise among developers. This is the lasting effect that extends beyond the individual session.

In the sessions, participants learn about test ideas and test methodology. They develop a feeling for what needs to be tested beyond the happy path. They take this knowledge with them into the next sprint, whether explicitly or implicitly.

One particular moment illustrates this learning process: the ensemble believes it has found an edge case that will break the software straight away. The person who built the feature is sitting in the room and says that they have already tried it out themselves. This is exactly when test thinking enters the development process.

For many, the method also breaks down a barrier: testing can be fun. Those who previously regarded exploratory testing as a nuisance now experience it as a shared, lively activity in the ensemble.

Why evaluation and testing should be kept strictly separate

During the session, there is no discussion about whether something is a bug. This separation keeps the flow going and prevents tough arguments.

In testing, an attitude from improv applies: “Yes, and”. There are no bad suggestions; ideas are taken up in order to keep things moving. Notes are made as neutrally as possible. An exception or a crash is clear; everything else is initially only recorded, not evaluated.

The evaluation follows one or two days later in a small group with product and project management. Only then is it decided what is a real bug, what can be accepted and what is perhaps just a misunderstanding of the product that calls for better documentation.
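The separation between recording and evaluating can be made concrete with a small data model. The following is a minimal sketch, not a prescribed tool: SessionLog, Observation, Verdict and triage are hypothetical names, and the three verdict values simply mirror the three outcomes named above. The point is that a note carries no verdict during the session and only receives one in the later review.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Verdict(Enum):
    """Outcomes assigned only in the later review (names are illustrative)."""
    BUG = "real bug"
    ACCEPTED = "can be accepted"
    NEEDS_DOCS = "needs better documentation"


@dataclass
class Observation:
    what_we_did: str
    what_happened: str
    verdict: Optional[Verdict] = None  # deliberately empty during the session


class SessionLog:
    """Neutral notes for one session, tied to its charter."""

    def __init__(self, charter: str) -> None:
        self.charter = charter
        self.observations: List[Observation] = []

    def note(self, what_we_did: str, what_happened: str) -> Observation:
        """Record what was done and seen, without judging it."""
        observation = Observation(what_we_did, what_happened)
        self.observations.append(observation)
        return observation

    def untriaged(self) -> List[Observation]:
        """Everything still waiting for the review one or two days later."""
        return [o for o in self.observations if o.verdict is None]


def triage(observation: Observation, verdict: Verdict) -> None:
    """Assign a verdict in the review meeting, not in the session."""
    observation.verdict = verdict


if __name__ == "__main__":
    log = SessionLog("Edit formatted text in the input field")
    first = log.note("Dragged selected text to a new position", "An exception was thrown")
    log.note("Deleted a passage inside a list", "List numbering restarted at 1")
    triage(first, Verdict.BUG)
    print(len(log.untriaged()), "observation(s) still open")
```

A shared document or a sheet of paper serves the same purpose; what matters is that the verdict column stays empty until the review.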

It is also important to set a clear goal. It is not about breaking the software or finding as many bugs as possible. Errors occur anyway because all software has them. The aim is to learn something about the system from the charter.

What are the pitfalls of ensemble testing?

Two problems affect almost every team: taking notes and the discipline of the roles.

Taking notes is underestimated. In the heat of testing, the group easily forgets to document what it has done and found. At the end of the session, it is then difficult to reconstruct the findings. When you are in the flow, you don’t think about the notes, which is precisely why you need a dedicated note-taking role or a deliberate reminder.

The role discipline slips away especially when new people join or an ensemble starts fresh. One pattern is the creative circumvention of the rule. Someone says to the navigator: “Tell the driver to click button A, then enter XY and select feature Z.” Formally correct, but missing the point. The only thing that helps here is a friendly repetition of the rules of the game.

The timer is also a stumbling block. It’s easy to miss it in the middle of an action and suddenly ten minutes have gone by. The rotation only creates discipline if it is actually enforced.

Stakeholders outside the development team need their own framework

Bringing internal customers into the ensemble provides valuable feedback before the real customer sees the product, but it can tip the dynamic. Internal customers tend to become dominant.

The solution is to split into two session types. One runs with the development team, the other involves internal stakeholders, supplemented by individual development representatives. This creates an exchange of knowledge without one group taking control.

Participation remains a work in progress. The development team is significantly larger than the group that regularly participates. Some never joined, others dropped out over time. Bringing more people back is a declared goal.

Ultimately, all it takes is a meeting invitation and perhaps a meeting room. An hour and a half flies under any management radar. Just do it and be surprised by how good people think it is.
Tobias Geyer