System testing for kitchen appliances in large-scale catering means fully automated testing of heating, motor control and bus systems on real appliances. A Python framework controls hardware such as relay modules and power supplies. Around 2,000 test cases run every night, up to 30 hours at a time at weekends.
Key Takeaways
- There are around 700,000 lines of code in RATIONAL’s combi-steamers, distributed across several electronic components that communicate with each other via bus systems.
- Ten test automation experts at RATIONAL work alongside around 70 embedded developers, which is why scaling the automation infrastructure is the biggest strategic challenge.
- Nightly test runs over several branches cover around 2,000 test cases, and the runtime bottleneck is the physical heating time of the real devices, not the framework.
- Where heating cycles would take too long, the team simulates specific temperature values in the system in order to test switch-off logic without having to heat up the appliance every time.
- RATIONAL is gradually shifting the test design to the specialist departments: The test team provides the infrastructure and framework, while the specialist departments are expected to write and maintain their own tests.
Commercial kitchen appliances are software systems with a steel shell
A combi-steamer for commercial kitchens is a networked software system. RATIONAL appliances contain around 700,000 lines of code, distributed across several electronic components that communicate with each other via bus systems and run different firmware.
The main software is developed by the manufacturer itself, while the electronic components come from external companies. Both worlds have to work together, and it is precisely at this interface that the reliability of an appliance is decided.
The dimensions are not those of a home kitchen. The largest appliances process over 90 chickens in one go, are taller than a man and have a heating capacity of up to 70 kilowatts. Software that controls this much energy has to be tested on a setup that can handle that power for real.
Why the test needs real hardware instead of pure simulation
When testing an appliance with 70 kilowatts of heating power, software alone is not enough. At RATIONAL, the test is backed by its own hardware infrastructure in its own racks: relay modules, remote power supplies, logic analyzers and the ability to switch electricity, gas and water on and off in a targeted manner.
Everything is accessible via the network, right down to the serial console on the device. Even USB runs via the network, which Andreas Berger himself describes as a challenge. This hardware costs a lot of money and material and is clearly the expensive part of the system.
The limiting factor in test operation is not the software, but the hardware being tested. If a device has to heat up to test the heating, the test spends most of its time waiting for physical processes. A server with eight cores and 8 GB of memory can easily manage 18 test devices running simultaneously. The heat still takes its time.
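Because the tests mostly wait for the appliance, a single process can supervise many devices at once. The following is a minimal sketch of that idea, not RATIONAL's framework: `wait_for_heatup` and `run_all` are hypothetical names, the sleep stands in for polling a real device, and `asyncio` is an illustrative choice that the source does not confirm.

```python
import asyncio
import time


async def wait_for_heatup(device_id: int, heatup_seconds: float) -> str:
    # Hypothetical stand-in: a real test would poll the appliance here.
    # While waiting, the coroutine uses almost no CPU.
    await asyncio.sleep(heatup_seconds)
    return f"device-{device_id}: target temperature reached"


async def run_all(device_count: int, heatup_seconds: float) -> list[str]:
    # All devices wait concurrently, so the total time is roughly one heat-up,
    # not device_count heat-ups.
    tasks = [wait_for_heatup(i, heatup_seconds) for i in range(device_count)]
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(run_all(device_count=18, heatup_seconds=0.1))
    elapsed = time.perf_counter() - start
    print(f"{len(results)} devices finished in {elapsed:.2f} s")
```

The point of the sketch is only that waiting is cheap for the server. The heat-up time itself cannot be shortened this way.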
Clear focus: system testing and regression, no unit testing
The test automation team defined its focus early on. It carries out system testing, regression testing and low-level testing down to the bus systems. Unit and module tests remain on the development side, and acceptance testing remains with the requester.
This demarcation is a decision, not a coincidence. If you want to test everything, you end up testing nothing properly. The focus on system behavior suits a product in which the interaction of many components makes the difference.
A typical test case activates an operating mode, such as cooking with hot air, and then checks at the controller level whether the correct controllers respond, whether the motor turns, whether the heaters start and whether the set target temperature is reached. This is followed by system integration testing, which also includes the networking solution.
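Such a test case can be sketched as follows. This is an illustration under stated assumptions: `FakeAppliance` and `check_hot_air_mode` are hypothetical, the simulated heating replaces the real appliance, and the 180 °C target is an arbitrary example value.

```python
from dataclasses import dataclass


@dataclass
class FakeAppliance:
    """Hypothetical stand-in for a real appliance on the test bench."""
    mode: str = "idle"
    fan_running: bool = False
    heaters_on: bool = False
    cavity_temp_c: float = 20.0
    target_temp_c: float = 20.0

    def start_mode(self, mode: str, target_temp_c: float) -> None:
        # Activating a mode starts the fan motor and the heaters.
        self.mode = mode
        self.target_temp_c = target_temp_c
        self.fan_running = True
        self.heaters_on = True

    def step(self, degrees: float) -> None:
        # Simulated heating; the real device needs physical time here.
        if self.heaters_on:
            self.cavity_temp_c = min(self.cavity_temp_c + degrees, self.target_temp_c)
            if self.cavity_temp_c >= self.target_temp_c:
                self.heaters_on = False


def check_hot_air_mode(appliance: FakeAppliance) -> None:
    # Activate the operating mode, then check motor, heaters and temperature.
    appliance.start_mode("hot_air", target_temp_c=180.0)
    assert appliance.mode == "hot_air"
    assert appliance.fan_running, "motor should turn"
    assert appliance.heaters_on, "heaters should start"
    for _ in range(100):
        appliance.step(10.0)
    assert appliance.cavity_temp_c == 180.0, "target temperature should be reached"


if __name__ == "__main__":
    check_hot_air_mode(FakeAppliance())
    print("hot air check passed")
```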
What the networking of the appliances actually achieves
Networking here does not mean that the appliances communicate with each other. It means that updates and programs can be rolled out remotely. Large chains need to distribute their cooking programs or a software update to all devices in all stores.
The service required for this is provided by the manufacturer. A small restaurant can also use the same networking solution with its own account. For testing, this means that the roll-out must be checked in the same way as the cooking itself.
Why Python was the right choice for the test framework
Python was chosen because the team didn’t want to compile anything and needed a compact language with lots of libraries. Andreas Berger comes from the world of C and C++ and knows the difference inside out.
If I try to set up a network connection or a serial connection, I program myself to exhaustion. If I do the same thing in Python, it’s two lines of code.
Andreas Berger
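To illustrate how compact a network connection is in Python, here is a small sketch using only the standard library. The client side in `query` is the short part; `EchoHandler` and the local demo server are hypothetical and exist only so the example runs without real hardware. A serial connection would typically need a third-party package and is not shown.

```python
import socket
import socketserver
import threading


class EchoHandler(socketserver.BaseRequestHandler):
    # Hypothetical stand-in for a device console: echoes what it receives.
    def handle(self) -> None:
        data = self.request.recv(1024)
        self.request.sendall(data)


def query(host: str, port: int, payload: bytes) -> bytes:
    # The actual client side: open a TCP connection and exchange data.
    with socket.create_connection((host, port), timeout=2.0) as conn:
        conn.sendall(payload)
        return conn.recv(1024)


def demo() -> bytes:
    # Port 0 lets the operating system pick a free port for the demo server.
    with socketserver.TCPServer(("127.0.0.1", 0), EchoHandler) as server:
        threading.Thread(target=server.serve_forever, daemon=True).start()
        host, port = server.server_address
        try:
            return query(host, port, b"status?\n")
        finally:
            server.shutdown()


if __name__ == "__main__":
    print(demo())
```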
The aim was to eliminate compile times. Today, the team maintains several of its own Python packages and uploads them automatically to its own registries, so it effectively operates its own package infrastructure. Looking back, Andreas thinks the decision to use Python was the right one.
In addition to Python, GitLab and a test design tool form the basic framework. GitLab carries the pipelines and quality assurance of the framework. The test design tool helps with variant management: an abstract test case can be applied to 14 different device types via variable test steps without multiplying the test case.
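The idea of one abstract test case with variable steps can be sketched in a few lines. The variant names, the temperature limits and the `has_steam_generator` flag below are invented for illustration; the real data would come from the test design tool, whose format the source does not describe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceVariant:
    # Hypothetical variant data, not actual RATIONAL device types.
    name: str
    max_temp_c: int
    has_steam_generator: bool


def build_steps(variant: DeviceVariant) -> list[str]:
    # One abstract test case; only the variable steps differ per variant.
    steps = ["power on", f"set hot air to {variant.max_temp_c} C"]
    if variant.has_steam_generator:
        steps.append("check steam generator fill level")
    steps.append("verify target temperature reached")
    return steps


VARIANTS = [
    DeviceVariant("type-A", 300, True),
    DeviceVariant("type-B", 250, False),
]

if __name__ == "__main__":
    for variant in VARIANTS:
        print(variant.name, build_steps(variant))
```

Adding a further device type then means adding one entry to the variant list rather than copying the test case.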
How test operation is scheduled day and night
The goal is clear: all necessary branches run fully automatically at night, from the start to the evaluated report. Two branches are tested each night, each for around four to five hours, because the devices cannot be used by two runs at the same time and the runs have to queue.
Around 2,000 test cases are run per branch. The results should be available by six o’clock in the morning. At the weekend, a complete test runs for around 30 hours.
During the day, the devices are used for other tasks. There are on-demand runs when a developer has changed something and wants to test it directly. New test designs are also created during this time, and the team provides debug support for complex errors, checks its own packages after refactoring and maintains the hardware.
The following overview summarizes the schedule:
| Time window | Task | Scope |
|---|---|---|
| Night | Automated run, two branches | Approx. 2,000 test cases per branch, 4 to 5 hours per branch |
| Weekend | Complete test | Around 30 hours |
| During the day | On-demand runs, test design, debug support, hardware maintenance | As required |
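The nightly numbers imply a simple back-of-the-envelope calculation. Assuming the two branches run one after the other and each takes the upper bound of five hours, the run has to start no later than 20:00 the evening before to deliver results by 06:00. The function name and the example date below are arbitrary.

```python
from datetime import datetime, timedelta


def latest_start(deadline: datetime, branches: int, hours_per_branch: float) -> datetime:
    # Assumption: branches run sequentially because they share the same devices.
    return deadline - timedelta(hours=branches * hours_per_branch)


if __name__ == "__main__":
    deadline = datetime(2024, 1, 2, 6, 0)  # arbitrary example date, 06:00
    start = latest_start(deadline, branches=2, hours_per_branch=5)
    print(start.strftime("%Y-%m-%d %H:%M"))
```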
Risk-based testing means prioritizing close to the customer
The team regularly finds errors and prioritizes them according to impact. Critical is anything that hurts the customer or would trigger many service calls. An error in the tenth sub-level of a service menu is usually less urgent.
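A prioritization of this kind can be expressed as a simple sort. The two scoring fields and their 0 to 3 scale are assumptions made for this sketch; the source does not say how the team scores impact.

```python
from dataclasses import dataclass


@dataclass
class Defect:
    title: str
    customer_impact: int    # hypothetical scale: 0 (none) to 3 (hurts the customer)
    service_call_risk: int  # hypothetical scale: 0 (none) to 3 (many service calls)


def prioritize(defects: list[Defect]) -> list[Defect]:
    # Highest combined score first; ties keep their input order.
    return sorted(
        defects,
        key=lambda d: d.customer_impact + d.service_call_risk,
        reverse=True,
    )


if __name__ == "__main__":
    found = [
        Defect("typo in tenth sub-level of service menu", 0, 0),
        Defect("heating does not start in hot air mode", 3, 3),
    ]
    for defect in prioritize(found):
        print(defect.title)
```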
Stabilization branches are quieter; the main supply branch is the more critical one. Every company is familiar with a classic problem: something is changed, and the tests do not catch it beforehand. It happens.
Errors found do not remain in the tool. The team goes to the developers, talks and clarifies together where the problem lies and how it can be solved. Some faults are also infrastructure problems, such as a failed network, which must first be ruled out.
Testing safety before the device is even damaged
Personal protection is initially covered by the intrinsic safety of the individual components and their approval. These protective shutdowns take effect late; they are intended to prevent personal injury and damage to buildings, even if the device has already been destroyed.
The test starts earlier. It checks the shutdowns that the software triggers before the device is damaged. If a component is at risk of breaking down, the heating should switch off in good time.
Such tests have their own pitfalls. An empty steam generator is destroyed within a short time if it is heated. So it must not actually be empty when the heating lockout is tested, but the system must assume that it is. Each of these conditions has to be thought through.
This is exactly where the simulation helps. If the heating has already been tested in upstream tests, the team injects temperature values directly into the system and specifically sets them one degree above or below the threshold to trigger heating shut-offs. In this way, the device remains intact and the test remains meaningful.
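The boundary check described here can be sketched as follows. `HeaterController` is a hypothetical model written for this example and does not represent RATIONAL's firmware logic; the cutoff value used in the usage lines is arbitrary.

```python
class HeaterController:
    """Hypothetical controller model for illustration only."""

    def __init__(self, cutoff_temp_c: float) -> None:
        self.cutoff_temp_c = cutoff_temp_c
        self.heating_enabled = True

    def inject_temperature(self, temp_c: float) -> None:
        # Simulated sensor value: nothing is physically heated.
        if temp_c > self.cutoff_temp_c:
            self.heating_enabled = False


def check_cutoff(cutoff_temp_c: float) -> tuple[bool, bool]:
    # One degree below the cutoff: heating should stay enabled.
    below = HeaterController(cutoff_temp_c)
    below.inject_temperature(cutoff_temp_c - 1.0)
    # One degree above the cutoff: heating should switch off.
    above = HeaterController(cutoff_temp_c)
    above.inject_temperature(cutoff_temp_c + 1.0)
    return below.heating_enabled, above.heating_enabled


if __name__ == "__main__":
    stays_on, stays_on_above = check_cutoff(300.0)
    assert stays_on and not stays_on_above
    print("cutoff boundary check passed")
```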
The real challenge is scaling, not technology
The biggest challenge is scaling with development. Embedded development has around 70 software developers at two locations, compared to ten people in test automation. The team wants to keep pace without matching development in headcount.
This has resulted in a change of strategy. Instead of delivering complete test frameworks and finished test designs, the focus is shifting towards infrastructure. The team provides hardware and software for testing, while the specialist departments are increasingly responsible for test design.
The logic behind this is obvious: whoever changes the requirements can update the tests at the same time and reduce their manual tests in return. This distributes the load to where the specialist knowledge is located.
There is also the spatial distribution. One laboratory is located in Landsberg, another in Wittenheim in France, where a subsidiary looks after the tilting frying pan systems. Two colleagues there are part of the team. Regular visits in both directions hold the cooperation together and prevent the groups at each location from drifting apart.
Test impact analysis as the next step
The next big lever is a test impact analysis. The team is working on automatically re-sorting test cases depending on what has changed in the code. Instead of stubbornly running the same block every night, the relevant tests should be moved to the front.
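One simple way to picture such a re-sorting is to rank test cases by how many changed files they touch. This is a sketch under assumptions, not the team's method: the mapping from test cases to source files (`coverage`) and all names in it are hypothetical, and the source does not say how the team determines impact.

```python
def reorder_by_impact(
    test_cases: list[str],
    coverage: dict[str, set[str]],
    changed_files: set[str],
) -> list[str]:
    # Score each test by the number of changed files it is known to exercise.
    def overlap(test: str) -> int:
        return len(coverage.get(test, set()) & changed_files)

    # Most affected tests first; unaffected tests keep their original order.
    return sorted(test_cases, key=overlap, reverse=True)


if __name__ == "__main__":
    tests = ["test_menu", "test_hot_air", "test_steam"]
    coverage = {
        "test_menu": {"ui/menu.c"},
        "test_hot_air": {"ctrl/heater.c", "ctrl/fan.c"},
        "test_steam": {"ctrl/heater.c", "ctrl/steam.c"},
    }
    print(reorder_by_impact(tests, coverage, {"ctrl/fan.c", "ctrl/heater.c"}))
```

All test cases still run in this sketch; only the order changes, so the tests most likely to fail report first.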
When asked about the highlight of the coming years, Andreas mentions two things: that the test impact analysis is running and that the team has done a good job of scaling with the company. Both goals are related, because a smart test selection is exactly what keeps a small test team capable of acting alongside a growing development organization.


