What are microservices and what role do they play in modern software architectures?

Microservices are an architectural style that divides software applications into small, independent services, each performing a specific function. This architecture enables more flexible development, scalability and maintainability of software applications.

What are the challenges of testing microservices?

The challenges of testing microservices include the complexity of interactions between services and the management of technical debt in a dynamic development environment.

How can you successfully switch from a monolithic to a microservices architecture?

The transition to a microservices architecture requires careful modularization of the existing system, consideration of organizational factors, and strategic planning to fully benefit from its advantages.

What types of testing are particularly important for microservices?

Various test types are crucial for microservices, including end-to-end testing, unit testing and API testing. Each of these test types plays an important role in ensuring the quality and stability of the application.

How can development teams implement test automation in a microservices environment?

Development teams should take responsibility for test automation by implementing best practices, integrating automated testing into their development process and paying continuous attention to the quality of testing.

Why is monitoring and observability important in the testing process of microservices?

Monitoring and observability are crucial for the testing of microservices, as they help to detect and diagnose errors at an early stage. They support the shift left/right approach to ensure the quality of the software throughout the entire development cycle.

Testing microservices

Testing microservices means distributing automated tests across three levels: unit tests for business logic, API tests at the service level and a reduced set of end-to-end tests for the main usage path. Flaky tests are consistently removed or stabilized. Production monitoring and alerting replace complex test environments because fast rollbacks keep the risk of errors per release low.

Key Takeaways

Microservices do not solve a technical problem, but an organizational one: If you make Conway’s Law an architectural principle, you scale teams, not just code.
A pure end-to-end test suite with several hours of runtime is not a safety net, but a maintenance problem that slows down overall development speed.
Consistently quarantining flaky tests and addressing the responsible teams directly is an effective way to gradually clean up a bloated test suite.
Shift-right through monitoring and alerting can partially replace end-to-end testing on test stages because, in e-commerce, a quick rollback is cheaper than elaborate test environments for rare production defects.

Why the monolith became microservices

The trigger for microservices is rarely technical, but organizational. REWE started with a monolithic online store that was connected to many legacy systems and had grown chaotically over the years under time pressure. The code had to be modularized and made maintainable, while at the same time many teams had to work on it in parallel.

This is exactly what is difficult in a monolith. Michael Kutz describes the “merge hell” in which one person refactored a file and three others did the same thing differently. In the end, there are a hundred changed files and nobody knows which version will win.

Microservices solve this problem by making Conway’s Law an architectural principle. Teams are given clearly defined areas of responsibility and can deploy independently of each other. The price for this is new problems: coordination, API management between services and a more complex code organization.

Technically, microservices don’t really make sense. It is an organizational problem that you are elevating to a technical level. And with the “only” I see the mistake in the sentence, because in IT everything is an organizational problem.
Michael Kutz

Long end-to-end testing slows down any rapid development

If you validate a system almost exclusively through end-to-end tests, you pay for it with long runtimes and little diagnostic value. The test suite of the old monolith ran for three to four hours, and even after optimization it still took a good two hours.

There is a comprehensible pattern behind this test architecture. If a system is constantly changing, developers don’t like to write unit tests with lots of details because these details are constantly changing. Instead, they concentrate on what is stable: the end-to-end test, which checks the requirement directly.

The goal was different. Following the example of extreme programming, the team wanted to know within around ten minutes whether a change had broken something. Seeing a red light hours later does not fit the bill.

Error analysis was particularly expensive. A failed end-to-end test initially provides only a red signal, with no information about the cause. Often the cause was a database that was slow at the wrong moment, i.e. something for which the code that had just been deployed was not responsible.

Only three reds in a row count as a real error

An honest, albeit unattractive metric from practice: a test was only treated as a real error if it was red three times in a row. It was economically easier to run the suite again than to search for the cause after every single failure.
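The rule can be sketched in a few lines. This is only an illustration: the function name, the oldest-first list of results and the configurable threshold are assumptions, not the team's actual tooling.

```python
from typing import Sequence


def is_real_failure(results: Sequence[bool], threshold: int = 3) -> bool:
    """Return True if the most recent `threshold` runs were all red.

    `results` lists runs oldest-first; True means green, False means red.
    (Hypothetical helper, named for this sketch only.)
    """
    if len(results) < threshold:
        return False  # not enough runs yet, so the suite is simply rerun
    return not any(results[-threshold:])


print(is_real_failure([True, False, False]))   # two reds in a row: rerun
print(is_real_failure([False, False, False]))  # three reds in a row: investigate
```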

This pragmatism is not a model to follow, but it illustrates the basic problem with flaky end-to-end tests. If analyzing a failure is more expensive than rerunning the suite, the test loses its value as an early warning system. You are then testing the stability of your infrastructure rather than your code.

How microservices are gradually separated from the monolith

The way out of the monolith followed the strangler-fig pattern. The existing code was taken as given and no longer touched, because any change would have destabilized the rigid structure. The assumption that this code was error-free was never true, but it served as the working basis.

Functions were gradually replaced by code in microservices. A bypass led from the monolith to the new service, which took over the function. The first services were essentially databases with APIs, i.e. hardly any business logic, but mainly data storage that encapsulated the complex database structure of the monolith.
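A minimal sketch of such a bypass, assuming a simple in-process router: functions that have already been migrated are delegated to the new service, everything else still goes to the monolith. All names (`StranglerRouter`, `product-data`, the handler functions) are hypothetical and only illustrate the pattern.

```python
from typing import Callable, Dict

Handler = Callable[[dict], dict]


class StranglerRouter:
    """Sends a named function to a new service if migrated, else to the monolith."""

    def __init__(self, monolith: Handler) -> None:
        self._monolith = monolith
        self._migrated: Dict[str, Handler] = {}

    def migrate(self, function_name: str, service: Handler) -> None:
        # From now on, calls for this function bypass the monolith.
        self._migrated[function_name] = service

    def handle(self, function_name: str, request: dict) -> dict:
        handler = self._migrated.get(function_name, self._monolith)
        return handler(request)


def legacy_monolith(request: dict) -> dict:
    return {"served_by": "monolith", **request}


def product_data_service(request: dict) -> dict:
    # Stands in for an early "database with an API" service.
    return {"served_by": "product-data-service", **request}


router = StranglerRouter(legacy_monolith)
router.migrate("product-data", product_data_service)

print(router.handle("product-data", {"id": 1}))
print(router.handle("checkout", {"cart": 7}))
```

The monolith code itself stays untouched; only the routing table grows as more functions move out.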

The business logic was then gradually moved into these services. In the end, only the rendering of the HTML remained in the monolith. The switch to asynchronous APIs was a conscious architectural decision, even though eventual consistency and end-to-end testing do not go well together.

How to build a real test pyramid

Every new microservice was given a cleanly formulated API and its own tests right from the start. With Spring Boot’s test tooling, only the individual service was started up and the API was consistently tested without looking at the internal implementation. At the same time, unit tests were created for the places with business logic.

In this way, a pyramid-like test structure gradually grew. What was missing for a long time was a good top. The old overall test suite remained the end-to-end level, and everyone was responsible for it. When everyone is responsible for something, usually no one ends up being responsible.

The reduction of end-to-end testing followed a clear logic. Where a functionality was already secured at API level or in unit testing, there was no need for ten end-to-end variants of the same case. One test per risk is sufficient if the risk is already covered further down the pyramid.
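The reduction logic can be illustrated with a small sketch. It assumes each end-to-end test is tagged with the risk it addresses and that the set of risks already covered at API or unit level is known; both the tagging and the test names are hypothetical.

```python
from typing import List, Set, Tuple


def reduce_e2e_suite(
    e2e_tests: List[Tuple[str, str]], covered_below: Set[str]
) -> List[str]:
    """Keep one end-to-end test per risk that is already covered lower down.

    `e2e_tests` holds (test_name, risk) pairs. Risks not covered below
    keep all of their end-to-end variants.
    """
    kept: List[str] = []
    risks_with_e2e: Set[str] = set()
    for test_name, risk in e2e_tests:
        redundant = risk in covered_below and risk in risks_with_e2e
        if not redundant:
            kept.append(test_name)
            risks_with_e2e.add(risk)
    return kept


suite = [
    ("checkout_with_voucher", "checkout"),
    ("checkout_with_two_vouchers", "checkout"),
    ("checkout_as_guest", "checkout"),
    ("delivery_slot_booking", "delivery"),
    ("delivery_slot_rebooking", "delivery"),
]
kept = reduce_e2e_suite(suite, covered_below={"checkout"})
print(kept)
```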

Quarantine instead of maintenance: how to sort out flaky tests

With many teams, a shared test suite can no longer be maintained centrally. Four to six teams had grown to twelve to fourteen. Where initially a kind of guild had worked together on the suite, later only a few people took care of it.

The solution was a simple mechanism. A test that failed on the first attempt went into quarantine. The team presumably responsible was asked to stabilize it, because flaky tests are not wanted.
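A sketch of this mechanism, assuming the suite is tracked as a plain list of test names and that a (possibly incomplete) mapping from test to owning team exists. The data structures and the notification text are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SuiteState:
    active: List[str]
    quarantined: Dict[str, str] = field(default_factory=dict)  # test -> team asked


def quarantine_failures(
    state: SuiteState,
    first_attempt_results: Dict[str, bool],
    owners: Dict[str, str],
) -> List[str]:
    """Move every test that failed on its first attempt into quarantine.

    Returns one notification per quarantined test for the presumed owner.
    Tests without a result are treated as green.
    """
    notifications: List[str] = []
    for test in list(state.active):  # copy, because state.active is mutated
        if first_attempt_results.get(test, True):
            continue
        team = owners.get(test, "unknown")
        state.active.remove(test)
        state.quarantined[test] = team
        notifications.append(f"{team}: please stabilize or delete '{test}'")
    return notifications


state = SuiteState(active=["login", "search", "checkout"])
notes = quarantine_failures(
    state,
    first_attempt_results={"login": True, "search": False, "checkout": False},
    owners={"search": "team-search"},
)
print(state.active)
print(notes)
```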

Often the response was that the team didn’t even know about the test and no longer needed it. In this way, the suite shrank organically: the stable tests remained, the rest fell out. At some point, it was completely rewritten.

The new suite deliberately only tested the Money Path. Instead of many individual Given-When-Then cases, a continuous test was created that runs through the Happy Path and breaks exactly where something is wrong. This makes it easier to isolate the error.
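One way to picture such a continuous test is a list of named steps that share a context and stop at the first failure. The steps below are stand-ins operating on an in-memory dictionary, not real shop calls, and the checkout step is deliberately broken to show the reporting.

```python
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[Dict], None]]


def check(condition: bool, message: str) -> None:
    if not condition:
        raise AssertionError(message)


def run_money_path(steps: List[Step]) -> str:
    """Run the steps in order on one shared context; stop at the first failure."""
    context: Dict = {}
    for name, action in steps:
        try:
            action(context)
        except AssertionError as error:
            return f"failed at '{name}': {error}"
    return "ok"


def search(ctx: Dict) -> None:
    ctx["product"] = "milk"


def add_to_cart(ctx: Dict) -> None:
    ctx["cart"] = [ctx["product"]]
    check(len(ctx["cart"]) == 1, "cart should contain one item")


def checkout(ctx: Dict) -> None:
    ctx["order_total"] = 0  # simulated defect: total is not calculated
    check(ctx["order_total"] > 0, "order total must be positive")


def pay(ctx: Dict) -> None:
    ctx["paid"] = True


result = run_money_path(
    [("search", search), ("add to cart", add_to_cart), ("checkout", checkout), ("pay", pay)]
)
print(result)
```

Because later steps depend on earlier ones, the name of the first failing step already narrows down where to look.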

Shift left, but look right: shifting tests and observing production

The most effective complement to smaller tests is good observability in production. The team followed two movements. First, shift left: tests were made smaller and moved to a lower level to get feedback earlier.

Then came “shift left, but look right”. Logging and observability were expanded, but above all monitoring and alerting. The rule: as soon as a customer sees an error message, a red light should go on somewhere. This does not apply to pure network problems, such as someone traveling through a tunnel on a train.
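The alerting rule can be written down as a small predicate. The event fields and the cause labels are assumptions made for this sketch; a real setup would derive them from whatever the logging and monitoring stack provides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClientEvent:
    error_shown_to_customer: bool
    cause: str  # hypothetical classification, e.g. "server_error"


# Causes treated as pure network problems (assumed labels).
NETWORK_CAUSES = {"network_unreachable", "client_offline"}


def should_alert(event: ClientEvent) -> bool:
    """Alert whenever a customer saw an error that was not a pure network problem."""
    return event.error_shown_to_customer and event.cause not in NETWORK_CAUSES


print(should_alert(ClientEvent(True, "server_error")))    # customer saw a real error
print(should_alert(ClientEvent(True, "client_offline")))  # train in a tunnel
```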

Frequent releases reduce the risk per deployment. A large number of small changes means that each individual release can break very little. The weeks-long acceptance testing of the monolith became superfluous.

This shift changes the role of end-to-end testing. For an e-commerce business, it can be enough to deploy to production in case of doubt, wait for the alert and roll back if necessary. The effort required to reproduce every rare error in advance on a test stage is disproportionate to the potential damage.

Why rollback capability is not an optional extra

This approach does not work without a clean deployment mechanism. During a microservice changeover, you have to constantly redeploy parts of the system without any downtime. This forces procedures such as blue-green deployment or canary releasing.
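The source names the procedures but not how the rollback decision is made. As a rough sketch, a canary check could compare the error rate of the new version with that of the current one. The tolerance and the minimum request count are arbitrary assumptions, not values from the team.

```python
def canary_decision(
    baseline_errors: int,
    baseline_requests: int,
    canary_errors: int,
    canary_requests: int,
    tolerance: float = 0.01,
    min_requests: int = 100,
) -> str:
    """Return "wait", "rollback" or "promote" for a canary release.

    The canary is rolled back if its error rate exceeds the baseline
    error rate by more than `tolerance`.
    """
    if canary_requests < min_requests:
        return "wait"  # too little traffic to judge
    baseline_rate = baseline_errors / baseline_requests if baseline_requests else 0.0
    canary_rate = canary_errors / canary_requests
    if canary_rate > baseline_rate + tolerance:
        return "rollback"
    return "promote"


print(canary_decision(10, 10_000, 1, 50))
print(canary_decision(10, 10_000, 30, 1_000))
print(canary_decision(10, 10_000, 2, 1_000))
```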

Some lessons remain painful. Database migrations need much more attention and care in this context. Nevertheless, the gain is great: the ability to deploy something to production and check whether it works there.

Those who can’t live with the rollback often build huge, energy-intensive test environments instead. This is often the more expensive way, which only masks the actual gap.

Cleaning up tests is less motivating than you think

Faster feedback motivates; cleaning up tests rarely does. Developers are happy when they halve the runtime of the test suite, because then everything goes noticeably faster. Replacing a large end-to-end test with six unit tests, on the other hand, is hardly fun for anyone.

The reason lies in the attitude of many developers. They want to write code that solves problems, not code that creates problems. A failing test is initially perceived as a problem, not a help. Advocates of good test code quality are rarer than most people assume.

This leads to some practical advice. If you want to improve test quality, link it to tangible benefits such as shorter runtimes and faster feedback. Pure tidying up without visible benefits is difficult to achieve.

Tidying up needs the same plan as feature work

Clean-up sprints often fail because nobody plans them. There is always too much to do and too little time: you never get to the big mess, and most of the time you don’t even want to.

Michael compares it to a child’s room. Tell someone to tidy up and the first question is: Where should I start? It’s too big and lacks structure.

The mistake lies in the expectation that you already have a plan for tidying up anyway. You don’t, because you’re in feature mode. Reducing technical debt requires the same planning effort as feature development. Without this plan, the time given is wasted.

Why test data quality remains the hardest end-to-end problem

The biggest open issue is not the test technology, but the provisioning of test data on the stages. Hundreds of stores with different product ranges, different logistics organizations and an unmanageable number of configurations have to be mapped.

Delivery warehouses that supply private customers directly sometimes work differently to other locations, and the IT process has to map these variants. Providing an environment that delivers all variants at all times is time-consuming.

This shows why full-time testers are needed again. For a long time, there were no testers at REWE, only development teams that did the testing. Today, there are some very good testers who do end-to-end testing, some of them using the mobile devices that employees work with in the warehouse. Not every developer has these devices on their desk.

Exploratory testing belongs in the development teams

One specific recommendation is to anchor exploratory testing directly in the teams. A development team should also take half an hour during a sprint to put their own product through its paces.

A good charter often comes from the code itself. When reading an API, a developer notices that the authorization may not work as it should. It is precisely such internal inconsistencies that can be quickly uncovered through exploration.

Simple for the business, technically complex: the trap of legacy systems

Some requirements are trivial from a business perspective and surprisingly complicated technically. Michael is clear on this: things that are simple for the business should not actually be technically complex.

The reason lies in the history. Legacy systems that have already been rebuilt three times and passed through four teams are difficult to adapt. Both platform engineers and the business side often need to be shown where the actual testing problem lies.

A well-tailored microservice and data architecture pays off, especially when there is external pressure. The VAT reduction could be implemented surprisingly quickly because the data architecture made it possible. Regulatory changes also go faster if the organizational structure and code fit together well.