Shift Left

Shift left means integrating quality assurance into the development process as early as possible instead of leaving it until the end. This starts with cleanly sliced user stories, continues with tool-supported code quality directly in the development environment and ends with broad unit test coverage as the basis for all further test levels.

Key Takeaways

Unit testing is the basic prerequisite for any shift left strategy: without this fine-meshed safety net at code level, sufficient coverage at higher test levels cannot be achieved economically.
Consumer-driven contract testing with tools such as Pact makes it possible to check API compatibility between services locally against automatically generated mocks, without having to set up a complete integration environment.
Quality feedback directly in the IDE, for example cyclomatic complexity displayed inline, is superior to a later build server report because the developer can react immediately instead of going through an additional correction cycle.
Teams that let each developer implement their own user story alone and only involve the testers at the end of the sprint build a mini-waterfall inside the sprint and risk finishing the sprint without a single completed story.
A blanket 100 percent unit test coverage as a goal is counterproductive; it makes more sense to explicitly annotate critical business logic and demand complete coverage for precisely this code.

What shift left really means in quality assurance

Shift left means ensuring quality as early as possible in the development process, not downstream at the end. Left and right refer to positions on the path from requirement to delivery: the requirement sits on the far left, delivery on the far right. Shifting quality to the left therefore means doing quality work as early as possible on that path.

The basic idea is multi-level: quality can already be ensured in the user story. This is followed by broad coverage at unit testing level in the team. The other quality levels build on this; they are interlinked, complement each other and overlap as little as possible. The result is delivered in a quality-assured manner via a continuous integration and delivery pipeline.

This knowledge is not new. Decades of experience in software development have led to agile processes in which collaboration between development and testing has once again become a matter of course. With DevOps, Operations is now also growing into this integrated approach.

The requirement is the biggest lever for quality

The biggest lever lies at the beginning: well-described requirements that can actually be implemented. Agile processes offer long-established methods for slicing and reviewing user stories and checking them against quality criteria at the point where corrections are cheapest.

The real sticking point lies one level above the individual stories, in the mindset of the team. In many teams, the historically grown structure drives the slicing of user stories. A classic pattern: each developer grabs their own story at the start of the sprint, implements it alone and two testers check everything at once in the last two days. This is a mini-waterfall in the sprint.

The reason given for this sounds harmless, but it is exactly the problem. “We don’t want to get in each other’s way” is an attitude that makes real collaboration difficult. The scope of a story is then based on what a single person can do alone, not on the value of the result.

Cut user stories vertically, not by convenience

Slicing should be driven by the value of the result, not by developer convenience. A vertical slice encompasses all layers of a function and delivers something executable at the end of the sprint that can be tested and used.

A horizontal cut, on the other hand, produces dead intermediate states. If you pack only the user interface of a function into a story and develop the associated database three sprints later, you have built something that can neither be tested nor used. It has no added value.

The goal of every iteration is a finished piece of added value: quality-assured, functional, ready for delivery. Vertically cut stories are easier to validate because a completed function can also be tested.

Scrum provides for a prioritized sprint backlog. Theoretically, the entire team works on the highest-priority story first and completes it, then on the next one. Out of five committed stories, only three may be finished in the end, but these three are really finished. With the mini-waterfall, on the other hand, potentially nothing is finished at the end because all stories were started at the same time and none were completed.

Stop starting, start finishing.
Alexander Vukovic

Why teams are reluctant to work together

It is often fear, not ignorance, that lies behind sticking to the old way of doing things. Frequently it is the people who hold the most knowledge, who have been with the company for decades, who are afraid of becoming replaceable if they share what they know.

For such people, a cross-functional team is a potential danger because others are expected to acquire the same knowledge. The only way to change their mindset is to accompany them through several iterations and show them that their fears are unfounded.

The fear can be refuted: a junior will never catch up in three months on what someone has acquired over thirty years. But if the junior can take over certain tasks when the experienced employee is unavailable, everyone benefits. Accompanying this change is actually a task for the Scrum Master. In practice, this role has been optimized away in many Scrum implementations right from the start.

The development environment brings quality directly into the code

The IDE plays a bigger role than many expect. A great deal of quality can be brought into the coding itself via the development environment, long before the build server even starts.

Refactoring functions are one example. Being able to rename a class method consistently throughout a project, instead of editing every file by hand, is a small thing that modern IDEs handle routinely. In the Java environment, IntelliJ IDEA is the most common choice, on the .NET and C# side it is Visual Studio, and the free, lean and fast Visual Studio Code sits in between.

Plugins provide immediate quality feedback in the editor. There are plugins for Visual Studio Code, for example, that measure the cyclomatic complexity of a method inline as you write it. If you write too many nested conditions, the plugin reports directly above the code that the complexity is too high.

This feedback is part of shift left. If the same static analysis ran only on a Jenkins or GitLab server, you would have one more loop: wait for the build, open the report, find the complaint, fix and rebuild. In the editor, you see the problem at the moment of writing. This saves time across the entire process.
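To make the inline feedback more tangible, here is a minimal sketch of what such a complexity check computes. It is written in Python purely for illustration and approximates McCabe's metric as one plus the number of decision points; it is not the algorithm of any particular plugin. The threshold `MAX_COMPLEXITY` and the sample function `shipping_cost` are assumptions made up for this example.

```python
import ast
from typing import Optional

# Node types that open an additional path through the code.
_BRANCH_NODES = (ast.If, ast.IfExp, ast.For, ast.While,
                 ast.ExceptHandler, ast.comprehension)

MAX_COMPLEXITY = 4  # hypothetical team threshold


def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe complexity: 1 + number of decision points."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # "a or b or c" adds one decision point per extra operand.
            complexity += len(node.values) - 1
    return complexity


def inline_hint(source: str) -> Optional[str]:
    """Return the message an editor could show above the code, if any."""
    score = cyclomatic_complexity(source)
    if score > MAX_COMPLEXITY:
        return f"complexity {score} exceeds limit {MAX_COMPLEXITY}"
    return None


# Made-up sample with three nested conditions and one "or".
NESTED = '''
def shipping_cost(weight, express, country):
    if weight > 30:
        if express:
            if country == "AT" or country == "DE":
                return 25
            return 40
        return 15
    return 5
'''

print(inline_hint(NESTED))
```

For the sample, the three `if` statements and the `or` add four decision points to the base value of one, so the hint reports a complexity of 5. The point of running this in the editor rather than on the server is only the timing: the same number arrives while the nesting is still being typed.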

Tools like SonarQube are established in the enterprise environment for static analysis of code. But if the same feedback is immediately available, it should also be used immediately instead of waiting for server analyses and filled dashboards.

Code quality cannot be tested afterwards

Code quality is either there or it is not. No matter how good the testing is, code quality cannot be tested into software afterwards. If you ignore quality criteria such as naming, coding conventions or the avoidance of code duplication, you are building on a foundation that will be difficult to keep working on later.

The mechanism is predictable. If the team is under pressure to deliver as much functionality as possible in a short space of time, it will churn out whatever it can without regard for quality. This works, but not for long. At some point it backfires, and the developers usually take the blame, because the pressure to deliver remains and even increases if the software is successful.

This leads to a simple rule for day-to-day work: whatever quality can be gained with the help of tools and without significant extra time should be gained. This is a basic prerequisite for the long-term value of the software.

How much unit testing coverage makes sense

Unit tests are the basis of shift left, the fine-meshed safety net. They are isolated, executable regression tests on a small scale that ensure that the code continues to do the same thing as before. With this safety net in place, changes, including major architectural changes, can be made with the confidence that any errors introduced along the way will be detected.

No other test level provides this level of coverage. If you wanted to achieve the same coverage at higher levels, you would have to invest disproportionately more, which is not commercially viable.

The ideal unit testing coverage is not 100 percent. Many things are not meaningfully testable at unit level. A unit test should run in isolation, test a small unit and run in milliseconds to seconds at most, not hours. End-to-end and integration testing therefore belong at the levels above.

The standard example is getter and setter methods. If you test a setter method just to reach the quota, you have successfully tested the assignment functionality of Java or .NET. This brings no added value; it is waste.
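The difference can be shown with two small tests side by side. The following sketch uses Python for brevity; the `Invoice` class, its fields and the 20 percent VAT rate are hypothetical and only serve the illustration. The first test raises the coverage figure but only proves that attribute assignment works, while the second pins down behavior that could actually break.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Invoice:
    # Hypothetical domain object, invented for this example.
    net: Decimal
    vat_rate: Decimal

    def gross(self) -> Decimal:
        """Gross amount, rounded to whole cents."""
        return (self.net * (1 + self.vat_rate)).quantize(Decimal("0.01"))


def test_net_can_be_set():
    # Low value: this only re-tests the language's assignment.
    invoice = Invoice(net=Decimal("0"), vat_rate=Decimal("0.20"))
    invoice.net = Decimal("100")
    assert invoice.net == Decimal("100")


def test_gross_is_rounded_to_cents():
    # Useful: 19.99 * 1.20 = 23.988, which must become 23.99.
    invoice = Invoice(net=Decimal("19.99"), vat_rate=Decimal("0.20"))
    assert invoice.gross() == Decimal("23.99")


test_net_can_be_set()
test_gross_is_rounded_to_cents()
```

Both tests are isolated and run in well under a millisecond, so both qualify as unit tests. Only the second one would catch a regression in the business logic.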

Then there is legacy code. If you have been developing for ten years without unit testing, you start with zero percent coverage. No management and no customer will pay for source code that has grown over many years to be brought under full coverage after the fact. A target of 100 percent is not realistic here, and even an increase from seven to ten percent does not motivate anyone.

The better way is to focus on critical code. Define explicitly and exhaustively which parts of the code must be under coverage, such as business logic that calculates with monetary values. You annotate this critical code, measure it and demand 100 percent coverage for it. This works independently of legacy code and of getters and setters, and it resolves the conflict of objectives.
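One way to picture this is a marker on the critical code plus a check that looks only at the marked parts. The sketch below is a simplified illustration in Python, not the mechanism of a specific coverage tool: the `critical` decorator, the function names and the coverage figures passed to `critical_gaps` are all assumptions. In a real project the per-function percentages would come from whatever coverage tool the team already uses.

```python
from decimal import Decimal

# Names of all functions that were marked as critical.
CRITICAL = set()


def critical(func):
    """Hypothetical marker for business logic that needs full coverage."""
    CRITICAL.add(func.__qualname__)
    return func


@critical
def apply_discount(price: Decimal, percent: Decimal) -> Decimal:
    # Monetary calculation: exactly the kind of code worth marking.
    return (price * (1 - percent / 100)).quantize(Decimal("0.01"))


def display_name(name: str) -> str:
    # Not marked: coverage of this function is not enforced.
    return name.title()


def critical_gaps(coverage_percent: dict) -> list:
    """Return marked functions whose measured coverage is below 100."""
    return sorted(
        name for name in CRITICAL
        if coverage_percent.get(name, 0.0) < 100.0
    )


# Made-up measurement: the build should fail only because of apply_discount.
print(critical_gaps({"apply_discount": 80.0, "display_name": 0.0}))
```

The overall percentage no longer matters for the gate. A legacy module with zero coverage does not block the build, while a single untested branch in a marked monetary calculation does.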

API testing and contract testing close the gap between unit and interface

In architectures with microservices and APIs, the next step is to secure the API level. API testing does not check the graphical interface, but the programming interface: it sends API calls and checks the results.

Another level can be inserted between unit testing and API testing: consumer-driven contract testing. An API always involves a provider who offers it and one or more consumers who use it. To ensure that both sides can be developed independently of each other, they agree on a contract: which requests are permitted and which responses are expected.

The open-source tool Pact records such contracts from the consumer's point of view and stores them centrally. Mocks that simulate consumers or providers can be generated automatically from the contracts. This makes it possible to test against the contract on the development machine or in the CI system without a complex setup involving a database and Docker images.

The benefit is early feedback. You find out immediately whether all consumers are still compatible with a changed provider. This interaction is particularly important in microservice architectures and in enterprise environments with Enterprise Service Bus or Kafka, in which many services exchange data.
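The principle can be sketched without any tooling. The following Python example deliberately does not use the Pact API; it only mimics the idea with plain data structures. The contract layout, the service names, the `/orders/42` endpoint and the two provider versions are invented for illustration, and the matching rule (every field named in the contract must be present with the expected value, extra fields are tolerated) is a simplification.

```python
# A contract as the consumer would record it (hypothetical format).
CONTRACT = {
    "consumer": "order-ui",
    "provider": "order-service",
    "interactions": [
        {
            "request": {"method": "GET", "path": "/orders/42"},
            "response": {"status": 200,
                         "body": {"id": 42, "status": "SHIPPED"}},
        },
    ],
}


def mock_provider(contract):
    """Build a stand-in provider for consumer tests from the contract."""
    answers = {
        (i["request"]["method"], i["request"]["path"]): i["response"]
        for i in contract["interactions"]
    }

    def handle(method, path):
        return answers.get((method, path), {"status": 404, "body": None})

    return handle


def verify_provider(contract, handler):
    """Replay every recorded request against a provider implementation."""
    failures = []
    for interaction in contract["interactions"]:
        request, expected = interaction["request"], interaction["response"]
        actual = handler(request["method"], request["path"])
        if actual["status"] != expected["status"]:
            failures.append(f'{request["path"]}: status {actual["status"]}')
            continue
        # Only the fields the consumer relies on are compared.
        for key, value in expected["body"].items():
            if actual["body"].get(key) != value:
                failures.append(f'{request["path"]}: field {key!r} differs')
    return failures


def provider_v1(method, path):
    # Compatible: adds a field, keeps everything the consumer uses.
    return {"status": 200,
            "body": {"id": 42, "status": "SHIPPED", "carrier": "DHL"}}


def provider_v2(method, path):
    # Breaking change: "status" was renamed to "state".
    return {"status": 200, "body": {"id": 42, "state": "SHIPPED"}}


print(verify_provider(CONTRACT, provider_v1))
print(verify_provider(CONTRACT, provider_v2))
```

The consumer team tests against `mock_provider(CONTRACT)`, the provider team runs `verify_provider` in its own build. Neither side needs the other one running, and the renamed field in the second provider version is reported before anything is deployed.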

The following layering distributes responsibility sensibly:

Level	What it is intended for
Unit testing	Fine-meshed safety net, isolated small units, critical code
Consumer-driven contract testing	Compatibility between provider and consumer, early and without full setup
API testing	Securing the backend, checking variants and combinatorics
UI testing	Functional end-to-end behavior and layout, in small quantity

The test pyramid instead of the ice cream cone

Shift left also means building the test pyramid instead of the testing ice cream cone. The pyramid has a lot of unit testing at the bottom and little UI automation at the top. The ice cream cone as an anti-pattern reverses this: everything is automated via the user interface at the top, nothing at the bottom.

Modern interfaces are usually built with React, Vue or Angular, and Svelte is emerging. They fetch their data from the backend over HTTP, using REST or GraphQL calls. A backend for frontend often aggregates these calls into a UI API. At this API level, one level below the user interface, combinatorics can be checked without having to click through a hundred variants in the UI. This is faster and less brittle when the UI changes.
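A short sketch shows why combinatorics are cheap at this level. The Python example below enumerates all combinations of a few input values and checks them in one loop. The function `quote_shipping` is a hypothetical local stand-in for a UI API endpoint, and the countries, prices and the 30 kg limit are made up; in a real test the same loop would send HTTP requests to the backend for frontend.

```python
from itertools import product


def quote_shipping(country: str, express: bool, weight_kg: int) -> dict:
    """Hypothetical stand-in for a UI API endpoint."""
    base = {"AT": 5, "DE": 7, "CH": 12}[country]
    surcharge = 10 if express else 0
    heavy = 8 if weight_kg > 30 else 0
    return {"status": 200, "price": base + surcharge + heavy}


COUNTRIES = ["AT", "DE", "CH"]
EXPRESS = [False, True]
WEIGHTS = [1, 30, 31]  # boundary values around the assumed 30 kg limit


def run_combinations() -> dict:
    """Call the API once per combination and collect the responses."""
    return {
        combo: quote_shipping(*combo)
        for combo in product(COUNTRIES, EXPRESS, WEIGHTS)
    }


results = run_combinations()

# Rules that must hold for every variant, checked without a browser.
for (country, express, weight), response in results.items():
    assert response["status"] == 200
    if express:
        standard = results[(country, False, weight)]
        assert response["price"] == standard["price"] + 10

print(len(results), "combinations checked")
```

Three countries, two delivery modes and three weights already give 18 variants. Clicking through them in the UI would be slow and fragile; at API level, adding another country is one more list entry.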

The UI itself still needs to be tested, both for correct layout and for function. But UI automation is only a small component. If you take the lower levels seriously, you can limit yourself at the top to what this level is intended for: functional end-to-end testing.

Automated delivery requires the entire chain

Frequent releases can only be managed via an end-to-end automated continuous integration, delivery and deployment pipeline. The ideal state, which large providers have already achieved, looks like this: at the push of a button, the entire quality assurance runs and the result is automatically deployed to production, where it is tested and monitored again, with immediate feedback as to whether it works.
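The chain can be reduced to a very small model: stages run in a fixed order, and the first failure stops everything after it. The Python sketch below is only a conceptual illustration, not the configuration of Jenkins, GitLab or any other CI system. The stage names follow the test levels described in this article, and the ordering from cheapest to most expensive check is an assumption in the spirit of shift left.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first failure."""
    log = []
    for name, stage in stages:
        ok = stage()
        log.append(f"{name}: {'ok' if ok else 'FAILED'}")
        if not ok:
            break  # nothing further is executed, including deployment
    return log


# Hypothetical stages; each lambda stands in for a real job.
STAGES = [
    ("static analysis", lambda: True),
    ("unit tests", lambda: True),
    ("contract tests", lambda: False),  # simulated incompatibility
    ("api tests", lambda: True),
    ("ui tests", lambda: True),
    ("deploy", lambda: True),
]

for line in run_pipeline(STAGES):
    print(line)
```

In this run the pipeline stops at the contract tests, so the slower API and UI tests never start and nothing is deployed. The earlier a stage sits in the chain, the cheaper the feedback it produces.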

This level of automation brings new risks. CI systems have become a target for attack because they can be used to intervene directly in the code. The security of automated pipelines is a major topic in its own right, for which there is now even a dedicated OWASP Top 10 list of CI/CD security risks. There is no advantage without a potential disadvantage, but the direction is clear.