DevEx (developer experience) is about measuring and improving how developers experience their daily work. Its most concrete handle is the four DORA metrics: deployment frequency, change lead time, change failure rate, and mean time to recovery. Quality assurance directly influences all four. Test automation affects pipeline speed, coverage decisions shape failure rates, and targeted testing experiments help teams reduce recovery time and deliver faster.
Key Takeaways
- Change failure rate is the DORA metric most directly tied to QA work, because finding failures before production is the core of what testers do.
- A slow or shared test environment drags down several DORA metrics at once; giving each developer an isolated personal test environment resolves bottlenecks in change lead time, deployment frequency, and mean time to recovery.
- Metrics without context are just numbers: an acceptable failure rate or deployment frequency depends on the risk profile and business priorities of the specific company, not on an industry standard.
- Introducing changes as low-friction experiments with a fixed review window makes teams willing to try them, because a failed experiment carries no stigma and is simply replaced by the next one.
- QA professionals who learn to express their work in DORA terms can make their contribution visible to product owners and business stakeholders who already speak that language.
What DORA and SPACE actually measure
DORA and SPACE are two frameworks for measuring how software teams deliver and how developers experience their work. Both sit under the broader idea of developer experience, often shortened to DevEx.
DevEx as a term goes back to a scientific paper from around 2010 to 2012. The paper turned a familiar question inward: teams talk constantly about user experience for the people using their applications, so what about the experience of the people building the software? From that question, work began to attach real metrics to it.
DORA stands for DevOps Research and Assessment. It set the early standard during the period when nearly every team was moving toward DevOps and team ownership. The DORA team was acquired by Google, which has since published the metrics in its yearly State of DevOps report. That report is still the best starting point if you want to understand the metrics from the source.
SPACE arrived later, from a different angle, built by researchers at Microsoft, GitHub, and the University of Victoria. The name is an acronym for five dimensions rather than a single word: satisfaction and well-being, performance, activity, communication and collaboration, and efficiency and flow. SPACE looks more at the developer’s felt experience: whether you are supported, whether you have enough mandate and agency in your own work. The P in SPACE stands for performance, and that performance dimension maps closely onto the tangible DORA metrics.
The two frameworks overlap heavily. DORA is the more concrete of the pair, which is why it is the more useful entry point for testers who want a measurable hold on quality.
The four DORA metrics, in plain terms
DORA reduces delivery health to four measurements: deployment frequency, change lead time, change failure rate, and mean time to recovery.
Deployment frequency asks how often a team ships changes to production. You can tweak this from a process angle or from a tooling angle, through CI/CD pipelines and quality gates. There is almost always something to make the pipeline faster or smarter.
Change lead time tracks how long it takes to go from an idea to working software in production. (DORA’s formal definition starts the clock at the commit; the broader idea-to-production reading used here also captures the requirements work.) The step teams often skip is working out the requirements, and skipping it stretches the implementation time later. Pair programming, in-pair continuous testing, and test-driven development all shorten this stretch.
Change failure rate is the metric closest to testing work, because finding failures is what testers do. It is the share of deployed changes that cause a failure. On its own the number says little. A failure is information you now hold because someone looked, the same way a bug is a finding that still needs a conversation about whether it matters.
Mean time to recovery measures how quickly a team gets back to a working state after something breaks. When teams are buried in production fixes, this metric and the lead time both suffer at once.
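As a rough illustration of how the four measurements fall out of ordinary deployment records, here is a minimal Python sketch. The `Deployment` record, its field names, and the sample data are assumptions made for this example, not a DORA-defined schema. Lead time is measured from commit to production, which is how DORA formally scopes it; a team tracking idea-to-production would swap in an earlier start timestamp.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    """Hypothetical record of one production deployment."""
    committed_at: datetime                   # when the change was committed
    deployed_at: datetime                    # when it reached production
    failed: bool = False                     # did it cause a production failure?
    restored_at: Optional[datetime] = None   # when service was restored, if it failed


def dora_metrics(deployments: list[Deployment], period_days: int) -> dict:
    """Summarize the four DORA metrics for one reporting period."""
    if not deployments:
        raise ValueError("need at least one deployment")
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    failures = [d for d in deployments if d.failed]
    recoveries = [
        d.restored_at - d.deployed_at
        for d in failures
        if d.restored_at is not None
    ]
    return {
        "deployments_per_week": len(deployments) / (period_days / 7),
        "median_lead_time_hours": median(lt.total_seconds() for lt in lead_times) / 3600,
        "change_failure_rate": len(failures) / len(deployments),
        "mean_time_to_recovery_hours": (
            sum(r.total_seconds() for r in recoveries) / len(recoveries) / 3600
            if recoveries
            else None  # no recorded recoveries in this period
        ),
    }


# Made-up sample: four deployments in a two-week period, one of which failed.
start = datetime(2024, 1, 1, 9, 0)
deployments = [
    Deployment(start, start + timedelta(hours=4)),
    Deployment(start + timedelta(days=2), start + timedelta(days=2, hours=8)),
    Deployment(
        start + timedelta(days=5),
        start + timedelta(days=5, hours=12),
        failed=True,
        restored_at=start + timedelta(days=5, hours=14),
    ),
    Deployment(start + timedelta(days=9), start + timedelta(days=9, hours=24)),
]
metrics = dora_metrics(deployments, period_days=14)
```

For this sample the function reports two deployments per week, a median lead time of ten hours, a change failure rate of 0.25, and a mean time to recovery of two hours. The numbers only become meaningful once the team has decided what counts as a failure and what counts as restored.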
A metric is insight, not a verdict
A metric is just a metric. It gives you visibility, not a judgment.
Finding a bug does not settle whether the bug is important. That still takes a discussion. The same holds for a failure rate or a deployment frequency: the number surfaces because you are measuring, but the meaning has to be decided for your specific product and company.
There is a known trap here. If you reward a measurement, people start optimizing the measurement with small, hacky tweaks rather than improving the underlying work. The cobra effect, where a bounty on cobras led people to breed cobras, is the cautionary tale. So before you chase a target, define what an acceptable failure rate or an acceptable deployment frequency actually is for you.
Speed has a ceiling you should choose deliberately. You could gold-plate the setup so a star developer pushes an idea to production in ten minutes. The real question is whether your setup is safe enough for that, whether you have the quality gates and coverage your risk profile demands. Faster is not automatically better.
Why testers belong inside the DevEx conversation
Testers have a direct, measurable impact on all four DORA metrics, which makes these frameworks a natural home for quality work.
Test automation touches every one of the metrics, each in its own way. The shape of your coverage decides how fast you can move. If you lean only on slow UI tests, running them in parallel will not save you: every run still has to spin up a test environment, and in that same window a unit test suite would already have finished. Getting the right balance across the test automation pyramid is what unlocks deployment frequency.
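A back-of-the-envelope model makes the point about parallelism concrete. The sketch below is a simplification under stated assumptions: every test in a suite takes the same time, workers are perfectly balanced, and environment setup is a fixed cost that parallelism does not reduce. All counts and durations are hypothetical.

```python
import math


def suite_wall_clock_seconds(test_count, seconds_per_test, workers, setup_seconds=0.0):
    """Rough wall-clock estimate: fixed setup plus tests spread evenly over workers."""
    batches = math.ceil(test_count / workers)
    return setup_seconds + batches * seconds_per_test


# UI-only coverage: 200 UI tests at 30 s each, 10 parallel workers,
# plus 300 s to spin up the test environment.
ui_heavy = suite_wall_clock_seconds(200, 30, workers=10, setup_seconds=300)

# Pyramid-shaped coverage: 2000 unit tests at 0.05 s each on 4 workers,
# plus a small UI layer of 20 tests on the same environment.
pyramid = (
    suite_wall_clock_seconds(2000, 0.05, workers=4)
    + suite_wall_clock_seconds(20, 30, workers=10, setup_seconds=300)
)
```

With these made-up numbers the UI-only suite needs about 900 seconds per run and the pyramid-shaped suite about 385, and the unit layer alone reports back in roughly 25 seconds, long before the environment is even up.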
This is also where a tester earns a different kind of relevance. Quality work is often hard to see, and visibility is a recurring problem for QA. DORA gives you a way to make testing tangible and measurable in language other people already respect, language popularized through Google’s reporting.
The harder discipline is talking value, not just risk. The highest-risk area is not always the area your company values most, so there is a trade-off to weigh. Several conversations in the field now push toward value-based testing, asking what testing actually adds rather than where the risk simply sits. If QA learns to speak value, it stays relevant and visible.
How a DevEx coach works with a team
A DevEx coach embeds with a team, reads the metrics together with them, and runs small experiments to improve specific numbers. The goal is to hand the work back, not to become a permanent fixture.
The approach played out across roughly twelve to thirteen development teams in one organization, grouped into streams like front-end, back-end, and logistics. Coaches were assigned across quarters. The pattern was consistent: observe first, ask what is hurting, then workshop changes. People open up quickly when someone arrives with an open hand instead of a deadline.
The experiments run in cycles of about four to six weeks, depending on what is being changed. At the end comes a retro and a handover, and then a check-in later. The cadence matters because metrics need time to move before you can read whether a change helped.
Calling the changes experiments is deliberate. An experiment is allowed to fail. That framing lowers the friction of trying something new and keeps the team willing to engage, because nobody is on the hook for a guaranteed result.
A concrete experiment that worked
One web-based team owned the web portal and had no test environment of its own. Everyone tested on a shared acceptance environment, so one team’s breakage hurt everyone, and everyone else’s breakage hurt them.
The fix was simple: give each developer a personal integrated test environment. That let the team test in isolation, build confidence before touching acceptance, and move faster. Lead time and mean time to recovery improved because a single blocking problem was removed.
How to keep an experiment from quietly dying
The realistic answer is that experiments fade unless you make them small, attributable, and revisitable. Human nature pulls a team back into its sprints the moment the coach leaves.
Keep the chunks small. Two changes at most, and ideally in two different areas or aimed at two different metrics. That separation lets you trace which change actually helped. If you alter several things at once, you cannot tell what worked.
When a change does not move the metric even though the team upheld it, name it a failed experiment and move on to the next one. That is not a setback; it is the method doing its job.
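One way to make that call less subjective is to agree on the decision rule before the experiment starts. The sketch below is a hypothetical rule, not a prescribed one: it assumes a lower-is-better metric such as lead time, and the 10 percent improvement threshold is a placeholder the team would choose for itself.

```python
from statistics import mean


def evaluate_experiment(baseline, during, upheld, min_improvement=0.10):
    """Classify one experiment on a lower-is-better metric (e.g. lead time in hours).

    baseline / during: metric samples before and during the experiment window.
    upheld: whether the team actually kept the change in place.
    min_improvement: hypothetical relative improvement agreed on up front.
    """
    if not upheld:
        # The change slipped, so the numbers say nothing about it: run it again.
        return "rerun"
    before, after = mean(baseline), mean(during)
    improvement = (before - after) / before
    return "worked" if improvement >= min_improvement else "failed"


# Hypothetical weekly median lead times in hours.
baseline_weeks = [30, 28, 32, 30]
experiment_weeks = [24, 22, 26, 24]
verdict = evaluate_experiment(baseline_weeks, experiment_weeks, upheld=True)
```

Here the average drops from 30 to 24 hours, a 20 percent improvement, so the verdict is "worked". Keeping "rerun" separate from "failed" matters: a change the team did not uphold has not been tested yet.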
Respect the team’s cognitive load. A team facing a long list of complaints cannot act on all of them. Ask which single change would make the most impact right now, keep the entry barrier low, and leave the carrot visible so people stay motivated.
Then come back. The honest cycle looks like this:
| Stage | What happens |
|---|---|
| Coach present | Observe, workshop, agree on one or two experiments |
| Coach leaves | Team returns to sprints, the change starts to slip |
| Check-in (around three months) | Metrics show whether it held; if not, run it again |
| It sticks | After a second try, the team feels the benefit and the change cements |
Where to start if you want to apply this
Begin at the source, then map your own work onto it. Read Google’s State of DevOps report and learn what each DORA metric means before layering anyone’s interpretation on top.
A useful first step is to cross-reference the DORA basics against the QA practices you already run. You are doing testing work today; the task is to connect it to a metric so that developers, a product owner, or even a CEO can see the effect. The pitch translates cleanly: if an idea reaches production with a shorter change lead time, the company moves faster and the people in it are happier.
If we learn to talk value as QA, I think it’s super important for us to remain relevant, visible, because that’s typically an issue with being in QA. — Martijn Goossens
Experienced testers can do much of this mapping themselves. More junior testers will want clearer guidance, which is why a single overview that links DORA to concrete testing activities is worth building before you start.


