Metrics: Asset or Trap?

More tests don't mean better quality. See why pairing every metric with a counter-metric is the move that actually drives improvement.


Quality metrics for software teams are measurable indicators that track whether a change or improvement effort is actually working. A single metric almost always needs a paired counter-metric: for example, mean time to resolve a production issue should be balanced against reopen rate, and test pass rate against escaped defect rate. Teams should own no more than three or four metrics at a time, and every metric must be reviewed and acted on periodically to have any value.

Key Takeaways

  • Every KPI needs a counter-metric: measuring mean time to resolve bugs without also tracking reopen rate invites teams to close tickets without actually fixing the underlying problem.
  • Teams should own no more than three or four KPIs at once, because a focused set drives direction while a long list produces noise and diffuses accountability.
  • A metric nobody acts on is a useless metric: ownership and periodic review are what separate a meaningful KPI from a number that decorates a dashboard.
  • Involving the team in building their own metrics creates genuine ownership, which is the difference between engineers who game a number and engineers who care about what it measures.
  • High unit test coverage can coexist with production defects, making escaped defect rate the necessary counterweight to any pass-rate or coverage percentage metric.

Why a single metric never tells the truth

A metric only earns its place when it comes with a counterpart. Counting the number of tests written or the unit test coverage percentage looks tidy, but it pushes people toward gaming the number rather than improving the product.

If a tester is measured purely on how many tests they write in a given period, they will write tests to hit the target and collect the bonus. The metric then describes effort, not quality. That is why metrics should travel in pairs.

A pair keeps the number honest. Mean time to resolve a bug reads well on a dashboard, but on its own it tempts teams to close tickets fast. Pair it with the reopen rate and the picture changes: now closing a ticket only counts if the issue stays closed. The same logic applies to test pass rate. A high pass rate looks reassuring until you set escaped defect rate or flaky test rate next to it. Then you can see whether green tests are actually catching anything.
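As a rough illustration of reporting a metric together with its counterpart, here is a minimal sketch that computes mean time to resolve and reopen rate from the same ticket list. The `Ticket` fields and function names are assumptions made for this example, not taken from any particular issue tracker.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Ticket:
    # Hypothetical ticket shape; real trackers expose different fields.
    opened: datetime
    resolved: datetime
    reopened: bool  # True if the ticket was reopened after being resolved


def mean_time_to_resolve(tickets):
    """Average time between opening and resolving a ticket."""
    if not tickets:
        raise ValueError("no tickets to measure")
    total = sum((t.resolved - t.opened for t in tickets), timedelta())
    return total / len(tickets)


def reopen_rate(tickets):
    """Share of resolved tickets that were reopened later."""
    if not tickets:
        raise ValueError("no tickets to measure")
    return sum(1 for t in tickets if t.reopened) / len(tickets)


start = datetime(2024, 1, 1, 9, 0)
tickets = [
    Ticket(start, start + timedelta(hours=2), reopened=True),
    Ticket(start, start + timedelta(hours=4), reopened=False),
    Ticket(start, start + timedelta(hours=6), reopened=False),
]

# Report the pair together so a fast close that does not stick is visible.
print(mean_time_to_resolve(tickets), reopen_rate(tickets))
```

Reading the two numbers side by side is the point: a falling resolve time with a rising reopen rate suggests tickets are being closed, not fixed.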

Coverage of 90 percent still ships defects

High coverage does not prove quality. You can hit ninety percent unit test coverage and still have defects in production, because coverage measures how much of the code your tests touch, not how well they catch what matters.

Jani Grönman points to a gap many engineers carry from their early years: the belief that more coverage equals fewer bugs. The missing piece is the guardrail. Without a counterpart that watches what leaks past the tests, coverage becomes a comfort number.

“I can have 90% unit test coverage and then still have defects in production. How can that be? We are perfect.” (Jani Grönman)

The honest question is not “how green are we?” but “what are we not testing that still leaks defects into production?” Escaped defect rate answers that, and it tells you how good your test set really is even when every test passes.
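One common way to put a number on this, assumed here rather than defined in the article, is to divide the defects found in production by all defects found for a release. The sketch below uses that definition; the function name and the example counts are hypothetical.

```python
def escaped_defect_rate(found_before_release: int, found_in_production: int) -> float:
    """Share of all known defects that were first found in production.

    Assumed definition: escaped / (caught before release + escaped).
    """
    total = found_before_release + found_in_production
    if total == 0:
        return 0.0  # no defects recorded at all
    return found_in_production / total


# A fully green test run says nothing about the 10 defects users found.
rate = escaped_defect_rate(found_before_release=30, found_in_production=10)
print(rate)
```

Teams count defects in different ways, so the definition itself is worth agreeing on before the number is compared across releases.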

The bug-discovery curve still holds, with one caveat

The classic curve is still real: many defects are found early, then the line flattens as fewer new ones appear. The caveat is that a fixed test set against an unchanging product will find the bugs it can find and then stop.

That is the useful insight: a set of tests develops up to a certain point and then plateaus. Coverage and test count cannot tell you about the defects living outside that set. Escaped defect rate can. It reveals whether the plateau means “we have found the bugs” or “we stopped looking in the right places.”

Keep the metric set small and owned

Three or four KPIs per team is enough. More than that scatters attention instead of pointing it somewhere. A focused set keeps everyone moving in the same direction.

Every metric needs an owner and a review rhythm. A number that is measured but never acted on is dead weight. If test pass rate drops and the response is to disable failing tests, the metric improves while the product gets worse. Ownership means someone analyzes what the number says and does something about it.
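The disabled-tests case can be made concrete with a small sketch. It assumes pass rate is computed over executed tests only, which is how the number can rise when failing tests are switched off. The `SuiteRun` type and the counts are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class SuiteRun:
    # Hypothetical summary of one test suite run.
    passed: int
    failed: int
    disabled: int  # tests skipped or switched off


def pass_rate(run: SuiteRun) -> float:
    """Passed tests as a share of executed tests (disabled ones excluded)."""
    executed = run.passed + run.failed
    return run.passed / executed if executed else 0.0


def disabled_rate(run: SuiteRun) -> float:
    """Disabled tests as a share of all tests in the suite."""
    total = run.passed + run.failed + run.disabled
    return run.disabled / total if total else 0.0


before = SuiteRun(passed=90, failed=10, disabled=0)
after = SuiteRun(passed=90, failed=0, disabled=10)  # failing tests switched off

# Pass rate goes up while nothing about the product has improved.
print(pass_rate(before), pass_rate(after), disabled_rate(after))
```

Tracking the disabled share next to the pass rate gives the metric owner something to question when the first number improves suspiciously fast.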

A metric also has to be understood the same way by everyone reading it. The same number can be interpreted in different ways, so part of adopting any metric is agreeing on what sits inside it and what it actually tells you.

One team, one number beats separate scoreboards

Measuring testers and developers with different metrics splits a team that should share a goal. Jani Grönman argues for a shared number the whole team can own, drawn from product-minded thinking rather than role-specific scorecards.

Mean time to recovery works as a team-owned technical metric. DORA metrics matter in a technical sense. But to point a team in one direction, you want something closer to the business. Spotify shares a time-spent-listening metric with its teams; the principle is a single number that connects daily work to what customers experience.

Product-minded development means pulling the team into contact with end users and business people. When developers and testers understand where revenue comes from, they can let that shape their work. The point is agency: your work matters because it contributes to the product, not because you produced a quota of tests.

How to introduce metrics without breaking trust

Build metrics with the team, never dictate them. Announcing “we now measure you on this and that” invites the team to game the number, and you should expect that outcome.

Start with the people who are naturally curious about why the product exists. Some engineers prefer distance from business goals and want to focus on the technical side, and that is fine. Others want to know why they are building what they build. Those people grasp KPIs faster and become a starting point.

A useful entry is rework. Ask how much time goes into fixing bugs or figuring out why something does not work. If half the week disappears into rework, put the question plainly: would you rather spend ten percent of your time on bugs instead of fifty? That framing builds the case for measuring, because the metric now connects to something the team itself wants to change.

When you set the metrics up this way, ownership follows. People who helped shape a number care about moving it, and they treat it as a measure of something real that customers notice, not as an abstract figure handed down from above.

What’s important for you, the client, and the company

A simple three-way question opens the conversation about which metrics matter: what is important for you personally, what is important for the client, and what is important for the company building the product.

Product management is usually stretched thin and has little time for metric debates. Getting their attention and then working from these three angles keeps the discussion grounded. From there you ask whether the team is reaching its goals, where it should be, and whether features are being built for their own sake. The aim is to reach the why of the work.

Lead time to production serves as a strong top-level metric for this. With a benchmark for the kind of team you run, you can ask whether you are hitting that goal, and if not, why not. The answer turns into concrete work the team can do to close the gap.
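A minimal sketch of that check, assuming lead time is measured from commit to production deploy and summarized as a median. The timestamps and the 24-hour benchmark are made-up values, not a recommended target.

```python
from datetime import datetime
from statistics import median


def lead_times_hours(changes):
    """changes: list of (committed_at, deployed_at) datetime pairs."""
    return [
        (deployed - committed).total_seconds() / 3600
        for committed, deployed in changes
    ]


def check_against_benchmark(changes, benchmark_hours):
    """Return the median lead time in hours and whether it meets the benchmark."""
    hours = lead_times_hours(changes)
    if not hours:
        raise ValueError("no changes to measure")
    typical = median(hours)
    return typical, typical <= benchmark_hours


changes = [
    (datetime(2024, 1, 1, 9), datetime(2024, 1, 1, 21)),  # 12 hours
    (datetime(2024, 1, 2, 9), datetime(2024, 1, 3, 15)),  # 30 hours
    (datetime(2024, 1, 3, 9), datetime(2024, 1, 6, 9)),   # 72 hours
]

typical, on_target = check_against_benchmark(changes, benchmark_hours=24)
print(typical, on_target)
```

The median is used here so that one unusually slow change does not dominate the figure. A miss against the benchmark is the start of the "why not" conversation, not its conclusion.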

Testers belong upstream, near the requirements

Faulty requirements cause defects, and most testers know it. The harder admission is that testing still sits too far to the right in the software development lifecycle, away from where requirements are written.

Bad requirements lead to bad test design, because you cannot test something well without understanding what it should do. Yet the response is rarely to move closer to requirements engineering. Testers have much to offer there: testing the business idea, challenging product management, examining requirements before code exists.

Counting production issues only helps if you analyze whether each one is actually a software bug. Production defects have many causes. For a development and testing team, what counts are the issues that trace back to missing test cases, software bugs, or requirement problems. A solid root cause analysis feeds that information back and is one of the more valuable assets a team can build.
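To sketch how root cause analysis can feed back into the numbers, the example below tallies production issues by cause and separates the ones a development and testing team can act on. The three team-owned categories follow the paragraph above. The other labels and the issue IDs are invented for illustration.

```python
from collections import Counter

# Causes the paragraph above attributes to the development and testing team.
TEAM_OWNED_CAUSES = {"missing_test_case", "software_bug", "requirement_problem"}


def summarize_root_causes(issues):
    """issues: list of (issue_id, root_cause) pairs from a completed analysis."""
    counts = Counter(cause for _, cause in issues)
    team_owned = {
        cause: n for cause, n in counts.items() if cause in TEAM_OWNED_CAUSES
    }
    return counts, team_owned


issues = [
    ("INC-1", "software_bug"),
    ("INC-2", "infrastructure"),      # hypothetical non-team cause
    ("INC-3", "missing_test_case"),
    ("INC-4", "software_bug"),
    ("INC-5", "user_error"),          # hypothetical non-team cause
]

all_counts, team_counts = summarize_root_causes(issues)
print(team_counts)
```

Only three of the five issues here would count toward the team's escaped defects, which is why the raw production issue count is a poor stand-in.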
