Stop Writing Tests for the Wrong Functions Before a Bug Goes Live

Eighty percent coverage and a green pipeline is a comfortable place to be, comfortable enough that most teams call it done. The problem is that a coverage percentage says nothing about which code was actually exercised, or whether the parts carrying the most risk were ever touched. You can have near-perfect coverage on the most trivial functions in your codebase while the most complex, branch-heavy logic sits completely untested, and the report won't tell you either way.

Code coverage is a standard part of any testing workflow. It measures how much of your codebase is exercised by your test suite, and it does that job well. What it was never designed to do is tell you where to focus. That distinction matters more than it might seem. "How much is covered" and "which parts actually need covering" are different questions, and before a release, it is almost always the second one that determines whether you are ready to ship.

The Prioritisation Gap

Take two functions, both sitting at 0% coverage. From a reporting standpoint, they look identical: untested, equally in need of attention. However, they are not the same:

  • One is ten lines long with a single conditional: clear logic, low branching, easy to reason about.
  • The other has nested loops, multiple switch statements, and a dozen interacting if conditions.

The first is unlikely to hide surprises. The second is exactly where subtle bugs breed, and where a change made under time pressure is most likely to introduce a defect.

Coverage sees both as "0%: needs attention" and gives teams no basis for deciding which one comes first.

The natural result: engineers test what is easiest to test. Simple, accessible functions lift the percentage with the least effort. Complex, risky code stays untested because nothing in the tooling made the case for why it should come first. Teams end up optimising for the metric: the number climbs, but the risk does not fall.

What the CRAP Metric Measures

Coco 7.5 introduces the CRAP metric, short for Change Risk Anti-Patterns, which combines two signals that separately tell an incomplete story.

Cyclomatic complexity (the McCabe metric) counts the number of independent execution paths through a function. A function with no branches has a complexity of 1; each additional if statement, loop, or switch case adds to that count. High complexity does not guarantee bugs, but it does mean a function is harder to understand, harder to test thoroughly, and more likely to behave unexpectedly when modified.
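To make the counting rule above concrete, here is a minimal Python sketch that approximates cyclomatic complexity by starting at 1 and adding one for each branch or loop it finds. It is not how Coco computes the McCabe metric; the choice of node types and the names `approx_cyclomatic_complexity`, `DECISION_NODES`, `STRAIGHT_LINE`, and `BRANCHY` are assumptions made purely for illustration.

```python
import ast

# Node types treated as decision points in this simplified count.
# This list is an illustrative assumption: production tools count more
# constructs (switch cases, boolean operators, exception handlers, ...).
DECISION_NODES = (ast.If, ast.For, ast.While)


def approx_cyclomatic_complexity(source: str) -> int:
    """Return 1 plus the number of decision points found in `source`."""
    tree = ast.parse(source)
    complexity = 1  # a function with no branches has a complexity of 1
    for node in ast.walk(tree):
        if isinstance(node, DECISION_NODES):
            complexity += 1  # each `if`, `elif`, or loop adds one path
    return complexity


# Hypothetical example functions, given as source text.
STRAIGHT_LINE = """
def area(width, height):
    return width * height
"""

BRANCHY = """
def classify(values, limit):
    result = []
    for v in values:
        if v < 0:
            result.append("negative")
        elif v > limit:
            result.append("large")
        else:
            result.append("other")
    return result
"""
```

Under these assumptions, `approx_cyclomatic_complexity(STRAIGHT_LINE)` returns 1, while `approx_cyclomatic_complexity(BRANCHY)` returns 4: one for the base path, plus the loop, the `if`, and the `elif`.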

Test coverage measures what percentage of that function your test suite actually exercises.

Together, they produce a single score per function that reflects real-world risk. Functions scoring above 30 are flagged as high-risk: a score past that threshold indicates a function is both structurally difficult to maintain and likely to introduce defects when changed.
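For readers who want to see how the two signals can combine, the sketch below uses the commonly cited CRAP definition from Alberto Savoia and Bob Evans: complexity squared, multiplied by the cube of the uncovered fraction, plus complexity. The article does not spell out the formula, so treating this as the computation Coco performs is an assumption, and the names `crap_score`, `is_high_risk`, and `HIGH_RISK_THRESHOLD` are hypothetical.

```python
# Threshold taken from the text: functions scoring above 30 are high-risk.
HIGH_RISK_THRESHOLD = 30


def crap_score(complexity: int, coverage_percent: float) -> float:
    """CRAP = complexity^2 * (1 - coverage/100)^3 + complexity.

    ASSUMPTION: this is the widely published Savoia/Evans definition;
    the article does not confirm that Coco uses exactly this form.
    """
    if complexity < 1:
        raise ValueError("cyclomatic complexity is at least 1")
    if not 0 <= coverage_percent <= 100:
        raise ValueError("coverage must be between 0 and 100")
    uncovered = 1.0 - coverage_percent / 100.0
    return complexity ** 2 * uncovered ** 3 + complexity


def is_high_risk(complexity: int, coverage_percent: float) -> bool:
    """True when the score exceeds the high-risk threshold."""
    return crap_score(complexity, coverage_percent) > HIGH_RISK_THRESHOLD
```

Applied to the two untested functions from earlier, a simple function with complexity 2 at 0% coverage scores 6, while a branch-heavy function with complexity 20 at 0% coverage scores 420. Under this formula the score can never drop below the complexity itself, so the same complex function still scores 20 at 100% coverage.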

[Interactive table not reproduced: CoverageBrowser Functions view with sortable columns Function, MC/DC %, McCabe, and CRAP. Legend: coverage bands 0%, <30%, <60%, ≥60%; CRAP >30: critical; CRAP 12–30.]

The table above is an interactive version of Coco's CoverageBrowser, built from real coverage data. Functions are sorted by CRAP score by default, so the riskiest ones appear at the top, colour-coded red. Click any column header to re-sort, and hover over the column names to see a definition of each metric.

How It Works Inside Coco Code Coverage

In CoverageBrowser, risky functions are colour-coded: yellow for concerning scores, red for critical ones. Problem areas surface without any manual analysis or cross-referencing of separate reports.

The full function list can be sorted by CRAP score, turning what would otherwise be an open-ended investigation into a prioritised queue. The functions most in need of attention appear at the top, ranked and ready to work through. For teams whose workflows extend beyond the browser, CRAP scores are available in HTML and CSV reports as well, so risk data can feed directly into CI pipelines, sprint planning, or stakeholder reviews.
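As a rough illustration of feeding exported scores into a pipeline, the following Python sketch reads a CSV, sorts functions by CRAP score, and labels each with a risk band. Everything specific here is an assumption: the column names `function` and `crap`, the invented sample rows, and the mapping of the 12–30 range to the "concerning" (yellow) band are illustrative and do not describe the actual layout of a Coco CSV report.

```python
import csv
import io

# Invented sample rows in an ASSUMED column layout; a real CSV export
# will use its own column names and ordering.
SAMPLE_CSV = """function,mccabe,coverage_percent,crap
parse_header,3,100,3.0
route_message,14,20,114.35
format_label,2,0,6.0
validate_order,8,40,21.82
apply_discount,6,50,10.5
"""


def load_rows(text: str) -> list[dict]:
    """Parse CSV text into dicts holding a function name and CRAP score."""
    reader = csv.DictReader(io.StringIO(text))
    return [{"function": r["function"], "crap": float(r["crap"])} for r in reader]


def risk_band(crap: float) -> str:
    """Map a score to a band; the 12-30 'concerning' range is an assumption."""
    if crap > 30:
        return "critical"
    if crap >= 12:
        return "concerning"
    return "ok"


def prioritised_queue(rows: list[dict]) -> list[dict]:
    """Return rows sorted with the highest CRAP score first."""
    return sorted(rows, key=lambda r: r["crap"], reverse=True)


def critical_functions(rows: list[dict]) -> list[str]:
    """Names of functions a CI gate might refuse to ship untested."""
    return [r["function"] for r in rows if risk_band(r["crap"]) == "critical"]
```

A CI step could call `critical_functions(load_rows(report_text))` and fail the build when the list is non-empty, while `prioritised_queue` gives the order in which to write tests. With the sample rows above, the queue starts with `route_message` and ends with `parse_header`.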

 


Where the Impact Is Greatest

This metric benefits any software team, but its value is most pronounced in two scenarios.

Large and legacy codebases

Large and legacy codebases are where the prioritisation gap hurts most. When a codebase has grown over years, sometimes decades, it is rarely feasible to achieve meaningful coverage everywhere at once. Teams need a principled basis for deciding where limited testing resources go first. CRAP provides a data-driven ranking of where structural risk and coverage gaps coincide, making those decisions traceable rather than instinctive.

According to Gartner research on AI-assisted code modernisation, the biggest challenge organisations face with legacy code is not the technical work of migration itself, but knowledge: understanding what the existing code does and identifying where the real risk sits before any changes are made. The CRAP metric directly addresses that gap, giving teams a data-driven answer to the question that is hardest to answer by inspection alone.

Gartner also highlights test generation as one of the most practical and immediate uses of AI for legacy modernisation: specifically, generating tests from legacy code to create a behavioural baseline before migration begins. The CRAP score makes that process more efficient. Rather than generating tests indiscriminately across the entire codebase, teams can direct that effort toward the functions that score highest, where the risk of undetected defects is greatest.

Safety-critical software

Safety-critical software raises the stakes further. In automotive, avionics, medical devices, and industrial automation, a defect in complex, untested code is not just a bug report; it is a potential recall, a certification failure, or a safety incident. A metric that ties structural risk to coverage gaps gives teams both a practical prioritisation tool and a defensible, auditable basis for their testing strategy, something increasingly relevant as standards like ISO 26262, IEC 62304, and DO-178C demand evidence of rigour.

Stop Chasing Percentages

"We all have this problem. We have coverage data but we don't know what to do with it."

That is a more honest description of where most teams sit than the dashboards suggest. The tooling to measure coverage has existed for a long time. What has been missing is the layer that translates raw numbers into actionable risk: not how much has been tested, but whether the right things have been tested.

The CRAP metric contextualises code coverage by turning a flat percentage into a ranked list of functions that deserve attention before the next release. High coverage on simple code matters far less than coverage on the complex, critical paths where defects are most likely to hide and most expensive to find after release. That distinction now has a score, and a sorted list to go with it.

References

  1. Qt Coco 7.5 release notes and product documentation — https://doc.qt.io/coco/release-notes.html 
  2. Gartner, "Assessing GenAI for Modernizing Legacy Application Code," Matt Brasier, 16 December 2024, ID G00822567 — https://www.gartner.com/document-reader/document/6020435


Gartner content referenced in paraphrased form in accordance with Gartner's content compliance policy. Gartner does not endorse any vendor, product or service depicted in its research publications. Gartner research publications consist of the opinions of Gartner's research organisation and should not be construed as statements of fact.

If you want to learn how to work through the CRAP score list efficiently when you're managing a large codebase under time pressure, join our next live webinar.

We'll also show how AI takes your coverage data further, turning a prioritised function list into a full risk assessment your team can actually act on, sprint after sprint.

If this post made you think about how your team currently uses coverage data, it's worth 40 minutes of your time.

 
