How to Stop Writing Tests for the Wrong Functions and Start Testing with Ranked Lists by Risk

TL;DR

A high code coverage percentage doesn't tell you whether you're testing the right code. Teams naturally gravitate toward testing easy, simple functions that boost the metric, while complex, high-risk logic stays untested.

The CRAP metric (Change Risk Anti-Patterns) combines cyclomatic complexity (how many branching paths a function has) with test coverage to produce a single risk score per function. Functions scoring above 30 are flagged as high-risk.

In practice, CoverageBrowser surfaces these risky functions colour-coded and sorted by CRAP score, turning an open-ended investigation into a prioritised queue of what to test next, before the next release goes out.

The metric is especially valuable for large or legacy codebases where full coverage isn't feasible, and for safety-critical software (automotive, medical, avionics) where untested complex code can mean recalls or certification failures rather than just bug reports.

Eighty percent coverage and a green pipeline is a comfortable place to be... Comfortable enough that most teams call it done. The problem is that coverage percentage says nothing about which code was actually exercised, or whether the parts carrying the most risk were ever touched. You can have near-perfect coverage on the most trivial functions in your codebase while the most complex, branch-heavy logic sits completely untested, and the report won't tell you either way.

Code coverage is a standard part of any testing workflow. It measures how much of your codebase is exercised by your test suite, and it does that job well. What it was never designed to do is tell you where to focus. That distinction matters more than it might seem. "How much is covered" and "which parts actually need covering" are different questions, and before a release, it is almost always the second one that determines whether you are ready to ship.

The Prioritisation Gap

Take two functions, both sitting at 0% coverage. From a reporting standpoint, they look identical: untested, equally in need of attention. However, they are not the same:

One is ten lines long with a single conditional, clear logic, low branching, easy to reason about.
The other has nested loops, multiple switch statements, and a dozen interacting if conditions.

The first is unlikely to hide surprises. The second is exactly where subtle bugs breed, and where a change made under time pressure is most likely to introduce a defect.

Coverage sees both as "0%: needs attention", and it gives teams no basis for deciding which one comes first.

The natural result: engineers test what is easiest to test. Simple, accessible functions lift the percentage with the least effort. Complex, risky code stays untested because nothing in the tooling made the case for why it should come first. Teams end up optimising for the metric: the number climbs, but the risk does not fall.

What the CRAP Metric Measures

Coco 7.5 introduces the CRAP metric, short for Change Risk Anti-Patterns, which combines two signals that each tell an incomplete story on their own.

Cyclomatic complexity (the McCabe metric) counts the number of independent execution paths through a function. A function with no branches has a complexity of 1; each additional if statement, loop, or switch case adds to that count. High complexity does not guarantee bugs, but it does mean a function is harder to understand, harder to test thoroughly, and more likely to behave unexpectedly when modified.
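To make the counting rule concrete, here is a minimal sketch that approximates cyclomatic complexity for a snippet of Python source using the standard `ast` module. It is an illustration only, not how Coco computes the McCabe metric: the function name `approx_cyclomatic_complexity`, the choice of which node types count as decision points, and the two sample snippets are all assumptions made for this example.

```python
import ast

# Node types treated as one extra decision point each (a simplification).
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler)


def approx_cyclomatic_complexity(source: str) -> int:
    """Return 1 plus the number of decision points found in `source`."""
    complexity = 1  # code with no branches has a complexity of 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` adds two short-circuit decisions
            complexity += len(node.values) - 1
    return complexity


# Hypothetical sample functions, invented for this illustration.
STRAIGHT_LINE = """
def area(width, height):
    return width * height
"""

BRANCHY = """
def classify(n):
    if n < 0:
        return "negative"
    for d in range(2, n):
        if n % d == 0 and d > 1:
            return "composite"
    return "other"
"""

print(approx_cyclomatic_complexity(STRAIGHT_LINE))
print(approx_cyclomatic_complexity(BRANCHY))
```

The branch-free snippet stays at the baseline of 1, while the second picks up one point each for the two `if` statements, the loop, and the `and`, giving 5.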

Test coverage measures what percentage of that function your test suite actually exercises.

Together, they produce a single score per function that reflects real-world risk, where functions scoring above 30 are flagged as high-risk. That threshold indicates a function is both structurally difficult to maintain and likely to introduce defects when changed.
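The article does not spell out how the two signals are combined, so the sketch below assumes the formula usually published for the CRAP metric: complexity squared, multiplied by the cube of the uncovered fraction, plus complexity. Whether Coco applies exactly this formula should be checked against its documentation. The function name `crap_score` and the sample inputs are hypothetical.

```python
HIGH_RISK_THRESHOLD = 30  # cut-off quoted in the article


def crap_score(complexity: int, coverage_pct: float) -> float:
    """Assumed formula: comp^2 * (1 - cov/100)^3 + comp."""
    if complexity < 1:
        raise ValueError("cyclomatic complexity starts at 1")
    if not 0.0 <= coverage_pct <= 100.0:
        raise ValueError("coverage must be a percentage between 0 and 100")
    uncovered = 1.0 - coverage_pct / 100.0
    return complexity ** 2 * uncovered ** 3 + complexity


# Two untested functions that a plain coverage report treats as identical.
simple_untested = crap_score(complexity=2, coverage_pct=0.0)
complex_untested = crap_score(complexity=15, coverage_pct=0.0)
# The same complex function once it is fully covered.
complex_tested = crap_score(complexity=15, coverage_pct=100.0)

for label, score in [
    ("simple, 0% covered", simple_untested),
    ("complex, 0% covered", complex_untested),
    ("complex, 100% covered", complex_tested),
]:
    flag = "HIGH RISK" if score > HIGH_RISK_THRESHOLD else "ok"
    print(f"{label}: {score:.1f} ({flag})")
```

Under this assumed formula, the two-path function scores 6 at 0% coverage and stays well below the threshold, while the fifteen-path function scores 240. Full coverage brings the complex function down to 15, its bare complexity, which is why the score rewards testing the branch-heavy code first.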

[Interactive table: Functions view. Columns: Function, MC/DC %, McCabe, CRAP. Legend: coverage 0%, <30%, <60%, ≥60%; CRAP >30 (critical), CRAP 12–30.]

The table above is an interactive, illustrative version of the CoverageBrowser function list, built from real coverage data. Functions are sorted by CRAP score by default, so the riskiest ones appear at the top, colour-coded red. Click any column header to re-sort, and hover over the column names to see each metric's definition.

How It Works Inside Coco Code Coverage

In CoverageBrowser, risky functions are colour-coded: yellow for concerning scores, red for critical ones. Problem areas surface without any manual analysis or cross-referencing of separate reports.


The full function list can be sorted by CRAP score, turning what would otherwise be an open-ended investigation into a prioritised queue. The functions most in need of attention appear at the top, ranked and ready to work through. For teams whose workflows extend beyond the browser, CRAP scores are available in HTML and CSV reports as well, so risk data can feed directly into CI pipelines, sprint planning, or stakeholder reviews.
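As a rough sketch of how a CSV export could feed a pipeline step, the snippet below reads a report, keeps the functions above the threshold, and lists them riskiest first. The column names (`Function`, `McCabe`, `Coverage`, `CRAP`) and the sample rows are invented for illustration; the real layout of a Coco CSV report may differ and should be checked before adapting this.

```python
import csv
import io

CRAP_THRESHOLD = 30.0  # high-risk cut-off quoted in the article

# Hypothetical export: column names and rows are assumptions, not Coco's format.
SAMPLE_REPORT = """Function,McCabe,Coverage,CRAP
parse_frame,14,0,210.0
clamp,2,100,2.0
route_message,9,25,43.17
checksum,3,0,12.0
"""


def high_risk_functions(report, threshold=CRAP_THRESHOLD):
    """Return (name, score) pairs above `threshold`, riskiest first."""
    rows = csv.DictReader(report)
    scored = [(row["Function"], float(row["CRAP"])) for row in rows]
    flagged = [item for item in scored if item[1] > threshold]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# In a real pipeline this would be an open file handle rather than a string.
queue = high_risk_functions(io.StringIO(SAMPLE_REPORT))
for name, score in queue:
    print(f"{name}: CRAP {score:.1f}")
```

A CI job could treat a non-empty `queue` as a failed check, or simply publish the list as the testing backlog for the next sprint.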


Where the Impact Is Greatest

This metric benefits any software team, but its value is most pronounced in two scenarios.

Large and legacy codebases

Large and legacy codebases are where the prioritisation gap hurts most. When a codebase has grown over years, sometimes decades, it is rarely feasible to achieve meaningful coverage everywhere at once. Teams need a principled basis for deciding where limited testing resources go first. CRAP provides a data-driven ranking of where structural risk and coverage gaps coincide, making those decisions traceable rather than instinctive.

According to Gartner research on AI-assisted code modernisation, the biggest challenge organisations face with legacy code is not the technical work of migration itself, but knowledge: understanding what the existing code does and identifying where the real risk sits before any changes are made. The CRAP metric directly addresses that gap, giving teams a data-driven answer to the question that is hardest to answer by inspection alone.

Gartner also highlights test generation as one of the most practical and immediate uses of AI for legacy modernisation: specifically, generating tests from legacy code to create a behavioural baseline before migration begins. The CRAP score makes that process more efficient. Rather than generating tests indiscriminately across the entire codebase, teams can direct that effort toward the functions that score highest, where the risk of undetected defects is greatest.

Safety-critical software

Safety-critical software raises the stakes further. In automotive, avionics, medical devices, and industrial automation, a defect in complex, untested code is not a bug report; it is a potential recall, a certification failure, or a safety incident. A metric that ties structural risk to coverage gaps gives teams both a practical prioritisation tool and a defensible, auditable basis for their testing strategy, something increasingly relevant as standards like ISO 26262, IEC 62304, and DO-178C demand evidence of rigour.

Stop Chasing Percentages

"We all have this problem. We have coverage data but we don't know what to do with it."

That is a more honest description of where most teams sit than the dashboards suggest. The tooling to measure coverage has existed for a long time. What has been missing is the layer that translates raw numbers into actionable risk: not how much has been tested, but whether the right things have been tested.

The CRAP metric contextualises code coverage by turning a flat percentage into a ranked list of functions that deserve attention before the next release. High coverage on simple code matters far less than coverage on the complex, critical paths where defects are most likely to hide and most expensive to find after release. That distinction now has a score. And a sorted list to go with it.

References

Qt Coco 7.5 release notes and product documentation — https://doc.qt.io/coco/release-notes.html
Gartner, "Assessing GenAI for Modernizing Legacy Application Code," Matt Brasier, 16 December 2024, ID G00822567 — https://www.gartner.com/document-reader/document/6020435

Gartner content referenced in paraphrased form in accordance with Gartner's content compliance policy. Gartner does not endorse any vendor, product or service depicted in its research publications. Gartner research publications consist of the opinions of Gartner's research organisation and should not be construed as statements of fact.

If you want to learn how to work through the CRAP score list efficiently when you're managing a large codebase under time pressure, join our next live webinar.

We'll also show how AI takes your coverage data further, turning a prioritised function list into a full risk assessment your team can actually act on, sprint after sprint.

If this post made you think about how your team currently uses coverage data, it's worth 40 minutes of your time.
