Does Your Test Suite Actually Test Anything?
You have 100% code coverage. Every line is executed. The CI badge is green. Then you refactor a condition from > to >=, and not a single test fails. Your tests ran the code, but they didn't verify it.
The Coverage Illusion
Code coverage measures execution, not verification. A test that calls a function and asserts nothing still counts as covered. A test that checks the return type but not the value still counts as covered. Coverage tells you which lines ran; it says nothing about whether your assertions would catch a bug.
This is not a theoretical problem. In Phexium, running mutation testing against code with full line coverage consistently reveals tests that execute paths without verifying outcomes.
What Mutation Testing Does
The idea is simple: inject bugs into your code and check if your tests catch them.
A mutation testing tool takes your source code, applies one small change at a time, and runs your test suite against each modified copy, called a mutant. If a test fails, the mutant is "killed." If all tests pass, the mutant "survived," meaning your tests don't actually verify that behavior.
Common mutations include flipping > to >=, replacing && with ||, swapping + with -, and changing return true to return false. Each one simulates a real bug that a developer could introduce.
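To make the killed/survived distinction concrete, here is a minimal hand-rolled sketch. It does not use Pest; the mutant is written out by hand, and every name in it (qualifiesForFreeShipping, the 5000 threshold, the two test helpers) is hypothetical and exists only for illustration.

```php
<?php

declare(strict_types=1);

// Original implementation: free shipping strictly above the threshold.
function qualifiesForFreeShipping(int $totalCents): bool
{
    return $totalCents > 5000;
}

// Hand-written mutant: ">" flipped to ">=", as a mutation tool would do.
function qualifiesForFreeShippingMutant(int $totalCents): bool
{
    return $totalCents >= 5000;
}

// Weak test: only checks values far away from the boundary.
function weakTestPasses(callable $fn): bool
{
    return $fn(10000) === true && $fn(100) === false;
}

// Strong test: additionally pins the boundary value itself.
function strongTestPasses(callable $fn): bool
{
    return weakTestPasses($fn) && $fn(5000) === false;
}

// The weak test passes against the mutant, so the mutant survives.
var_dump(weakTestPasses('qualifiesForFreeShippingMutant'));

// The strong test fails against the mutant, so the mutant is killed.
var_dump(strongTestPasses('qualifiesForFreeShippingMutant'));
```

Both tests execute every line of the function, so coverage is identical. Only the boundary assertion tells the original and the mutant apart.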
Phexium uses Pest's built-in --mutate flag, combined with --parallel, --everything, and --covered-only. Together these options mutate every covered line across the full suite and check the mutations in parallel.
Real Examples: What Survived
Here are actual cases from Phexium where mutation testing exposed gaps that line coverage missed.
Boundary Condition in Pagination
The original Pagination::getOffset() had an early return for non-positive perPage:
public function getOffset(): int
{
    if ($this->perPage <= 0) {
        return 0;
    }
    return ($this->page - 1) * $this->perPage;
}
A mutant changed <= to <, and the tests still passed: no test checked perPage = 0 against perPage = -1, and at perPage = 0 the multiplication yields 0 anyway. The fix was twofold: simplify the code to eliminate the branch, and add boundary tests:
it('returns zero offset when perPage is negative', function (): void {
    $pagination = new Pagination(totalCount: 10, page: 2, perPage: -1);
    expect($pagination->getOffset())->toBe(0);
});

it('computes offset correctly when perPage is one', function (): void {
    $pagination = new Pagination(totalCount: 10, page: 2, perPage: 1);
    expect($pagination->getOffset())->toBe(1);
});
The simplified version replaces the branch with a max() call, which makes it immune to the original mutation: there is no comparison operator left to flip.
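The simplified method itself is not reproduced here, so the following is only a sketch of what a max()-based getOffset() might look like. The constructor arguments come from the tests above; where max() is placed, and the rest of the class, are assumptions.

```php
<?php

declare(strict_types=1);

// Hypothetical reconstruction: only the constructor arguments and the
// getOffset() name come from the article; the body is an assumption.
final class Pagination
{
    public function __construct(
        private readonly int $totalCount,
        private readonly int $page,
        private readonly int $perPage,
    ) {
    }

    public function getOffset(): int
    {
        // Clamp perPage to zero instead of branching: any non-positive
        // perPage multiplies out to an offset of 0.
        return ($this->page - 1) * max(0, $this->perPage);
    }
}

$pagination = new Pagination(totalCount: 10, page: 2, perPage: -1);
var_dump($pagination->getOffset());
```

Clamping perPage rather than the final product keeps the behavior of the original branch for every page value. The arithmetic operators are still mutation targets, which is why the boundary tests remain necessary.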
Ternary Elimination in OffsetClock
OffsetClock::now() branched on whether the offset was positive or negative:
return $this->offsetSeconds >= 0
    ? $base->add(new DateInterval(sprintf('PT%dS', $this->offsetSeconds)))
    : $base->sub(new DateInterval(sprintf('PT%dS', abs($this->offsetSeconds))));
A mutant flipped >= to >, changing which branch runs when offsetSeconds is exactly zero. Tests passed because a zero offset produces the same result either way: an equivalent mutation that revealed unnecessary complexity. The fix replaced the ternary with a single expression that handles both signs.
One line. No branch. No mutation target.
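The actual one-liner is not reproduced here. One way to get a branch-free version, shown purely as an assumption, is DateTimeImmutable::modify() with a signed relative offset. The class below is a hypothetical stand-in that takes its base time through the constructor, which may differ from how the real OffsetClock obtains it.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for OffsetClock. The use of modify() and the
// constructor shape are assumptions, not the project's actual code.
final class OffsetClock
{
    public function __construct(
        private readonly DateTimeImmutable $base,
        private readonly int $offsetSeconds,
    ) {
    }

    public function now(): DateTimeImmutable
    {
        // "%+d" always prints the sign, producing "+90 seconds" or
        // "-90 seconds", so one expression covers both directions.
        return $this->base->modify(sprintf('%+d seconds', $this->offsetSeconds));
    }
}

$base = new DateTimeImmutable('2024-01-01 00:00:00', new DateTimeZone('UTC'));

echo (new OffsetClock($base, 90))->now()->format('Y-m-d H:i:s'), PHP_EOL;
echo (new OffsetClock($base, -90))->now()->format('Y-m-d H:i:s'), PHP_EOL;
```

With the sign carried by the format string, zero is no longer a special case, so there is no comparison for a mutant to flip.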
Dead Code in QueryCacheKeyGenerator
The cache key generator had over 30 lines of manual serialization: extractProperties(), normalizeValue(), recursive handling of nested objects. Mutation testing found four surviving mutants in that code: conditions and branches that tests couldn't distinguish because the logic was too tangled.
The fix was to replace it entirely with serialize():
public static function generate(QueryInterface $query): string
{
    $className = $query::class;
    $hash = hash('xxh128', serialize($query));
    return sprintf('query:%s:%s', $className, $hash);
}
From 30+ lines to 3. Every mutation in the new code is caught because the surface area is minimal. PHP's native serialize() handles private properties, nested objects, and deterministic ordering: all the edge cases the manual code was trying to solve.
Weak Assertions in RedisCache
The RedisCache TTL conversion used (int) casts that mutation testing could remove without breaking tests.
A mutant without the cast still produced the same result in most test scenarios. The fix used intval() for explicit conversion, and tests were strengthened to verify millisecond-level precision:
it('converts integer TTL from seconds to milliseconds', function (): void {
    $this->cache->set('key', 'value', 3600);
    $pttl = $this->redis->pttl('key');
    expect($pttl)->toBeGreaterThan(3_599_000)
        ->toBeLessThanOrEqual(3_600_000);
});
The test now verifies the actual TTL stored in Redis at millisecond granularity. A mutant that changes the multiplication factor gets caught.
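The same idea can be shown without Redis. The helper below is hypothetical: the real RedisCache conversion is not reproduced here, so the function name, its signature, and the decision to truncate fractional seconds are all assumptions. It only illustrates how exact-value assertions pin down both the cast and the factor.

```php
<?php

declare(strict_types=1);

// Hypothetical helper, not RedisCache's real code: converts a TTL in
// seconds to whole milliseconds, truncating any fractional part first.
function ttlToMilliseconds(int|float $ttlSeconds): int
{
    // intval() makes the conversion explicit before scaling.
    return intval($ttlSeconds) * 1000;
}

// Exact-value checks: changing the factor or swapping "*" for "/" breaks
// the first one, and dropping intval() breaks the second.
var_dump(ttlToMilliseconds(3600) === 3_600_000);
var_dump(ttlToMilliseconds(1.9) === 1000);
```

Pulling the arithmetic into a pure function is one way to assert exact values without depending on how much time elapses between set() and pttl(). Whether Phexium structures it this way is not stated.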
The Pattern
Across these cases, mutation testing did two things:
Exposed weak tests. Assertions that checked "it doesn't crash" instead of "it produces the right value." Adding precise assertions killed the mutants.
Simplified code. Several surviving mutants pointed to unnecessary branches, redundant conditions, or overly complex logic. The simplest fix was often to reduce the code, not to add more tests. Fewer branches means fewer mutation targets.
Trade-offs
Mutation testing is expensive. It runs your test suite once per mutation, potentially hundreds or thousands of times. Phexium disables it in CI (GitLab free tier has limited minutes) and runs it locally before releases.
Not every surviving mutant needs a fix. Equivalent mutations, where the mutated code produces identical behavior, are noise. Infrastructure code and simple getters have lower mutation testing value than domain logic and business rules.
Phexium focuses mutation testing on domain and application layers, where the cost of an undetected bug is highest.
What This Gets You
A mutation score is a measure of test quality, not test quantity. It answers the question that coverage cannot: if someone introduces a bug in this code, will a test catch it?
In Phexium, mutation testing is not an afterthought. It drives refactoring decisions, simplifies code, and keeps assertion quality honest. The nine mutation-driven refactorings in the recent commit history each made the codebase smaller and the tests more precise.
The goal is not 100% mutation score everywhere. It is 100% confidence that your domain logic is verified, and an honest assessment of where it is not.