Combinatorial Testing Rant

Published on May 8, 2025

[Cover image: reindeer plummeting to their death. Photo by Micah Williams on Unsplash.]

Those who have worked with me have probably heard this before: don’t test unfalsifiable conditions. And yet, time and again, I find myself explaining why someone’s overengineered set of black-box combinatorial tests — usually numbering more than ten — isn’t doing what they think it is.

These test suites often appear after a high-visibility bug makes it into production. The team scrambles to respond. The fix gets merged. And then comes the flood of black-box tests attempting to cover every possible permutation of inputs, flags, and configurations.
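
To make the pattern concrete, here is a minimal pytest sketch of the kind of suite I mean. Everything in it — the `checkout` function, the flag, the discount codes — is a hypothetical stand-in, not from any real incident:

```python
import itertools
from dataclasses import dataclass

import pytest

# Hypothetical system under test, stubbed so the sketch runs.
@dataclass
class Result:
    status: str

def checkout(new_pricing: bool, currency: str, code: str | None) -> Result:
    return Result(status="ok")  # stand-in for the real service call

# The anti-pattern: enumerate every combination of inputs and assert
# that nothing blows up. None of these thirty cases encodes *why* the
# original bug happened.
@pytest.mark.parametrize(
    "new_pricing, currency, code",
    itertools.product(
        [True, False],                              # feature flag
        ["USD", "EUR", "GBP"],                      # currency
        [None, "", "SAVE10", "save10", "SAVE10 "],  # discount code
    ),
)
def test_checkout_every_combination(new_pricing, currency, code):
    assert checkout(new_pricing, currency, code).status == "ok"
```

Thirty green checks, and not one of them records what the bug actually was.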

When I ask why, the answer is almost always the same: “We want to make sure this bug never happens again.”

To which I reply, often verbatim: “How many reindeer do you have to push off a roof and watch plummet to their death before you conclude that reindeer don’t fly?”

That usually stuns people. So I clarify: “I only need one. I think you only need one. You seem to need to drive them to extinction.”

You don’t prove the absence of a bug by throwing more and more random combinations at it. You prove that a bug has been addressed by writing a targeted, falsifiable test that fails before the fix and passes afterward. That’s how you ensure the bug is understood and genuinely resolved. Not with a scattershot pile of black-box permutations.
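
In code, that looks less like thirty permutations and more like one pinned-down case. A minimal sketch, assuming — purely for illustration — that the bug was a discount code with trailing whitespace being rejected:

```python
def normalize_code(code: str) -> str:
    # The (hypothetical) fix: codes were compared untrimmed, so
    # "SAVE10 " with a trailing space was silently rejected.
    return code.strip().upper()

def test_discount_code_with_trailing_whitespace_is_accepted():
    # Falsifiable: remove the .strip() above and this test fails,
    # exactly as it did before the fix.
    assert normalize_code("SAVE10 ") == "SAVE10"
```

One test, pinned to the one condition that mattered. One reindeer, one roof.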

Even if you write a hundred tests, you’re only demonstrating that a hundred specific inputs didn’t break. That doesn’t mean future inputs won’t. Worse, it can give a false sense of security.

This is where white-box testing shines. If you understand the bug, you can often write a test that reaches into the logic and confirms it’s behaving correctly under the exact conditions that caused the failure. In the reindeer metaphor: instead of watching them fall, you dissect one and confirm it lacks wings. Now you’ve got science.
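
For example, suppose the production failure was a cart total drifting by a cent because line items were accumulated as floats. A white-box test goes straight to the unit where that logic lives and constructs the exact condition that exposed it; the function and the failure mode here are hypothetical:

```python
from decimal import Decimal

def sum_line_items(prices: list[Decimal]) -> Decimal:
    # The (hypothetical) fixed logic: Decimal throughout. The bug was
    # float accumulation drifting by a cent on long carts.
    return sum(prices, Decimal("0"))

def test_long_cart_total_does_not_drift():
    # White-box: we know the failure mode, so we construct the exact
    # condition that exposed it instead of sampling random carts.
    prices = [Decimal("0.10")] * 1000
    assert sum_line_items(prices) == Decimal("100.00")
```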

Sometimes, black-box tests sneak in a white-box check — like hitting an API and then verifying the database directly. But if your black-box test ends with peeking under the hood, it raises the question: why were you wearing the blindfold in the first place?
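
For what it’s worth, that hybrid usually looks something like this sketch, with an in-memory SQLite database standing in for the real persistence layer and a hypothetical `create_order` entry point:

```python
import sqlite3

# Hypothetical entry point standing in for the public API.
def create_order(conn: sqlite3.Connection, code: str) -> int:
    cur = conn.execute(
        "INSERT INTO orders (discount_code) VALUES (?)",
        (code.strip().upper(),),
    )
    conn.commit()
    return cur.lastrowid

def test_order_then_peek_at_the_database():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, discount_code TEXT)"
    )

    # "Black-box" half: drive the system through its entry point.
    order_id = create_order(conn, "SAVE10 ")

    # "White-box" half: the blindfold comes off; read the row directly.
    row = conn.execute(
        "SELECT discount_code FROM orders WHERE id = ?", (order_id,)
    ).fetchone()
    assert row[0] == "SAVE10"
```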

I’ve softened my stance a little over the years — especially now that we can run black-box test suites in parallel without killing CI times. But the core principle remains: write tests that teach you something when they fail. That means they must be falsifiable, grounded in the actual logic of the code, and targeted to specific, known behaviors.

Don’t try to exterminate bugs with test volume. Understand the bug. Write one great test. Then move on.