It’s Not Your Tests, It’s Your Testability

Published on October 19, 2025

Let’s talk about that test. The one that’s always flaky. The one that takes twenty minutes to run and fails for a different reason every time. Your first instinct is to blame the test. Maybe the locator is wrong, maybe the wait time isn’t long enough.

But what if I told you it’s not the test’s fault? More often than not, the real problem is a lack of testability.

You know the feeling. You need to test a new AI-powered recommendation feature. But first, your script has to perform a slow, brittle ballet of UI interactions: Log in, create a user, navigate three screens, add two items to a cart… all just to get to the starting line. It can be just as bad when you’re calling APIs to get the system into the state you want. That’s a fundamental lack of operability.

Now, let’s say you get to that point. You finally use that feature. Where do you see the actual result? If it’s an internal operation, often it’s buried in a mountain of logs, not directly accessible to your test. You’re forced to become an archaeologist just to observe the most basic output of the system. That’s a classic lack of observability.

Now, with the black-box nature of AI, these two problems get even worse.

Operability and observability have always been problems for testers, but the rise of expensive, non-deterministic AI has turned them from a daily annoyance into a critical pipeline bottleneck. We can’t afford this anymore. Let’s stop blaming our tests and start fixing our testability. In this article, we’ll talk about how.

The 5-Minute Conversation That Tames the Chaos

So, how do we start fixing our testability? Testability lives mostly in the developers’ realm, so we start with a conversation with them, one that helps both sides understand and control the chaos in the system.

Let’s be clear: this isn’t about telling developers how to do their job. It’s about showing, with a concrete example, how often a small architectural change can make our tests for the feature dramatically more reliable.

This conversation usually comes up when we’re testing an AI feature. Let’s walk through a typical scenario.

Step 1: Identify the Source of Unpredictability

First, as part of testing a feature, you (and hopefully your dev collaborator) identify the core function that makes the call to the AI. “The problem is here!”, you shout. It’s often a “god method”—it does everything: it builds a prompt, makes the unpredictable call to the AI, and processes the response, all in one tightly-coupled block.

Something like this:

# The God Method (Hard to Control)
def feature_call_ai_and_analyze(self, user_request):
    # ... prompt engineering, all inline, builds `prompt` from user_request ...
    response_text = self._generate_content(prompt) # The source of the chaos
    # ... parsing and other logic tangled here, producing final_result_for_feature ...
    return final_result_for_feature

Step 2: Frame the Problem – The Need to Control the Chaos

In this case, our job is to validate how the feature behaves with different kinds of AI responses. The core problem is that the live AI’s response is unpredictable. And the prompt is constructed inside that god method. Our lack of direct control makes it impossible to reliably test different prompts.

Our tests feel flaky because we don’t control the chaos in the system. This is a classic problem of poor operability (we can’t control the AI dependency) and poor observability (it’s hard to see what the actual response is).
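Just to make the pain concrete, a test against that live god method ends up looking something like this sketch; the RecommendationFeature class and the request string are assumptions of mine for illustration, not real project code.

# A test that depends on the live AI call (hypothetical names, based on the god method above)
def test_recommendation_feature_live():
    feature = RecommendationFeature()  # hypothetical class that owns the god method
    result = feature.feature_call_ai_and_analyze("suggest a laptop under $1000")
    # The live model can answer differently on every run, so this assertion
    # is flaky by design of the system, not by fault of the test.
    assert "laptop" in str(result).lower()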

If only we could make them more controllable…

Step 3: Propose the Humble Seam

Now, you and your developer have the conversation about solving the problem.

And, you don’t say, “Your code is untestable.”

You say, “Hey, I’m working on the tests for the X feature, and I’m trying to figure out how to reliably test what happens when the AI returns weird data. What if we added a ‘seam’ where I could inject a mocked response, bypassing the real AI call during my tests?”

Or…

“Can we add a seam where we can inject different prompts, bypassing the rest of the system, and see what happens?”

This is a concept we’re all familiar with. It can be as easy as putting prompts in an external file instead of hard-coding them. It gives us a control point.
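To make that concrete, here’s a minimal sketch of what such a seam could look like, assuming the same class as the god method above; prompt_template_path, _parse_response, and the injectable generate parameter are names I’m inventing for illustration, not the team’s actual code.

# The same method with two humble seams (a sketch, not the one true refactor)
def feature_call_ai_and_analyze(self, user_request, generate=None):
    # Seam 1: the prompt template lives in an external file we can edit or swap per test.
    with open(self.prompt_template_path) as f:  # prompt_template_path: assumed attribute
        prompt = f.read().format(user_request=user_request)
    # Seam 2: the AI call is injectable; tests pass a fake, production passes nothing.
    generate = generate or self._generate_content
    response_text = generate(prompt)
    # Parsing and the rest of the feature logic stay where they were.
    final_result_for_feature = self._parse_response(response_text)  # assumed helper, for brevity
    return final_result_for_feature

In production nothing changes, since callers keep calling the method exactly as before; in a test, you pass generate=lambda prompt: canned_response and the chaos is gone.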

Step 4: Everyone Wins

Then, you explain the benefit. “If we have this seam, I can write a suite of reliable integrated tests for the entire X feature’s behavior under all kinds of conditions. You, as the developer, will get richer feedback on how the feature handles bad data from the AI. Our CI/CD pipeline will be more stable because we’re not relying on the unpredictable network call for 90% of our checks. It will help us all understand and control the chaos in the system.”

That’s the conversation. It’s a collaborative, engineering-focused proposal that makes life better for everyone. It’s the first and most powerful tool in your new AI testing toolbox.
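Once the seam exists, those reliable checks are just ordinary test code. Here’s a pytest-style sketch, reusing the hypothetical names from the earlier snippets:

# A deterministic test for the X feature, with the AI call replaced through the seam
def test_feature_handles_weird_ai_data():
    feature = RecommendationFeature()  # hypothetical class from the earlier sketches
    # Inject exactly the "weird data" we want to study: no network, no surprises.
    weird_response = '{"recommendations": []}'
    result = feature.feature_call_ai_and_analyze(
        "suggest a laptop", generate=lambda prompt: weird_response
    )
    # Whatever behavior the team agreed on for an empty recommendation list,
    # we can assert on it deterministically, run after run.
    assert result is not None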

The Next Step: A New Collaboration

So, you had the conversation, your developer added the seam, and you now have a way to control the chaos. At least part of it. (By the way – this is not just AI-related; it works like magic for anything you want to control.)

This is a huge win, not just for you, but for the entire team. It opens up a new opportunity for a deeper, more effective collaboration between testers and developers.

This is where you can start bringing gifts to the partnership. Instead of just filing bugs, you can bring more information to the developers. An actual collaboration, who knew?

Speaking of gifts…

Here’s one for you.

The AI Test Engineer’s Prompt Pack I’ve created is designed to be a collaborative toolkit. It’s a set of expertly crafted prompts that you and your developer can use together to:

  • Quickly generate the scaffolding and plumbing tests to ensure the core logic and error handling are solid.
  • Refactor code for testability, making it easier to introduce seams.
  • Build the first line of validation tests that check live AI responses for structure and sanity (a minimal sketch of such a check follows right after this list).
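For that last kind of check, a structure-and-sanity test against the live model might look something like this minimal sketch; the expected "recommendations" field and the class name are assumptions for illustration, not a real contract:

import json

# First-line validation: don't assert on the AI's exact words,
# only that the live response has the structure the feature depends on.
def test_live_ai_response_structure():
    feature = RecommendationFeature()  # hypothetical class from the earlier sketches
    raw = feature._generate_content("Recommend a laptop for a student.")  # the real, live AI call, placeholder prompt
    data = json.loads(raw)                            # it should at least be valid JSON
    assert "recommendations" in data                  # expected field is an assumption
    assert isinstance(data["recommendations"], list)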

This isn’t just about making your job easier; it’s about building a more stable, more testable, and more reliable system, together. It’s the practical starting point for a new, more powerful partnership.

Conclusion

We’re at the start of a new era in software quality. The rise of AI is forcing us to evolve, to move beyond our traditional roles and embrace a more collaborative, engineering-focused mindset.

It’s a big shift, but it’s also a huge opportunity. By championing testability and bringing new tools to the table, we’re not just finding bugs anymore; we’re helping to build the resilient, high-quality systems of the future.

If you’re ready to take the first step in that journey, the Prompt Pack is for you.

Download The AI Test Engineer Prompt Pack Now
