
Considering Generative AI in testing
IT’S ANOTHER GEN AI POST!!!
Whether it’s low code test automation tools or using ChatGPT to create software code, Generative AI is the topic of the moment.

With everyone talking about generative AI and its impacts on the industry, I wanted to outline how I think it will affect testing, based on the risks that using Gen AI brings. Knowing the risks these new tools bring will allow us to be informed and think about the types of testing needed as they are integrated into our teams.
Not code how you’d write it
Generative AI scrapes the internet to create code based on the prompts you give it, meaning that you're going to inherit the styles and quirks of other developers and code writers. There's a risk that we won't be able to understand what's been written for us, making it LESS MAINTAINABLE AND INHERITABLE for us and our teams. There's also a risk that WE JUST MAKE ASSUMPTIONS THAT IT'S DOING WHAT WE WANT IT TO and don't validate that (because it's hard to work out what the code is doing).
This can impact generated test code as well as generated product code, and can be combatted with thorough static analysis and refactoring of the code to make it understandable.
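As a hypothetical illustration of that refactoring step, here is the kind of dense, anonymous code a prompt might hand back, alongside a version reworked so a team can actually read and maintain it (the function names and behaviour here are invented for the example, not taken from any real prompt):

```python
# As generated (hard to reason about at a glance):
def f(x):
    return [i for i in x if i and str(i).strip() and not str(i).strip().startswith("#")]

# After refactoring for the team (same behaviour, clearer intent):
def remove_blank_and_comment_lines(lines):
    """Drop empty, whitespace-only, and '#'-prefixed entries from a list of lines."""
    kept = []
    for line in lines:
        if not line:
            continue
        text = str(line).strip()
        if not text or text.startswith("#"):
            continue
        kept.append(line)
    return kept
```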
Inconsistent coding habits & styles
As above, the styles and habits of other coders will be scraped and reused by Generative AI. This means that each time you prompt for code you might get a different way of doing or writing things presented back to you. We may see the same problems with MAINTAINABILITY AND INHERITABILITY, where a lack of consistency makes it harder to spot problems. There may also be an issue with OWNERSHIP OF CODE, where it isn't obvious whose code this is on the team, so we don't know who to ask about it. There's also a smaller risk around INTEGRATION OF CODE, where different coding styles don't work well together and the behaviour of one function contradicts or conflicts with another.
Again, more static analysis and refactoring of the code will be needed, along with code-based tests of function behaviour to ensure understandability and correctness.
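As a minimal sketch of those code-based behaviour tests (assuming pytest, with `apply_discount` as a made-up example function): pinning behaviour down like this means two differently-styled generated implementations still have to agree on what the function actually does.

```python
import pytest

def apply_discount(price, percent):
    """Return the price reduced by the given percentage."""
    if percent < 0 or percent > 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)

def test_ten_percent_off():
    assert apply_discount(200.0, 10) == 180.0

def test_zero_discount_returns_original_price():
    assert apply_discount(99.99, 0) == 99.99

def test_discount_over_100_percent_is_rejected():
    with pytest.raises(ValueError):
        apply_discount(50.0, 150)
```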
Larger maintenance overhead
It might be a lot harder to make generated code play well with existing code, so we may end up having to replace code rather than add to it. This means we may lose existing code tests and have a higher risk of CODE BEHAVING IN A WAY WE DON'T WANT. Replacing existing code will also mean having to retest existing functionality from scratch and INCREASE THE AMOUNT OF TIME NEEDED FOR TESTING, especially if we lose our close-to-the-code tests.
A lack of understanding of the code (or of the generated tests within it) will mean more time spent on static analysis each time we want to add to the code base. One way to combat this is to keep a separate repo of tests so that changes to the code don't impact the tests. Testers will also have to manually review generated tests to ensure they are meaningful to the code base as it stands now.
Works like an intern (it’s basic)
Your Generative AI-created code for software or automated tests will likely be at the level of an intern's. You'll get what you ask for specifically, but nuance or edge cases will not be accounted for. This means there's the risk of POOR FUNCTIONAL BEHAVIOUR and also the risk of TESTS NOT FINDING DEEPER ISSUES. There's also the risk that any uncertainty will result in assumptions being made that ARE NOT WHAT YOU HAVE INTENDED, leading to poor functional behaviours.
The way to combat this is to ensure that exploratory testing and human testing still take place; testers will have to not only review the tests that have been written but also check that the functionality GenAI has covered is accurate. We will also have to get good at reducing uncertainty and documenting each and every requirement we feed into the GenAI in a way that leaves no room for assumptions (don't trust that it will have the common sense to know a failure should result in an error).
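As a small, hypothetical example of the edge cases a tester still has to add by hand (pytest again, with `parse_age` standing in for whatever the prompt actually asked for), note how the failure cases explicitly demand an error rather than trusting common sense:

```python
import pytest

def parse_age(value):
    """Parse a form field into an age, rejecting anything that isn't a plausible age."""
    age = int(str(value).strip())
    if age < 0 or age > 150:
        raise ValueError(f"not a plausible age: {value!r}")
    return age

def test_happy_path():
    assert parse_age("42") == 42

def test_surrounding_whitespace_is_tolerated():
    assert parse_age(" 42 ") == 42

def test_negative_age_fails_loudly_rather_than_silently():
    with pytest.raises(ValueError):
        parse_age("-1")

def test_nonsense_input_raises_rather_than_returning_a_default():
    with pytest.raises(ValueError):
        parse_age("forty-two")
```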
Not having a model for what’s tested
As you may not have written the tests long form, instead generating them from a prompt, you won't have a mental model of test coverage. We may also (if we over-rely on low or no code testing tools) not have a good understanding of how the code or software behaves. This means risks to the MAINTAINABILITY OF THE SOFTWARE as well as the potential for risks that EDGE CASES AND FUNCTIONALITY WILL BE MISSED OR TESTED INCORRECTLY.
To combat these risks, we will need a plan to explore the code and functionality to gain an understanding of what it does. That understanding can then feed into the automated tests that have been generated and support knowing what to test and how it should be tested.
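One way to start rebuilding that model is to measure what our exploration actually exercises. A minimal sketch using coverage.py (assuming the `coverage` package is installed, with `checkout_total` as a made-up generated function we're exploring):

```python
import coverage

def checkout_total(prices, voucher=None):
    """Hypothetical generated function we are trying to build a model of."""
    total = sum(prices)
    if voucher == "SAVE10":
        total *= 0.9
    return round(total, 2)

cov = coverage.Coverage()
cov.start()

# Exercise the paths we think exist; the report then shows what we never touched.
checkout_total([10.0, 5.0])
checkout_total([10.0, 5.0], voucher="SAVE10")

cov.stop()
cov.save()
cov.report(show_missing=True)  # line coverage, listing any unexercised lines
```

In practice this would more likely be the whole suite run under coverage.py or pytest-cov, but the idea is the same: the missing lines in the report are the parts of the generated code we have no model for yet.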
Pulling bad dependencies means tech debt
There is a risk that GenAI will SCRAPE AND PULL OBSOLETE LIBRARIES, meaning that your code won't be supported or may not work. Attempts to update the code we've had created may fail, or if the GenAI has pulled libraries that change (without us knowing) there's a risk that OUR CODE STOPS WORKING ONE DAY AND WE DON'T KNOW WHY. There is also a risk that libraries used by GenAI become poisoned, introducing SECURITY VULNERABILITIES through backdoors or malicious code that we then have to diagnose and repair.
We have to ensure that teams analyse which libraries have been used and document them alongside the generated code so that we can check their validity. Static analysis of all generated code will also be needed to test for vulnerabilities and security flaws (including penetration testing of any features created by GenAI).
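As a small sketch of that dependency bookkeeping (assuming a Python code base; the helper name is invented), the standard library can at least tell us exactly which libraries and versions the generated code is running against, ready to be recorded and reviewed:

```python
from importlib.metadata import distributions

def installed_packages():
    """Return a sorted {name: version} map of every installed distribution."""
    return dict(sorted(
        (dist.metadata["Name"], dist.version) for dist in distributions()
    ))

if __name__ == "__main__":
    for name, version in installed_packages().items():
        print(f"{name}=={version}")
```

Dedicated tools such as pip-audit or npm audit can then check those recorded versions against known vulnerabilities.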

Untrained AI making unintuitive decisions
The AIs being used may make bad decisions about what code to create, what to test or how to test things. If the code is unintuitive, we might not be able to reverse engineer the assumptions the AI has made from the code. This leaves us with the risks of NOT BEING ABLE TO UNDERSTAND OR MAINTAIN OUR CODE AND TESTS as well as the risk of HAVING TESTS THAT ARE NOT USEFUL TO US or that MISS THE POINT OF WHAT IS NEEDED.
As well as static analysis and refactoring of generated code to make it more meaningful and look for bad decisions, we also need to start adding more verbose and meaningful comments to code. AI needs to be taught to document the code it creates, and the way we do that is by supplying it with better examples: every function and every test needs really meaningful comments. This will train AI to do the same and allow us to diagnose bugs better.
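As a hypothetical example of that commenting style (the function and test are invented for illustration), every function and test states why it exists and what assumption it encodes:

```python
def retry_delay_seconds(attempt, base=2, cap=60):
    """How long to wait before retry number `attempt`.

    Uses exponential backoff (base ** attempt) but never waits longer than
    `cap` seconds, because callers treat anything over a minute as a hang.
    """
    return min(base ** attempt, cap)

def test_backoff_is_capped_at_sixty_seconds():
    # Attempt 10 would be 1024 seconds uncapped; the cap exists so that a
    # flaky downstream service never stalls the pipeline for that long.
    assert retry_delay_seconds(10) == 60
```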
A bias for this skill when it comes to hiring?
There’s a risk that teams will start to over-index on hiring people who can write GenAI prompts over and above testing or coding skills. We could end up with teams of engineers who don’t know enough about software development to diagnose and rectify issues that come from low code development. This means a risk of TECH DEBT THAT WE CANNOT FIX or major UNDERLYING ARCHITECTURAL ISSUES that teams just don’t know how to deal with.
Testers will need to challenge assumptions and support teams in removing uncertainty so that GenAI prompts can be created. Testers will also play a big part in reviewing generated code for meaning, maintainability and underlying engineering flaws. This may also mean that senior engineers are more likely to act as testers, as they can support strategies for dealing with generated code, or it may mean having to prepare for (and advocate for) rebuilding everything from scratch.
We may also have to help educate the industry on the risks that come from over-indexing on this skill at the expense of engineering and testing craftsmanship. Organisations may want to focus on hiring cheaper engineers who can use low code solutions and not realise the risks inherent in that.
If you’re interested in Generative AI, I wrote some additional pieces on using Gen AI to write a test approach, write exploratory tests and teach me technical concepts.