
AI meets the Browser: What’s next for software testing?

Some time ago I found a video called ‘1000 AI NPCs simulate a CIVILIZATION in Minecraft’ that caught my attention.
This experiment highlights the potential of AI in creating and managing complex virtual societies, offering insights that could be applied to various fields, including urban planning, economic modelling, and social science research.
This got me thinking about an idea: simulating an environment for an application where AI agents can test it. I found some tools that allow you to create a local environment for cloud applications (e.g. Colima, Kind and MetalLB). However, I was specifically interested in simulating a world where AI tests a web application.
An idea popped into my mind: what if I integrated Playwright with OpenAI? Imagine AI that not only opens the page but also analyzes the DOM in real time, corrects its own errors (for example, when an element is not found), and, depending on the results, chooses alternative test steps.
The first thing that came to mind was that OpenAI needed a way to control the browser at runtime, and I remembered that Node has a REPL. To avoid wasting a lot of time, I did a quick analysis with ChatGPT and got a working example. Here is a simple piece of it, where the AI was able to open the needed page:
const { chromium } = require('playwright');
const OpenAI = require('openai');
const readline = require('readline');

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
const rl = readline.createInterface({ input: process.stdin, output: process.stdout });

(async () => {
  const browser = await chromium.launch({ headless: false });
  const page = await browser.newPage();
  global.page = page;

  rl.on('line', async (input) => {
    if (!input.startsWith('ai:')) return;
    const userQuery = input.replace('ai:', '').trim();
    try {
      const response = await openai.chat.completions.create({
        model: 'gpt-4',
        messages: [
          {
            role: 'user',
            content: `
Generate a valid Playwright command for the following user request:
"${userQuery}"
The command must assume that the 'page' object is already defined and initialized in the Playwright context.
Only return the JavaScript command itself, without additional explanations or formatting like code blocks.
Ensure the command can be executed directly inside an async function.`,
          },
        ],
      });
      // Execute the generated command against the live page.
      const command = response.choices[0].message.content.trim();
      await eval(`(async () => { ${command} })()`);
    } catch (err) {
      console.error('Command failed:', err.message);
    }
  });
})();
I thought, “now I am a damn AI engineer.”
I suspected that anything I could come up with had already been implemented somewhere, and sure enough, while browsing https://github.com/dashboard I came across the repository https://github.com/browser-use/browser-use.
I will leave a short video showing how it works; it explains things better than I can. It takes just a simple prompt:
“Go to the site https://www.saucedemo.com/ and validate that any product can be added to the shopping cart. Credentials are on the login page”
Demo of the prompt: https://youtu.be/H3qw4lXEw7c
The fundamental concept behind browser-use is giving an LLM such as GPT-4 the ability to control a real browser (e.g., Chromium) in real time with Playwright's help. Effectively, this creates a loop where the LLM can “see” and “respond to” what’s happening inside the browser, allowing it to iterate on tasks just like a human would.
It brought up a lot of additional questions:
- How many requests were made to the OpenAI API?
- How much does running one test cost, and is it cost-effective yet?
- How long does it take?
- What are the limitations?
- How do you deal with non-determinism?
- etc.
But that will be the subject of the next iteration of experiments.
Why could this be useful for testing?
This opens up the possibility of creating autonomous testing agents that not only “open pages” or “click buttons” but also analyze the context of the page, formulate hypotheses about potential errors and independently suggest new scenarios for verification.
What’s especially interesting is that such AI agents can not only click on elements but also run security-related scenarios (for example, attempted SQL injection or XSS), providing a more intelligent approach to penetration testing.
Final Vision
This is not yet ready to replace traditional test automation approaches.
Ultimately, we are progressing towards testing that gradually becomes a continuous mental process, where AI not only performs the manual tasks but also assists in decision-making. And this is no longer the distant future: it’s happening right now.
Alternatives
Additionally, there is a new project by Google DeepMind called Project Mariner: https://deepmind.google/technologies/project-mariner/
Watch the presentation: https://www.youtube.com/watch?v=2XJqLPqHtyo
Anthropic
Claude ‘Computer use’: https://www.youtube.com/watch?v=ODaHJzOyVCQ
Other
WebVoyager: https://github.com/MinorJerry/WebVoyager (paper: https://arxiv.org/abs/2401.13919)
Ferret-UI for mobile (paper: https://arxiv.org/abs/2404.05719)