
AI meets the Browser: What’s next for software testing?

Some time ago I found a video called ‘1000 AI NPCs simulate a CIVILIZATION in Minecraft’ that caught my attention.
This experiment highlights the potential of AI in creating and managing complex virtual societies, offering insights that could be applied to various fields, including urban planning, economic modelling, and social science research.
This got me thinking about an idea: simulating an environment for an application where AI agents can test it. I found some tools that allow you to create a local environment for cloud applications (e.g. Colima, Kind and MetalLB). However, I was specifically interested in simulating a world where AI tests a web application.
An idea popped into my mind: what if I integrated Playwright with OpenAI? Imagine AI that not only opens the page but also analyzes the DOM in real time, corrects its own errors (for example, when an element is not found), and, depending on the results, chooses alternative test steps.
The first thing that came to mind was that OpenAI needed a way to control the browser at runtime, and I remembered that Node has a REPL. To avoid wasting a lot of time, I did a quick analysis with ChatGPT and got a working example. Here is a simple piece of it, where the AI was able to open the needed page:
const { chromium } = require('playwright');
const OpenAI = require('openai');
const readline = require('readline');

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
const rl = readline.createInterface({ input: process.stdin, output: process.stdout });

(async () => {
  const browser = await chromium.launch({ headless: false });
  const page = await browser.newPage();
  global.page = page;

  rl.on('line', async (input) => {
    if (!input.startsWith('ai:')) return;
    const userQuery = input.replace('ai:', '').trim();
    try {
      const response = await openai.chat.completions.create({
        model: 'gpt-4',
        messages: [
          {
            role: 'user',
            content: `
Generate a valid Playwright command for the following user request:
"${userQuery}"
The command must assume that the 'page' object is already defined and initialized in the Playwright context.
Only return the JavaScript command itself, without additional explanations or formatting like code blocks.
Ensure the command can be executed directly inside an async function.`,
          },
        ],
      });
      // Execute the generated command against the live page.
      const command = response.choices[0].message.content.trim();
      await eval(`(async () => { ${command} })()`);
    } catch (err) {
      console.error('Command failed:', err.message);
    }
  });
})();
I thought, “now I am a damn AI engineer.”
I suspected that anything I could come up with had already been implemented somewhere, and sure enough, while browsing https://github.com/dashboard I came across the repository https://github.com/browser-use/browser-use.
I will leave a short video showing how it works; it explains things better than I can. It takes just a simple prompt:
“Go to the site https://www.saucedemo.com/ and validate that any product can be added to the shopping cart. Credentials are on the login page”
Demo of the prompt: https://youtu.be/H3qw4lXEw7c
The fundamental concept behind browser-use is giving an LLM such as GPT-4 the ability to control a real browser (e.g., Chromium) in real time with Playwright's help. Effectively, this creates a loop where the LLM can “see” and “respond to” what’s happening inside the browser, allowing it to iterate on tasks just like a human would.
It brought up a lot of additional questions:
- How many requests were made to the OpenAI API?
- How much does running one test cost, and is it cost-effective yet?
- How long does it take?
- What are the limitations?
- How do you deal with non-determinism?
- etc.
But that will be the subject of the next iteration of experiments.
Why could this be useful for testing?
This opens up the possibility of creating autonomous testing agents that not only “open pages” or “click buttons” but also analyze the context of the page, formulate hypotheses about potential errors and independently suggest new scenarios for verification.
What’s especially interesting is that such AI agents can not only click on elements but also run security-related scenarios (for example, attempted SQL injection or XSS), providing a more intelligent approach to penetration testing.
Final Vision
This is not yet ready to replace traditional test automation approaches.
Ultimately, we are progressing towards testing that gradually becomes a continuous mental process, where AI not only performs the manual tasks but also assists in decision-making. And this is no longer the distant future: it’s happening right now.
Alternatives
Additionally, there is a new project by Google DeepMind called Project Mariner: https://deepmind.google/technologies/project-mariner/
Watch the presentation: https://www.youtube.com/watch?v=2XJqLPqHtyo
Anthropic
Claude ‘Computer use’: https://www.youtube.com/watch?v=ODaHJzOyVCQ
Other
WebVoyager: https://github.com/MinorJerry/WebVoyager (paper: https://arxiv.org/abs/2401.13919)
Ferret-UI for mobile (paper: https://arxiv.org/abs/2404.05719)