30 DAYS OF AI IN TESTING – DAYS 11-15

Published on March 15, 2024

I'm currently doing the 30 Days of AI in Testing challenge offered by the excellent Ministry of Testing.

https://www.ministryoftesting.com/events/30-days-of-ai-in-testing

This blog post is a record of Days 11 to 15, rather than having them just on the Club. It's quite a lot of writing and I like to have my writing in one place where possible.

Day 11

Quick one today.

A little while ago I asked ChatGPT to generate a bash script that creates contacts.vcf files, to help with some device testing:

#!/bin/bash

# Function to generate a random 10-digit number
generate_phone_number() {
    echo $((7000000000 + RANDOM % 1000000000))
}

# Function to select a random item from a list
select_random_item() {
    local list=("$@")
    local num_items=${#list[@]}
    local random_index=$((RANDOM % num_items))
    echo "${list[random_index]}"
}

# List of first names
first_names=("Emma" "Liam" "Olivia" "Noah" "Ava" "William" "Sophia" "James" "Isabella" "Oliver" "Charlotte" "Benjamin" "Amelia" "Elijah" "Mia" "Lucas" "Harper" "Mason" "Evelyn" "Logan" "Abigail" "Alexander" "Emily" "Ethan" "Elizabeth" "Michael" "Avery" "Daniel" "Sofia")

# List of last names
last_names=("Smith" "Johnson" "Williams" "Jones" "Brown" "Davis" "Miller" "Wilson" "Moore" "Taylor" "Anderson" "Thomas" "Jackson" "White" "Harris" "Martin" "Thompson" "Garcia" "Martinez" "Robinson" "Clark" "Rodriguez" "Lewis" "Lee" "Walker" "Hall" "Allen" "Young" "Hernandez")

# Function to generate the contacts
generate_contacts() {
    local num_contacts=$1
    for ((i=1; i<=$num_contacts; i++)); do
        echo "BEGIN:VCARD"
        echo "VERSION:3.0"
        first_name=$(select_random_item "${first_names[@]}")
        last_name=$(select_random_item "${last_names[@]}")
        echo "FN:$first_name $last_name"
        echo "N:$last_name;$first_name;;;"
        echo "EMAIL;TYPE=INTERNET;TYPE=HOME:$[email protected]"
        echo "TEL;TYPE=CELL:$(generate_phone_number)"
        echo "END:VCARD"
    done
}

# Main script
num_contacts=$1
if [[ ! $num_contacts =~ ^[0-9]+$ ]]; then
    echo "Usage: $0 "
    exit 1
fi

generate_contacts $num_contacts > contacts.vcf
echo "Generated $num_contacts contacts in contacts.vcf"

There were quite a few revisions before I arrived at the right prompt. I wish I had done some research into prompt engineering first. Although, in fairness to me, I did add an example of the VCF file format. 🙂

In terms of evaluation…

How easy was it to generate the data?

As long as you know how to make a bash script executable (chmod +x) and pass it an argument, it is very easy. There is some assumed knowledge, but I liked that the number of contacts was configurable and that it produced a file that could be shared to a device for testing.

How flexible is the data generation?

With only 29 first names and 29 last names (841 combinations), you could only request low numbers of contacts if you wanted mostly unique names. It was for a small job though, so I was willing to sacrifice flexibility for expediency.

Did the generated data meet your needs? Was it realistic?

At first we went with just random strings, which works but makes for poor-quality test data; you can miss issues like sorting. I suppose that's partly a limitation of bash scripts; I could have done it with Faker or something similar, as sketched below.
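
As a rough illustration of that alternative, here is a minimal sketch using Python's Faker library to produce the same kind of vCard entries. The locale, contact count and output file name are my assumptions for the example, not anything from the original script.

# Sketch: generating contacts.vcf with Faker instead of hard-coded name lists.
# Assumes `pip install faker`; locale, count and file name are illustrative only.
from faker import Faker

fake = Faker("en_GB")
num_contacts = 50

with open("contacts.vcf", "w") as f:
    for _ in range(num_contacts):
        first, last = fake.first_name(), fake.last_name()
        f.write("BEGIN:VCARD\n")
        f.write("VERSION:3.0\n")
        f.write(f"FN:{first} {last}\n")
        f.write(f"N:{last};{first};;;\n")
        f.write(f"EMAIL;TYPE=INTERNET;TYPE=HOME:{fake.email()}\n")
        f.write(f"TEL;TYPE=CELL:{fake.phone_number()}\n")
        f.write("END:VCARD\n")

Faker's locale support also gives names, numbers and addresses that behave more like real user data, which is exactly the sorting-style issue that random strings hide.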

Day 12

Research AI Risks: Find and read an introductory article on AI Risks and problems

I will go with this: ‘The 15 Biggest Risks Of Artificial Intelligence’ – Forbes, Bernard Marr

  • Lack of transparency – it is hard to see what a model has been trained on and how it made its decisions, which leads to a lack of trust.
  • Bias and discrimination – models can perpetuate or amplify societal biases.
  • Privacy concerns – can collect and analyse large amounts of data of a personal nature.
  • Ethical dilemmas – AI systems need moral and ethical values to guide them in decision making.
  • Security – hackers or malicious actors could develop more advanced cyber attacks.
  • Concentration of power – AI development dominated by a small number of corporations could exacerbate inequality and limit diversity.
  • Dependence – overreliance on AI may lead to a loss of creativity and critical thinking skills.
  • Job displacement – potential job losses across industries, although I’m not sure about the ‘low paid’ statement here.
  • Economic inequality – consequence of the concentration of power with a growing income gap and even less social mobility.
  • Legal and regulatory challenges – laws and regulations don’t keep pace with changing technologies.
  • AI Arms Race – rapid technological development without considering the consequences.
  • Loss of human connection – humans will suffer from diminished social skills.
  • Misinformation and manipulation – AI generated content such as deepfakes influencing public opinion and elections.
  • Unintended consequences – as a result of the lack of transparency and naively trusting AI decision making.
  • Existential risks – AI (especially as it gets closer to AGI) may not be aligned with human values or priorities.

Consider the role of AI in Testing: for your testing context, think about the ways that AI could be used

  • Identify which AI Risks might impact the quality of testing in your context
    • Lack of transparency – our oracles become less transparent, only accessible via the right prompt and only in part.
    • Bias and discrimination – development teams have biases, and using an AI system to aid testing might reinforce them, missing accessibility needs for example.
    • Dependence – if it's easier to ask a model what to test, why would you go through the hassle yourself?
    • Loss of human connection – teams are hard to form and testing depends a lot on communication.
    • Unintended consequences – this can happen anyway but combine the above and even more surprises might occur!
  • Examine how one or more of these AI Risks might impact your testing
    • Dependence could lead to not thinking beyond the acceptance criteria: edge cases, what might go wrong, how users might subvert the functionality.
  • Think about how you might safeguard against these risks becoming issues in your context?
    • I think I would put together a guide for the model I was using to make sure it had context, plus some example test ideas generated independently of the model to steer it.

Day 13

The As-Is: Consider your team’s current testing practices, how work flows from feature to delivery, and the role of testing in that flow.

Test Design

Of the areas listed in the challenge, I think test design is our biggest challenge. I try to help developers do exploratory testing, but often the tests are limited to the acceptance criteria, rather than searching for hidden requirements and looking at how changes in one place can affect another.

Where does AI add value?

My ideal would be:

  • Create an internal model from an open-source one, perhaps from Hugging Face.
  • Train it on a few things:
    • Our internal model for how the app, backend and web systems work (we have diagrams, specs etc)
    • Jira tickets and associated comments
    • Test results from our CI
    • Commit history in GitHub, for hotspots of change.
  • Then we can ask it questions about what to cover, testing-wise.
  • I would also add some conditions on top to filter the model's answer, so it fits how and what we want to test and how much depth to go into, with a few previous examples to guide it (see the sketch after this list).
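
To make that last point a little more concrete, here is a minimal sketch of how the context, conditions and previous examples might be assembled into a prompt before it goes to whichever model we end up with. Everything in it (constraints, examples, function names) is hypothetical, not something we have built.

# Sketch only: assembling a test-design prompt from internal context,
# constraints and worked examples. All values here are hypothetical.
CONSTRAINTS = [
    "Only suggest coverage for the mobile app and its backend APIs.",
    "Prioritise areas with recent CI failures or commit hotspots.",
    "Limit the answer to 10 test ideas, one line each.",
]

EXAMPLES = [
    "Feature: password reset -> expiry of reset links, reuse of old links, rate limiting.",
    "Feature: contact import -> duplicates, malformed vCards, very large files.",
]

def build_prompt(feature_summary: str, internal_context: str) -> str:
    """Combine internal context, constraints and examples into one prompt."""
    return (
        "You are helping design test coverage for our systems.\n\n"
        f"Context:\n{internal_context}\n\n"
        "Constraints:\n" + "\n".join(f"- {c}" for c in CONSTRAINTS) + "\n\n"
        "Previous examples showing the depth we want:\n"
        + "\n".join(f"- {e}" for e in EXAMPLES) + "\n\n"
        f"Feature to cover:\n{feature_summary}\n"
    )

In practice the internal_context would come from whatever we pull out of the diagrams, Jira tickets, CI results and commit history listed above.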

Become an AI in Testing champion: Imagine you need to convince your peers, manager or company to invest in AI in Testing. Based on your ideas from the previous tasks, create a visual or short report that outlines your approach.

I would try and convince people with:

  • Time saved generating tests and designing them.
  • Better able to find more important problems first.
  • The whole team can test to a consistent set of scenarios.

And it's a very cool thing to do, of course. 🙂

Day 14

PostBot

I just used a simple get-address endpoint that I had already been working with, which had a couple of existing tests: response codes and an array of results.

Add more tests

pm.test("Response Content-Type is application/json", function () {
    pm.expect(pm.response.headers.get("Content-Type")).to.include("application/json");
});


pm.test("Id in the results is a non-empty string", function () {
  const responseData = pm.response.json();
  pm.expect(responseData.results).to.be.an('array');

  responseData.results.forEach(function(result) {
    pm.expect(result.id).to.be.a('string').and.to.have.lengthOf.at.least(1, "Id should not be empty");
  });
});


pm.test("Point object contains valid latitude and longitude values", function () {
  const responseData = pm.response.json();

  pm.expect(responseData.results[0].point).to.be.an('object');
  pm.expect(responseData.results[0].point.latitude).to.be.a('number');
  pm.expect(responseData.results[0].point.longitude).to.be.a('number');
});


pm.test("LocationMeta object contains expected fields", function () {
    const responseData = pm.response.json();

    pm.expect(responseData.results[0].locationMeta).to.be.an('object');
    pm.expect(responseData.results[0].locationMeta).to.include.all.keys('w3w', 'address', 'street_formatted_address');
});

I like that PostBot built on the existing tests, although it repeated checking if an array of results was returned. When I ran ‘Fix tests’ it removed this repetition.

It generated contract-style tests rather than checks on specific values, which are more repeatable but perhaps less specific, depending on what I was testing for. I could have been more specific in the prompt, but I wanted to see what PostBot returned.

Save a field from response

var placeName = pm.response.json().results[0].placeName;
pm.globals.set("placeName", placeName);

I'm guessing it picked the first field after the id and disregarded the id itself. That seems reasonable, but it wasn't transparent that this is what had occurred.

Fix tests

The first time I ran Fix tests, it tidied up the formatting and removed the repetition. It left me wondering whether 'Add more tests' and 'Fix tests' have different prompts behind them. It also added more keys to the LocationMeta test; I'm not sure why it didn't add them in the first place.

I then changed this test in a couple of ways:

pm.test("Response Content-Type is application/json", function () {
    pm.expect(pm.response.headers.get("Content-Type")).to.include("application/octet-stream");
});

pm.test("Response Content-Type is application/octet-stream", function () {
    pm.expect(pm.response.headers.get("Content-Type")).to.include("application/json");
});

How did it know which was right? The response could have been wrong, or the test name, or the assertion. It corrected each instance of the test to use application/json. Again, there is a lack of transparency as to which oracle was used as the source of truth; I suspect the response.

Generate Documentation

This endpoint makes an HTTP GET request to retrieve location information based on the provided query parameters. The response will contain details about the location, including its ID, name, formatted address, latitude, longitude, and additional location metadata.
The response will have a status code of 200, and the content type will be in JSON format.
The response body will include an array of "results", where each result object will contain the location's ID, place name, formatted address, latitude, longitude, and location metadata, including what3words address and structured address details.

A reasonable description; I can't see anything obvious that was missed.

PostBot didn't show what it was doing or why, so you could end up with a bunch of tests that you don't fully understand. However, with better prompts, rather than just using the buttons, I think it could provide better outcomes. I would need more time to investigate.

Day 15

Currently, I use AI in my testing in three ways:

  • Generating utility scripts for bespoke test data, usually throwaway.
  • Putting together tool recommendations, comparing options, surfacing ways of doing things I don’t know.
  • Coming up with further test design ideas, based on a list I have generated.

However, in all of these, I have to sanitise prompts somewhat, so as not to risk exposing sensitive information.

I have been looking at GPTs in ChatGPT: https://openai.com/blog/introducing-gpts

You can (apparently) opt out of your chat data being used for overall ChatGPT training. An alternative is deploying a model locally and training it on selected internal data.

There seem to be a few ways to do this, so I will start to experiment with them:

https://www.infoworld.com/article/3705035/5-easy-ways-to-run-an-llm-locally.html
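
As a first experiment (my own guess at one route, not necessarily one of the article's five), a small open model from Hugging Face can be run locally with the transformers pipeline. The model name below is just a placeholder; a larger instruction-tuned model would give more useful answers.

# Sketch: running a small open model locally with Hugging Face transformers.
# Assumes `pip install transformers torch`; the model name is a placeholder.
from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2")

prompt = "Suggest three exploratory test ideas for a contact import feature:"
result = generator(prompt, max_new_tokens=100, do_sample=True)
print(result[0]["generated_text"])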

I personally need more depth in this area, beyond naively using tools and maybe (or maybe not) giving them the right information. However, the prompt engineering part of 30 Days of AI in Testing has been really, really useful. I can already see the benefits in the output from the models I do use.