A Short Digression on Test Data

Published on May 17, 2024

Lately I’ve been thinking about test automation and test data. I’ve come to a rather philosophical opinion.

Test data is always the problem, and test data is never the problem.

Test data is always the problem: read posts on generating test data for unit tests, or on wrangling test data across environments, and you'll find plenty of conflicting advice. Some say never to use random test data; others love it. Most people agree that “realistic” test data is helpful, but for security or practical reasons, real user data often can’t be imported directly and used for internal testing. I think part of the reason experiences conflict so much is that test data is unique to each team working with it. The data needed to test an online banking application is nothing like the data needed to test a mobile video game, and neither resembles the data needed to test enterprise accounting software. Add the fact that automated and manual testing have different needs, and the confusion only grows.

Every team’s test data problem is unique. Even similar domains or applications can differ in subtle ways. Eventually, each team has to work things out for itself.

Of course, there’s more to testing than data management, which brings me to the second half of the idea: test data is never the problem.

One of the biggest challenges I’ve had as a test automation specialist is getting people to understand what test automation is and why it’s valuable. Even when I manage that and get support, writing tests and setting a test strategy can still be tricky. Getting developers (and their product managers!) to test, and to test well, is often a bigger problem than whether we have good test data. In the words of the wonderful Gerald Weinberg, it’s always a people problem.

All problems with testing are test data problems, until they aren’t.