
Quality Coaching Scenario: This code won’t take tests
In this post I’ll take a hypothetical testing scenario and set out the thinking I’d apply to it to support the organisation as a quality coach. The aim is to help people see the thinking I’d undertake, as a way of shaping other people’s thinking and providing some materials for teaching.
The Scenario: A Legacy application that cannot take unit tests
A hypothetical organisation has a legacy application: a monolith with functions / feature files that have hundreds of lines of logic in them. The team hasn’t invested in automation and any testing that happens is manual, from a staging or production environment.
This has made it hard to gain confidence in deployments without resorting to lengthy test cycles which can be skipped, leading to defects found in live.
You have buy-in from the team (who are all software engineers), who have asked for your support and recommendations, and you have approval from leadership for any tooling you may need.
What I would do
In this scenario we should look at the reality of what’s in front of us, as traditional models of testing won’t work. We have a:
- Legacy application, meaning we might not have documentation or requirements to hand.
- Engineering team, so we can’t assume deep testing knowledge and expertise.
- Monolith with huge functions, meaning it may not be able to take close-to-the-code (unit) tests.
- Live product, so we want to start derisking deployments as much as possible.
What might help us the most here is probably the opposite of what one might think, based on conventional wisdom.
Flip the testing pyramid on its head
The testing pyramid is a model that guides engineers on how to structure their automated tests: a greater proportion of fast, cheap, and easily maintained unit tests at the base, with fewer, more expensive, and slower end-to-end tests at the top. It’s a really prevailing model (maybe the most well known testing model) that most testers and software engineers know.

But if we can’t immediately add unit tests because the code wouldn’t take them then we need to do something else. I’d recommend that we start outside-in and create end to end tests initially as this would be the cheapest to do, require no refactoring of the application and could start giving us a safety net of regression tests.
This will mean talking to the team to bring them on the journey of why we’re using end to end tests (seen as more expensive) and helping them to select a suitable framework.
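To make this concrete, here’s a minimal sketch of what one of these outside-in tests could look like, assuming the team picks a web UI framework like Playwright. The URL, labels and expected text are hypothetical placeholders for whatever behaviour we find.

```typescript
// e2e/login.spec.ts: a hypothetical end to end regression test (Playwright assumed)
import { test, expect } from '@playwright/test';

test('a registered user can log in and see their dashboard', async ({ page }) => {
  // Drive the application through the UI, exactly as a user would
  await page.goto('https://staging.example.com/login');
  await page.getByLabel('Email').fill('test.user@example.com');
  await page.getByLabel('Password').fill('a-known-test-password');
  await page.getByRole('button', { name: 'Log in' }).click();

  // Assert only on observable behaviour; no knowledge of the code is needed
  await expect(page).toHaveURL(/dashboard/);
  await expect(page.getByRole('heading', { name: 'Your dashboard' })).toBeVisible();
});
```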
Use characterisation tests
This is where we write tests that just confirm the existing behaviour of the product as written (sometimes called Golden Master Testing or Snapshot Testing). Rather than testing to confirm a requirement or acceptance criteria, we write tests that just check that existing behaviour continues unchanged. Usually used at the unit testing layer to test code logic, it can also work well at the end to end or business functional layer. It’s especially useful in situations where we might not have documented requirements because they’re lost or, in some cases, were only ever in people’s heads.
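As an illustration, a characterisation test at the end to end layer might capture a whole response and compare it against a stored golden copy. This sketch assumes Playwright and a made-up JSON endpoint; the first run records the snapshot and later runs fail if the behaviour drifts.

```typescript
// e2e/orders.characterisation.spec.ts: pin down current behaviour, whatever it is
import { test, expect } from '@playwright/test';

test('the orders summary endpoint behaves the way it does today', async ({ request }) => {
  // Hypothetical endpoint: we have no spec, so we record what it returns right now
  const response = await request.get('https://staging.example.com/api/orders/summary?customerId=42');
  expect(response.ok()).toBeTruthy();

  const body = await response.text();
  // The first run writes the golden master; subsequent runs compare against it,
  // so any unintended change in behaviour shows up as a failing test.
  expect(body).toMatchSnapshot('orders-summary.json');
});
```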

To support this testing we’re going to have to know what the application does. That means we’ll need a round of exploratory testing to uncover and document the existing behaviour set to inform our characterisation tests. This might be something we can teach the team to do, or might be something that we can pick up to help the team.
The output of this exploratory testing should be a scope of application behaviours to be automated and an understanding of the workflows for these behaviours. As a secondary activity, these can be prioritised (using team input) for what matters most to test; I would probably work with the team to help them drive out priorities.
Don’t test in a deployment pipeline
End to end tests are big and expensive, so we’re going to have to recommend not adding them to a pipeline. Instead these are probably best triggered manually to run in a staging or test environment after a deployment (or on a schedule).
This means longer feedback loops if it’s the only testing we do, so we can recommend complementing this with manual testing on dev machines for new features and critical areas of regression. We might also point the end to end tests (or a subset of them) at a dev machine for testing really complicated changes.
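One way to keep that flexible is to drive the target environment from configuration, so the same suite points at staging by default but can be aimed at a dev machine for a risky change. A minimal sketch, again assuming Playwright; the URLs and the BASE_URL variable name are made up.

```typescript
// playwright.config.ts: point the same suite at staging, or at a dev machine, via BASE_URL
import { defineConfig } from '@playwright/test';

export default defineConfig({
  use: {
    // e.g. BASE_URL=http://localhost:3000 npx playwright test  (run against a dev machine)
    baseURL: process.env.BASE_URL ?? 'https://staging.example.com',
  },
  // These run after a deployment or on a schedule rather than in a commit pipeline,
  // so a generous timeout and a retry are acceptable trade-offs here.
  retries: 1,
  timeout: 60_000,
});
```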
If teams are worried about not having testing to support a CI/CD pipeline then we can reassure them not to worry – this will only be a first phase. Having these tests in place will allow us to mature our automation testing through refactoring (see below).
Look to move on by refactoring
The long term goal should be to flip the flipped pyramid and create unit tested code that we can deploy in a pipeline as this will speed up delivery of code changes.

The way I would recommend doing that is to take code that has end to end tests and refactor it to be able to take unit tests. Basically a form of outside-in TDD, using the end to end tests as the system requirements for functionality, where we refactor code to make it testable and add what unit tests we can. As we test functionality close to the code we can then retire lots of the end to end tests that we have!
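As a hypothetical illustration of that step: imagine a huge request handler that mixes parsing, pricing logic and persistence. With the end to end tests acting as our safety net, we can extract the pricing logic into a pure function and unit test it directly. The names and rules below are invented for the example, and the unit test uses a Jest/Vitest-style API.

```typescript
// pricing.ts: logic extracted from a hypothetical 500-line handler so it can take unit tests
export function calculateOrderTotal(
  items: { price: number; quantity: number }[],
  discountCode?: string,
): number {
  const subtotal = items.reduce((sum, item) => sum + item.price * item.quantity, 0);
  // Invented rule for illustration: a fixed 10% discount for one known code
  const discount = discountCode === 'SAVE10' ? subtotal * 0.1 : 0;
  return Math.round((subtotal - discount) * 100) / 100;
}

// pricing.test.ts: fast, close-to-the-code tests we could not write before the extraction
import { describe, it, expect } from 'vitest';
import { calculateOrderTotal } from './pricing';

describe('calculateOrderTotal', () => {
  it('sums line items', () => {
    expect(calculateOrderTotal([{ price: 5, quantity: 2 }, { price: 1.5, quantity: 1 }])).toBe(11.5);
  });

  it('applies the SAVE10 discount', () => {
    expect(calculateOrderTotal([{ price: 100, quantity: 1 }], 'SAVE10')).toBe(90);
  });
});
```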
This refactoring might be slow, expensive and time consuming, so we have to be clear that it will take time to address safely. The team or company may decide that they don’t want to do this and instead continue using the end to end tests, something that will slow their ability to make safe changes at speed in the future.
Another recommendation I would make is to follow the boy scout rule, so that whenever we add or touch code as part of feature delivery we make sure to write / refactor that code as best we can to include unit tests. That way we can incrementally make things better without having to run separate big pieces of tech debt work; we can chip away as we go.
Have a maintenance plan
Tests need some love; you have to maintain your test suite to stop it being flakey or becoming out of date. I’d recommend that teams have a way of working that includes adding tests to new features as well as keeping an eye on existing tests for fixing / deprecating.
I’d recommend having a way to continually track test coverage (possibly manually, given it’s end to end tests) and I’d also try to see if the test tool can raise alerts for failing tests. Then I’d work with the team to implement a process around jumping in to fix tests when they break.
Why this approach?
This approach runs counter to what traditional testing models tell us to do but is grounded in pragmatism. When we’re faced with a reality where doing the “technically best thing” won’t work, we have to be able to work around that; if we stick rigidly to models and best practice then we’ll fail at coaching.
“But Callum… engineers should focus on the code testing.”
The whole team is software engineers, so we need them to pick up all levels of testing. We might recommend bringing in a dedicated tester to write end to end tests (or doing it ourselves), but is that cost effective? Also, if we do the testing ourselves, will the team of software engineers feel the pain enough to start refactoring things?
“End to end tests are more expensive, we shouldn’t run them”
They might not be ideal but they’ll work to get a basis of automated regression tests in place.
If we cannot add the perfect tests, then isn’t it better to have something rather than nothing? This way we’ll have started testing, giving the team more confidence in their releases, and we’ll stop having to do so much manual testing.
“Why not get them to start by refactoring the code and adding unit tests?”
Refactoring the code without a safety net of some tests would be risky; we’d have no warning if the logic of the system changed or regressed. Before changing the code, we’d want to have a way to ensure that the behaviour is tested so that we don’t break things for our customers.
Having end to end tests in place allows us to create that safety net before we dive in and change code.
“Why can’t the team define the scope of the tests? Why do we need to explore?”
Maybe they can, but in my experience if the legacy system is old then people will have forgotten behaviour or won’t know what they don’t know. In that case it makes sense to do some discovery to identify (and prioritise) a scope of everything the application can do.
Hopefully this is a useful coaching tool to people, please feel free to reach out to me with thoughts or ideas that you’ve had.
This scenario and recommendations in this blog are entirely hypothetical and are for coaching purposes only.