
Questions to answer
(This is a cross-posting from my general blog, Steer for the Deep Waters Only.)
In the Post Office Horizon public enquiry, today was the first day on which a Fujitsu representative took the stand. Gareth Jenkins will answer questions for four days, more than any other witness. Today’s session concentrated on his role as an expert witness in the prosecutions of sub-postmasters (SPMs), and so the questioning focused mainly on judicial and quasi-judicial matters.
My take on this from the moment I first heard about the issues with Horizon has been from the point of view of a software tester, which profession I followed for more than 25 years. My perspective has been “What tests would I want to have done if I had been working on Horizon?”. And I’m also driven to ask how important Fujitsu and the Post Office considered testing to be. It’s possible to define the importance of testing according to a scale of impacts:
- Is there a risk of death or injury associated with the system?
- Is there a risk of injustices arising from faults in the system?
- Is there a risk of financial loss if there are faults in the system?
- Is there a risk of reputational damage if the system throws up errors?
Clearly, at least three of these apply in the case of Horizon, though I doubt that the worst-case scenario of people being sent to prison would have been on anyone’s whiteboard when Horizon was still in the planning stage. It ought to have been, of course, but unless people had had direct contact with instances of offences within the postal system, I doubt it was even mentioned when Horizon was being scoped out at the beginning of the development process. Nonetheless, the question should have been asked: “If this goes wrong, what’s the worst that could happen?”
The questions I would be asking now are these:
- What testing was done?
- Was there extensive testing done in real-life situations with real-life users?
- Or was there merely a test system in a test environment on someone’s desktop?
- There is a school of thought that when a system is to be rolled out across many physical points of contact with users and over a wide geographical area, it ought to be trialled physically and not just in simulation: an entire trial system running in a number of locations, exposed to real-world communication problems and to scenarios probing the failure states that can be anticipated (loss of physical lines, loss of power either locally or centrally, inability to access helpdesk resources). A sketch of this kind of scenario-based trial appears after this list. Gareth Jenkins has mentioned “pilot schemes”; their extent and suitability should be explored.
- How much regression testing was done?
- Regression testing is performed whenever a new version or a bug fix is rolled out. It checks whether the change has broken anything in the existing system that previously worked (a minimal regression-testing sketch also appears after this list).
- It should not only be done during development, but also after deployment.
- What version control was there?
- In development; and
- In deployment: in other words, were bug fixes and new versions tested against existing versions once Horizon had gone live?
- How were upgrades and bug fixes rolled out?
- Importantly, were SPMs told about the necessity to keep their versions of Horizon up to date? Was the significance of upgrades impressed upon them?
- Were upgrades rolled out during expected Post Office downtime hours? Were they automatic, or did they require SPMs to trigger them? Could SPMs defer upgrades to less busy times?
- Did upgrades apply to specific versions of the software? Were SPMs warned if an upgrade was about to be applied to an out-of-date version? Would the system permit that, or would it force multiple (and time-consuming) upgrades if it detected an out-of-date system?
- Did Post Office investigators check which version of Horizon was being run before continuing with criminal investigations? (A sketch of such a version check appears after this list.)
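To make the pilot-scheme point concrete, here is a minimal sketch in Python of what scenario-based trial testing might look like. The simulator object and its method names are entirely my invention, standing in for whatever harness a real trial would use; the point is only the shape of the exercise: run a normal trading day, inject each anticipated failure, recover, and check that the branch account still reconciles.

```python
# A sketch only: "simulator" stands for a hypothetical branch-terminal
# simulator used in a physical trial; its method names are illustrative
# and are not drawn from Horizon or Fujitsu documentation.

FAILURE_SCENARIOS = [
    "line_drop_mid_transaction",  # communications line lost part-way through a sale
    "local_power_loss",           # the branch terminal loses power
    "central_power_loss",         # the data centre loses power
    "helpdesk_unreachable",       # the SPM cannot reach support during recovery
]

def run_failure_trials(simulator):
    """Run a typical trading day, inject each anticipated fault, attempt
    recovery, and record any scenario that leaves the branch out of balance."""
    problem_scenarios = []
    for scenario in FAILURE_SCENARIOS:
        simulator.reset()
        simulator.run_typical_trading_day()
        simulator.inject_fault(scenario)
        simulator.recover()
        # The critical property: no phantom shortfall in the branch account.
        if simulator.daily_balance() != simulator.expected_balance():
            problem_scenarios.append(scenario)
    return problem_scenarios
```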
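Regression testing, in turn, can be as simple as replaying a fixed library of recorded sessions against every new build and comparing the results with known-good baselines. The sketch below assumes a hypothetical replay harness: `process_session` stands for whatever entry point the build under test exposes, and the JSON session files are illustrative, not anything Fujitsu actually held.

```python
import json
from pathlib import Path

def run_regression_suite(process_session, sessions_dir="recorded_sessions"):
    """Replay each recorded session through `process_session` (the function
    exposed by the build under test) and report every session whose computed
    balance no longer matches its known-good baseline."""
    failures = []
    for path in sorted(Path(sessions_dir).glob("*.json")):
        session = json.loads(path.read_text())
        result = process_session(session["transactions"])
        if result != session["expected_balance"]:
            failures.append((path.name, session["expected_balance"], result))
    return failures

# Usage: run_regression_suite(new_build.process_session)
# An empty list means the new version reproduces every baseline balance;
# anything else is a regression to investigate before rollout.
```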
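And on the last question, about investigators: this is the kind of gate that could have been applied before anyone relied on branch figures. The version numbers and the set of known-defective releases below are invented for illustration and say nothing about Horizon’s actual release history.

```python
# Invented version numbers, purely for illustration.
KNOWN_DEFECTIVE_VERSIONS = {"2.3.1", "2.4.0"}  # releases with acknowledged bugs
CURRENT_VERSION = "2.5.2"

def safe_to_rely_on_branch_records(installed_version: str) -> bool:
    """Return True only if the branch was running the current, non-defective
    release at the time of the discrepancy; otherwise the figures need an
    engineering review before any investigation proceeds."""
    if installed_version in KNOWN_DEFECTIVE_VERSIONS:
        return False
    return installed_version == CURRENT_VERSION

# A shortfall reported by a branch still on 2.3.1 should trigger a software
# review, not a prosecution file.
print(safe_to_rely_on_branch_records("2.3.1"))  # False
print(safe_to_rely_on_branch_records("2.5.2"))  # True
```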
All these are important questions which ought to have been asked in the system scoping and design sessions before a line of code was written. This is something professional testers do. But too many companies consider that testing is a mechanical process that just involves pressing buttons and checking that the system delivers the right responses, that “anyone” can do it and, indeed, that it can be automated. I’ve written elsewhere about the fallacy of these views, and if there is one takeaway here for anyone involved in system design and development, it ought to be that these views are mistaken.
I’ve also written recently about what I’ve dubbed the “Goldfinger Heuristic”*, summed up in a line from Ian Fleming’s novel of the same name: “Once is happenstance, twice is coincidence, but three times must be enemy action.” In this case, “enemy action” is simply something happening in the real world. If three people independently report a problem, then that problem is in all likelihood real, and IT managers should at the very least consider that there may be an underlying problem with the system, even if it’s not immediately obvious how to replicate it. I once spent three months trying to pin down a problem that I wasn’t even certain I’d seen, but which was registering with me, almost subconsciously, as “wrong”. In the end, it was a real problem caused by the interaction of screen controls, mouse movements and the design of different workflow stages. An automated test system would never have found it.
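As a triage rule, the heuristic is almost trivially simple to express. The sketch below, with invented report data, flags any symptom independently reported by three or more distinct users as a probable real defect, even before anyone can reproduce it.

```python
from collections import defaultdict

REPORT_THRESHOLD = 3  # "three times must be enemy action"

def triage(reports):
    """Group incident reports by symptom and flag any symptom reported
    independently by three or more distinct users."""
    reporters_by_symptom = defaultdict(set)
    for report in reports:
        reporters_by_symptom[report["symptom"]].add(report["user"])
    return [symptom for symptom, users in reporters_by_symptom.items()
            if len(users) >= REPORT_THRESHOLD]

# Invented example data.
reports = [
    {"user": "SPM-014", "symptom": "unexplained shortfall at balancing"},
    {"user": "SPM-221", "symptom": "unexplained shortfall at balancing"},
    {"user": "SPM-305", "symptom": "unexplained shortfall at balancing"},
    {"user": "SPM-014", "symptom": "receipt printer jams"},
]
print(triage(reports))  # ['unexplained shortfall at balancing']
```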
Testing can never address all the possible failure points of a system before it goes live; sometimes problems may not emerge for years, only arising with a particular combination of circumstances. Testing and monitoring must therefore continue after the system is deployed, and those responsible for systems management must always be prepared to ask the questions a tester would ask before jumping to conclusions about what has happened. This does not seem to have been done with Horizon. I hope that the public enquiry will drill down to root causes in its remaining sessions.
*In this context, a “heuristic” is a rule of thumb about the real world and real user behaviours that identifies a range of situations testers should look out for, or at least be aware of, when testing systems.