
You’re probably wrong
In this article I make a suggestion that might sound absurd: when developing software, you should assume that you’re wrong. Once you make this assumption, it can influence quite a lot of what you do and how you do it, in ways that I think help us to deliver better software.
Why do I think you’re probably wrong?
It’s not because I think that I’m clever or you’re stupid, or because I like to pick fights with strangers on the Internet. It’s because you’re human, and humans make mistakes. (If you’re a computer reading this, then you’re often wrong too, but you often hide this wrongness behind a veneer of misplaced confidence in your correctness.)

The famous quote (probably not coined by Einstein) goes: “Insanity is doing the same thing over and over again and expecting different results.” Even knowing this, we think that this time we’re going to deliver bug-free software that ends up being useful to our users. All we need to do is try a bit harder, and then all will be well.
I really don’t like advice, or feedback from retrospectives, that is based on people just trying harder this time. Sometimes that is what’s needed, but often it’s wishful thinking that papers over the cracks. To actually address the problem, we need to do something different from what we did before.
I think that doing things differently starts with the assumption that, despite our skill, experience, motivation and best efforts, we’ve got something wrong. In the legal system in the UK and many other countries, someone is assumed to be innocent until they are proven guilty. I think it’s helpful to adopt a similar kind of assumption when developing software, except that the assumption is: we’re wrong until we find evidence that shows we’re not.
This requires quite a lot of humility, which can be hard work. If it helps, think of how scientists work. Based on the information available to them, they form a hypothesis. Note that at this point that’s all it is: a hypothesis, not a theory or law. They then have to think about what evidence would help test that hypothesis and suggest whether it’s right or wrong. That need for evidence drives the design of an experiment to gather it, which they must conduct with enough care that the results can be trusted. Only after they have designed and run the experiment and interpreted the evidence do they make claims about how true the hypothesis is. We should work like this as software developers.
What could be wrong?
The main things we should assume are:
- Our ideas are wrong;
- Our implementation of those ideas is wrong.
The main idea we care about is: task X is the best thing for us to work on next. This is usually because we think the ratio of X’s benefit to its cost is the best (a lot of benefit for little cost). Once we’ve chosen a task to work on next, we should assume that we’re implementing it badly, i.e. that there are bugs in the software we write to make the idea a reality.
You might think that the task-choosing question isn’t one you need to worry about, because your development organisation doesn’t worry about estimates of how long work will take. Even if you don’t worry too much about estimates, there is always opportunity cost: if you’re working on a task that ends up producing something that isn’t valuable to users, you can’t be working on a better task. The better task is delayed until you finish the worse one.
Cost-effective evidence
Once you have accepted the idea that you’re wrong until you have evidence that suggests otherwise, your mind might turn to what this evidence is and how you collect it. This is where a little nuance is important – we need to continually gather the most cost-effective evidence. That’s to say, the evidence that has the best balance of value and cost.
If there’s an idea that’s a candidate for what to work on next, is there a quick way it can be dismissed? This could be something as simple as thinking about it for an afternoon, talking to a colleague, or sketching out the flow a user would go through on a whiteboard and talking it through with a friendly customer. Once it has cleared these hurdles, it might be that making a clickable prototype in something like Figma gives feedback that the idea’s not worth pursuing. We might not get all the evidence we need in one go; instead, ideas have to run the gauntlet of a series of increasingly helpful but also increasingly costly tests.
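To make the gauntlet idea concrete, here’s a minimal sketch in Python. The `Stage` and `run_gauntlet` names, the stages themselves and the relative costs are all invented for illustration; the point is simply that the cheapest checks run first, and the expensive ones only run if the idea survives.

```python
# A minimal sketch of the "gauntlet": run the cheapest evidence-gathering
# steps first, and only pay for the expensive ones if the idea survives.
# Stage names and costs are made up for illustration.
from dataclasses import dataclass
from typing import Callable


@dataclass
class Stage:
    name: str
    relative_cost: int                    # rough cost, used only to order the stages
    gather_evidence: Callable[[], bool]   # True if the idea survives this stage


def run_gauntlet(stages: list[Stage]) -> bool:
    """Run stages from cheapest to most expensive; stop at the first failure."""
    for stage in sorted(stages, key=lambda s: s.relative_cost):
        if not stage.gather_evidence():
            print(f"Idea rejected at: {stage.name}")
            return False
        print(f"Idea survived: {stage.name}")
    return True


# Hypothetical stages for one candidate idea.
stages = [
    Stage("Sketch the user flow with a colleague", 1, lambda: True),
    Stage("Clickable prototype with a friendly customer", 5, lambda: True),
    Stage("Build and release the full feature", 50, lambda: True),
]

if __name__ == "__main__":
    run_gauntlet(stages)
```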
There are software development gurus who say the only feedback worth having comes from the software being in the hands of users. This certainly has the biggest value, but it also usually has the biggest cost. One cost is the direct cost of implementing the whole feature. Another cost, often overlooked, is the cost of the new software making users unhappy.
It could be that this feature passed all its automated tests, but:
- The fonts and colours in one part of the system don’t match those used in another part of the system;
- The steps a user must go through to accomplish a particular task are very different from the steps to accomplish a similar task, so the user can’t use the familiarity they’ve built up with one part of the system to help them use another.
You could get this feedback directly from the user, but it will come at the cost of unhappy users. A tester colleague doing exploratory testing first could get you the same feedback at lower cost – no new upset to your users. Assuming the tester doesn’t unearth anything bad, this can then be followed by feedback from users, which is more valuable but also more costly.
The order of implementation work
Another way to improve the cost-effectiveness of the feedback we get from implementation is to tackle the riskiest bit of the implementation first. It might be that when we tackle this we realise that the whole idea isn’t worth implementing, or at least not yet. This is valuable (but painful) feedback. If we leave the riskiest bit to last, then we will have already incurred the cost of implementing everything else, so the same feedback is costlier.
Basecamp, from 37signals, is project management software used for things including software development. It has a kind of chart called a hill chart, where the work to accomplish a task is split into an uphill part followed by a downhill part. During the uphill part there are still big decisions to make, things to learn and risks to deal with. At a certain point, all the work remaining is relatively straightforward (downhill, compared to uphill): there’s less to learn, less risk to deal with and fewer decisions to make.
If you do the risky parts of a task first, the uphill portion ends relatively quickly. If you put the risky parts off, you extend the uphill part unnecessarily.
Reducing the cost of evidence through automation
You should automate as much as possible of the process that takes code from a developer to the user (although bear in mind the point I made earlier about human, non-automated testing). This has two benefits. The first is that it reduces the cost of getting evidence from users: the one-off cost of creating the automation usually more than pays for itself over time, through less manual effort and a lower risk of the process being derailed by human error.
The second benefit is that it reduces the overhead of building and deploying software, so it’s economical to do often. This means that software can be deployed as soon as it’s ready, rather than having to wait for a release to fill up with all the other tasks scheduled to go out at the same time. This increases the value of the evidence, because changes tend to reach users’ hands one at a time (and often) rather than in a big batch along with other changes. If the user is unhappy with a big batch of changes, which of the changes is the cause? If there’s only one change, the feedback is tied much more tightly to its cause.
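As a very rough illustration of what “automate the path from a developer to the user” might look like at its smallest, here’s a sketch in Python that runs checks, builds and deploys in one go. The specific commands (`pytest`, `docker build`, a `./deploy.sh` script) are placeholders rather than a recommendation of a particular toolchain, and most teams would use a CI system rather than a hand-rolled script like this.

```python
# A deliberately tiny sketch of an automated commit-to-user path:
# run the checks, build an artifact and deploy in one command, so that
# releasing is cheap enough to do for every small change.
# The commands below are placeholders for illustration only.
import subprocess
import sys

STEPS = [
    ["pytest"],                                     # automated tests
    ["docker", "build", "-t", "app:latest", "."],   # build an artifact
    ["./deploy.sh", "app:latest"],                  # hypothetical deploy script
]

for step in STEPS:
    print("running:", " ".join(step))
    if subprocess.run(step).returncode != 0:
        sys.exit(f"stopped: {' '.join(step)} failed")

print("deployed")
```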
Gathering the evidence
It’s worth thinking up front about the evidence you hope to gather. In fact, if you use user stories as a guide to software development, it might be worth changing the format slightly to include the evidence you expect. For instance:
We hypothesise that it would help [type of user]
to accomplish [user’s goal]
if we changed the system to do [X].
If the hypothesis is correct, we expect to see [type(s) of evidence].
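As an illustration only (the field names and the example values below are my own invention, not a standard format), you could even capture such a story as structured data, so the expected evidence travels with the work item rather than being forgotten once the code ships:

```python
# A hypothetical way to record a hypothesis-driven story as data,
# keeping the expected evidence alongside the work itself.
from dataclasses import dataclass, field


@dataclass
class HypothesisStory:
    user_type: str                 # who we think we're helping
    goal: str                      # what they're trying to accomplish
    change: str                    # what we'll change in the system
    expected_evidence: list[str] = field(default_factory=list)


# Example values are invented for illustration.
story = HypothesisStory(
    user_type="support agent",
    goal="find a customer's previous orders quickly",
    change="add an order-history search box to the customer page",
    expected_evidence=[
        "median time-to-find-order drops noticeably",
        "fewer 'where is my order?' escalations per week",
    ],
)

print(story.expected_evidence)
```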
It might be that the evidence you hope to see is that some important quantity gets better: latency decreases, customer satisfaction increases, cost to serve decreases, and so on. The precise size of the change might matter less than whether it moves in the right direction by enough. It’s worth pausing here to check that you know the current value of this important thing before you change the code. What is the latency, customer satisfaction or cost to serve now? It might be that the first bit of work you do is to add extra monitoring or logging so that you can get a baseline for the important thing.
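As a minimal sketch of getting that baseline (the endpoint and sample size below are invented for illustration, and in practice you’d more likely lean on your existing monitoring), you could record the current latency before touching the code:

```python
# A minimal sketch of capturing a latency baseline before changing anything.
# The URL and sample size are placeholders for illustration.
import statistics
import time
import urllib.request

SAMPLES = 20
URL = "https://example.com/"  # replace with the endpoint you care about

latencies_ms = []
for _ in range(SAMPLES):
    start = time.perf_counter()
    urllib.request.urlopen(URL).read()
    latencies_ms.append((time.perf_counter() - start) * 1000)

print(f"baseline median latency: {statistics.median(latencies_ms):.1f} ms")
print(f"baseline worst-case latency: {max(latencies_ms):.1f} ms")
```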
The evidence might also be qualitative, and it’s worth thinking before you reject qualitative data as acceptable evidence. It might be that you think the latency of API X is valid evidence, and that might be true. However, is API X called only as part of web page Y? In that case, if API X is fast but other APIs called by the same page are slower, the loading time of web page Y will still be poor. So you might widen the range of evidence to include the loading time of web page Y. That might be the end of the story, but how fast does web page Y have to be? What if it loads quickly but is hard to use, or doesn’t do the things a user could reasonably expect it to? You’re back to important but qualitative data – how happy is the user?
Summing up
This article was inspired by a video by Jez Humble that I now can’t find, but that shares a lot with things such as a ThoughtWorks article on hypothesis-driven development.
We shouldn’t be surprised when we implement the wrong thing, or we implement the right thing in the wrong (buggy) way. This is because software is made by fallible humans. Instead of just hoping that magically the humans around us (including ourselves) are much less fallible than average, or less fallible than they were yesterday, we should assume that mistakes will happen and design how we approach work accordingly.