You’re probably not working at Facebook

Published on May 11, 2024

This article is about deciding when to implement architectural changes and large functional changes to a software system.  There is no general right or wrong answer, because it depends on the context of your users/customers and your organisation.  Even though they are the best known, Big Tech companies such as Facebook are a small minority, so you probably don’t work there.  Their context is probably different from yours, so the options that are good for them might not be as good for you.

Working on big things

I’ve seen people online describing how they’ve built a new system using things like Kubernetes, microservices etc.  They do this even though the system is small and simple enough that a monolith running on simple infrastructure would be fine.  (By small and simple I mean things like the number of features, the number of users, the required response times etc.)

When someone else points out that this complexity might be overkill, or at least premature, the response is often along the lines of: it’s hard to add support for scalability afterwards.

I agree that it will take time and effort to break up a monolith, move to Kubernetes etc.  However, from personal experience I know that it can be hard to add support for the following afterwards too:

  • Thread safety
  • Internationalisation and localisation
  • Fine-grained authorisation (splitting your users into different kinds of user, where the different kinds can access different parts of your system)

All of these involve assumptions that the code used to make safely, but that are no longer true.  To fix things you need to track down each place in the code where the broken assumptions show up, remove the bad code, and insert some new code, probably using some new central machinery that you need to write or at least configure.  Tracking down every bit of bad code can be tedious and painful, and testing the new version of things can also be tricky.
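As a rough illustration, here is a minimal sketch of what retrofitting internationalisation can look like.  Everything in it is invented for the example (the function names, the MESSAGES catalogue, the translate helper), and it ignores real concerns such as plural forms.  The point is only that a hard-coded assumption ("everyone reads English") gets replaced by central machinery, and every call site then has to be found and changed to use it.

```python
# Hypothetical example: all names here are invented for illustration.

# Before: the code safely assumes that every user reads English.
def order_confirmation_before(name: str, item_count: int) -> str:
    return f"Thanks {name}, your order of {item_count} items is on its way."


# After: new central machinery, a message catalogue keyed by locale.
MESSAGES = {
    "en": {
        "order_confirmation": "Thanks {name}, your order of {count} items is on its way.",
    },
    "de": {
        "order_confirmation": "Danke {name}, Ihre Bestellung mit {count} Artikeln ist unterwegs.",
    },
}
DEFAULT_LOCALE = "en"


def translate(key: str, locale: str, **params: object) -> str:
    """Look up a message template, falling back to the default locale."""
    catalogue = MESSAGES.get(locale, MESSAGES[DEFAULT_LOCALE])
    template = catalogue.get(key, MESSAGES[DEFAULT_LOCALE][key])
    return template.format(**params)


def order_confirmation_after(name: str, item_count: int, locale: str) -> str:
    # Every call site now has to know the user's locale and pass it through.
    return translate("order_confirmation", locale, name=name, count=item_count)
```

The new function is small, but the locale parameter has to come from somewhere, so the change ripples out to every caller.  That ripple is the tedious part.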

There are other things that I don’t have experience of, but I know are also hard to add support for afterwards:

The systems I was helping to develop didn’t support thread safety, internationalisation or fine-grained authorisation from the beginning, but were changed several years into their life so that they supported them.  That wasn’t because the managers or architects at the time were incompetent.  On the contrary: the decisions about when to release and when to work on what features were based on considering the balance of costs and benefits.

Costs and benefits

Everything we do as programmers comes with a package of costs and benefits.  There is the cost of implementing the code, which includes the opportunity cost of delaying working on something else.  There is now more code that could contain a bug, and more code that new staff need to learn when they join the team.

Sometimes a new bit of code effectively introduces a new dimension to the system, so that a future seemingly unrelated change will be costlier because of the interaction with this extra dimension.  For instance, you introduce new code that sends alerts to users if certain conditions are met.  Each time you add a new feature after this you must consider if alerts apply to it and, if so, do the work to connect them to the alerts.
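The alerts example could be sketched as below.  This is a hypothetical design rather than a recommendation: Feature, register and alerts_due are names made up for the illustration.  It shows the extra dimension as a question that every later feature has to answer before it can be added.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical sketch: Feature, register and alerts_due are invented names.


@dataclass
class Feature:
    name: str
    # Once alerts exist, every feature has to answer this question explicitly.
    alerts_apply: Optional[bool] = None
    alert_condition: Optional[Callable[[dict], bool]] = None


REGISTRY: dict[str, Feature] = {}


def register(feature: Feature) -> None:
    """Refuse features that have not considered the alerts dimension."""
    if feature.alerts_apply is None:
        raise ValueError(f"{feature.name}: decide whether alerts apply")
    if feature.alerts_apply and feature.alert_condition is None:
        raise ValueError(f"{feature.name}: alerts apply but no condition given")
    REGISTRY[feature.name] = feature


def alerts_due(state: dict) -> list[str]:
    """Return the names of features whose alert condition is met."""
    return [
        f.name
        for f in REGISTRY.values()
        if f.alerts_apply and f.alert_condition(state)
    ]
```

A feature that has nothing to do with alerts, such as a colour theme, still has to be registered with alerts_apply=False.  The cost of the alerts code is paid again, in a small way, with each new feature.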

These are costs that apply to most code.  I’ll now take microservices as a specific example and consider their benefits and the costs that are specific to them.

Microservices have benefits, including different parts of a development organisation being more able to release their code independently of the code produced by the rest of the organisation.  They can be used to scale up the system, and to let different parts of the system scale at different rates.  They can be used to enforce boundaries in the system’s design, and so can be a good fit for domain-driven design.

However, microservices also have costs.  They increase the minimum amount of complexity that you need to negotiate when you look across large parts of the system, making it harder to reason about.  There are more moving parts than in a monolith, leading to more complex deployment and configuration (there are tools that help with these, but the underlying complexity is still there).  Tracking down the chain of cause and effect, for instance when trying to debug, can be hard.

Data also becomes more complex.  It becomes much harder, if not impossible, to have proper consistency rather than eventual consistency between what different parts of the system think is the state of the world.  The proliferation of data stores means that a point-in-time backup and restore is harder.  When microservices take a local copy of data sourced from a different microservice, poor data quality caused by bugs in the source microservice can be harder to root out.
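To make the local-copy problem concrete, here is a toy sketch with two in-memory classes standing in for microservices.  The class names, the outbox list and the deliver function are all assumptions made for the example; a real system would have a message broker and separate data stores in their place.

```python
# Hypothetical sketch: two in-memory "services" standing in for microservices.


class CustomerService:
    """Source of truth for customer emails."""

    def __init__(self) -> None:
        self.emails: dict[str, str] = {}
        # Change events that have not been delivered to other services yet.
        self.outbox: list[tuple[str, str]] = []

    def set_email(self, customer_id: str, email: str) -> None:
        self.emails[customer_id] = email
        self.outbox.append((customer_id, email))


class OrderService:
    """Keeps its own local copy of emails so it can work without calling out."""

    def __init__(self) -> None:
        self.email_copy: dict[str, str] = {}

    def apply_events(self, events: list[tuple[str, str]]) -> None:
        for customer_id, email in events:
            self.email_copy[customer_id] = email


def deliver(source: CustomerService, target: OrderService) -> None:
    """Stand-in for a message broker: delivery happens some time later."""
    target.apply_events(source.outbox)
    source.outbox.clear()


customers = CustomerService()
orders = OrderService()
customers.set_email("c1", "old@example.com")
deliver(customers, orders)

customers.set_email("c1", "new@example.com")
stale = orders.email_copy["c1"]  # still the old address until delivery
deliver(customers, orders)
fresh = orders.email_copy["c1"]  # now matches the source of truth
```

Between the update and the delivery, the two services disagree about the customer’s email address.  If the source had written a bad value because of a bug, the copy would keep that bad value until a corrected event arrived.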

Microservices can be worth it, but only if their benefits to you are greater than their costs to you.  Does your organisation, which is probably smaller than a Big Tech company, currently struggle with teams being held up by other teams, or struggle with needing to scale different parts of the system at different rates?  Then maybe microservices are a good thing to work on.

As I mentioned above, this isn’t intended to be an attack on microservices.  It’s trying to show that there are things on both the cost and benefit sides of the ledger with any technology.

Motivations

I think that the key issue isn’t a specific feature, such as microservices.  More important is what you work on, when and why.  As with writing technical documents, it’s worth examining the real motivation behind wanting to work on something next:

  • You think it would be good for your CV (CV-driven development)?
  • You think the cool kids (Big Tech etc.) work with it?
  • You think that it’s the thing that will best help you keep your customers happy or to get new ones?

Flight corridor

I’m not saying any particular big thing (microservices, internationalisation etc.) is good or bad.  However, for a particular product, market and development organisation there are better and worse times to introduce each of them.

If you introduce them too early, you are spending time on something that you don’t need yet, and so aren’t spending time on some better thing you could be building.  (There is an opportunity cost to the big thing.)  If you introduce them too late, then you will struggle to keep your users or colleagues happy.  For instance, users will get worse response times or availability because your servers are running too close to maximum capacity, or users will struggle to use the system or feel unwanted because it’s not presented in their language.

It’s similar to a plane coming in to land at an airport.  There’s an imaginary corridor through the sky connecting the plane where it is now down to where it will be once it has landed safely on the runway.  If the plane loses altitude too slowly or too quickly it will miss the runway.  In the same way, you can take on a large infrastructure cost too early or too late when building software.

[Image: long-exposure photo of a plane arriving at or leaving from an airport, leaving a trail of light through the sky]

Summing up

Because you and your colleagues are human, you will make mistakes.  This includes mistakes in architectural and design decisions, and not just coding mistakes that show up as bugs.  The problem is that architectural mistakes are often costlier to fix than coding bugs.  An architectural mistake can be to introduce a good thing at the wrong time, which makes it a bad thing for you.

Be honest about your motivations, and realise that there are costs as well as benefits to every option.