
Linky #15 - Beyond Root Causes and Simple Fixes
This week’s Linky is all about working in complexity and the value of human judgment. From principles for using LLMs, to why root cause analysis often fails, to safety in discomfort. These are all reminders that quality isn’t about certainty or control, but about navigating risk, coordination, and the unknowns together.
Latest post from the Quality Engineering Newsletter
This week, I was delivering a new talk on the importance of studying how quality is created, maintained and lost in complex systems. That got me digging around in the archive, and I came across an old post on nudging and boosting complex software systems.
Developing my understanding of complexity in software has really shaped how I think about quality. I used to believe that if you built it right and tested it well, you’d create high-quality software. But it never worked out that way. No matter how hard we tried, something would inevitably go wrong.
It was only once I began to account for the complexity of the systems we work with, and accepted that you can’t see and know everything, that my approach shifted. You have to expect things to go wrong and work with that reality rather than against it.
From the archive
Last Monday (September 1st) marked my 22nd anniversary of working in the software industry 😅. So I've dug out one of my first posts here on QEN about how I got started. It's actually a video of a talk I gave in 2023, so give it a watch and let me know what you think.
Principles for using LLMs
Three that stood out to me:
6. These systems can output long, convincing “scientific” documents full of fabricated metrics, invented methods, and impossible conditions without flagging uncertainty. They cannot be trusted for policy, healthcare, or serious research, because they are far too willing to blur fact and fiction.
7. These systems can and should be used only as a drafting assistant (structuring notes, summarising papers) with all outputs fact-checked by humans that are capable in the field. Think of these systems as a calculator that sometimes “hallucinates” numbers - it should never be trusted to do your tax return.
8. The persuasive but false outputs can cause real harm. These systems are highly persuasive and are designed to be this - hence coherence, the appearance of "helpfulness" and the use of authoritative language.
The space where I find LLMs most useful is where I can already do the work and easily fact-check their output. When I’m less sure, they tend to slow me down. Via Going through another AI horror story. These tools are great but please remember the following when you are using LLMs | Simon Wardley | LinkedIn
To watch: Simple made easy
"Simple Made Easy" - Rich Hickey (2011) - YouTube
This keynote was given at Strange Loop 2011, and is perhaps the best known and most highly regarded of Rich's many excellent talks, ushering in a new way to think about the problems of software design and the constant fight against complexity.
I’ve only watched a part so far, but I like how Rich explains the origin of “simple”:
Simple = sim-plex (one fold/braid)
Complex = com-plex (multiple folds/braids)
Looking forward to watching more. Thanks to Florian Sommerfeldt for surfacing this via a comment on Complicated, Complex, and Everything in Between.
I share Linky each week alongside my Quality Engineering Newsletter. If you’d like to join the 500+ others exploring quality, complexity and testing, you can subscribe below.
Knowledge work is bound by skill, risk and coordination
…anaesthesiologists' jobs look easy until they don't. And why it's foolish to imagine easily automating their roles away.
Johnson & Johnson's Sedasys machine from the late 1990s illustrates the challenge. Designed to automate propofol delivery, it never gained traction, even after limited FDA approval, largely because of concerns about managing airway emergencies without an anesthesiologist.
*Today, AI is democratizing medical knowledge like never before. But the crisis reminded me of something Sangeet Paul Choudary captures in his outstanding new book Reshuffle: knowledge work is bounded by three constraints—skill/knowledge, risk, and coordination.*
AI can make knowledge abundant—but in many clinical situations, it cannot yet reliably manage risk or coordinate action under pressure.
Risk and coordination are why I believe quality engineers will still be needed in an AI world. LLMs lack the real-world context of teams, systems, and environments. You can give them that context, but it slows you down.
Via Last week, I was halfway through a colonoscopy when the patient began coughing | Spencer Dorn | LinkedIn
Safety in discomfort
Psychological safety is about safety in discomfort.
Safety in dissent.
Safety in conflict.
Safety in being different.
Safety in being “wrong” whilst striving for “right”.
And especially safety in telling people (especially those with more power) things they don’t want to hear.
I was running a psychological safety workshop this week with leaders in engineering teams, and this framing really resonated. Safety isn’t comfort; it’s the ability to lean into the discomfort. Via Psychological safety isn't about comfort, it's about safety in discomfort, dissent, and growth. | Tom Geraghty | LinkedIn
Why root cause analysis (RCA) doesn’t help to fully understand incidents involving people
**So why has RCA prevailed over the years?**
• RCA is time and resource efficient; it is a match made in heaven for industries, organisations and people who crave quick fixes and simple solutions.
• RCA recognises that when things go wrong, humans cannot hold doubt for too long; they want a cause. Any cause is better than none.
• RCA offers an existential solution (i.e. do this and you will be saved) to problems that are complicated, wicked and unsolvable.
• RCA gives the organisation a legitimate basis to reveal just enough information without disrupting the status quo.
• RCA is a cleansing agent and a business-friendly tool. It will purge impurities and ensure business as usual post-accident.
**Where RCA becomes problematic:**
1. RCA gives disproportionate power to investigators and auditors.
2. RCA seeks rational explanations for non-rational decisions made in a moment of uncertainty.
3. RCA creates a delusion of certainty and control where there is none.
4. RCA claims to bring objective truth from subjective persons. (?)
5. RCA is silent on sensemaking and meaning making (foundational to understanding cultural problems in organisations).
That’s why blameless postmortems are more useful: they help us understand failures in context, rather than reducing them to a single root cause. Via Why Root Cause Analysis (RCA) is flawed | Nippin Anand | LinkedIn. Nippin is running a webinar on how to learn from accidents on the 11th, which I'm going to try to make.
Close
When I look across these links, a theme stands out: complexity resists simple fixes. Whether it’s AI tools, team safety, or incident analysis, we need approaches that embrace uncertainty, rather than ignoring it. That’s where quality engineering lives, in the messy, human work of making sense of systems together.