Top three talks from TestBash 2025

Published on October 8, 2025

I came back from TestBash Brighton 2025 with pages of notes and plenty to think about. Rather than trying to cover everything, I wanted to share my notes from the three talks that stuck with me the most: the ones that challenged how I think about testing, leadership, and quality.

Before you dive in, I’d love your input. I’ve been experimenting with different ways to share what I learn from conferences, from full notes to short summaries and even live Q&As.

What would you like to see more of?

Here are the three talks I’ve shared notes from:

If you want the full event write-up, you can read it here: TestBash Brighton 2025: Reflections on Two Days of Quality.


Tip: For the best experience viewing my notes posts, use a desktop browser. This allows you to easily navigate between chapter headings using the contents panel on the left side of the page.


What is a tester’s role in evaluating and observing AI systems?

By Carlos Kidman

Summary

  • Really interesting talk on how testers can systematically test AI systems. It walked us through different evaluation tools, showing how they help us understand where uncertainty lies in AI and which quality attributes we should be assessing. The tools themselves aren’t out of reach for testers: you don’t need to be an AI specialist to use them, just the willingness to get involved and get stuck in.

  • One powerful point that stood out for me was that teams often don’t do this kind of testing simply because they’re not used to thinking in that way. That’s exactly where the real value of testers and quality engineers comes in: bringing a quality mindset to engineering teams, helping them identify the attributes that matter, and guiding how to assess them in a systematic way.

  • Another brilliant session, well worth looking into if you’re trying to understand how to evaluate AI systems or are now being asked to.

Key takeaways

  • Testers can apply systematic evaluation techniques to AI systems without needing to be AI experts.

  • Existing testing skills (designing experiments, defining metrics, building test datasets) transfer directly to AI evaluation.

  • Benchmarking and evaluators (like annotations, custom code, or LLM-as-judge) make AI performance measurable (see the sketch after this list).

  • Quality engineers play a vital role in helping teams identify which quality attributes matter and how to assess them.

  • Tools such as LangSmith build observability into AI systems and make testing more transparent.

  • Evaluating AI is about managing uncertainty: making it visible, measurable, and actionable.

  • The real value of testers is in shifting how teams think about quality and helping them test AI more systematically.
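
To make the LLM-as-judge idea concrete, here is a minimal sketch of what such an evaluator can look like. It isn’t taken from the talk and isn’t tied to any particular tool: EvalCase, call_model, and call_judge are hypothetical placeholders for your own dataset format and model clients. The point is simply that a second model grades each answer against a stated expectation, and the verdicts roll up into a measurable pass rate.

```python
# Minimal sketch of an LLM-as-judge evaluation loop.
# call_model() and call_judge() are hypothetical stand-ins for whatever
# client your team uses; the stubs at the bottom let the sketch run as-is.

from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    prompt: str        # input sent to the system under test
    expectation: str   # plain-language description of a good answer


def evaluate_cases(
    cases: list[EvalCase],
    call_model: Callable[[str], str],   # the AI system under test
    call_judge: Callable[[str], str],   # a second model acting as judge
) -> float:
    """Run each case through the model, ask the judge to grade the answer,
    and return the fraction of cases the judge marked PASS."""
    passes = 0
    for case in cases:
        answer = call_model(case.prompt)
        verdict = call_judge(
            "You are grading an AI answer.\n"
            f"Question: {case.prompt}\n"
            f"Answer: {answer}\n"
            f"Expectation: {case.expectation}\n"
            "Reply with exactly PASS or FAIL."
        )
        if verdict.strip().upper().startswith("PASS"):
            passes += 1
    return passes / len(cases)


if __name__ == "__main__":
    # Stub model and judge so the sketch runs without any API keys.
    demo_cases = [EvalCase("What is 2 + 2?", "States that the answer is 4")]
    fake_model = lambda prompt: "The answer is 4."
    fake_judge = lambda prompt: "PASS"
    print(f"Pass rate: {evaluate_cases(demo_cases, fake_model, fake_judge):.0%}")
```

In practice you would swap the stubs for real model calls and let a tool like LangSmith capture the traces, but the testing skills involved (designing the cases, defining the pass criterion, aggregating the metric) are the same ones testers already have.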


My notes
