The AI Testing Report: PNSQC 2025 — Quality in Our Own Backyard

Published on October 12, 2025

I’m excited to be at PNSQC this year — and instead of just talking about AI and testing, I decided to bring data. My AI testing agents have been quietly scanning and analyzing the websites of this year’s PNSQC sponsors and speakers — surfacing real bugs, accessibility issues, and usability gaps.

This isn’t a “gotcha.”
It’s an invitation.

Even the best testing teams — including those at a testing conference — have defects that escape. And that’s exactly where AI testing agents shine: they don’t replace testers, but they catch what slips through the cracks, adding a layer of scalable, unbiased coverage that pairs beautifully with human intuition.

Why I Did This

I wanted to show, not tell, how AI can elevate our craft.

A few years ago, I could test maybe one app a week.
Now, autonomous AI agents can test dozens of sites in parallel, across thousands of checks — from WCAG accessibility to security headers, to visual and emotional impressions from simulated personas.
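
For a concrete, if simplified, sense of what one of those checks looks like, here is a minimal sketch that fetches a handful of sites in parallel and flags any that fail to load or are missing common security headers. The site list and header set are placeholders, not the actual agent configuration.

```python
# Minimal sketch: scan several sites in parallel for HTTP status and
# missing security headers. The sites and headers below are placeholders,
# not the real PNSQC targets or the agents' actual configuration.
from concurrent.futures import ThreadPoolExecutor

import requests

SITES = ["https://example.com", "https://example.org"]  # hypothetical targets
SECURITY_HEADERS = ["Content-Security-Policy", "Strict-Transport-Security"]


def scan(url: str) -> dict:
    """Fetch one site and record its status plus any missing security headers."""
    resp = requests.get(url, timeout=10)
    missing = [h for h in SECURITY_HEADERS if h not in resp.headers]
    return {"url": url, "status": resp.status_code, "missing": missing}


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=8) as pool:
        for result in pool.map(scan, SITES):
            print(result)
```

A thread pool is enough here because the work is network-bound; a real scan would also need to follow links, render pages, and run accessibility and persona-based checks on top.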

The result?
A conference-wide benchmark of our community’s digital quality — a snapshot of how we, the testers, are doing when it comes to our own front doors.

The Results

Here’s the Quality Quadrant generated by testing the PNSQC ecosystem — all the sponsors and speakers’ websites plotted by Quantitative Quality (technical issues) vs. Qualitative Quality (user impressions).

🟩 = Above average in both
🟧 = Gaps or weaker areas

What’s Measured

Quantitative Quality (Y-axis):
The number and severity of bugs found — broken links, 404s, missing accessibility labels, security gaps, and performance bottlenecks.

Qualitative Quality (X-axis):
Feedback from simulated user personas judging usability, clarity, emotional tone, and brand trust.
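
To make the quadrant placement concrete, here is a minimal sketch of the classification rule implied by the legend above: a site lands in the green quadrant only if it is above the group average on both axes. The scores are made-up placeholders, not actual PNSQC results.

```python
# Sketch of the quadrant classification: above the group average on both
# axes is "green", anything else is "orange". Scores are made-up
# placeholders, not real results.
from statistics import mean

scores = {
    "site-a": (82, 74),  # (quantitative, qualitative), hypothetical values
    "site-b": (65, 90),
    "site-c": (58, 52),
}

avg_quant = mean(q for q, _ in scores.values())
avg_qual = mean(u for _, u in scores.values())

for site, (quant, qual) in scores.items():
    label = "green (above average in both)" if quant > avg_quant and qual > avg_qual else "orange (gaps)"
    print(f"{site}: quantitative={quant}, qualitative={qual} -> {label}")
```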

What the Agents Found

🟩 High-High (Leaders)

Harmonic Northwest, Plaid, EverDriven, TTC Global
These stood out for combining strong technical hygiene with good UX polish.

  • Harmonic Northwest — Clean WordPress implementation and strong design, but could improve clarity on pricing and case studies.
  • Plaid — Visually sleek and modern, but flagged for missing form labels (WCAG violation).
  • EverDriven & TTC Global — Balanced, consistent experiences with stable technical foundations.

⚙️ Technically Sound, Less Engaging

Microsoft, McAfee, Sun Life Financial
These sites passed most functional and compliance checks but felt heavy and corporate.

  • Microsoft — Multiple 404 resource errors; testers described the experience as “overwhelming.”
  • McAfee — DNS resolution issues and accessibility gaps despite solid content and visuals.

❤️ User-Friendly but Buggy

Zions Bank and LogiGear
Testers loved the layout and tone but found fixable issues that impact reliability.

  • Zions Bank — Great aesthetic, but a broken internal URL still made it to production.
  • LogiGear — Easy to navigate but hurt by caching and configuration inefficiencies.

🚧 Needs Attention

SmartSense Consulting, Meta Superintelligence Lab, Deloitte
These teams can benefit most from AI-assisted testing.

  • SmartSense Consulting — Polished design, but missing accessibility metadata.
  • Meta Superintelligence Lab — Unstable page performance and rendering errors.
  • Deloitte — Consent banner inconsistencies, missing resources, and weak privacy signals.

Quality Highlights from the Field

  • Nike — Gorgeous visuals, but no Content Security Policy (CSP) — a real XSS risk for a global brand.
  • Microsoft — Several 404s and confusing navigation flow (see the link-check sketch after this list).
  • McAfee — Persistent resource loading failures (net::ERR_NAME_NOT_RESOLVED).
  • Zions Bank — Invalid URLs generated dynamically.
  • Plaid — Accessibility gaps in form labeling.
  • Harmonic Northwest — Excellent WordPress work, but lacking transparent pricing or case studies.
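
Findings like the 404s and dynamically generated invalid URLs above fall out of a simple link-checking pass. Here is a minimal sketch of that kind of check; the starting URL is a placeholder, and this is a simplification rather than the agents’ actual crawler.

```python
# Minimal sketch of a broken-link check: fetch one page, collect its
# anchor targets, and report any that return 4xx/5xx. The starting URL
# is a placeholder, not any sponsor's site.
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

START_URL = "https://example.com"  # hypothetical page to audit

page = requests.get(START_URL, timeout=10)
soup = BeautifulSoup(page.text, "html.parser")

for anchor in soup.find_all("a", href=True):
    target = urljoin(START_URL, anchor["href"])
    if not target.startswith("http"):
        continue  # skip mailto:, tel:, javascript:, etc.
    status = requests.get(target, timeout=10).status_code
    if status >= 400:
        print(f"BROKEN ({status}): {target}")
```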

Why It Matters

At a testing conference, quality should be part of our collective reflection — not judgment, but data-driven humility.

Quality Dimension — What It Means

  • Reliability — Does it actually work?
  • Accessibility — Can everyone use it?
  • Usability — Is it clear and intuitive?
  • Trust — Are privacy and consent handled responsibly?
  • Emotion — Does it make users feel confident and inspired?

These metrics remind us: testing isn’t just about finding defects — it’s about measuring trust and experience at scale.

What I’m Doing at PNSQC

I’ll be sharing these AI findings with the teams who can fix them — engineers, designers, accessibility advocates, and test leaders — right here at the conference.
My goal isn’t to point fingers, but to spark conversations about practical AI–tester collaboration.

If you’d like a quick look at your own site’s quality report — like the ones above — stop by and I’ll show you in real time how AI agents evaluate your site, or you can try it at testers.ai.

If you’d rather have expert testers help run these AI-powered checks for you, check out IcebergQA — same AI, human-reviewed results.

Here’s to learning together, fixing real bugs, and improving our own digital quality — right here at PNSQC.

Jason Arbon, CEO @ testers.ai | Principal @ IcebergQA