Selenium WebDriver BiDi: Kismet Child of WebDriver Classic and Chrome DevTools Protocol

Published on November 29, 2024

WebDriver BiDi Overview for QA Engineers Who Interact with Web Browsers, Test Web Apps and Plan for the Future

Testing web applications can be challenging due to their need to operate across multiple platforms and devices. These applications must remain robust regardless of the form factor or browser used. Quality engineers employ various technologies to automate web browsers, with a particular emphasis on facilitating automated testing of web applications within those browsers. This article provides an overview of browser automation APIs that are important for both testers and developers. We trace the evolution of Selenium WebDriver from its seedling stage to its current state.

Several years ago, the Selenium maintainers introduced an enticing new ‘portal’ into the bidirectional functionality of WebSockets. QA engineers and developers can now use the WebDriver BiDi API to make their test suites more robust. In the automation world, several nicknames have been coined for it: “Codified DevTools”, “Black Ops QA”, “Future of Cross Browser Automation”, and “BiDi-Powered Selenium”. For brevity, we’ll refer to WebDriver BiDi as “WD BiDi” and WebDriver Classic as “WD Classic”.

Why Are Browser Automation APIs Important?

What’s Missing From the Web: https://mdn.dev/archives/insights/reports/mdn-web-developer-needs-assessment-2020.html
  • Modern websites are often complex applications — the modern web is used to deploy complex application software, and shipping reliable software requires testing. Modern development processes focus on extensive automated test suites, usually as part of a Continuous Integration and Deployment pipeline.
  • For the web to be a compelling platform it must be easy to create reliable web apps — In order to enable application authors to write software for the web, it must be easy for them to write automated tests for those applications.
  • This requires testing across multiple browser engines — Being an open platform with multiple independent implementations, the web presents some additional complexities compared to other competing platforms. Despite W3C’s best efforts at standardization and the good intentions of the browser engineers and extensive test suites, we do have differences between engines.
  • MDN Developer Needs Assessment 2020 rated “Testing across browsers” as a top pain point — Developers want to test that their web app works across multiple engines, and this isn’t always easy. Indeed, the assessment found cross-browser testing to be among the top five pain points identified by web developers. This is one of the problems that the W3C set out to address.

Evolution of WebDriver Classic

To understand the WD BiDi concept, it helps to first trace the evolution of WD Classic. Selenium WebDriver was created back in 2005. It’s been a 20-year-long journey. The open project has grown into a vast ecosystem, consisting of a number of drivers, bindings, plugins, and frameworks created and maintained by third parties. WD has been used as the backbone for many test automation tools (Selenium, WebDriverIO, etc.).

WD Classic — Automation Tools

https://www.selenium.dev/ecosystem/#frameworks

Selenium WD

  • Open-source suite of tools (ecosystem) for automating web apps.
  • Used for testing and simulating user interactions in the browsers.

WebDriverIO

  • A test automation tool for web apps.
  • Offers a simple syntax and built-in commands.
  • Supports multiple browsers and devices for efficient and effective testing.

NightWatchJS

  • An automated testing tool based on Node.js.
  • Offers a simple syntax and built-in WD support.
  • Supports E2E testing and browser automation.

Appium

  • Open-source ecosystem for mobile automation.
  • Allows testing of native, hybrid and mobile web apps on iOS and Android.
  • Uses WD protocol.

Katalon, SeleniumBase, etc.

  • Other tools use the Selenium WD under the hood.
  • E.g., Katalon uses the Selenium jar file in the back-end, so it still communicates with the browsers via the WebDriver protocol.

WD Classic — Timeline

2004 — Selenium RC

  • Jason Huggins introduces the first automation tool in 2004. RC worked by installing a remote control server on the browser machines. The server sent each command to a JS engine called Selenium Core, which resided in the browser and executed that command. RC used to be quite popular, but its complex architecture became a disadvantage. Moreover, its API was not purely object-oriented.

2005 — WebDriver

  • Simon Stewart introduces the WebDriver protocol in 2005. The driver communicates with browsers via the WD API over the JSON Wire Protocol.

2009 — Selenium WebDriver

  • Selenium and WebDriver merge into a new tool called Selenium WebDriver. JSON wire protocol was still used to communicate with the browsers.

2018 — WD becomes a W3C standard

  • W3C makes WD a standard specification. Now, Selenium WD communicates with the browsers via the W3C protocol, not JSON wire. Selenium v4 and other tools that use WD communicate via W3C protocol. The protocol was introduced to provide consistency and stability for test cases when using the WD.

WD Classic — W3C Recommendation

W3C Specification for automation — for a while W3C focused on the WD Classic specification. WD provides an HTTP-based protocol for simulating user interaction with a website, e.g., clicking on elements and filling in forms. It also can do things that users can’t do, like executing a script. It was originally based on the Selenium testing framework.

Focus on end-to-end testing that simulates user interactions — The WD model simulates user interaction to provide end-to-end testing, the kind of thing that we could ask a real human to do.

Protocol is simple command/response:

  • Transport layer is JSON-over-HTTP — And the HTTP-based protocol means that WD is basically command and response, which is actually really good in many ways.
  • Easy to write synchronous client code — It’s a very simple linear control flow. It means that writing tests is very easy in a wide variety of different programming languages.
https://www.selenium.dev/documentation/webdriver/
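
To make the command/response model concrete, here is a minimal sketch of how a client might describe WD Classic commands as plain HTTP requests. The endpoint paths follow the W3C WebDriver specification. The class and method names (ClassicCommandSketch, clickFlow, and so on) and the ids passed in are hypothetical, and nothing is actually sent over the network.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: models WD Classic commands as plain HTTP request descriptions.
// Nothing is sent over the network; session and element ids are placeholders.
class ClassicCommandSketch {

    // One WD Classic command corresponds to one HTTP request.
    static final class HttpCommand {
        final String method;
        final String path;
        final String jsonBody;

        HttpCommand(String method, String path, String jsonBody) {
            this.method = method;
            this.path = path;
            this.jsonBody = jsonBody;
        }
    }

    // "Navigate To": POST /session/{session id}/url
    // Note: values are concatenated without JSON escaping, which is fine for a sketch only.
    static HttpCommand navigateTo(String sessionId, String url) {
        return new HttpCommand("POST", "/session/" + sessionId + "/url",
                "{\"url\":\"" + url + "\"}");
    }

    // "Find Element": POST /session/{session id}/element
    static HttpCommand findElement(String sessionId, String cssSelector) {
        return new HttpCommand("POST", "/session/" + sessionId + "/element",
                "{\"using\":\"css selector\",\"value\":\"" + cssSelector + "\"}");
    }

    // "Element Click": POST /session/{session id}/element/{element id}/click
    static HttpCommand click(String sessionId, String elementId) {
        return new HttpCommand("POST",
                "/session/" + sessionId + "/element/" + elementId + "/click", "{}");
    }

    // A linear script: a real client would send each request and block until its
    // response arrives. In practice the element id comes from the Find Element response.
    static List<HttpCommand> clickFlow(String sessionId, String url,
                                       String cssSelector, String elementId) {
        List<HttpCommand> steps = new ArrayList<>();
        steps.add(navigateTo(sessionId, url));
        steps.add(findElement(sessionId, cssSelector));
        steps.add(click(sessionId, elementId));
        return steps;
    }
}
```

Because every step is a separate request that must complete before the next one starts, the control flow stays linear and easy to port to any language with an HTTP client.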

W3C Recommendation since 2018, widely implemented — The WD spec is complete. It became a W3C Recommendation in 2018 and has been widely implemented. While it might still receive updates, it isn’t likely to undergo substantial revision at this point.

WD Classic — Limitations

The fact that in 2020 we were still seeing problems with developers writing cross-browser tests suggested that there was more to be done. A more fundamental change was needed, rather than just an incremental update to WD. To understand what’s required going forward, we need to look at the limitations of the WD protocol.

Modern web apps have a lot of internal state — modern web applications often don’t follow the historic model of interconnected pages with limited internal state. Instead, they have rich interaction inside the page itself, often with a lot of behavior driven by scripting. Those scripts might invoke additional I/O in the form of network requests. And this introduces possible sources of non-determinism/flakiness.

  • Ongoing network requests, JS execution, storage, etc.

Mismatch between concurrency inside a browser and synchronous automation makes writing stable tests hard — In the face of all this complexity, it’s quite hard to write a reliable test in the command/response model of WD Classic. Indeed, flaky tests are one of the most common things that users complain about. So, for a testing tool to meet the needs of modern applications, we really want it to have more of a browser’s-eye view of what’s going on, instead of just concentrating on simulating user interaction.

Test authors want access to web app’s internal data — We want to be able to observe internal state changes that are happening in the browser, just like we can, for example, in DevTools.

  • Console logs, network traffic, etc.

These requirements don’t map well onto the WD design — And the command/response model of HTTP-based WD Classic just isn’t a good fit for these requirements.

Impact of Browser-Specific Automation Tools on Cross-Browser Testing and Web Standards

The aforementioned problems were already having an effect in the real world.

Modern automation tools have started to use browser-specific protocols to provide low-level control — Modern testing tools started to use non-WD-based protocols to enable low-level access and control. These other protocols aren’t based on open standards, but they are often things like the browser dev tools protocols or other custom protocols that are invented by the individual automation tools.

This means users have to choose between cross-browser testing and advanced functionality — Even these tools that developers are using aren’t cross-browser. Or they have to do a lot of additional implementation work for each browser they want to support. And that’s a problem for the open web. When automation tools are tied to specific browsers, developers have to make a decision on whether to opt for advanced features or to have cross-browser support. This increases the opportunity cost of making a site work across multiple browser engines. To address this, it is crucial for the health of the web that the browser automation ecosystem is based on standards. As a result of this, W3C’s browser testing and tools working group started focusing on WebDriver BiDi.

Evolution of Chrome DevTools Protocol (CDP)

It’s time to pivot to the role CDP played in the formation of the open BiDi protocol. WebSocket is the key term to understand here.

WebSocket APIs enable web applications to maintain bidirectional communications with server-side processes.

WebSockets have been in use for quite some time, offering a full-duplex communication channel that allows two-way interaction. This means that: 1) the client can send messages to the server and receive responses, and 2) the server can proactively send messages to the client, functioning like a broadcast. For instance, a client might ask the server, “Can I subscribe to updates on this topic? Let me know when something relevant occurs.” The server then broadcasts updates to the client as events happen. This capability highlights the elegance and efficiency of WebSockets.
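
The subscribe-then-broadcast pattern described above can be sketched without any networking at all. The following is an in-memory stand-in, not a real WebSocket client: the class name PushChannelSketch and the topic names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// In-memory stand-in for a full-duplex channel; no real WebSocket is opened.
class PushChannelSketch {

    private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();

    // Client -> server: "let me know when something relevant occurs on this topic".
    void subscribe(String topic, Consumer<String> handler) {
        subscribers.computeIfAbsent(topic, t -> new ArrayList<>()).add(handler);
    }

    // Server -> client: push an update to everyone subscribed to the topic,
    // without waiting for the client to ask. Returns how many handlers were notified.
    int publish(String topic, String message) {
        List<Consumer<String>> handlers = subscribers.getOrDefault(topic, new ArrayList<>());
        for (Consumer<String> handler : handlers) {
            handler.accept(message);
        }
        return handlers.size();
    }
}
```

The point of the sketch is the direction of the second call: publish is initiated by the "server" side, which is exactly what a plain request/response channel cannot express.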

DevTools Protocol operates by leveraging WebSockets.

DevTools Overview

  • Essential for web developers and testers to instrument and debug the application.
  • Useful for debugging the UI part of a web app.
  • Integrated into modern browsers like Chrome, Edge, Safari, and Firefox.

Allow us to:

  • Check the look and feel of our application.
  • Verify that correct styles are applied.
  • View the HTML and DOM structure at a high level.
  • Drill down into specific details as needed.
Open Chrome DevTools: Command + Option + J (Mac) or Control + Shift + J (Windows)

CDP Definition

Need for Protocol:

  • Automation Support
  • Extended Debuggability
  • Consistency

You might wonder why a protocol is necessary since the browser inherently understands all sorts of events. The need for a protocol arises from the necessity of a contract between two communicating parties. Each party must understand the common semantics, the expected responses, and the response formats. This ensures that the receiver can interpret the content they receive and respond appropriately, creating a foundation for any global protocol.

The Chrome DevTools Protocol (CDP) exists to facilitate communication between DevTools and the browser. For example, in a Chrome browser, all events — such as DOM loading, network requests, and performance-related information — are known to the browser. However, this information needs to be communicated to the DevTools interface for verification. CDP provides the common language for this communication, ensuring consistency across all Chromium-based browsers like Edge and Chrome. This is why these browsers have a DevTools window available.

Additionally, CDP enables communication with browser drivers, such as the Chrome Driver, making it easier to use for automation purposes. The Chrome Driver, a binary used for testing, understands how to communicate with and control the browser. Thus, the protocol not only standardizes communication but also simplifies browser automation.

CDP Structure

  • Domains
  • Commands
  • Events

CDP is divided into Domains, Commands, and Events. In the OOP analogy, we have classes which group the related methods and properties together. Similarly, a Domain is a special logical grouping of all the Commands and Events that keeps them all in one place. That is how it’s structured.

Commands give instructions to the browser to do something. We give it a command, instruct it to perform a certain action, or instruct it to retrieve certain data.

An Event is a signal that something has happened in the system. Browsers are intrinsically event-driven. When the DOM is loaded, many events are fired, and the browser picks these up and uses this information in the DevTools window. The DOM is just one of many examples; the protocol exposes many more events that the browser is driven by.

Domains, Commands, and Events are the 3 core concepts of CDP.
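
As a rough illustration of these three concepts on the wire: a CDP command is a JSON object with an id, a "Domain.command" method name and params, a response echoes the id, and an event carries a "Domain.event" method name but no id. The sketch below models already-parsed messages as Java maps. The class CdpMessageSketch and its methods are hypothetical helpers, not part of Selenium or CDP.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that models CDP messages as already-parsed JSON objects.
class CdpMessageSketch {

    enum Kind { RESPONSE, EVENT }

    // A command names its Domain and Command as "Domain.command" and carries an id
    // so that the matching response can be found later.
    static Map<String, Object> command(int id, String domain, String name,
                                       Map<String, Object> params) {
        Map<String, Object> message = new LinkedHashMap<>();
        message.put("id", id);
        message.put("method", domain + "." + name);
        message.put("params", params);
        return message;
    }

    // Incoming messages: a response echoes the id of a command, an event has no id.
    static Kind classify(Map<String, Object> incoming) {
        return incoming.containsKey("id") ? Kind.RESPONSE : Kind.EVENT;
    }

    // The Domain is the part of the method name before the dot, e.g. "Log".
    static String domainOf(Map<String, Object> message) {
        String method = (String) message.get("method");
        return method.substring(0, method.indexOf('.'));
    }
}
```

For example, command(1, "Log", "enable", params) yields a message whose method is "Log.enable", while an incoming message with method "Log.entryAdded" and no id is classified as an event of the Log domain.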

Selenium CDP Support

The Selenium project provides support for Chrome DevTools. For example, we can set a cookie with the help of CDP Network API.

setCookie_cdp_network.java:

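public void setCookie() {
    // Network is generated per CDP version and lives in a version-specific package
    // (org.openqa.selenium.devtools.vNNN.network, where NNN is the Chrome major version).
    ChromeDriver driver = new ChromeDriver();
    DevTools devTools = driver.getDevTools();
    devTools.createSession();

    // Set the cookie through the CDP Network domain.
    // The number of Optional parameters depends on the CDP version in use.
    devTools.send(
        Network.setCookie(
            "cheese",                           // cookie name
            "gouda",                            // cookie value
            Optional.empty(),
            Optional.of("www.selenium.dev"),    // cookie domain
            Optional.empty(),
            Optional.of(true),                  // secure
            Optional.empty(),
            Optional.empty(),
            Optional.empty(),
            Optional.empty(),
            Optional.empty(),
            Optional.empty(),
            Optional.empty(),
            Optional.empty()));

    // Verify the cookie with regular WD Classic commands.
    driver.get("https://www.selenium.dev");
    Cookie cheese = driver.manage().getCookieNamed("cheese");
    Assertions.assertEquals("gouda", cheese.getValue());
    driver.quit();
}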

Let’s walk through this code sample step by step for better comprehension. First, we initialize the ChromeDriver. A simple session gets created with that driver.

ChromeDriver driver = new ChromeDriver();

We get the DevTools instance (for later use).

DevTools devTools = driver.getDevTools();

Then we establish a session, which opens the WebSocket connection and performs the initial handshake. When we call createSession(), several of the aforementioned Commands are sent in the background to complete that handshake; we don’t have to issue them ourselves.

devTools.createSession();

Now we move on to actually sending the commands. We send a command by performing devTools.send().

devTools.send(
    Network.setCookie(
        "cheese",
        "gouda",
        Optional.empty(),
        Optional.of("www.selenium.dev"),
        Optional.empty(),
        Optional.of(true),
        Optional.empty(),
        Optional.empty(),
        Optional.empty(),
        Optional.empty(),
        Optional.empty(),
        Optional.empty(),
        Optional.empty(),
        Optional.empty()));

After that, we can use traditional Selenium WD [Classic] commands to go to a particular URL, verify the results, and quit the session.

driver.get("https://www.selenium.dev");
Cookie cheese = driver.manage().getCookieNamed("cheese");
Assertions.assertEquals("gouda", cheese.getValue());
driver.quit();

Console Logs

Let’s review another CDP domain supported by Selenium — the Logging Domain:

ChromeDriver driver = new ChromeDriver();
DevTools devTools = driver.getDevTools();
devTools.createSession();
// Enable the Log domain so the browser starts reporting log entries.
devTools.send(Log.enable());
// Print every log entry the browser pushes to us.
devTools.addListener(Log.entryAdded(),
    logEntry -> {
        System.out.println("log: " + logEntry.getText());
        System.out.println("level: " + logEntry.getLevel());
    });
driver.get("http://example.com");
// Check the terminal output for the browser console messages.
driver.quit();

This example is very similar to what we saw earlier. In the first 3 lines, we initialize a driver, get the DevTools instance, and create a session.

ChromeDriver driver = new ChromeDriver();
DevTools devTools = driver.getDevTools();
devTools.createSession();

Then we enable the Log domain. This command does not return any data; it merely instructs the browser to enable the Log domain.

devTools.send(Log.enable());

Now we add a listener. Adding a listener is what allows us to listen to events.

devTools.addListener(Log.entryAdded(),
    logEntry -> {
        System.out.println("log: " + logEntry.getText());
        System.out.println("level: " + logEntry.getLevel());
    });

We ask the browser to notify us whenever a new log entry is added, and the callback prints each entry as it arrives.

Selenium’s CDP Support: Challenges and Interim Solutions

Selenium is able to support CDP domains, events, and commands for nearly any protocol version. However, this support is version-specific: code that works today might not work tomorrow if parameters are added, event names are changed, or features are deprecated. We must be very cautious about these version specifics.

Selenium supports raw CDP, which requires SDETs to understand the existing commands and domains, their functions, and their purposes. This involves a lot of processes and overhead for testers to comprehend and execute in its raw form. Selenium provides this support because the maintainers have not fully implemented the BiDi protocol yet. The CDP support is merely a stop-gap solution. However, this method is not recommended due to the high overhead for anyone maintaining the code.

If necessary, this is the way for a tester to start testing over WebSockets, because it is supported in all Selenium language bindings. Despite the complexity and overhead, it offers a temporary solution until the BiDi protocol is fully integrated.

CDP Documentation

One source to check for the Selenium CDP documentation is https://www.selenium.dev/documentation/webdriver/bidi/cdp/

Another source is the CDP web site — https://chromedevtools.github.io/devtools-protocol/. Though, be cautious with the experimental features, as those might get deprecated at any time.

Selenium supports all the CDP Domains, Events and Commands listed on the protocol page above. Because the support is version-specific, with each new Chrome version the Selenium maintainers download the protocol definition from the open-source repository, do the mapping, generate all the corresponding classes, and ship them with the Selenium libraries. It’s a lot of work per version that they try to keep up with. When a new version comes out, they try to release the bindings within a week, so that Selenium supports the latest CDP version.

CDP — Automation Tools

Playwright and Puppeteer use CDP to control Chromium-based browsers (Chrome and Edge) programmatically for web automation and testing purposes.

For example, in order to click a button, WD Classic would identify the button element, move the mouse cursor over the element (it injects a mouse event) and perform the click action on that element. CDP simulates the analogous click action on the button in a different way. CDP has a different implementation in the back-end, when we perform that action.

In short, CDP performs 3 actions in the back-end:

  1. DOM.performSearch -- performs the search operation to identify the element
  2. DOM.querySelector -- uses querySelector to select the element
  3. Input.dispatchMouseEvent -- dispatches mouse events (press and release) to simulate the click action on that element

Here is a specific example of CDP implementation by Puppeteer:

await page._client.send('DOM.performSearch', { query: buttonSelector });
// id for search results
await page._client.send('DOM.performSearch', { query: ' ', searchId });
await page._client.send('Input.dispatchMouseEvent', { type: 'mousePressed', ... });
await page._client.send('Input.dispatchMouseEvent', { type: 'mouseReleased', ... });

WD Classic vs CDP

In recent years, CDP has gained a lot of traction in the automation industry. Let’s compare WD Classic and CDP side by side.

🔵 WD Classic is a standard protocol designed according to the W3C specification. Provides multiple language bindings for full flexibility.

🔵 CDP is a protocol, but it only supports Chromium-based browsers, such as Chrome and Edge.

🔴 WD Classic starts an HTTP server in the back-end and sends the commands to the browser driver. The driver passes these instructions on to the browser. Communication happens via the traditional HTTP request/response protocol in a REST-style format. To wait for an element, the client polls the server, asking whether the element is available, often multiple times.

🔴 CDP uses a WebSocket which is bidirectional in nature. WebSocket has the capacity to send the commands and concurrently listen to the events/messages from the server in real time.

🟣 WD Classic can perform operations in the browser UI, but cannot perform those operations in the DevTools console. It can’t control the DevTools programmatically. We cannot access network requests, console components, errors or events that happen in the DevTools.

🟣 CDP has the power to access the browser DevTools. It can get the messages or errors from the console, mock the network requests, or wait until the DOM changes.

WD Classic — Disadvantages:

Synchronous Nature

  • WD commands are generally synchronous. The client sends an HTTP request and waits for a response from the browser server before proceeding to the next command.
  • E.g., if we want to click a button, we first need to verify that the button is visible and clickable, and then perform the click action. To achieve this, WD sends 3 synchronous requests one by one: 1) check that the element is visible, 2) check that it is clickable, and 3) perform the click action.
  • Due to its synchronous nature, WD waits until an operation is processed on the server side — this is a performance concern.

Limited Low-Level DevTools Control

  • Some of the low-level DevTools controls, such as Performance Profiling, Network Interception, Advanced DOM Inspection, and JavaScript Console Interactions, are not available in WD Classic.

Unidirectional Communication

  • Traditional HTTP-based communication, while suitable for certain scenarios, is not designed for persistent, low-latency connections. This is where WebSockets come into play. WebSockets provide a bi-directional communication channel between a client and a server, enabling real-time data transfer over a single TCP connection.
  • WD Classic is slow because it lacks the BiDirectional communication with the browser. It means the users have to poll for element availability or visibility, which leads to delays in test automation.
  • Because WD Classic is synchronous and unidirectional, we can send a request and receive response messages for that request at a later time. But we can’t actually know what’s happening on the browser server side.
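
The cost of polling compared with being notified can be shown with a small deterministic simulation. This is a sketch under simplifying assumptions: network latency is ignored, the element becomes available at a fixed moment, and the 500 ms interval used in the comment is only an illustrative polling interval. The class and method names are hypothetical.

```java
// Deterministic simulation of "ask repeatedly" versus "be told once".
// Network latency is ignored; times are in milliseconds since the wait started.
class PollingVsPushSketch {

    // Polling: the client asks at t = 0, interval, 2 * interval, ...
    // Returns {number of requests sent, time at which availability was noticed}.
    static long[] poll(long readyAtMs, long pollIntervalMs) {
        if (pollIntervalMs <= 0) {
            throw new IllegalArgumentException("pollIntervalMs must be positive");
        }
        long requests = 0;
        long now = 0;
        while (true) {
            requests++;
            if (now >= readyAtMs) {
                return new long[] {requests, now};
            }
            now += pollIntervalMs;
        }
    }

    // Push: the browser sends a single event at the moment the element is ready.
    // Returns {number of messages received, time at which availability was noticed}.
    static long[] push(long readyAtMs) {
        return new long[] {1, readyAtMs};
    }
}
// poll(1200, 500) -> 4 requests, noticed at 1500 ms
// push(1200)      -> 1 message,  noticed at 1200 ms
```

In this model the polling client both sends more messages and notices the change later, with the delay bounded by one polling interval.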

CDP — Disadvantages:

Browser Compatibility — Chromium Only

  • This specific protocol is designed to be consistent only for the Chromium browsers. CDP doesn’t work with other browsers, like Safari and Firefox. Other browsers on the market have their own proprietary protocols and interfaces. Mozilla has done a great job at implementing a subset of CDP, but that is just for Puppeteer support. The support is rather incomplete, so anytime we are using Firefox, there’s a good chance things will break while using CDP, because of this partial half-baked support.

Version-Specific Dependencies

  • CDP has a caveat: it’s specific to the Chrome browser version. Every Google Chrome release comes with a corresponding CDP version, which might introduce breaking changes. Certain features might get deprecated or modified, affecting backward compatibility. A test script written for the current browser version might not work for a previous or future browser version. Let’s say we’re sending 4 parameters today; tomorrow a new Chrome version might require 5 parameters, and our code will break. This hurts the durability of the test code, and given how frequently Chrome is released, it means ongoing maintenance overhead.

Lacks Accommodation for Automation Needs

  • While CDP supports automation, it’s important to understand that it was not designed with automation in mind. It was designed to power the DevTools experience, so it doesn’t prioritize common automation use cases or address them in a straightforward manner.

Understanding WD BiDi

WD BiDi = CDP + WD Classic

Now that we have a solid understanding of WD Classic and CDP, it’s easier to understand why BiDi was created. At a certain point, Chromium developers wondered why they shouldn’t merge WD Classic and CDP into one protocol, so that they could utilize the power of both. That’s how WD BiDi was born.

WD BiDi is a new standard protocol, but it was not built entirely from scratch. It’s built on top of WD, allowing us to continue working with WD Classic while utilizing the power of CDP.

WD BiDi is a cross-browser automation protocol. It’s an open standard that works across browsers, fast by default, and comes packed with all the features you need for test automation. How? It takes the best of Chrome DevTools Protocol (e.g., fast bidirectional messaging and low-level control) and WebDriver Classic (e.g., best cross-browser support, W3C Standard, testing-oriented), and combines them into the extraordinary WebDriver BiDi protocol. The vision behind BiDi is to give you full flexibility and let you write tests using any of your favorite tools and automate them in any browser or driver.

This is certainly an exciting future for test automation. It takes a huge effort from various vendors working together to ensure this future.

BiDi Tools/Frameworks

Despite being a work in progress, popular automation tools like Selenium, WebDriverIO, and Puppeteer already have partial support for WebDriver BiDi.

WD BiDi Ecosystem
  • Selenium has adopted BiDi across multiple language bindings. The team is working towards a high-level API built on it.
  • WebDriverIO has a dedicated package for WD BiDi. You can use that package out of the box to write BiDi tests.
  • Appium has also adopted WD BiDi and added APIs to support it.
  • Puppeteer is now a cross-browser automation tool, thanks to its adoption of BiDi.

Additionally, cloud providers, such as BrowserStack and Sauce Labs, are following the trend of supporting BiDi.

Why BiDi?

How can we take advantage of WD BiDi in our automation scripts? WebDriver BiDi, or BiDirectional, represents a significant evolution from WD Classic in browser automation. Unlike WD Classic, which relies on HTTP, BiDi uses JSON payloads over WebSockets. This shift enables direct communication between the automation script and the browser without the need for a separate browser driver. Commands are sent directly to the browser, which responds with success or failure notifications.

WD BiDi: JSON-over-WebSocket Wire Protocol

The bidirectional nature of the protocol allows browsers to send events back to the automation tool asynchronously, opening up new possibilities for automation scenarios. This capability mirrors the functionality of CDP but extends it to a standardized approach that aims to be browser-agnostic, not limited to Chromium-based browsers.
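
To illustrate how a client can handle commands and asynchronous events on the same connection, here is a minimal sketch of the bookkeeping involved. It relies on the message shapes defined by the WebDriver BiDi specification: commands carry an id, method and params, replies carry the same id with a type of "success" or "error", and events have a type of "event". Messages are modeled as already-parsed maps and no socket is opened; BidiClientSketch and its members are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Hypothetical client-side bookkeeping for a JSON-over-WebSocket protocol.
// No socket is opened: "outbox" holds what would be written to the connection.
class BidiClientSketch {

    private int nextId = 1;
    private final Map<Integer, CompletableFuture<Map<String, Object>>> pending = new HashMap<>();
    final List<Map<String, Object>> outbox = new ArrayList<>();
    final List<Map<String, Object>> events = new ArrayList<>();

    // Sending does not block: the caller gets a future that completes
    // when the reply with the same id arrives.
    CompletableFuture<Map<String, Object>> send(String method, Map<String, Object> params) {
        int id = nextId++;
        Map<String, Object> message = new LinkedHashMap<>();
        message.put("id", id);
        message.put("method", method);
        message.put("params", params);
        outbox.add(message);
        CompletableFuture<Map<String, Object>> reply = new CompletableFuture<>();
        pending.put(id, reply);
        return reply;
    }

    // Called for every incoming message, in whatever order the browser sends them.
    void onMessage(Map<String, Object> incoming) {
        if ("event".equals(incoming.get("type"))) {
            events.add(incoming); // events arrive unprompted, between replies
            return;
        }
        CompletableFuture<Map<String, Object>> reply = pending.remove(incoming.get("id"));
        if (reply == null) {
            return; // reply to an unknown command: ignored in this sketch
        }
        if ("success".equals(incoming.get("type"))) {
            reply.complete(incoming);
        } else {
            reply.completeExceptionally(
                    new IllegalStateException(String.valueOf(incoming.get("error"))));
        }
    }
}
```

A log.entryAdded event can therefore arrive while a browsingContext.navigate command is still in flight, and neither blocks the other.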

From an automation perspective, Selenium clients can leverage the BiDi protocol to gain deeper insights and control over browser activities such as accessing console logs, monitoring network traffic, and intercepting requests. This bidirectional flow enhances debuggability and offers functionalities previously available only through non-standard automation protocols.

As browsers keep integrating the BiDi protocol natively, Selenium and other automation frameworks can now communicate directly with browsers without requiring separate WebDriver bindings, streamlining the testing process and enhancing compatibility across different browser environments.

CDP vs BiDi

Let’s break down the differences between CDP and BiDi protocols in various domains.

Browser Support

BiDi is designed for full flexibility and is meant to be supported by all major browsers as a web standard that they adhere to. CDP, by contrast, is only meant for Chromium-based browsers.

Stability

CDP lacks backward compatibility, making it prone to breaking with browser updates. This issue often requires switching between Chrome versions, which can disrupt tests. In contrast, WebDriver has maintained stability over the years, remaining unaffected by browser upgrades and eliminating the need for constant test updates. This challenge with CDP forces the Selenium team to release version-specific bindings with every new Chrome version. Without regular updates to match browser changes, tests using CDP risk sudden and unpredictable failures. BiDi, however, was designed with standardization and long-term stability in mind, much like the enduring reliability of WebDriver.

Event Subscription

Those testers who have been closely working with CDP often ask how they can listen to events or something of interest happening in multiple windows or multiple tabs in a window. With CDP, we cannot do that. We’ll actually have to go to the window, send some commands to attach to it, then switch out and create a separate CDP session with a separate tab or window of interest. We can’t get all the global events. But BiDi makes it easier for us. With BiDi, we have an option to choose if we want to 1) subscribe to a single tab, 2) subscribe to all the windows, or 3) subscribe to all the tabs in a window. This choice is ours, and it provides full flexibility. And we can listen to all our events, like console logs, JS exceptions, and detect if the user prompt is open or closed.
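
The subscription scopes can be sketched as the payload of the session.subscribe command from the WebDriver BiDi specification: leaving out "contexts" subscribes globally, while listing browsing context ids narrows the subscription to those tabs. The helper class below is hypothetical, the context ids are placeholders, and the message is modeled as a map rather than sent anywhere.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical builder for session.subscribe payloads; nothing is sent.
class BidiSubscribeSketch {

    // Global subscription: no "contexts" key, so events from every tab
    // and window are delivered.
    static Map<String, Object> subscribeGlobally(int id, List<String> events) {
        return subscribe(id, events, null);
    }

    // Scoped subscription: only the listed browsing contexts (tabs) report events.
    static Map<String, Object> subscribeToContexts(int id, List<String> events,
                                                   List<String> contextIds) {
        return subscribe(id, events, contextIds);
    }

    private static Map<String, Object> subscribe(int id, List<String> events,
                                                 List<String> contextIds) {
        Map<String, Object> params = new LinkedHashMap<>();
        params.put("events", new ArrayList<>(events));
        if (contextIds != null) {
            params.put("contexts", new ArrayList<>(contextIds));
        }
        Map<String, Object> message = new LinkedHashMap<>();
        message.put("id", id);
        message.put("method", "session.subscribe");
        message.put("params", message == null ? null : params);
        return message;
    }
}
```

For instance, subscribing one tab to "log.entryAdded" and "browsingContext.userPromptOpened" would pass those two event names together with that tab's context id, while the global variant passes the event names only.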

Use Case

CDP and DevTools are excellent tools for debugging and troubleshooting applications, making the lives of testers and developers significantly easier. However, they are primarily developed and maintained by the Chrome DevTools team with a focus on debuggability. In contrast, BiDi is specifically designed for automation, catering to a different use case. Debugging often requires fine-grained control, which isn’t always necessary for automation. BiDi strikes a practical balance by providing the essential features needed for automation while ensuring the browser remains secure and efficient, avoiding memory leaks or performance issues. This thoughtful design simplifies testing and enhances the overall experience for testers.

Complexity

In CDP, performing a single action often requires sending multiple commands. In contrast, BiDi is designed to minimize round trips, offering a more streamlined and user-friendly experience that is easy to adopt and adapt to.

To demonstrate the complexity, let’s consider a code snippet that contains the line devTools.createSessionIfThereIsNotOne(driver.getWindowHandle()), which creates a CDP session.

CDP session creation

When debugging just that one line of code, we observe logs containing strings “send”. For example, the command for CDP session creation generates 4 log lines that read something like:

Nov 27, 2024 12:38:42 PM org.openqa.selenium.devtools.Connection send

Here’s a snippet from the IDE console logs:

CDP session creation takes 4 ‘send’ commands

There are 4 “send” commands merely for creating a session and attaching to a single tab so that we can listen to events from it. In other words, it takes 4 commands over the WebSocket to create a CDP session. That’s not counting the 2 HTTP requests sent earlier to identify the version of Chrome and get the correct endpoint for it.

Now, let’s look at a BiDi test. CompletableFuture future = new CompletableFuture(); is the line we are interested in.

BiDi session creation

When examining the logs generated during debugging, we notice that the IDE console doesn’t really show any logs. With BiDi, upgrading an HTTP connection to a WebSocket is seamless, requiring no additional handshake steps or commands to initiate a session. The browsers already provide the WebSocket connection, making it ready to use immediately. Simply connect and start writing your test.

BiDi session creation takes minimal steps.

In comparison, CDP session creation involves 4 WebSocket commands and 2 HTTP requests, while BiDi requires just a single connection step. This 6:1 difference in round trips is noticeable during local runs and becomes even more pronounced when tests are executed on cloud providers, where each additional call adds network latency.
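
Taking the counts above at face value, a back-of-the-envelope calculation shows why the gap grows on remote infrastructure. The round-trip times below are illustrative assumptions, not measurements, and the model (one network round trip per call) is deliberately crude.

```python
CDP_ROUND_TRIPS = 4 + 2   # four WebSocket sends plus two HTTP requests, per the text
BIDI_ROUND_TRIPS = 1      # a single connection step, per the text


def setup_overhead_ms(round_trips, rtt_ms):
    """Rough setup cost if every call costs one network round trip."""
    return round_trips * rtt_ms


# Assumed round-trip times in milliseconds (hypothetical values).
for label, rtt_ms in [("local", 1), ("cloud", 80)]:
    cdp = setup_overhead_ms(CDP_ROUND_TRIPS, rtt_ms)
    bidi = setup_overhead_ms(BIDI_ROUND_TRIPS, rtt_ms)
    print(f"{label}: CDP ~{cdp} ms, BiDi ~{bidi} ms, difference {cdp - bidi} ms")
```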

BiDi: Revolutionizing Browser Automation

Standardized approach — W3C compliance

  • By adhering to the W3C WD specification, BiDi ensures standardized browser automation for consistency and compatibility with various tools and frameworks.
  • WD BiDi is developed by the same W3C group that developed the WD protocol. WD is not hosted and maintained by one organization. It’s a standardized approach: all browser vendors have to implement the same API to allow testing. Any action we perform with WD BiDi is the same action in all the browsers. The approach that BiDi follows is largely the same as WD, but it uses WebSockets.

Cross-browser support — not specific to browser versions

  • WD BiDi follows W3C standards, enabling cross-browser compatibility, allowing the same test scripts to automate tests across different browsers without major modifications, saving time and effort.
  • Unlike CDP, BiDi is a W3C specification rather than one vendor’s implementation.
  • It’s not browser-specific. Once the BiDi protocol is in place, browser version specifics become noise. Browsers are expected to adopt the entire protocol as specified, though they may do so incrementally. The protocol itself can evolve (items may be added, modified, or removed), but the intent is to avoid breaking changes.

Created with automation scenarios in mind

  • The protocol is developed with automation scenarios in mind. It provides domains, events and commands, but it tries not to increase the burden on the person who’s testing. It distinguishes between a person who understands testing and a person who understands browsers. There’s a difference.
  • For example, because CDP was devised to communicate with the browser, its commands are very specific to how the browser functions. As testers, we typically don’t know how the browser functions internally or how these commands map to what we want to do. BiDi, in contrast, aims to be designed in a way that an end user or tester understands, with those user needs and scenarios kept in mind.

Complementary (value add) to WD Classic protocol, with an option to only have the BiDi connection.

  • Complementary and Interoperable — The WebDriver BiDi protocol is set to revolutionize cross-browser automation by complementing and operating in parallel with the existing WD protocol. This dual approach allows for seamless integration, where both protocols can be used concurrently in testing environments. When a WD protocol session is initiated, a BiDi session can also be created, providing the necessary information to connect directly to the socket. This interoperability ensures that testers can leverage the strengths of both protocols without having to choose one over the other.
  • Seamless Transition — A key advantage of the BiDi protocol is its ability to facilitate a smooth transition for existing WD users and automation tools. Designed with backward compatibility in mind, the BiDi protocol ensures that elements identified through the find_element HTTP command in a WD test can be directly used with the BiDi protocol. This means that automation tools and testers can adopt the BiDi standard without losing any functionality, making it easier to integrate and utilize new testing capabilities. The interoperability between the protocols ensures a more robust and flexible testing environment, accommodating the evolving needs of web developers and testers.
  • Direct Browser Communication — Furthermore, browser vendors are working towards embedding the BiDi protocol within their browsers, eliminating the need for users to download specific drivers. This integration allows for direct communication with browsers for testing purposes, streamlining the testing process and enhancing efficiency. By implementing the BiDi protocol within browsers, vendors are paving the way for more efficient and effective cross-browser automation, ensuring that testers can perform comprehensive and accurate tests across various platforms. This advancement not only simplifies the testing process but also enhances the overall quality and reliability of web applications.

Low latency, bidirectional communication. Performance-driven design.

  • WD BiDi enables snappy bidirectional communication, allowing the browser to send real-time updates to the test script, improving synchronization and making testing faster and more reliable.
  • Our automation scripts will be faster than the previous implementation, because WD BiDi communicates via a bidirectional web socket. We can be aware of what’s happening in the browser in real time without sending a synchronous request.
  • Designed to make sure we get maximum out of a single command, so that we make minimum round trips. Performance enhancement is especially noticeable when executing tests remotely on a device farm or a cloud provider.
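
One practical consequence of a bidirectional socket is that a client must tell apart replies to its own commands from events the browser pushes unprompted. The sketch below assumes the message envelope from the WebDriver BiDi specification as I understand it (a "type" field of "success", "error", or "event"). The function name classify and the trimmed-down sample payloads are illustrative only.

```python
import json


def classify(raw):
    """Sort an incoming BiDi message into a response, an error, or an event."""
    msg = json.loads(raw)
    kind = msg.get("type")
    if kind == "success":
        # Reply to a command we sent; matched to the request by id.
        return ("response", msg["id"], msg["result"])
    if kind == "error":
        return ("error", msg.get("id"), msg["error"])
    if kind == "event":
        # Pushed by the browser without any request from us.
        return ("event", msg["method"], msg["params"])
    raise ValueError(f"unexpected message type: {kind!r}")


# Simplified sample messages; real payloads carry more fields.
incoming = [
    '{"type": "success", "id": 1, "result": {}}',
    '{"type": "event", "method": "log.entryAdded", "params": {"text": "Hello, world!"}}',
]
for raw in incoming:
    print(classify(raw))
```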

Low-level debugging control

  • WD BiDi offers low-level browser control, enabling advanced interactions, complex scenario simulation and thorough testing, especially useful for JS-heavy or browser-specific features in web apps.
  • Even when we use WD Classic in our scripts, we can still add WD BiDi and gain DevTools-like access to console messages, which helps verify uncaught exceptions and intended logs. We can also listen to JS exceptions and console logs, mock test data, intercept network requests, etc.
  • It gives us the low-level debugging capability that we have in CDP. That doesn’t mean BiDi is a copy of CDP, lifted over as is. A lot of thought goes into what is required, what is needed, and what is not heavy for the browsers. Keep in mind, CDP is very heavy and complex for browsers to implement. BiDi is designed with both the browser vendors and the end users in mind.

Developed by leading browser vendors together, keeping browsers in mind

  • There’s a working group that meets every month to discuss, design, and iterate on this protocol. The group includes members from Puppeteer, Selenium, and all browser vendors, who come together to pitch ideas and discuss progress. They publish the meeting agenda and minutes publicly. It’s a good collaborative effort that keeps the whole automation ecosystem in mind.

WD BiDi is undoubtedly the future of browser automation!

Low-Level Debugging Control (Expanded)

Lower level BiDi protocol means more features for authors.

Lower-level than WebDriver — WD BiDi is aiming to be a slightly lower level protocol than WD Classic. Instead of starting from the premise that we should replicate the interaction of a real user with a single browser page at a time, WD BiDi provides the ability to run commands, receive events from all the loaded browsing contexts and different scripting realms, including, for example, running a script in the context of a worker. This will allow people writing tests to observe and interact with the full internal state of their web app.

  • Not so tied to end-to-end testing
  • Interact with all browsing contexts and realms, not just one active context

Listed below are some of the BiDi features testers can use to achieve more precise debugging:

Listen to JS errors

  • WD BiDi listens to JS errors, allowing real-time detection and reporting, enhancing debugging capabilities during test execution.
  • If we have a JS-rich app that throws an error at some point in our automation, we can listen to those errors with WD BiDi and debug them to fix our issue.

Listen to console logs

  • WD BiDi enables real-time capture and analysis of console logs, aiding in debugging and logging during test execution.
  • We can listen to the browser console in the app and fetch the logs for the debugging purpose.
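
Console messages and JS exceptions both arrive through the same log event, so a test usually needs to separate them. The sketch below assumes each entry is the params object of a log.entryAdded event with a "type" of "console" or "javascript", which is how I read the specification. The sample entries are trimmed to a few fields and reuse the messages from the demo page used later in this article.

```python
def split_log_entries(entries):
    """Separate console messages from JS exceptions (hypothetical helper)."""
    console, js_errors = [], []
    for entry in entries:
        if entry.get("type") == "console":
            console.append(entry["text"])
        elif entry.get("type") == "javascript":
            js_errors.append(entry["text"])
    return console, js_errors


# Trimmed-down sample entries; real ones also carry source, timestamp, etc.
sample = [
    {"type": "console", "level": "info", "method": "log", "text": "Hello, world!"},
    {"type": "javascript", "level": "error", "text": "Error: Not working"},
]

console, js_errors = split_log_entries(sample)
print("console:", console)
print("js errors:", js_errors)
```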

DOM Mutation

  • WD BiDi allows monitoring and reacting to changes in the DOM, facilitating dynamic web app testing and validation.
  • We can monitor the DOM structure/tree. BiDi can trigger an event whenever we want to inject any script.

Network Interception

  • WD BiDi enables capture and manipulation of network requests, facilitating advanced testing and analysis of web app performance.
  • We can listen to the incoming network requests/responses for the browser and intercept them.
  • We can filter and manage network commands, preventing certain commands from loading and redirecting others. This is highly powerful for mocking purposes, eliminating the need for service virtualization.
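
As a rough illustration of how blocking a request might look at the protocol level, the sketch below builds the relevant commands as plain dictionaries. It assumes the network.addIntercept, network.failRequest, and network.continueRequest commands and the isBlocked event field from the WebDriver BiDi specification as I understand them. The helper names, the request id "req-1", and the "block .png files" policy are invented for the example.

```python
def add_intercept(command_id):
    """Ask the browser to pause requests before they are sent."""
    return {
        "id": command_id,
        "method": "network.addIntercept",
        "params": {"phases": ["beforeRequestSent"]},
    }


def fail_request(command_id, request_id):
    """Abort a paused request."""
    return {"id": command_id, "method": "network.failRequest",
            "params": {"request": request_id}}


def continue_request(command_id, request_id):
    """Let a paused request proceed unchanged."""
    return {"id": command_id, "method": "network.continueRequest",
            "params": {"request": request_id}}


def decide(event_params, command_id):
    """Example policy: block images, let every other paused request continue."""
    if not event_params.get("isBlocked"):
        return None  # the request was not paused, so there is nothing to send
    request = event_params["request"]
    if request["url"].endswith(".png"):
        return fail_request(command_id, request["request"])
    return continue_request(command_id, request["request"])


# Trimmed-down network.beforeRequestSent params with placeholder values.
blocked_image = {
    "isBlocked": True,
    "request": {"request": "req-1", "url": "https://example.com/logo.png"},
}
print(add_intercept(1))
print(decide(blocked_image, command_id=2))
```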

Communication between browsing context groups and other features that don’t fit into WD Classic model.

  • With BiDi, we can also use the Browsing Context module, inject scripts, and perform all sorts of DevTools-style low-level control.

WD BiDi Implementation Status

The WD BiDi specification is a work in progress (WIP), supported by diverse collaborators, such as:

  • Browser Vendors: Chrome (Google), Edge (Microsoft), Firefox (Mozilla), Safari (Apple)
  • Open Source Automation Projects: Selenium, WebDriverIO, Puppeteer, Appium.
  • Companies offering Browser Automation Solutions: BrowserStack, SauceLabs.
https://www.w3.org/testing/browser/

WD BiDi is a work in progress, under development for several years, with ongoing efforts to fully implement it. The Selenium team has been diligently integrating BiDi capabilities into tools like WebDriver and WebDriverIO, enhancing their suitability for testing modern web applications. This progress has been driven by a collaborative effort involving significant contributions from the above mentioned vendors, ensuring the new standard addresses the needs of the broader web development community. This partnership aims to deliver long-term stability and innovation in browser automation.

Source: https://youtu.be/V8lD0q29wAk?si=pSA_I9vsolf076xT&t=869

All browser vendors are actively contributing to the standard, but not all of them have implemented WD BiDi into their browsers yet:

  • Firefox already has advanced support available. For example, basic network interception is available starting with Firefox 124. The Mozilla team will continue to implement new WD BiDi features over the year.
  • Chrome has partial support available. Google is committed to implementing missing APIs in both Puppeteer and Chrome to make WD BiDi production-ready by the end of the year. The implementation work the Google team is doing for WD BiDi is happening as part of the Chromium project, which Microsoft’s Edge is using and contributing to as well. The compatibility of Edge and every other Chromium-based browser will progress together with Chrome.
  • Safari is actively contributing to the standard, but hasn’t implemented WD BiDi yet.

Automation testers are looking forward to WD BiDi being supported by all browsers, because its bidirectional nature will enable a lot of use cases that haven’t been possible before, all while being cross-browser compatible. What’s great is that while browser vendors are working eagerly to put the finishing touches on WD BiDi, we can already get started with browser automation now. Through tools like Puppeteer, Selenium, or WebDriverIO, we can use the existing WD standard to automate all browsers, including Safari and Edge.

And for use cases that allow it, we can start using WD BiDi today. A few modules (Domains) have already been implemented for Selenium and WebDriverIO and can be used in automation scripts.

wpt.fyi/results/webdriver/tests/bidi

Domains/Modules:

  • session/ - a basic module where we can start a BiDi session
  • script/ - use this module to inject JS code when using WD BiDi
  • network/ - a module which can perform network interception
  • log/ - can read actual console or JS logs
  • input/ - events that happen when, for example, submitting a form. WD BiDi supports input events such as the keyboard strokes we perform.
  • errors/ and browsing_context/ - high-level modules

We can understand the content further by examining the domain. When we open the browsing_context/ module, we observe various sub-modules such as:

  • capture_screenshot/
  • classic_interop/
  • close/
  • context_created/
  • create/
  • dom_content_loaded/
  • fragment_navigated/
  • get_tree/
  • load/
  • navigate/
  • print/
  • reload/
  • set_viewport/

We can capture a screenshot, open a new browser, close it, create a new tab/window, navigate to a URL, print pages, or save to PDF, etc.
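
The test directory names map fairly directly onto protocol names. The sketch below converts the directory names (written with underscores) into the camelCase command and event names used on the wire, then builds one navigate command. The mapping is a naming convention I am assuming from the specification, classic_interop is left out because it is a test area rather than a command, and the context id "tab-1" is a placeholder.

```python
COMMAND_DIRS = ["capture_screenshot", "close", "create", "get_tree",
                "navigate", "print", "reload", "set_viewport"]
EVENT_DIRS = ["context_created", "dom_content_loaded", "fragment_navigated", "load"]


def to_protocol_name(module, directory):
    """Turn e.g. ('browsingContext', 'get_tree') into 'browsingContext.getTree'."""
    head, *rest = directory.split("_")
    return f"{module}.{head}{''.join(part.capitalize() for part in rest)}"


commands = [to_protocol_name("browsingContext", d) for d in COMMAND_DIRS]
events = [to_protocol_name("browsingContext", d) for d in EVENT_DIRS]
print(commands)
print(events)

# A navigate command that waits for the page load to complete.
navigate = {
    "id": 1,
    "method": to_protocol_name("browsingContext", "navigate"),
    "params": {
        "context": "tab-1",  # placeholder browsing-context id
        "url": "https://www.selenium.dev/selenium/web/bidi/logEntryAdded.html",
        "wait": "complete",
    },
}
print(navigate)
```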

Web Platform Tests

Web Platform Tests uses WebDriver:

  • For runner
  • For testdriver APIs

So far, we’ve covered a lot about why WD is needed to help web testers and developers with cross-browser testing. But for this audience, it’s also useful to see how it’s going to help write tests for the browser engines themselves.

Currently, most testing of browser engines happens with web-platform-tests. The web-platform-tests runner uses WD to schedule tests and provide features, such as the ability to generate trusted click events through the testdriver API. This means that the limitations of the WD Classic (HTTP) protocol show up as limitations of what we can do in web-platform-tests. For example, it used to be quite difficult to use testdriver across multiple browsing contexts, and even more difficult when multiple browsing context groups were involved. A solution was feasible, but WD Classic’s strict command/response structure only allows interacting with a single window at a time, which makes any feature involving multiple browsing context groups very difficult to implement technically.

With WD BiDi, many of these restrictions are being lifted, and it’s becoming possible to write tests that do things like inspect network requests or send messages between different browsing contexts, without resorting to server-side Python code.

Sample server-side Python code. Source: bit.ly/bidi-demo-2023.

BiDi should also make it easy to add features for testing other feature specifications, where the nature of the specification — for example, if it involves some kind of hardware — means that it’s not well suited to a command/response API.

The WD BiDi work should improve the ability to test the platform and make sure that new complex APIs could be vetted by all vendors, with as few interoperability issues as possible. Note that, as mentioned before, each browser vendor works at its own speed to deliver the BiDi features.

Each browser provider works at its own speed to deliver the BiDi features.

WD BiDi Challenges

  • Transitioning to BiDi presents challenges, particularly with asynchronous programming. Languages like Java and Ruby will require a combination of threads and lambdas, while Python and .NET will use async-await patterns. This shift necessitates a learning curve for testers, as they adapt to new programming paradigms and data structures.
  • Multiplexing WebSockets for Selenium Grid is another area SDETs will need to familiarize themselves with.
  • Testers will also need to gain proficiency in the business of network loads.
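
To give a feel for the async-await style mentioned above, here is a small self-contained sketch that waits for an event pushed by a simulated browser. FakeEventSource is a stand-in I made up so the example runs without a browser. It is not how the Selenium bindings are implemented, and the event payload is simplified.

```python
import asyncio


class FakeEventSource:
    """Stand-in for a BiDi connection; it only dispatches to registered handlers."""

    def __init__(self):
        self._handlers = []

    def on_event(self, handler):
        self._handlers.append(handler)

    def emit(self, event):
        for handler in self._handlers:
            handler(event)


async def wait_for_event(source, predicate, timeout=5.0):
    """Return the first matching event, or raise asyncio.TimeoutError."""
    future = asyncio.get_running_loop().create_future()

    def handler(event):
        # Resolve the future once; later events are ignored.
        if not future.done() and predicate(event):
            future.set_result(event)

    source.on_event(handler)
    return await asyncio.wait_for(future, timeout)


async def demo():
    source = FakeEventSource()
    # Simulate the browser pushing an event shortly after we start waiting.
    asyncio.get_running_loop().call_later(
        0.01, source.emit, {"method": "log.entryAdded", "text": "Hello, world!"}
    )
    return await wait_for_event(source, lambda e: e["method"] == "log.entryAdded")


result = asyncio.run(demo())
print(result["text"])
```

The test code never polls. It registers interest, yields control, and resumes when the event arrives or the timeout expires.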

The Selenium team is committed to integrating BiDi, phasing out vendor-specific CDP support. This transition will involve maintaining compatibility with existing Selenium functionality while gradually incorporating BiDi features.

BiDi API Code Samples

Let’s go over a couple of BiDi API demos.

Console Logs

Click on button “Click me for console logs” to generate a log entry in the DevTools Console.

In this scenario, we navigate to a sample AUT https://www.selenium.dev/selenium/web/bidi/logEntryAdded.html to test if user messages get generated correctly in the browser console. I’m switching gears from Java to Python. Python code for BiDi is much more scarce than Java (at the time of writing this article), so I got a little curious about exploring the extent of its availability.

bidi_console_logs.py:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait

options = webdriver.ChromeOptions()
options.enable_bidi = True  # enable the BiDi WebSocket connection
driver = webdriver.Chrome(options=options)
wait = WebDriverWait(driver, 5)
try:
    driver.get('https://www.selenium.dev/selenium/web/bidi/logEntryAdded.html')
    log_entries = []
    # Collect every console message the page emits from here on.
    driver.script.add_console_message_handler(log_entries.append)
    driver.find_element(By.ID, "consoleLog").click()
    # The event arrives asynchronously, so wait until the handler has fired.
    wait.until(lambda _: log_entries)
    assert log_entries[0].text == "Hello, world!"
    log_entries.clear()
except Exception as e:
    print(f"Error: {e}")
    raise  # re-raise so a failed assertion still fails the run
finally:
    if driver is not None:
        driver.quit()

To use WD BiDi, setting the capability in the browser options will enable the required functionality. In Python, enable_bidi enables the WebSocket connection for bidirectional communication:

options.enable_bidi = True

We then instantiate a ChromeDriver session:

driver = webdriver.Chrome(options=options)

We are using the script domain of the BiDi protocol to add a handler for the console log events.

driver.script.add_console_message_handler(log_entries.append)

We click a button, an event that triggers log generation.

driver.find_element(By.ID, "consoleLog").click()

Next, we use a lambda function to ensure that the test script does not continue execution until the log_entries list has been updated (i.e., a console message has been logged and captured by the handler). Without this wait, the assertion assert log_entries[0].text == "Hello, world!" might fail because the log entries might not yet be available when that line is executed.

wait.until(lambda _: log_entries)
assert log_entries[0].text == "Hello, world!"

Finally, we tidy up by clearing out the logs with log_entries.clear() and quitting our driver session with driver.quit() to release resources.
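
If several tests need the same capture-and-clean-up steps, they could be wrapped in a small context manager. This is a sketch of one possible helper, not part of Selenium. It assumes that add_console_message_handler returns an id and that remove_console_message_handler accepts that id, which matches the Selenium Python bindings as I understand them. The name captured_console_messages is my own.

```python
from contextlib import contextmanager


@contextmanager
def captured_console_messages(driver):
    """Collect console messages while the block runs, then remove the handler."""
    entries = []
    handler_id = driver.script.add_console_message_handler(entries.append)
    try:
        yield entries
    finally:
        # Unregister even if the test body raises.
        driver.script.remove_console_message_handler(handler_id)


# Intended usage inside the test above (needs a live BiDi-enabled driver):
#
#   with captured_console_messages(driver) as log_entries:
#       driver.find_element(By.ID, "consoleLog").click()
#       wait.until(lambda _: log_entries)
#       assert log_entries[0].text == "Hello, world!"
```

Removing the handler in a finally block keeps one test’s listener from leaking into the next.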

JavaScript Exception Logs

Click on button “Click me for console error” to generate a log entry in the DevTools Console.

Here’s another code snippet, very similar to the previous example. We are still using the script domain of WD BiDi.

bidi_js_exceptions.py:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait

options = webdriver.ChromeOptions()
options.enable_bidi = True  # enable the BiDi WebSocket connection
driver = webdriver.Chrome(options=options)
wait = WebDriverWait(driver, 5)
try:
    driver.get('https://www.selenium.dev/selenium/web/bidi/logEntryAdded.html')
    log_entries = []
    # Collect every uncaught JS exception the page throws from here on.
    driver.script.add_javascript_error_handler(log_entries.append)
    driver.find_element(By.ID, "jsException").click()
    # The event arrives asynchronously, so wait until the handler has fired.
    wait.until(lambda _: log_entries)
    assert log_entries[0].text == "Error: Not working"
    log_entries.clear()
except Exception as e:
    print(f"Error: {e}")
    raise  # re-raise so a failed assertion still fails the run
finally:
    if driver is not None:
        driver.quit()

This time, however, we utilize method add_javascript_error_handler() in lieu of add_console_message_handler(). We add a handler specific to events that relate to JS exceptions.

Summary

The “BiDi” in WebDriver BiDi stands for bidirectional. The protocol aims to revolutionize cross-browser automation by addressing the limitations of current tooling built on the Chrome DevTools Protocol. Initially, Chrome developers used the Chrome DevTools Protocol to interact with the browser internals for testing and development. Tools like Puppeteer leveraged this to provide low-level browser access, but constant changes in browser internals created stability issues, necessitating frequent updates to test code.

Recognizing these challenges, the Selenium team collaborated with browser vendors to create a standardized protocol, WebDriver BiDi, which every browser vendor can implement. This standardization will allow developers to write code once and have it work across all major browsers, including Chrome, Firefox, Safari, and Edge.

WD BiDi differs from current Selenium implementations by providing more detailed, real-time information about browser events, such as console logs and JavaScript errors. It operates asynchronously, delivering information as events occur, which can enhance debugging but may increase network traffic.

The development of WD BiDi has been a collaborative effort, with significant contributions from browser vendors and the Selenium team. This collaboration aims to improve cross-browser testing and ensure the health of the web by adhering to open standards.

While the transition to WD BiDi presents challenges, especially in languages like Java and Ruby that handle asynchronous operations differently, it promises powerful features for testers. These include advanced network interception, basic authentication capabilities, and more precise control over browser automation.

Overall, WebDriver BiDi represents the future of cross-browser testing, offering a more robust and consistent approach aligned with modern web standards. As the Selenium team continues to develop and implement BiDi, testers can expect powerful new features that enhance their ability to automate and test web applications across multiple browsers.

𝓗𝒶𝓅𝓅𝓎 𝓉𝓮𝓈𝓉𝒾𝓃𝓰 𝒶𝓃𝒹 𝒹𝓮𝒷𝓊𝓰𝓰𝒾𝓃𝓰!



Selenium WebDriver BiDi: Kismet Child of WebDriver Classic and Chrome DevTools Protocol was originally published in Women in Technology on Medium.