Appium Architecture

Published on July 4, 2025

Architecture & Ecosystem Overview — Part 1

Table of Contents — Part 1:

Introduction
Client/Server HTTP Model
WebDriver Protocol
WebDriver Protocol API Examples
WebDriver Protocol Versions
Client/Server Communication Flow
Appium Protocol Extensions
Conclusion

Introduction

Appium is one of the most powerful open-source automation frameworks for mobile applications, supporting native, hybrid, and mobile web apps on iOS and Android through the familiar WebDriver protocol. While you can write Appium tests without understanding its internals, knowing the architecture becomes crucial when debugging failures or optimizing performance.

In this guide, we’ll explore Appium’s client-server architecture, examine how it builds upon the WebDriver protocol, and discover the mobile-specific extensions that set it apart from web automation tools. Whether you’re a seasoned engineer or newcomer, this architectural knowledge will help you write more robust tests and troubleshoot issues with confidence.

Client/Server HTTP Model

Appium is an HTTP server that speaks the WebDriver protocol.

At its core, Appium is an HTTP server that communicates using the WebDriver protocol. This protocol was originally developed by the Selenium Project for automating web browsers, which is why it’s known as WebDriver. Over time, it was adopted as a W3C standard, and all major browsers now support it. Appium builds on this foundation - just like Selenium - but instead of targeting browsers, it enables automation for mobile apps. In essence, Appium functions as a specialized web server that speaks the same protocol as Selenium, tailored for mobile platforms.

You don’t interact with Appium directly - instead, a client library converts your commands into HTTP requests and sends them to the Appium server.

When you write an Appium test, you’re not calling Appium commands directly. Instead, you’re using a client library that takes the commands from your test script, converts them into HTTP requests, and sends them to the Appium server. The server then interprets these requests and translates them into automated actions on the device - though it doesn’t perform these actions directly itself.

Appium turns incoming requests into automation behaviors on a device by forwarding the request to a 𝚍𝚛𝚒𝚟𝚎𝚛.

Usually, it works by sending the command from your HTTP request to a component called a driver. This driver is responsible for executing the automation on a particular device and platform.

The driver returns the result of the command to Appium, which then sends it back to the Appium client as an HTTP response. This response is parsed into a format that your test code can use.

After the automation is executed, the driver determines whether the action was successful or, if the command was meant to retrieve information from the device, generates the appropriate response. This result is sent back to Appium, which then formats it into an HTTP response following the WebDriver protocol. The response is returned to your Appium client library, which parses it into a format native to your programming language - for example, a specific class in Java or a Python object. You can then interact with this object just like any other in your language, making it easy and convenient to automate using Appium from your preferred programming environment.
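
The round trip described above can be sketched in plain Python with no network at all. Everything here is hypothetical and exists only for illustration: ToyDriver, ToyServer, the two-entry routing table, and the IDs abc123 and el-1 are invented names, not part of Appium.

```python
import json


class ToyDriver:
    """Stands in for a platform driver that 'executes' commands on a pretend device."""

    def execute(self, command, params):
        if command == "getText":
            return "Welcome"  # pretend this string was read from the device
        if command == "click":
            return None  # actions with no return value yield null
        raise ValueError(f"unknown command: {command}")


class ToyServer:
    """Stands in for the server: maps a request to a command and forwards it."""

    ROUTES = {
        ("GET", "text"): "getText",
        ("POST", "click"): "click",
    }

    def __init__(self, driver):
        self.driver = driver

    def handle(self, method, route, body=None):
        # In this toy table, the last path segment names the command.
        command = self.ROUTES[(method, route.rsplit("/", 1)[-1])]
        result = self.driver.execute(command, body or {})
        # The result travels back as a JSON document under a "value" key.
        return json.dumps({"value": result})


server = ToyServer(ToyDriver())
raw = server.handle("GET", "/session/abc123/element/el-1/text")
text = json.loads(raw)["value"]  # the client side parses JSON into a native str
print(text)
```

The real server does far more (session management, capability matching, plugins), but the shape is the same: a request comes in, the driver does the work, and a JSON result goes back.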

WebDriver Protocol

A browser automation API built on HTTP and object-oriented design

Let’s take a closer look at the WebDriver protocol. At its core, it’s an object-oriented API for automating browsers, and it operates over HTTP. But what does that mean?

“Object-oriented” refers to how different elements in the browser - like windows, elements, or frames - are treated as objects. These objects are recognized and maintained by both the API and the browser itself, allowing them to persist and be interacted with consistently throughout an automation or test session.

WebDriver Commands Use HTTP Methods, Routes, and JSON Bodies

The WebDriver protocol provides a set of commands - things like retrieving a page’s source, locating elements, or typing into a text field. These commands often require parameters to define how they behave or what they act on.

In WebDriver, command parameters are expressed in three main ways:

HTTP Method: The protocol uses just three - GET, POST, and DELETE - to distinguish the type of action.
Route: This is the part of the URL after the domain (e.g., in example.com/session/123, the /session/123 is the route). It helps identify what resource the command targets.
JSON Body: For POST requests, additional parameters are often included in a JSON payload. This is where things like element selectors or text inputs are specified.

Together, the HTTP method, route, and body define the structure and intent of each WebDriver command.
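
To make the three parts concrete, here is a minimal sketch that models a command as a small data structure. WebDriverCommand and describe() are hypothetical names chosen for this example, and the session ID 123 is a placeholder.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WebDriverCommand:
    method: str                  # GET, POST, or DELETE
    route: str                   # the path after the host, e.g. /session/123/element
    body: Optional[dict] = None  # JSON parameters, used by POST commands

    def describe(self) -> str:
        payload = "" if self.body is None else " " + json.dumps(self.body)
        return f"{self.method} {self.route}{payload}"


find = WebDriverCommand("POST", "/session/123/element", {"using": "xpath", "value": "//button"})
source = WebDriverCommand("GET", "/session/123/source")

print(find.describe())
print(source.describe())
```

Notice that the GET command needs no body at all: the method and route alone say everything the server needs to know.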

Command responses are returned as JSON objects

After a command is executed, the result is sent back to your test script as a JSON-formatted response. This is how the WebDriver API communicates outcomes - whether it’s a success, failure, or the data you requested.
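
As a rough illustration, the sketch below unwraps a response body, assuming the W3C shape in which every result sits under a top-level "value" key and a failure carries "error" and "message" fields inside it. WebDriverError and unwrap() are hypothetical helpers; a real client library would also look at the HTTP status code.

```python
import json


class WebDriverError(Exception):
    """Raised when the response body carries an error instead of a result."""


def unwrap(raw_body: str):
    value = json.loads(raw_body)["value"]
    # An error response is an object with an "error" code and a "message".
    if isinstance(value, dict) and "error" in value:
        raise WebDriverError(f'{value["error"]}: {value.get("message", "")}')
    return value


ok = unwrap('{"value": "Welcome"}')

try:
    unwrap('{"value": {"error": "no such element", "message": "not found", "stacktrace": ""}}')
except WebDriverError as exc:
    failure = str(exc)

print(ok)
print(failure)
```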

That’s the basic structure of how WebDriver works under the hood. It may sound a bit abstract when explained in theory, so let’s dive into a few concrete examples to bring it to life.

WebDriver Protocol API Examples

𝙿𝙾𝚂𝚃 /𝚜𝚎𝚜𝚜𝚒𝚘𝚗 - Starts a new WebDriver session with configuration parameters; returns a unique session ID.

This is our first WebDriver command example. It uses the POST method with the /session route to tell the server - like Appium - that we want to initiate a new automation session.

To do this, the client sends a JSON body along with the request. This body includes capabilities like the device type, platform, app details, and more. The server uses this information to configure the session accordingly.

In response, the server returns a JSON object containing a unique session ID. This ID represents the session and must be included in future requests. Even though each HTTP request is stateless by nature, the server keeps track of the session using this ID behind the scenes.

This is where the object-oriented nature of the WebDriver API begins to show: the session becomes the first persistent “object” created in your automation workflow.
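
Here is a small sketch of what the request body and the response might look like, assuming the W3C format (capabilities wrapped in alwaysMatch/firstMatch, and the session ID returned under value.sessionId). The helper names, the capability values, and the session ID are placeholders invented for the example.

```python
import json


def new_session_payload(capabilities: dict) -> str:
    """Wrap capabilities the way a W3C New Session request expects them."""
    return json.dumps({"capabilities": {"alwaysMatch": capabilities, "firstMatch": [{}]}})


def session_id_from(raw_body: str) -> str:
    """Pull the session ID out of a New Session response."""
    return json.loads(raw_body)["value"]["sessionId"]


payload = new_session_payload({"platformName": "iOS", "appium:automationName": "XCUITest"})

# A made-up response; a real server generates its own ID and echoes the matched capabilities.
sample_response = '{"value": {"sessionId": "hypothetical-session-id", "capabilities": {"platformName": "iOS"}}}'
session_id = session_id_from(sample_response)

print(payload)
print(session_id)
```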

𝙿𝙾𝚂𝚃 /𝚜𝚎𝚜𝚜𝚒𝚘𝚗/:𝚜𝚎𝚜𝚜𝚒𝚘𝚗_𝚒𝚍/𝚎𝚕𝚎𝚖𝚎𝚗𝚝 – Locate an element in the UI. Return an element ID.

The next API command we’ll look at is find element. This command relies on an existing WebDriver session, so it requires a valid session ID, which you would have received earlier from a POST /session call.

The command is defined using the POST method and the route /session/:session_id/element. The :session_id placeholder represents the ID of the active session. You can’t use this command without first creating a session and referencing its ID - this tells the server which session context you're working in.

Along with the session ID, the client sends a JSON payload that describes what element to find - usually a locator strategy (such as ID, XPath, or accessibility label) together with the selector value to look for.

If Appium successfully finds the element, it doesn’t send the actual UI element back - that’s not possible over the network. Instead, it generates a unique element ID, which it returns to the client. This ID acts as a reference to the real UI element on the device.

This introduces the second object in the WebDriver model: the element object. While the physical UI element only exists on the device, your test script interacts with it indirectly via the element ID.
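
The sketch below shows the find-element payload and how an element ID could be read from the reply, assuming the W3C response shape in which the element reference is a JSON object keyed by a fixed identifier string. The helper names, the selector "login_button", and the ID "el-42" are invented for the example.

```python
import json

# The key the W3C spec uses to mark a JSON object as an element reference.
W3C_ELEMENT_KEY = "element-6066-11e4-a52e-4f735466cecf"


def find_element_body(strategy: str, selector: str) -> dict:
    """Build the JSON body: which locator strategy to use, and what to look for."""
    return {"using": strategy, "value": selector}


def element_id_from(raw_body: str) -> str:
    """Extract the opaque element ID from a find-element response."""
    return json.loads(raw_body)["value"][W3C_ELEMENT_KEY]


body = find_element_body("accessibility id", "login_button")

# A made-up response; the ID is only meaningful to the server that issued it.
sample_response = json.dumps({"value": {W3C_ELEMENT_KEY: "el-42"}})
element_id = element_id_from(sample_response)

print(body)
print(element_id)
```

The ID itself is just a string. It has no meaning outside the session that created it, which is why every element command also carries the session ID in its route.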

𝙿𝙾𝚂𝚃 /𝚜𝚎𝚜𝚜𝚒𝚘𝚗/:𝚜𝚎𝚜𝚜𝚒𝚘𝚗_𝚒𝚍/𝚎𝚕𝚎𝚖𝚎𝚗𝚝/:𝚎𝚕𝚎𝚖𝚎𝚗𝚝_𝚒𝚍/𝚌𝚕𝚒𝚌𝚔 – Tap on a specific element within an active session.

This command demonstrates how to use an element ID to perform an action. It uses the POST method and the route /session/:session_id/element/:element_id/click.

In this case, :session_id refers to the active automation session, and :element_id identifies a specific UI element previously located using a find element command.

Together, these IDs tell Appium: “Within this session, tap on this particular element.”

The client sends this command, and Appium performs the click (or tap) action on the device.
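
As a sketch, this is how the click request could be assembled with Python's standard library. The request is only built, never sent. The base URL assumes a local server on Appium's default port 4723 with no extra base path, and the IDs are placeholders.

```python
import json
from urllib.request import Request

BASE_URL = "http://127.0.0.1:4723"  # assumption: local server, default port, base path "/"


def click_request(session_id: str, element_id: str) -> Request:
    url = f"{BASE_URL}/session/{session_id}/element/{element_id}/click"
    return Request(
        url,
        data=json.dumps({}).encode("utf-8"),  # click takes no parameters, so the body is {}
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = click_request("abc123", "el-1")
print(req.get_method(), req.full_url)
```

Passing the object to urllib.request.urlopen would actually send it, which only makes sense when a server and a live session exist.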

𝙿𝙾𝚂𝚃 /𝚜𝚎𝚜𝚜𝚒𝚘𝚗/:𝚜𝚎𝚜𝚜𝚒𝚘𝚗_𝚒𝚍/𝚎𝚕𝚎𝚖𝚎𝚗𝚝/:𝚎𝚕𝚎𝚖𝚎𝚗𝚝_𝚒𝚍/𝚟𝚊𝚕𝚞𝚎 – Send text input to a found UI element.

Here’s another example following the same API structure. This command uses an existing session and a previously located element to send keystrokes - typically to a text field.

The route ends in /value and the request includes a JSON body with the text to be entered. Appium then sends that text to the specified element on the device.
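
A minimal sketch of that request, assuming the W3C body shape in which the text travels under a "text" key (the older JSON Wire Protocol used a "value" list instead). send_keys_command is a hypothetical helper and the IDs are placeholders.

```python
import json


def send_keys_command(session_id: str, element_id: str, text: str):
    """Return the (method, route, JSON body) triple for typing into an element."""
    route = f"/session/{session_id}/element/{element_id}/value"
    return "POST", route, json.dumps({"text": text})


method, route, body = send_keys_command("abc123", "el-1", "hello@example.com")
print(method, route, body)
```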

Once you understand this pattern, the rest of the WebDriver API becomes easy to follow.

𝙶𝙴𝚃 /𝚜𝚎𝚜𝚜𝚒𝚘𝚗/:𝚜𝚎𝚜𝚜𝚒𝚘𝚗_𝚒𝚍/𝚎𝚕𝚎𝚖𝚎𝚗𝚝/:𝚎𝚕𝚎𝚖𝚎𝚗𝚝_𝚒𝚍/𝚝𝚎𝚡𝚝 – Retrieve the visible text of a UI element.

Not every WebDriver command uses POST; commands that only read information use GET. This command retrieves the text currently displayed in a specific UI element, using both the session ID and element ID.

It tells Appium: “What’s the current text shown in this element?” - perfect for assertions or validations during a test.
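
Here is a sketch of how such a check might look at the protocol level. The response body is a made-up sample, and text_command and check_text are hypothetical helpers rather than anything a client library exposes.

```python
import json


def text_command(session_id: str, element_id: str):
    """GET commands carry no body: the route alone identifies what to read."""
    return "GET", f"/session/{session_id}/element/{element_id}/text"


def check_text(raw_body: str, expected: str) -> str:
    """Compare the returned text with the expected value, as a test assertion would."""
    actual = json.loads(raw_body)["value"]
    if actual != expected:
        raise AssertionError(f"expected {expected!r}, got {actual!r}")
    return actual


method, route = text_command("abc123", "el-1")
shown = check_text('{"value": "Welcome back"}', "Welcome back")
print(method, route, shown)
```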

𝙳𝙴𝙻𝙴𝚃𝙴 /𝚜𝚎𝚜𝚜𝚒𝚘𝚗/:𝚜𝚎𝚜𝚜𝚒𝚘𝚗_𝚒𝚍 - Ending a WebDriver Session

After you’ve completed your automation steps, it’s important to end the session. This signals the Appium server to clean up and free the device for future use.

To do this, you send a DELETE request to /session/:session_id, specifying which session to close. It’s the final step in the test lifecycle, wrapping up the interaction cleanly.
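
Because a forgotten session can keep a device busy, it helps to make the DELETE call unconditional. The sketch below uses a context manager for that. It is one possible pattern, not how any particular client library does it, and fake_send is a stand-in that records calls instead of talking to a server.

```python
from contextlib import contextmanager


@contextmanager
def session(send, capabilities):
    """Create a session, yield its ID, and always delete it afterwards."""
    response = send("POST", "/session", {"capabilities": {"alwaysMatch": capabilities}})
    session_id = response["value"]["sessionId"]
    try:
        yield session_id
    finally:
        # Runs even when a test step raises, so the device is released.
        send("DELETE", f"/session/{session_id}", None)


calls = []


def fake_send(method, route, body):
    """Hypothetical transport: records each call and returns canned replies."""
    calls.append((method, route))
    if route == "/session":
        return {"value": {"sessionId": "abc123"}}
    return {"value": None}


try:
    with session(fake_send, {"platformName": "Android"}) as sid:
        raise RuntimeError("a test step failed")
except RuntimeError:
    pass

print(calls)
```

Even though the test step fails, the last recorded call is the DELETE for the session that was opened.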

Once you understand this pattern, the rest of the WebDriver API simply builds on it with more commands.

You don’t need to understand the raw HTTP layer to write Appium tests - your client library handles all of that behind the scenes in your preferred programming language. But it’s helpful to know how the WebDriver API works under the hood. In fact, you could run tests without writing any code at all - just by sending HTTP requests directly using tools like cURL or Postman.
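
To illustrate the point about cURL, here is a small helper that renders a command triple as a cURL command line you could paste into a terminal. curl_command is a hypothetical helper, and the base URL assumes a local server on the default port 4723.

```python
import json
import shlex


def curl_command(method, route, body=None, base_url="http://127.0.0.1:4723"):
    """Render a WebDriver command as a shell-safe cURL invocation."""
    parts = ["curl", "-X", method, base_url + route]
    if body is not None:
        parts += ["-H", "Content-Type: application/json", "-d", json.dumps(body)]
    # shlex.quote protects the JSON body and header from shell interpretation.
    return " ".join(shlex.quote(part) for part in parts)


get_source = curl_command("GET", "/session/abc123/source")
find_login = curl_command("POST", "/session/abc123/element", {"using": "id", "value": "login"})

print(get_source)
print(find_login)
```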

WebDriver Protocol Versions

There are essentially two versions of the WebDriver protocol:

The older JSON Wire Protocol, used by Selenium and Appium in earlier years
The W3C WebDriver spec, now the official standard and widely adopted

Most modern Appium and Selenium servers support the W3C spec, so you’ll likely never encounter the old one. But if you do run into strange behavior or compatibility issues, it might be due to an outdated client or server still using the JSON Wire Protocol.

While both versions are very similar - many commands are even identical - the main differences lie in request/response formatting and parameter structures.
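
One visible formatting difference is the new-session exchange. The sketch below contrasts the two request bodies and uses the presence of a top-level "status" field as a rough heuristic for spotting a JSON Wire Protocol reply. Both helper names are invented, and the heuristic is an assumption for illustration, not an official detection method.

```python
def new_session_body(caps: dict, dialect: str) -> dict:
    """Build the New Session body in either dialect."""
    if dialect == "w3c":
        return {"capabilities": {"alwaysMatch": caps, "firstMatch": [{}]}}
    if dialect == "jsonwp":
        return {"desiredCapabilities": caps}
    raise ValueError(f"unknown dialect: {dialect}")


def detect_dialect(response: dict) -> str:
    # JSON Wire Protocol replies carry a numeric top-level "status";
    # W3C replies put everything under "value".
    return "jsonwp" if "status" in response else "w3c"


caps = {"platformName": "Android"}
print(new_session_body(caps, "w3c"))
print(new_session_body(caps, "jsonwp"))

old_style = {"sessionId": "abc123", "status": 0, "value": {"platformName": "Android"}}
new_style = {"value": {"sessionId": "abc123", "capabilities": {"platformName": "Android"}}}
print(detect_dialect(old_style), detect_dialect(new_style))
```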

Client/Server Communication Flow

Let’s break down how an Appium test actually runs — step by step — focusing on the client-server communication that drives it all.

Every Appium test starts with the client, which is responsible for initializing the test session. The first thing the client does is define the capabilities (a.k.a. caps): a set of key-value pairs that specify the parameters for the session. Think of it as a configuration blueprint: what platform, OS version, device name, and app under test you’re targeting.

For example, specifying platformName: iOS, platformVersion: 26, and deviceName: iPhone Simulator tells Appium to launch an iOS 26 simulator. The app capability points to the path of your .app or .ipa file. These are essential capabilities you'll use frequently.

There are also optional capabilities, like noReset, which tells Appium to skip resetting the app state between sessions. While not always needed, these flags can significantly affect how tests behave across runs. There are more than 100 capabilities available, so it's worth exploring them based on your use case.

Depending on the language you’re writing in, these capabilities might look like a dictionary in Python or a custom object in Java — but ultimately, they’re sent to the Appium server as a JSON payload.
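
In Python, the capabilities above might start life as a plain dictionary. One detail worth sketching: under the W3C spec, capability names that are not part of the standard need a vendor prefix, which for Appium is appium:. The helper below adds that prefix by hand. In practice a client library may do this for you, and with_vendor_prefix, the app path, and the device values are placeholders for illustration.

```python
import json

# Capability names defined by the W3C WebDriver spec itself need no vendor prefix.
W3C_STANDARD = {
    "browserName", "browserVersion", "platformName", "acceptInsecureCerts",
    "pageLoadStrategy", "proxy", "setWindowRect", "timeouts",
    "unhandledPromptBehavior", "strictFileInteractability",
}


def with_vendor_prefix(caps: dict, prefix: str = "appium") -> dict:
    """Prefix every non-standard capability name that is not already prefixed."""
    prefixed = {}
    for name, value in caps.items():
        if name in W3C_STANDARD or ":" in name:
            prefixed[name] = value
        else:
            prefixed[f"{prefix}:{name}"] = value
    return prefixed


caps = {
    "platformName": "iOS",
    "platformVersion": "26",
    "deviceName": "iPhone Simulator",
    "app": "/path/to/MyApp.app",  # hypothetical path
    "noReset": True,
}

prefixed = with_vendor_prefix(caps)
print(json.dumps(prefixed, indent=2))
```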

Once the capabilities are sent, the Appium server processes the request, determines the type of automation needed (e.g., iOS, Android), and launches the corresponding driver. At this point, the test session is officially live, and your application may also be launched on the target device.

In return, Appium provides a session ID, which the client retains behind the scenes. As a test author, you typically don’t deal with this ID directly — the client library manages it for you to simplify subsequent command interactions.

Now that the session is active, the client can send automation commands such as finding UI elements, clicking buttons, or verifying text. Each command is translated into an HTTP request (with a JSON body where parameters are needed), sent to the server, routed to the correct driver, executed on the device, and then returned to the client with a result (or null if no return value is needed).

The client parses this response and presents it in a native format (e.g., an object, string, or boolean) for your test code to use. This loop — send command, get result, assert or verify — continues until your test logic is complete.

Finally, when the test is done, you call driver.quit() (or the equivalent in your language) to end the session. Appium then shuts down the app, releases the device, and performs cleanup tasks to prepare for future sessions.

This entire lifecycle — from capability negotiation to session teardown — is governed by the WebDriver protocol, which Appium extends for mobile automation.
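
To tie the lifecycle together, here is a toy client that keeps the session ID internally, so the calling code never touches it. MiniClient and fake_transport are hypothetical. The transport returns canned replies in place of a real server, which keeps the sketch runnable anywhere.

```python
class MiniClient:
    """Hypothetical stand-in for a client library; `transport` does the HTTP work."""

    def __init__(self, transport, capabilities):
        self._transport = transport
        reply = transport("POST", "/session", {"capabilities": {"alwaysMatch": capabilities}})
        self._session_id = reply["value"]["sessionId"]  # retained behind the scenes

    def _call(self, method, suffix, body=None):
        route = f"/session/{self._session_id}{suffix}"
        return self._transport(method, route, body)["value"]

    def find_element(self, using, value):
        reference = self._call("POST", "/element", {"using": using, "value": value})
        return next(iter(reference.values()))  # the single value is the element ID

    def text_of(self, element_id):
        return self._call("GET", f"/element/{element_id}/text")

    def quit(self):
        self._call("DELETE", "")


def fake_transport(method, route, body):
    """Canned replies standing in for a real server."""
    if route == "/session":
        return {"value": {"sessionId": "s-1", "capabilities": {}}}
    if route.endswith("/element"):
        return {"value": {"element-6066-11e4-a52e-4f735466cecf": "e-1"}}
    if route.endswith("/text"):
        return {"value": "Welcome"}
    return {"value": None}


client = MiniClient(fake_transport, {"platformName": "iOS"})
element = client.find_element("accessibility id", "greeting")
label = client.text_of(element)
client.quit()
print(element, label)
```

The test code at the bottom reads like ordinary Python: no routes, no JSON, no session ID. That is the convenience a real client library provides on top of the protocol.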

Appium Protocol Extensions

The WebDriver spec was built for browsers, so it doesn’t support every action needed for mobile apps.

The WebDriver spec was originally designed for browser automation - so it doesn’t cover everything we might need when testing mobile apps. To bridge this gap, Appium introduces protocol extensions. These are custom commands that go beyond the standard spec, allowing Appium to support mobile-specific actions that aren’t possible in web automation. So how does Appium make this work? Let’s take a look…

Appium enhances the spec with custom extensions to support advanced mobile automation tasks.

These extensions unlock powerful automation capabilities - like pushing a file to a device’s file system, something irrelevant in browser automation but essential for mobile testing (e.g., uploading a photo to the camera roll on Android). They fill in the gaps, making mobile automation as versatile as it needs to be.

These extensions follow the WebDriver spec format but aren’t included in the official specification - they’re documented in the Appium docs.

Appium’s custom commands follow the same structure and style as those in the official WebDriver spec - they use the same HTTP methods and route formats. However, they aren’t listed in the W3C WebDriver documentation because they’re not part of the official standard. Instead, these extensions are documented in the Appium docs. The WebDriver protocol was designed to be extensible, so Appium’s additions are fully compliant - they just go beyond what’s needed for browser automation and cater specifically to mobile testing.
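
As an illustration of how an extension looks next to the standard commands, here is a sketch of a push-file request. The route /session/:session_id/appium/device/push_file and the "path" and "data" parameter names are my assumptions about Appium's push-file endpoint, so check the Appium docs for the version you run. The device path and file content are invented.

```python
import base64
import json


def push_file_command(session_id: str, remote_path: str, content: bytes):
    """Return the (method, route, JSON body) triple for pushing a file to the device."""
    route = f"/session/{session_id}/appium/device/push_file"  # assumed extension route
    body = {
        "path": remote_path,
        # Binary content cannot travel in JSON as-is, so it is base64-encoded.
        "data": base64.b64encode(content).decode("ascii"),
    }
    return "POST", route, json.dumps(body)


method, route, body = push_file_command("abc123", "/sdcard/Pictures/photo.txt", b"not really a photo")
print(method, route)
print(body)
```

Apart from the /appium/ segment in the route, nothing distinguishes this from a standard command: same method, same session-scoped route, same JSON body.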

Conclusion

Understanding Appium’s architecture makes you a more effective mobile automation engineer. By grasping the client-server model, WebDriver command flow, and mobile extensions, you’re equipped to tackle real-world testing challenges with confidence.

The beauty lies in Appium’s simplicity: every test command follows the same pattern of 𝚌𝚕𝚒𝚎𝚗𝚝 𝚛𝚎𝚚𝚞𝚎𝚜𝚝 → 𝚜𝚎𝚛𝚟𝚎𝚛 𝚙𝚛𝚘𝚌𝚎𝚜𝚜𝚒𝚗𝚐 → 𝚍𝚛𝚒𝚟𝚎𝚛 𝚎𝚡𝚎𝚌𝚞𝚝𝚒𝚘𝚗 → 𝚛𝚎𝚜𝚙𝚘𝚗𝚜𝚎. Understanding this flow helps you debug faster, optimize performance, and make informed automation decisions.

Ready for more? In Part 2, we’ll explore the ecosystem of Appium clients, drivers, and plugins that bring this architecture to life.

🐞 𝓗𝓪𝓹𝓹𝔂 𝓣𝓮𝓼𝓽𝓲𝓷𝓰 & 𝓓𝓮𝓫𝓾𝓰𝓰𝓲𝓷𝓰!

P.S. If you’re finding value in my articles and want to support the book I’m currently writing - Appium Automation with Python - consider becoming a supporter on Patreon. Your encouragement helps fuel the late-night writing, test case tinkering, and coffee runs. ☕📚
👉 patreon.com/LanaBegunova 💜
