How APM Turns Load Testing from Meh to Magic

Published on January 4, 2025

APM + Load Testing = The Ultimate Performance Power Couple

Before I start my TED Talk on load testing nightmares and the so-called ‘learnings’ along the way, can we at least agree on what load testing is?

Load testing is a way to see how well a system (like a website or app) can handle a large number of users simultaneously. It’s like seeing if a bridge can handle heavy traffic without collapsing. The goal is to find the breaking point and fix any weaknesses before real users face issues.

I only stumbled upon load testing six months into my job as a QA, and my definition of it has evolved over the last 1.5 years. My first attempt at load testing? I watched a random YouTube video on JMeter (an application used to run load tests), threw in 3–4 APIs, bombarded the app with 10,000 requests, and declared it a success because the machine didn’t explode (an exaggeration for “stop running”).

Classic, right?

It worked, so I thought I had aced load testing. Now I know I was missing about 99% of what load testing is supposed to check. I assumed that as long as there were no errors in the responses and no 500 errors (the HTTP status code for an internal server error), the load test was a success.

However, this approach is insufficient to ensure the application under test is truly robust and reliable.

The vitals in the JMeter summary report were useful, with information on:

Error %: The percentage of requests that failed during the test.
Throughput: The number of requests processed by the server per second or minute.
Average: The average response time (in milliseconds) for the samples collected.
Min: The minimum response time observed for the samples.
Max: The maximum response time recorded during the test.
Std. Dev.: The standard deviation of the response times, showing how much the response times deviate from the average. A higher value indicates more variability.
Received KB/sec: The amount of data received from the server per second, measured in kilobytes.
Sent KB/sec: The amount of data sent to the server per second, also in kilobytes.
Avg. Bytes: The average size of responses in bytes.
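
To make these figures less abstract, here is a minimal Python sketch of how most of them can be derived from raw samples. It is not JMeter's own code: the sample tuples, the `summarize` function and the test duration are hypothetical, and the population standard deviation is an assumption on my part. Sent KB/sec is left out because the samples here do not record request sizes.

```python
from statistics import mean, pstdev

# Hypothetical samples: (elapsed_ms, success, response_bytes)
samples = [
    (120, True, 512),
    (180, True, 2048),
    (950, False, 256),
    (150, True, 1024),
]
test_duration_s = 2.0  # assumed wall-clock length of the test run


def summarize(samples, duration_s):
    """Compute summary-report style figures from raw samples."""
    times = [elapsed for elapsed, _, _ in samples]
    failures = sum(1 for _, ok, _ in samples if not ok)
    total_bytes = sum(size for _, _, size in samples)
    return {
        "error_pct": 100.0 * failures / len(samples),
        "throughput_per_s": len(samples) / duration_s,
        "avg_ms": mean(times),
        "min_ms": min(times),
        "max_ms": max(times),
        "std_dev_ms": pstdev(times),  # assumption: population std. dev.
        "received_kb_per_s": total_bytes / 1024 / duration_s,
        "avg_bytes": total_bytes / len(samples),
    }


summary = summarize(samples, test_duration_s)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Notice how one slow, failed request (950 ms) drags the average far above what most users saw, which is exactly why Min, Max and Std. Dev. are worth reading alongside it.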

Why Is JMeter’s Summary Not Enough?

We gathered quite a bit of information from JMeter, but as a tester, it wasn’t enough for me to truly understand how the application performs in the wild (the real world).

The information that JMeter provides is specific to API performance, but there is more than API performance to monitor. This is where APM comes into the picture.

What is APM?

APM stands for Application Performance Monitoring (or Management). It’s like a health tracker for your app, showing you how well it’s running in real time. APM tools help you:

See how fast your app responds to users
Find and fix slow parts of your app
Understand how different parts of your app work together
Get alerts when something’s not working right

APM gives you a fuller picture of your app’s performance than just looking at API stats.

The stats include:

CPU %: The percentage of CPU resources utilized at a specific point in time.
Memory Usage: The amount of RAM currently in use.
Heap Memory Usage: The amount of memory the application has consumed out of the heap allocated to it. The application cannot grow beyond this allocation, so once the heap is exhausted it fails rather than interfering with the memory allocated to other applications.
Garbage Collection: The process in which the Java Virtual Machine (JVM) automatically reclaims memory by identifying and removing objects that are no longer in use, freeing up space for new object allocations and improving application performance.
Thread Pool Utilization: For multi-threaded applications (like Java ones), monitoring thread pool usage helps ensure there are enough available threads to handle requests efficiently and avoid bottlenecks.
Latency: The time taken for data to travel from the client to the server and back. High latency can result in slow application response times.
Apdex Score (Application Performance Index): A user satisfaction measurement based on response times. Each response is classified as satisfied, tolerating, or frustrated.
Database Query Performance: The time taken for database queries and the number of queries executed per request. Slow or frequent queries can impact overall application performance.

Let me share some conditions I have come across in my journey:

The CPU:

As a rule of thumb, CPU usage should stay below 80%, though it can peak at up to 90% during high-demand periods.

It’s also crucial to check the alerting systems during load testing, to confirm that when thresholds are reached there is enough time to scale up the CPU and minimize potential downtime. For events such as upcoming sales, we typically double the server capacity during the first and last few days, especially when working within a tight budget.

While these practices are common in established companies, in startups it’s important to communicate the expected maximum throughput (number of calls per unit time) to stakeholders so they can take the necessary precautions for a smooth workflow.
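
The 80% and 90% figures above can be turned into a simple check over CPU readings exported from an APM tool. This is only a sketch: `check_cpu` and the sample readings are hypothetical, and treating 80% as a limit on the average and 90% as a limit on the peak is my reading of the rule of thumb.

```python
def check_cpu(samples_pct, sustained_limit=80.0, peak_limit=90.0):
    """Return a list of warnings for a series of CPU % readings."""
    warnings = []
    average = sum(samples_pct) / len(samples_pct)
    peak = max(samples_pct)
    if average >= sustained_limit:
        warnings.append(
            f"average CPU {average:.1f}% is at or above {sustained_limit:.0f}%"
        )
    if peak > peak_limit:
        warnings.append(f"peak CPU {peak:.1f}% exceeds {peak_limit:.0f}%")
    return warnings


# Hypothetical readings taken at regular intervals during two test runs
steady_run = [55, 62, 71, 78, 88, 74, 60]
overloaded_run = [70, 85, 93, 96, 91, 89, 84]

print(check_cpu(steady_run))      # no warnings expected
print(check_cpu(overloaded_run))  # both limits are crossed
```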

Memory:

Adding RAM can boost application performance without the higher expense of upgrading the CPU.

It does not always help, though. It matters mainly when the application needs extra headroom in unusual situations, such as memory-hungry API calls (for example, data synchronization calls) that are not triggered very often. Increasing RAM is also not the only answer to high memory consumption. There is another elephant in the room that I found,

which is…

Heap Memory:

Heap memory is the portion of memory an application is allowed to use. Even if the virtual machine or on-prem machine has plenty of memory, an application whose heap size is left at a low default will only be able to use that allocated portion.

This constraint can lead to an OutOfMemoryError if the application tries to use more memory than the heap allows.
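
A small sketch of that situation, with made-up numbers: a machine with 16 GB of RAM running an application whose maximum heap is only 512 MB (on the JVM, the maximum heap is usually set with the `-Xmx` option). The function name, the 85% warning ratio and the values are all assumptions for illustration.

```python
def heap_report(used_mb, max_heap_mb, machine_ram_mb, warn_ratio=0.85):
    """Compare heap usage with the heap limit, not with total machine RAM."""
    utilisation = used_mb / max_heap_mb
    return {
        "heap_utilisation": utilisation,
        "near_limit": utilisation >= warn_ratio,
        # RAM the application cannot touch, however busy it gets
        "ram_outside_heap_mb": machine_ram_mb - max_heap_mb,
    }


report = heap_report(used_mb=480, max_heap_mb=512, machine_ram_mb=16384)
print(report)
```

Here the heap is almost full even though most of the machine's memory is untouched, so adding more RAM alone would change nothing.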

Garbage Collection:

If garbage collection occurs in a regular rhythm, the system is mostly stable.

However, if it becomes erratic or irregular, there may be a memory leak that needs to be addressed. We could dive deep into the rabbit hole of different garbage collection algorithms, but those are for another day. When there is an anomaly in garbage collection, the developer usually takes a memory dump and analyses it. As a tester, you can also take part in finding the root cause.

A Good Performing Garbage Collector Graph

A Struggling Garbage Collector due to Memory Leak
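
One rough way to tell the two patterns apart is to look at how much heap is still in use right after each garbage collection. If that floor keeps climbing, memory is not being released. The sketch below fits a straight line to those readings. The data, the function name and the idea of using a slope as the signal are my own assumptions, not something an APM tool necessarily does.

```python
def post_gc_slope(heap_after_gc_mb):
    """Least-squares slope (MB per collection) of heap usage after each GC."""
    n = len(heap_after_gc_mb)
    mean_x = (n - 1) / 2
    mean_y = sum(heap_after_gc_mb) / n
    covariance = sum(
        (x - mean_x) * (y - mean_y) for x, y in enumerate(heap_after_gc_mb)
    )
    variance = sum((x - mean_x) ** 2 for x in range(n))
    return covariance / variance


# Hypothetical heap usage (MB) measured right after six collections
healthy = [200, 205, 198, 202, 199, 201]
leaking = [200, 240, 285, 330, 372, 415]

print(round(post_gc_slope(healthy), 2))  # close to zero: floor is flat
print(round(post_gc_slope(leaking), 2))  # clearly positive: floor keeps rising
```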

Thread Pool Utilization:

When working with a Java-based web application, multiple threads are created to handle incoming requests. In scenarios involving high demand or memory leaks, the number of threads can grow excessively, potentially causing the server to crash. To mitigate this, it is recommended to configure proper timeouts to terminate threads that remain idle or unresponsive for too long.

While increasing the thread pool size is a cost-effective way to handle sudden traffic spikes, this solution is temporary. If the traffic surge persists, the CPU may struggle to process tasks as fast as they accumulate in memory. This imbalance can lead to performance bottlenecks, eventually requiring a CPU upgrade to maintain stability and responsiveness.
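
The idea of a bounded pool plus timeouts can be sketched in Python with the standard `concurrent.futures` module. The handler, the delays and the limits are hypothetical. One caveat: Python cannot forcibly kill a worker thread, so the timeout here only stops the caller from waiting; a Java server would configure this differently.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError


def handle_request(delay_s):
    """Hypothetical request handler that just sleeps for delay_s seconds."""
    time.sleep(delay_s)
    return "done"


def run_with_timeout(delays, max_workers=2, timeout_s=0.3):
    """Run handlers on a bounded pool, giving up on slow ones."""
    results = []
    # max_workers caps how many threads can ever exist for this pool
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(handle_request, d) for d in delays]
        for future in futures:
            try:
                results.append(future.result(timeout=timeout_s))
            except TimeoutError:
                # We stop waiting; the worker itself still runs to completion
                results.append("timed out")
    return results


results = run_with_timeout([0.01, 1.0])
print(results)
```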

Latency:

The acceptable latency varies depending on the type of application.

For example:

Stock exchanges and gaming platforms require extremely low average latency (around 17 milliseconds) for optimal performance.
Internal communications between microservices in certain applications might demand sub-millisecond latency.
For typical web-based applications, a latency of 30–40 milliseconds is considered ideal, while latencies below 1000 milliseconds are generally acceptable during peak demand periods.

While increasing the CPU capacity can help reduce latency, the improvement is not always linear. Additionally, user location plays a significant role in determining latency, as physical distance and network conditions can impact response times.

Apdex Score:

I was unfamiliar with this score until I researched the topic. It provides an overall assessment of how well an application performs.

The score ranges from 0 to 1, with 1 representing perfect user satisfaction. Each response is classified against a defined threshold T (typically 4 seconds): satisfied if it completes within T, tolerating if it takes between T and 4T, and frustrated beyond that. The more consistently your application responds within the threshold, the higher its Apdex score. This metric is especially valuable during load testing as it offers a clear, numerical representation of user satisfaction levels.

An excellent score ranges from 1.00 to 0.94, a good score falls between 0.93 and 0.85, a fair score is between 0.84 and 0.70, and a poor score is between 0.69 and 0.50. Any value lower than this is considered unacceptable. These thresholds may vary depending on the context of the application.

Apdex_T = (SatisfiedCount + (ToleratingCount * 0.5) + (FrustratedCount * 0)) / TotalSamples
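
Here is the formula as a short Python sketch. The response times are made up, the function names are mine, and the rating bands follow the ranges quoted above with 0.50 taken as the lower bound for a poor score.

```python
def apdex(response_times_s, t=4.0):
    """Apdex score for a list of response times, with threshold t in seconds."""
    satisfied = sum(1 for r in response_times_s if r <= t)
    tolerating = sum(1 for r in response_times_s if t < r <= 4 * t)
    # Frustrated responses (slower than 4 * t) contribute zero
    return (satisfied + tolerating * 0.5) / len(response_times_s)


def rating(score):
    """Map a score to the bands described in the text."""
    if score >= 0.94:
        return "excellent"
    if score >= 0.85:
        return "good"
    if score >= 0.70:
        return "fair"
    if score >= 0.50:
        return "poor"
    return "unacceptable"


# Hypothetical response times in seconds: 2 satisfied, 2 tolerating, 1 frustrated
times = [1.2, 3.9, 5.0, 12.0, 20.0]
score = apdex(times)
print(score, rating(score))
```

With two satisfied, two tolerating and one frustrated response, the score works out to (2 + 2 * 0.5) / 5 = 0.6, which lands in the poor band.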

Database Query Performance:

Slowness in the application might be caused by database queries taking too long to execute. You can identify these issues by monitoring the query execution times. For database-intensive applications, analyze the queries to identify opportunities for optimization and improve overall performance.
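
As an illustration, suppose the APM tool lets you export one record per query with the request it belonged to and how long it took. The log format, the 100 ms cut-off and the function name below are all hypothetical.

```python
from collections import Counter

# Hypothetical export: (request_id, sql, duration_ms)
query_log = [
    ("req-1", "SELECT * FROM orders WHERE user_id = ?", 12),
    ("req-1", "SELECT * FROM order_items WHERE order_id = ?", 340),
    ("req-2", "SELECT name FROM users WHERE id = ?", 4),
    ("req-2", "SELECT name FROM users WHERE id = ?", 5),
    ("req-2", "SELECT name FROM users WHERE id = ?", 6),
]


def analyse_queries(log, slow_ms=100):
    """Return the slow queries and the number of queries per request."""
    slow = [(sql, ms) for _, sql, ms in log if ms >= slow_ms]
    per_request = Counter(request_id for request_id, _, _ in log)
    return slow, dict(per_request)


slow_queries, queries_per_request = analyse_queries(query_log)
print(slow_queries)
print(queries_per_request)
```

The first request shows one slow query, while the second shows the same fast query repeated three times. Both patterns are worth raising with the developers.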

As a tester, having a foundation in computer science fundamentals and understanding application architecture can be invaluable. This knowledge enables more meaningful conversations with developers and DevOps teams when discussing performance optimization strategies.

When you understand concepts like memory management, threading, and database optimization, you can:

Provide more specific and actionable feedback during performance testing
Better interpret performance metrics and identify potential bottlenecks
Contribute meaningful suggestions for performance improvements
Collaborate more effectively with technical teams during optimization discussions

Remember, the goal isn’t to become an expert in development or DevOps, but to build enough technical knowledge to be a more effective bridge between quality assurance and technical teams. This collaborative approach leads to better performance outcomes for the application.

Happy Testing!

Author: M M Kishore
