
How to save time in your programming tasks
How multiprocessing and multithreading can speed up your Python programs.

Hey everyone! Just like every developer, I've been stuck in a bottleneck at work, and the solution I found might help you too. Briefly, the situation: I'm automating a stock market strategy (day trading), and I ran into a case where two techniques compete with each other. I could run them separately, but that compromises the profitability of the strategy as a whole, so I needed a way for both to run simultaneously, yet separately and independently. And there's more: the two techniques share common parameters.
The answer came after a training course on time series analysis in Python, where the concept of multiprocessing emerged as the magic formula for my problem. Its two main advantages are the number of tasks you can run at once and the time saved processing them all. So, before showing some examples of how to use it, let's level up the knowledge here: there are essentially two ways to manage different tasks in your program, multithreading and multiprocessing.
Threads are lightweight execution units with the power to transform the way programs operate, providing concurrency and parallelism within a process. If you delve deeply into the subject, you’ll notice a dynamic architecture that allows the “simultaneous” execution of multiple tasks, enhancing the potential for performance and responsiveness. Multitasking, where several tasks are performed at the same time, becomes possible through the concept of multithreading.
Each thread has its own state — whether it’s runnable, blocked, or running. These dynamic states are managed by the operating system, which orchestrates the dance between the threads, allowing them to cooperate and share resources without conflicts. The creation and management of threads involve the programmer’s skill in ensuring that these entities are orchestrated safely, avoiding errors and race conditions.
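To make the race-condition point concrete, here is a minimal sketch (not taken from my strategy) where a threading.Lock keeps a shared counter consistent while four threads increment it; without the lock, the unsynchronized `counter += 1` could lose updates:

```python
import threading

counter = 0
lock = threading.Lock()

def increment(n):
    # Each thread bumps the shared counter n times.
    global counter
    for _ in range(n):
        with lock:  # without the lock, counter += 1 can race
            counter += 1

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 400000 with the lock in place
```

The `with lock:` block is the orchestration the paragraph above describes: it forces the read-modify-write on `counter` to happen one thread at a time.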
When I tested multithreading, I had already solved my problem, but another one emerged: the program’s runtime almost doubled, which became too costly for me to continue in that way.
Later, when I looked into it closely, I understood how the Global Interpreter Lock (GIL) comes into play. The GIL is a unique piece in the Python puzzle: it acts like an orchestrator, allowing only one thread to execute Python bytecode at a time. This creates a scenario where, even on systems with multiple cores, the simultaneous execution of Python code is limited. The GIL is a double-edged sword: while it prevents some race conditions, it can also become a bottleneck for applications that want to take full advantage of multi-core systems.
The good news is that there is an elegant workaround to overcome the limitations of the GIL: multiprocessing. I discovered this module while exploring ways to save time in training prediction and classification models, and this was during the training course I mentioned earlier.
Multiprocessing in Python is an approach that circumvents the GIL problem by creating separate processes instead of threads. Each process has its own Python interpreter and memory space, so there is no interference from the limitations imposed by the GIL. This allows multiple Python processes to run in parallel and effectively utilize the available CPU resources on multi-processor systems.
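To illustrate that isolation, here is a small sketch: a child process appends its PID to a module-level list, but because the child has its own interpreter and memory space, the parent's copy of the list is untouched:

```python
import os
from multiprocessing import Process

data = []  # lives in the parent's memory space

def worker():
    # Runs in a separate interpreter: this append is invisible to the parent.
    data.append(os.getpid())
    print(f'child pid: {os.getpid()}, data in child: {data}')

if __name__ == '__main__':
    p = Process(target=worker)
    p.start()
    p.join()
    # The parent's list is still empty: processes do not share memory by default.
    print(f'parent pid: {os.getpid()}, data in parent: {data}')
```

This is exactly why the GIL stops being a problem here, and also why sharing data between processes requires explicit tools, which I'll come back to below.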
Now that you know what multiprocessing is, the good news is you don't need to install anything: the multiprocessing module ships with Python's standard library. I'll start with a simple code snippet that uses the Python time module to simulate methods that take a while to run, executed inside a Jupyter Notebook. Here's something that may hinder reproducing this experiment: the behavior of the multiprocessing module can vary depending on the environment you are using. In Jupyter's case, the target functions need to live in a separate module and be imported from there. Several people have reported this kind of issue on Stack Overflow, feel free to check it later.
# methods.py — the functions imported by the notebook
from time import sleep

def timer_2sec(loops):
    for i in range(loops):
        sleep(2)

def timer_5sec(loops):
    for i in range(loops):
        sleep(5)

def timer_7sec(loops):
    for i in range(loops):
        sleep(7)
from methods import timer_2sec, timer_5sec, timer_7sec
from time import perf_counter

print('Starts...')
b1 = perf_counter()
timer_2sec(3)  # 3 x 2 s = 6 s
timer_5sec(1)  # 5 s
timer_7sec(1)  # 7 s
e1 = perf_counter()
print(f'Time diff: {e1 - b1}')

Nothing new here: I executed these methods to check how long this code usually takes to run. The result was about 18 seconds (3 × 2 + 5 + 7 = 18). Fine, but when multiprocessing comes into play, things start to get different.
Check out this code below:
from time import perf_counter
from methods import timer_2sec, timer_5sec, timer_7sec
from multiprocessing import Process

if __name__ == '__main__':
    p1 = Process(target=timer_2sec, args=(3,))
    p2 = Process(target=timer_5sec, args=(1,))
    p3 = Process(target=timer_7sec, args=(1,))

    print('Starts...')
    b = perf_counter()
    p1.start()
    p2.start()
    p3.start()
    p1.join()
    p2.join()
    p3.join()
    e = perf_counter()
    print('... finished \n')
    print(f'time spent: {e - b}')
Here, I create instances of Process to execute each of the methods. The function to run is indicated by the target parameter, and its arguments are passed in args, which only accepts tuples. The multiprocessing calls need to be indented inside an if __name__ == '__main__': block. For more details on how to use the module, check the official multiprocessing documentation. Calling .start() on a created instance kicks off the process, and .join() makes the program wait for the processes to finish before collecting the results.

Using multiprocessing, the three methods finished in about 7 seconds, the duration of the longest one, a saving of 11 seconds: less than half the sequential runtime. Impressive, isn't it?
This example was meant to show you how to apply the module directly, without any fuss, and to prove how it can be a game-changer in your day-to-day tasks. As for the problem I mentioned earlier, to make processes communicate through shared parameters, you can use classes from the module itself, such as Value and Array (from multiprocessing import Value, Array), where you designate a specific value or a list to be shared.
This has made my life significantly easier. Knowledge I picked up in classes on making machine learning model training more time-efficient can be applied in any field, regardless of your area of expertise. Today, I'm using it to diversify the way I:
- Run tests on databases;
- Reduce the execution time of machine learning training tasks;
- Speed up automated tests for web app interfaces with Selenium…
See ya!