
Boost Your Data Analysis Efficiency
Save Time Looping Through DataFrames
A few months ago, I shared a tip on how to optimize your code, and I was thrilled to receive such positive feedback from the community. If you missed it, check out my post on using Python's Multiprocessing package. In that post, I demonstrated how to divide your code to run in parallel across different processes, significantly improving performance.
Today, I'm excited to share another time-saving tip that will help keep data analysts motivated and efficient in their work. Stay tuned for a practical technique that will streamline your data analysis process!

Nowadays, it's becoming increasingly common to use AI agents to assist with daily work activities. Today, anyone can access high-quality information for free and leverage it to adapt their workflow using a Large Language Model (LLM) trained specifically for their tasks.
Creating an assistant for yourself depends on several variables, which I'll be happy to explain in another post. Today, however, I want to focus on the time it takes to manipulate data, particularly the data used to train an LLM. One of the most exhausting tasks is preparing datasets for supervised machine learning components; getting a training dataset ready for ML takes a lot of effort.
Another example I've noticed lately is the time spent applying complex calculations to dataframes with 60k instances. If you have had to process large amounts of data, you've probably found yourself waiting, arms folded, for a Jupyter Notebook cell to finish executing: sometimes 10, 20, 40 seconds, or more. At first this might not seem like much, but over time, as you refactor code and consider new parameters, the waiting becomes discouraging. And trust me, it does!
So today, I will share a tip on how to iterate through a dataframe with a reasonable number of columns, making it easier to apply complex calculations to instances and cutting the time by a factor of 2-3, without using multiprocessing.
To prove this, let's conduct a small experiment with financial market data (yet again). I set aside a dataset from a variable-income asset, the mini-dollar futures contract (WDO), and applied the Relative Strength Index (RSI) calculation to the closing prices on a 5-minute timeframe. Think of the timeframe as the interval between each instance in the time series. In other words, the calculation runs over a dataframe spanning one month of trading for the asset, at 5-minute intervals.
The mathematical formula for the RSI is:
IFR = 100 - (100 / (1 + FR))
FR = MH / ML
Where:
- IFR (Índice de Força Relativa): the Portuguese name for the RSI (Relative Strength Index), an indicator that measures the relative strength of price movements.
- FR (Força Relativa): the relative strength, obtained by dividing MH, the average closing-price gain of upward movements, by ML, the average closing-price loss of downward movements.
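The formula above can be sketched in pandas like this. This is my own illustrative code, using the common 14-period rolling window and the simple-average variant of the indicator, not necessarily the exact implementation used in the experiment:

```python
import pandas as pd

def rsi(close: pd.Series, period: int = 14) -> pd.Series:
    """RSI = 100 - 100 / (1 + FR), with FR = MH / ML computed
    over a rolling window (simple-average variant)."""
    delta = close.diff()
    gains = delta.clip(lower=0)         # upward movements feed MH
    losses = -delta.clip(upper=0)       # downward movements feed ML
    mh = gains.rolling(period).mean()   # average gain
    ml = losses.rolling(period).mean()  # average loss
    fr = mh / ml
    return 100 - 100 / (1 + fr)

# Sanity check: a steadily rising series has no losses, so RSI -> 100
prices = pd.Series(range(1, 21), dtype=float)
print(rsi(prices).iloc[-1])  # 100.0
```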
I applied this calculation to a dataset of 2,506 instances. In Figure 1, I show the time it took to generate a result, and after a mere change in the code, I obtained the result in Figure 2.
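If you want to reproduce the measurement yourself, here is a minimal timing sketch using the standard library's time.perf_counter. The dataset here is random stand-in data rather than the WDO data, so the absolute numbers will differ:

```python
import time

import numpy as np
import pandas as pd

# Stand-in for the 2,506-row WDO dataset
df = pd.DataFrame({"close": np.random.rand(2_506)})

start = time.perf_counter()
for row in df.itertuples():
    _ = row.close                 # attribute access by column name
t_tuples = time.perf_counter() - start

start = time.perf_counter()
for row in df.values:
    _ = row[0]                    # positional access on a NumPy row
t_values = time.perf_counter() - start

print(f"itertuples: {t_tuples:.4f}s, values: {t_values:.4f}s")
```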


Isn't that a significant improvement? Less than half the time it used to take. I'll provide more details now.
As I mentioned before, when you're analyzing data in a dataframe, you're probably iterating over it like this:
for index, row in enumerate(df.itertuples()):
    # rest of the code here, reading values as row.column_name
And there's nothing wrong with using it like this, but, from what I understand, pandas takes quite some time to associate each row value with its respective column name.

So, instead of reading the row values with "row.column", change the way you iterate and treat each row like a list, indexed by position. To do this, start by replacing "df_test.itertuples()" with "df_test.values". Notice in Figures 3 and 4 that the first row of both is the same, but one is in dataframe format, while in the other each row is a list (a NumPy array, to be precise).
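In code, the switch looks like this. This is a small self-contained sketch; the df_test contents and column names are illustrative, not the article's actual dataset:

```python
import pandas as pd

df_test = pd.DataFrame({"open": [1.0, 2.0, 3.0], "close": [1.5, 2.5, 3.5]})

# Before: each row is a namedtuple, values read by column name
total_tuples = sum(row.close for row in df_test.itertuples())

# After: df_test.values yields plain NumPy rows, read by position
CLOSE = df_test.columns.get_loc("close")  # position looked up once
total_values = sum(row[CLOSE] for row in df_test.values)

print(total_tuples, total_values)  # 7.5 7.5
```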

You might be wondering,
"Sure, it improves processing speed, but won't using a list compromise code readability? Associating index numbers with dataframe columns might not seem like the most intuitive approach."
Fear not, as there's a solution. Instead of relying on raw index numbers, prepare a small lookup, such as a dictionary, that pairs each column name (key) with its positional index (value). This approach keeps your code clear while improving performance.
"So how does this work?"
Here's the breakdown: you still refer to the dataframe row data by column name, but behind the scenes you are effectively looking up an index and retrieving the value at that position in the row. Despite the added layer, this method proves faster than the standard way of iterating over a dataframe.
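One way to sketch that idea (again, the column names and values here are illustrative):

```python
import pandas as pd

df_test = pd.DataFrame({
    "open":  [10.0, 11.0],
    "high":  [10.5, 11.5],
    "close": [10.2, 11.2],
})

# Build the lookup once: column name (key) -> positional index (value)
col = {name: i for i, name in enumerate(df_test.columns)}

spreads = []
for row in df_test.values:
    # Reads almost like row.high and row.open, but resolves
    # to fast positional access on a NumPy row
    spreads.append(row[col["high"]] - row[col["open"]])

print(spreads)  # [0.5, 0.5]
```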
By implementing this approach, you can significantly enhance the processing speed of your dataframes without sacrificing code readability. It's a win-win situation that streamlines your data analysis workflow and boosts productivity.
Stay tuned for more insights on optimizing your Python code for efficient data analysis!