Pandas, like NumPy, is one of Python's data analysis libraries. It lets us filter rows of a DataFrame based on certain criteria, and doing so is really easy. For instance, say I have a DataFrame df with four columns: source IP address, source port, destination IP address, and destination port. When I want to search for a particular pair of source IP address and port, I can use the following line:

rows = df[(df["src_ip_addr"] == "") & (df["src_port"] == 12345)]

However, I found that this approach is really slow, particularly on a huge DataFrame. Surprisingly, there is a simple trick to speed it up: use the .values attribute in the filtering criteria. The previous command can be modified as follows:

rows = df[(df["src_ip_addr"].values == "") & (df["src_port"].values == 12345)]

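In my case, that change made the filter run about 4 times faster. The most likely reason is that .values hands back the underlying NumPy array, so the comparisons and the & run on plain arrays and skip the overhead of building intermediate Series objects.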
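If you want to check the effect on your own machine, here is a minimal sketch that times both variants. The column names follow the example above, but the data is synthetic and made up for illustration, and the helper names filter_series and filter_values are my own. The speedup you see will depend on your pandas version, column dtypes, and DataFrame size, so treat the numbers as indicative only.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data (an assumption for illustration): column names follow the
# post, but the addresses and ports are randomly generated.
rng = np.random.default_rng(0)
n = 200_000
df = pd.DataFrame({
    "src_ip_addr": rng.choice(["10.0.0.1", "10.0.0.2", "10.0.0.3"], size=n),
    "src_port": rng.integers(1024, 65536, size=n),
    "dst_ip_addr": rng.choice(["192.168.0.1", "192.168.0.2"], size=n),
    "dst_port": rng.integers(1, 1024, size=n),
})


def filter_series(frame, ip, port):
    """Filter by comparing the pandas Series directly."""
    return frame[(frame["src_ip_addr"] == ip) & (frame["src_port"] == port)]


def filter_values(frame, ip, port):
    """Filter by comparing the arrays returned by .values."""
    return frame[(frame["src_ip_addr"].values == ip)
                 & (frame["src_port"].values == port)]


# Take the pair from the first row so that at least one row matches.
ip = df.loc[0, "src_ip_addr"]
port = int(df.loc[0, "src_port"])

# Both variants should select exactly the same rows.
assert filter_series(df, ip, port).equals(filter_values(df, ip, port))

t_series = timeit.timeit(lambda: filter_series(df, ip, port), number=20)
t_values = timeit.timeit(lambda: filter_values(df, ip, port), number=20)
print(f"Series comparison:  {t_series:.4f} s for 20 runs")
print(f".values comparison: {t_values:.4f} s for 20 runs")
```

Note that the pandas documentation recommends .to_numpy() over .values in new code; swapping it into filter_values should behave the same way for this example.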
