Pandas, like NumPy, is one of Python's data analysis libraries. It lets us filter the rows of a DataFrame by some criteria, and doing so is really easy. For instance, say I have a DataFrame df
with four columns: source IP address, source port, destination IP address, and destination port. When I want to find the rows that match a particular source IP address and port pair, I can use the following line:
rows = df[(df["src_ip_addr"] == "192.168.66.6") & (df["src_port"] == 12345)]
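To make this concrete, here is a minimal sketch on a tiny made-up DataFrame. The sample values and the dst_ip_addr and dst_port column names are my own assumptions for illustration; only src_ip_addr and src_port come from the line above.

```python
import pandas as pd

# Hypothetical sample data; dst_ip_addr and dst_port are assumed column names.
df = pd.DataFrame({
    "src_ip_addr": ["192.168.66.6", "192.168.66.6", "10.0.0.1"],
    "src_port": [12345, 80, 12345],
    "dst_ip_addr": ["10.0.0.2", "10.0.0.3", "10.0.0.4"],
    "dst_port": [443, 443, 22],
})

# Each comparison yields a boolean Series; & combines them element-wise,
# so a row is kept only when both conditions hold.
mask = (df["src_ip_addr"] == "192.168.66.6") & (df["src_port"] == 12345)
rows = df[mask]
print(rows)
```

Only the first row matches both conditions, so rows ends up with a single row. The parentheses around each comparison are needed because & binds more tightly than == in Python.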
However, I found that this approach is really slow, particularly when you have a huge DataFrame. Surprisingly, there is a simple trick to speed up that code: use the .values
attribute in the filtering criteria. So, the previous command can be modified as follows:
rows = df[(df["src_ip_addr"].values == "192.168.66.6") & (df["src_port"].values == 12345)]
In my case, that change made the filtering run 4 times faster.
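If you want to check this on your own machine, the sketch below times both variants on synthetic data. The likely reason for the difference is that .values hands back the underlying array, so the comparisons and the & run on plain arrays instead of going through the extra bookkeeping pandas does for Series, such as index alignment. The row count, value ranges, and function names here are arbitrary choices of mine, and the speedup you see will depend on your pandas version, column dtypes, and data size.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data; the size and value ranges are arbitrary assumptions.
n = 200_000
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "src_ip_addr": rng.choice(["192.168.66.6", "10.0.0.1", "10.0.0.2"], size=n),
    "src_port": rng.integers(12340, 12350, size=n),
})


def filter_series(frame):
    # Original approach: boolean masks built from pandas Series.
    return frame[(frame["src_ip_addr"] == "192.168.66.6") & (frame["src_port"] == 12345)]


def filter_values(frame):
    # Modified approach: boolean masks built from the underlying arrays.
    return frame[(frame["src_ip_addr"].values == "192.168.66.6") & (frame["src_port"].values == 12345)]


# Both variants should select exactly the same rows.
assert filter_series(df).equals(filter_values(df))

t_series = timeit.timeit(lambda: filter_series(df), number=20)
t_values = timeit.timeit(lambda: filter_values(df), number=20)
print(f"Series masks: {t_series:.3f} s, .values masks: {t_values:.3f} s")
```

The pandas documentation recommends .to_numpy() over .values in newer code, so that is worth trying as a drop-in replacement as well.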