Pandas, like NumPy, is one of Python's data analysis libraries. It lets us filter rows from a DataFrame based on certain criteria, and doing so is really easy. For instance, I have a DataFrame df that has four columns: source IP address, source port, destination IP address, and destination port. When I want to search for a given pair of source IP address and port, I can use the following line:
rows = df[(df["src_ip_addr"] == "192.168.66.6") & (df["src_port"] == 12345)]
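To make the filter above concrete, here is a minimal sketch on a tiny made-up DataFrame. The sample values and the destination column names (dst_ip_addr, dst_port) are assumptions for illustration; only src_ip_addr and src_port come from the line above.

```python
import pandas as pd

# Hypothetical sample data; only the two src_* column names come from the post.
df = pd.DataFrame({
    "src_ip_addr": ["192.168.66.6", "192.168.66.6", "10.0.0.1"],
    "src_port": [12345, 80, 12345],
    "dst_ip_addr": ["10.0.0.2", "10.0.0.3", "10.0.0.4"],
    "dst_port": [443, 443, 22],
})

# Each comparison produces one boolean per row; & combines them element-wise.
# The parentheses are required because & binds tighter than ==.
mask = (df["src_ip_addr"] == "192.168.66.6") & (df["src_port"] == 12345)

# Indexing with the boolean mask keeps only the rows where it is True.
rows = df[mask]
print(rows)
```

Only the first row satisfies both conditions, so rows ends up with a single entry.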
However, I found out that this approach is really slow, particularly when you have a huge DataFrame. Surprisingly, there is a simple trick to speed up that code: use the .values attribute in the filtering criteria, so that the comparisons run on the underlying arrays instead of on pandas Series. So, the previous command can be modified as follows:
rows = df[(df["src_ip_addr"].values == "192.168.66.6") & (df["src_port"].values == 12345)]
In my case, that change made the code run 4 times faster.
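If you want to check the difference on your own machine, a rough benchmark could look like the sketch below. The synthetic data, the row count, and the helper names (filter_series, filter_values, filter_to_numpy) are all assumptions made for illustration, and the speedup you measure will depend on your data, dtypes, and pandas version. The third variant uses .to_numpy(), which the pandas documentation recommends over .values when you want a NumPy array.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data (hypothetical): random source IPs and ports.
rng = np.random.default_rng(0)
n = 200_000
ips = rng.choice(["192.168.66.6", "10.0.0.1", "10.0.0.2"], size=n)
ports = rng.integers(1024, 65536, size=n)
# Force at least one matching row so the result is never empty.
ips[0] = "192.168.66.6"
ports[0] = 12345
df = pd.DataFrame({"src_ip_addr": ips, "src_port": ports})


def filter_series():
    # Original approach: compare pandas Series objects.
    return df[(df["src_ip_addr"] == "192.168.66.6") & (df["src_port"] == 12345)]


def filter_values():
    # Trick from the post: compare the arrays returned by .values.
    return df[(df["src_ip_addr"].values == "192.168.66.6")
              & (df["src_port"].values == 12345)]


def filter_to_numpy():
    # Same idea with .to_numpy(), the accessor pandas recommends.
    return df[(df["src_ip_addr"].to_numpy() == "192.168.66.6")
              & (df["src_port"].to_numpy() == 12345)]


# All three variants should select exactly the same rows.
assert filter_series().equals(filter_values())
assert filter_series().equals(filter_to_numpy())

for fn in (filter_series, filter_values, filter_to_numpy):
    seconds = timeit.timeit(fn, number=10)
    print(f"{fn.__name__}: {seconds / 10 * 1000:.2f} ms per call")
```

The equality checks run before the timing, so a faster number never comes from a filter that returns different rows.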