Speeding Up Pandas Rows Filtering

baskoro | October 27, 2017 | Deep Learning, Tutorial | 1 comment

Pandas, like NumPy, is one of Python's data analysis libraries. It lets us filter the rows of a DataFrame by certain criteria, and doing so is really easy. For instance, suppose I have a DataFrame df with four columns: source IP address, source port, destination IP address, and destination port. To find the rows that match a given source IP address and port pair, I can use the following line:

rows = df[(df["src_ip_addr"] == "192.168.66.6") & (df["src_port"] == 12345)]
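To make that line concrete, here is a minimal sketch on a tiny DataFrame. The sample values and the dst_ip_addr and dst_port column names are made up for illustration; only src_ip_addr and src_port come from the line above.

```python
import pandas as pd

# Hypothetical sample data: the dst_* column names and all values are assumptions.
df = pd.DataFrame({
    "src_ip_addr": ["192.168.66.6", "192.168.66.6", "10.0.0.1"],
    "src_port": [12345, 80, 12345],
    "dst_ip_addr": ["10.0.0.2", "10.0.0.3", "10.0.0.4"],
    "dst_port": [443, 443, 22],
})

# Each comparison yields a boolean Series; & combines them element-wise.
mask = (df["src_ip_addr"] == "192.168.66.6") & (df["src_port"] == 12345)

# Indexing the DataFrame with the mask keeps only the rows where it is True.
rows = df[mask]
print(rows)
```

The parentheses around each comparison are required, because & binds more tightly than == in Python.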

However, I found that this approach is really slow, particularly on a huge DataFrame. Surprisingly, there is a simple trick to speed up that code: use the .values attribute in the filtering criteria. The previous command can be modified as follows:

rows = df[(df["src_ip_addr"].values == "192.168.66.6") & (df["src_port"].values == 12345)]

In my case, that change made the filter run 4 times faster.
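If you want to check the effect on your own data, a rough timing sketch along these lines may help. The synthetic DataFrame, its size, the helper names filter_series and filter_values, and the repeat count are all assumptions for illustration; the speedup you see will depend on your pandas version, column dtypes, and data size, so do not expect the 4x figure to carry over exactly.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data (assumed shape and values, not the author's dataset).
rng = np.random.default_rng(0)
n = 100_000
df = pd.DataFrame({
    "src_ip_addr": rng.choice(["192.168.66.6", "10.0.0.1", "10.0.0.2"], size=n),
    "src_port": rng.integers(12340, 12350, size=n),
})


def filter_series(frame):
    # Masks are built from pandas Series comparisons.
    return frame[(frame["src_ip_addr"] == "192.168.66.6") & (frame["src_port"] == 12345)]


def filter_values(frame):
    # Masks are built from the underlying arrays exposed by .values.
    return frame[(frame["src_ip_addr"].values == "192.168.66.6") & (frame["src_port"].values == 12345)]


# Both variants should select exactly the same rows.
assert filter_series(df).equals(filter_values(df))

t_series = timeit.timeit(lambda: filter_series(df), number=10)
t_values = timeit.timeit(lambda: filter_values(df), number=10)
print(f"Series masks: {t_series:.3f}s  .values masks: {t_values:.3f}s")
```

A likely reason for the difference is that .values hands the comparison a plain array, so the mask is built without creating intermediate Series objects or aligning their indexes. The pandas documentation now recommends .to_numpy() over .values for getting that array; it should be usable in the same place.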

Tags:
pandas
python

One thought on “Speeding Up Pandas Rows Filtering”

Mex February 22, 2018 at 2:20 pm

Good trick!
