All the Ways to Filter Pandas Dataframes • datagy (2023)

Pandas is by far one of the essential tools required for data work within Python. It offers many different ways to filter Pandas dataframes – this tutorial shows you all the different ways in which you can do this!

Specifically, you’ll learn how to easily use index and chain methods to filter data, use the filter function, the query function, and the loc function to filter data.

Table of Contents

Filter Pandas Dataframes Video Tutorial

Loading the Sample Dataframe

Let’s begin by loading a sample dataframe that we’ll use throughout the tutorial.

import pandas as pddf = pd.read_excel('https://github.com/datagy/mediumdata/raw/master/sample_pivot.xlsx', parse_dates=['Date'])print(df.head())

We used the parse_dates parameter to ensure that the Dates column was read as datetime.

This returns:

 Date Region Type Units Sales0 2020-07-11 East Children's Clothing 18.0 3061 2020-09-23 North Children's Clothing 14.0 4482 2020-04-02 South Women's Clothing 17.0 4253 2020-02-28 East Children's Clothing 26.0 8324 2020-03-19 West Women's Clothing 3.0 33

Filter Pandas Dataframe by Column Value

Pandas makes it incredibly easy to select data by a column value. This can be accomplished using the index chain method.

(Video) Python Pandas Tutorial (Part 4): Filtering - Using Conditionals to Filter Rows and Columns

Select Dataframe Values Greater Than Or Less Than

For example, if you wanted to select rows where sales were over 300, you could write:

greater_than = df[df['Sales'] > 300]print(greater_than.head())print(greater_than.shape)

This returns:

 Date Region Type Units Sales0 2020-07-11 East Children's Clothing 18.0 3061 2020-09-23 North Children's Clothing 14.0 4482 2020-04-02 South Women's Clothing 17.0 4253 2020-02-28 East Children's Clothing 26.0 8325 2020-02-05 North Women's Clothing 33.0 627(648, 5)

We can see from the shape method that 352 rows have been filtered out of the dataset.

Check out some other Python tutorials on datagy, including our complete guide to styling Pandas and our comprehensive overview of Pivot Tables in Pandas!

Filtering a Dataframe based on Multiple Conditions

If you want to filter based on more than one condition, you can use the ampersand (&) operator or the pipe (|) operator, for and and or respectively.

Let’s try an example. First, you’ll select rows where sales are greater than 300 and units are greater than 20. Then you’ll do the same with an or operator:

and_operator = df[(df['Sales'] > 300) & (df['Units'] > 20)]print(and_operator.head())print(and_operator.shape)or_operator = df[(df['Sales'] > 300) | (df['Units'] > 20)]print(or_operator.head())print(or_operator.shape)

This returns:

 Date Region Type Units Sales3 2020-02-28 East Children's Clothing 26.0 8325 2020-02-05 North Women's Clothing 33.0 6277 2020-03-25 East Women's Clothing 29.0 6099 2020-11-03 East Children's Clothing 34.0 37414 2020-01-07 East Men's Clothing 30.0 360(299, 5) Date Region Type Units Sales0 2020-07-11 East Children's Clothing 18.0 3061 2020-09-23 North Children's Clothing 14.0 4482 2020-04-02 South Women's Clothing 17.0 4253 2020-02-28 East Children's Clothing 26.0 8325 2020-02-05 North Women's Clothing 33.0 627(790, 5)

Filter a Dataframe Based on Dates

Pandas also makes it very easy to filter on dates. You can filter on specific dates, or on any of the date selectors that Pandas makes available.

If you want to filter on a specific date (or before/after a specific date), simply include that in your filter query like above:

(Video) Complete Python Pandas Data Science Tutorial! (Reading CSV/Excel files, Sorting, Filtering, Groupby)

# To filter dates following a certain date:date_filter = df[df['Date'] > '2020-05-01']# To filter to a specific date:date_filter2 = df[df['Date'] == '2020-05-01']

The first piece of code shows any rows where Date is later than May 1, 2020. You can also use multiple filters to filter between two dates:

date_filter3 = df[(df['Date'] >= '2020-05-01') & (df['Date'] < '2020-06-01')]

This filters down to only show May 2020 data.

Using Pandas Date Selectors to Filter Data

Pandas date selectors allow you to access attributes of a particular date. Let's see how these work in action:

df['Months'] = df['Date'].dt.monthdf['Weekday'] = df['Date'].dt.weekdaydf['Year'] = df['Date'].dt.yeardf['Weekday Name'] = df['Date'].dt.weekday_nameprint(df.head())

Here we've assigned new columns, based on accessing just a single part of the Date column:

 Date Region Type Units Sales Months Weekday Year Weekday Name0 2020-07-11 East Children's Clothing 18.0 306 7 5 2020 Saturday1 2020-09-23 North Children's Clothing 14.0 448 9 2 2020 Wednesday2 2020-04-02 South Women's Clothing 17.0 425 4 3 2020 Thursday3 2020-02-28 East Children's Clothing 26.0 832 2 4 2020 Friday4 2020-03-19 West Women's Clothing 3.0 33 3 3 2020 Thursday

You can use these date selectors to filter your data.

If you only wanted to show data for Tuesdays, you could use the weekday selector:

tuesdays = df[df['Date'].dt.weekday == 2]print(tuesdays.shape)# (153,5)

This type of selecting data is incredibly helpful if you want to filter down to a specific year or month, and don't want to type and conditions.

It makes your code much easier to write (and to read).

Filter a Dataframe to a Specific String

If you want to filter rows to only show rows where there is a specific exists, you can do this also with the index method. Say you wanted to select only rows from East region:

(Video) Speed Up Your Pandas Dataframes

east = df[df['Region'] == 'East']print(east.shape)# Returns: (411, 5)

Filter To Show Rows Starting with a Specific Letter

Similarly, you can select only dataframe rows that start with a specific letter. For example, if you only wanted to select rows where the region starts with 'E', you could write:

e = df[df['Region'].str[0] == 'E']print(e.shape)# Returns: (411, 5)

Select Dataframe Rows based on List of Values

If you want to select rows matching a set of values, you could write long "or" statements, or you could use the isin method.

For example, if you wanted to select records from East and West Regions, you could write:

east_west = df[(df['Region'] == 'West') | (df['Region'] == 'East')]

Or, you could use the isin method, to make this much easier:

east_west2 = df[df['Region'].isin(['West', 'East'])]

This returns the same thing but is much easier to write!

Select Dataframe Rows Using Regular Expressions (Regex)

You can use the .str.contains() method to filter down rows in a dataframe using regular expressions (regex).

For example, if you wanted to filter to show only records that end in "th" in the Region field, you could write:

th = df[df['Region'].str.contains('th$')]

To learn more about regex, check out this link.

Select Null or Not Null Dataframe Rows

Pandas makes it easy to select select either null or non-null rows.

(Video) Python Pandas Tutorial 26 | How to Filter Pandas data frame for specific multiple values in a column

To select records containing null values, you can use the both the isnull and any functions:

null = df[df.isnull().any(axis=1)]

If you only want to select records where a certain column has null values, you could write:

null = df[df['Units'].isnull()]

To select only records with non-null records

To select only records that are not null, you can use the notnull function:

notnull = df[df['Units'].notnull()]

How to use the Pandas Query Function

The Pandas query function takes an expression that evaluates to a boolean statement and uses that to filter a dataframe.

For example, you can use a simple expression to filter down the dataframe to only show records with Sales greater than 300:

query = df.query('Sales > 300')

To query based on multiple conditions, you can use the and or the or operator:

query = df.query('Sales > 300 and Units < 18')# This select Sales greater than 300 and Units less than 18

How to use the Loc and iloc Functions in Pandas

The loc and iloc functions can be used to filter data based on selecting a column or columns and applying conditions.

Tip! To get a deep dive into the loc and iloc functions, check out my complete tutorial on these functions by clicking here.

For example, to select data from East region, you could write:

(Video) Python Filter Function - Intermediate Python Tutorial

loc = df.loc[df['Region'] == 'East']

Conclusion

In this post, we covered off many ways of selecting data using Pandas. We used examples to filter a dataframe by column value, based on dates, using a specific string, using regex, or based on items in a list of values. We also covered how to select null and not null values, used the query function, as well as the loc function.

To learn more about date selectors, check out the official documentation here.

Videos

1. Style Python Pandas DataFrames! (Conditional Formatting, Color Bars and more!)
(datagy)
2. Different Ways to Create a Pandas DataFrame | GeeksforGeeks
(GeeksforGeeks)
3. Python Pandas Tutorial 2: Dataframe Basics
(codebasics)
4. Basic Guide to Pandas! Tricks, Shortcuts, Must Know Commands! Python for Beginners
(Python Simplified)
5. Pandas Tutorial #1 - DataFrames (Python for Data Science)
(The Morpheus Tutorials)
6. Python Pandas Tutorial (Part 2): DataFrame and Series Basics - Selecting Rows and Columns
(Corey Schafer)
Top Articles
Latest Posts
Article information

Author: Moshe Kshlerin

Last Updated: 04/14/2023

Views: 6223

Rating: 4.7 / 5 (57 voted)

Reviews: 80% of readers found this page helpful

Author information

Name: Moshe Kshlerin

Birthday: 1994-01-25

Address: Suite 609 315 Lupita Unions, Ronnieburgh, MI 62697

Phone: +2424755286529

Job: District Education Designer

Hobby: Yoga, Gunsmithing, Singing, 3D printing, Nordic skating, Soapmaking, Juggling

Introduction: My name is Moshe Kshlerin, I am a gleaming, attractive, outstanding, pleasant, delightful, outstanding, famous person who loves writing and wants to share my knowledge and understanding with you.