
Pandas Filter by Column Value

The Pandas library offers several ways to select data in Python. Often we need only the rows that meet certain conditions, and filtering them out makes the data easier to understand. Rows and columns of a Pandas DataFrame can be selected by label or by position, and rows can also be filtered by the values stored in a column.

This blog explains how to filter a Pandas DataFrame by column value using the following methods: “df.loc[]”, square brackets, “isin()”, and “query()”.

Method 1: Filter Pandas DataFrame by Column Value Using “df.loc[]”

The “df.loc[]” indexer selects rows and columns of a Pandas DataFrame by label. It also accepts a Boolean condition, so it can filter a DataFrame based on a column’s value. Let’s understand it via the following examples:

Example 1: Using Single Condition

The following example code filters a Pandas DataFrame by column value using a single condition:

import pandas
df = pandas.DataFrame({'Name':['Anna', 'Joseph', 'Henry', 'Lily'],
                       'Team':['A', 'B', 'C', 'A'],
                       'Age':[24, 19, 17, 15],
                       'Height':[4.3, 5.6, 6.3, 8.5]})
print(df, '\n')
print(df.loc[df['Height'] > 6.1])

In the above code:

  • The “pandas” library is imported.
  • The DataFrame is created with multiple columns using the “DataFrame()” constructor.
  • The “loc[]” indexer filters the DataFrame using a condition on the specified column’s value.
  • According to the specified condition, only the rows with a “Height” greater than “6.1” are returned in the output.

Output

The input DataFrame has been filtered on the “Height” column: only the rows for “Henry” (6.3) and “Lily” (8.5) remain.
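
As a minimal sketch of what happens in this example, the comparison can be split from the selection: the condition first produces a Boolean Series, and “loc[]” then keeps the rows marked True. The names “mask”, “tall”, and “tall_names” are illustrative and do not appear in the original code:

```python
import pandas

df = pandas.DataFrame({'Name': ['Anna', 'Joseph', 'Henry', 'Lily'],
                       'Team': ['A', 'B', 'C', 'A'],
                       'Age': [24, 19, 17, 15],
                       'Height': [4.3, 5.6, 6.3, 8.5]})

# The comparison returns a Boolean Series with one True/False per row.
mask = df['Height'] > 6.1
print(mask.tolist())           # [False, False, True, True]

# loc[] keeps only the rows where the mask is True.
tall = df.loc[mask]
print(tall['Name'].tolist())   # ['Henry', 'Lily']

# loc[] also accepts a column selection as a second argument.
tall_names = df.loc[mask, ['Name', 'Height']]
print(tall_names)
```

Passing the column list as the second argument filters rows and selects columns in a single step.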

Example 2: Using Multiple Conditions

The code below filters a Pandas DataFrame using conditions on two columns:

import pandas
df = pandas.DataFrame({'Name':['Anna', 'Joseph', 'Henry', 'Lily'],
                       'Team':['A', 'B', 'C', 'A'],
                       'Age':[24, 19, 17, 15],
                       'Height':[4.3, 5.6, 6.3, 5.5]})
print(df, '\n')
print(df.loc[(df['Age'] >= 17) & (df['Height'] < 5.5)])

Here in this code:

  • The “loc[]” indexer takes multiple conditions on column values, each wrapped in parentheses and combined with the “&” (and) operator, to filter the DataFrame.
  • According to the conditions, only the rows with an “Age” greater than or equal to “17” and a “Height” smaller than “5.5” are returned in the output.

Output

The input DataFrame has been filtered by column value according to multiple conditions: only the row for “Anna” (Age 24, Height 4.3) satisfies both.
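
Conditions can also be combined with “|” (or) and inverted with “~” (not). The following is a small sketch on the same data; the variable names “both”, “either”, and “negated” are illustrative. Each condition needs its own parentheses because “&” and “|” bind more tightly than the comparison operators:

```python
import pandas

df = pandas.DataFrame({'Name': ['Anna', 'Joseph', 'Henry', 'Lily'],
                       'Team': ['A', 'B', 'C', 'A'],
                       'Age': [24, 19, 17, 15],
                       'Height': [4.3, 5.6, 6.3, 5.5]})

# AND: both conditions must hold for a row to be kept.
both = df.loc[(df['Age'] >= 17) & (df['Height'] < 5.5)]
print(both['Name'].tolist())     # ['Anna']

# OR: at least one condition must hold.
either = df.loc[(df['Age'] >= 17) | (df['Height'] < 5.5)]
print(either['Name'].tolist())   # ['Anna', 'Joseph', 'Henry']

# NOT: invert a condition with ~.
negated = df.loc[~(df['Age'] >= 17)]
print(negated['Name'].tolist())  # ['Lily']
```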

Method 2: Filter Pandas DataFrame by Column Value Using Square Brackets

Square brackets with a Boolean condition can also be used to filter a Pandas DataFrame by column value. Here is an example:

import pandas
df = pandas.DataFrame({'Name':['Anna', 'Joseph', 'Henry', 'Lily'],
                       'Team':['A', 'B', 'C', 'A'],
                       'Age':[24, 19, 17, 15],
                       'Height':[4.3, 5.6, 6.3, 5.5]})
print(df, '\n')
print(df[df['Age'] > 17])

In the above code:

  • The “df[df['Age'] > 17]” syntax filters the Pandas DataFrame based on the specified column value.
  • The condition indicates that only the rows with an “Age” column value greater than “17” are returned.

Output

The DataFrame has been filtered according to the column value: only the rows for “Anna” (24) and “Joseph” (19) remain.
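
For a Boolean condition, square brackets and “loc[]” select the same rows. The following sketch checks this on the same data; the names “by_brackets” and “by_loc” are illustrative:

```python
import pandas

df = pandas.DataFrame({'Name': ['Anna', 'Joseph', 'Henry', 'Lily'],
                       'Team': ['A', 'B', 'C', 'A'],
                       'Age': [24, 19, 17, 15],
                       'Height': [4.3, 5.6, 6.3, 5.5]})

# Filter the same condition in both styles.
by_brackets = df[df['Age'] > 17]
by_loc = df.loc[df['Age'] > 17]

# equals() compares the two results element by element.
print(by_brackets.equals(by_loc))     # True
print(by_brackets['Name'].tolist())   # ['Anna', 'Joseph']
```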

Method 3: Filter Pandas DataFrame by Column Value Using “isin()” Method

The “isin()” method checks whether each value is contained in a given list of values and returns a Boolean result. Calling it on a single column, as in “df['Team'].isin()”, gives a Boolean Series that can be used to filter the DataFrame by that column’s value. Here is an example:

import pandas
df = pandas.DataFrame({'Name':['Anna', 'Joseph', 'Henry', 'Lily'],
                       'Team':['A', 'B', 'C', 'A'],
                       'Age':[24, 19, 17, 15],
                       'Height':[4.3, 5.6, 6.3, 8.5]})
print(df, '\n')
df1 = df[df['Team'].isin(['A', 'B'])]
print(df1)

In the above code:

  • The “isin()” method marks the rows whose column value appears in the list passed as an argument, and the square brackets keep only those rows.
  • In this case, the “Team” column values “A” and “B” are passed to the “isin()” method to filter the DataFrame.

Output

The DataFrame has been filtered based on the “Team” column value: the rows for “Anna”, “Joseph”, and “Lily” remain, while “Henry” (team “C”) is dropped.
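
The same check can be inverted with “~” to keep the rows whose value is not in the list. This is a short sketch on the same data; the names “in_a_or_b” and “not_a_or_b” are illustrative:

```python
import pandas

df = pandas.DataFrame({'Name': ['Anna', 'Joseph', 'Henry', 'Lily'],
                       'Team': ['A', 'B', 'C', 'A'],
                       'Age': [24, 19, 17, 15],
                       'Height': [4.3, 5.6, 6.3, 8.5]})

# Rows whose Team is one of the listed values.
in_a_or_b = df[df['Team'].isin(['A', 'B'])]
print(in_a_or_b['Name'].tolist())    # ['Anna', 'Joseph', 'Lily']

# ~ inverts the Boolean Series, keeping every other row.
not_a_or_b = df[~df['Team'].isin(['A', 'B'])]
print(not_a_or_b['Name'].tolist())   # ['Henry']
```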

Method 4: Filter Pandas DataFrame by Column Value Using “query()” Method

The “query()” method takes a query expression as a string, evaluates it against the columns of the DataFrame, and returns the matching rows. Here we use this method to filter a Pandas DataFrame according to the column value:

import pandas
df = pandas.DataFrame({'Name':['Anna', 'Joseph', 'Henry', 'Lily'],
                       'Team':['A', 'B', 'C', 'A'],
                       'Age':[24, 19, 17, 15],
                       'Height':[4.3, 5.6, 6.3, 8.5]})
print(df, '\n')
df1=df.query("Age >= 19")
print(df1)

In this code block:

  • The “query()” method filters the Pandas DataFrame using the specified query expression “Age >= 19”.
  • This query expression indicates that only the rows with an “Age” column value greater than or equal to “19” are returned in the output.

Output

The DataFrame has been filtered by the specified column value: only the rows for “Anna” (24) and “Joseph” (19) remain.
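
A query expression can also reference a Python variable by prefixing it with “@” and can combine conditions with “and”/“or”. The following sketch uses the same data; the variable “min_age” and the result names are illustrative:

```python
import pandas

df = pandas.DataFrame({'Name': ['Anna', 'Joseph', 'Henry', 'Lily'],
                       'Team': ['A', 'B', 'C', 'A'],
                       'Age': [24, 19, 17, 15],
                       'Height': [4.3, 5.6, 6.3, 8.5]})

# @min_age refers to the Python variable defined outside the expression.
min_age = 19
by_variable = df.query("Age >= @min_age")
print(by_variable['Name'].tolist())   # ['Anna', 'Joseph']

# Several conditions can be joined inside one expression.
combined = df.query("Age >= 17 and Team == 'A'")
print(combined['Name'].tolist())      # ['Anna']
```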

Conclusion

The “df.loc[]” indexer, square brackets, the “isin()” method, and the “query()” method can all be used to filter a Pandas DataFrame based on a column value, with either a single condition or multiple conditions. This guide walked through an example of each approach.


Source: linuxhint.com
