| by Arround The Web | No comments

Pandas Case When

In this tutorial, we will perform different operations by using case statements and if-else statements. A case statement makes it possible to compare the value of a variable to a range of potential values. When the set of values is referred to or passed in the case statement, each value inside the set is checked by the cases or conditions inside the statement.

Case statement in the Pandas DataFrame provides an output or returns a value if the condition is satisfied.

Scenario 1: Use numpy.where() as Case When

In this scenario, we will check each condition for the value and, if the condition is satisfied, will generate the output or return the value against the condition. We need to import the NumPy module. Otherwise, a module exception is thrown.

Here, numpy.where() accepts the condition as the first parameter. If the condition is true, then the value specified in the second parameter is assigned to the row. If the condition fails, then the value/condition specified as the third parameter is assigned. This way, we can implement the Case Statement in the Pandas DataFrame.

Syntax

DataFrame_object[‘column’] = numpy.where(condition 1 , ‘value’,

numpy.where(condition 2, ‘value’,

……

numpy.where(condition n, ‘value’, ‘value ’)))

Example 1

Let’s create the Pandas DataFrame named score that holds one column named “Rating” with 5 numeric values. Then, create a new column named “Grade” and specify values for each row based on the condition specified inside numpy.where() clause.

If the Rating is less than or equal to 50, then assign Grade as “D”. Otherwise, assign Grade as “A”.

# Import the DataFrame from the pandas module

from pandas import DataFrame

# Import the numpy module

import numpy

 

# Consider the DataFrame that holds 5 rows in column - "Rating"

score = DataFrame({'Rating': [25,50,100,56,89]})

 

# Actual DataFrame

print(score)

# Specify the conditions and store the value in the new column - "Grade".

score['Grade']=numpy.where(score['Rating']<=50,"D","A")

 

print(score)

Output

Rating

0 25

1 50

2 100

3 56

4 89

Rating Grade

0 25 D

1 50 D

2 100 A

3 56 A

4 89 A

Explanation

As per the output, two Ratings are less than or equal to 50. So “D” is assigned as Grade for these two rows. For the remaining rows, “A” is assigned.

Example 2

Let’s create the Pandas DataFrame named score that holds one column named “Rating” with 5 numeric values. Then, create a new column named “Grade” and specify values for each row based on the conditions specified inside numpy.where() clause.

  1. If the Rating is less than or equal to 25, then assign Grade as “D”.
  2. If the Rating is less than or equal to 50, then assign Grade as “C”.
  3. If the Rating is less than or equal to 75, then assign Grade as “B”.
  4. Otherwise, “A” Grade is assigned.
# Import the DataFrame from the pandas module

from pandas import DataFrame

# Import the numpy module

import numpy

 

# Consider the DataFrame that holds 5 rows in column - "Rating"

score = DataFrame({'Rating': [25,50,100,75,90]})

 

# Actual DataFrame

print(score)

# Specify the conditions and store the value in the new column - "Grade".

score['Grade']=numpy.where(score['Rating']<=25,"D",

numpy.where(score['Rating']<=50,"C",

numpy.where(score['Rating']<=75,"B","A")))

 

print(score)

Output

Rating

0 25

1 50

2 100

3 75

4 90

Rating Grade

0 25 D

1 50 C

2 100 A

3 75 B

4 90 A

Explanation

As per the output,

  1. There is one Rating that is less than or equal to 25. So “D” Grade is assigned.
  2. There is one Rating that is less than or equal to 50. So “C” Grade is assigned.
  3. There is one Rating that is less than or equal to 75. So “B” Grade is assigned.
  4. The Remaining two rows were assigned Grade “A”.

Scenario 2: Use apply() as Case When

As we know, lambda takes a conditional expression such that we can include an if-else statement as a conditional expression. If the condition results as True, then Value1 is placed across the rows that match the condition. If the condition fails, then Value2 is placed across the rows that fail the condition. It is important to understand that lambda expression is passed as a parameter to the apply() function. So it is necessary to use the apply() function, which follows the column name of the DataFrame object.

Syntax

DataFrame_object['new column'] = DataFrame_object['weight'].apply(lambda x: Value1 if condition else Value2)

Here:

  1. We can specify the condition using the Relation operator.
  2. New column is the column name.
  3. We need to pass a lambda expression inside the apply() function that will take the condition as a parameter. Based on the condition, Value1 is placed across the row that matches the condition. Otherwise, Value2 is placed.

Example 1

Let’s create the Pandas DataFrame named animals_number that hold one column named “weight” with 6 numeric values and create a new column and set value to ‘Yes’ where weight is greater than 75 and set value – ‘No’ where weight is less than 75.

# Import the DataFrame from the pandas module

from pandas import DataFrame

# Consider the DataFrame that holds 6 rows in column - "weight"

animals_number = DataFrame({'weight': [10,23,90,45,78,104]})

# Actual DataFrame

print(animals_number)

# Create a new column and set value to 'Yes' where the weight is greater than 75.

# and set the value - 'No' where the weight is less than 75.

animals_number['Animal weight greater than 75'] = animals_number['weight'].apply(lambda x: 'Yes' if x > 75 else 'No')

print(animals_number)

Output:

weight

0 10

1 23

2 90

3 45

4 78

5 104

weight Animal weight greater than 75

0 10 No

1 23 No

2 90 Yes

3 45 No

4 78 Yes

5 104 Yes

Explanation

As per the output, There are three values in the weight column which are less than 75. In the “Animal weight greater than 75” column, ‘No’ is placed, and for the remaining values ‘Yes’ is placed.

Example 2

Let’s create the Pandas DataFrame named animals that hold one column named “name” with 6 values and create a new column named “Type” and set value to ‘Wild’ where the name equals ‘Tiger’ and rest of the rows to ‘Domestic’.

# Import the DataFrame from the pandas module

from pandas import DataFrame

# Consider the DataFrame that holds 6 rows in column - "name"

animals = DataFrame({'name': ['Tiger','donkey','cow','deer','camel','horse']})

# Actual DataFrame

print(animals)

# Create a new column named Type and set value to 'Wild' where name is 'Tiger' and

# Set value to 'Domestic' where value is not equal to 'Tiger'.

animals['Type'] = animals['name'].apply(lambda x: 'Wild' if x == 'Tiger' else 'Domestic')

print(animals)

Output:

name

0 Tiger

1 donkey

2 cow

3 deer

4 camel

5 horse

name Type

0 Tiger Wild

1 donkey Domestic

2 cow Domestic

3 deer Domestic

4 camel Domestic

5 horse Domestic

Explanation

As per the output, There is only one value in the name column, which is equal to ‘Tiger’. So In the “Type” column, ‘Wild’ is placed, and for the remaining values, ‘Domestic’ is placed as they are not equal to ‘Tiger’.

Conclusion:

In this article, when the condition is satisfied, a statement of this type, called a case statement, returns a value. We have seen how you can create a case statement to perform a required operation or task. In this tutorial, we have used the np.where() function and the apply() function to create case statements. We implemented a few examples to teach you how to use Pandas case statements by using the where() function and how to use the apply() function to create case statements.

Share Button

Source: linuxhint.com

Leave a Reply