by Arround The Web

Pandas Create Column Based on Condition

Data scientists use Python libraries such as NumPy and Pandas to perform fast, modular, and efficient data analysis. The methods and functions these libraries provide cover many common tasks. For example, Python offers several ways to create a new DataFrame column whose values depend on a condition.

This guide shows how to create a DataFrame column based on a condition using five methods: a list comprehension, “numpy.where()”, “numpy.select()”, “Series.apply()”, and “Series.map()”.

Method 1: Create a DataFrame Column Based on Condition Using “List Comprehension”

A list comprehension builds the values of a new column by evaluating a condition for each element of an existing column. Here, the new “Group” column gets “A” when the “Age” value is greater than or equal to 18 and “B” otherwise:

import pandas
df = pandas.DataFrame({'Name':['Lily', 'Joseph', 'Anna', 'Sam', 'Henry'],
                       'Age':[19, 15, 12, 18, 21], 'Sex': ['F', 'M', 'F', 'M', 'M']})
print(df, '\n')
# 'A' for ages 18 and over, 'B' for everyone else
df['Group'] = ['A' if x >= 18 else 'B' for x in df['Age']]
print(df)

 
The new “Group” column holds “A” for Lily, Sam, and Henry (aged 18 or over) and “B” for Joseph and Anna.
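The same pattern extends to conditions on more than one column. The following is a minimal sketch in which zip() feeds two columns into the comprehension at once. The “adult and male” rule is only an illustrative assumption:

```python
import pandas

df = pandas.DataFrame({'Name': ['Lily', 'Joseph', 'Anna', 'Sam', 'Henry'],
                       'Age': [19, 15, 12, 18, 21],
                       'Sex': ['F', 'M', 'F', 'M', 'M']})

# zip() pairs each row's Age and Sex so one condition can test both columns
# (the "adult and male" rule is an illustrative assumption)
df['Group'] = ['A' if age >= 18 and sex == 'M' else 'B'
               for age, sex in zip(df['Age'], df['Sex'])]
print(df)
```

Only Sam and Henry satisfy both parts of the condition, so they are the only rows placed in group “A”.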

Method 2: Create a DataFrame Column Based on Condition Using “numpy.where()” Method

The three-argument form “numpy.where(condition, x, y)” returns an array that takes each element from x where the condition is true and from y where it is false. When called with only a condition, it instead returns the indices of the elements where the condition holds. In the code below, we first create a DataFrame with three columns from a dictionary. Then “numpy.where()” builds the new “Group” column, assigning “A” to rows where “Sex” is “M” and “B” to all other rows:

import pandas, numpy
df = pandas.DataFrame({'Name':['Lily', 'Joseph', 'Anna', 'Sam', 'Henry'],
                       'Age':[19, 15, 12, 18, 21], 'Sex': ['F', 'M', 'F', 'M', 'M']})
print(df, '\n')
# numpy.where(condition, value_if_true, value_if_false): males get 'A', others 'B'
df['Group'] = numpy.where(df['Sex'] == 'M', 'A', 'B')
print(df)

 
Joseph, Sam, and Henry (Sex “M”) get “A” in the new “Group” column, while Lily and Anna get “B”.
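The sketch below contrasts the two forms of “numpy.where()”. The one-argument call returns index positions, while nested three-argument calls can express more than two outcomes. The “Stage” column and its age cut-offs are hypothetical, chosen only for illustration:

```python
import numpy
import pandas

df = pandas.DataFrame({'Name': ['Lily', 'Joseph', 'Anna', 'Sam', 'Henry'],
                       'Age': [19, 15, 12, 18, 21],
                       'Sex': ['F', 'M', 'F', 'M', 'M']})

# One-argument form: returns a tuple of index arrays where the condition is True
adult_positions = numpy.where(df['Age'] >= 18)[0]
print(adult_positions)  # prints [0 3 4]

# Nested three-argument calls: the inner where() supplies the "false" branch
# (the 'Stage' labels and age cut-offs are hypothetical)
df['Stage'] = numpy.where(df['Age'] >= 18, 'Adult',
                          numpy.where(df['Age'] >= 13, 'Teen', 'Child'))
print(df)
```

Nesting works for a few outcomes but becomes hard to read quickly. That is where “numpy.select()” is usually the clearer choice.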

Method 3: Create a DataFrame Column Based on Condition Using “numpy.select()” Method

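The “numpy.select()” method takes a list of conditions and a matching list of choices. Each element of the result comes from the choice whose condition is the first one to be true, and elements that match no condition get the “default” value. Here, a single condition (“Age” at least 18 and “Sex” equal to “M”) assigns a “Salary” of 1000, and every other row gets the default of 500. The parentheses around each comparison are required because “&” binds more tightly than “>=” and “==”: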

import pandas, numpy
df = pandas.DataFrame({'Name':['Lily', 'Joseph', 'Anna', 'Sam', 'Henry'],
                       'Age':[19, 15, 12, 18, 21], 'Sex': ['F', 'M', 'F', 'M', 'M']})
print(df, '\n')
# Males aged 18 or over get 1000; every other row falls back to the default of 500
df['Salary'] = numpy.select([(df['Age'] >= 18) & (df['Sex'] == 'M')], [1000], default=500)
print(df)

 
In the new “Salary” column, only Sam and Henry (male and at least 18) get 1000. Lily, Joseph, and Anna get the default of 500.
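“numpy.select()” becomes more useful with several conditions. The following is a minimal sketch; the “Category” column and its labels are illustrative assumptions. Because the conditions are checked in order, Joseph (15, male) gets “Teen” even though he also matches the third condition:

```python
import numpy
import pandas

df = pandas.DataFrame({'Name': ['Lily', 'Joseph', 'Anna', 'Sam', 'Henry'],
                       'Age': [19, 15, 12, 18, 21],
                       'Sex': ['F', 'M', 'F', 'M', 'M']})

# Conditions are evaluated in order; the first True one decides the value
# (the category labels are illustrative assumptions)
conditions = [
    df['Age'] < 13,
    df['Age'] < 18,
    df['Sex'] == 'M',
]
choices = ['Child', 'Teen', 'Adult male']

# Rows that match none of the conditions receive the default
df['Category'] = numpy.select(conditions, choices, default='Adult female')
print(df)
```

Keeping the default the same type as the choices (strings here) avoids dtype problems when NumPy combines them into one array.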

Method 4: Create a DataFrame Column Based on Condition Using “Series.apply()” Method

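The pandas “Series.apply()” method calls a function on every element of a Series and returns the results as a new Series. In the code below, apply() passes each value of the “Name” column to the built-in len() function, so the new “Name Character” column holds the number of characters in each name: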

import pandas
df = pandas.DataFrame({'Name':['Lily', 'Joseph', 'Anna', 'Sam', 'Henry'],
                       'Age':[19, 15, 12, 18, 21], 'Sex': ['F', 'M', 'F', 'M', 'M']})
print(df, '\n')
# apply() calls len() on every name and returns the lengths as a new Series
df['Name Character'] = df['Name'].apply(len)
print(df)

 
The new “Name Character” column contains 4, 6, 4, 3, and 5, the lengths of Lily, Joseph, Anna, Sam, and Henry.
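The len() example does not test a condition, but apply() accepts any function, including one that contains an if/else. The following is a minimal sketch; the helper name age_group and the “Adult Male” column are hypothetical. It also uses DataFrame.apply() with axis=1, which passes each row to the function so the condition can combine several columns:

```python
import pandas

df = pandas.DataFrame({'Name': ['Lily', 'Joseph', 'Anna', 'Sam', 'Henry'],
                       'Age': [19, 15, 12, 18, 21],
                       'Sex': ['F', 'M', 'F', 'M', 'M']})


def age_group(age):
    """Hypothetical helper: label an age as 'A' (18 or over) or 'B'."""
    return 'A' if age >= 18 else 'B'


# Series.apply() calls age_group() once per value of the Age column
df['Group'] = df['Age'].apply(age_group)

# DataFrame.apply(axis=1) passes each row as a Series, so the function
# can read several columns in one condition
df['Adult Male'] = df.apply(lambda row: row['Age'] >= 18 and row['Sex'] == 'M',
                            axis=1)
print(df)
```

Row-wise apply() runs Python code once per row, so on large DataFrames the vectorized “numpy.where()” or “numpy.select()” versions are generally faster.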

Method 5: Create a DataFrame Column Based on Condition Using “Series.map()” Method

The “Series.map()” method replaces each value of a Series using a dictionary, another Series, or a function. In this example, the dictionary maps the “Sex” value “M” to “A” and “F” to “B”. map() returns a new Series with the mapped values, which is then assigned to the new “Group” column of the DataFrame:

import pandas
df = pandas.DataFrame({'Name':['Lily', 'Joseph', 'Anna', 'Sam', 'Henry'],
                       'Age':[19, 15, 12, 18, 21], 'Sex': ['F', 'M', 'F', 'M', 'M']})
print(df, '\n')
# Each 'M' becomes 'A' and each 'F' becomes 'B'
df['Group'] = df['Sex'].map({'M': 'A', 'F': 'B'})
print(df)

 
In the new “Group” column, every “M” row now has “A” and every “F” row has “B”.
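“Series.map()” leaves any value missing from the dictionary as NaN. The following is a minimal sketch of filling those gaps with fillna() and of passing a function to map() instead of a dictionary. The partial dictionary and the “Adult” column are hypothetical examples:

```python
import pandas

df = pandas.DataFrame({'Name': ['Lily', 'Joseph', 'Anna', 'Sam', 'Henry'],
                       'Age': [19, 15, 12, 18, 21],
                       'Sex': ['F', 'M', 'F', 'M', 'M']})

# A dictionary that covers only 'M': the unmatched 'F' values become NaN
partial = df['Sex'].map({'M': 'A'})
print(partial.isna().sum())  # number of rows with no match

# fillna() supplies a fallback value for every unmatched row
df['Group'] = df['Sex'].map({'M': 'A'}).fillna('B')

# map() also accepts a function, so a condition can be written inline
df['Adult'] = df['Age'].map(lambda age: 'Yes' if age >= 18 else 'No')
print(df)
```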

Conclusion

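List comprehensions, “numpy.where()”, “numpy.select()”, “Series.apply()”, and “Series.map()” can all create a DataFrame column based on a condition. Each fits a different case:

- “numpy.where()” suits a single true/false condition.
- “numpy.select()” handles several conditions checked in order, with a default value.
- List comprehensions and “Series.apply()” let you write the condition as ordinary Python code.
- “Series.map()” translates values through a dictionary or a function.

This tutorial walked through each method with an example.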


Source: linuxhint.com
