by Arround The Web

Filter NaN Values in Pandas

Large datasets often contain missing values, which pandas represents as NaN (or None). Filtering out the rows that contain these values is a common cleaning step, and pandas provides several functions for it.

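This blog explains what pandas is and what NaN values are. It then shows how to filter rows containing NaN values from a particular column, from multiple columns, and from the entire dataset.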

What is “pandas” in Python?

In Python, “pandas” is one of the most widely used libraries for working with tabular datasets that hold numeric, datetime, string, and other types of values. It provides functions for exploring, analyzing, cleaning, and manipulating data. For example, users can filter out rows that contain NaN values with DataFrame methods such as “dataframe.dropna()” and “dataframe.notnull()”.
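The sketch below shows how these functions work on a small scale. It uses a hypothetical Series named “ages”, which is not part of this blog's dataset. It demonstrates that “notnull()” returns a boolean mask, and that indexing with that mask keeps only the non-missing values. It also checks that “notnull()” is an alias of “notna()”.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value (illustration only)
ages = pd.Series([25, np.nan, 31])

# notnull() returns True where a value is present and False where it is NaN
mask = ages.notnull()
print(mask.tolist())  # [True, False, True]

# notnull() and notna() are aliases and give identical results
print(mask.equals(ages.notna()))  # True

# Indexing with the mask keeps only the non-missing values
print(ages[mask].tolist())  # [25.0, 31.0]
```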

What are NaN Values?

Almost every real-world dataset has missing values. Data comes in many shapes and forms, and blank or missing entries are commonly represented as NaN. NaN is a special floating-point value that stands for “Not a Number”. Like other programming languages, Python has multiple ways to represent missing values, such as “None” and “numpy.nan”. pandas treats both as missing.
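The following minimal sketch illustrates these points. NaN is a float, and it never compares equal to itself, so “==” cannot be used to detect it. pandas treats both “None” and “numpy.nan” as missing. The Series named “values” is a hypothetical example.

```python
import math

import numpy as np
import pandas as pd

# NaN is a special floating-point value
print(type(np.nan))  # <class 'float'>

# NaN never compares equal to anything, not even itself,
# so `value == np.nan` cannot be used to detect it
print(np.nan == np.nan)  # False
print(math.isnan(np.nan))  # True

# Hypothetical Series: pandas treats both None and np.nan as missing
values = pd.Series(['a', None, np.nan])
print(values.isna().tolist())  # [False, True, True]
```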

How to Filter Rows Containing NaN Values in a Particular Column Using a Pandas DataFrame in Python?

Before filtering, you need a dataset that contains NaN values. Import the “numpy” and “pandas” libraries, create a new DataFrame, and then display it:

import pandas as pd
import numpy as np

# Sample dataset with one NaN value in each column
dataframe = pd.DataFrame({'Authors' : ['Maria', 'Henry', 'Marry', np.nan, 'Alex'],
                          'UserName' : ['fmn018', np.nan, 'fm012', 'mg002', 'ma025'],
                          'Experience' : ['1 Year', '2 Year', np.nan, '6 Months', '9 Months']
                        })

# Display the dataset
print(dataframe)

 

As you can see, the dataset contains one NaN value in each column: “Authors” at index 3, “UserName” at index 1, and “Experience” at index 2.
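You can also confirm where the NaN values are without inspecting the table by eye. Combining “isna()” with “sum()” counts the missing cells per column. This sketch recreates the same DataFrame so it runs on its own. The name “nan_counts” is only an illustrative variable name.

```python
import numpy as np
import pandas as pd

dataframe = pd.DataFrame({'Authors': ['Maria', 'Henry', 'Marry', np.nan, 'Alex'],
                          'UserName': ['fmn018', np.nan, 'fm012', 'mg002', 'ma025'],
                          'Experience': ['1 Year', '2 Year', np.nan, '6 Months', '9 Months']})

# isna() marks each missing cell with True; sum() counts the Trues per column
nan_counts = dataframe.isna().sum()
print(nan_counts.tolist())  # [1, 1, 1]

# Summing again gives the total number of missing cells in the DataFrame
print(int(dataframe.isna().sum().sum()))  # 3
```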

Now, call the “notnull()” function on a particular column to build a boolean mask. Indexing the DataFrame with that mask keeps only the rows where the column is not NaN:

# Keep only the rows whose Experience value is not NaN
print(dataframe[dataframe['Experience'].notnull()])

 

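In the output, the row at index 2 is filtered out because its “Experience” value is NaN. The rows at index 0, 1, 3, and 4 remain.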
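The same filter can also be written with “dropna()” and its “subset” parameter. The sketch below recreates the sample DataFrame so it runs on its own, and its variable names are illustrative. It checks that both approaches return the same rows. It then uses the inverse mask, “isnull()”, to show which row was removed.

```python
import numpy as np
import pandas as pd

dataframe = pd.DataFrame({'Authors': ['Maria', 'Henry', 'Marry', np.nan, 'Alex'],
                          'UserName': ['fmn018', np.nan, 'fm012', 'mg002', 'ma025'],
                          'Experience': ['1 Year', '2 Year', np.nan, '6 Months', '9 Months']})

# Mask approach: keep rows whose Experience value is not NaN
by_mask = dataframe[dataframe['Experience'].notnull()]

# dropna approach: drop rows whose Experience value is NaN
by_dropna = dataframe.dropna(subset=['Experience'])

# Both approaches keep the same rows
print(by_mask.equals(by_dropna))  # True
print(by_mask.index.tolist())  # [0, 1, 3, 4]

# The inverse mask selects the rows that were filtered out
removed = dataframe[dataframe['Experience'].isnull()]
print(removed.index.tolist())  # [2]
```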

How to Filter Rows Containing NaN Values in Multiple Columns Using a Pandas DataFrame in Python?

Sometimes, you need to filter rows based on more than one column. To do so, specify the desired column names and call “notnull()” on those columns. Then use the “all()” function with “axis=1”, so that a row is kept only if every specified column is non-null:

columns = ['Experience', 'UserName']
# all(axis=1) is True only for rows where every listed column is non-null
print(dataframe[dataframe[columns].notnull().all(axis=1)])

 

The rows at index 1 and 2 are filtered out because they contain NaN values in the specified columns. The row at index 3 is kept because its NaN value is in the “Authors” column, which was not specified.
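Choosing “all()” or “any()” changes the result. The sketch below recreates the sample DataFrame and compares the mask approach with “dropna(subset=...)”. Replacing “all()” with “any()” keeps a row if at least one of the listed columns is present, which for this dataset keeps every row. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

dataframe = pd.DataFrame({'Authors': ['Maria', 'Henry', 'Marry', np.nan, 'Alex'],
                          'UserName': ['fmn018', np.nan, 'fm012', 'mg002', 'ma025'],
                          'Experience': ['1 Year', '2 Year', np.nan, '6 Months', '9 Months']})
columns = ['Experience', 'UserName']

# all(axis=1): keep rows where every listed column is non-null
all_present = dataframe[dataframe[columns].notnull().all(axis=1)]
print(all_present.index.tolist())  # [0, 3, 4]

# dropna(subset=...) drops rows with a NaN in any listed column: same rows
print(dataframe.dropna(subset=columns).index.tolist())  # [0, 3, 4]

# any(axis=1): keep rows where at least one listed column is non-null
any_present = dataframe[dataframe[columns].notnull().any(axis=1)]
print(any_present.index.tolist())  # [0, 1, 2, 3, 4]
```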

How to Filter Out All Rows Containing NaN Values Using a Pandas DataFrame in Python?

To filter out every row that contains a NaN value in any column, use the “dropna()” function. By default, it returns a new DataFrame without those rows and leaves the original unchanged:

print(dataframe.dropna())

 

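In the output, only the rows at index 0 and 4 remain, because each of the other rows contains at least one NaN value.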
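The “dropna()” function also accepts parameters that change what gets removed. The sketch below uses a recreated copy of the sample DataFrame with illustrative variable names. It shows “how='all'” and “axis=1”, and confirms that the original DataFrame is left intact. It ends with the complementary filter, which selects only the rows containing NaN values.

```python
import numpy as np
import pandas as pd

dataframe = pd.DataFrame({'Authors': ['Maria', 'Henry', 'Marry', np.nan, 'Alex'],
                          'UserName': ['fmn018', np.nan, 'fm012', 'mg002', 'ma025'],
                          'Experience': ['1 Year', '2 Year', np.nan, '6 Months', '9 Months']})

# Default (how='any'): drop every row with at least one NaN
print(dataframe.dropna().index.tolist())  # [0, 4]

# how='all': drop a row only if all of its values are NaN (no such row here)
print(len(dataframe.dropna(how='all')))  # 5

# axis=1: drop columns instead of rows; every column has a NaN, so none remain
print(dataframe.dropna(axis=1).columns.tolist())  # []

# dropna() returns a new DataFrame, so the original still has all 5 rows
print(len(dataframe))  # 5

# Complement of dropna(): keep only the rows that contain at least one NaN
with_nan = dataframe[dataframe.isna().any(axis=1)]
print(with_nan.index.tolist())  # [1, 2, 3]
```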

We have compiled the easiest ways to filter NaN values using pandas in Python.

Conclusion

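To filter out rows containing NaN values in Python, use pandas DataFrame functions. “notnull()” builds a boolean mask for boolean indexing, and combining it with “all()” handles multiple columns. “dropna()” removes every row with a NaN value. This blog demonstrated how to filter NaN values from a particular column, from multiple columns, and from an entire dataset.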


Source: linuxhint.com
