Filter NaN Pandas
When working with large datasets, users often encounter Null or NaN values, which mark missing entries in the data. In Python, these most commonly show up as NaN values when working with the pandas library. To filter out these missing values, pandas includes several functions.
The outcomes from this blog are:
- What is “pandas” in Python?
- What are NaN Values?
- How to Filter Specific Row From Dataset Which Contains NaN Value Using Pandas DataFrame in Python?
- How to Filter Multiple Rows From Dataset Which Contains NaN Value Using Pandas DataFrame in Python?
- How to Filter All Rows From Dataset Which Contains NaN Value Using Pandas DataFrame in Python?
What is “pandas” in Python?
In Python, “pandas” is one of the most widely used libraries for working with tabular datasets that hold float, datetime, string, and other types of values. It has multiple functions for exploring, analyzing, cleaning, and manipulating data. Among other things, it allows users to filter out the rows having NaN values using “dataframe” functions such as “dataframe.dropna()” and “dataframe.notnull()”.
What are NaN Values?
Almost every dataset has missing values. NaN is a special floating-point value that stands for “Not a Number”, and pandas uses it to represent blank or missing entries. Python itself has more than one way to represent a missing value, such as “None” and “numpy.nan”, and pandas treats both of them as missing.
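A short sketch of how NaN behaves may help here. The variable names below are only illustrative and are not part of the original tutorial:

```python
import numpy as np
import pandas as pd

# NaN is an ordinary floating-point value
nan_is_float = isinstance(np.nan, float)

# NaN never compares equal to anything, including itself,
# so "==" cannot be used to find missing values
nan_equals_itself = (np.nan == np.nan)

# pandas offers isna()/notna() (aliases: isnull()/notnull()) for this purpose;
# both np.nan and None are reported as missing
values = pd.Series(['Maria', np.nan, None])
missing_mask = values.isna()

print(nan_is_float, nan_equals_itself)
print(missing_mask.tolist())
```

This is why the filtering functions shown in the rest of this blog rely on “notnull()” and “dropna()” rather than on a direct comparison with NaN.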
How to Filter Particular Data Rows From Dataset Which Contains NaN Value by Utilizing the Pandas DataFrame in Python?
To filter specific rows from a dataset which contains NaN values, first, we will create a dataset containing NaN values. To do so, import the “numpy” and “pandas” libraries and create a new dataset. Then, check the newly created dataset:
import numpy as np
import pandas as pd

# Each column contains exactly one missing value (np.nan)
dataframe = pd.DataFrame({'Authors' : ['Maria', 'Henry', 'Marry', np.nan, 'Alex'],
                          'UserName' : ['fmn018', np.nan, 'fm012', 'mg002', 'ma025'],
                          'Experience' : ['1 Year', '2 Year', np.nan, '6 Months', '9 Months']
                         })
dataframe
The created dataset includes one NaN value in each column: in “Authors” at index 3, in “UserName” at index 1, and in “Experience” at index 2.
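One way to confirm where the missing values are is to count them per column. The following is a minimal sketch that rebuilds the same dataset so it can run on its own. The name “nan_counts” is only illustrative:

```python
import numpy as np
import pandas as pd

dataframe = pd.DataFrame({
    'Authors': ['Maria', 'Henry', 'Marry', np.nan, 'Alex'],
    'UserName': ['fmn018', np.nan, 'fm012', 'mg002', 'ma025'],
    'Experience': ['1 Year', '2 Year', np.nan, '6 Months', '9 Months'],
})

# isnull() marks every missing cell as True; sum() then counts them per column
nan_counts = dataframe.isnull().sum()
print(nan_counts)
```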
Now, use the “notnull()” function on a particular column to keep only the rows whose value in that column is not NaN.
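The single-column filter described here might look like the following minimal sketch. The choice of the “Authors” column and the names “has_author” and “filtered” are assumptions, since the original does not show which column was used:

```python
import numpy as np
import pandas as pd

dataframe = pd.DataFrame({
    'Authors': ['Maria', 'Henry', 'Marry', np.nan, 'Alex'],
    'UserName': ['fmn018', np.nan, 'fm012', 'mg002', 'ma025'],
    'Experience': ['1 Year', '2 Year', np.nan, '6 Months', '9 Months'],
})

# Boolean mask: True where 'Authors' holds a value, False where it is NaN
# ('Authors' is an assumed column choice)
has_author = dataframe['Authors'].notnull()

# Keep only the rows where the mask is True; the row at index 3 is dropped
filtered = dataframe[has_author]
print(filtered)
```

Note that the remaining rows can still contain NaN values in the other columns, because only the “Authors” column was checked.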
How to Filter Multiple Data Rows From Dataset Which Contains NaN Value by Utilizing Pandas DataFrame in Python?
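Sometimes, users need to filter out the rows that have NaN values in more than one column. To do so, specify the desired column names in a list and then use the “all()” function together with the “notnull()” function, so that a row is kept only if every listed column has a value:
columns = ['Authors', 'UserName']  # assumed columns; the original list is not shown
dataframe[dataframe[columns].notnull().all(axis=1)]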
As a result, every row that has a NaN value in any of the specified columns is filtered out of the dataset.
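For reference, here is a self-contained sketch of the multi-column filter. The columns “Authors” and “UserName” are an assumed choice, and the comparison with “dropna(subset=...)” is added only to illustrate that pandas offers an equivalent shortcut:

```python
import numpy as np
import pandas as pd

dataframe = pd.DataFrame({
    'Authors': ['Maria', 'Henry', 'Marry', np.nan, 'Alex'],
    'UserName': ['fmn018', np.nan, 'fm012', 'mg002', 'ma025'],
    'Experience': ['1 Year', '2 Year', np.nan, '6 Months', '9 Months'],
})

# Assumed columns to check for missing values
columns = ['Authors', 'UserName']

# notnull() gives a True/False table; all(axis=1) is True only for rows
# where every listed column has a value
by_mask = dataframe[dataframe[columns].notnull().all(axis=1)]

# dropna() restricted to the same columns removes the same rows
by_dropna = dataframe.dropna(subset=columns)

print(by_mask)
print(by_mask.equals(by_dropna))
```

The row at index 2 is kept even though its “Experience” value is NaN, because that column is not in the list being checked.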
How to Filter All Rows From Dataset Which Contains NaN Value Using Pandas DataFrame in Python?
If users want to filter out every row that contains a NaN value in any column of the Pandas DataFrame, the “dropna()” function can be used.
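A minimal sketch of this step, assuming the same example dataset as before (the name “cleaned” is only illustrative):

```python
import numpy as np
import pandas as pd

dataframe = pd.DataFrame({
    'Authors': ['Maria', 'Henry', 'Marry', np.nan, 'Alex'],
    'UserName': ['fmn018', np.nan, 'fm012', 'mg002', 'ma025'],
    'Experience': ['1 Year', '2 Year', np.nan, '6 Months', '9 Months'],
})

# With no arguments, dropna() removes every row that has at least one NaN
cleaned = dataframe.dropna()
print(cleaned)

# dropna() returns a new DataFrame; the original still has all of its rows
print(len(dataframe), len(cleaned))
```

With this dataset, only the rows at index 0 and 4 have values in all three columns, so those are the rows that remain.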
We have compiled the easiest ways to filter the NaN values in Python.
Conclusion
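To filter out the rows having NaN values in Python, “dataframe” functions such as “dataframe.notnull()” and “dataframe.dropna()” are used. The “notnull()” function filters rows based on one or more chosen columns, while “dropna()” removes every row that contains a NaN value in any column. This blog provided the different ways to filter the NaN values in Python.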
Source: linuxhint.com