
Pandas Count Distinct

This article explores several ways to determine the number of unique items in a Pandas DataFrame.

Sample Data

Before discussing how to determine the number of unique values in a DataFrame, we will need sample data.

The following code builds a small example DataFrame:

# import pandas
import pandas as pd
df = pd.DataFrame({
    'salary': [120000, 100000, 90000, 110000, 120000, 100000, 56000],
    'department': ['game developer', 'database developer', 'front-end developer', 'full-stack developer', 'database developer', 'security researcher', 'cloud-engineer'],
    'rating': [4.3, 4.4, 4.3, 3.3, 4.3, 5.0, 4.4]},
    index=['Alice', 'Michael', 'Joshua', 'Patricia', 'Peter', 'Jeff', 'Ruth'])
df

The code above creates a sample DataFrame that we use throughout this tutorial. It has seven rows, indexed by employee name, and three columns: salary, department, and rating.

#1 Pandas Unique Method

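The unique() function is the first method we can use to determine the number of unique values in a DataFrame column.

The function takes a one-dimensional input such as a Series and returns its unique values as an array, in order of first appearance. For an ordinary numeric column like the ones used here, the result is a NumPy array rather than a Python list.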

For example, to calculate the unique items in the salary column, we can do:

print(pd.unique(df['salary']))

The code above should return the unique items in the ‘salary’ column.

[120000 100000  90000 110000  56000]

If you want the number of unique values, you can get the length of the returned array as shown:

print(f"Unique items: {len(pd.unique(df['salary']))}")

The code above should return:

Unique items: 5
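
One caveat worth knowing: unique() keeps a missing value (NaN) as a distinct item, so the length of its result includes NaN when the column has gaps. The sketch below illustrates this with a small hypothetical Series named ratings, which is not part of the sample data above, and shows one possible way to exclude NaN from the count.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with missing values, used only for illustration
ratings = pd.Series([4.3, np.nan, 4.3, 5.0, np.nan])

# unique() reports NaN once, alongside the real values
unique_values = ratings.unique()
count_with_nan = len(unique_values)

# Dropping NaN first counts only the real values
count_without_nan = len(ratings.dropna().unique())

print(f"With NaN: {count_with_nan}, without NaN: {count_without_nan}")
```

Which count is the right one depends on whether a missing entry should be treated as a category of its own in your data.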

#2 Pandas nunique Function

The nunique() function returns the number of unique values along a specified axis: axis=0 counts per column and axis=1 counts per row. By default, missing values (NaN) are not counted.

An example is as shown:

print(f"[number of unique items/column]\n{df.nunique(axis=0)}")

The code above should return the number of unique items in each column. The resulting output is as shown:

[number of unique items/column]
salary        5
department    6
rating        4
dtype: int64

You can also fetch the number of unique items in a specific column as shown:

print(df.salary.nunique())

The above should return the number of unique items in the salary column, which is 5 for the sample data.
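
A common follow-up is counting distinct values within groups, similar to COUNT(DISTINCT ...) with GROUP BY in SQL. The sketch below is one possible approach, combining groupby() with nunique() on a trimmed copy of the sample data; the variable name per_department is chosen here for illustration.

```python
import pandas as pd

# Salary and department columns from the sample data above
df = pd.DataFrame({
    'salary': [120000, 100000, 90000, 110000, 120000, 100000, 56000],
    'department': ['game developer', 'database developer', 'front-end developer',
                   'full-stack developer', 'database developer',
                   'security researcher', 'cloud-engineer']},
    index=['Alice', 'Michael', 'Joshua', 'Patricia', 'Peter', 'Jeff', 'Ruth'])

# Number of distinct salaries inside each department
per_department = df.groupby('department')['salary'].nunique()
print(per_department)
```

In the sample data, only 'database developer' appears twice (with two different salaries), so it is the only department with a count above 1.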

#3 Pandas value_counts()

Pandas also provides the value_counts() function. It returns how many times each distinct value occurs in a column, so the number of entries in its result equals the number of unique values.

An example is as shown:

res = list(df.salary.value_counts())
print(f"unique items: {len(res)}")

The value_counts() function returns a Series holding the count of each distinct value in the column, with missing values (NaN) left out by default. We then convert the result into a list and take its length.

This should get the number of unique items in the column:

unique items: 5
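
Because value_counts() keeps the frequency of every value, it is handy when you need both the distinct count and the per-value counts. The sketch below, written against the salary column of the sample data, looks up one frequency and checks that the three approaches covered in this article agree; the variable names are chosen here for illustration.

```python
import pandas as pd

# Salary column from the sample data above
df = pd.DataFrame({
    'salary': [120000, 100000, 90000, 110000, 120000, 100000, 56000]},
    index=['Alice', 'Michael', 'Joshua', 'Patricia', 'Peter', 'Jeff', 'Ruth'])

# One entry per distinct salary, holding how often it occurs
counts = df['salary'].value_counts()
times_120k = counts.loc[120000]   # 120000 appears for Alice and Peter

# The three approaches should give the same distinct count here
via_unique = len(pd.unique(df['salary']))
via_nunique = df['salary'].nunique()
via_value_counts = len(counts)

print(times_120k, via_unique, via_nunique, via_value_counts)
```

The three counts match for this column because it has no missing values; with NaN present, unique() would include it while the other two would not by default.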

Conclusion

This article discussed three ways to determine the number of unique values in a Pandas DataFrame: the unique(), nunique(), and value_counts() functions.


Source: linuxhint.com
