by Arround The Web

Pandas Groupby Count Distinct

“Count Distinct” is a common operation in data analysis that returns the number of unique values within a column. In Python, the Pandas “groupby()” method is combined with methods such as “nunique()”, “value_counts()”, “unique()”, and “agg()” to group rows by a common value and count the unique values in each group.

This article explains how to count the distinct values in each group of a Pandas DataFrame using four methods: “nunique()”, “value_counts()”, “unique()”, and “agg()”.

Method 1: Count Distinct Values in a Pandas DataFrame Group Using the “nunique()” Method

The “nunique()” method returns the number of unique values in a Pandas DataFrame column. Applied after “groupby()”, it counts the distinct values within each group.

Example 1: Using a Single Column

The code below counts the distinct values of a single column within each group:

import pandas
df = pandas.DataFrame({
    'Name': ['Lily', 'Carry', 'Lily', 'Sybil', 'Lily', 'Lily', 'Sybil'],
    'Age': [15, 17, 16, 19, 15, 15, 21],
    'Score': [55, 66, 25, 88, 55, 66, 18]})
print(df)
# Count the distinct Age values within each Name group
df1 = df.groupby('Name')['Age'].nunique()
print('\n', df1)

In the above example, the “pandas” module is imported, and a DataFrame with three columns is created. Next, the “df.groupby()” method groups the rows by the “Name” column. The “nunique()” method is then applied to the “Age” column of each group to count its distinct values.

Output

The result is a Series indexed by “Name”: “Carry” has 1 distinct age, while “Lily” and “Sybil” each have 2.
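When a flat table is more convenient than an indexed Series, the result can be converted back to a DataFrame. The following is a minimal sketch that reuses the “Name” and “Age” data from this example; the column name “distinct_ages” is an illustrative choice and does not come from the original code.

```python
import pandas

df = pandas.DataFrame({
    'Name': ['Lily', 'Carry', 'Lily', 'Sybil', 'Lily', 'Lily', 'Sybil'],
    'Age': [15, 17, 16, 19, 15, 15, 21],
})

# nunique() returns a Series indexed by Name; reset_index() turns it into
# a two-column DataFrame ('distinct_ages' is a hypothetical column name).
counts = df.groupby('Name')['Age'].nunique().reset_index(name='distinct_ages')
print(counts)
```

The group keys are sorted by default, so the rows appear in the order “Carry”, “Lily”, “Sybil”.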

Example 2: Using Multiple Columns

The following code counts the distinct values of multiple columns within each group:

import pandas
df = pandas.DataFrame({
    'Name': ['Lily', 'Carry', 'Lily', 'Sybil', 'Lily', 'Lily', 'Sybil'],
    'Age': [15, 17, 16, 19, 15, 15, 21],
    'Score': [55, 66, 25, 88, 55, 66, 18]})
print(df)
# Count the distinct Age and Score values within each Name group
df1 = df.groupby('Name')[['Age', 'Score']].nunique()
print('\n', df1)

In this code, the “df.groupby()” method groups the DataFrame by the single column “Name”. The “nunique()” method is then applied to the “Age” and “Score” columns to count the distinct values of each.

Output

The result is a DataFrame with one row per name: “Carry” has 1 distinct age and 1 distinct score, “Lily” has 2 and 3, and “Sybil” has 2 and 2.
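One detail worth knowing is that “nunique()” ignores missing values by default. The sketch below uses a small made-up DataFrame (the data is an assumption for illustration, not part of the examples above) to show how the “dropna” parameter changes the count.

```python
import pandas

# Hypothetical data: one of Lily's ages is missing (None is stored as NaN).
df = pandas.DataFrame({
    'Name': ['Lily', 'Lily', 'Lily', 'Sybil'],
    'Age': [15, 16, None, 19],
})

# Default behavior: NaN is not counted as a distinct value.
without_nan = df.groupby('Name')['Age'].nunique()

# dropna=False: NaN is counted as one additional distinct value.
with_nan = df.groupby('Name')['Age'].nunique(dropna=False)

print(without_nan)
print(with_nan)
```

Here “Lily” has 2 distinct ages with the default setting and 3 when NaN is included, so the choice matters whenever a column may contain gaps.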

Method 2: Count Distinct Values in a Pandas DataFrame Group Using the “value_counts()” Method

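The “value_counts()” method returns how many times each distinct value occurs in one or more columns. Applied to a group, it lists every distinct value in the group together with its number of occurrences, so the number of entries per group equals that group's distinct count.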

Example 1: Using a Single Column

Here is an example that applies it to a single column:

import pandas
df = pandas.DataFrame({
    'Name': ['Cyndy', 'Carry', 'Lily', 'Sybil', 'Cyndy', 'Lily', 'Sybil'],
    'Age': [15, 17, 18, 19, 15, 16, 19]})
print(df)
# Count how often each distinct Age occurs within each Name group
df1 = df.groupby('Name')['Age'].value_counts()
print('\n', df1)

In the above code, the “df.groupby()” method is used along with the “value_counts()” method to count how often each distinct value of the “Age” column occurs within each “Name” group.

Output

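The result is a Series indexed by “Name” and “Age”: age 15 occurs twice for “Cyndy”, age 19 occurs twice for “Sybil”, “Carry” has a single entry for age 17, and “Lily” has one entry each for ages 16 and 18.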
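Because “value_counts()” reports occurrences rather than a single number per group, one more step is needed to get the distinct count. A minimal sketch, assuming the same “Name” and “Age” data as above, is to count the entries that each name contributes to the result:

```python
import pandas

df = pandas.DataFrame({
    'Name': ['Cyndy', 'Carry', 'Lily', 'Sybil', 'Cyndy', 'Lily', 'Sybil'],
    'Age': [15, 17, 18, 19, 15, 16, 19],
})

# One entry per (Name, Age) pair, holding how often that pair occurs.
occurrences = df.groupby('Name')['Age'].value_counts()

# Level 0 of the index is Name; counting the entries per name gives
# the number of distinct ages in that group.
distinct = occurrences.groupby(level=0).size()
print(distinct)
```

For this data the counts match what “nunique()” would return: 2 for “Lily” and 1 for each of the other names.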

Example 2: Using Multiple Columns

The same approach works for multiple columns:

import pandas
df = pandas.DataFrame({
    'Name': ['Cyndy', 'Carry', 'Lily', 'Sybil', 'Cyndy', 'Lily', 'Sybil'],
    'Age': [15, 17, 18, 19, 15, 16, 19],
    'Score': [55, 66, 55, 88, 55, 66, 88]})
print(df)
# Count how often each (Age, Score) combination occurs per Name group
df1 = df.groupby('Name')[['Age', 'Score']].value_counts()
print('\n', df1)

In the above code, “df.groupby()” groups the rows by the “Name” column. The “value_counts()” method then counts how often each distinct combination of “Age” and “Score” occurs within each group. Calling “value_counts()” on a grouped DataFrame requires pandas 1.4 or later.

Output

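Each distinct combination of “Age” and “Score” is listed once per name with its number of occurrences: (15, 55) occurs twice for “Cyndy” and (19, 88) occurs twice for “Sybil”, while “Carry” has one combination and “Lily” has two, each occurring once.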
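To reduce this to one number per group, that is, the count of distinct “Age” and “Score” combinations for each name, the duplicates can be dropped before grouping. This is a sketch of one possible approach rather than the only way to do it; it swaps “value_counts()” for “drop_duplicates()” and “size()”.

```python
import pandas

df = pandas.DataFrame({
    'Name': ['Cyndy', 'Carry', 'Lily', 'Sybil', 'Cyndy', 'Lily', 'Sybil'],
    'Age': [15, 17, 18, 19, 15, 16, 19],
    'Score': [55, 66, 55, 88, 55, 66, 88],
})

# Keep one row per distinct (Name, Age, Score) combination ...
unique_rows = df.drop_duplicates(subset=['Name', 'Age', 'Score'])

# ... then count how many such rows remain for each Name.
distinct_pairs = unique_rows.groupby('Name').size()
print(distinct_pairs)
```

With this data, “Lily” has 2 distinct combinations and every other name has 1.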

Method 3: Count Distinct Values in a Pandas DataFrame Group Using the “unique()” Method

The “unique()” method returns the unique values of a Pandas Series. The code below retrieves the distinct values of a column for each group:

import pandas
df = pandas.DataFrame({
    'Name': ['Lily', 'Carry', 'Lily', 'Sybil', 'Lily', 'Lily', 'Sybil'],
    'Age': [15, 17, 16, 19, 15, 15, 21],
    'Score': [55, 66, 25, 88, 55, 66, 18]})
print(df)
# Collect the distinct Age values of each Name group into an array
df1 = df.groupby('Name')['Age'].unique()
print('\n', df1)

Here, “unique()” is applied to the grouped “Age” column and returns a Series holding an array of unique values for each group rather than a count. The distinct count can then be obtained by taking the length of each array.

Output

The result lists the distinct ages for each name: [17] for “Carry”, [15, 16] for “Lily”, and [19, 21] for “Sybil”.
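Turning those arrays into counts takes one extra call. The sketch below, which assumes the same “Name” and “Age” data, applies Python's built-in “len” to each array:

```python
import pandas

df = pandas.DataFrame({
    'Name': ['Lily', 'Carry', 'Lily', 'Sybil', 'Lily', 'Lily', 'Sybil'],
    'Age': [15, 17, 16, 19, 15, 15, 21],
})

# unique() yields one array of distinct ages per Name.
unique_ages = df.groupby('Name')['Age'].unique()

# The length of each array is the distinct count for that group.
distinct = unique_ages.apply(len)
print(distinct)
```

Note that “unique()” keeps NaN in its arrays, so on data with missing values this count can be higher than the default result of “nunique()”.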

Method 4: Count Distinct Values in a Pandas DataFrame Group Using the “agg()” Method

The “agg()” method can also be used to count the distinct values of a Pandas DataFrame group by passing it the name of an aggregation such as “nunique”. Here is an example:

import pandas
df = pandas.DataFrame({
    'Name': ['Lily', 'Carry', 'Lily', 'Sybil', 'Lily', 'Lily', 'Sybil'],
    'Age': [15, 17, 16, 19, 15, 15, 21],
    'Score': [55, 66, 25, 88, 55, 66, 18]})
print(df)
# Apply the 'nunique' aggregation to Age within each Name group
df1 = df.groupby('Name')[['Age']].agg(['nunique'])
print('\n', df1)

In the above code, the “df.groupby()” method is used along with the “agg()” method, which applies the “nunique” aggregation to the “Age” column of each group.

Output

The result is a DataFrame whose column is labeled “Age” with the sub-label “nunique”: 1 for “Carry” and 2 for both “Lily” and “Sybil”.
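When several columns are aggregated at once, named aggregation gives flat, readable column names instead of a two-level header. The following is a minimal sketch using the same data; the output names “distinct_ages” and “distinct_scores” are illustrative assumptions.

```python
import pandas

df = pandas.DataFrame({
    'Name': ['Lily', 'Carry', 'Lily', 'Sybil', 'Lily', 'Lily', 'Sybil'],
    'Age': [15, 17, 16, 19, 15, 15, 21],
    'Score': [55, 66, 25, 88, 55, 66, 18],
})

# Each keyword is an output column name (hypothetical names here);
# each value is a (source column, aggregation) pair.
summary = df.groupby('Name').agg(
    distinct_ages=('Age', 'nunique'),
    distinct_scores=('Score', 'nunique'),
)
print(summary)
```

This form also makes it easy to mix “nunique” with other aggregations, such as “mean” or “max”, in a single call.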

Conclusion

The “nunique()” and “agg()” methods return the count of distinct values in each Pandas DataFrame group directly, while “value_counts()” and “unique()” list the distinct values, from which the count can be derived. In each case, the DataFrame is first grouped by one or more columns with “groupby()”, and the method is then applied to the selected columns. This article has covered counting the distinct values of Pandas DataFrame groups with several examples.

Source: linuxhint.com