Pandas Groupby Max
In Python, the “Pandas” library provides many methods for data operations such as DataFrame creation, data selection, and data extraction. The “groupby()” method groups the rows of a DataFrame by the values of one or more columns. To find the maximum value within each group, the “max()” method is applied to the grouped data.
This article provides a detailed guide on how to find the maximum value of selected columns after grouping by one or more columns. It covers the following topics:
- How to Determine the Max Value From the Grouped Data of Pandas DataFrame?
- Find the Maximum Value From the Grouped Data of the Single Column
- Find the Maximum Value From the Grouped Data of the Multiple Columns
- Group Data By a Specific Column and Extract Maximum Value From Multiple Columns
- Determining and Sorting the Maximum Value
How to Determine the Max Value From the Grouped Data of Pandas DataFrame?
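To determine the max value from the grouped data, the “df.groupby()” method is used along with the “max()” method. The general form is “df.groupby(by)[columns].max()”, where “by” is the column name (or list of column names) to group on and “columns” selects the column or columns whose maximum is computed.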
For further understanding of the “df.groupby()” method, you can check this detailed guide.
Now, let’s explore this method using the following examples:
Example 1: Find the Maximum Value From the Grouped Data of the Single Column
Consider the following example:
import pandas as pd
data = pd.DataFrame({'Team': ['X', 'X', 'X', 'Y', 'Y', 'Y'], 'Players': [10, 20, 30, 5, 22, 33], 'Points': [242, 321, 221, 318, 319, 212], 'Medals': [4, 5, 2, 5, 2, 1]})
print(data, '\n')
# Maximum of "Points" within each team
print(data.groupby('Team')['Points'].max())
In the above code:
- The “pd.DataFrame()” function creates/constructs a DataFrame.
- The “groupby()” method is utilized to group the data based on the “Team” column.
- The “max()” method is then applied to the “Points” column of each group to find the maximum number of points scored by each team.
Output
The result is a Series indexed by “Team” and named “Points”, where each entry is a team's maximum score: 321 for team X and 319 for team Y.
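Whether the grouped maximum comes back as a Series or as a DataFrame depends on how the call is written. The following is a minimal sketch, assuming the same sample values as Example 1 (only the columns needed here); the variable names “as_series” and “as_frame” are illustrative and not part of the original example.

```python
import pandas as pd

# Same sample values as Example 1, reduced to the two columns used here.
data = pd.DataFrame({
    'Team': ['X', 'X', 'X', 'Y', 'Y', 'Y'],
    'Points': [242, 321, 221, 318, 319, 212],
})

# Single brackets select one column, so max() returns a Series indexed by Team.
as_series = data.groupby('Team')['Points'].max()

# as_index=False keeps 'Team' as a regular column, so the result is a DataFrame.
as_frame = data.groupby('Team', as_index=False)['Points'].max()

print(type(as_series).__name__)
print(as_frame)
```

The “as_index=False” form is convenient when the result will be merged or exported, since “Team” stays an ordinary column.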
Example 2: Find the Maximum Value From the Grouped Data of the Multiple Columns
Consider the following code:
import pandas as pd
data = pd.DataFrame({'Team': ['X', 'X', 'X', 'Y', 'Y', 'Y'], 'Players': ['A', 'B', 'B', 'A', 'B', 'A'], 'Points': [242, 321, 221, 318, 319, 212], 'Medals': [4, 5, 2, 5, 2, 1]})
print(data, '\n')
# Maximum of "Points" for each (Team, Players) combination
print(data.groupby(['Team', 'Players'])['Points'].max())
In the above code:
- The “pd.DataFrame()” function of the “pandas” module is used to create a DataFrame.
- The “groupby()” method groups on multiple columns and the “max()” function is used to determine the maximum value of each group in the selected columns.
Output
The maximum value of the “Points” column has been determined for each group created on multiple columns “Team” and “Players”.
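Grouping on two columns produces a result whose index has two levels, one per grouping column. As a rough sketch, assuming the same sample values as Example 2, the snippet below looks up a single group and then flattens the index back into ordinary columns; the names “grouped_max” and “flat” are illustrative.

```python
import pandas as pd

# Same sample values as Example 2, without the unused 'Medals' column.
data = pd.DataFrame({
    'Team': ['X', 'X', 'X', 'Y', 'Y', 'Y'],
    'Players': ['A', 'B', 'B', 'A', 'B', 'A'],
    'Points': [242, 321, 221, 318, 319, 212],
})

# Grouping on two columns yields a Series with a (Team, Players) MultiIndex.
grouped_max = data.groupby(['Team', 'Players'])['Points'].max()

# Look up the maximum for one group by its key tuple.
print(grouped_max.loc[('X', 'B')])

# reset_index() turns both index levels back into regular columns.
flat = grouped_max.reset_index()
print(flat)
```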
Example 3: Group Data By a Specific Column and Extract Maximum Value From Multiple Columns
Consider the following code:
import pandas as pd
data = pd.DataFrame({'Team': ['X', 'X', 'X', 'Y', 'Y', 'Y'], 'Players': [10, 20, 30, 5, 22, 33], 'Points': [242, 321, 221, 318, 319, 212], 'Medals': [4, 5, 2, 5, 2, 1]})
print(data, '\n')
# Select several columns with a list (double brackets) before calling max()
print(data.groupby('Team')[['Points', 'Players']].max())
Here in this code:
- The “groupby()” method groups the data of DataFrame based on the “Team” column.
- The “max()” method is then applied to the “Points” and “Players” columns of each group to find the maximum value.
Output
The output is a DataFrame indexed by “Team” that shows the maximum “Points” and “Players” values for each team: 321 and 30 for team X, and 319 and 33 for team Y.
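The column selection step is optional. The sketch below, which assumes the same sample values as Example 3, compares selecting columns with a list against calling “max()” on the grouped object directly; “selected_max” and “all_max” are illustrative names.

```python
import pandas as pd

# Same sample values as Example 3.
data = pd.DataFrame({
    'Team': ['X', 'X', 'X', 'Y', 'Y', 'Y'],
    'Players': [10, 20, 30, 5, 22, 33],
    'Points': [242, 321, 221, 318, 319, 212],
    'Medals': [4, 5, 2, 5, 2, 1],
})

# A list inside the brackets selects several columns; the result is a DataFrame.
selected_max = data.groupby('Team')[['Points', 'Players']].max()

# With no column selection, max() is computed for every non-grouping column.
all_max = data.groupby('Team').max()

print(selected_max)
print(all_max)
```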
Example 4: Determining and Sorting the Maximum Value
To sort the maximum values of the grouped data, use the code below:
import pandas as pd
data = pd.DataFrame({'Team': ['X', 'X', 'Y', 'Y', 'Z', 'Z'], 'Players': [10, 20, 30, 5, 22, 33], 'Points': [242, 321, 221, 318, 319, 212], 'Medals': [4, 5, 2, 5, 2, 1]})
print(data, '\n')
# Per-team maximum of "Points", sorted from lowest to highest
print(data.groupby('Team')['Points'].max().reset_index().sort_values(['Points'], ascending=True))
In this example:
- The “groupby()” method groups the data based on the “Team” column and the “max()” method determines the maximum value of the selected column “Points”.
- The “reset_index()” method converts the grouped result into a DataFrame with “Team” as a regular column, and “sort_values()” sorts the rows by the maximum “Points” in ascending order.
Output
The maximum points of the teams have been sorted in ascending order: Y (318), Z (319), and X (321).
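The sort direction can be reversed, and “reset_index()” is not strictly required for sorting. As a small sketch, assuming the same sample values as Example 4, the grouped Series can be sorted directly with “ascending=False” to list the highest maximum first; “team_max” and “highest_first” are illustrative names.

```python
import pandas as pd

# Same sample values as Example 4, reduced to the two columns used here.
data = pd.DataFrame({
    'Team': ['X', 'X', 'Y', 'Y', 'Z', 'Z'],
    'Points': [242, 321, 221, 318, 319, 212],
})

# Per-team maximum as a Series indexed by Team.
team_max = data.groupby('Team')['Points'].max()

# Sort the Series itself, highest maximum first.
highest_first = team_max.sort_values(ascending=False)
print(highest_first)
```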
Conclusion
The “DataFrame.groupby()” method is used along with the “max()” method to calculate the maximum value of grouped data. The “groupby()” method can group the data on one or more columns, and “max()” can be applied to one or several selected columns. The “sort_values()” method can be chained after “groupby()” and “max()” to sort the resulting maximum values. This tutorial has presented a guide on Pandas “groupby” max using several examples.
Source: linuxhint.com