Pandas GroupBy Index
In Python, the “Pandas” library is used to manipulate and organize data. It can group data by specified columns or index values and then apply an operation to each group, such as summing, counting, or averaging. This is done with the “DataFrame.groupby()” method, which can also group Pandas data by the index.
This blog provides a detailed guide on the Pandas “DataFrame.groupby()” method, covering the following contents:
- What is the “DataFrame.groupby()” Method in Python?
- Pandas DataFrame Group by Single Index
- Pandas DataFrame Group by Multiple Index
- Pandas DataFrame Group by Column and Index
What is the “DataFrame.groupby()” Method in Python?
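In Python, the “groupby()” method splits the data into groups based on specific criteria, such as a mapper (a function, dict, or Series) or one or more columns or index levels. Here is the syntax (as of pandas 2.0):
DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, observed=False, dropna=True)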
In the above syntax:
- The “by” parameter specifies what to group by: a column name, an index level name, a list of these, or a function, dict, or Series that maps labels to groups.
- The “axis” parameter specifies the axis along which the grouping is done. It is set to “0” (rows) by default.
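- The “sort” parameter specifies whether the group keys are sorted (default True), and the “group_keys” parameter specifies whether the group keys are added to the index of the result when calling “apply()”.
- The “level” parameter is used to group by a specific level or levels of a MultiIndex.
- The “as_index” parameter determines whether the group labels become the index of the aggregated output (default True) or are returned as regular columns (False).
- The “observed” parameter only applies when a grouper is categorical: if True, only the categories that actually appear in the data are shown. The “dropna” parameter (default True) excludes rows whose group key is missing (NaN) from the grouping.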
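To see how a few of these parameters change the result, here is a minimal sketch on a small, made-up DataFrame (the “team” and “score” columns are hypothetical and not part of the examples below). It compares the default grouping with “as_index=False”, “sort=False”, and “dropna=False”:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; one "team" value is missing (NaN)
df = pd.DataFrame({
    "team": ["B", "A", "B", np.nan, "A"],
    "score": [10, 20, 30, 40, 50],
})

# Default: group keys become the index, keys are sorted, NaN keys are dropped
default = df.groupby("team").sum()

# as_index=False keeps "team" as a regular column instead of the index
flat = df.groupby("team", as_index=False).sum()

# sort=False keeps groups in order of first appearance ("B" before "A")
unsorted = df.groupby("team", sort=False).sum()

# dropna=False keeps the rows with a missing key as their own group
with_nan = df.groupby("team", dropna=False).sum()

print(default, flat, unsorted, with_nan, sep="\n\n")
```

The “observed” parameter is not shown here because it only has an effect when a grouping key has a categorical dtype.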
Return Value
The “DataFrame.groupby()” method returns a “DataFrameGroupBy” object that holds the group information; an aggregation such as “sum()” must then be applied to it to produce a result.
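The returned object can be inspected before any aggregation is applied. The sketch below builds the roll-number data used in the examples that follow (here directly with a named index) and uses the standard “ngroups”, “groups”, and “get_group()” members of the GroupBy object, plus iteration over (key, group) pairs:

```python
import pandas as pd

# Roll-number data with "roll_no" as the index
df = pd.DataFrame(
    {"score_1": [40, 20, 25, 23, 15, 54], "score_2": [40, 44, 20, 12, 10, 41]},
    index=pd.Index([1804, 1805, 1804, 1806, 1805, 1807], name="roll_no"),
)

grouped = df.groupby("roll_no")
print(type(grouped).__name__)  # DataFrameGroupBy

# The object only describes the groups; nothing has been aggregated yet
print(grouped.ngroups)  # number of distinct roll_no values
print(grouped.groups)   # dict-like mapping: group key -> index labels

# Iterating yields (key, sub-DataFrame) pairs
for key, sub_df in grouped:
    print(key, len(sub_df))

# Pull out the rows of a single group
print(grouped.get_group(1804))
```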
Example 1: Pandas DataFrame Group by Single Index
This example groups a DataFrame by its single index (“roll_no”):
import pandas as pd

# Sample data: roll numbers 1804 and 1805 appear twice
data1 = {'roll_no': [1804, 1805, 1804, 1806, 1805, 1807], 'score_1': [40, 20, 25, 23, 15, 54], 'score_2': [40, 44, 20, 12, 10, 41]}
df = pd.DataFrame(data1)
df.set_index(['roll_no'], inplace=True)
print(df)
result = df.groupby('roll_no').sum()
print('\n',result)
In the above code:
- The “pandas” module is imported, and the “pd.DataFrame()” function is used to create a DataFrame.
- The “df.set_index()” method sets the “roll_no” column as the DataFrame index; “inplace=True” modifies “df” directly instead of returning a new DataFrame.
- The “df.groupby()” method groups the rows by the “roll_no” index, and the “sum()” method adds up “score_1” and “score_2” within each group.
Output
The output displays the original DataFrame followed by the grouped result, in which the duplicate roll numbers 1804 and 1805 are each collapsed into a single row holding the summed scores.
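The same grouping can also be written with the “level” parameter, using either the level name or its position. A short sketch, reusing the data from Example 1, that checks the three forms give identical results:

```python
import pandas as pd

data1 = {'roll_no': [1804, 1805, 1804, 1806, 1805, 1807],
         'score_1': [40, 20, 25, 23, 15, 54],
         'score_2': [40, 44, 20, 12, 10, 41]}
df = pd.DataFrame(data1).set_index('roll_no')

# Index level name passed as the "by" key (as in Example 1)
by_name = df.groupby('roll_no').sum()

# Same grouping via the "level" parameter, by name and by position
by_level = df.groupby(level='roll_no').sum()
by_position = df.groupby(level=0).sum()

# All three forms produce the same DataFrame
assert by_name.equals(by_level) and by_name.equals(by_position)
print(by_level)
```

The “level” form is handy when the index level has no name, since positions such as “level=0” always work.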
Example 2: Pandas DataFrame Group by Multiple Index
The following example groups data by multiple index levels of a DataFrame:
import pandas as pd

data1 = {'roll_no': [1804, 1805, 1804, 1806, 1805, 1807], 'teams': ['A', 'B', 'A', 'C', 'B', 'D'], 'score_1': [40, 20, 25, 23, 15, 54], 'score_2': [40, 44, 20, 12, 10, 41]}
df = pd.DataFrame(data1)
df.set_index(['roll_no', 'teams'], inplace=True)
print(df)
result = df.groupby(['roll_no', 'teams']).sum()
print('\n',result)
In this code:
- The “df.groupby()” method takes a list of index level names, and the “sum()” method adds up the score columns within each group.
- In this case, the DataFrame is grouped by the combination of the “roll_no” and “teams” index values.
Output
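The Pandas DataFrame has been grouped by the combination of the “roll_no” and “teams” index levels. Because each roll number always belongs to the same team in this data, the groups match those of Example 1.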
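With a MultiIndex, the “level” parameter also lets you group by only some of the levels, and “agg()” can compute several statistics at once. A minimal sketch reusing the data from Example 2:

```python
import pandas as pd

data1 = {'roll_no': [1804, 1805, 1804, 1806, 1805, 1807],
         'teams': ['A', 'B', 'A', 'C', 'B', 'D'],
         'score_1': [40, 20, 25, 23, 15, 54],
         'score_2': [40, 44, 20, 12, 10, 41]}
df = pd.DataFrame(data1).set_index(['roll_no', 'teams'])

# Group by the "teams" level only; the "roll_no" level is collapsed away
by_team = df.groupby(level='teams').sum()

# Group by both levels through the "level" parameter (same as Example 2)
by_both = df.groupby(level=['roll_no', 'teams']).sum()

# Several aggregations at once; the result has two-level column labels
summary = df.groupby(level='teams').agg(['sum', 'mean'])

print(by_team, by_both, summary, sep='\n\n')
```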
Example 3: Pandas DataFrame Group by Column and Index
To group by an index level and a regular column together, use the following code:
import pandas as pd

data1 = {'roll_no': [1804, 1805, 1804, 1806, 1805, 1807], 'teams': ['A', 'B', 'A', 'C', 'B', 'D'], 'score_1': [40, 20, 25, 23, 15, 54], 'score_2': [40, 44, 20, 12, 10, 41]}
df = pd.DataFrame(data1)
df.set_index(['roll_no', 'teams'], inplace=True)
print(df)
result = df.groupby(['roll_no', 'score_1']).sum()
print('\n',result)
In the above example:
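- The “df.groupby()” method takes the “roll_no” index level and the “score_1” column as keys and groups the DataFrame by their combination. The “teams” level is not one of the keys, so it does not appear in the result, and “sum()” is applied to the remaining “score_2” column.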
Output
The DataFrame data has been grouped by an index level and a regular column. Since every “roll_no” and “score_1” pair is unique in this data, each group contains a single row.
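Index level names passed to “groupby()” are resolved much like column names, so Example 3 is equivalent to resetting the index first and grouping by ordinary columns. “pd.Grouper(level=...)” is another way to mark a key explicitly as an index level. A sketch that checks both equivalences on the Example 3 data, selecting only “score_2” for the sum:

```python
import pandas as pd

data1 = {'roll_no': [1804, 1805, 1804, 1806, 1805, 1807],
         'teams': ['A', 'B', 'A', 'C', 'B', 'D'],
         'score_1': [40, 20, 25, 23, 15, 54],
         'score_2': [40, 44, 20, 12, 10, 41]}
df = pd.DataFrame(data1).set_index(['roll_no', 'teams'])

# Mixed keys: "roll_no" resolves to an index level, "score_1" to a column
mixed = df.groupby(['roll_no', 'score_1'])['score_2'].sum()

# Same grouping after turning the index levels back into ordinary columns
flat = df.reset_index().groupby(['roll_no', 'score_1'])['score_2'].sum()

# pd.Grouper(level=...) makes the "this key is an index level" intent explicit
explicit = df.groupby([pd.Grouper(level='roll_no'), 'score_1'])['score_2'].sum()

assert mixed.equals(flat) and mixed.equals(explicit)
print(mixed)
```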
Conclusion
The “groupby()” method in Python splits a Pandas DataFrame into groups based on specified columns, index levels, or other keys, so that an aggregation such as “sum()” can be applied to each group. It can group a DataFrame by a single index, by multiple index levels, or by a mix of index levels and columns. This article provided an in-depth guide on the Pandas “groupby()” method using several examples.
Source: linuxhint.com