by Arround The Web

Pandas GroupBy Index

In Python, the “Pandas” library is used to manipulate and organize data. It can group data based on specified columns or index values and then perform operations on each group, such as summing, averaging, or counting. To do this, the “DataFrame.groupby()” method is used. This method can also group Pandas data based on the index of the DataFrame.

This blog provides a detailed guide on the Pandas “DataFrame.groupby()” method. It covers the method's syntax and return value, then walks through grouping by a single index, by multiple indexes, and by an index together with a column.

What is the “DataFrame.groupby()” Method in Python?

The “groupby()” method splits the data into groups according to specified criteria, such as a mapper, a function, or a list of column labels. Here is the syntax:

DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, observed=False, dropna=True)

In the above syntax:

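  • The “by” parameter specifies what to group by: a column label, an index level name, a list of such labels, a mapping, or a function.
  • The “axis” parameter specifies the axis along which the grouping is done. It is set to “0” (rows) by default and has been deprecated since Pandas 2.1.
  • The “sort” parameter specifies whether the group keys are sorted in the result (defaults to True), and the “group_keys” parameter specifies whether “apply()” adds the group keys to the index of its result (defaults to True).
  • The “level” parameter is used to group by a specific level or levels of the index, given by name or position. It is most useful with a MultiIndex.
  • The “as_index” parameter determines whether the group keys become the index of the aggregated output (True, the default) or remain regular columns (False).
  • The “observed” parameter applies only when grouping by categorical data and controls whether only the observed category values appear in the result. The “dropna” parameter determines whether rows with a missing group key are dropped (True, the default) or kept as their own group (False).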
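
As a minimal sketch of how a few of these parameters change the result, the snippet below uses a small made-up DataFrame. The column names “team” and “points” and the values are assumptions for illustration only and are not part of the examples that follow:

```python
import pandas as pd

# Hypothetical data for illustration; the last row has a missing group key.
df = pd.DataFrame({
    "team": ["B", "A", "B", None],
    "points": [10, 20, 30, 40],
})

# Defaults: group keys become the index, are sorted, and missing keys are dropped.
default_result = df.groupby("team").sum()

# as_index=False keeps the group key as a regular column.
flat_result = df.groupby("team", as_index=False).sum()

# sort=False keeps the groups in order of first appearance.
unsorted_result = df.groupby("team", sort=False).sum()

# dropna=False keeps rows with a missing group key as their own group.
with_na_result = df.groupby("team", dropna=False).sum()

print(default_result)
print(flat_result)
print(unsorted_result)
print(with_na_result)
```

Comparing the four printed results side by side shows which parameter affects the index, the ordering, and the handling of missing keys.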

Return Value

The “DataFrame.groupby()” method returns a “DataFrameGroupBy” object that holds the group information. No aggregation is performed until a method such as “sum()” is called on this object.
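
The returned object can be inspected before any aggregation is run. The following is a small sketch on a shortened, made-up version of the data used later in this post:

```python
import pandas as pd

# Shortened sample data for illustration.
df = pd.DataFrame({
    "roll_no": [1804, 1805, 1804],
    "score_1": [40, 20, 25],
})

grouped = df.groupby("roll_no")

print(type(grouped).__name__)  # class name of the returned object
print(grouped.ngroups)         # number of distinct groups
print(grouped.groups)          # mapping of group key to row labels

# Iterating yields (key, sub-DataFrame) pairs.
for key, part in grouped:
    print(key, len(part))
```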

Example 1: Pandas DataFrame Group by Single Index

This example groups the data by the index of a Pandas DataFrame:

import pandas
data1 = {'roll_no':[1804, 1805, 1804, 1806, 1805, 1807],'score_1' :[40, 20, 25, 23, 15, 54],'score_2':[40, 44, 20, 12, 10, 41]}
df = pandas.DataFrame(data1)
df.set_index(['roll_no'], inplace=True)
print(df)
result = df.groupby('roll_no').sum()
print('\n',result)

In the above code:

  • The “pandas” module is imported, and the “pandas.DataFrame()” function is used to create a DataFrame.
  • The “df.set_index()” method sets the “roll_no” column as the index of the DataFrame. Because “inplace=True” is passed, the DataFrame is modified directly.
  • The “df.groupby()” method is used along with the “sum()” method to group the rows by the “roll_no” index and sum the remaining columns within each group.

Output

The output shows the original DataFrame, followed by the grouped result in which rows that share the same “roll_no” index value are summed.
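
Passing the index name to “by” is one option; the “level” parameter described earlier is another. The sketch below reuses the data from Example 1 and checks that the two approaches give the same result. The variable names are chosen here for illustration:

```python
import pandas as pd

data1 = {
    "roll_no": [1804, 1805, 1804, 1806, 1805, 1807],
    "score_1": [40, 20, 25, 23, 15, 54],
    "score_2": [40, 44, 20, 12, 10, 41],
}
df = pd.DataFrame(data1).set_index("roll_no")

by_name = df.groupby("roll_no").sum()              # index name passed to "by"
by_level_name = df.groupby(level="roll_no").sum()  # index level given by name
by_level_pos = df.groupby(level=0).sum()           # index level given by position

print(by_level_name)
print(by_name.equals(by_level_name), by_name.equals(by_level_pos))
```

Using “level” can be the clearer choice when a column and an index level might share the same name.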

Example 2: Pandas DataFrame Group by Multiple Index

The following example groups the data by multiple index levels of a DataFrame:

import pandas
data1 = {'roll_no':[1804, 1805, 1804, 1806, 1805, 1807],'teams':['A', 'B', 'A', 'C', 'B', 'D'],'score_1' :[40, 20, 25, 23, 15, 54],'score_2':[40, 44, 20, 12, 10, 41]}
df = pandas.DataFrame(data1)
df.set_index(['roll_no', 'teams'], inplace=True)
print(df)
result = df.groupby(['roll_no', 'teams']).sum()
print('\n',result)

Here in this code:

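  • The “df.set_index()” method sets both “roll_no” and “teams” as the index, which creates a MultiIndex.
  • The “df.groupby()” method takes a list of the index level names, groups the DataFrame by the “roll_no” and “teams” index values, and sums the column values within each group.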

Output

The output shows the original DataFrame, followed by the result grouped by each combination of the “roll_no” and “teams” index values.
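
With a MultiIndex, the “level” parameter can also select all levels or only one of them. The following sketch reuses the data from Example 2; the variable names are illustrative:

```python
import pandas as pd

data1 = {
    "roll_no": [1804, 1805, 1804, 1806, 1805, 1807],
    "teams": ["A", "B", "A", "C", "B", "D"],
    "score_1": [40, 20, 25, 23, 15, 54],
    "score_2": [40, 44, 20, 12, 10, 41],
}
df = pd.DataFrame(data1).set_index(["roll_no", "teams"])

# Group by both index levels, given by name through "level".
both_levels = df.groupby(level=["roll_no", "teams"]).sum()

# Group by only one level of the MultiIndex.
teams_only = df.groupby(level="teams").sum()

print(both_levels)
print(teams_only)
```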

Example 3: Pandas DataFrame Group by Column and Index

To group by an index level together with a regular column, use the following code:

import pandas
data1 = {'roll_no':[1804, 1805, 1804, 1806, 1805, 1807],'teams':['A', 'B', 'A', 'C', 'B', 'D'],'score_1' :[40, 20, 25, 23, 15, 54],'score_2':[40, 44, 20, 12, 10, 41]}
df = pandas.DataFrame(data1)
df.set_index(['roll_no', 'teams'], inplace=True)
print(df)
result = df.groupby(['roll_no', 'score_1']).sum()
print('\n',result)

In the above example:

  • The “df.groupby()” method takes a list containing the “roll_no” index level and the “score_1” column, and groups the DataFrame by each combination of their values. The “sum()” method then sums the remaining “score_2” column within each group.

Output

The output shows the DataFrame grouped by the “roll_no” index level and the regular “score_1” column.
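
When index levels and columns are mixed, the index level can also be named explicitly with “pandas.Grouper(level=...)”. The sketch below reuses the data from Example 3 and assumes the goal is the same grouping as above, only written more explicitly:

```python
import pandas as pd

data1 = {
    "roll_no": [1804, 1805, 1804, 1806, 1805, 1807],
    "teams": ["A", "B", "A", "C", "B", "D"],
    "score_1": [40, 20, 25, 23, 15, 54],
    "score_2": [40, 44, 20, 12, 10, 41],
}
df = pd.DataFrame(data1).set_index(["roll_no", "teams"])

# Implicit form: pandas resolves "roll_no" to the index level
# and "score_1" to the column.
implicit = df.groupby(["roll_no", "score_1"]).sum()

# Explicit form: Grouper marks "roll_no" as an index level.
explicit = df.groupby([pd.Grouper(level="roll_no"), "score_1"]).sum()

print(explicit)
print(implicit.equals(explicit))
```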

Conclusion

The Pandas “groupby()” method splits a DataFrame into groups based on the specified keys, which can be index values as well as columns. It can group by a single index, by multiple index levels, or by an index level combined with a regular column. This article covered the syntax of the “groupby()” method and demonstrated each of these cases with an example.


Source: linuxhint.com
