Pandas Stack

September 20, 2022, January 30, 2023 | by Arround The Web

The pandas stack() function moves one or more column levels of a DataFrame into the row index. The result is a new Series or DataFrame whose index is a MultiIndex, with the stacked column labels as its innermost level. This article shows how to use stack() on DataFrames with single-level and multi-level columns.

Syntax

DataFrame_object.stack(level=-1, dropna=True)

Parameters

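level – An integer, a level name, or a list of these that specifies which column level(s) to stack. The default, -1, selects the innermost level. For multi-level columns, level=0 selects the outermost level, level=1 the next one, and so on.
dropna – A Boolean (True by default) that controls whether rows containing only NaN values are dropped from the stacked result.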
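The following is a minimal sketch of how the level parameter can be given by position or by name. The subject and term data, and the level names "subject" and "term", are assumptions made up for this illustration. Newer pandas versions may print a FutureWarning about the stack implementation.

```python
import pandas

# Hypothetical data: two subjects, each with a "mid" and a "final" score.
columns = pandas.MultiIndex.from_product(
    [["Maths", "Physics"], ["mid", "final"]],
    names=["subject", "term"],  # level names are an assumption for this sketch
)
scores = pandas.DataFrame(
    [[70, 80, 65, 75], [90, 85, 60, 95]],
    index=["Ram", "Sravan"],
    columns=columns,
)

by_default = scores.stack()            # level=-1: the innermost level ("term")
by_position = scores.stack(level=1)    # the same level, given by position
by_name = scores.stack(level="term")   # the same level, given by name
outer = scores.stack(level="subject")  # moves the outermost level instead

print(by_name, "\n")
print(outer)
```

All three of the first calls select the same level, so they should produce the same result. Only the last call changes which labels end up in the row index.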

Scenario 1: Single-Level Column

Create a DataFrame with 3 columns and 5 rows. The column names are [“Exam type”, “Marks”, “Result”].

Now, we will stack the DataFrame:

import pandas

results = pandas.DataFrame(
    [["Internal", 98, "pass"],
     ["Internal", 45, "fail"],
     ["External", 89, "pass"],
     ["External", 67, "pass"],
     ["External", 18, "fail"]],
    columns=["Exam type", "Marks", "Result"],
    index=["Ram", "Sravan", "Govind", "Anup", "Jab"],
)

print(results, "\n")

# Apply stack() to the single-level columns
print(results.stack())

Output

       Exam type  Marks Result
Ram     Internal     98   pass
Sravan  Internal     45   fail
Govind  External     89   pass
Anup    External     67   pass
Jab     External     18   fail

Ram     Exam type    Internal
        Marks              98
        Result           pass
Sravan  Exam type    Internal
        Marks              45
        Result           fail
Govind  Exam type    External
        Marks              89
        Result           pass
Anup    Exam type    External
        Marks              67
        Result           pass
Jab     Exam type    External
        Marks              18
        Result           fail
dtype: object

Explanation

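The stacked result is displayed after the original DataFrame. Because the DataFrame has a single column level, stack() returns a Series whose index pairs each row label with each column name. Let’s look at one row in detail.

For Ram, Exam type is ‘Internal’, Marks is 98, and Result is ‘pass’. The values of all the remaining rows are stacked in the same way.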
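To make the (row label, column name) pairing concrete, here is a small sketch that reads individual values back out of the stacked result. It uses only the first two rows of the data above to stay short, and the variable names stacked and restored are chosen for this illustration.

```python
import pandas

results = pandas.DataFrame(
    [["Internal", 98, "pass"],
     ["Internal", 45, "fail"]],
    columns=["Exam type", "Marks", "Result"],
    index=["Ram", "Sravan"],
)
stacked = results.stack()

# Each stacked value is addressed by a (row label, column name) pair.
print(stacked.loc[("Ram", "Marks")])
print(stacked.loc[("Ram", "Result")])

# Selecting only the outer label returns every value of that row.
print(stacked.loc["Sravan"])

# unstack() moves the inner index level back to the columns.
restored = stacked.unstack()
print(restored.loc["Sravan", "Marks"])
```

The unstack() call is the inverse operation, which is handy when the stacked form is needed only for an intermediate step.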

Scenario 2: Multi-Level Columns

One way to create a MultiIndex in pandas is the MultiIndex.from_tuples() method. It takes a list of tuples, where each tuple holds the labels of one column, one label per level. We then pass the resulting MultiIndex to the “columns” parameter of the pandas DataFrame.

Syntax

pandas.MultiIndex.from_tuples([(level_0_label, level_1_label, ...), ...])
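As a quick sketch of what from_tuples() builds, the snippet below creates the column index used in the examples of this scenario and inspects its levels. The variable name columns is just a choice for this illustration.

```python
import pandas

# One tuple per column: (level-0 label, level-1 label).
columns = pandas.MultiIndex.from_tuples(
    [("Exams", "Exam Type"), ("Marks Secured", "Total"), ("Status", "Result")]
)

print(columns.nlevels)                     # number of column levels
print(list(columns.get_level_values(0)))   # outer labels
print(list(columns.get_level_values(1)))   # inner labels
```

The outer labels are what level=0 refers to in stack(), and the inner labels are what level=1 (or the default, -1) refers to.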

Example 1

Create a DataFrame whose columns have a MultiIndex. Stack the DataFrame with level=0.

import pandas

results = pandas.DataFrame(
    [["Internal", 98, "pass"],
     ["Internal", 45, "fail"],
     ["External", 89, "pass"],
     ["External", 89, "pass"],
     ["External", 45, "fail"]],
    index=["Ram", "Sravan", "Govind", "Anup", "Jab"],
    columns=pandas.MultiIndex.from_tuples(
        [("Exams", "Exam Type"), ("Marks Secured", "Total"), ("Status", "Result")]
    ),
)

print(results, "\n")

# Apply stack() with level=0 to the multi-level columns
print(results.stack(level=0))

Output

           Exams Marks Secured Status
       Exam Type         Total Result
Ram     Internal            98   pass
Sravan  Internal            45   fail
Govind  External            89   pass
Anup    External            89   pass
Jab     External            45   fail

                     Exam Type Result  Total
Ram    Exams          Internal    NaN    NaN
       Marks Secured       NaN    NaN   98.0
       Status              NaN   pass    NaN
Sravan Exams          Internal    NaN    NaN
       Marks Secured       NaN    NaN   45.0
       Status              NaN   fail    NaN
Govind Exams          External    NaN    NaN
       Marks Secured       NaN    NaN   89.0
       Status              NaN   pass    NaN
Anup   Exams          External    NaN    NaN
       Marks Secured       NaN    NaN   89.0
       Status              NaN   pass    NaN
Jab    Exams          External    NaN    NaN
       Marks Secured       NaN    NaN   45.0
       Status              NaN   fail    NaN

Explanation

The stacked DataFrame has a two-level row index: the original row label and the level-0 column label. The level-1 labels remain as columns. For the row Ram:

Ram – For index ‘Exams’ and column ‘Exam Type’ – the value is ‘Internal’.
Ram – For index ‘Exams’ and column ‘Result’ – the value is NaN (Not a Number).
Ram – For index ‘Exams’ and column ‘Total’ – the value is NaN.
Ram – For index ‘Marks Secured’ and column ‘Exam Type’ – the value is NaN.
Ram – For index ‘Marks Secured’ and column ‘Result’ – the value is NaN.
Ram – For index ‘Marks Secured’ and column ‘Total’ – the value is 98.0.
Ram – For index ‘Status’ and column ‘Exam Type’ – the value is NaN.
Ram – For index ‘Status’ and column ‘Result’ – the value is ‘pass’.
Ram – For index ‘Status’ and column ‘Total’ – the value is NaN.

All the other rows are stacked in the same way. Combinations that do not exist in the original columns are filled with NaN.
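The NaN pattern described above can be checked programmatically. The sketch below uses two of the rows from this example and counts the non-NaN cells in every stacked row. The variable names stacked and filled_per_row are assumptions made for this illustration.

```python
import pandas

results = pandas.DataFrame(
    [["Internal", 98, "pass"],
     ["External", 45, "fail"]],
    index=["Ram", "Jab"],
    columns=pandas.MultiIndex.from_tuples(
        [("Exams", "Exam Type"), ("Marks Secured", "Total"), ("Status", "Result")]
    ),
)
stacked = results.stack(level=0)

# Count the non-NaN cells in every stacked row.
filled_per_row = stacked.notna().sum(axis=1)
print(filled_per_row, "\n")

# The only populated cell for Ram / 'Marks Secured' is the 'Total' column.
print(stacked.loc[("Ram", "Marks Secured"), "Total"])
print(pandas.isna(stacked.loc[("Ram", "Exams"), "Total"]))
```

Each level-0 label here is paired with exactly one level-1 label in the original columns, so every stacked row should hold exactly one real value.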

Example 2

Create a DataFrame whose columns have a MultiIndex. Stack the DataFrame with level=1.

import pandas

results = pandas.DataFrame(
    [["Internal", 98, "pass"],
     ["Internal", 45, "fail"],
     ["External", 89, "pass"],
     ["External", 67, "pass"],
     ["External", 18, "fail"]],
    index=["Ram", "Sravan", "Govind", "Anup", "Jab"],
    columns=pandas.MultiIndex.from_tuples(
        [("Exams", "Exam Type"), ("Marks Secured", "Total"), ("Status", "Result")]
    ),
)

# Apply stack() with level=1 to the multi-level columns
print(results.stack(level=1))

Output

                     Exams  Marks Secured Status
Ram    Exam Type  Internal            NaN    NaN
       Result          NaN            NaN   pass
       Total           NaN           98.0    NaN
Sravan Exam Type  Internal            NaN    NaN
       Result          NaN            NaN   fail
       Total           NaN           45.0    NaN
Govind Exam Type  External            NaN    NaN
       Result          NaN            NaN   pass
       Total           NaN           89.0    NaN
Anup   Exam Type  External            NaN    NaN
       Result          NaN            NaN   pass
       Total           NaN           67.0    NaN
Jab    Exam Type  External            NaN    NaN
       Result          NaN            NaN   fail
       Total           NaN           18.0    NaN
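Here the level-1 labels (‘Exam Type’, ‘Result’, ‘Total’) move into the row index and the level-0 labels remain as columns. Because these columns have only two levels, level=1 is also the innermost level, so it should give the same result as calling stack() with no arguments. The sketch below checks this on two of the rows; the variable names inner and default are chosen for this illustration.

```python
import pandas

results = pandas.DataFrame(
    [["Internal", 98, "pass"],
     ["External", 18, "fail"]],
    index=["Ram", "Jab"],
    columns=pandas.MultiIndex.from_tuples(
        [("Exams", "Exam Type"), ("Marks Secured", "Total"), ("Status", "Result")]
    ),
)

inner = results.stack(level=1)  # explicit innermost level
default = results.stack()       # level=-1 is the default

print(inner.equals(default))

# Rows are now addressed by (row label, level-1 label); columns are level-0 labels.
print(inner.loc[("Ram", "Total"), "Marks Secured"])
print(inner.loc[("Jab", "Result"), "Status"])
```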

Conclusion

The pandas stack() method is a convenient way to move column levels into the row index. It is useful when data is stored across columns but is easier to process, or needs to be presented, in row form. We applied stack() to a DataFrame with single-level columns, which returns a Series, and to a DataFrame with multi-level columns, where the level parameter decides which column level moves into the index and missing combinations are filled with NaN.

Source: linuxhint.com