by Arround The Web

Pandas Stack

The pandas stack() function moves a level of column labels into the row index. For a DataFrame with single-level columns, it returns a Series with a multi-level index; for multi-level columns, it returns a new DataFrame with one more index level. In this article, we will look at how to use the pandas stack() function.

Syntax

DataFrame_object.stack(level=-1, dropna=True)

Parameters

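  1. level – An integer, a level name, or a list of these that specifies which column level(s) to stack. The default, -1, stacks the innermost level.
    1. For two-level columns, the valid positions are 0 (outer level) and 1 (inner level).
  2. dropna – A Boolean (True by default) that drops rows of the stacked result in which every value is NaN. Rows that contain only some NaN values are kept.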
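The different ways of addressing a level can be compared on a small frame. This is a minimal sketch: the labels "A", "B", "x", "y" and the level names "group" and "field" are made up for illustration, and dropna is left at its default.

```python
import pandas as pd

# Hypothetical two-level columns: outer level "group", inner level "field".
columns = pd.MultiIndex.from_tuples(
    [("A", "x"), ("A", "y"), ("B", "x"), ("B", "y")],
    names=["group", "field"],
)
df = pd.DataFrame(
    [[1, 2, 3, 4], [5, 6, 7, 8]],
    index=["r1", "r2"],
    columns=columns,
)

by_default = df.stack()             # level=-1: the innermost column level
by_position = df.stack(level=1)     # the same level, addressed by position
by_name = df.stack(level="field")   # the same level, addressed by name
outer = df.stack(level=0)           # the outer level moves into the index

print(by_default)
print(outer)
```

With two column levels, the first three calls should give the same result, while level=0 leaves "x" and "y" as the columns instead of "A" and "B".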

Scenario 1: Single-Level Column

Create a DataFrame with 3 columns and 5 rows. Here, the column names are ["Exam type", "Marks", "Result"].

Now, we will stack the DataFrame:

import pandas

results = pandas.DataFrame([["Internal", 98,"pass"],
                     ["Internal", 45,"fail"],
                     ["External", 89,"pass"],
                     ["External", 67,"pass"],
                    ["External", 18,"fail"]],
                    columns = ["Exam type","Marks","Result"],
                    index = ['Ram','Sravan','Govind','Anup', 'Jab']
                    )

print(results,"\n")

# Apply stack() on single level column

print(results.stack())

Output

       Exam type  Marks Result
Ram     Internal     98   pass
Sravan  Internal     45   fail
Govind  External     89   pass
Anup    External     67   pass
Jab     External     18   fail

 

Ram     Exam type    Internal
        Marks              98
        Result           pass
Sravan  Exam type    Internal
        Marks              45
        Result           fail
Govind  Exam type    External
        Marks              89
        Result           pass
Anup    Exam type    External
        Marks              67
        Result           pass
Jab     Exam type    External
        Marks              18
        Result           fail
dtype: object

Explanation

The stacked result is displayed. Because the columns have only one level, stack() returns a Series whose index has two levels: the original row label and the former column name. Let’s look at one row in detail.

Ram – Exam type is ‘Internal’, Ram – Marks is 98, and Ram – Result is ‘pass’. The remaining rows are stacked in the same way.
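To check the shape of the stacked result and read individual values from it, something like the following can be used. It is a small sketch that reuses two rows of the data above; the variable names stacked and restored are only illustrative.

```python
import pandas as pd

# Two rows from the example above are enough to show the shape of the result.
results = pd.DataFrame(
    [["Internal", 98, "pass"], ["Internal", 45, "fail"]],
    columns=["Exam type", "Marks", "Result"],
    index=["Ram", "Sravan"],
)

stacked = results.stack()

print(type(stacked).__name__)   # Series, because the only column level was stacked
print(stacked.index.nlevels)    # 2: row label plus former column name

# Look up one cell with a (row label, column name) tuple.
print(stacked.loc[("Ram", "Result")])

# unstack() pivots the inner index level back into columns.
restored = stacked.unstack()
print(restored.loc["Sravan", "Marks"])
```

Note that the stacked Series has the object dtype here, because text and numbers end up in the same column of values.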

Scenario 2: Multi-Level Columns

One way to create a MultiIndex in pandas is the MultiIndex.from_tuples() method. It takes a list of tuples as a parameter, where each tuple holds the labels of one column, one label per level. Finally, we pass the result to the “columns” parameter of the pandas DataFrame.

Syntax

pandas.MultiIndex.from_tuples([('column', ...), ...])
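As a quick illustration of this syntax, the sketch below builds the two-level columns used in the following examples and inspects them. The one-row DataFrame df is only there to show how the MultiIndex is passed in.

```python
import pandas as pd

# Each tuple describes one column: (outer label, inner label).
columns = pd.MultiIndex.from_tuples(
    [("Exams", "Exam Type"), ("Marks Secured", "Total"), ("Status", "Result")]
)

print(columns.nlevels)                     # number of column levels
print(list(columns.get_level_values(0)))   # outer labels
print(list(columns.get_level_values(1)))   # inner labels

# Pass the MultiIndex through the "columns" parameter.
df = pd.DataFrame([["Internal", 98, "pass"]], index=["Ram"], columns=columns)

# A single column is selected with the full tuple of labels.
print(df[("Marks Secured", "Total")])
```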

Example 1

Create a DataFrame whose columns have a MultiIndex. Stack the DataFrame with level 0.

import pandas

results = pandas.DataFrame([["Internal", 98,"pass"],
                   ["Internal", 45,"fail"],
                   ["External", 89,"pass"],
                   ["External", 89,"pass"],
                  ["External", 45,"fail"]],
                  index = ['Ram','Sravan','Govind','Anup', 'Jab'],
                  columns=pandas.MultiIndex.from_tuples( [('Exams', 'Exam Type'),('Marks Secured', 'Total'), ('Status', 'Result')]
                  ))

print(results,"\n")

# Apply stack() with level-0 on multi level column

print(results.stack(level=0))

Output

           Exams Marks Secured Status
       Exam Type         Total Result
Ram     Internal            98   pass
Sravan  Internal            45   fail
Govind  External            89   pass
Anup    External            89   pass
Jab     External            45   fail

 

                     Exam Type Result  Total
Ram    Exams          Internal    NaN    NaN
       Marks Secured       NaN    NaN   98.0
       Status              NaN   pass    NaN
Sravan Exams          Internal    NaN    NaN
       Marks Secured       NaN    NaN   45.0
       Status              NaN   fail    NaN
Govind Exams          External    NaN    NaN
       Marks Secured       NaN    NaN   89.0
       Status              NaN   pass    NaN
Anup   Exams          External    NaN    NaN
       Marks Secured       NaN    NaN   89.0
       Status              NaN   pass    NaN
Jab    Exams          External    NaN    NaN
       Marks Secured       NaN    NaN   45.0
       Status              NaN   fail    NaN

Explanation

The outer column level (‘Exams’, ‘Marks Secured’, ‘Status’) is now the second level of the row index, and the inner level remains as the columns. For the row ‘Ram’:

  1. Ram – For index ‘Exams’ and column ‘Exam Type’ – the value is ‘Internal’.
  2. Ram – For index ‘Exams’ and column ‘Result’ – the value is NaN (Not a Number).
  3. Ram – For index ‘Exams’ and column ‘Total’ – the value is NaN.
  4. Ram – For index ‘Marks Secured’ and column ‘Exam Type’ – the value is NaN.
  5. Ram – For index ‘Marks Secured’ and column ‘Result’ – the value is NaN.
  6. Ram – For index ‘Marks Secured’ and column ‘Total’ – the value is 98.0.
  7. Ram – For index ‘Status’ and column ‘Exam Type’ – the value is NaN.
  8. Ram – For index ‘Status’ and column ‘Result’ – the value is ‘pass’.
  9. Ram – For index ‘Status’ and column ‘Total’ – the value is NaN.

All the other rows are stacked in the same way. Combinations that have no matching column in the original DataFrame, such as ‘Exams’ with ‘Total’, are filled with NaN.
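The lookups listed above can be reproduced in code. The following is a minimal sketch on two rows of the same data; it reads one real value and one NaN cell, and then counts the cells that actually hold data.

```python
import pandas as pd

columns = pd.MultiIndex.from_tuples(
    [("Exams", "Exam Type"), ("Marks Secured", "Total"), ("Status", "Result")]
)
results = pd.DataFrame(
    [["Internal", 98, "pass"], ["Internal", 45, "fail"]],
    index=["Ram", "Sravan"],
    columns=columns,
)

stacked = results.stack(level=0)

# Row key: (original row label, stacked outer label); column key: inner label.
print(stacked.loc[("Ram", "Marks Secured"), "Total"])   # a value from the original frame
print(stacked.loc[("Ram", "Exams"), "Total"])           # no such original column, so NaN

# Count the cells that hold real data; this matches the original 2 x 3 frame.
print(int(stacked.notna().sum().sum()))
```

The stacked frame has more cells than the original, but only as many non-missing values as the original had; the rest is NaN padding.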

Example 2

Create a DataFrame whose columns have a MultiIndex. Stack the DataFrame with level 1.

import pandas

results = pandas.DataFrame([["Internal", 98,"pass"],
                   ["Internal", 45,"fail"],
                   ["External", 89,"pass"],
                   ["External", 67,"pass"],
                  ["External", 18,"fail"]],
                  index = ['Ram','Sravan','Govind','Anup', 'Jab'],
                  columns=pandas.MultiIndex.from_tuples( [('Exams', 'Exam Type'),('Marks Secured', 'Total'), ('Status', 'Result')]
                  ))

# Apply stack() with level-1 on multi level column

print(results.stack(level=1))

Output

                     Exams  Marks Secured Status
Ram    Exam Type  Internal            NaN    NaN
       Result          NaN            NaN   pass
       Total           NaN           98.0    NaN
Sravan Exam Type  Internal            NaN    NaN
       Result          NaN            NaN   fail
       Total           NaN           45.0    NaN
Govind Exam Type  External            NaN    NaN
       Result          NaN            NaN   pass
       Total           NaN           89.0    NaN
Anup   Exam Type  External            NaN    NaN
       Result          NaN            NaN   pass
       Total           NaN           67.0    NaN
Jab    Exam Type  External            NaN    NaN
       Result          NaN            NaN   fail
       Total           NaN           18.0    NaN

Conclusion

The pandas stack() function is a convenient way to move column levels into the row index. It is useful when the data arrives in a wide, column-oriented layout but the analysis needs it in a long, row-oriented one. We covered stacking a DataFrame with single-level columns, as well as stacking multi-level columns at level 0 and at level 1.


Source: linuxhint.com
