## Pandas Weighted Average

A weighted average is an average in which some values count more than others: each value is multiplied by a weight that reflects its importance before the values are averaged. In this article, we will look at several ways to calculate a weighted average over a Pandas DataFrame, with the help of examples.

**Formula**

`weighted_average = sum(values_column * weights_column) / sum(weights_column)`

Here, values_column is the numeric column in the Pandas DataFrame that stores the values, and weights_column is the numeric column that stores the weight of each value.
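To make the arithmetic concrete before moving to DataFrames, here is a minimal sketch in plain Python. It divides the sum of value × weight by the sum of the weights. The scores, the weights, and the helper name `weighted_average` are made up for illustration and are not part of Pandas.

```python
# Hypothetical data: three scores and the weight given to each score.
values = [80, 90, 70]
weights = [1, 2, 1]

def weighted_average(values, weights):
    """Return sum(value * weight) / sum(weight) for two equal-length sequences."""
    total_weight = sum(weights)
    if total_weight == 0:
        # A weighted average is undefined when the weights add up to zero
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight

result = weighted_average(values, weights)
print(result)  # (80*1 + 90*2 + 70*1) / (1 + 2 + 1) = 330 / 4 = 82.5
```

The score of 90 carries twice the weight of the others, so the result (82.5) sits above the plain average of 80.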

**Method 1: Return Weighted Average**

Let's use a custom function that computes the weighted average of a Pandas DataFrame. It uses sum() to add up the weighted values and the weights in the following computation:

`sum(DataFrame_object[weight_data] * DataFrame_object[value_data]) / DataFrame_object[weight_data].sum()`

Here, weight_data is the column in the DataFrame that holds the weights for the values in the value_data column.

**Example**

In this example, we have a DataFrame named ‘calculations’ with 2 columns of integer type. Now, we will create a custom function, ‘weighted_avg_calculation’, to calculate the weighted average and call the function with these two columns by passing them as arguments.

    import pandas

    # Create the DataFrame with 2 columns and 5 rows
    calculations = pandas.DataFrame.from_dict({'count': [7, 8, 9, 0, 4],
                                               'quantity': [2, 3, 4, 5, 2]})

    # Display the DataFrame - calculations
    print(calculations)

    # Custom function that calculates the weighted average
    def weighted_avg_calculation(calculations, value_data, weight_data):
        # Sum of (weight * value) divided by the sum of the weights
        return sum(calculations[weight_data] * calculations[value_data]) / calculations[weight_data].sum()

    print()

    # Call the function by passing the DataFrame, 'quantity' as value_data and 'count' as weight_data
    print(weighted_avg_calculation(calculations, 'quantity', 'count'))

**Output**

       count  quantity
    0      7         2
    1      8         3
    2      9         4
    3      0         5
    4      4         2

    2.9285714285714284

**Explanation**

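The custom function multiplies each value in value_data by its weight in weight_data, adds up the products with sum(), and divides the result by the sum of the weights:

`sum(calculations[weight_data] * calculations[value_data]) / calculations[weight_data].sum()`

For this DataFrame, that is (7×2 + 8×3 + 9×4 + 0×5 + 4×2) / (7 + 8 + 9 + 0 + 4) = 82 / 28, so the weighted average is approximately 2.93.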
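The built-in sum() loops over the Series one element at a time in Python. As a sketch of an alternative, the same computation can be written with the Series.sum() method, which keeps the work inside Pandas. The helper name `weighted_avg_vectorized` is an assumption made for this illustration; the data is the same as in the example above.

```python
import pandas

# Same data as the example above
calculations = pandas.DataFrame({'count': [7, 8, 9, 0, 4],
                                 'quantity': [2, 3, 4, 5, 2]})

def weighted_avg_vectorized(frame, value_data, weight_data):
    # Hypothetical helper: uses Series.sum() instead of the built-in sum()
    weights = frame[weight_data]
    return (frame[value_data] * weights).sum() / weights.sum()

result = weighted_avg_vectorized(calculations, 'quantity', 'count')
print(round(result, 2))  # 82 / 28 rounds to 2.93
```

Both versions compute the same ratio. The row with a weight of 0 contributes nothing to either the numerator or the denominator.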

**Method 2: Return Weighted Average in Groups**

Now, we will use the groupby() function to group the rows and return the weighted average of each group. The apply() method is called on the result of groupby(); it takes the weighted average function, followed by the value and weight column names, as parameters.

`DataFrame_object.groupby('grouping_column').apply(weighted_avg_calculation, 'value_data', 'weight_data')`

Here, rows are grouped based on the values in 'grouping_column'. The weighted_avg_calculation is a custom function that computes the weighted average. The weight_data is the column in the DataFrame that holds the weights for the values in the value_data column.

**Example**

In this example, we have a DataFrame named ‘calculations’ with 3 columns. Now, we will create a custom function, ‘weighted_avg_calculation’, to calculate the weighted average and call the function with the two columns by passing them as arguments. We will group the rows based on the ‘item’ column and return the weighted average in each group.

    import pandas

    # Create the DataFrame with 3 columns and 5 rows
    calculations = pandas.DataFrame.from_dict({'count': [12, 34, 56, 10, 15],
                                               'quantity': [100, 200, 345, 670, 50],
                                               'item': ['plastic', 'iron', 'iron', 'steel', 'plastic']})

    # Display the DataFrame - calculations
    print(calculations)

    # Custom function that calculates the weighted average
    def weighted_avg_calculation(calculations, value_data, weight_data):
        # Sum of (weight * value) divided by the sum of the weights
        return sum(calculations[weight_data] * calculations[value_data]) / calculations[weight_data].sum()

    print()

    # Group the rows by 'item' and compute the weighted average of each group
    print(calculations.groupby('item').apply(weighted_avg_calculation, 'quantity', 'count'))

**Output**

       count  quantity     item
    0     12       100  plastic
    1     34       200     iron
    2     56       345     iron
    3     10       670    steel
    4     15        50  plastic

    item
    iron       290.222222
    plastic     72.222222
    steel      670.000000
    dtype: float64

**Explanation**

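The custom function is the same as in Method 1:

`sum(calculations[weight_data] * calculations[value_data]) / calculations[weight_data].sum()`

The groupby('item') call splits the rows into groups, and apply() calls the function once for each group, passing 'quantity' as value_data and 'count' as weight_data. It returns the weighted average of each group.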

There are three groups in the calculations DataFrame.

- The weighted average for the ‘iron’ group is 290.22
- The weighted average for the ‘plastic’ group is 72.22
- The weighted average for the ‘steel’ group is 670.00
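The per-group weighted average can also be computed without apply(). The sketch below multiplies values by weights once, then sums the products and the weights within each group and divides the two results. It is an alternative to the approach above, not a replacement for it, and the variable names are chosen for this illustration.

```python
import pandas

calculations = pandas.DataFrame({
    'count': [12, 34, 56, 10, 15],
    'quantity': [100, 200, 345, 670, 50],
    'item': ['plastic', 'iron', 'iron', 'steel', 'plastic'],
})

# Weighted value of every row: value * weight
weighted = calculations['quantity'] * calculations['count']

# Sum the products and the weights within each 'item' group, then divide
numerator = weighted.groupby(calculations['item']).sum()
denominator = calculations['count'].groupby(calculations['item']).sum()
group_weighted_avg = numerator / denominator

print(group_weighted_avg)
```

For 'iron', this works out to (34×200 + 56×345) / (34 + 56) = 26120 / 90, which matches the 290.22 reported above.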

**Method 3: Return Weighted Average Using NumPy**

The NumPy module provides the average() function, which accepts the values and an optional weights argument and returns the weighted average. We can pass the columns of a Pandas DataFrame to it directly.

- In the first parameter, we need to pass the values column.
- In the second parameter, we will assign the ‘weight data’ column to weights.

`numpy.average(DataFrame_object['value_data'], weights=DataFrame_object['weight_data'])`

**Example**

In this example, we have a DataFrame named ‘calculations’ with 2 columns. We will directly use numpy.average() to calculate the weighted average.

    import numpy
    import pandas

    # Create the DataFrame with 2 columns and 5 rows
    calculations = pandas.DataFrame.from_dict({'count': [12, 34, 56, 10, 15],
                                               'quantity': [100, 200, 345, 670, 50]})

    # Display the DataFrame - calculations
    print(calculations)

    print()

    # 'quantity' holds the values and 'count' holds the weights
    print(numpy.average(calculations['quantity'], weights=calculations['count']))

**Output**

       count  quantity
    0     12       100
    1     34       200
    2     56       345
    3     10       670
    4     15        50

    273.7795275590551

**Explanation**

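Here, the quantity column supplies the values, and the count column supplies the weights.

The weighted average is (12×100 + 34×200 + 56×345 + 10×670 + 15×50) / (12 + 34 + 56 + 10 + 15) = 34770 / 127, which is approximately 273.78.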
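numpy.average() can also be used per group. The sketch below loops over the groups produced by groupby() and calls numpy.average() on each one. It assumes the 'item' column from the Method 2 example, and the dictionary name `group_averages` is made up for this illustration.

```python
import numpy
import pandas

calculations = pandas.DataFrame({
    'count': [12, 34, 56, 10, 15],
    'quantity': [100, 200, 345, 670, 50],
    'item': ['plastic', 'iron', 'iron', 'steel', 'plastic'],
})

# Iterate over the groups and let numpy.average() do the weighting for each one
group_averages = {
    item: float(numpy.average(group['quantity'], weights=group['count']))
    for item, group in calculations.groupby('item')
}

for item, value in group_averages.items():
    print(item, round(value, 2))
```

This should give the same per-group values as the custom function in Method 2.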

**Conclusion**

A weighted average is useful whenever some values should count more than others. We computed it over a Pandas DataFrame in three ways: with a custom function, with the same function applied to each group through groupby() and apply(), and with numpy.average() and its weights parameter. Averages come up almost everywhere, even in the budget of a small grocery store, and these short techniques stay just as convenient when the DataFrame holds millions of rows.

Source: linuxhint.com