| by Arround The Web | No comments

Pandas Standard Deviation

In mathematics, the “Standard Deviation” is a statistical calculation that quantifies the amount of variation or dispersion in a dataset. In Python, we can compute the standard deviation using various modules, such as statistics, NumPy, or Pandas. To calculate/determine the standard deviation for the DataFrame column values, the “DataFrame.std()” method is used in Python.

This write-up will cover:

What is the “DataFrame.std()” Method in Python?

The “DataFrame.std()” method of the “pandas” module is utilized to retrieve the standard deviation of the given DataFame over the requested axis. The syntax of the “DataFrame.std()” method is shown below:

DataFrame.std(axis=None, skipna=True, ddof=1, numeric_only=False, **kwargs)

 

In the above syntax:

  • The “axis” parameter specifies the axis that needs to be checked, such as “0” for the index and “1” for columns.
  • The “skipna” parameter default value is set/assigned to “True”. The parameter value excludes the NaN or Null values while determining the standard deviation.
  • The “ddof” parameter indicates the delta degree of freedom.
  • The “numeric_only” parameter is used to include only the int, float, or Boolean columns.

Return Type

This “DataFrame.std()” method retrieves the DataFrame or Series based on the particular level.

Example 1: Determining the Standard Deviation of the Single DataFrame Column

The following example is used to determine the standard deviation of the single DataFrame column. The “df.std()” computes the standard deviation of the single column “score_1”. Here is a code:

import pandas
import numpy
df = pandas.DataFrame({'score_1': [22, 29, 10, 38, 10, 23],'score_2': [12, 19, 15, 18, 19, numpy.nan],'score_3': [19, 17, 13, 18, numpy.nan, 10]})
print(df, '\n')
print(df['score_1'].std())

 

This code retrieves the following output to the console:

Example 2: Determining the Standard Deviation of the Multiple DataFrame Column

The below code is used to retrieve the standard deviation of the multiple columns of Pandas DataFrame. In this code, the “df.std()” method computes the standard deviation of the multiple columns “score_1” and “score_3”.

import pandas
import numpy
df = pandas.DataFrame({'score_1': [22, 29, 10, 38, 10, 23],'score_2': [12, 19, 15, 18, 19, numpy.nan],'score_3': [19, 17, 13, 18, numpy.nan, 10]})
print(df, '\n')
print(df[['score_1', 'score_3']].std())

 

After executing the above-provided code, the following output will show on the console:

Example 3: Determining the Standard Deviation of the Entire DataFrame Column

To determine the standard deviation of the entire DataFrame columns, the “df.std()” method is used directly without specifying any column labels. Take the following code to determine the standard deviation:

import pandas
import numpy
df = pandas.DataFrame({'score_1': [22, 29, 10, 38, 10, 23],'score_2': [12, 19, 15, 18, 19, numpy.nan],'score_3': [19, 17, 13, 18, numpy.nan, 10]})
print(df, '\n')
print(df.std())

 

The above-given code generates the following output:

Example 4: Determining the Standard Deviation of the DataFrame by Skipping NaN Values

We can skip the “None” values of DataFrame columns using the “skipna=True” parameter. This parameter is passed to the “df.std()” method to determine the standard deviation by ignoring the NaN values. Here is an example code:

import pandas
import numpy
df = pandas.DataFrame({'score_1': [22, 29, numpy.nan, 38, 10, numpy.nan],'score_2': [numpy.nan, 19, 15, 18, 19, numpy.nan],'score_3': [19, numpy.nan, 13, 18, numpy.nan, 10]})
print(df, '\n')
print(df.std(skipna=True))

 

Here is the output of the above-stated code after execution:

Example 5: Determining the Standard Deviation of the DataFrame Utilizing the “ddof” Parameter

The “ddof” parameter is used to specify the number of elements to be used while determining the standard deviation. To calculate the standard deviation using all the column values, the default value “0” is used, while the “1” value is used to calculate the standard deviation for all the values columns except the last value. Here is an example code:

import pandas
import numpy
df = pandas.DataFrame({'score_1': [22, 29, 10, 38, 10, 23],'score_2': [12, 19, 15, 18, 19, numpy.nan],'score_3': [19, 17, 13, 18, numpy.nan, 10]})
print(df, '\n')
print(df.std(ddof=0), '\n')
print(df.std(ddof=1), '\n')
print(df.std(ddof=2), '\n')

 

The above code will show the following output after execution:

Conclusion

The “DataFrame.std()” method of the “pandas” module is used to compute the standard deviation of the specified DataFame over the requested axis. The “df.std()” method is used to calculate the standard deviation for single, multiple, or entire DataFrame columns. We can also skip None values and specify the delta number of freedoms using the parameter value. This tutorial presented an in-depth guide on determining the standard deviation using numerous examples.

Share Button

Source: linuxhint.com

Leave a Reply