by Arround The Web

Pandas Interpolate

“Interpolation” is a technique that estimates unknown data points that lie between two known ones. In Pandas, it is used to fill the missing (NaN or None) values of a DataFrame or Series. The “DataFrame.interpolate()” method performs this interpolation and accepts several parameters that control how the gaps are filled.

This guide explains the method's syntax, parameters, and return value, and then works through four examples.

What is the “DataFrame.interpolate()” Method in Python?

In Python, the “DataFrame.interpolate()” method fills the missing or NaN values in a Series or DataFrame. It replaces each NaN value with an estimate computed by the specified interpolation method.
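As a minimal sketch of the idea, the snippet below interpolates a small Series. The sample values and the variable names “s” and “filled” are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: two missing values between 10 and 40
s = pd.Series([10.0, np.nan, np.nan, 40.0])

# The default method='linear' spaces the filled values evenly
filled = s.interpolate()
print(filled.tolist())  # [10.0, 20.0, 30.0, 40.0]
```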

Syntax

DataFrame.interpolate(method='linear', *, limit_direction=None, limit_area=None, axis=0, limit=None, inplace=False, downcast=_NoDefault.no_default, **kwargs)

Parameters

In the above syntax:

  • The “method” parameter specifies the interpolation technique used to fill the missing values. Possible values include “linear”, “pad”, “zero”, “cubic”, “polynomial”, and others. Each value fills the NaN values differently; some of them, such as “zero”, “cubic”, and “polynomial”, rely on SciPy, and “polynomial” also needs an “order” argument. Recent Pandas versions deprecate “pad” in favor of “ffill()”.
  • The “axis” parameter is the axis to interpolate along. It can be 0 (index) or 1 (columns).
  • The “limit” parameter specifies the maximum number of consecutive NaNs to fill. It must be greater than 0.
  • The “inplace” parameter is a “True” or “False” value that specifies whether to update the data in place if possible.
  • The “limit_direction” parameter is the direction in which consecutive NaNs are filled: “forward”, “backward”, or “both”.
  • The “limit_area” parameter restricts which NaNs are filled: “inside” fills only NaNs surrounded by valid values, “outside” fills only NaNs outside the valid values, and “None” applies no restriction.
  • Lastly, the “downcast” parameter is an optional argument that specifies whether to downcast dtypes if possible. It is deprecated in recent Pandas versions.
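To see how the “method” choice changes the result, the following sketch compares “linear” with “index” on a hypothetical Series whose index labels are unevenly spaced. The data and variable names are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the index labels 0, 1, 10 are unevenly spaced
s = pd.Series([0.0, np.nan, 10.0], index=[0, 1, 10])

# 'linear' ignores the index and treats the rows as equally spaced
by_position = s.interpolate(method='linear')

# 'index' uses the numeric index labels as the x-coordinates
by_index = s.interpolate(method='index')

print(by_position.tolist())  # [0.0, 5.0, 10.0]
print(by_index.tolist())     # [0.0, 1.0, 10.0]
```

With “linear”, the gap sits halfway between its neighbors by row position, so it gets 5.0. With “index”, label 1 is one tenth of the way from 0 to 10, so it gets 1.0.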

For further details, refer to the official Pandas documentation.

Return Value

The “DataFrame.interpolate()” method returns a DataFrame or Series of the same shape as the caller with the NaN values interpolated, or None if “inplace=True” is set.
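The short sketch below checks the default behavior (“inplace=False”) on a hypothetical two-column DataFrame: the result has the same shape as the caller, and the original object is left unchanged.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric DataFrame containing three NaN values
df = pd.DataFrame({'a': [1.0, np.nan, 3.0],
                   'b': [np.nan, 5.0, np.nan]})

result = df.interpolate()

print(result.shape == df.shape)       # True: the shape is preserved
print(int(df.isna().sum().sum()))     # 3: the original still holds its NaNs
print(int(result.isna().sum().sum())) # 1: the leading NaN in 'b' has no earlier value
```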

Example 1: Using “DataFrame.interpolate()” Method to Fill the Missing Values

In the code below, we first import Pandas and create a DataFrame with None values in its columns. Next, the “df.interpolate()” method fills each NaN value with a number between the previous and the next valid value in the same column; the linear method treats the rows as equally spaced and ignores the index. A missing value in the first row cannot be filled, because the default fill direction is forward and there is no previous value to start from.

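import pandas

# 'Team' holds strings, so it is used as the index; only the numeric score columns are interpolated
df = pandas.DataFrame({'Team': ['A', 'B', 'C', 'D', 'E', 'F'],
                       'Score_1': [12, 32, None, None, 45, None],
                       'Score_2': [None, 23, 33, None, 45, 55],
                       'Score_3': [23, 32, 31, None, None, None]}).set_index('Team')
print(df, '\n')

# Fill the NaN values with the default linear method
df1 = df.interpolate(method='linear')
print(df1)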

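With the default linear method, the interior gaps are filled with evenly spaced values (“Score_1” becomes 12, 32, 36.33, 40.67, 45, 45), the trailing NaN values take the last valid value, and the leading NaN in “Score_2” stays unfilled.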
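The arithmetic behind the linear method can be checked by hand. The sketch below reuses the “Score_1” values from this example and compares the Pandas result with a manual calculation; the names “step” and “manual” are hypothetical helpers.

```python
import numpy as np
import pandas as pd

# 'Score_1' values from the example, with NaN for the missing entries
score_1 = pd.Series([12, 32, np.nan, np.nan, 45, np.nan])
filled = score_1.interpolate(method='linear')

# Manual linear formula for the gap between position 1 (32) and position 4 (45)
step = (45 - 32) / (4 - 1)
manual = [32 + step * 1, 32 + step * 2]

print(filled.round(2).tolist())        # [12.0, 32.0, 36.33, 40.67, 45.0, 45.0]
print([round(v, 2) for v in manual])   # [36.33, 40.67]
```

The last value is 45.0 because a trailing NaN has no later neighbor, so the forward fill repeats the last valid value.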

Example 2: Using “DataFrame.interpolate()” Method to Fill the Missing Values in the Backward Direction

We can also interpolate in the backward direction, just as we did in the forward direction in the previous example. To do so, the “limit_direction” parameter with the “backward” value is passed to the “DataFrame.interpolate()” method. In the backward direction, a missing value in the last row cannot be filled, because no later row exists from which the value could be interpolated:

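import pandas

# 'Team' holds strings, so it is used as the index; only the numeric score columns are interpolated
df = pandas.DataFrame({'Team': ['A', 'B', 'C', 'D', 'E', 'F'],
                       'Score_1': [12, 32, None, None, 45, None],
                       'Score_2': [None, 23, 33, None, 45, 55],
                       'Score_3': [23, 32, 31, None, None, None]}).set_index('Team')
print(df, '\n')

# Fill the NaN values backward, starting from the next valid value
df1 = df.interpolate(method='linear', limit_direction='backward')
print(df1)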

In the output of the backward interpolation, the leading NaN in “Score_2” is filled with 23, while the trailing NaN values in “Score_1” and “Score_3” remain unfilled.
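A compact way to compare the directions is to run them side by side on one Series. This sketch uses hypothetical data with a NaN at each end and also includes the “both” value of “limit_direction”.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a leading, an interior, and a trailing NaN
s = pd.Series([np.nan, 1.0, np.nan, 3.0, np.nan])

forward = s.interpolate(limit_direction='forward')
backward = s.interpolate(limit_direction='backward')
both = s.interpolate(limit_direction='both')

print(forward.tolist())   # [nan, 1.0, 2.0, 3.0, 3.0]
print(backward.tolist())  # [1.0, 1.0, 2.0, 3.0, nan]
print(both.tolist())      # [1.0, 1.0, 2.0, 3.0, 3.0]
```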

Example 3: Using “DataFrame.interpolate()” Method to Limit the Number of Filled Missing Values

We can also specify the maximum number of consecutive missing values to interpolate. If this value is not set, all consecutive NaN values are interpolated by default. In this code, the “df.interpolate()” method takes “limit=1” as an argument and fills only the first missing value of each consecutive run in every column of the DataFrame:

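import pandas

# 'Team' holds strings, so it is used as the index; only the numeric score columns are interpolated
df = pandas.DataFrame({'Team': ['A', 'B', 'C', 'D', 'E', 'F'],
                       'Score_1': [12, 32, None, None, 45, None],
                       'Score_2': [None, 23, 33, None, 45, 55],
                       'Score_3': [23, 32, 31, None, None, None]}).set_index('Team')
print(df, '\n')

# Fill at most one NaN per consecutive run
df1 = df.interpolate(method='linear', limit=1)
print(df1)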

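In the output, only the first NaN of each run is filled: “Score_1” becomes 12, 32, 36.33, NaN, 45, 45, and “Score_3” becomes 23, 32, 31, 31, NaN, NaN.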
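The effect of “limit” is easier to see on a single run of NaNs. The sketch below uses a hypothetical Series with three consecutive gaps and also combines “limit” with “limit_direction='both'”.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a run of three NaNs between 1 and 5
s = pd.Series([1.0, np.nan, np.nan, np.nan, 5.0])

one = s.interpolate(limit=1)   # fill only the first NaN of the run
two = s.interpolate(limit=2)   # fill the first two NaNs of the run
# With 'both', one NaN is filled from each end of the run
edges = s.interpolate(limit=1, limit_direction='both')

print(one.tolist())    # [1.0, 2.0, nan, nan, 5.0]
print(two.tolist())    # [1.0, 2.0, 3.0, nan, 5.0]
print(edges.tolist())  # [1.0, 2.0, nan, 4.0, 5.0]
```

The filled values still lie on the straight line from 1 to 5; “limit” only controls how many positions receive a value.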

Example 4: Using “DataFrame.interpolate()” Method to Fill the Missing Values by Specifying the Area to Interpolate

In this code, we specify the area of interpolation using the “limit_area” parameter. With “limit_area='inside'”, the “DataFrame.interpolate()” method fills only the missing values that are surrounded by existing values in the same column. With “limit_area='outside'”, it fills only the missing values that are not surrounded by existing values in the same column; because the default direction is forward, only the trailing ones are filled here:

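import pandas

# 'Team' holds strings, so it is used as the index; only the numeric score columns are interpolated
df = pandas.DataFrame({'Team': ['A', 'B', 'C', 'D', 'E', 'F'],
                       'Score_1': [12, 32, None, None, 45, None],
                       'Score_2': [None, 23, 33, None, 45, 55],
                       'Score_3': [23, 32, 31, None, None, None]}).set_index('Team')
print(df, '\n')

# Fill only the NaN values that lie between valid values
df1 = df.interpolate(limit_area='inside')
print(df1, '\n')

# Fill only the NaN values that lie outside the valid values
df2 = df.interpolate(limit_area='outside')
print(df2)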

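In the first output, only the gaps that lie between valid values are filled, so the leading and trailing NaN values remain. In the second output, the interior gaps remain and only the trailing NaN values are filled.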
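The two “limit_area” values can be compared on one small Series. The following sketch uses hypothetical data and adds a third call that combines “outside” with “limit_direction='both'” so that the leading NaN is filled as well.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a leading, an interior, and a trailing NaN
s = pd.Series([np.nan, 1.0, np.nan, 3.0, np.nan])

inside = s.interpolate(limit_area='inside')
outside = s.interpolate(limit_area='outside')
outside_both = s.interpolate(limit_area='outside', limit_direction='both')

print(inside.tolist())        # [nan, 1.0, 2.0, 3.0, nan]
print(outside.tolist())       # [nan, 1.0, nan, 3.0, 3.0]
print(outside_both.tolist())  # [1.0, 1.0, nan, 3.0, 3.0]
```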

Conclusion

The “DataFrame.interpolate()” method is used in Python to fill the missing or NaN values of a DataFrame or Series based on the specified method. We can fill the missing values in a forward or backward direction using the “limit_direction” parameter, and restrict the filled area using the “limit_area” parameter. We can also limit the maximum number of consecutive NaN values filled during interpolation using the “limit” parameter. This write-up presented a detailed guide on Pandas interpolation using several examples.


Source: linuxhint.com
