by Arround The Web

Pandas str Replace

The Pandas “Series” is a 1-dimensional array-like object that can hold various data types, such as integers, strings, or Booleans. When a Series holds text, its “str” accessor provides string methods that work on every element at once. Sometimes, while dealing with Series data, we need to replace a specified substring with another string. The “Series.str.replace()” method is used for this purpose.

This guide gives an overview of the Pandas “Series.str.replace()” method using multiple examples.

What is the “Series.str.replace()” Method in Python?

The “Series.str.replace()” method replaces each occurrence of a pattern or regex in the strings of a Series or Index object. Depending on the “regex” parameter, it behaves like Python's built-in “str.replace()” (literal patterns) or like “re.sub()” (regular expressions).
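The comparison with the built-in string tools can be sketched as follows. This is a minimal illustration rather than part of the original examples; the sample data and the variable names (“words”, “ser”, “pattern”) are assumptions made for the demonstration.

```python
import re
import pandas as pd

words = ["foo bar", "bar baz"]  # sample data, assumed for illustration
ser = pd.Series(words)

# Literal replacement: comparable to calling str.replace() on each element
literal = ser.str.replace("bar", "qux", regex=False)
builtin_literal = [w.replace("bar", "qux") for w in words]

# Regex replacement: comparable to calling re.sub() on each element
pattern = r"ba."
regex_result = ser.str.replace(pattern, "X", regex=True)
builtin_regex = [re.sub(pattern, "X", w) for w in words]

print(literal.tolist() == builtin_literal)     # True
print(regex_result.tolist() == builtin_regex)  # True
```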

Syntax

Series.str.replace(pat, repl, n=-1, case=None, flags=0, regex=False)

Parameters

In this syntax:

  • The “pat” parameter is the string or compiled regular expression to search for. For example, with “regex=True”, the pattern “f.” matches an “f” followed by any single character; with “regex=False”, the pattern is matched literally.
  • The “repl” parameter is the replacement string, or a callable that receives each regex match object and returns the replacement string. A callable can only be used when the pattern is treated as a regex.
  • The “n” optional parameter is an integer that specifies how many replacements to make in each string, counting from the start. The default value “-1” replaces all occurrences.
  • The “case” parameter is a Boolean value that determines whether matching is case-sensitive. The default “None” is treated as case-sensitive, and it cannot be set when “pat” is a compiled regex.
  • The “flags” parameter is an integer holding “re” module flags, such as “re.IGNORECASE”. The default “0” means no flags, and it cannot be set when “pat” is a compiled regex.
  • The “regex” parameter is a Boolean value that indicates whether “pat” is treated as a regular expression (“True”) or as a literal string (“False”, the default in pandas 2.0 and later).
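The parameters above can be tried out with a short sketch. The sample Series and the variable names are assumptions chosen for illustration; the calls use only the documented parameters from the syntax line.

```python
import re
import pandas as pd

ser = pd.Series(["Foo foo FOO", "food"])  # sample data, assumed for illustration

# n=1: replace only the first match in each string
first_only = ser.str.replace("foo", "bar", n=1, regex=False)

# case=False: ignore letter case while matching
case_insensitive = ser.str.replace("foo", "bar", case=False, regex=True)

# flags: pass flags from the re module instead of using "case"
with_flags = ser.str.replace("foo", "bar", flags=re.IGNORECASE, regex=True)

# repl as a callable: it receives the match object and returns a string
upper = ser.str.replace(r"[a-z]+", lambda m: m.group(0).upper(), regex=True)

print(first_only.tolist())        # ['Foo bar FOO', 'bard']
print(case_insensitive.tolist())  # ['bar bar bar', 'bard']
print(with_flags.tolist())        # ['bar bar bar', 'bard']
print(upper.tolist())             # ['FOO FOO FOO', 'FOOD']
```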

Return Value

The “Series.str.replace()” method returns a new Series (or Index) containing the replaced text values. The original object is not modified.
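A small sketch of this behavior, using made-up sample values: the result is a separate object, so it has to be assigned to a variable (or back to the column) to be kept.

```python
import pandas as pd

original = pd.Series(["a-b", "c-d"])  # sample data, assumed for illustration
replaced = original.str.replace("-", "+", regex=False)

print(original.tolist())  # ['a-b', 'c-d'] (unchanged)
print(replaced.tolist())  # ['a+b', 'c+d']

# The same accessor on an Index returns a new Index
idx = pd.Index(["x-1", "y-2"])
new_idx = idx.str.replace("-", "_", regex=False)
print(new_idx.tolist())   # ['x_1', 'y_2']
```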

Example 1: Replacing a Specific Substring With Another String

In this code, we import the “pandas” module and create a DataFrame. Next, we select the DataFrame column and apply the “str.replace()” method to it to replace the specified substring with another string:

import pandas

df = pandas.DataFrame({'Name': ['Joseph 15', 'Anna 22', 'Henry 33']})

df['Name'] = df['Name'].str.replace(' ', '_')

print(df)

Each space in the “Name” column has been replaced with an underscore, giving “Joseph_15”, “Anna_22”, and “Henry_33”.

Example 2: Replacing a Substring in a Series With Another String

We can replace a substring in the strings of a Series object with another string using the “str.replace()” method. In the code below, the “C” substring is replaced with “E”:

import pandas

ser1 = pandas.Series(["1A", "2B", "3C"])

print(ser1.str.replace("C", "E"))

Only the last element contains “C”, so the resulting Series holds “1A”, “2B”, and “3E”.

Example 3: Replacing a Substring With Another String Using Regular Expressions

We can also pass a regular expression to the “str.replace()” method to replace every matching substring with another string:

import pandas

df = pandas.DataFrame({'num': ['5A', '6B', '2C']})

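df['num'] = df['num'].str.replace(r'\d', 'Team-', regex=True)

print(df)

Note: Since pandas 2.0, the default value of “regex” is False, so “regex=True” must be passed explicitly for the pattern to be treated as a regular expression. The raw string prefix “r” keeps Python from interpreting the backslash as an escape sequence.

Each digit in the column has been replaced with “Team-”, giving “Team-A”, “Team-B”, and “Team-C”.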
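To see why the “regex” argument matters here, the following sketch runs the same digit pattern both ways. The variable names are assumptions for illustration; the data matches the example column.

```python
import pandas as pd

ser = pd.Series(["5A", "6B", "2C"])

# Treated as a regular expression: \d matches any digit
as_regex = ser.str.replace(r"\d", "Team-", regex=True)

# Treated as a literal: looks for a backslash followed by "d", which never occurs
as_literal = ser.str.replace(r"\d", "Team-", regex=False)

print(as_regex.tolist())    # ['Team-A', 'Team-B', 'Team-C']
print(as_literal.tolist())  # ['5A', '6B', '2C']
```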

Difference Between the “Series.str.replace()” and “DataFrame.replace()” Methods

In Pandas, the “DataFrame.replace()” method replaces values across a DataFrame; the value to replace can be given as a string, regex, list, dictionary, Series, and so on. By default, it matches whole cell values rather than substrings. The “Series.str.replace()” method, by contrast, replaces substrings within the string values of a Series or a single DataFrame column. It cannot be applied to an entire DataFrame because a DataFrame does not have a “str” accessor.

If the DataFrame column contains only string values, the “Series.str.replace()” method can be used; if the column holds other data types as well, the “DataFrame.replace()” method is a good choice. The following code shows how these methods work:
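The distinction can be sketched with a small DataFrame. The column names and values below are assumptions chosen to make the difference visible; they are not taken from the examples in this guide.

```python
import pandas as pd

# Sample data, assumed for illustration
df = pd.DataFrame({"Name": ["Anna", "Annabel"], "City": ["Savannah", "Anna"]})

# A DataFrame has no "str" accessor; only a Series or Index does
has_str = hasattr(df, "str")
print(has_str)  # False

# DataFrame.replace() matches whole cell values by default
whole = df.replace("Anna", "Henry")
print(whole["Name"].tolist())  # ['Henry', 'Annabel']

# With regex=True, DataFrame.replace() also replaces substrings
sub = df.replace("Anna", "Henry", regex=True)
print(sub["Name"].tolist())    # ['Henry', 'Henrybel']

# Series.str.replace() works on substrings of a single column
col = df["Name"].str.replace("Anna", "Henry", regex=False)
print(col.tolist())            # ['Henry', 'Henrybel']
```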

import pandas

df = pandas.DataFrame({'Name': ['Joseph', 10, 'Anna', 20, 'Lily']})

df.replace('Anna', 'Henry', inplace=True)

print(df)

In this code, the DataFrame column holds both integer and string values, so we use the “DataFrame.replace()” method to substitute one value with another.

The value “Anna” has been replaced with “Henry”, while the integers and the other names are unchanged.
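For comparison, here is a sketch of what happens when “str.replace()” is applied to the same kind of mixed column. The variable names are assumptions for illustration. Non-string elements are not processed by the “str” accessor and come back as missing values, which is why “DataFrame.replace()” (or “Series.replace()”) is the safer choice for mixed data.

```python
import pandas as pd

mixed = pd.Series(["Joseph", 10, "Anna", 20, "Lily"])

# The str accessor only handles the string elements; integers become NaN
via_str = mixed.str.replace("Anna", "Henry", regex=False)
print(via_str.isna().tolist())  # [False, True, False, True, False]

# Series.replace() leaves the non-string values untouched
via_replace = mixed.replace("Anna", "Henry")
print(via_replace.tolist())     # ['Joseph', 10, 'Henry', 20, 'Lily']
```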

Now, we use the “str.replace()” method on a column that holds only strings:

import pandas

df = pandas.DataFrame({'Name': ['Joseph','Anna', 'Lily']})

df['Name'] = df['Name'].str.replace('Anna', 'Henry')

print(df)

Here, we apply the “Series.str.replace()” method to the DataFrame column to replace the specified string with another string. The “Name” column now holds “Joseph”, “Henry”, and “Lily”.

Conclusion

In Pandas, the “Series.str.replace()” method replaces occurrences of a pattern or regex in the strings of a Series or Index with another string. It can be applied to a Series but not to a whole DataFrame, because a DataFrame does not have a “str” accessor. It can, however, be applied to a specific DataFrame column to substitute one string with another. This tutorial covered the Pandas “str.replace()” method using several examples.


Source: linuxhint.com
