by Arround The Web

Find Strings in Pandas

This article shows how to search for a string in a Pandas DataFrame using the str.contains() method.

Pandas Contains Method

Pandas provides the str.contains() method, which checks whether a substring is contained in each element of a Series, such as a DataFrame column.

The function accepts a literal string or a regular expression pattern, which is matched against each element of the data.
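The following minimal sketch shows the difference between a regular expression and a literal string. It uses a small Series with made-up sample values and assumes only that pandas is installed.

```python
import pandas as pd

# Hypothetical sample data for illustration
s = pd.Series(["a.c", "abc", "xyz"])

# With regex=True (the default), "." is a regex wildcard matching any single character
as_regex = s.str.contains("a.c")

# With regex=False, "a.c" is matched as a literal substring, dot included
as_literal = s.str.contains("a.c", regex=False)

print(as_regex.tolist())    # [True, True, False]
print(as_literal.tolist())  # [True, False, False]
```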

The function syntax is as shown:

Series.str.contains(pat, case=True, flags=0, na=None, regex=True)

The function parameters are as follows:

  1. pat – the character sequence or regular expression pattern to search for.
  2. case – if True (the default), the search is case-sensitive.
  3. flags – flags to pass to Python's re module, such as re.IGNORECASE.
  4. na – the value to use in the result for missing values.
  5. regex – if True (the default), treats pat as a regular expression; if False, treats it as a literal string.
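The flags and na parameters are easiest to see on data with mixed case and a missing value. The sketch below uses a hypothetical three-element Series. How missing entries are reported when na is not given can differ between pandas versions and string dtypes, so the example passes na=False explicitly.

```python
import re
import pandas as pd

# Hypothetical sample data with a missing value (None)
s = pd.Series(["Apple pie", None, "banana split"])

# flags are passed to Python's re module; re.IGNORECASE lets "apple" match "Apple"
with_flags = s.str.contains("apple", flags=re.IGNORECASE, na=False)

# na=False makes the missing entry come back as False, so the result stays Boolean
with_na = s.str.contains("an", na=False)

print(with_flags.tolist())  # [True, False, False]
print(with_na.tolist())     # [False, False, True]
```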

Return Value

The function returns a Series or Index of Boolean values indicating whether the pattern or substring is found in each element.
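Because str.contains() works on one Series (a single column) at a time, searching an entire DataFrame means applying it to each column and then combining the results. The following is a minimal sketch under two assumptions: every column holds strings, and the column names first and last are hypothetical.

```python
import pandas as pd

# Hypothetical DataFrame with two text columns
df = pd.DataFrame({
    "first": ["Irene", "Maggie", "Lisa"],
    "last": ["Coleman", "Hoffman", "Crawford"],
})

# Apply contains() to each column; the result is a Boolean DataFrame shaped like df
hits = df.apply(lambda col: col.str.contains("man"))

# A row matches if any of its columns matched
rows = df[hits.any(axis=1)]
print(rows)
```

If some columns are not strings, you would need to convert them first, for example with col.astype(str). Otherwise the .str accessor raises an error on those columns.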

Example

Suppose we have a sample DataFrame shown below:

# import pandas
import pandas as pd

df = pd.DataFrame({"full_names": ['Irene Coleman', 'Maggie Hoffman', 'Lisa Crawford', 'Willow Dennis', 'Emmett Shelton']})
print(df)

Search a String

To search for a string, pass the substring as the pat parameter as shown:

print(df.full_names.str.contains('Shelton'))

The code above checks whether the string ‘Shelton’ is contained in the full_names column of the DataFrame.

This returns a Series of Boolean values indicating whether the string is found in each row of the specified column.

The output is as shown:

0    False
1    False
2    False
3    False
4     True
Name: full_names, dtype: bool

To get the matching rows themselves, use the Boolean Series returned by contains() as a mask to index the DataFrame:

print(df[df.full_names.str.contains('Shelton')])

The above should return:

       full_names
4  Emmett Shelton
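The Boolean Series returned by contains() can also answer related questions. You can check whether the string occurs anywhere in the column, list the index labels of the matching rows, and count the matches. Here is a sketch using the same sample DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"full_names": ['Irene Coleman', 'Maggie Hoffman', 'Lisa Crawford', 'Willow Dennis', 'Emmett Shelton']})

mask = df.full_names.str.contains('Shelton')

# any() collapses the mask to a single answer: does the string appear in any row?
found = mask.any()

# Index labels of the rows that matched
matching_labels = df[mask].index.tolist()

# Number of matching rows (each True counts as 1)
count = mask.sum()

print(found, matching_labels, count)  # True [4] 1
```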

Case-Sensitive Search

Searches are case-sensitive by default (case=True), but you can set the case parameter explicitly as shown:

print(df.full_names.str.contains('shelton', case=True))

In the example above, we set the case parameter to True, enabling a case-sensitive search.

Since we search for the lowercase string ‘shelton’, the capitalized ‘Shelton’ in the last row does not match, and the function returns False for every row.
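To make the search ignore case, set case=False, or pass the re.IGNORECASE flag through the flags parameter. The following sketch compares the default case-sensitive search with both case-insensitive options on the sample DataFrame:

```python
import re
import pandas as pd

df = pd.DataFrame({"full_names": ['Irene Coleman', 'Maggie Hoffman', 'Lisa Crawford', 'Willow Dennis', 'Emmett Shelton']})

# case=True (the default): lowercase 'shelton' does not match 'Shelton'
sensitive = df.full_names.str.contains('shelton', case=True)

# case=False: the match ignores case, so the last row matches
insensitive = df.full_names.str.contains('shelton', case=False)

# Equivalent case-insensitive search using a regex flag
with_flag = df.full_names.str.contains('shelton', flags=re.IGNORECASE)

print(sensitive.any())                 # False
print(insensitive.tolist())            # [False, False, False, False, True]
print(with_flag.equals(insensitive))   # True
```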

RegEx search

We can also search using a regular expression pattern. A simple example is as shown:

print(df.full_names.str.contains('wi|em', case=False, regex=True))

In the code above, we search for any string that contains ‘wi’ or ‘em’. Note that we set the case parameter to False, so the search ignores case.

The code above should return:

0     True
1    False
2    False
3     True
4     True
Name: full_names, dtype: bool
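As with a plain substring, you can use the Boolean result of a regex search to filter the DataFrame. The sketch below does that. It also shows that with regex=False the pipe character is treated literally, so the pattern 'wi|em' no longer acts as an "or":

```python
import pandas as pd

df = pd.DataFrame({"full_names": ['Irene Coleman', 'Maggie Hoffman', 'Lisa Crawford', 'Willow Dennis', 'Emmett Shelton']})

# '|' means "or" in a regular expression, so this matches 'wi' or 'em' in any case
mask = df.full_names.str.contains('wi|em', case=False, regex=True)
print(df[mask])

# With regex=False, the literal text 'wi|em' is searched for, and no name contains it
literal = df.full_names.str.contains('wi|em', case=False, regex=False)
print(literal.any())  # False
```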

Closing

This article covered how to search for a substring in a Pandas DataFrame column using the str.contains() method. For the full list of parameters and their behavior, see the pandas documentation for Series.str.contains().


Source: linuxhint.com
