
Pandas Filter by Index

Using the filter() method, we can select a subset of a DataFrame’s rows or columns by their index labels. The result keeps only the rows or columns whose labels match the ones that we specify. Note that filter() looks at the labels only, not at the values stored in the DataFrame.

There are several ways to filter the rows of a DataFrame based on their index, but in this tutorial, our main focus is the filter() method. Let’s check its syntax first so we can use it to filter the data. The method returns an object of the same type as its input: calling it on a DataFrame returns a DataFrame.
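As a quick illustration of those "different methods", the following minimal sketch selects the same rows in three ways: with filter(), with .loc, and with a boolean mask built from the index. The DataFrame and its labels are made up for this comparison only.

```python
import pandas as pd

# Hypothetical data, used only for this comparison
df = pd.DataFrame({'value': [10, 20, 30]}, index=['a', 'b', 'c'])

# filter() keeps the rows whose labels are listed in items
by_filter = df.filter(items=['a', 'c'], axis=0)

# .loc selects the same rows by label
by_loc = df.loc[['a', 'c']]

# A boolean mask built from the index works as well
by_mask = df[df.index.isin(['a', 'c'])]

print(by_filter)
print(by_loc)
print(by_mask)
```

All three calls select the rows labelled "a" and "c". The rest of this tutorial uses filter() only.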

Syntax:

 

DataFrame_object.filter(items=None, like=None, regex=None, axis=None)

 

Parameters:

    1. items: A list of the axis labels that you want to keep. Labels that do not exist on the axis are ignored.
    2. like: A string. Keep the labels for which “like in label == True”, that is, the labels that contain the given string.
    3. regex: A regular expression. Keep the labels for which re.search(regex, label) == True.
    4. axis: The axis to filter on: {0 or ‘index’, 1 or ‘columns’, None}. By default, this is the info axis. For a Series, it’s “index”. For a DataFrame, it’s “columns”. To filter the rows of a DataFrame, pass axis=0.

The items, like, and regex parameters are mutually exclusive. Pass exactly one of them per call.
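The parameter descriptions above can be tried out with a short sketch. The two-row DataFrame below is a cut-down, hypothetical version of the one used in the later examples. It shows the default axis, the effect of axis=0, and what happens when more than one of items, like, and regex is passed.

```python
import pandas as pd

# Hypothetical two-row DataFrame for trying out the parameters
df = pd.DataFrame({'from': ['city 1', 'city 3'],
                   'to': ['city 2', 'city 1'],
                   'distance': [200, 466]},
                  index=['passenger 1', 'passenger 2'])

# No axis given: a DataFrame is filtered on its column labels
cols = df.filter(like='dist')
print(cols)

# axis=0 filters on the row labels; '2$' matches labels ending in 2
rows = df.filter(regex='2$', axis=0)
print(rows)

# Passing more than one of items/like/regex raises a TypeError
try:
    df.filter(items=['from'], like='dist')
except TypeError as err:
    print('TypeError:', err)
```

Only the “distance” column survives the first call, and only the “passenger 2” row survives the second.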

Now that we have seen the syntax, let’s demonstrate the filter() function in the following examples:

Example 1: Filter by Numeric Index

Create a DataFrame with 2 columns and 5 records, and return only particular rows based on their index.

import pandas

hobbies=pandas.DataFrame({'stud_name':['stud 1','stud 2','stud 3','stud 4','stud 5'],
                          'hobbies':['music','singing','dance','play','drink']})

print(hobbies)
print()

# Get only first row
print(hobbies.filter([0],axis=0))
print()

# Get only fifth row
print(hobbies.filter([4],axis=0))

 
Output:

  stud_name  hobbies
0    stud 1    music
1    stud 2  singing
2    stud 3    dance
3    stud 4     play
4    stud 5    drink

  stud_name hobbies
0    stud 1   music

  stud_name hobbies
4    stud 5   drink

 
Explanation:

    1. In the first output, we returned the first row using the index label 0.
    2. In the second output, we returned the fifth row using the index label 4.
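In this example, the labels 0 and 4 happen to coincide with the row positions because the DataFrame uses the default index. The following sketch, which assumes a hypothetical index of 10, 20, 30, suggests that filter() matches labels rather than positions, while iloc selects by position.

```python
import pandas as pd

# Same kind of data, but with hypothetical non-default index labels
hobbies = pd.DataFrame({'stud_name': ['stud 1', 'stud 2', 'stud 3'],
                        'hobbies': ['music', 'singing', 'dance']},
                       index=[10, 20, 30])

# No row carries the label 0, so the result is empty
no_match = hobbies.filter(items=[0], axis=0)
print(no_match)
print()

# The label 10 identifies the first row
by_label = hobbies.filter(items=[10], axis=0)
print(by_label)
print()

# iloc selects by position, regardless of the labels
by_position = hobbies.iloc[[0]]
print(by_position)
```

If you need the n-th row no matter how the index is labelled, iloc is the more direct tool.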

Example 2: Filter by Multiple Numeric Indices

Create a DataFrame with 2 columns and 5 records, and return several rows in a single call based on their index.

import pandas

hobbies=pandas.DataFrame({'stud_name':['stud 1','stud 2','stud 3','stud 4','stud 5'],
                          'hobbies':['music','singing','dance','play','drink']})

# Get first two rows
print(hobbies.filter(items=[0,1],axis=0))
print()

# Get only second, third and fifth rows
print(hobbies.filter(items=[1,2,4],axis=0))

 
Output:

  stud_name  hobbies
0    stud 1    music
1    stud 2  singing

  stud_name  hobbies
1    stud 2  singing
2    stud 3    dance
4    stud 5    drink

 
Explanation:

    1. In the first output, we returned the first and second rows in one call using the index labels 0 and 1.
    2. In the second output, we returned the second, third, and fifth rows using the index labels 1, 2, and 4.
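When the list passed to items contains a label that is not in the index, filter() skips it instead of failing. The sketch below uses the same DataFrame plus a made-up label 7 and contrasts this with .loc, which raises a KeyError for a missing label.

```python
import pandas as pd

hobbies = pd.DataFrame({'stud_name': ['stud 1', 'stud 2', 'stud 3', 'stud 4', 'stud 5'],
                        'hobbies': ['music', 'singing', 'dance', 'play', 'drink']})

# Label 7 is not in the index; filter() silently skips it
found = hobbies.filter(items=[1, 2, 7], axis=0)
print(found)

# .loc is stricter: a missing label raises KeyError
try:
    hobbies.loc[[1, 2, 7]]
except KeyError as err:
    print('KeyError:', err)
```

This makes filter() convenient when some labels may be absent, but it also means that a typo in a label goes unnoticed.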

Example 3: Filter by Non-Numeric Index

Create a DataFrame with 3 columns and 4 records, and return particular rows separately based on their index. Here, the index labels are strings.

import pandas

journey=pandas.DataFrame({'from':['city 1','city 1','city 3','city 4'],
                          'to':['ap','usa','city 2','city 1'],
                          'distance':[200,500,466,100]},
                          index=['passenger 1','passenger 2','passenger 3','passenger 4'])

print(journey)

print()

# Get the row where the index is 'passenger 3'.
print(journey.filter(items=['passenger 3'],axis=0))

print()

# Get the row where the index is 'passenger 1'.
print(journey.filter(items=['passenger 1'],axis=0))

 
Output:

               from      to  distance
passenger 1  city 1      ap       200
passenger 2  city 1     usa       500
passenger 3  city 3  city 2       466
passenger 4  city 4  city 1       100

               from      to  distance
passenger 3  city 3  city 2       466

               from  to  distance
passenger 1  city 1  ap       200

 
Explanation:

    1. In the first output, we returned the third row using the index label “passenger 3”.
    2. In the second output, we returned the first row using the index label “passenger 1”.

Example 4: Filter by Multiple Non-Numeric Indices

Return the last three rows in a single call based on their index.

import pandas

journey=pandas.DataFrame({'from':['city 1','city 1','city 3','city 4'],
                          'to':['ap','usa','city 2','city 1'],
                          'distance':[200,500,466,100]},
                          index=['passenger 1','passenger 2','passenger 3','passenger 4'])

# Get the rows where the index is 'passenger 2', 'passenger 3' or 'passenger 4'
print(journey.filter(items=['passenger 2','passenger 3','passenger 4'],axis=0))

 
Output:

               from      to  distance
passenger 2  city 1     usa       500
passenger 3  city 3  city 2       466
passenger 4  city 4  city 1       100

 

Example 5: Filter Using the Like Parameter

Let’s utilize the “like” parameter to return the rows whose index contains “passenger” and the rows whose index contains “r 1”, separately.

import pandas

journey=pandas.DataFrame({'from':['city 1','city 1','city 3','city 4'],
                          'to':['ap','usa','city 2','city 1'],
                          'distance':[200,500,466,100]},
                          index=['passenger 1','passenger 2','passenger 3','passenger 4'])

# Get the rows where the index contains 'passenger'.
print(journey.filter(like='passenger',axis=0))

print()

# Get the row where the index is like 'r 1'.
print(journey.filter(like='r 1',axis=0))

 
Output:

               from      to  distance
passenger 1  city 1      ap       200
passenger 2  city 1     usa       500
passenger 3  city 3  city 2       466
passenger 4  city 4  city 1       100

               from  to  distance
passenger 1  city 1  ap       200

 
Explanation:

    1. All index labels contain “passenger”. So, all rows were returned in the first output.
    2. Only one index label contains “r 1”. So, only the row with the index “passenger 1” is returned in the second output.

Example 6: Filter Using the Like Parameter with a Single Character

Let’s consider the DataFrame with the indices [‘sravan’, ‘ravan’, ‘pavan’, ‘Ravi’] and then return the rows whose index contains “n” and “M”, separately.

import pandas

journey=pandas.DataFrame({'from':['city 1','city 1','city 3','city 4'],
                          'to':['ap','usa','city 2','city 1'],
                          'distance':[200,500,466,100]},
                          index=['sravan','ravan','pavan','Ravi'])

# Get the row where the index is like 'n'.
print(journey.filter(like='n',axis=0))

print()

# Get the row where the index is like 'M'.
print(journey.filter(like='M',axis=0))

 
Output:

          from      to  distance
sravan  city 1      ap       200
ravan   city 1     usa       500
pavan   city 3  city 2       466

Empty DataFrame
Columns: [from, to, distance]
Index: []

 
Explanation:

    1. There are three rows where the index includes “n”.
    2. There is no row where the index includes “M”. So, an empty DataFrame is returned.
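The “like” parameter does a plain, case-sensitive substring check. The regex parameter from the syntax section is not used in the examples above, so here is a minimal sketch of it on the same DataFrame. The patterns are our own illustrations: “(?i)” is Python’s inline flag for ignoring case, “^” anchors the match at the start of the label, and “$” anchors it at the end.

```python
import pandas as pd

journey = pd.DataFrame({'from': ['city 1', 'city 1', 'city 3', 'city 4'],
                        'to': ['ap', 'usa', 'city 2', 'city 1'],
                        'distance': [200, 500, 466, 100]},
                       index=['sravan', 'ravan', 'pavan', 'Ravi'])

# like is case-sensitive: an uppercase 'R' appears only in 'Ravi'
upper_r = journey.filter(like='R', axis=0)
print(upper_r)
print()

# regex, ignoring case: labels that start with 'r' or 'R'
starts_with_r = journey.filter(regex='(?i)^r', axis=0)
print(starts_with_r)
print()

# regex: labels that end with 'van'
ends_with_van = journey.filter(regex='van$', axis=0)
print(ends_with_van)
```

The first call returns only “Ravi”, the second returns “ravan” and “Ravi”, and the third returns “sravan”, “ravan”, and “pavan”.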

Conclusion

We showed how to retrieve DataFrame rows based on their index in Pandas. We first looked at the syntax of the filter() function to understand its parameters and how it works. We then implemented different examples that filter a DataFrame by numeric and non-numeric index labels, both one label at a time and several labels in a single call. Finally, we used the like parameter of filter() to keep the rows whose index contains a particular character or string.


Source: linuxhint.com
