Pandas Drop Duplicate Index

January 30, 2023 | by Arround The Web

Pandas has a method called “Index.drop_duplicates()” that drops duplicate labels from an index. It returns a new Index with the duplicate entries removed and leaves the original Index unchanged. The method lets the user choose which duplicate to keep. There are three options: keep the first occurrence of each value, keep the last occurrence, or remove every value that appears more than once.

The method is called with the following syntax:

Syntax:
pandas.Index.drop_duplicates(keep='first')

Parameter:
The “keep” parameter controls how duplicate values are handled. It is optional and defaults to “first”.

When the value is “first”, the first occurrence of each value is kept and every later occurrence is dropped as a duplicate.
When the value is “last”, the last occurrence of each value is kept and every earlier occurrence is dropped.
When the value is False, every value that appears more than once is dropped entirely, so only the values that occur exactly once remain.
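
The three settings can be compared side by side in a minimal sketch. The index "idx" and its labels are made-up sample data, not part of the examples that follow:

```python
import pandas

# Hypothetical sample data: 'a' and 'b' each appear twice, 'c' appears once
idx = pandas.Index(['a', 'b', 'a', 'c', 'b'])

# Keep the first occurrence of each label (the default behavior)
keep_first = idx.drop_duplicates(keep='first')

# Keep the last occurrence of each label
keep_last = idx.drop_duplicates(keep='last')

# Drop every label that occurs more than once
keep_none = idx.drop_duplicates(keep=False)

print(list(keep_first))  # ['a', 'b', 'c']
print(list(keep_last))   # ['a', 'c', 'b']
print(list(keep_none))   # ['c']
```

Note that the result follows the positions of the kept entries, which is why “last” yields a different order than “first”.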

Example 1: Without Parameters
In this example, we have an index named “index1” that holds 10 integers. Let’s remove the duplicates without passing any parameter to the drop_duplicates() function.

import pandas

# Create pandas Index that holds 10 values
index1 = pandas.Index([45,67,45,89,45,89,12,34,67,89])

print("Actual Index: ",index1)
print("Unique Index: ",index1.drop_duplicates())

Output:

Actual Index: Int64Index([45, 67, 45, 89, 45, 89, 12, 34, 67, 89], dtype='int64')
Unique Index: Int64Index([45, 67, 89, 12, 34], dtype='int64')

Explanation:
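With no argument, “keep” defaults to “first”, so the first occurrence of each value is kept and the later repeats are removed. The output shown here comes from pandas 1.x. In pandas 2.0 and later, Int64Index no longer exists, and the same results print as Index([...], dtype='int64').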

Example 2: With Keep as False
Let’s have an index that holds 5 strings with duplicates. Now, set the “keep” parameter to False.

import pandas

# Create pandas Index that holds 5 strings
index1 = pandas.Index(['i1','i1','i4','i5','i4'])

print("Actual Index: ",index1)
print("Unique Index: ",index1.drop_duplicates(keep=False))

Output:

Actual Index: Index(['i1', 'i1', 'i4', 'i5', 'i4'], dtype='object')
Unique Index: Index(['i5'], dtype='object')

Explanation:
Only “i5” occurs exactly once. “i1” and “i4” each occur twice, so every occurrence of them is removed.
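
To see which labels keep=False treats as duplicates, one option is the related “Index.duplicated()” method, which returns a boolean mask instead of a new index. The following is a small sketch using the same strings as this example. The variable names “mask” and “unique_only” are only illustrative:

```python
import pandas

index1 = pandas.Index(['i1', 'i1', 'i4', 'i5', 'i4'])

# True for every label that occurs more than once
mask = index1.duplicated(keep=False)
print(mask.tolist())  # [True, True, True, False, True]

# Selecting the unmarked labels gives the same result as drop_duplicates(keep=False)
unique_only = index1[~mask]
print(list(unique_only))  # ['i5']
```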

Example 3: With Keep as First
Let’s have the “index1” with 10 values and “index2” with 5 strings. Set “keep” to “first” to drop the duplicates without removing the first occurrence.

import pandas

# Create pandas Index that holds 10 values
index1 = pandas.Index([45,67,45,89,45,89,12,34,67,89])

print("Actual Index 1: ",index1)

# Drop duplicates, keeping the first occurrence of each value
print("Unique Index 1: ",index1.drop_duplicates(keep='first'))

# Create pandas Index that holds 5 strings
index2 = pandas.Index(['i1','i1','i4','i5','i4'])

print("Actual Index 2: ",index2)
# Drop duplicates, keeping the first occurrence of each value
print("Unique Index 2: ",index2.drop_duplicates(keep='first'))

Output:

Actual Index 1: Int64Index([45, 67, 45, 89, 45, 89, 12, 34, 67, 89], dtype='int64')
Unique Index 1: Int64Index([45, 67, 89, 12, 34], dtype='int64')
Actual Index 2: Index(['i1', 'i1', 'i4', 'i5', 'i4'], dtype='object')
Unique Index 2: Index(['i1', 'i4', 'i5'], dtype='object')

Explanation:

In “index1”, [45, 67, 89, 12, 34] are the first occurrences of each distinct value.
In “index2”, [‘i1’, ‘i4’, ‘i5’] are the first occurrences of each distinct value.

Example 4: With Keep as Last
Let’s have the “index1” with 10 values and “index2” with 5 strings. Set “keep” to “last” to drop the duplicates without removing the last occurrence.

import pandas

# Create pandas Index that holds 10 values
index1 = pandas.Index([45,67,45,89,45,89,12,34,67,89])

print("Actual Index 1: ",index1)

# Drop duplicates, keeping the last occurrence of each value
print("Unique Index 1: ",index1.drop_duplicates(keep='last'))

# Create pandas Index that holds 5 strings
index2 = pandas.Index(['i1','i1','i4','i5','i4'])

print("Actual Index 2: ",index2)
# Drop duplicates, keeping the last occurrence of each value
print("Unique Index 2: ",index2.drop_duplicates(keep='last'))

Output:

Actual Index 1: Int64Index([45, 67, 45, 89, 45, 89, 12, 34, 67, 89], dtype='int64')
Unique Index 1: Int64Index([45, 12, 34, 67, 89], dtype='int64')
Actual Index 2: Index(['i1', 'i1', 'i4', 'i5', 'i4'], dtype='object')
Unique Index 2: Index(['i1', 'i5', 'i4'], dtype='object')

Explanation:

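In “index1”, [45, 12, 34, 67, 89] are the last occurrences of each distinct value. The order follows the positions of those last occurrences, which is why 45 comes before 12.
In “index2”, [‘i1’, ‘i5’, ‘i4’] are the last occurrences of each distinct value.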
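
In practice, duplicate labels often sit on the index of a DataFrame rather than on a standalone Index. “Index.drop_duplicates()” returns only the labels, so it does not filter the rows by itself. One common approach, sketched here under the assumption of a made-up DataFrame “df” with a hypothetical “value” column, is to build a mask with “Index.duplicated()” and select the rows that are not marked:

```python
import pandas

# Hypothetical DataFrame whose row labels repeat (same labels as index2)
df = pandas.DataFrame(
    {'value': [10, 20, 30, 40, 50]},
    index=['i1', 'i1', 'i4', 'i5', 'i4'],
)

# Mark every row whose label appears again later, then keep the unmarked rows,
# which leaves the last row for each label
deduped = df[~df.index.duplicated(keep='last')]

print(list(deduped.index))     # ['i1', 'i5', 'i4']
print(list(deduped['value']))  # [20, 40, 50]
```

The kept labels match the “Unique Index 2” output of this example, and each label carries the value from its last row.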

Conclusion

This tutorial covered how to drop duplicate index labels with the Pandas “Index.drop_duplicates()” method. We went through its syntax and its single parameter, “keep”, which offers three ways of handling duplicate values: keep the first occurrence, keep the last occurrence, or drop every value that is repeated. Each option was demonstrated with an example on an integer index and a string index.

Source: linuxhint.com